python - How do I remove specific characters from a string in pyspark?


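Problem description

I am trying to remove specific characters from a string, but I have not found a solution that works. Could you help me with how to do this?

I am loading the data into a DataFrame using pyspark. One of the columns contains extra characters that I want to remove.

Example: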

|"\""warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"|

But in the result I only want:

|"warfarin was discontinued 3 days ago and xarelto was started when the INR was 2.7, and now the INR is 5.8, should Xarelto be continued or stopped?"|

I am using the following code to write the DataFrame to a file:

df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').save(output_path, escape='\"', sep='|',header='True',nullValue=None)

Tags: python, pandas, dataframe, pyspark

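Solution

You can use some other escape character instead of "\"; it can be changed to any other character. If you have the option of saving the file in a different format, prefer parquet (or orc) over csv.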
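As a rough illustration of the options above, here is a minimal sketch. It assumes the unwanted characters are backslashes and double quotes, and the names `STRAY_CHARS`, `strip_stray_chars`, `clean_column`, and `write_outputs`, as well as the `~` escape character, are hypothetical choices made for this example and do not come from the original question. The first helper is plain Python so the pattern can be tried without Spark; the second applies the same pattern with `regexp_replace`; the third shows writing with a different escape character and writing Parquet.

```python
import re

# Regex character class containing a backslash and a double quote.
# The same pattern text is valid for Python's re module and for Spark's Java regex engine.
STRAY_CHARS = r'[\\"]'


def strip_stray_chars(text):
    """Plain-Python preview of what the Spark expression below does to a single value."""
    return re.sub(STRAY_CHARS, "", text)


def clean_column(df, column_name):
    """Return a new DataFrame with backslashes and double quotes removed from one string column."""
    # Imported lazily so strip_stray_chars can be tried without a Spark installation.
    from pyspark.sql import functions as F

    return df.withColumn(
        column_name,
        F.regexp_replace(F.col(column_name), STRAY_CHARS, ""),
    )


def write_outputs(df, csv_path, parquet_path):
    """Hypothetical helper: write pipe-separated CSV with another escape character, plus Parquet."""
    # "~" is an arbitrary escape character picked for illustration only.
    df.repartition(1).write.csv(
        csv_path, mode="overwrite", sep="|", header=True, escape="~"
    )
    # Parquet stores typed columns, so CSV quoting and escaping do not apply.
    df.write.mode("overwrite").parquet(parquet_path)
```

Note that `clean_column` removes every backslash and double quote in the column, including any that legitimately belong to the text, so the pattern may need to be narrowed (for example, anchored to the start and end of the value) depending on the data.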
