Adding single quotes to DataFrame column values

Problem description

The DataFrame has a column QUALIFY with the following values.

QUALIFY
=================
ColA|ColB|ColC
ColA
ColZ|ColP

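The values in this column are separated by "|". I want the values in this column to look like 'ColA','ColB','ColC' ...

With the code below I can replace each | with ','. How do I add a single quote at the start and end of the value?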

newDf = df_qualify.withColumn('QUALIFY2', regexp_replace('QUALIFY', "\\|", "\\','"))
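
To make the gap concrete, the effect of that replacement on a single value can be reproduced in plain Python. This is only an illustration outside Spark; the helper names replace_pipes and wrap_in_quotes are hypothetical and not part of the original code.

```python
import re


def replace_pipes(value):
    # Same substitution as the regexp_replace call: every "|" becomes "','".
    return re.sub(r"\|", "','", value)


def wrap_in_quotes(value):
    # The missing step: one quote before and one after the replaced string.
    return "'" + replace_pipes(value) + "'"


print(replace_pipes("ColA|ColB|ColC"))   # ColA','ColB','ColC
print(wrap_in_quotes("ColA|ColB|ColC"))  # 'ColA','ColB','ColC'
```

The replacement alone leaves the first and last value unquoted on the outside, which is exactly the part the question asks about.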

Tags: dataframe, apache-spark, pyspark, databricks

Solution


Split the column on |, then join the resulting array back into a string:

import pyspark.sql.functions as F
import pyspark.sql.types as T

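def str_list(x):
    # x is the list produced by F.split. Quote each element explicitly
    # rather than stripping brackets from the list's string representation,
    # which breaks on values that contain "[", "]" or "'".
    if x is None:
        return None
    return ", ".join("'{}'".format(v) for v in x)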

str_udf = F.udf(str_list, T.StringType())

df = df.withColumn("arr_split", F.split(F.col("QUALIFY"), r"\|"))  # raw string; "|" is a regex metacharacter and must be escaped
df = df.withColumn("QUALIFY2", str_udf(F.col("arr_split")))

My sample output DataFrame:

df.drop("arr_split").show() # Please ignore a and b columns
+---+---+--------------+--------------------+
|  a|  b|           abc|            QUALIFY2|
+---+---+--------------+--------------------+
|  1|  1|col1|col2|col3|'col1', 'col2', '...|
|  2|  2|col1|col2|col3|'col1', 'col2', '...|
|  3|  3|col1|col2|col3|'col1', 'col2', '...|
|  4|  4|col1|col2|col3|'col1', 'col2', '...|
|  5|  5|col1|col2|col3|'col1', 'col2', '...|
+---+---+--------------+--------------------+
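
A UDF is not strictly required for this. As a minimal sketch, assuming the column is named QUALIFY as in the question, the same kind of result can be built from Spark's built-in functions: regexp_replace turns each | into ',' and concat adds the outer quotes. The helper names quote_values and add_quoted_column are hypothetical, and pyspark is imported inside the function so that the plain-Python check runs without a Spark installation.

```python
def quote_values(value, sep="|"):
    # Plain-Python mirror of the Spark expression below, useful for checking
    # the expected string without starting a Spark session.
    if value is None:
        return None
    return ",".join("'{}'".format(part) for part in value.split(sep))


def add_quoted_column(df, src="QUALIFY", dst="QUALIFY2"):
    # Assumption: df is a pyspark.sql.DataFrame with a string column `src`.
    from pyspark.sql import functions as F

    quoted = F.concat(
        F.lit("'"),                                  # opening quote
        F.regexp_replace(F.col(src), r"\|", "','"),  # "|" -> "','"
        F.lit("'"),                                  # closing quote
    )
    # concat yields null when the source value is null.
    return df.withColumn(dst, quoted)


print(quote_values("ColA|ColB|ColC"))  # 'ColA','ColB','ColC'
```

Unlike the UDF version above, this produces no space after each comma, which matches the format requested in the question. Staying with built-in functions also avoids sending every row through the Python worker.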
