How do I write a DataFrame to a database when a column name contains quotes?

Problem description

Say I have a schema like this:

'struct<address:string,"name":string>'

One of the column names, "name", is wrapped in double quotes. When I write the DataFrame, I get:

name expected at the position ..  but '"' is found.

with the following sample code:

df
.write
.format(format)
.options(options)
........

Tags: sql, database, scala, dataframe, apache-spark

Solution


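My answer is below. We simply rename the column, using escape characters for the quotes. Do this before writing to the database, i.e. before the df.write statement.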

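from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Reuses the existing session in the pyspark shell, or creates one in a script.
spark = SparkSession.builder.getOrCreate()

values = [("112 Street, Pune", "Stacky"),
          ("220 Street, Mumbai", "John")]
schema = StructType([StructField("address", StringType(), True),
                     StructField("name", StringType(), True)])

data = spark.createDataFrame(values, schema)
data.show(20, False)  # truncate=False: print full cell values, left-aligned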

+------------------+------+
|address           |name  |
+------------------+------+
|112 Street, Pune  |Stacky|
|220 Street, Mumbai|John  |
+------------------+------+

data = data.withColumnRenamed("name","\"name\"")
data.show()

+------------------+------+
|           address|"name"|
+------------------+------+
|  112 Street, Pune|Stacky|
|220 Street, Mumbai|  John|
+------------------+------+
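The example above shows how the escape sequence \" puts literal quote characters into a column name. If the goal is instead to get rid of the quotes before df.write, the same renaming idea can run in the other direction. Below is a minimal sketch, assuming the target database accepts the unquoted names. unquote and unquote_columns are hypothetical helper names, not part of PySpark. The sketch relies only on the DataFrame attribute columns and the method toDF(*cols).

```python
def unquote(name: str) -> str:
    """Strip one pair of surrounding double quotes: '"name"' -> 'name'.

    Names without surrounding quotes are returned unchanged.
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


def unquote_columns(df):
    """Return a new DataFrame whose top-level column names have no surrounding quotes.

    Hypothetical helper: `df.columns` lists the current names in order, and
    `df.toDF(*names)` returns a new DataFrame with those names, in the same order.
    """
    return df.toDF(*[unquote(c) for c in df.columns])
```

Applied to the data DataFrame above, unquote_columns(data).columns should be ['address', 'name'], and the result can then be passed to df.write. This only renames top-level columns. In the question's schema the quoted name is a field inside a struct, and this sketch does not rename such nested fields.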
