
Problem description

I am trying to batch-write a DataFrame to a MySQL database over JDBC. I am using Databricks / pyspark.sql to write DataFrames to tables. The target table has a column that accepts JSON data (binary data). I converted the JSON object to a StructType with the following structure:

JSON object schema and conversion to a DataFrame:

schema_dict = {'fields': [
    {'metadata': {}, 'name': 'dict', 'nullable': True, 'type': {"containsNull": True, "elementType":{'fields': [
      {'metadata': {}, 'name': 'y1', 'nullable': True, 'type': 'integer'},
      {'metadata': {}, 'name': 'y2', 'nullable': True, 'type': 'integer'}
    ],"type": 'struct'}, "type": 'array'}}
], 'type': 'struct'}

from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType

cSchema = StructType([
    StructField("x1", IntegerType()), StructField("x2", IntegerType()),
    StructField("x3", IntegerType()), StructField("x4", TimestampType()),
    StructField("x5", IntegerType()), StructField("x6", IntegerType()),
    StructField("x7", IntegerType()), StructField("x8", TimestampType()),
    StructField("x9", IntegerType()), StructField("x10", StructType.fromJson(schema_dict)),
])
df = spark.createDataFrame(parsedList, schema=cSchema)

Output DataFrame:

df:pyspark.sql.dataframe.DataFrame
x1:integer
x2:integer
x3:integer
x4:timestamp
x5:integer
x6:integer
x7:integer
x8:timestamp
x9:integer
x10:struct
    dict:array
        element:struct
              y1:integer
              y2:integer

Now I am trying to write this DataFrame to a MySQL table:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

driver = "org.mariadb.jdbc.Driver"
url = "jdbc:mysql://dburl?rewriteBatchedStatements=true"
trial = "dbname.tablename"
user = "dbuser"
password = "dbpassword"
properties = {
    "user": user,
    "password": password,
    "driver": driver
}
df.write.jdbc(url=url, table=trial, mode="append", properties=properties)

I get this error:

An error occurred while calling o2118.jdbc.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 176.0 failed 4 times, most recent failure: Lost task 15.3 in stage 176.0 (TID 9528, 10.168.231.82, executor 5): java.lang.IllegalArgumentException: Can't get JDBC type for struct<dict:array<struct<y1:int,y2:int>>>

Any ideas on how to write a DataFrame with a JSON column to a MySQL table, or how to work around this error?

I am using Databricks 5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11).

Tags: pyspark, pyspark-sql, databricks, pyspark-dataframes

Solution

