Reading a complex JSON schema with PySpark

Problem description

I am reading a JSON document into a DataFrame, but its schema is complex. I was able to use the explode function to pull values out.

root
 |-- Name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Adap: string (nullable = true)
 |    |    |-- Vid: string (nullable = true)
 |-- Information: struct (nullable = true)
 |    |-- Caption: string (nullable = true)
 |    |-- No: string (nullable = true)
 |-- License: struct (nullable = true)
 |    |-- Out: struct (nullable = true)
 |    |    |-- ID: string (nullable = true)
 |    |-- In: struct (nullable = true)
 |    |    |-- INS: string (nullable = true)

The JSON is large, and I don't want to write everything out by hand. The way I did it works for each value individually:

mdmDF.withColumn("Name", explode("Name")).select(col("Name")["Adap"].alias("Name.Adap"))

Second, I tried the following, but it only gives me the columns of one struct at a time:

Name= mdmDF.selectExpr("explode(Name) AS Name").selectExpr("Name.*")

+------------------+----------+
|              Adap|       Vid|
+------------------+----------+
|            NVIDIA|         0|
+------------------+----------+

What I want is:

+------------------+----------+----------+----------+----------+----------+
|    adap          |      vid |  Caption |     no   |     Out  |    In    |
+------------------+----------+----------+----------+----------+----------+
|NVIDIA            |         0|      test|      1   |     etx  |    val   |
+------------------+----------+----------+----------+----------+----------+

Tags: pyspark, pyspark-sql

Solution

