Extracting data from a nested JSON file

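Problem description

This is the schema of my data, and I want to extract the from field out of it. I tried df3 = df.select(df.transcript.data.from.alias("Type")) and got an invalid syntax error.

How can I extract this field?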

root
 |-- contactId: long (nullable = true)
 |-- mediaLegId: string (nullable = true)
 |-- transcript: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- action: string (nullable = true)
 |    |    |-- data: struct (nullable = true)
 |    |    |    |-- chatId: string (nullable = true)
 |    |    |    |-- customerInfo: struct (nullable = true)
 |    |    |    |    |-- customerIdentifierToken: string (nullable = true)
 |    |    |    |    |-- customerIdentifierType: string (nullable = true)
 |    |    |    |    |-- customerName: string (nullable = true)
 |    |    |    |    |-- initialQuestion: string (nullable = true)
 |    |    |    |-- entryPoint: string (nullable = true)
 |    |    |    |-- from: string (nullable = true)
 |    |    |    |-- lang: string (nullable = true)
 |    |    |    |-- parkDuration: long (nullable = true)
 |    |    |    |-- parkNote: string (nullable = true)
 |    |    |    |-- participant: struct (nullable = true)
 |    |    |    |    |-- disconnectReason: string (nullable = true)
 |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |-- participantId: string (nullable = true)
 |    |    |    |    |-- preferences: struct (nullable = true)
 |    |    |    |    |    |-- language: string (nullable = true)
 |    |    |    |    |-- state: string (nullable = true)
 |    |    |    |    |-- userName: string (nullable = true)
 |    |    |    |-- reconnected: boolean (nullable = true)
 |    |    |    |-- relatedData: string (nullable = true)
 |    |    |    |-- text: string (nullable = true)
 |    |    |    |-- timestamp: long (nullable = true)
 |    |    |    |-- transcriptText: string (nullable = true)
 |    |    |    |-- transferNote: string (nullable = true)

Tags: pyspark, apache-spark-sql, pyspark-sql, pyspark-dataframes

Solution


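Try it like this. from is a reserved keyword in Python, so df.transcript.data.from cannot be parsed as attribute access; referring to the field by its string name avoids the syntax error: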

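from pyspark.sql import functions as F

# Explode the transcript array into one row per element, then expand the
# struct and its nested data struct so that "from" becomes a top-level column.
(df.select(F.explode("transcript").alias("transcript"))
   .select("transcript.*").select("data.*").select("from").show())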
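To see why the original attempt fails before Spark is even involved, here is a small sketch that only asks Python whether each expression parses. The helper name parses and the three expression strings are made up for illustration; nothing is executed against a DataFrame. It only checks syntax, so how Spark resolves the bracket or string forms against the array column (for example, whether they return an array of values per row instead of one row per element) would still need to be checked on real data.

```python
import keyword

# "from" is a reserved word in Python, so it cannot appear after a dot
# as an attribute name.
print(keyword.iskeyword("from"))


def parses(expression: str) -> bool:
    """Return True if Python can parse the expression (it is never run)."""
    try:
        compile(expression, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Hypothetical expressions, used here only as text to be parsed.
attribute_form = 'df.transcript.data.from.alias("Type")'
bracket_form = 'df["transcript"]["data"]["from"].alias("Type")'
string_form = 'F.col("transcript.data.from").alias("Type")'

print(parses(attribute_form))  # the form from the question
print(parses(bracket_form))    # field name passed as a string key
print(parses(string_form))     # field name inside a dotted string path
```

The attribute form is rejected by the parser, while the two forms that pass the field name as a string are accepted. That is the same reason the select("from") call in the answer works.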
