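pyspark - Extracting data from a nested JSON file
Problem description
This is the schema of my data, and I want to extract the "from" field from it. I tried df3 = df.select(df.transcript.data.from.alias("Type")) and got an invalid syntax error.
How can I extract it?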
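The syntax error is raised by Python itself, before Spark is involved: "from" is a reserved keyword, so df.transcript.data.from cannot be parsed as attribute access. The sketch below checks this with the standard keyword and ast modules. The helper name parses is hypothetical and exists only for this illustration.

```python
import ast
import keyword

# "from" is a reserved word, so it can never appear as a plain attribute name
print(keyword.iskeyword("from"))


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# The attempt from the question is rejected by the Python parser
print(parses('df.select(df.transcript.data.from.alias("Type"))'))
# Bracket (string) access avoids the keyword problem entirely
print(parses('df.select(df.transcript.data["from"].alias("Type"))'))
```

Any approach that names the field with a string, such as bracket access, F.col("transcript.data.from"), or select("from"), avoids this parser error.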
root
|-- contactId: long (nullable = true)
|-- mediaLegId: string (nullable = true)
|-- transcript: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- action: string (nullable = true)
| | |-- data: struct (nullable = true)
| | | |-- chatId: string (nullable = true)
| | | |-- customerInfo: struct (nullable = true)
| | | | |-- customerIdentifierToken: string (nullable = true)
| | | | |-- customerIdentifierType: string (nullable = true)
| | | | |-- customerName: string (nullable = true)
| | | | |-- initialQuestion: string (nullable = true)
| | | |-- entryPoint: string (nullable = true)
| | | |-- from: string (nullable = true)
| | | |-- lang: string (nullable = true)
| | | |-- parkDuration: long (nullable = true)
| | | |-- parkNote: string (nullable = true)
| | | |-- participant: struct (nullable = true)
| | | | |-- disconnectReason: string (nullable = true)
| | | | |-- displayName: string (nullable = true)
| | | | |-- participantId: string (nullable = true)
| | | | |-- preferences: struct (nullable = true)
| | | | | |-- language: string (nullable = true)
| | | | |-- state: string (nullable = true)
| | | | |-- userName: string (nullable = true)
| | | |-- reconnected: boolean (nullable = true)
| | | |-- relatedData: string (nullable = true)
| | | |-- text: string (nullable = true)
| | | |-- timestamp: long (nullable = true)
| | | |-- transcriptText: string (nullable = true)
| | | |-- transferNote: string (nullable = true)
Solution
Try using it like this:
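from pyspark.sql import functions as F
# explode the array into one row per transcript entry, then flatten the nested structs down to "from"
df.select(F.explode("transcript").alias("transcript")).select("transcript.*").select("data.*").select("from").show()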