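python - Flattening a nested JSON file in PySpark with level-qualified column names
Problem description
I wrote code that uses PySpark to explode a nested JSON file, but I want each column name to show its nesting level. This is the input schema: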
|-- City: string (nullable = true)
|-- RecordNumber: integer (nullable = true)
|-- State: string (nullable = true)
|-- ZipCodeType: string (nullable = true)
|-- Zipcode: integer (nullable = true)
|-- Adress: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- House: string (nullable = true)
| | |-- Street: string (nullable = true)
| | |-- Appartement: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- number: string (nullable = true)
| | | | |-- level: string (nullable = true)
This code explodes my DataFrame:
from pyspark.sql import functions as psf

# Explode the Adress array, then the nested Appartement array, expanding each struct into columns
df = df.select("City", "State", "RecordNumber", "ZipCodeType", "Zipcode", psf.explode("Adress").alias("Adress"))\
    .select("City", "State", "RecordNumber", "ZipCodeType", "Zipcode", "Adress.*")\
    .select("City", "State", "RecordNumber", "ZipCodeType", "Zipcode", "House", "Street", psf.explode("Appartement").alias("Appartement"))\
    .select("City", "State", "RecordNumber", "ZipCodeType", "Zipcode", "House", "Street", "Appartement.*")
# Display the flattened result
df.show()
This is the schema afterwards:
root
|-- City: string (nullable = true)
|-- State: string (nullable = true)
|-- RecordNumber: integer (nullable = true)
|-- ZipCodeType: string (nullable = true)
|-- Zipcode: integer (nullable = true)
|-- House: string (nullable = true)
|-- Street: string (nullable = true)
|-- number: string (nullable = true)
|-- level: string (nullable = true)
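I want to update this code so that, for example, House is shown as Adress.House, and likewise Adress.Street, Adress.Appartement.number, and Adress.Appartement.level, as if I were only renaming the columns in the schema.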
Solution
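One possible approach, offered as a minimal sketch rather than a definitive answer: keep the exploded structs as columns instead of expanding them with `.*`, then select each field by its path and alias it to the dotted name. The helper names `dotted_aliases` and `explode_with_levels` are hypothetical, and the field lists are copied from the schema in the question. `explode` is kept from the original code, so rows whose arrays are null or empty are still dropped.

```python
def dotted_aliases(prefix, field_names):
    """Pair each field with its level-qualified name, e.g. ("House", "Adress.House")."""
    return [(name, f"{prefix}.{name}") for name in field_names]


def explode_with_levels(df):
    """Explode Adress and Adress.Appartement, naming leaf columns by their full path."""
    # Imported here so dotted_aliases can be used without a Spark installation.
    from pyspark.sql import functions as psf

    # Column and field names below are taken from the schema in the question.
    top = ["City", "State", "RecordNumber", "ZipCodeType", "Zipcode"]
    adress_fields = dotted_aliases("Adress", ["House", "Street"])
    appartement_fields = dotted_aliases("Adress.Appartement", ["number", "level"])

    # One row per element of the outer array; Adress is now a struct.
    df = df.select(*top, psf.explode("Adress").alias("Adress"))
    # One row per element of the inner array; the Adress struct is kept alongside.
    df = df.select(
        *top,
        "Adress",
        psf.explode("Adress.Appartement").alias("Appartement"),
    )
    # Pull each leaf field out of its struct and give it the dotted name.
    return df.select(
        *top,
        *[psf.col(f"Adress.{src}").alias(alias) for src, alias in adress_fields],
        *[psf.col(f"Appartement.{src}").alias(alias) for src, alias in appartement_fields],
    )
```

Calling `explode_with_levels(df).printSchema()` on the original DataFrame should list the leaf columns under their dotted names. Because those names contain dots, later references need backticks, for example ``df.select("`Adress.House`")``, otherwise Spark interprets the dot as struct field access.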