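apache-spark - PySpark: iterating over multi-line nested JSON to build a DataFrame
Problem description
Folks, I need some help iterating over the following JSON in PySpark and building a DataFrame from it: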
{
"success": true,
"result": {
"0x00e01a648ff41346cdeb873182383333d2184dd1": {
"id": 130,
"name": "xn--mytherwallet-fvb.com",
"url": "http://xn--mytherwallet-fvb.com",
"coin": "ETH",
"category": "Phishing",
"subcategory": "MyEtherWallet",
"description": "Homoglyph",
"addresses": [
"0x00e01a648ff41346cdeb873182383333d2184dd1",
"0x11e01a648ff41346cdeb873182383333d2184dd1"
],
"reporter": "MyCrypto",
"status": "Offline"
},
"0x858457daa7e087ad74cdeeceab8419079bc2ca03": {
"id": 1200,
"name": "myetherwallet.in",
"url": "http://myetherwallet.in",
"coin": "ETH",
"category": "Phishing",
"subcategory": "MyEtherWallet",
"addresses": ["0x858457daa7e087ad74cdeeceab8419079bc2ca03"],
"reporter": "MyCrypto",
"ip": "159.8.210.35",
"nameservers": [
"ns2.eftydns.com",
"ns1.eftydns.com"
],
"status": "Active"
}
}
}
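I need to build a DataFrame that represents the list of addresses.
Solution
I reformatted your JSON into a Spark-readable format, with the whole document on a single line.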
{"success": true, "result": {"0x00e01a648ff41346cdeb873182383333d2184dd1": {"id": 130, "name": "xn--mytherwallet-fvb.com", "url": "http://xn--mytherwallet-fvb.com", "coin": "ETH", "category": "Phishing", "subcategory": "MyEtherWallet", "description": "Homoglyph", "addresses": ["0x00e01a648ff41346cdeb873182383333d2184dd1", "0x11e01a648ff41346cdeb873182383333d2184dd1"], "reporter": "MyCrypto", "status": "Offline"}, "0x858457daa7e087ad74cdeeceab8419079bc2ca03": {"id": 1200, "name": "myetherwallet.in", "url": "http://myetherwallet.in", "coin": "ETH", "category": "Phishing", "subcategory": "MyEtherWallet", "addresses": ["0x858457daa7e087ad74cdeeceab8419079bc2ca03"], "reporter": "MyCrypto", "ip": "159.8.210.35", "nameservers": ["ns2.eftydns.com", "ns1.eftydns.com"], "status": "Active"}}}
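The reformatting step above can be sketched in plain Python: load the pretty-printed document and re-serialize it without line breaks, so the whole object sits on one line, as Spark's default JSON reader expects. The function name `to_single_line` is illustrative and not part of the original answer, and the sample is a shortened excerpt of the question's data. Spark's reader also has a `multiLine` option (for example `spark.read.json(path, multiLine=True)`) that may make this step unnecessary.

```python
import json


def to_single_line(pretty_json: str) -> str:
    """Re-serialize a JSON document so it occupies exactly one line."""
    # json.dumps without an indent argument emits no line breaks.
    return json.dumps(json.loads(pretty_json))


# Shortened excerpt of the question's data.
sample = """
{
  "success": true,
  "result": {
    "0x858457daa7e087ad74cdeeceab8419079bc2ca03": {
      "id": 1200,
      "addresses": ["0x858457daa7e087ad74cdeeceab8419079bc2ca03"]
    }
  }
}
"""

single_line = to_single_line(sample)
print(single_line)
```

Writing `single_line` to a file then gives an input that `spark.read.json` should be able to parse with its default settings.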
Reading the JSON
# Read the single-line JSON file (assumes an existing SparkSession named `spark`)
df = spark.read.json("/my_data.json")
df.printSchema()
df.show(truncate=False)
Output
root
|-- result: struct (nullable = true)
| |-- 0x00e01a648ff41346cdeb873182383333d2184dd1: struct (nullable = true)
| | |-- addresses: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- category: string (nullable = true)
| | |-- coin: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- id: long (nullable = true)
| | |-- name: string (nullable = true)
| | |-- reporter: string (nullable = true)
| | |-- status: string (nullable = true)
| | |-- subcategory: string (nullable = true)
| | |-- url: string (nullable = true)
| |-- 0x858457daa7e087ad74cdeeceab8419079bc2ca03: struct (nullable = true)
| | |-- addresses: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- category: string (nullable = true)
| | |-- coin: string (nullable = true)
| | |-- id: long (nullable = true)
| | |-- ip: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- nameservers: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- reporter: string (nullable = true)
| | |-- status: string (nullable = true)
| | |-- subcategory: string (nullable = true)
| | |-- url: string (nullable = true)
|-- success: boolean (nullable = true)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|result |success|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|[[WrappedArray(0x00e01a648ff41346cdeb873182383333d2184dd1, 0x11e01a648ff41346cdeb873182383333d2184dd1),Phishing,ETH,Homoglyph,130,xn--mytherwallet-fvb.com,MyCrypto,Offline,MyEtherWallet,http://xn--mytherwallet-fvb.com],[WrappedArray(0x858457daa7e087ad74cdeeceab8419079bc2ca03),Phishing,ETH,1200,159.8.210.35,myetherwallet.in,WrappedArray(ns2.eftydns.com, ns1.eftydns.com),MyCrypto,Active,MyEtherWallet,http://myetherwallet.in]]|true |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
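The output above shows the parsed document but stops short of the address list the question asks for. Because each entry sits under a different key in `result`, Spark infers one struct column per key, which is awkward to iterate over. One possible workaround, sketched here under assumptions, is to flatten the document in plain Python first and then hand the rows to Spark. The names `address_rows` and `to_dataframe` and the output column names are hypothetical, and `to_dataframe` expects an already created `SparkSession`.

```python
import json


def address_rows(doc: dict) -> list:
    """Return one (entry_key, id, name, address) tuple per address."""
    rows = []
    for entry_key, entry in doc.get("result", {}).items():
        # Entries without an "addresses" field contribute no rows.
        for address in entry.get("addresses", []):
            rows.append((entry_key, entry.get("id"), entry.get("name"), address))
    return rows


def to_dataframe(spark, rows):
    """Build a DataFrame from the flattened rows (needs a SparkSession)."""
    # Column names are an assumption chosen for this sketch.
    return spark.createDataFrame(rows, ["entry_key", "id", "name", "address"])


# Shortened excerpt of the question's data.
doc = json.loads("""
{
  "success": true,
  "result": {
    "0x00e01a648ff41346cdeb873182383333d2184dd1": {
      "id": 130,
      "name": "xn--mytherwallet-fvb.com",
      "addresses": [
        "0x00e01a648ff41346cdeb873182383333d2184dd1",
        "0x11e01a648ff41346cdeb873182383333d2184dd1"
      ]
    },
    "0x858457daa7e087ad74cdeeceab8419079bc2ca03": {
      "id": 1200,
      "name": "myetherwallet.in",
      "addresses": ["0x858457daa7e087ad74cdeeceab8419079bc2ca03"]
    }
  }
}
""")

rows = address_rows(doc)
for row in rows:
    print(row)
```

With a live session, `to_dataframe(spark, rows).show(truncate=False)` should give one row per address. This approach parses the file on the driver, so it only suits documents that fit in memory there.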