python - JSON to Pandas DF
Problem description
I have a dataset from Azure Firewall (firewall logs) that I store in Blob Storage in JSON format. The JSON looks like this:
{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1551130Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}
{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1268490Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}
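There are several million rows per day, and I need to see which ports are allowed or denied, grouped by source IP, so I thought analyzing this data in a Jupyter Notebook (JN) would be feasible.
Question:
I tried the code below, but ran into a problem when trying to flatten "properties" to get the "msg" field I want.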
import json
import pandas as pd
# load data using Python JSON module
with open('FWLog/FWLog2.json','r') as f:
    data = json.loads(f.read())
# Flatten data
df_nested_list = pd.json_normalize(data, record_path =['properties'])
Error:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-61-3500c0d62d55> in <module>
7 # load data using Python JSON module
8 with open('FWLog/FWLog2.json','r') as f:
----> 9 data = json.loads(f.read())
10 # Flatten data
11 df_nested_list = pd.json_normalize(data, record_path =['properties'])
~\anaconda3\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:
359 cls = JSONDecoder
~\anaconda3\lib\json\decoder.py in decode(self, s, _w)
338 end = _w(s, end).end()
339 if end != len(s):
--> 340 raise JSONDecodeError("Extra data", s, end)
341 return obj
342
JSONDecodeError: Extra data: line 2 column 1 (char 386)
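The traceback says `json.loads` decoded one complete object on line 1 and then found more text ("Extra data") at the start of line 2. That happens because the log file is JSON Lines (one JSON object per line), not a single JSON document, so each line has to be decoded on its own. Below is a minimal sketch using only the standard `json` module plus `pd.json_normalize`, assuming every non-empty line holds exactly one object. The two sample records are embedded in a string (with `resourceId` left out for brevity) so the snippet runs without the file, and `load_json_lines` is a hypothetical helper name. Note that `record_path` is meant for nested *lists* of records; since `properties` is a single nested dict, calling `json_normalize` without `record_path` is enough to get `msg` as a column.

```python
import io
import json

import pandas as pd

# The two records from the question, one JSON object per line (JSON Lines).
# For the real file, use open("FWLog/FWLog2.json") instead of io.StringIO(SAMPLE).
SAMPLE = (
    '{"category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1551130Z", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg": "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}\n'
    '{"category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1268490Z", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg": "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}\n'
)


def load_json_lines(f):
    """Decode each non-empty line as its own JSON object (hypothetical helper)."""
    return [json.loads(line) for line in f if line.strip()]


with io.StringIO(SAMPLE) as f:
    records = load_json_lines(f)

# Without record_path, json_normalize flattens nested dicts into dotted
# column names: {"properties": {"msg": ...}} becomes the column "properties.msg".
df = pd.json_normalize(records)
print(df[["time", "properties.msg"]])
```

This yields the same data as the `pd.read_json(..., lines=True)` approach; the flattened column is simply named `properties.msg` instead of `msg`.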
Solution
You can use pd.read_json with lines=True:
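import pandas as pd
# lines=True reads one JSON object per line (JSON Lines)
df = pd.read_json("your_file.txt", lines=True)
# turn the "properties" dicts into their own columns (here just "msg") and put them in front
df_final = pd.concat([pd.DataFrame(df.pop("properties").to_list()), df], axis=1)
print(df_final)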
Prints:
msg category time resourceId operationName
0 TCP request from 172.16.1.218:54652 to 172.17.... AzureFirewallNetworkRule 2021-01-31T00:00:00.1551130Z /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER... AzureFirewallNetworkRuleLog
1 UDP request from 172.16.1.218:53067 to 8.8.8.8... AzureFirewallNetworkRule 2021-01-31T00:00:00.1268490Z /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER... AzureFirewallNetworkRuleLog
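To get from `df_final` to the goal stated in the question (allowed or denied ports grouped by source IP), the `msg` text still has to be split into fields. The sketch below assumes every message follows the layout of the two sample lines, `PROTO request from IP:PORT to IP:PORT. Action: ACTION`. The regex is inferred from those two lines only, so messages that don't match it are dropped. `parse_msgs`, `count_ports` and the `connections` column name are hypothetical. It rebuilds `df_final` the same way as the answer, from an in-memory copy of the two records with `resourceId` omitted.

```python
import io

import pandas as pd

# The two records from the question (resourceId omitted for brevity).
SAMPLE = (
    '{"category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1551130Z", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg": "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}\n'
    '{"category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1268490Z", '
    '"operationName": "AzureFirewallNetworkRuleLog", '
    '"properties": {"msg": "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}\n'
)

# Assumed message layout, inferred from the two sample lines only.
MSG_PATTERN = (
    r"^(?P<protocol>\w+) request from "
    r"(?P<src_ip>[\d.]+):(?P<src_port>\d+) to "
    r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\. "
    r"Action: (?P<action>\w+)"
)


def parse_msgs(df):
    """Split the 'msg' column into one column per named group in MSG_PATTERN."""
    fields = df["msg"].str.extract(MSG_PATTERN)
    # Non-matching messages come back as all-NaN rows: drop them,
    # then make the destination port numeric.
    return fields.dropna().assign(dst_port=lambda d: d["dst_port"].astype(int))


def count_ports(fields):
    """Count connections per (source IP, action, destination port)."""
    return fields.groupby(["src_ip", "action", "dst_port"]).size().rename("connections")


# Same steps as the answer: read JSON Lines, then lift 'msg' out of 'properties'.
df = pd.read_json(io.StringIO(SAMPLE), lines=True)
df_final = pd.concat([pd.DataFrame(df.pop("properties").to_list()), df], axis=1)

fields = parse_msgs(df_final)
print(count_ports(fields))
```

For several million rows a day, one option is to pass `chunksize` together with `lines=True` to `pd.read_json`, which then returns an iterator of DataFrames; the per-chunk results of `count_ports` could be combined with `pd.concat(...).groupby(level=[0, 1, 2]).sum()`.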