python - Convert a nested dictionary (JSON file) to a DataFrame
Question
I have the following JSON file:
{
"quiz": {
"sport": { "q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 + 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12",
"test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
},
"summary": "good example",
"viewer rating": 6
}
I want to convert it to a DataFrame, something like this:
quiz   q1   q2   question     options  answer    test_dict  summary       viewer rating
sport  q1   NaN  Which one..  [list]   Huston..  NaN        good example  6
maths  q1   NaN  5 + 7 = ?    [list]   12        {"a":1..   good example  6
maths  NaN  q2   12 - 8 = ?   [list]   4         NaN        good example  6
I tried using
import json
import pandas as pd

with open("json2.json") as file1:
    data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])
but I get the following error:
TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.
The problem is that the value under 'quiz' is not a list but a dictionary. So I also tried this:
pd.json_normalize(data, max_level=2)
However, I did not get the expected output; I only get a single row. Can someone give me some pointers?
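The single-row result can be reproduced on a smaller input. The sketch below uses a trimmed-down stand-in for the JSON above (the variable names `data` and `flat` are illustrative only). `pd.json_normalize` treats a top-level dict as one record, so the nested keys end up as dotted column names rather than as separate rows:

```python
import pandas as pd

# Trimmed-down stand-in for the JSON in the question
# (assumption: same nesting, fewer fields).
data = {
    "quiz": {
        "sport": {"q1": {"question": "Which one is correct team name in NBA?",
                         "answer": "Huston Rocket"}},
        "maths": {"q1": {"question": "5 + 7 = ?", "answer": "12"}},
    },
    "summary": "good example",
    "viewer rating": 6,
}

# A top-level dict is handled as a single record, so only one row comes back.
flat = pd.json_normalize(data, max_level=2)
print(flat.shape[0])       # number of rows
print(list(flat.columns))  # nested keys appear as dotted column names
```

Getting one row per question therefore means building the list of records by hand first.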
Answer
You can use a list comprehension:
import pandas as pd
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}
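# One row per (category, question id) pair. `q: q` creates a 'q1' or 'q2' column
# holding the id itself, and **v unpacks that question's own fields.
r = [{'quiz': a, q: q, **v, 'summary': d['summary'], 'viewer rating': d['viewer rating']}
     for a, b in d['quiz'].items() for q, v in b.items()]
df = pd.DataFrame(r)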
Output:
    quiz   q1                                question  ...  viewer rating                                   test_dict   q2
0  sport   q1  Which one is correct team name in NBA?  ...              6                                         NaN  NaN
1  maths   q1                               5 + 7 = ?  ...              6  {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}  NaN
2  maths  NaN                              12 - 8 = ?  ...              6                                         NaN   q2
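If a separate column per question id is not strictly required, a variation on the same idea may be easier to work with. The sketch below is not part of the original answer: the helper `quiz_to_frame` and the `question_id` column are hypothetical names chosen for illustration. It stores the id in a single column and copies every top-level field other than 'quiz' onto each row, so 'summary' and 'viewer rating' do not have to be listed by hand:

```python
import pandas as pd

def quiz_to_frame(data, nested_key="quiz"):
    """Return one row per question (hypothetical helper, not a pandas API)."""
    # Top-level fields other than the nested block are repeated on every row.
    meta = {k: v for k, v in data.items() if k != nested_key}
    rows = [
        {nested_key: category, "question_id": qid, **fields, **meta}
        for category, questions in data[nested_key].items()
        for qid, fields in questions.items()
    ]
    return pd.DataFrame(rows)

# Shortened version of the data from the question.
sample = {
    "quiz": {
        "sport": {"q1": {"question": "Which one is correct team name in NBA?",
                         "answer": "Huston Rocket"}},
        "maths": {
            "q1": {"question": "5 + 7 = ?", "answer": "12",
                   "test_dict": {"a": 1, "b": 2}},
            "q2": {"question": "12 - 8 = ?", "answer": "4"},
        },
    },
    "summary": "good example",
    "viewer rating": 6,
}

result = quiz_to_frame(sample)
print(result)
```

Keeping the id in one column avoids the NaN-filled q1/q2 columns, at the cost of not matching the layout asked for in the question exactly.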