Converting a nested dictionary (JSON file) to a DataFrame

Problem description

I have the following JSON file:

{
    "quiz": {
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        },
        "maths": {
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12",
                "test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
            },
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            }
        }
    },
    "summary": "good example",
    "viewer rating": 6
}

I want to convert it into a DataFrame, something like this:

quiz   q1   q2   question     options  answer   test_dict  summary       viewer rating
sport  q1   NaN  Which one..  [list]   Huston.. NaN        good example  6
maths  q1   NaN  5 + 7 = ?    [list]   12       {"a":1..   good example  6
maths  NaN  q2   12 - 8 = ?   [list]   4        NaN        good example  6

I tried:

import json
import pandas as pd
with open("json2.json") as file1:
    data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])

But I get the following error:

TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.

The problem is that the value under 'quiz' is not a list but a dictionary. So I also tried:

pd.json_normalize(data, max_level=2)

However, I did not get the expected output; I only get a single row. Can someone give me some pointers?
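
The single row can be reproduced on a small scale. The sketch below uses a trimmed copy of the data above (only the "answer" fields are kept, which is an assumption made for brevity) and shows that json_normalize treats one top-level dictionary as one record: the nested keys become dotted column names instead of rows.

```python
import pandas as pd

# Trimmed copy of the question's data; only the structure matters here.
data = {
    "quiz": {
        "sport": {"q1": {"answer": "Huston Rocket"}},
        "maths": {"q1": {"answer": "12"}, "q2": {"answer": "4"}},
    },
    "summary": "good example",
    "viewer rating": 6,
}

# A single dict is one record, so the result has one row.
# Nested keys are flattened into column names down to max_level.
flat = pd.json_normalize(data, max_level=2)

print(flat.shape[0])       # number of rows
print(list(flat.columns))  # flattened, dot-separated column names
```

Because every nested key ends up in a column name, getting one row per question requires building the rows explicitly before handing them to pandas.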

Tags: python, json, pandas, dataframe, dictionary

Solution


You can use a list comprehension:

import pandas as pd
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 + 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}
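# Build one row per question. The question id (q) is used as both the column
# name and the value, so q1 and q2 end up in separate columns.
r = [{'quiz': a, q: q, **v, 'summary': d['summary'], 'viewer rating': d['viewer rating']}
     for a, b in d['quiz'].items() for q, v in b.items()]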

df = pd.DataFrame(r)

Output:

    quiz   q1                                question  ... viewer rating                                   test_dict   q2
0  sport   q1  Which one is correct team name in NBA?  ...             6                                         NaN  NaN
1  maths   q1                               5 + 7 = ?  ...             6  {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}  NaN
2  maths  NaN                              12 - 8 = ?  ...             6                                         NaN   q2
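
If separate q1 and q2 columns are not required, a variation on the same idea is to store the question id in a single column. The sketch below is one possible way to do that, not part of the original answer: the helper name quiz_to_frame and the column name question_id are hypothetical, the data is a trimmed copy of the question's JSON, and it assumes every top-level key other than "quiz" should be repeated on each row.

```python
import pandas as pd

# Trimmed copy of the question's data (the "options" lists are omitted for brevity).
data = {
    "quiz": {
        "sport": {
            "q1": {"question": "Which one is correct team name in NBA?",
                   "answer": "Huston Rocket"},
        },
        "maths": {
            "q1": {"question": "5 + 7 = ?", "answer": "12",
                   "test_dict": {"a": 1, "b": 2}},
            "q2": {"question": "12 - 8 = ?", "answer": "4"},
        },
    },
    "summary": "good example",
    "viewer rating": 6,
}


def quiz_to_frame(data):
    """Hypothetical helper: one row per (category, question id) pair."""
    # Top-level keys other than "quiz" are copied onto every row.
    shared = {k: v for k, v in data.items() if k != "quiz"}
    rows = [
        {"quiz": category, "question_id": qid, **fields, **shared}
        for category, questions in data["quiz"].items()
        for qid, fields in questions.items()
    ]
    # Keys missing from a row (such as test_dict) become NaN.
    return pd.DataFrame(rows)


df = quiz_to_frame(data)
print(df[["quiz", "question_id", "answer"]])
```

Keeping the id in one column avoids adding a new, mostly empty column for every additional question, at the cost of differing from the layout requested in the question.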
