How to convert a JSON str to a DataFrame in Python

Problem description

Updated JSON example:

{
"header":{"time_cost_ms":3.638,"time_cost":0.003638,"core_time_cost_ms":3.6,"ret_code":"succ"},
"norm_str":"Women's March finally replaces three original leaders after anti-Semitism accusations",
"lang":"en",
"word_list":[
    {"str":"Women","hit":[0,5,0,1],"tag":"NNS"},
    {"str":"'s","hit":[5,2,1,2],"tag":"POS"},
    {"str":"March","hit":[8,5,3,1],"tag":"NNP"},
    {"str":"finally","hit":[14,7,4,1],"tag":"RB"},
    {"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
    {"str":"three","hit":[31,5,6,1],"tag":"CD"},
    {"str":"original","hit":[37,8,7,1],"tag":"JJ"},
    {"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
    {"str":"after","hit":[54,5,9,1],"tag":"IN"},
    {"str":"anti","hit":[60,4,10,1],"tag":"NN"},
    {"str":"-","hit":[64,1,11,1],"tag":"HYPH"},
    {"str":"Semitism","hit":[65,8,12,1],"tag":"NNP"},
    {"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"phrase_list":[
    {"str":"Women's March","hit":[0,13,0,4],"tag":"NNP"},
    {"str":"finally","hit":[14,7,4,1],"tag":"RB"},
    {"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
    {"str":"three","hit":[31,5,6,1],"tag":"CD"},
    {"str":"original","hit":[37,8,7,1],"tag":"JJ"},
    {"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
    {"str":"after","hit":[54,5,9,1],"tag":"IN"},
    {"str":"anti-Semitism","hit":[60,13,10,3],"tag":"NN"},
    {"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"entity_list":[
    {"str":"Women’s March","hit":[0,13,0,4],"type":{"name":"org.generic","i18n":"organization","path":"\/"},"meaning":{"related":["Black Lives Matter", "Planned Parenthood", "women's rights", "MoveOn", "indivisible", "activism", "Greenpeace", "Stand Up America", "feminism"]},"tag":"org.generic","tag_i18n":"organization"},
    {"str":"three","hit":[31,5,6,1],"type":{"name":"quantity.generic","i18n":"quantity","path":"\/math.n_exp\/"},"meaning":{"value":[3]},"tag":"quantity.generic","tag_i18n":"quantity"}
],
"syntactic_parsing_str":"",
"srl_str":"",
"engine_version":"0.3.0"

}

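Is there a way to convert this data into a DataFrame? I want to merge the result with my original dataset.

Please also help with the "string indices must be integers" error.

Tags: python, json, pandas

Solution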


Judging by the header key, I'll assume you received this JSON from an API.

Let's load the JSON file first:

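import json
import pandas as pd

# Replace the placeholder with the path to your JSON file.
with open("<json file path>", "r", encoding="utf-8") as json_file:
    json_example = json.load(json_file)  # json.load reads a file object; json.loads expects a str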
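The "string indices must be integers" error from the question usually means the JSON text is still a plain str when it is indexed by key. The sketch below reproduces that situation and shows that parsing with json.loads first avoids it. The variable raw is a shortened, hypothetical stand-in for the API response body, not the asker's actual code.

```python
import json

# Shortened, hypothetical stand-in for the raw API response body (a str).
raw = '{"lang": "en", "phrase_list": [{"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"}]}'

# Indexing the raw text by key fails: a str only accepts integer positions.
try:
    raw["phrase_list"]
except TypeError as err:
    error_message = str(err)  # exact wording varies slightly between Python versions

# Parsing first turns the text into a dict, which can be indexed by key.
json_example = json.loads(raw)
phrases = json_example["phrase_list"]
print(phrases[0]["tag"])
```

The same error can also appear when looping directly over a dict, because the loop yields its keys (strings) rather than the nested records.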

pd.json_normalize() may not work as expected if you pass it the whole example. You need to give it the contents of the "phrase_list" key:

df = pd.json_normalize(json_example['phrase_list'])

Result:

|    | str           | hit             | tag   |
|---:|:--------------|:----------------|:------|
|  0 | Women's March | [0, 13, 0, 4]   | NNP   |
|  1 | finally       | [14, 7, 4, 1]   | RB    |
|  2 | replaces      | [22, 8, 5, 1]   | VBZ   |
|  3 | three         | [31, 5, 6, 1]   | CD    |
|  4 | original      | [37, 8, 7, 1]   | JJ    |
|  5 | leaders       | [46, 7, 8, 1]   | NNS   |
|  6 | after         | [54, 5, 9, 1]   | IN    |
|  7 | anti-Semitism | [60, 13, 10, 3] | NN    |
|  8 | accusations   | [74, 11, 13, 1] | NNS   |
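Since the question also asks about merging with the original dataset, here is a minimal sketch of one possible approach. It passes the whole dict to pd.json_normalize with record_path and meta, so top-level fields such as norm_str are copied onto every row and can act as a merge key. The original DataFrame and its id column are hypothetical, and the sketch assumes the original text matches norm_str exactly, which may not hold for your data.

```python
import pandas as pd

# Trimmed copy of the example response (only two phrases kept for brevity).
json_example = {
    "norm_str": "Women's March finally replaces three original leaders after anti-Semitism accusations",
    "lang": "en",
    "phrase_list": [
        {"str": "Women's March", "hit": [0, 13, 0, 4], "tag": "NNP"},
        {"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"},
    ],
}

# record_path selects the list that becomes rows; meta copies top-level
# fields onto each row so they can serve as merge keys.
df = pd.json_normalize(json_example, record_path="phrase_list", meta=["norm_str", "lang"])

# Hypothetical original dataset: one row per sentence, keyed by its text.
original = pd.DataFrame({"id": [1], "norm_str": [json_example["norm_str"]]})

# Left merge keeps every original row and attaches the matching phrases.
merged = original.merge(df, on="norm_str", how="left")
print(merged[["id", "str", "tag"]])
```

If the original dataset has a stable identifier per request, merging on that identifier is likely more robust than merging on the sentence text.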

You can then explode the hit column to get a clean table:

df = df.explode("hit")

Result:

|    | str           |   hit | tag   |
|---:|:--------------|------:|:------|
|  0 | Women's March |     0 | NNP   |
|  0 | Women's March |    13 | NNP   |
|  0 | Women's March |     0 | NNP   |
|  0 | Women's March |     4 | NNP   |
|  1 | finally       |    14 | RB    |
|  1 | finally       |     7 | RB    |
|  1 | finally       |     4 | RB    |
|  1 | finally       |     1 | RB    |
|  2 | replaces      |    22 | VBZ   |
|  2 | replaces      |     8 | VBZ   |
|  2 | replaces      |     5 | VBZ   |
|  2 | replaces      |     1 | VBZ   |
|  3 | three         |    31 | CD    |
.
.
.
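Exploding produces four rows per phrase, which also multiplies rows in any later merge. If one row per phrase is preferable, a possible alternative is to spread the hit list across separate columns. This is a sketch on a two-row sample. The column names hit_0 to hit_3 are placeholders, since the source does not say what each position means.

```python
import pandas as pd

# Two-row sample in the same shape as the normalized phrase_list table.
df = pd.DataFrame({
    "str": ["Women's March", "finally"],
    "hit": [[0, 13, 0, 4], [14, 7, 4, 1]],
    "tag": ["NNP", "RB"],
})

# Build one column per list position, aligned on the original index.
# The hit_0..hit_3 names are placeholders, not documented field names.
hit_cols = pd.DataFrame(df["hit"].tolist(), index=df.index).add_prefix("hit_")

# Swap the list column for the new scalar columns.
wide = df.drop(columns="hit").join(hit_cols)
print(wide)
```

This assumes every hit list has the same length, as in the example. Shorter lists would leave missing values in the trailing columns.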
