python - How to convert a JSON str to a DataFrame in Python
Problem description
Updated JSON example:
{
"header":{"time_cost_ms":3.638,"time_cost":0.003638,"core_time_cost_ms":3.6,"ret_code":"succ"},
"norm_str":"Women's March finally replaces three original leaders after anti-Semitism accusations",
"lang":"en",
"word_list":[
{"str":"Women","hit":[0,5,0,1],"tag":"NNS"},
{"str":"'s","hit":[5,2,1,2],"tag":"POS"},
{"str":"March","hit":[8,5,3,1],"tag":"NNP"},
{"str":"finally","hit":[14,7,4,1],"tag":"RB"},
{"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
{"str":"three","hit":[31,5,6,1],"tag":"CD"},
{"str":"original","hit":[37,8,7,1],"tag":"JJ"},
{"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
{"str":"after","hit":[54,5,9,1],"tag":"IN"},
{"str":"anti","hit":[60,4,10,1],"tag":"NN"},
{"str":"-","hit":[64,1,11,1],"tag":"HYPH"},
{"str":"Semitism","hit":[65,8,12,1],"tag":"NNP"},
{"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"phrase_list":[
{"str":"Women's March","hit":[0,13,0,4],"tag":"NNP"},
{"str":"finally","hit":[14,7,4,1],"tag":"RB"},
{"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
{"str":"three","hit":[31,5,6,1],"tag":"CD"},
{"str":"original","hit":[37,8,7,1],"tag":"JJ"},
{"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
{"str":"after","hit":[54,5,9,1],"tag":"IN"},
{"str":"anti-Semitism","hit":[60,13,10,3],"tag":"NN"},
{"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"entity_list":[
{"str":"Women’s March","hit":[0,13,0,4],"type":{"name":"org.generic","i18n":"organization","path":"\/"},"meaning":{"related":["Black Lives Matter", "Planned Parenthood", "women's rights", "MoveOn", "indivisible", "activism", "Greenpeace", "Stand Up America", "feminism"]},"tag":"org.generic","tag_i18n":"organization"},
{"str":"three","hit":[31,5,6,1],"type":{"name":"quantity.generic","i18n":"quantity","path":"\/math.n_exp\/"},"meaning":{"value":[3]},"tag":"quantity.generic","tag_i18n":"quantity"}
],
"syntactic_parsing_str":"",
"srl_str":"",
"engine_version":"0.3.0"
}
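Is there a way to convert this data into a DataFrame? I want to merge the result with my original dataset.
Please also help with the "string indices must be integers" error.
Solution
I'll assume you received this JSON from an API, given the header key.
Let's load the JSON file first: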
import json
import pandas as pd

with open("<json file path>", "r") as json_file:
    json_example = json.load(json_file)  # json.load reads a file object; json.loads expects a str
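If the response arrives as a string rather than a file, it has to be parsed before you can index it by key. Indexing the raw text with a key is one common cause of the "string indices must be integers" error. The snippet below is a minimal sketch of that situation; `raw` is a shortened, hypothetical stand-in for the real response text, and I'm assuming this is where your error comes from.

```python
import json

# Hypothetical, shortened stand-in for the API response text.
raw = '{"lang": "en", "phrase_list": [{"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"}]}'

# Indexing the unparsed string with a key raises TypeError.
try:
    raw["phrase_list"]
except TypeError as exc:
    error_message = str(exc)

# json.loads turns the text into a dict, which can be indexed by key.
parsed = json.loads(raw)
phrases = parsed["phrase_list"]
```

After parsing, `phrases` is a list of dicts that can be passed to pandas.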
pd.json_normalize() may not work as expected if you pass it the whole example. You need to pass it the contents of the "phrase_list" key:
df = pd.json_normalize(json_example['phrase_list'])
Result:
| | str | hit | tag |
|---:|:--------------|:----------------|:------|
| 0 | Women's March | [0, 13, 0, 4] | NNP |
| 1 | finally | [14, 7, 4, 1] | RB |
| 2 | replaces | [22, 8, 5, 1] | VBZ |
| 3 | three | [31, 5, 6, 1] | CD |
| 4 | original | [37, 8, 7, 1] | JJ |
| 5 | leaders | [46, 7, 8, 1] | NNS |
| 6 | after | [54, 5, 9, 1] | IN |
| 7 | anti-Semitism | [60, 13, 10, 3] | NN |
| 8 | accusations | [74, 11, 13, 1] | NNS |
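As an alternative to indexing the key by hand, `pd.json_normalize` also accepts `record_path` and `meta` arguments, which let you keep a top-level field next to each record. This is only a sketch: `doc` is a shortened, hypothetical copy of the example, and carrying `lang` along is my own choice for illustration.

```python
import pandas as pd

# Hypothetical, shortened copy of the example response.
doc = {
    "lang": "en",
    "phrase_list": [
        {"str": "Women's March", "hit": [0, 13, 0, 4], "tag": "NNP"},
        {"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"},
    ],
}

# record_path selects the list of records; meta copies top-level keys onto each row.
df = pd.json_normalize(doc, record_path="phrase_list", meta=["lang"])
```

Each row then holds the `str`, `hit` and `tag` fields of one phrase plus the `lang` value of the response.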
You can then explode the hit column to get a clean table:
df = df.explode("hit")
Result:
| | str | hit | tag |
|---:|:--------------|------:|:------|
| 0 | Women's March | 0 | NNP |
| 0 | Women's March | 13 | NNP |
| 0 | Women's March | 0 | NNP |
| 0 | Women's March | 4 | NNP |
| 1 | finally | 14 | RB |
| 1 | finally | 7 | RB |
| 1 | finally | 4 | RB |
| 1 | finally | 1 | RB |
| 2 | replaces | 22 | VBZ |
| 2 | replaces | 8 | VBZ |
| 2 | replaces | 5 | VBZ |
| 2 | replaces | 1 | VBZ |
| 3 | three | 31 | CD |
.
.
.
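Since you want to merge the result with another dataset, one row per phrase may be easier to work with than the exploded table. The sketch below spreads the four `hit` values into separate columns instead. The column names `hit_0` to `hit_3` are hypothetical, because the example does not say what the four numbers mean, and the code assumes every `hit` list has exactly four entries.

```python
import pandas as pd

# Two records copied from the phrase_list above.
phrases = [
    {"str": "Women's March", "hit": [0, 13, 0, 4], "tag": "NNP"},
    {"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"},
]
df = pd.json_normalize(phrases)

# Hypothetical column names; the meaning of each position is not documented here.
hit_cols = ["hit_0", "hit_1", "hit_2", "hit_3"]

# Build one column per list position, aligned on the original index.
hits = pd.DataFrame(df["hit"].tolist(), columns=hit_cols, index=df.index)
wide = df.drop(columns="hit").join(hits)
```

`wide` keeps one row per phrase, so its index still lines up with the order of `phrase_list`.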