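python - Remove "" from a JSON file so they don't break the strings
Problem description
I have a huge JSON file that looks like this (I copy-pasted a lot, but the error is at the beginning of this example):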
{
"data":[
{
"title":"Title1",
"paragraphs":[
{
"context":"In this text, one of the "words" is between quotation marks",
"qas":[
{
"answers":[
{
"answer_start":515,
"text":"String text"
}
],
"question": "Why something something?",
"id":"5733be284776f41900661182"
},
{
"answers":[
{
"answer_start":505,
"text":"String something text"
}
],
"question": "Why?",
"id":"5733be284776f4190066345"
}
]
},
{
"context":"Context2",
"qas":[
{
"answers":[
{
"answer_start":515,
"text":"String text"
}
],
"question": "Why something something?",
"id":"5733be284776f41900661182"
},
{
"answers":[
{
"answer_start":505,
"text":"String something text"
}
],
"question": "Why?",
"id":"5733be284776f4190066345"
}
]
}
]
},
{
"title":"Title2",
"paragraphs":[
{
"context":"Context10",
"qas":[
{
"answers":[
{
"answer_start":585,
"text":"String text"
}
],
"question": "Why something something?",
"id":"5733be284776f41900661682"
},
{
"answers":[
{
"answer_start":545,
"text":"String something text"
}
],
"question": "Why?",
"id":"5733be284776f41900663"
}
]
},
{
"context":"Context7",
"qas":[
{
"answers":[
{
"answer_start":525,
"text":"String text"
}
],
"question": "Why something something?",
"id":"5733be284776f41982"
},
{
"answers":[
{
"answer_start":595,
"text":"String something text"
}
],
"question": "Why?",
"id":"5733be284776f419005"
}
]
}
]
}
],
"version":"1.1"
}
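When I process this file in Python (I want to change its structure), the quotation marks inside the string terminate it early, so parsing fails with an error. I tried replace in Python, but that is a problem because I don't want the "" that delimit the strings to disappear. I can't remove them by hand either, because the file is huge.
This is the code that changes the structure, but I suppose the problem would occur with any JSON file: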
import json

with open('file.json', 'r') as fh:
    data = json.load(fh)

result = []
for article in data["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            answers = {"text": [answer["text"] for answer in qa["answers"]]}
            result.append({
                "id": qa["id"],
                "context": paragraph["context"],
                "question": qa["question"],
                "answers": answers
            })

with open('output.json', 'w') as fh:
    json.dump(result, fh)
Solution
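A file with bare quotes inside a string value is not valid JSON, so json.load cannot read it as-is; the cleanest fix is to correct whatever produced the file. If that is not possible, one possible workaround is to escape the stray quotes before parsing. The sketch below is a heuristic, not a general JSON repair tool. It assumes that every "key": "value" pair sits on its own line, as in the sample above, and that existing backslashes in values are not themselves escaped. The names PAIR, BARE_QUOTE, escape_inner_quotes and repair_json_text are hypothetical helpers introduced for illustration.

```python
import json
import re

# Matches a line of the form   "key": "value"   with an optional trailing comma.
# Assumption: each string-valued pair is on its own line, as in the sample file.
PAIR = re.compile(r'^(\s*"[^"]+"\s*:\s*")(.*)("\s*,?\s*)$')

# A double quote that is not already preceded by a backslash.
BARE_QUOTE = re.compile(r'(?<!\\)"')


def escape_inner_quotes(line):
    """Escape stray double quotes inside the value of a "key": "value" line."""
    match = PAIR.match(line)
    if match is None:
        return line  # not a string-valued pair (number, list, brace): keep as-is
    head, value, tail = match.groups()
    # head ends with the opening quote, tail starts with the closing quote,
    # so every unescaped quote left in value is a stray one.
    return head + BARE_QUOTE.sub(r'\\"', value) + tail


def repair_json_text(text):
    """Apply escape_inner_quotes to every line of the raw file contents."""
    return "\n".join(escape_inner_quotes(line) for line in text.splitlines())


if __name__ == "__main__":
    # 'file.json' is the file name used in the question.
    with open('file.json', 'r') as fh:
        data = json.loads(repair_json_text(fh.read()))
```

After this step, data can be fed to the restructuring loop from the question unchanged. If the real file puts several pairs on one line, or contains values with sequences such as \\", this line-based pattern will miss or mangle them, and a different approach would be needed.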