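python - How to split a large JSON file based on the value of a key?
Problem description
I have a large JSON file that I want to split based on the key "metadata". An example record is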
{"text": "The primary outcome of the study was hospital mortality; secondary outcomes included ICU mortality and lengths of stay for hospital and ICU. ICU mortality was defined as survival of a patient at ultimate discharge from the ICU and hospital mortality was defined as survival at discharge or transfer from our hospital.", "label": "conclusion", "metadata": "18982114"}
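The JSON file contains many records whose "metadata" key is "18982114". How can I extract all of these records and store them in a separate JSON file? Ideally, I am looking for a solution that does not involve loading the file and looping over it, since that would be very cumbersome to repeat for every query. I think a shell command might work, but unfortunately I am no expert in shell commands... so I would really appreciate a fast, non-looping query solution. Thanks!
=============================================================================
Here is a sample of the file (5 records):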
{"text": "Finally, after an emergency laparotomy, patients who received i.v. vasoactive drugs within the first 24 h on ICU were 3.9 times more likely to die (OR 3.85; 95% CI, 1.64 -9.02; P\u00bc0.002). No significant prognostic factors were determined by the model on day 2.", "label": "conclusion", "metadata": "18982114"}
{"text": "Kinetics ofA TP Binding to Normal and Myopathic", "label": "conclusion", "metadata": "10700033"}
{"text": "Observed rate constants, k0b,, were obtained by fitting the equation I(t)=oe-kobs+C by the method of moments, where I is the observed fluorescence intensity, and I0 is the amplitude of fluorescence change. 38 ", "label": "conclusion", "metadata": "235564322"}
{"text": "The capabilities of modern angiographic platforms have recently improved substantially.", "label": "conclusion", "metadata": "2877272"}
{"text": "Few studies have concentrated specifically on the outcomes after surgery.", "label": "conclusion", "metadata": "18989842"}
The task is to quickly retrieve the text of the records whose metadata is "18982114".
Solution
Let's assume we have the following JSON content in example.json:
{
"1":{"text": "Some text 1.", "label": "xxx", "metadata": "18982114"},
"2":{"text": "Some text 2.", "label": "yyy", "metadata": "18982114"},
"3":{"text": "Some text 3.", "label": "zzz", "metadata": "something else"}
}
You can do the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json

# 1. read json content from file
with open("example.json", "r") as file:
    my_json = json.load(file)

# 2. filter content
# you can use a list instead of a new dictionary if you don't want to create a new json file
new_json_data = {}
for record_id in my_json:
    if my_json[record_id]["metadata"] == str(18982114):
        new_json_data[record_id] = my_json[record_id]

# 3. write a new json with filtered data
with open("result.json", "w") as file:
    json.dump(new_json_data, file)
This will output the following result.json file:
{"1": {"text": "Some text 1.", "label": "xxx", "metadata": "18982114"}, "2": {"text": "Some text 2.", "label": "yyy", "metadata": "18982114"}}
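Note that the sample in the question has one JSON object per line (the JSON Lines layout), not a single top-level object like example.json. If the real file looks like that, json.load on the whole file would fail, and one option is to read the file once, group the records by "metadata", and then answer each query with a dictionary lookup instead of a new pass over the file. A minimal sketch under those assumptions; build_index and write_records are hypothetical helper names, not part of any library:

```python
import json
from collections import defaultdict


def build_index(path):
    """Read a JSON Lines file once and group its records by "metadata"."""
    index = defaultdict(list)
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # assumes one JSON object per line
            index[record["metadata"]].append(record)
    return index


def write_records(records, path):
    """Write the given records to path, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

For example, index = build_index("records.jsonl") followed by write_records(index["18982114"], "18982114.jsonl") would put the matching records into their own file; both file names are placeholders. The file is still scanned once to build the index, and the whole index is held in memory, so this only fits when the data is smaller than the available RAM.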