How to split a large JSON file based on the value of a key?

Problem description

I have a large JSON file that I want to split based on the key "metadata". An example record is:

{"text": "The primary outcome of the study was hospital mortality; secondary outcomes included ICU mortality and lengths of stay for hospital and ICU. ICU mortality was defined as survival of a patient at ultimate discharge from the ICU and hospital mortality was defined as survival at discharge or transfer from our hospital.", "label": "conclusion", "metadata": "18982114"}

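The JSON file contains many records whose "metadata" key is "18982114". How can I extract all of these records and store them in a separate JSON file? Ideally, I am looking for a solution that does not involve loading and looping over the file, otherwise every query becomes very cumbersome. I think a shell command might work, but unfortunately I am not an expert in shell commands... so I would really appreciate a fast, non-looping query solution. Thanks!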

=============================================================================

Here is a sample of the file (5 records):

{"text": "Finally, after an emergency laparotomy, patients who received i.v. vasoactive drugs within the first 24 h on ICU were 3.9 times more likely to die (OR 3.85; 95% CI, 1.64 -9.02; P\u00bc0.002). No significant prognostic factors were determined by the model on day 2.", "label": "conclusion", "metadata": "18982114"}

{"text": "Kinetics ofA TP Binding to Normal and Myopathic", "label": "conclusion", "metadata": "10700033"}

{"text": "Observed rate constants, k0b,, were obtained by fitting the equation I(t)=oe-kobs+C by the method of moments, where I is the observed fluorescence intensity, and I0 is the amplitude of fluorescence change. 38 ", "label": "conclusion", "metadata": "235564322"}

{"text": "The capabilities of modern angiographic platforms have recently improved substantially.", "label": "conclusion", "metadata": "2877272"}

{"text": "Few studies have concentrated specifically on the outcomes after surgery.", "label": "conclusion", "metadata": "18989842"}

The task is to quickly retrieve the text of the records whose metadata is "18982114".

Tags: python, json

Solution

Let's assume we have the following JSON content in example.json:

{
    "1":{"text": "Some text 1.", "label": "xxx", "metadata": "18982114"},
    "2":{"text": "Some text 2.", "label": "yyy", "metadata": "18982114"},
    "3":{"text": "Some text 3.", "label": "zzz", "metadata": "something else"}
}

You can do the following:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json

# 1. read json content from file
with open("example.json", "r") as file:
    my_json = json.load(file)

# 2. filter content
#    you can use a list instead of a new dictionary if you don't want to create a new json file
new_json_data = {}
for record_id in my_json:
    if my_json[record_id]["metadata"] == "18982114":
        new_json_data[record_id] = my_json[record_id]

# 3. write a new json with filtered data
with open("result.json", "w") as file:
    json.dump(new_json_data, file)

This produces the following result.json file:

{"1": {"text": "Some text 1.", "label": "xxx", "metadata": "18982114"}, "2": {"text": "Some text 2.", "label": "yyy", "metadata": "18982114"}}
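
Note that the sample in the question shows one JSON object per line (the JSON Lines layout) rather than a single top-level dictionary, so json.load would not read it directly. The following is a minimal sketch for that layout, assuming Python 3 and that every record carries a "metadata" key. It reads the file once and groups the records by their metadata value, so later queries are plain dictionary lookups instead of new passes over the file. The names build_index and write_records, and the file names in the usage comment, are made up for illustration.

```python
import json
from collections import defaultdict


def build_index(path, key="metadata"):
    """Group the records of a JSON Lines file by the value of `key`."""
    index = defaultdict(list)
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            if not line:  # skip blank lines between records
                continue
            record = json.loads(line)
            index[record[key]].append(record)
    return index


def write_records(records, path):
    """Write the given records to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as file:
        for record in records:
            file.write(json.dumps(record) + "\n")


# Example usage (file names are hypothetical):
#   index = build_index("records.jsonl")
#   matches = index.get("18982114", [])
#   write_records(matches, "18982114.jsonl")
#   texts = [record["text"] for record in matches]
```

The index holds every record in memory, which is fine as long as the file fits in RAM. If it does not, the same loop can write each matching line straight to the output file instead of collecting it.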
