Python: Replacing strings using a JSON dictionary

Problem description

I need to create a Python script that replaces strings in a JSON file based on a JSON dictionary. The file contains information about patents and looks like this:

{
  "US-8163793-B2": {
    "publication_date": "20120424",
    "priority_date": "20090420",
    "family_id": "42261969",
    "country_code": "US",
    "ipc_code": "C07D417/14",
    "cpc_code": "C07D471/04",
    "assignee_name": "Hoffman-La Roche Inc.",
    "title": "Proline derivatives",
    "abstract": "The invention relates to a compound of formula (I) wherein A, R 1 -R 6  are as defined in the description and in the claims. The compound of formula (I) can be used as a medicament."
  }
}

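Initially, I used a piece of software that identifies, per entity type (e.g. COMPANY), all words that are written differently but refer to the same thing. For example, the company "BMW" may be written as "BMW Ag" or as "BMW Group". The resulting dictionary is structured like this (only a partial view, otherwise it would be very long):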

{
  "RESP_META" : {
  ,"RESP_WARNINGS" : null
  ,"RESP_PAYLOAD": 
    {
      "BIOCHEM": [
        { 
          "hitID": "D011392",
          "name": "L-Proline",
          "frag_vector_array": [
            "16#{!Proline!} derivatives"
          ],
          ...,
          "sectionMeta": {
            "8": "$.US-8163793-B2.title|"
          }
        },
        { 
          (next hit...)
        },
        ...
      ]
    }

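The "sectionMeta" key gives me the patent ID and the field in which the replacement should happen, e.g. abstract, title or assignee_name. In "frag_vector_array", the word to be replaced is always enclosed between {! and !}, e.g. {!Proline!}, and that word should be replaced by the value of "name", e.g. L-Proline.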

I tried a few things to replace the company names, but I think I am on the wrong track. This is the code I started with:

import json

patents = json.load(open("testset_patents.json"))

companies = json.load(open("termite_output.json"))

print(companies)

companies = companies['RESP_PAYLOAD']

# loop through companies data
for company in companies.values():
    company_list = company["COMPANY"]

    for comp in company_list:
        comp_name = comp["name"]

        # update patents "name" in "assignee_name"
        for patent in patents.values():
            patent['assignee_name'] = comp_name

    print(patents)

    # save output in new file
    with open('company_replacement.json', 'w') as fp:
        json.dump(patents, fp)

Any and all help is welcome.

Tags: python, json

Solution
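
One possible approach, offered as a minimal sketch rather than a definitive answer. It rests on assumptions the question does not confirm: that every "sectionMeta" value has the form `$.<patent id>.<field>`, optionally ended or separated by `|`; that the `16#` prefix in the fragment can be ignored; and that a plain text substitution inside the target field is what is wanted. The helper names `parse_section_meta`, `replace_marked_word` and `apply_hits` are hypothetical, made up for this illustration.

```python
import re

# Captures the text enclosed by the {! ... !} markers of a fragment.
MARKER = re.compile(r"\{!\s*(.*?)\s*!\}")


def parse_section_meta(path):
    """Split a path such as '$.US-8163793-B2.title' into (patent_id, field)."""
    if path.startswith("$."):
        path = path[2:]
    # Assumption: the field name is whatever follows the last dot.
    patent_id, _, field = path.rpartition(".")
    return patent_id, field


def replace_marked_word(text, word, name):
    """Replace standalone occurrences of `word` in `text` with `name`."""
    # The lookarounds skip occurrences glued to a letter, digit or hyphen,
    # so an existing "L-Proline" is not turned into "L-L-Proline".
    pattern = re.compile(r"(?<![\w-])" + re.escape(word) + r"(?![\w-])")
    return pattern.sub(lambda match: name, text)


def apply_hits(patents, payload):
    """Update `patents` in place from the hits found in RESP_PAYLOAD."""
    for hits in payload.values():
        if not isinstance(hits, list):
            continue
        for hit in hits:
            name = hit["name"]
            # Collect every marked word from the hit's fragments.
            words = set()
            for fragment in hit.get("frag_vector_array", []):
                words.update(MARKER.findall(fragment))
            # Each sectionMeta value says which patent field to touch.
            for raw in hit.get("sectionMeta", {}).values():
                for path in filter(None, raw.split("|")):
                    patent_id, field = parse_section_meta(path)
                    text = patents.get(patent_id, {}).get(field)
                    if not isinstance(text, str):
                        continue
                    for word in words:
                        text = replace_marked_word(text, word, name)
                    patents[patent_id][field] = text
    return patents


# In-memory example built from the data shown in the question.
patents = {
    "US-8163793-B2": {
        "assignee_name": "Hoffman-La Roche Inc.",
        "title": "Proline derivatives",
    }
}
payload = {
    "BIOCHEM": [
        {
            "hitID": "D011392",
            "name": "L-Proline",
            "frag_vector_array": ["16#{!Proline!} derivatives"],
            "sectionMeta": {"8": "$.US-8163793-B2.title|"},
        }
    ]
}
apply_hits(patents, payload)
print(patents["US-8163793-B2"]["title"])  # L-Proline derivatives
```

With the real files, `patents` would come from `json.load` on testset_patents.json and `payload` from the "RESP_PAYLOAD" entry of termite_output.json, and the result could be written back with `json.dump` as in the question's code. Unlike the original loop, only the field named in "sectionMeta" is changed, and only for the patent that path points to.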

