Remove "" from a JSON file so that they don't break the strings

Problem description

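I have a huge JSON file that looks like this (much of it is copy-pasted filler; the error occurs near the beginning of this example, in the first "context" value):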

{
  "data":[
    {
      "title":"Title1",
      "paragraphs":[
        {
          "context":"In this text, one of the "words" is between quotation marks",
          "qas":[
            {
             "answers":[
               {
                "answer_start":515,
                "text":"String text"
               }
              ],
             "question": "Why something something?",
             "id":"5733be284776f41900661182"
             },
             {
             "answers":[
               {
                "answer_start":505,
                "text":"String something text"
               }
              ],
             "question": "Why?",
             "id":"5733be284776f4190066345"
             }
            ]
           },
           {
          "context":"Context2",
          "qas":[
            {
             "answers":[
               {
                "answer_start":515,
                "text":"String text"
               }
              ],
             "question": "Why something something?",
             "id":"5733be284776f41900661182"
             },
             {
             "answers":[
               {
                "answer_start":505,
                "text":"String something text"
               }
              ],
             "question": "Why?",
             "id":"5733be284776f4190066345"
             }
            ]
           }
          ]
         },
         {
         "title":"Title2",
      "paragraphs":[
        {
          "context":"Context10",
          "qas":[
            {
             "answers":[
               {
                "answer_start":585,
                "text":"String text"
               }
              ],
             "question": "Why something something?",
             "id":"5733be284776f41900661682"
             },
             {
             "answers":[
               {
                "answer_start":545,
                "text":"String something text"
               }
              ],
             "question": "Why?",
             "id":"5733be284776f41900663"
             }
            ]
           },
           {
          "context":"Context7",
          "qas":[
            {
             "answers":[
               {
                "answer_start":525,
                "text":"String text"
               }
              ],
             "question": "Why something something?",
             "id":"5733be284776f41982"
             },
             {
             "answers":[
               {
                "answer_start":595,
                "text":"String something text"
               }
              ],
             "question": "Why?",
             "id":"5733be284776f419005"
             }
            ]
           }
          ]
         }
         ],
          "version":"1.1"
         }

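When I process this file in Python (I want to change its structure), the quotation marks inside the string terminate the string early, so json.load raises an error. I tried str.replace in Python, but that is a problem because I don't want the "" that delimit the strings to disappear. I also can't remove the inner quotes by hand, because the file is huge.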
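The failure can be reproduced with just the offending line. The following is a minimal sketch; the names broken and try_parse are illustrative and not part of the original code.

```python
import json

# One-line stand-in for the offending "context" entry in the file.
broken = '{"context":"In this text, one of the "words" is between quotation marks"}'

def try_parse(text):
    """Return the parsed object, or the JSONDecodeError raised by json.loads."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return exc

result = try_parse(broken)
# The parser treats the quote before `words` as the end of the string
# and then fails on the text that follows it.
print(type(result).__name__, result)
```

The error position reported by the exception points at the first character after the premature closing quote, which can help locate such spots in a large file.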

Here is the code that changes the structure, though I suppose the problem would occur with any code that loads such a JSON file:

import json

with open('file.json', 'r') as fh:
    data = json.load(fh)

result = []

for article in data["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            answers = {"text": [answer["text"] for answer in qa["answers"]]}
            result.append({
                "id": qa["id"], 
                "context": paragraph["context"], 
                "question": qa["question"], 
                "answers": answers
            })

with open('output.json', 'w') as fh:
    json.dump(result, fh)

Tags: python, json, string, character

Solution
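One possible approach, offered as a heuristic sketch rather than a definitive fix, is to repair the raw text before handing it to json.loads. The assumption here is that a quote which really closes a string is followed (after optional whitespace) by ',', '}', ']', ':' or the end of the input; any other quote met inside a string is treated as content and escaped. The function name escape_inner_quotes is hypothetical.

```python
import json

def escape_inner_quotes(text: str) -> str:
    """Escape double quotes that appear inside JSON string values.

    Heuristic (an assumption, not a JSON rule): a closing quote is
    followed, after optional whitespace, by ',', '}', ']', ':' or the
    end of the input. Other quotes inside a string are escaped.
    """
    out = []
    in_string = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif ch == '\\' and i + 1 < n:
            # Keep escape sequences that are already present untouched.
            out.append(ch)
            out.append(text[i + 1])
            i += 1
        elif ch == '"':
            # Look ahead to the next non-whitespace character.
            j = i + 1
            while j < n and text[j] in ' \t\r\n':
                j += 1
            if j == n or text[j] in ',}]:':
                in_string = False      # looks like a real closing quote
                out.append(ch)
            else:
                out.append('\\"')      # stray quote inside the string
        else:
            out.append(ch)
        i += 1
    return ''.join(out)

# Small stand-in for the file contents; for the real file, read it with
# open('file.json').read() and pass the text through the same function.
sample = '{"context":"In this text, one of the "words" is between quotation marks","qas":[]}'
data = json.loads(escape_inner_quotes(sample))
print(data["context"])
```

The heuristic misfires when an inner quote is itself followed by a comma, colon or closing bracket (for example "he said "hi", then left"), so the result still needs a json.loads check, and any remaining failures have to be fixed at the position the exception reports. Where possible, fixing whatever produced the file so that it escapes quotes properly is the more reliable route.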

