Remove redundant key-value pairs from a JSON array in Python

Problem description

I have a JSON file that contains an array of objects. The data in the file looks like this:

[
  {"name": "A",
   "address": "some address related to A",
   "details": "some details related to A"},
  {"name": "B",
   "address": "some address related to A",
   "details": "some details related to B"},
  {"name": "C",
   "address": "some address related to A",
   "details": "some details related to C"}
]

I want to remove the redundant key-value pairs, so the output should look like this:

[
  {"name": "A",
   "address": "some address related to A",
   "details": "some details related to A"},
  {"name": "B",
   "details": "some details related to B"},
  {"name": "C",
   "details": "some details related to C"}
]

So I tried this code, which I found at this link:

import json

with open('./myfile.json') as fp:
    data = fp.read()

unique = []
for n in data:
    if all(unique_data["address"] != data for unique_data["address"] in unique):
        unique.append(n)

#print(unique)
with open("./cleanedRedundancy.json", 'w') as f:
    f.write(unique)

But it gives me this error:

TypeError: string indices must be integers

Tags: python, arrays, json

Solution


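I wrote the solution so that it works both with and without files. By default it runs without files; for your case, change use_files = False to use_files = True in my script so that it reads and writes files.

I assume you want to remove duplicates that share the same (key, value) pair.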
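As for the TypeError in the question: fp.read() returns the whole file as a single string, so looping over it yields one character at a time, and indexing a character with "address" fails. The snippet below is only a minimal sketch of that cause and of the fix (parse the text with json.loads first). The string `raw` is a made-up stand-in for the file content, not the real file.

```python
import json

# Illustrative stand-in for what fp.read() would return (assumed sample data).
raw = '[{"name": "A", "address": "x"}, {"name": "B", "address": "x"}]'

# Iterating over the raw string yields single characters, not dicts.
first = next(iter(raw))
try:
    first["address"]
except TypeError as exc:
    error_message = str(exc)

# Parsing first yields a list of dicts that can be indexed by key.
entries = json.loads(raw)
addresses = [e["address"] for e in entries]
print(addresses)
```

The same applies when writing: f.write expects a string, so the list has to be serialized with json.dumps (or written with json.dump) rather than passed directly.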


import json

use_files = False
# Only duplicate values under these keys are removed
only_keys = {'address', 'complex'}

if not use_files:
    fdata = """
    [
     {
       "name": "A",
       "address": "some address related to A",
       "details": "some details related to A"
     },
     {
       "name": "B",
       "address": "some address related to A",
       "details": "some details related to B",
       "complex": ["x", {"y": "z", "p": "q"}],
       "dont_remove": "test"
     },
     {
       "name": "C",
       "address": "some address related to A",
       "details": "some details related to C",
       "complex": ["x", {"p": "q", "y": "z"}],
       "dont_remove": "test"
     }
    ]
    """

if use_files:
    with open("./myfile.json", 'r', encoding = 'utf-8') as fp:
        data = fp.read()
else:
    data = fdata

entries = json.loads(data)

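# (key, serialized value) pairs seen so far
unique = set()
for e in entries:
    # Iterate over a copy of the items so keys can be deleted safely
    for k, v in list(e.items()):
        if k not in only_keys:
            continue
        # Serialize with sorted keys: the result is hashable and ignores key order
        v = json.dumps(v, sort_keys = True)
        if (k, v) in unique:
            del e[k]
        else:
            unique.add((k, v))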

if use_files:
    with open("./cleanedRedundancy.json", "w", encoding = 'utf-8') as f:
        f.write(json.dumps(entries, indent = 4, ensure_ascii = False))
else:
    print(json.dumps(entries, indent = 4, ensure_ascii = False))

Output:

[
    {
        "name": "A",
        "address": "some address related to A",
        "details": "some details related to A"
    },
    {
        "name": "B",
        "details": "some details related to B",
        "complex": [
            "x",
            {
                "y": "z",
                "p": "q"
            }
        ],
        "dont_remove": "test"
    },
    {
        "name": "C",
        "details": "some details related to C",
        "dont_remove": "test"
    }
]
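
Note that the "complex" value of C was removed even though its inner object lists the keys in a different order than the one in B. That follows from serializing each value with json.dumps(..., sort_keys=True) before the set lookup, which also works around the fact that lists and dicts cannot be put into a set directly. Here is a small sketch of that behavior using the two values from the sample data; the helper name `canonical` is hypothetical and not part of the script above.

```python
import json

def canonical(value):
    # Serialize with sorted keys so equal structures give identical strings.
    return json.dumps(value, sort_keys=True)

b_value = ["x", {"y": "z", "p": "q"}]
c_value = ["x", {"p": "q", "y": "z"}]

# Same content, different key order: the serialized forms match.
same_text = canonical(b_value) == canonical(c_value)

seen = set()
try:
    seen.add(("complex", b_value))  # a tuple holding a list is unhashable
    hashable_raw = True
except TypeError:
    hashable_raw = False

# The serialized string is hashable, so the set lookup works.
seen.add(("complex", canonical(b_value)))
is_duplicate = ("complex", canonical(c_value)) in seen
print(same_text, hashable_raw, is_duplicate)
```

If key order should count as a difference, dropping sort_keys would be one way to get that, at the cost of treating reordered but otherwise equal objects as distinct.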
