Similarity of nested JSON objects

Problem description

I have about 1000 nested JSON objects (see the example).

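Differences can occur at any level. I want to compute how similar two objects are: the higher the level at which a difference occurs, the more different the two records should count as. Some fields are related, for example the name and the ID.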
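The weighting described above can be sketched as a recursive score. This is one possible approach, not taken from the post: the name `weighted_difference`, halving the weight at each nesting level, comparing dicts key by key and comparing lists by position are all assumptions made for illustration. It treats every field independently, so the relation between fields such as `Name` and `id` is not modelled.

```python
from typing import Any


def weighted_difference(x: Any, y: Any, depth: int = 0) -> float:
    """Difference score for two JSON values; 0.0 means identical.

    Assumed weighting: a mismatch at nesting depth d adds 1 / 2**d, so a
    difference near the top of the object counts more than a deep one.
    """
    if isinstance(x, dict) and isinstance(y, dict):
        total = 0.0
        for key in x.keys() | y.keys():
            if key in x and key in y:
                total += weighted_difference(x[key], y[key], depth + 1)
            else:
                # Key present on only one side: a mismatch one level down
                total += 1.0 / 2 ** (depth + 1)
        return total
    if isinstance(x, list) and isinstance(y, list):
        # Pair list elements by position (an assumption; order may not matter to you)
        total = sum(weighted_difference(a, b, depth + 1) for a, b in zip(x, y))
        # Elements without a partner on the other side are mismatches one level down
        total += abs(len(x) - len(y)) / 2 ** (depth + 1)
        return float(total)
    # Leaves, or two values of different types: a mismatch at this depth
    return 1.0 / 2 ** depth if x != y else 0.0


# The two items from the JSON example in the question
item_a = {
    "id": "abcd", "Name": "Name",
    "Infos": {"info1": "info1", "info2": "info2"},
    "data": {"data1": "info1", "data2": "info2"},
    "packs": [{"Name": "Name", "description": "description"},
              {"Name1": "Name1", "description1": "description1"}],
}
item_b = {
    "id": "abcd", "Name": "Name",
    "Infos": {"info1": "info1", "info4": "info4"},
    "data": {"data1": "info1", "data2": "info2"},
    "packs": [{"Name3": "Name3", "description": "description"},
              {"Name3": "Name3", "description1": "description1"}],
}
print(weighted_difference(item_a, item_b))  # 1.0
```

Here `Infos` contributes 0.5 (two keys missing at depth 2) and `packs` contributes 0.5 (four keys missing at depth 3). If a similarity in (0, 1] is preferred over a difference score, 1 / (1 + score) is one simple monotone mapping.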

Any ideas? Is there a module that could do this? Many thanks for any help.

JSON example:

{
  "items": [
    {
      "id": "abcd",
      "Name": "Name",
      "Infos": {
        "info1": "info1",
        "info2": "info2"
      },
      "data": {
        "data1": "info1",
        "data2": "info2"
      },
      "packs": [
        {
          "Name": "Name",
          "description": "description"
        },
        {
          "Name1": "Name1",
          "description1": "description1"
        }
      ]
    },
    {
      "id": "abcd",
      "Name": "Name",
      "Infos": {
        "info1": "info1",
        "info4": "info4"
      },
      "data": {
        "data1": "info1",
        "data2": "info2"
      },
      "packs": [
        {
          "Name3": "Name3",
          "description": "description"
        },
        {
          "Name3": "Name3",
          "description1": "description1"
        }
      ]
    }
  ]
}
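To see where the two example items differ, one option (not part of the original post) is to flatten each object into a mapping from leaf path to value and compare the two mappings; the length of a differing path is the level at which the difference occurs. `flatten` and `differing_paths` are hypothetical helper names, and the sample data is a trimmed copy of the example.

```python
from typing import Any, Dict, List, Tuple

Path = Tuple[Any, ...]
_MISSING = object()  # sentinel meaning "no leaf at this path"


def flatten(value: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Map each leaf of a JSON value to its path of dict keys and list indices.

    Empty dicts and lists contain no leaves, so they contribute no paths.
    """
    if isinstance(value, dict):
        steps = value.items()
    elif isinstance(value, list):
        steps = enumerate(value)
    else:
        return {prefix: value}
    out: Dict[Path, Any] = {}
    for step, child in steps:
        out.update(flatten(child, prefix + (step,)))
    return out


def differing_paths(x: Any, y: Any) -> List[Path]:
    """Paths whose leaf is missing on one side or holds a different value."""
    fx, fy = flatten(x), flatten(y)
    diffs = [p for p in fx.keys() | fy.keys()
             if fx.get(p, _MISSING) != fy.get(p, _MISSING)]
    # Sort by string form so mixed str/int path steps are never compared directly
    return sorted(diffs, key=lambda p: tuple(map(str, p)))


x = {"id": "abcd", "Infos": {"info1": "info1", "info2": "info2"}, "packs": [{"Name": "Name"}]}
y = {"id": "abcd", "Infos": {"info1": "info1", "info4": "info4"}, "packs": [{"Name3": "Name3"}]}
for path in differing_paths(x, y):
    print(len(path), path)
# 2 ('Infos', 'info2')
# 2 ('Infos', 'info4')
# 3 ('packs', 0, 'Name')
# 3 ('packs', 0, 'Name3')
```

The path lengths can then be fed into whatever per-level weighting you choose.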

Tags: python

Solution


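def difference(x, y):
    # Lists: 1 + the smallest difference among elements paired by position
    if isinstance(x, list) and isinstance(y, list):
        return 1 + min((difference(a, b) for a, b in zip(x, y)), default=0)
    # Dicts: same, but values are paired by insertion order, not by key
    elif isinstance(x, dict) and isinstance(y, dict):
        return 1 + min((difference(a, b) for a, b in zip(x.values(), y.values())), default=0)
    # Leaves (strings, numbers, ...): 1 if the values differ, else 0
    else:
        return 1 if x != y else 0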

Is that what you need?
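With about 1000 objects, comparing every unordered pair means 1000 * 999 / 2 = 499,500 calls to whatever pairwise function is used, such as `difference` above. A minimal sketch of that loop, assuming the distance function is passed in as a parameter; `closest_pairs` and `leaf_mismatch` are hypothetical names, and `leaf_mismatch` is only a stand-in metric.

```python
import heapq
from itertools import combinations
from typing import Any, Callable, List, Sequence, Tuple


def closest_pairs(
    objects: Sequence[Any],
    distance: Callable[[Any, Any], float],
    k: int = 10,
) -> List[Tuple[float, int, int]]:
    """Return the k pairs (score, i, j) with the smallest distance.

    Every unordered pair is scored once: n * (n - 1) / 2 calls to distance.
    """
    scored = (
        (distance(objects[i], objects[j]), i, j)
        for i, j in combinations(range(len(objects)), 2)
    )
    return heapq.nsmallest(k, scored)


def leaf_mismatch(a: Any, b: Any) -> float:
    # Hypothetical stand-in metric: 0 for equal objects, 1 otherwise
    return 0.0 if a == b else 1.0


items = [{"id": "abcd"}, {"id": "efgh"}, {"id": "abcd"}]
print(closest_pairs(items, leaf_mismatch, k=1))  # [(0.0, 0, 2)]
```

`heapq.nsmallest` only keeps the k best candidates while it consumes the generator, so the 499,500 scores never need to be stored and sorted at once.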

