Python: recursive function over JSON data is missing values in its output

Problem description

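I created a recursive function to traverse JSON data, which can consist of key-value pairs where the values are lists or dicts. In this case, I want the recursive function to ultimately return the keys and single values that are not lists or dicts. I am trying to store each current value in the collect variable, a dict, to see where the data passes through. However, values are missing in two ways:

1 There is an example where 'id' and 'fee' are not captured in collect.

2 When iterating over list values, only the last index is captured. The key names used in this example have the index number appended, so each key name should be unique.

Can you identify the cause of these problems? Any insight you can provide would be greatly appreciated.

Here are the steps to reproduce the problem.

The data can be copied into a text file and saved as myfile.json.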

{
  "output": {
    "id": "ABC",
    "fee": 155.47,
    "details": [
      {
       "sales": 1000,
       "cost": 200.50,
       "card": [
         {
            "a": 0.01,
            "up": 100.25555,
            "down": 90.25555
         },
         {
           "b": 0.02, 
           "up": 101.25, 
           "down": 80.25
         }
        ]
       },
      {
        "sales": 1100,
        "cost": 300.75,
        "card": [
          {
           "a": 0.01,
          "up": 110.75111,
          "down": 80.7111
          },
          {
           "b": 0.02, 
           "up": 102.25111, 
           "down": 70.50111
           }
          ]
         }
       ],
   "percent": 0.25,
   "sales_start": 1000}}



# The following loads the file to a variable

import json

with open('myfile.json', 'r') as f:
    data = json.load(f)


# Create the recursion function

collect = {}

def myfunc(x, max_level, level=0, keystr='output'):
    global collect
    collect = {}

    if level <= max_level:
        level += 1
        if isinstance(x, dict):
            for k, v in x.items():
                knames = keystr + '-' + k
                if isinstance(v, (dict,list)):
                    myfunc(x[k], max_level, level, knames)  
                    collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
                else:
                    collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})
                
        elif isinstance(x, list):
            for i in range(len(x)):
                if isinstance(x[i], (dict, list)):
                    knames = keystr + '-idx'+str(i)
                    myfunc(x[i], max_level, level, knames)
                    collect.update({'Case3' + '-' + knames : x[i]})
                else:
                    collect.update({'Case4' + '-' + knames : x[i]})
        else:
            for k, v in x.items():
                knames = keystr + '-' + k
                collect.update({'Case5' + '_' + knames + '-lev'+str(level): v})  
                return x    

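The function takes a max_level that controls the nesting depth for which I want key-value pairs returned. Concatenating the strings helps identify how deep we are in the nesting and creates unique key names.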
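As a small illustration of the naming scheme described above, the sketch below builds the same kind of key strings by hand. The helper name make_key is hypothetical and not part of the original code; it only mirrors the string concatenation that myfunc performs.

```python
def make_key(case, keystr, part, level=None):
    """Build a key such as 'Case2-output-id-lev1' (hypothetical helper)."""
    name = case + '-' + keystr + '-' + part
    if level is not None:
        name += '-lev' + str(level)
    return name

# A dict entry one level deep, as in the dict branch of myfunc
dict_key = make_key('Case2', 'output', 'id', level=1)

# A list element: the index is appended as 'idx<i>' and no level suffix is added
list_key = make_key('Case3', 'output-details', 'idx' + str(0))

print(dict_key)  # Case2-output-id-lev1
print(list_key)  # Case3-output-details-idx0
```

Because every path segment and list index ends up in the name, two different positions in the data should not produce the same key, which is why a missing key points to lost entries rather than a name collision.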

# Run the recursion function and output collect

x = data['output']
myfunc(x, 0)
collect

This outputs the following:

{'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}

This is problem #1: the output above is missing 'id': 'ABC' and 'fee': 155.47.

# Re-run function creation portion above to reset collection, then the following
myfunc(x, 1)
collect

max_level controls how deep to descend into the JSON. The call above returns the following:

{'Case3-output-details-idx1': {'sales': 1100,
  'cost': 300.75,
  'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
   {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]},
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}

This is problem #2: the output above returns only "Case3-output-details-idx1" and is missing "Case3-output-details-idx0". For any number of index values, only the last one is returned.

-------- Confirmation below, without the recursive function.

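The code below simulates the recursion and shows the correct output. My data files are much larger, and there will be multiple files with different combinations of dicts and lists, so recursion is needed.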

x = data['output']
max_level = 0
level = 0
collect = {}
keystr = 'output'



if level <= max_level:
    level += 1
    if isinstance(x, dict):
        for k, v in x.items():
            knames = keystr + '-' + k
            if isinstance(v, (dict,list)):
            
                collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
            else:
                collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})

    elif isinstance(x, list):
        for i in range(len(x)):
            if isinstance(x[i], (dict, list)):
                knames = keystr + '-idx'+str(i)
            
                collect.update({'Case3' + '-' + knames : x[i]})
            else:
                collect.update({'Case4' + '-' + knames : x[i]})
else:
        for k, v in x.items():
            knames = keystr + '-' + k
            collect.update({'Case5' + '_' + knames + '-lev'+str(level): v})  

collect

The output is as follows, with id and fee returned:

{'Case2-output-id-lev1': 'ABC',
 'Case2-output-fee-lev1': 155.47,
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}





# Trying to simulate looking one level into the nest

x = data['output']['details']
max_level = 1
level = 1
keystr = 'output'
collect = {}


if level <= max_level:
    level += 1
    if isinstance(x, dict):
        for k, v in x.items():
            knames = keystr + '-' + k
            if isinstance(v, (dict,list)):
            
                collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
            else:
                collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})

    elif isinstance(x, list):
        for i in range(len(x)):
            if isinstance(x[i], (dict, list)):
                knames = keystr + '-idx'+str(i)
            
                collect.update({'Case3' + '-' + knames : x[i]})
            else:
                collect.update({'Case4' + '-' + knames : x[i]})
    else:
        for k, v in x.items():
            knames = keystr + '-' + k
            collect.update({'Case5' + '_' + knames + '-lev'+str(level): v}) 

collect

The code above outputs the following, with both "Case3-output-idx0" and "Case3-output-idx1":

{'Case3-output-idx0': {'sales': 1000,
  'cost': 200.5,
  'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
   {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
 'Case3-output-idx1': {'sales': 1100,
  'cost': 300.75,
  'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
   {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}}            

Thank you very much for reviewing.

Tags: python, json, recursion

Solution


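After looking into the problem further, the recursion in myfunc() turned out to be overwriting the collect variable, precisely because it is declared global and then reassigned inside the function. Every call, including each recursive call, runs these two lines, so collect is rebound to a new empty dict and everything stored so far is discarded: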

global collect
collect = {}
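The effect can be reproduced on a much smaller scale. The sketch below is only an illustration: the names seen and walk_buggy and the sample data are made up for this example, and the function is deliberately written with the same reset-on-every-call mistake.

```python
seen = {}

def walk_buggy(node, name='root'):
    """Records every dict key, but resets the global on each call (the bug)."""
    global seen
    seen = {}  # runs again in every recursive call and discards earlier entries
    for key, value in node.items():
        path = name + '-' + key
        if isinstance(value, dict):
            walk_buggy(value, path)  # the reset inside this call wipes 'seen'
        seen[path] = value

walk_buggy({'id': 'ABC', 'inner': {'a': 1}, 'percent': 0.25})
print(sorted(seen))  # ['root-inner', 'root-inner-a', 'root-percent']
```

'root-id' was stored before the recursive call and is gone afterwards, which mirrors the missing id and fee. The same mechanism would explain problem #2: each list element triggers another call, so only what is stored after the last reset survives.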

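With those two lines removed and collect defined once as a regular global variable outside the function, collect captures all the values. The code below helps narrow down the problem.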

### Issue 1 simplified
# Commenting out the myfunc() recursion lets id and fee pass through

collect = {}

def myfunc(x, max_level, level=0, keystr='output'):
    global collect
    collect = {}
    if level <= max_level:
        level += 1
        if isinstance(x, dict):
            for k, v in x.items():
                knames = keystr + '-' + k
                if isinstance(v, (dict,list)):
                    #myfunc(x[k], max_level, level, knames)  
                    collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
                else:
                    collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})

x = data['output']
myfunc(x, 0)
collect


# Output with id and fee

{'Case2-output-id-lev1': 'ABC',
 'Case2-output-fee-lev1': 155.47,
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}

When the myfunc() call is not commented out, as below, the recursion overwrites what was originally stored in collect, namely the id and fee values.

collect = {}

def myfunc(x, max_level, level=0, keystr='output'):
    global collect
    collect = {}
    if level <= max_level:
        level += 1
        if isinstance(x, dict):
            for k, v in x.items():
                knames = keystr + '-' + k
                if isinstance(v, (dict,list)):
                    myfunc(x[k], max_level, level, knames)  
                    collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
                else:
                    collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})


x = data['output']
myfunc(x, 0)
collect                    


# Output missing id and fee

{'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}

To fix this, remove the global declaration and the reassignment of collect inside the function.

### collect outside function

collect = {}

def myfunc(x, max_level, level=0, keystr='output'):
    if level <= max_level:
        level += 1
        if isinstance(x, dict):
            for k, v in x.items():
                knames = keystr + '-' + k
                if isinstance(v, (dict,list)):
                    myfunc(x[k], max_level, level, knames)  
                    collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
                else:
                    collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})

x = data['output']
myfunc(x, 0)
collect            

# Outputs id and fee

{'Case2-output-id-lev1': 'ABC',
 'Case2-output-fee-lev1': 155.47,
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}
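The reason the function still works without the global statement is that collect.update() changes the existing dict in place, while only an assignment to the name needs global. A minimal sketch of that difference follows; the names store, add_entry, reset_locally and reset_globally are hypothetical and chosen just for this example.

```python
store = {}

def add_entry(key, value):
    # No 'global' needed: update() changes the existing dict in place
    store.update({key: value})

def reset_locally():
    # Without 'global', this assignment only creates a local variable
    store = {}
    return store

def reset_globally():
    # With 'global', the assignment replaces the module-level dict
    global store
    store = {}

add_entry('id', 'ABC')
reset_locally()
after_local = dict(store)   # snapshot: the module-level dict is untouched
reset_globally()
after_global = dict(store)  # snapshot: the module-level dict was replaced

print(after_local)   # {'id': 'ABC'}
print(after_global)  # {}
```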

Now with the rest of the code:

collect = {}

def myfunc(x, max_level, level=0, keystr='output'):
    if level <= max_level:
        level += 1
        if isinstance(x, dict):
            for k, v in x.items():
                knames = keystr + '-' + k
                if isinstance(v, (dict,list)):
                    myfunc(x[k], max_level, level, knames)  
                    collect.update({'Case1' + '-' + knames + '-lev'+str(level): v})
                else:
                    collect.update({'Case2' + '-' + knames + '-lev'+str(level): v})

        elif isinstance(x, list):
            for i in range(len(x)):
                # Build the key name first so that Case4 can use it as well
                knames = keystr + '-idx'+str(i)
                if isinstance(x[i], (dict, list)):
                    myfunc(x[i], max_level, level, knames)
                    collect.update({'Case3' + '-' + knames : x[i]})
                else:
                    collect.update({'Case4' + '-' + knames : x[i]})
        else:
            for k, v in x.items():
                knames = keystr + '-' + k
                collect.update({'Case5' + '_' + knames + '-lev'+str(level): v})  

x = data['output']
myfunc(x, 0)
collect       


# Outputs

{'Case2-output-id-lev1': 'ABC',
 'Case2-output-fee-lev1': 155.47,
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000}

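We can then descend further into the JSON. Both list indices are now returned, as 'Case3-output-details-idx0' and 'Case3-output-details-idx1', together with their 'Case2-output-details-idx0-...' and 'Case2-output-details-idx1-...' entries, which resolves problem #2. All values are now returned.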

x = data['output']
myfunc(x, 2) # Nest in one layer further
collect

{'Case2-output-id-lev1': 'ABC',
 'Case2-output-fee-lev1': 155.47,
 'Case1-output-details-lev1': [{'sales': 1000,
   'cost': 200.5,
   'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
    {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
  {'sales': 1100,
   'cost': 300.75,
   'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
    {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}],
 'Case2-output-percent-lev1': 0.25,
 'Case2-output-sales_start-lev1': 1000,
 'Case2-output-details-idx0-sales-lev3': 1000,
 'Case2-output-details-idx0-cost-lev3': 200.5,
 'Case1-output-details-idx0-card-lev3': [{'a': 0.01,
   'up': 100.25555,
   'down': 90.25555},
  {'b': 0.02, 'up': 101.25, 'down': 80.25}],
 'Case3-output-details-idx0': {'sales': 1000,
  'cost': 200.5,
  'card': [{'a': 0.01, 'up': 100.25555, 'down': 90.25555},
   {'b': 0.02, 'up': 101.25, 'down': 80.25}]},
 'Case2-output-details-idx1-sales-lev3': 1100,
 'Case2-output-details-idx1-cost-lev3': 300.75,
 'Case1-output-details-idx1-card-lev3': [{'a': 0.01,
   'up': 110.75111,
   'down': 80.7111},
  {'b': 0.02, 'up': 102.25111, 'down': 70.50111}],
 'Case3-output-details-idx1': {'sales': 1100,
  'cost': 300.75,
  'card': [{'a': 0.01, 'up': 110.75111, 'down': 80.7111},
   {'b': 0.02, 'up': 102.25111, 'down': 70.50111}]}}
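In the last output above, the entries from the earlier max_level=0 run still come first, which suggests the module-level collect was not emptied between the two calls; it keeps accumulating until it is reassigned. One way to avoid shared state altogether is to pass the result dict down through the recursion. The sketch below is an assumption-laden variant rather than the code from the answer: the names flatten_leaves and out are hypothetical, max_level and the Case prefixes are left out, and only leaf values (neither dict nor list) are kept, which is the goal stated in the question.

```python
def flatten_leaves(node, keystr='output', out=None):
    """Collect only non-dict, non-list values under path-like keys."""
    if out is None:
        out = {}  # fresh dict for each top-level call, shared by the recursion
    if isinstance(node, dict):
        for key, value in node.items():
            flatten_leaves(value, keystr + '-' + key, out)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            flatten_leaves(item, keystr + '-idx' + str(i), out)
    else:
        out[keystr] = node  # leaf value: store it under the accumulated path
    return out

# Small made-up sample shaped like the data in the question
sample = {
    'id': 'ABC',
    'details': [
        {'sales': 1000, 'card': [{'a': 0.01}, {'b': 0.02}]},
        {'sales': 1100, 'card': []},
    ],
}
flat = flatten_leaves(sample)
for name, value in flat.items():
    print(name, value)
```

Each top-level call returns its own dict, so repeated runs do not mix their results.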
