首页 > 解决方案 > 使用python从JSON文件中的嵌套列表中获取元素

问题描述

我正在阅读一个 JSON 文件并尝试从每个文件中提取以下信息log并将其存储在 Excel 表中。

["name","TOM DOE"]
["value","132"]

我的 python 代码能够从 JSON 中提取一些其他必需的信息。但是,由于 JSON 包含多个嵌套列表和字典,我无法提取两个信息并存储它。请任何人都可以帮助我使用python代码。

这是我的python代码:

import json
import pandas
class ConvertToExcel:

    def Process(self):
        with open('C:/Users/Desktop/SampleTestFiles/new17.json') as json_file:
            dataarray = json.load(json_file)

        data1 = []
        logcount = 0
        for data in dataarray:
            logcount = logcount + 1
            for i in range(len(data['log'])):
                code = data['log'][i]['code']
                message = data['log'][i]['message']
                try:
                    cid = data['log'][i]['message']['cid']
                except:
                    cid = 0
                text = data['log'][i]['message']['text']
                refs = data['log'][i]['refs']
                for k in range(len(data['log'][i]['refs'])):
                    try:
                        hrefs = data['log'][i]['refs'][k]['href']
                    except:
                        hrefs =''
                    try:
                        hrefsFacts = data['log'][i]['refs'][k]['href']
                    except:
                        hrefsFacts =''

                    # print(type(data['log'][i]['refs'][k]['href']['properties']))
                    # res1 = 'name' in chain(*data['log'][i]['refs'][k]['href']['properties'])

                    # for elem in len(data['log'][i]['refs'][k]['href']['properties']):
                    #     for item in elem:
                    #         if(item == 'val'):
                    #             hrefspropertiesvalue = item
                    #         else:
                    #             pass

                hrefspropertiesvalue = 'a'
                level = data['log'][i]['level']
                data1.append((logcount,i, code, message,cid, text, refs,hrefs,hrefsFacts, hrefspropertiesvalue, level))

        pandas.DataFrame(data1, columns=['Log', 'Innerlog', 'code', 'message','cid','text','refs','hrefs','hrefsFacts','hrefspropertiesvalue', 'level']).to_excel("output.xlsx")

A = ConvertToExcel()
A.Process()

并且,JSON:

[{ "log": [
   {
   "code": "nikv.F1.all.1",
   "message": {
    "cid": "61785360",
    "filing_url": "C:\\Users\\farizaleta\\Desktop\\test-428-2016Q4F1.abcd",
    "severity": "error",
    "text": "[nikv.F1.all.1] The values of 6,075,786 for the elements nikv:OtherChargestested is duplicated in the filing 2 times.\n\nElement : nikv:OtherChargestested\nPeriod : 2016-01-01 to 2016-12-31\n\n\nRule Id:nikv.F1.all.1 - test-428-2016Q4F1.abcd 4122"
   },
   "refs": [
    {
     "href": "test-428-2016Q4F1.abcd#f-743",
     "sourceLine": 4122,
     "properties": [
      [
       "label",
       "Other charges, tested"
      ],
      [
       "namespace",
       "http://nikv.com/form/2002-01-01/nikv"
      ],
      [
       "name",
       "TOM DOE"
      ],
      [
       "QName",
       "nikv:OtherChargestested"
      ],
      [
       "contextRef",
       "c-01",
       [
        [
         "entity",
         "C002089",
         [
          [
           "scheme",
           "http://tested.com/entity/identification/scheme"
          ]
         ]
        ],
        [
         "startDate",
         "2016-01-01"
        ],
        [
         "endDate",
         "2016-12-31"
        ],
        [
         "dimensions",
         "(1)",
         [
          [
           "nikv:OfficerAxis",
           "<nikv:OfficerDomain>0-1</nikv:OfficerDomain>\n\t\t\t\t\n"
          ]
         ]
        ]
       ]
      ],
      [
       "unitRef",
       "u-02",
       [
        [
         "measure",
         "iso4217"
        ]
       ]
      ],
      [
       "decimals",
       "INF"
      ],
      [
       "precision",
       "None"
      ],
      [
       "xsi:nil",
       "false"
      ],
      [
       "value",
       "132"
      ]
     ],
     "objectId": "91269"
    }
   ],
   "level": "error"
  }]
  }]

下面突出显示的列需要填充到 excel 输出中: 在此处输入图像描述

标签: pythonjson

解决方案


您可以像这样遍历属性:

for type, val, *_ in data['log'][i]['refs'][k]['href']['properties']:

*_忽略子列表中的任何额外元素properties

然后您可以测试是否typenamevalue并设置适当的变量。

    def Process(self):
        with open('C:/Users/Desktop/SampleTestFiles/new17.json') as json_file:
            dataarray = json.load(json_file)

        data1 = []
        logcount = 0
        for data in dataarray:
            logcount = logcount + 1
            for x in data['log']:
                code = x['code']
                message = x['message']
                try:
                    cid = x['message']['cid']
                except:
                    cid = 0
                text = x['message']['text']
                refs = x['refs']
                name = ''
                value = ''
                for ref in x['refs']:
                    try:
                        hrefs = ref['href']
                    except:
                        hrefs =''
                    try:
                        hrefsFacts = ref['href']
                    except:
                        hrefsFacts =''

                    for type, val, *_ in ref['properties']:
                        if type = 'name':
                            name = val
                        elif type = 'value':
                            value = val

                hrefspropertiesvalue = 'a'
                level = x['level']
                data1.append((logcount,i, code, message,cid, text, refs,hrefs,hrefsFacts, hrefspropertiesvalue, level, name, value))

        pandas.DataFrame(data1, columns=['Log', 'Innerlog', 'code', 'message','cid','text','refs','hrefs','hrefsFacts','hrefspropertiesvalue', 'level', 'name', 'value']).to_excel("output.xlsx")

我还简化了所有要使用的循环,for <variable> in <list>而不是for <indexvariable> in range(len(<list>)).


推荐阅读