首页 > 解决方案 > 如何迭代 json 输出并在数据框中获得理想的结果?

问题描述

我正在连接 API 并尝试获取数据。以 JSON 格式提取的数据,但我想将其转换为具有某些记录的数据框。请你帮助我好吗?

例子:

queue_id = 'a72dba75-0bc6-4a65-b120-8803364f8dc3'

queryResult: {'results': [{'data': [{'interval': '2021-10-11T11:46:25.000Z/2021-10-12T11:46:25.000Z',
                        'metrics': [{'metric': 'nOffered',
                                     'qualifier': None,
                                     'stats': {'count': 1,
                                               'count_negative': None,
                                               'count_positive': None,
                                               'current': None,
                                               'denominator': None,
                                               'max': None,
                                               'min': None,
                                               'numerator': None,
                                               'ratio': None,
                                               'sum': None,
                                               'target': None}}],
                        'views': None},
                       {'interval': '2021-10-13T11:46:25.000Z/2021-10-14T11:46:25.000Z',
                        'metrics': [{'metric': 'nOffered',
                                     'qualifier': None,
                                     'stats': {'count': 2,
                                               'count_negative': None,
                                               'count_positive': None,
                                               'current': None,
                                               'denominator': None,
                                               'max': None,
                                               'min': None,
                                               'numerator': None,
                                               'ratio': None,
                                               'sum': None,
                                               'target': None}}],
                        'views': None},
                       {'interval': '2021-10-14T11:46:25.000Z/2021-10-15T11:46:25.000Z',
                        'metrics': [{'metric': 'nOffered',
                                     'qualifier': None,
                                     'stats': {'count': 3,
                                               'count_negative': None,
                                               'count_positive': None,
                                               'current': None,
                                               'denominator': None,
                                               'max': None,
                                               'min': None,
                                               'numerator': None,
                                               'ratio': None,
                                               'sum': None,
                                               'target': None}}],
                        'views': None},
                       {'interval': '2021-10-15T11:46:25.000Z/2021-10-16T11:46:25.000Z',
                        'metrics': [{'metric': 'nOffered',
                                     'qualifier': None,
                                     'stats': {'count': 1,
                                               'count_negative': None,
                                               'count_positive': None,
                                               'current': None,
                                               'denominator': None,
                                               'max': None,
                                               'min': None,
                                               'numerator': None,
                                               'ratio': None,
                                               'sum': None,
                                               'target': None}}],
                        'views': None}],
              'group': {'mediaType': 'voice',
                        'queueId': '73643cff-799b-41ae-9a67-efcf5e593155'}}]}

我想在数据框中得到以下格式-

                queue_id            nOffered_count  nOffered_sum       interval_start   interval_end                       
0  a72dba75-0bc6-4a65-b120-8803364f8dc3   6         None        2021-10-11T11:46:25.000Z  2021-10-12T11:46:25.000Z
1  a72dba75-0bc6-4a65-b120-8803364f8dc3   1         None        2021-10-12T11:46:25.000Z  2021-10-13T11:46:25.000Z
2  a72dba75-0bc6-4a65-b120-8803364f8dc3  12         None        2021-10-13T11:46:25.000Z  2021-10-14T11:46:25.000Z
3  a72dba75-0bc6-4a65-b120-8803364f8dc3   6         None        2021-10-14T11:46:25.000Z  2021-10-15T11:46:25.000Z
4  a72dba75-0bc6-4a65-b120-8803364f8dc3   6         None        2021-10-15T11:46:25.000Z  2021-10-16T11:46:25.000Z

nOffered_sum&nOffered_count列名就像 - metric_sum&metric_count

我尝试了下面的代码,但没有给出正确的结果 -

        column_names = []
        final_data_lst = []
        
        if(query_result.results != None):
            for item in query_result.results:
                data_lst = []
                for lst_data in item.data:
                    print("####################################")
                    print(lst_data)
                    print("####################################")
                    for met in lst_data.metrics:
                        metric_name = met.metric
                        column_names.append('Queue_Id')
                        column_names.append(metric_name+'_count')
                        column_names.append(metric_name+'_sum')
                        column_names.append('Interval Start')
                        column_names.append('Interval End')
                        data_lst.append(queue_id)
                        data_lst.append(met.stats.count)
                        data_lst.append(met.stats.sum)
                        data_lst.append(lst_data.interval.split('/')[0])
                        data_lst.append(lst_data.interval.split('/')[1])

标签: pythonjsonpython-3.xpandasdataframe

解决方案


根据需要使用列名的方法之一:

import pandas
from collections import defaultdict
queue_id = 'a72dba75-0bc6-4a65-b120-8803364f8dc3'
out = defaultdict(list)
for data in queryResult['results']:
    for d in data.get('data'):
        out['queue_id'].append(queue_id)
        for metric in d.get('metrics'):
            out[metric['metric']+"_count"].append(metric['stats']['count'])
            out[metric['metric']+"_sum"].append(metric['stats']['sum'])
        interval = d['interval'].split('/')
        out['interval_start'].append(interval[0])
        out['interval_end'].append(interval[1])

df = pandas.DataFrame(out)
print (df)

推荐阅读