Creating a dataframe from multiple JSON files in Python

Problem description

I need to build a dataframe from a series of JSON files. For context, here is what I have so far:

#importing standard libraries
import json

#importing third-party libraries
import requests

#importing project helpers
from helpers.helper_functions import execute_bigquery

#get data from bigquery
authors_df = execute_bigquery(f"""
    SELECT author
    FROM `XXX`
    LIMIT 1000
    """)

#for each row
for index, row in authors_df.iterrows():
    #get the author
    author = row['author']

Essentially, the authors are a list of 1,000 IDs (e.g. 1232456093273) that I want to collect data for.

The information I want for each author is available from a URL that differs per author:

    #build the url
    url = f'http://keystone-db.default.svc.cluster.local:5000/keystonedb/profiles/resonance/categorization?profileId={author}&regionId=1'    

    #get the json value
    json_value = requests.get(url).json()

    #display it
    print(json.dumps(json_value['resonanceCategorizations']['1']['fullData'], indent=2))

Here is part of the output for the first two authors ("45866207" and "54502344"):

45866207
[
  {
    "seed": 24868793,
    "globalSegmentId": 26895,
    "globalSegmentName": "Luxury Accessories & Jewellery",
    "regionId": 15,
    "resonance": 0.8028571009635925,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 76611584,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.8028001189231873,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 40893487,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.7982199192047119,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 74701069,
    "globalSegmentId": 17912,
    "globalSegmentName": "Heritage Designer Brands",
    "regionId": 15,
    "resonance": 0.6809910535812378,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 936905156,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.6566575169563293,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 14831515,
    "globalSegmentId": 17801,
    "globalSegmentName": "Mining & Resources",
    "regionId": 1,
    "resonance": 0.6080579161643982,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 36544806,
    "globalSegmentId": 18392,
    "globalSegmentName": "Rugby",
    "regionId": 12,
    "resonance": 0.5898635983467102,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 26494583,
    "globalSegmentId": 26895,
    "globalSegmentName": "Luxury Accessories & Jewellery",
    "regionId": 15,
    "resonance": 0.5888025760650635,
    "isGlobal": true,
    "globalRegion": 1
  }
]
54502344
[
  {
    "seed": 255420441,
    "globalSegmentId": 18187,
    "globalSegmentName": "Luxury Cars",
    "regionId": 18,
    "resonance": 0.9264420866966248,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 2650413864,
    "globalSegmentId": 18187,
    "globalSegmentName": "Luxury Cars",
    "regionId": 18,
    "resonance": 0.9237868189811707,
    "isGlobal": true,
    "globalRegion": 1
  },
  ...

The same goes for every other author in the list.

What I want is a way to extract, for each author, all the variables in the first, second, and third elements of the JSON list, and to put them into a dataset with 1,000 rows (one per author).

This is my desired output (1,000 rows for the 1,000 authors, and 21 variables: 7 variables, or "keys", for each of the first 3 elements of the list):

     Author     seed_1      globalSegmentId_1 ... seed_2     globalSegmentId_2 ... seed_3 ... globalRegion_3
     45866207   24868793    26895                 76611584   17899             ...
     54502344   255420441   ...                   ...
     ...        ...
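
The layout described above (one row per author, with each key suffixed by the position of the list element it came from) could be sketched roughly as follows. This is only an illustration using the standard library: the function name `flatten_top_records` and its `top_n` parameter are hypothetical, and the sample values are copied from the first author's output shown earlier.

```python
def flatten_top_records(author, records, top_n=3):
    """Turn the first `top_n` dicts in `records` into one flat row.

    Each key is suffixed with its 1-based position, e.g. seed_1, seed_2.
    (Hypothetical helper, not part of the original code.)
    """
    row = {"Author": author}
    for position, record in enumerate(records[:top_n], start=1):
        for key, value in record.items():
            row[f"{key}_{position}"] = value
    return row


# First three records for author 45866207, copied from the output above.
sample = [
    {"seed": 24868793, "globalSegmentId": 26895,
     "globalSegmentName": "Luxury Accessories & Jewellery", "regionId": 15,
     "resonance": 0.8028571009635925, "isGlobal": True, "globalRegion": 1},
    {"seed": 76611584, "globalSegmentId": 17899,
     "globalSegmentName": "Jewellery", "regionId": 15,
     "resonance": 0.8028001189231873, "isGlobal": True, "globalRegion": 1},
    {"seed": 40893487, "globalSegmentId": 17899,
     "globalSegmentName": "Jewellery", "regionId": 15,
     "resonance": 0.7982199192047119, "isGlobal": True, "globalRegion": 1},
]

row = flatten_top_records("45866207", sample)
print(row["seed_1"], row["globalSegmentId_2"], row["globalRegion_3"])
```

Inside the existing loop, one such row per author could be appended to a list, and the list passed to `pandas.DataFrame(rows)` at the end. An author with fewer than three records would simply have missing values in the corresponding columns.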

Tags: python, json, loops

Solution
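
No answer was preserved with this question, so the following is only one possible approach, not the thread's own solution. It assumes the `fullData` list for each author has already been fetched and stored in a dict keyed by author ID, and it uses pandas to pivot a long table (one row per author and list position) into the wide layout requested. The names `pivot_top_records`, `records_by_author`, and `top_n` are hypothetical, and authors with an empty list are skipped in this sketch.

```python
import pandas as pd


def pivot_top_records(records_by_author, top_n=3):
    """Build one wide row per author from {author_id: [record, ...]}.

    Hypothetical helper: assumes at least one author has a non-empty list.
    """
    frames = []
    for author, records in records_by_author.items():
        if not records:
            continue  # authors without data are dropped in this sketch
        part = pd.DataFrame(records[:top_n])
        part["rank"] = range(1, len(part) + 1)  # 1-based list position
        part["Author"] = author
        frames.append(part)
    long_df = pd.concat(frames, ignore_index=True)

    # Remember the original key order before pivoting.
    keys = [c for c in long_df.columns if c not in ("Author", "rank")]

    # One row per author; columns become (key, rank) pairs.
    wide = long_df.pivot(index="Author", columns="rank")

    # Reorder so all keys of element 1 come first, then element 2, and so on.
    ordered = pd.MultiIndex.from_tuples(
        [(key, rank) for rank in range(1, top_n + 1) for key in keys]
    )
    wide = wide.reindex(columns=ordered)
    wide.columns = [f"{key}_{rank}" for key, rank in wide.columns]
    return wide.reset_index()


# Tiny example with two keys only; seeds and segment IDs are from the question.
example = {
    "45866207": [
        {"seed": 24868793, "globalSegmentId": 26895},
        {"seed": 76611584, "globalSegmentId": 17899},
    ],
    "54502344": [
        {"seed": 255420441, "globalSegmentId": 18187},
    ],
}
result = pivot_top_records(example, top_n=2)
print(result)
```

Positions that an author does not have (here, the second element for 54502344) come out as missing values, which also turns the affected integer columns into floats.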

