Python: Normalizing JSON into a DataFrame

Problem description

I have been trying to normalize this JSON data for a while, but I am stuck on a very basic step. I suspect the answer is simple. Any help would be appreciated.

import json
import urllib.request
import pandas as pd

url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
with urllib.request.urlopen(url) as response:  # avoid shadowing the url string
    data = json.loads(response.read().decode())
    #data = json.dumps(data, indent=4)

df = pd.json_normalize(data = data['campsites'], record_path= 'availabilities', meta = 'campsites')
print(df)

My expected df result is as follows:

[Image: expected DataFrame output]

Tags: python, json, dataframe, normalize

Solution


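One approach (without using pd.json_normalize) is to iterate over the list of unique campsites and convert each campsite's data into a DataFrame. The resulting list of per-campsite DataFrames can then be combined with pd.concat.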

Specifically:

## generate a list of unique campsites
unique_campsites = list(data['campsites'])

## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
  out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
  out_df = out_df.rename({'index': 'date'}, axis = 1)
  return out_df

## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]

## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by = ['campsite_id','date'],
                              ascending = True)

## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id','date','availabilities',
                   'max_num_people','min_num_people','type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis = 1)
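
To try the same idea without calling the live API, here is a minimal sketch on a small hand-made dictionary. The `sample` data and its values are hypothetical; they only imitate the shape the code above assumes, where `data['campsites']` maps each campsite id to a dict whose `availabilities` entry maps dates to a status. The real response may contain additional fields.

```python
import pandas as pd

# Hypothetical sample data (not taken from the real API response).
sample = {
    "campsites": {
        "1002": {
            "campsite_id": "1002",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Reserved",
                "2021-05-02T00:00:00Z": "Available",
            },
            "max_num_people": 6,
            "min_num_people": 1,
            "type_of_use": "Overnight",
        },
        "1001": {
            "campsite_id": "1001",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Available",
                "2021-05-02T00:00:00Z": "Available",
            },
            "max_num_people": 8,
            "min_num_people": 1,
            "type_of_use": "Overnight",
        },
    }
}


def campsite_to_df(data, campsite):
    # The nested 'availabilities' dict supplies the row index (the dates);
    # the scalar fields are broadcast to every row.
    out_df = pd.DataFrame(data["campsites"][campsite]).reset_index()
    return out_df.rename(columns={"index": "date"})


# One DataFrame per campsite, then a single concatenated frame.
frames = [campsite_to_df(sample, cs) for cs in sample["campsites"]]
df_sample = pd.concat(frames, ignore_index=True)

# Convert the id to an integer and sort by campsite, then date.
df_sample["campsite_id"] = df_sample["campsite_id"].astype(int)
df_sample = df_sample.sort_values(["campsite_id", "date"]).reset_index(drop=True)
print(df_sample)
```

Each row of `df_sample` is one campsite/date pair, which appears to be the long format the question is after.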
