Pandas: change row to column and extract. Tried pivot but got "Index contains duplicate entries, cannot reshape"

Problem description

I have the below df:

name values id
page_fans {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'} 247111
page_fans {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'} 247111
page_fans {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'} 247111

but I'm trying to get this:

page_fans end_time id
111 2021-09-13T07:00:00+0000 247111
233 2021-09-14T07:00:00+0000 247111
551 2021-09-15T07:00:00+0000 247111

The below is what I've done so far to create / clean up the df:

row =  {'name': 'page_fans', 'period': 'day', 'values': [{'value': 111, 'end_time': '2021-09-13T07:00:00+0000'}, {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'}, {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'}], 'title': 'Lifetime Total Likes', 'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)', 'id': '247111/insights/page_fans/day'}

import pandas as pd

pat_id = r'(\d+)'
df = pd.json_normalize(row)  # pd.io.json.json_normalize is deprecated since pandas 1.0
df['id'] = df['id'].astype(str).str.extract(pat_id)
df = df.explode('values')

I've tried to use df.transpose() and

pivoted = df.pivot(columns='name').reset_index() but I see the error:

Index contains duplicate entries, cannot reshape

Tags: python, pandas

Solution


You can expand each dict in the values column into its own columns by applying pd.Series to it:

out = df['values'].apply(pd.Series) \
                  .join(df['id']) \
                  .rename(columns={'value': 'page_fans'})

Output:

>>> out
   page_fans                  end_time      id
0        111  2021-09-13T07:00:00+0000  247111
1        233  2021-09-14T07:00:00+0000  247111
2        551  2021-09-15T07:00:00+0000  247111
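
One caveat worth checking: the output above assumes df has a unique index (0, 1, 2). If df comes straight from the question's explode('values') call, every row keeps the original index label 0, so an index-based join would match each row against all three and return nine rows. The same repeated label is also what pivot(columns='name') complains about. A minimal sketch of one way around this, assuming a trimmed copy of the question's row dict and building the expanded frame from a list of dicts instead of apply(pd.Series):

```python
import pandas as pd

# Trimmed copy of the question's `row` (title/description omitted for brevity).
row = {
    'name': 'page_fans',
    'period': 'day',
    'values': [
        {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'},
        {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'},
        {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'},
    ],
    'id': '247111/insights/page_fans/day',
}

df = pd.json_normalize(row)
df['id'] = df['id'].str.extract(r'^(\d+)', expand=False)

# explode repeats the source row's index label for every list element.
exploded = df.explode('values')

# Give each row its own label so the index-based join lines up one-to-one.
exploded = exploded.reset_index(drop=True)

# Build the value/end_time columns from the list of dicts in one go.
expanded = pd.DataFrame(exploded['values'].tolist(), index=exploded.index)

out = expanded.join(exploded['id']).rename(columns={'value': 'page_fans'})
print(out)
```

Building the frame from a list of dicts usually avoids the per-row overhead of apply(pd.Series), though for three rows the difference is negligible.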

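With the raw row dict from your question, you can also build the DataFrame directly (pd.DataFrame repeats the scalar fields for each entry in values), then select the columns you want to keep from the result: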

df = pd.DataFrame(row)
df['id'] = df['id'].str.extract(r'^(\d+)')

Output:

>>> df['values'].apply(pd.Series).join(df)

   value                  end_time       name  ...                 title                                        description      id
0    111  2021-09-13T07:00:00+0000  page_fans  ...  Lifetime Total Likes  Lifetime: The total number of people who have ...  247111
1    233  2021-09-14T07:00:00+0000  page_fans  ...  Lifetime Total Likes  Lifetime: The total number of people who have ...  247111
2    551  2021-09-15T07:00:00+0000  page_fans  ...  Lifetime Total Likes  Lifetime: The total number of people who have ...  247111

[3 rows x 8 columns]
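
As an alternative that was not part of the original answer, pd.json_normalize can flatten the nested list itself through its record_path and meta parameters, which skips the explode and join steps. A minimal sketch, again assuming a trimmed copy of the question's row dict:

```python
import pandas as pd

# Trimmed copy of the question's `row` (title/description omitted for brevity).
row = {
    'name': 'page_fans',
    'period': 'day',
    'values': [
        {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'},
        {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'},
        {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'},
    ],
    'id': '247111/insights/page_fans/day',
}

# One output row per entry in row['values']; 'name' and 'id' are carried along.
flat = pd.json_normalize(row, record_path='values', meta=['name', 'id'])

# Keep only the leading digits of the id, as in the answer above.
flat['id'] = flat['id'].str.extract(r'^(\d+)', expand=False)

out = flat.rename(columns={'value': 'page_fans'})[['page_fans', 'end_time', 'id']]
print(out)
```

If several metrics arrive in one response, the name column kept by meta could serve as the key for a later pivot, provided each (end_time, name) pair is unique.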
