python - Pandas change row to column and extract. Tried pivot but see Index contains duplicate entries, cannot reshape
Problem description
I have the below df:
name | values | id |
---|---|---|
page_fans | {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'} | 247111 |
page_fans | {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'} | 247111 |
page_fans | {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'} | 247111 |
but I'm trying to do this:
page_fans | end_time | id |
---|---|---|
111 | 2021-09-13T07:00:00+0000 | 247111 |
233 | 2021-09-14T07:00:00+0000 | 247111 |
551 | 2021-09-15T07:00:00+0000 | 247111 |
The below is what I've done so far to create / clean up the df:
import pandas as pd

row = {'name': 'page_fans', 'period': 'day', 'values': [{'value': 111, 'end_time': '2021-09-13T07:00:00+0000'}, {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'}, {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'}], 'title': 'Lifetime Total Likes', 'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)', 'id': '247111/insights/page_fans/day'}
pat_id = r'(\d+)'
# pd.json_normalize is the top-level entry point (pandas >= 1.0)
df = pd.json_normalize(row)
# keep only the first run of digits from the id
df['id'] = df['id'].astype(str).str.extract(pat_id)
df = df.explode('values')
I've tried to use df.transpose() and
pivoted = df.pivot(columns='name').reset_index()
but I see the error:
Index contains duplicate entries, cannot reshape
Solution
You can expand each dict into its own columns by applying pd.Series to the values column:
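# explode() leaves every row with index 0, and join() aligns on the index,
# so renumber the rows first to avoid repeated matches
df = df.reset_index(drop=True)
out = df['values'].apply(pd.Series) \
    .join(df['id']) \
    .rename(columns={'value': 'page_fans'})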
Output:
>>> out
page_fans end_time id
0 111 2021-09-13T07:00:00+0000 247111
1 233 2021-09-14T07:00:00+0000 247111
2 551 2021-09-15T07:00:00+0000 247111
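For reference, the steps above can be put together into one script that runs on its own. This is only a sketch: it assumes the row dict from the question and pandas 1.1 or later (for the ignore_index argument of explode), and the variable dup_index is introduced here purely for illustration. It also shows the repeated index label that a plain explode leaves behind, which is most likely what pivot is complaining about.

```python
import pandas as pd

# Sample payload copied from the question.
row = {
    'name': 'page_fans',
    'period': 'day',
    'values': [
        {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'},
        {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'},
        {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'},
    ],
    'title': 'Lifetime Total Likes',
    'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)',
    'id': '247111/insights/page_fans/day',
}

df = pd.json_normalize(row)
# Keep only the leading digits of the id; expand=False returns a Series.
df['id'] = df['id'].str.extract(r'^(\d+)', expand=False)

# A plain explode() repeats the original index label for every new row.
dup_index = df.explode('values').index.tolist()

# ignore_index=True (assumed available, pandas >= 1.1) renumbers the rows.
df = df.explode('values', ignore_index=True)

out = (
    df['values'].apply(pd.Series)   # one column per dict key
    .join(df['id'])                 # aligned on the now-unique index
    .rename(columns={'value': 'page_fans'})
)
print(out)
```

Passing ignore_index=True to explode has the same effect as calling reset_index(drop=True) afterwards; either way the join then matches each row exactly once.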
Starting from the raw row dict in your question, you can also build the DataFrame directly, then keep whichever original columns you need:
df = pd.DataFrame(row)
df['id'] = df['id'].str.extract(r'^(\d+)')
Output:
>>> df['values'].apply(pd.Series).join(df)
value end_time name ... title description id
0 111 2021-09-13T07:00:00+0000 page_fans ... Lifetime Total Likes Lifetime: The total number of people who have ... 247111
1 233 2021-09-14T07:00:00+0000 page_fans ... Lifetime Total Likes Lifetime: The total number of people who have ... 247111
2 551 2021-09-15T07:00:00+0000 page_fans ... Lifetime Total Likes Lifetime: The total number of people who have ... 247111
[3 rows x 8 columns]
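If only a few top-level fields are needed, pd.json_normalize can unpack the nested list and carry those fields along in a single call. The sketch below is an alternative to the approach above, not part of the original answer; it assumes the same row dict, and the names flat and result are made up for the example.

```python
import pandas as pd

# Sample payload copied from the question.
row = {
    'name': 'page_fans',
    'period': 'day',
    'values': [
        {'value': 111, 'end_time': '2021-09-13T07:00:00+0000'},
        {'value': 233, 'end_time': '2021-09-14T07:00:00+0000'},
        {'value': 551, 'end_time': '2021-09-15T07:00:00+0000'},
    ],
    'title': 'Lifetime Total Likes',
    'description': 'Lifetime: The total number of people who have liked your Page. (Unique Users)',
    'id': '247111/insights/page_fans/day',
}

# record_path names the list to unpack into rows; meta lists the
# top-level fields to repeat on every resulting row.
flat = pd.json_normalize(row, record_path='values', meta=['name', 'id'])

# Keep only the leading digits of the id; expand=False returns a Series.
flat['id'] = flat['id'].str.extract(r'^(\d+)', expand=False)

# Select the columns from the desired output and rename 'value'.
result = flat[['value', 'end_time', 'id']].rename(columns={'value': 'page_fans'})
print(result)
```

Because the rows come straight from the list of records, the index is already unique and no explode or join step is involved.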