python - Pandas JSON normalize and merge back into the table
Problem description
I imported a JSON file and normalized it up to this point:
Team Goals
0 A {'Time': '06:09', 'P': [{'Nr': 1}, {'Nr': 5}], 'Nr': 2, 'Scored': 'Game'}
0 A {'Time': '11:07', 'P': [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}], 'Nr': 7, 'Scored': 'Game'}
1 B {'Time': '20:04', 'P': [{'Nr': 9}, {'Nr': 2}], 'Nr': 4, 'Scored': 'Game'}
1 B {'Time': '35:38', 'P': [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}], 'Nr': 4, 'Scored': 'Game'}
1 B {'Time': '63:21', 'P': {'Nr': 4}, 'Nr': 9, 'Scored': 'Penalty'}
This is what I want to achieve:
Team Goals.Time Goals.P1 Goals.P2 Goals.P3 Goals.Nr Goals.Scored
0 A '06:09' 1 5 NA 2 'Game'
0 A '11:07' 2 3 1 7 'Game'
1 B '20:04' 9 2 NA 4 'Game'
1 B '35:38' 3 8 4 4 'Game'
1 B '63:21' 4 NA NA 9 'Penalty'
I tried pandas json_normalize, but the result drops the Team column:
import pandas as pd
pd.json_normalize(df['Goals'])
Time P Nr Scored P.Nr
0 06:09 [{'Nr': 1}, {'Nr': 5}] 2 'Game' NaN
1 11:07 [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}] 7 'Game' NaN
2 20:04 [{'Nr': 9}, {'Nr': 2}] 4 'Game' NaN
3 35:38 [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}] 4 'Game' NaN
4 63:21 NaN 9 'Game' 34.0
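Is there a way to normalize and keep the Team column? Merging them afterwards doesn't work, because the indexes are different and the two frames have no common column: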
df['Team']
0 A
0 A
1 B
1 B
1 B
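The index mismatch can be sidestepped by throwing both indexes away and aligning row by row. A minimal sketch, assuming `df` holds exactly the five rows shown above (rebuilt by hand here) in the order they should stay in. The list is passed to `json_normalize` via `.tolist()` so the result simply gets a fresh 0..n-1 index:

```python
import pandas as pd

# rebuild the input frame from the question (index 0, 0, 1, 1, 1)
df = pd.DataFrame(
    {
        "Team": ["A", "A", "B", "B", "B"],
        "Goals": [
            {"Time": "06:09", "P": [{"Nr": 1}, {"Nr": 5}], "Nr": 2, "Scored": "Game"},
            {"Time": "11:07", "P": [{"Nr": 2}, {"Nr": 3}, {"Nr": 1}], "Nr": 7, "Scored": "Game"},
            {"Time": "20:04", "P": [{"Nr": 9}, {"Nr": 2}], "Nr": 4, "Scored": "Game"},
            {"Time": "35:38", "P": [{"Nr": 3}, {"Nr": 8}, {"Nr": 4}], "Nr": 4, "Scored": "Game"},
            {"Time": "63:21", "P": {"Nr": 4}, "Nr": 9, "Scored": "Penalty"},
        ],
    },
    index=[0, 0, 1, 1, 1],
)

# json_normalize numbers its rows 0..n-1, so give Team the same positional index
team = df[["Team"]].reset_index(drop=True)
goals = pd.json_normalize(df["Goals"].tolist()).add_prefix("Goals.")

# both frames now share the index 0..n-1, so join pairs them up row by row
out = team.join(goals)
print(out)
```

This only relies on `df` and the normalized frame having the same row order, which holds because `json_normalize` emits one row per input dict.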
Solution
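# convert df to a list of row dicts: [dict, ..., dict]
# ('records' is the full orient name; the abbreviation 'record' no longer works in current pandas)
data = df.to_dict(orient='records')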
dfn = pd.json_normalize(data)
print(dfn)
# Team Goals.Time Goals.P Goals.Nr Goals.Scored \
# 0 A 06:09 [{'Nr': 1}, {'Nr': 5}] 2 Game
# 1 A 11:07 [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}] 7 Game
# 2 B 20:04 [{'Nr': 9}, {'Nr': 2}] 4 Game
# 3 B 35:38 [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}] 4 Game
# 4 B 63:21 NaN 9 Penalty
# Goals.P.Nr
# 0 NaN
# 1 NaN
# 2 NaN
# 3 NaN
# 4 4.0
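# rows where Goals.P is NaN: P was a single dict, which json_normalize flattened
# into Goals.P.Nr; wrap that value back into a one-element list of dicts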
cond = dfn['Goals.P'].isnull()
dfn.loc[cond, 'Goals.P'] = dfn.loc[cond, 'Goals.P.Nr'].map(lambda x: [{'Nr':x}])
print(dfn)
# Team Goals.Time Goals.P Goals.Nr Goals.Scored \
# 0 A 06:09 [{'Nr': 1}, {'Nr': 5}] 2 Game
# 1 A 11:07 [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}] 7 Game
# 2 B 20:04 [{'Nr': 9}, {'Nr': 2}] 4 Game
# 3 B 35:38 [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}] 4 Game
# 4 B 63:21 [{'Nr': 4.0}] 9 Penalty
# Goals.P.Nr
# 0 NaN
# 1 NaN
# 2 NaN
# 3 NaN
# 4 4.0
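Alternatively, the single-dict case can be fixed before flattening, so that no `Goals.P.Nr` column is created and the numbers are not turned into floats. A sketch, assuming every `P` value is either a list of dicts or one bare dict; `as_list` is a hypothetical helper and only two of the question's rows are used for brevity:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Team": ["A", "B"],
        "Goals": [
            {"Time": "06:09", "P": [{"Nr": 1}, {"Nr": 5}], "Nr": 2, "Scored": "Game"},
            {"Time": "63:21", "P": {"Nr": 4}, "Nr": 9, "Scored": "Penalty"},
        ],
    }
)


def as_list(p):
    """Wrap a bare dict in a one-element list; leave lists unchanged (hypothetical helper)."""
    return p if isinstance(p, list) else [p]


records = []
for rec in df.to_dict(orient="records"):
    # build a new Goals dict so the objects stored in df are not mutated
    goals = {**rec["Goals"], "P": as_list(rec["Goals"]["P"])}
    records.append({"Team": rec["Team"], "Goals": goals})

# every P is now a list, so json_normalize keeps it as one Goals.P column
dfn = pd.json_normalize(records)
print(dfn)
```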
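def expand_col(Ser, prefix=''):
    """Expand a column of [{'Nr': ...}, ...] lists into one column per list position."""
    # one row per list element; the original row label ends up in column 'index'
    obj = Ser.explode().reset_index()
    # number the elements within each original row: 1, 2, 3, ...
    obj['tag'] = obj.groupby('index').cumcount() + 1
    # take the 'Nr' value of each dict and pivot the positions into columns
    df_obj = obj.set_index(['index', 'tag'])[Ser.name].map(lambda x: x.get('Nr')).unstack()
    df_obj.columns.name = None
    df_obj.index.name = None
    df_obj.columns = prefix + df_obj.columns.astype(str)
    df_obj.index = Ser.index
    return df_obj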
df_obj = expand_col(dfn['Goals.P'], prefix='Goals.P')
print(df_obj)
# Goals.P1 Goals.P2 Goals.P3
# 0 1.0 5.0 NaN
# 1 2.0 3.0 1.0
# 2 9.0 2.0 NaN
# 3 3.0 8.0 4.0
# 4 4.0 NaN NaN
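The explode/unstack round trip in `expand_col` can also be replaced by building the columns straight from Python lists, since `pd.DataFrame` pads shorter rows with NaN. A sketch; `expand_nr` is a hypothetical name, and it assumes every cell is already a list of `{'Nr': ...}` dicts (as after the null handling above):

```python
import pandas as pd


def expand_nr(ser, prefix=""):
    """Spread a column of [{'Nr': ...}, ...] lists into prefix1, prefix2, ... columns."""
    # one plain list of Nr values per row
    nrs = ser.map(lambda lst: [d.get("Nr") for d in lst])
    # rows of different length are padded with NaN on the right
    out = pd.DataFrame(nrs.tolist(), index=ser.index)
    out.columns = [f"{prefix}{i + 1}" for i in out.columns]
    return out


ser = pd.Series(
    [
        [{"Nr": 1}, {"Nr": 5}],
        [{"Nr": 2}, {"Nr": 3}, {"Nr": 1}],
        [{"Nr": 9}, {"Nr": 2}],
        [{"Nr": 3}, {"Nr": 8}, {"Nr": 4}],
        [{"Nr": 4}],
    ],
    name="Goals.P",
)
print(expand_nr(ser, prefix="Goals.P"))
```

Unlike `expand_col`, this does not depend on the index having a particular name, because it never calls `reset_index`.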
df_result = pd.concat([dfn, df_obj], axis=1)
print(df_result)
# Team Goals.Time Goals.P Goals.Nr Goals.Scored \
# 0 A 06:09 [{'Nr': 1}, {'Nr': 5}] 2 Game
# 1 A 11:07 [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}] 7 Game
# 2 B 20:04 [{'Nr': 9}, {'Nr': 2}] 4 Game
# 3 B 35:38 [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}] 4 Game
# 4 B 63:21 [{'Nr': 4.0}] 9 Penalty
# Goals.P.Nr Goals.P1 Goals.P2 Goals.P3
# 0 NaN 1.0 5.0 NaN
# 1 NaN 2.0 3.0 1.0
# 2 NaN 9.0 2.0 NaN
# 3 NaN 3.0 8.0 4.0
# 4 4.0 4.0 NaN NaN
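To get exactly the layout asked for in the question, the helper columns `Goals.P` and `Goals.P.Nr` can be dropped and the rest reordered. A self-contained end-to-end sketch that rebuilds the input and uses a list comprehension in place of `expand_col`; the nullable `Int64` cast is an optional assumption so the player numbers print as `1` and `<NA>` rather than `1.0` and `NaN`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Team": ["A", "A", "B", "B", "B"],
        "Goals": [
            {"Time": "06:09", "P": [{"Nr": 1}, {"Nr": 5}], "Nr": 2, "Scored": "Game"},
            {"Time": "11:07", "P": [{"Nr": 2}, {"Nr": 3}, {"Nr": 1}], "Nr": 7, "Scored": "Game"},
            {"Time": "20:04", "P": [{"Nr": 9}, {"Nr": 2}], "Nr": 4, "Scored": "Game"},
            {"Time": "35:38", "P": [{"Nr": 3}, {"Nr": 8}, {"Nr": 4}], "Nr": 4, "Scored": "Game"},
            {"Time": "63:21", "P": {"Nr": 4}, "Nr": 9, "Scored": "Penalty"},
        ],
    },
    index=[0, 0, 1, 1, 1],
)

# flatten; Team survives because it sits in the same record as Goals
dfn = pd.json_normalize(df.to_dict(orient="records"))

# player numbers per row; a bare dict (flattened into Goals.P.Nr) becomes a 1-element list
players = [
    [d["Nr"] for d in p] if isinstance(p, list) else [int(nr)]
    for p, nr in zip(dfn["Goals.P"], dfn["Goals.P.Nr"])
]
wide = pd.DataFrame(players).astype("Int64")
wide.columns = [f"Goals.P{i + 1}" for i in wide.columns]

# drop the helper columns, attach the expanded ones, and reorder
result = pd.concat([dfn.drop(columns=["Goals.P", "Goals.P.Nr"]), wide], axis=1)
result = result[["Team", "Goals.Time", "Goals.P1", "Goals.P2", "Goals.P3", "Goals.Nr", "Goals.Scored"]]
print(result)
```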