python - Reshape a pandas DataFrame with calculations linked to 2 columns
Question
From this data frame, I want to compute various statistics at the team level.
import numpy as np
import pandas as pd

data = [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig','Istanbul',2,0], ['27-10-2020', 'Istanbul','PSG',0,2], ['27-10-2020', 'Man U','Leipzig',5,0]]
df = pd.DataFrame(data, columns = ['Date', 'Home', 'Away', 'HG', 'AG'])
print(df)
         Date      Home      Away  HG  AG
0  20-10-2020       PSG     Man U   1   2
1  20-10-2020   Leipzig  Istanbul   2   0
2  27-10-2020  Istanbul       PSG   0   2
3  27-10-2020     Man U   Leipzig   5   0
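For example, for each team I compute the points and the goals scored in its previous match. A naive implementation creates two data frames, one for the home team and one for the away team, and concatenates them. I tried using melt but could not find the syntax that produces the data frame I want.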
df_home = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "team",
    "Away": "opponent",
    'HG': 'team_goals',
    'AG': 'opponent_goals',
}
df_home = df_home.rename(columns=columns)
df_home['site'] = 'H'
df_away = df.reset_index(level=0)
columns = {
    "Date": 'date',
    "Home": "opponent",
    "Away": "team",
    'HG': 'opponent_goals',
    'AG': 'team_goals',
}
df_away = df_away.rename(columns=columns)
df_away['site'] = 'A'
df_team = pd.concat([df_home, df_away], ignore_index=True).sort_values(['date'])
df_team['team'] = df_team['team'].astype('category')
df_team['opponent'] = df_team['opponent'].astype('category')
print(df_team)
   index        date      team  opponent  team_goals  opponent_goals site
0      0  20-10-2020       PSG     Man U           1               2    H
1      1  20-10-2020   Leipzig  Istanbul           2               0    H
4      0  20-10-2020     Man U       PSG           2               1    A
5      1  20-10-2020  Istanbul   Leipzig           0               2    A
2      2  27-10-2020  Istanbul       PSG           0               2    H
3      3  27-10-2020     Man U   Leipzig           5               0    H
6      2  27-10-2020       PSG  Istanbul           2               0    A
7      3  27-10-2020   Leipzig     Man U           0               5    A
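The two rename blocks above differ only in which side plays the role of "team", so they can be folded into one function. The following is a minimal sketch under that assumption; `team_view` and `long_df` are hypothetical names introduced here, not part of the original question. It also parses the dates, because sorting `dd-mm-yyyy` strings is lexicographic and only happens to be chronological for this sample.

```python
import pandas as pd

data = [
    ["20-10-2020", "PSG", "Man U", 1, 2],
    ["20-10-2020", "Leipzig", "Istanbul", 2, 0],
    ["27-10-2020", "Istanbul", "PSG", 0, 2],
    ["27-10-2020", "Man U", "Leipzig", 5, 0],
]
df = pd.DataFrame(data, columns=["Date", "Home", "Away", "HG", "AG"])


def team_view(matches, site):
    """One row per match, seen from the home ("H") or away ("A") team.

    Hypothetical helper for illustration only.
    """
    team, opp = ("Home", "Away") if site == "H" else ("Away", "Home")
    team_g, opp_g = ("HG", "AG") if site == "H" else ("AG", "HG")
    view = matches.rename(columns={
        "Date": "date",
        team: "team",
        opp: "opponent",
        team_g: "team_goals",
        opp_g: "opponent_goals",
    })
    view = view[["date", "team", "opponent", "team_goals", "opponent_goals"]]
    # Keep the original row label as an explicit "index" column (the match id).
    return view.assign(site=site).rename_axis("index").reset_index()


long_df = pd.concat([team_view(df, "H"), team_view(df, "A")], ignore_index=True)
# Real datetimes make the sort chronological rather than alphabetical.
long_df["date"] = pd.to_datetime(long_df["date"], format="%d-%m-%Y")
long_df = long_df.sort_values(["date", "index"])
print(long_df)
```

The result has the same columns as `df_team` above, minus the categorical conversion.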
With this data frame, I can compute statistics grouped by the team column:
conditions = [df_team['team_goals'] > df_team['opponent_goals'], df_team['team_goals'] == df_team['opponent_goals']]
choices = [3, 1]
df_team['pts'] = np.select(conditions, choices, default=0)
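# Previous match only: shift by one row within each team, then sum over a window of 1.
f = lambda x: x.shift(1).rolling(1).sum()
# transform keeps the original row index, so the result aligns with df_team.
df_team['form_l1_before'] = df_team.groupby('team', observed=True)['pts'].transform(f)
df_team['goal_l1_before'] = df_team.groupby('team', observed=True)['team_goals'].transform(f)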
print(df_team)
   index        date      team  opponent  team_goals  opponent_goals site  \
0      0  20-10-2020       PSG     Man U           1               2    H
1      1  20-10-2020   Leipzig  Istanbul           2               0    H
4      0  20-10-2020     Man U       PSG           2               1    A
5      1  20-10-2020  Istanbul   Leipzig           0               2    A
2      2  27-10-2020  Istanbul       PSG           0               2    H
3      3  27-10-2020     Man U   Leipzig           5               0    H
6      2  27-10-2020       PSG  Istanbul           2               0    A
7      3  27-10-2020   Leipzig     Man U           0               5    A

   pts  form_l1_before  goal_l1_before
0    0             NaN             NaN
1    3             NaN             NaN
4    3             NaN             NaN
5    0             NaN             NaN
2    0             0.0             0.0
3    3             3.0             2.0
6    3             0.0             1.0
7    0             3.0             2.0
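The shift-then-roll pattern extends naturally from "previous match" to "previous n matches". Here is a minimal sketch on a made-up points table; `games` and `form_before` are hypothetical names, and the rows are assumed to be in chronological order already. It uses `transform`, which returns a result on the original row index; depending on the pandas version, `groupby(...).apply` may instead return a result indexed by group key that does not align on assignment.

```python
import pandas as pd

# Made-up long-format data: one row per team per match, in chronological order.
games = pd.DataFrame({
    "team": ["PSG", "Man U", "PSG", "Man U", "PSG", "PSG"],
    "pts":  [0, 3, 3, 1, 1, 3],
})


def form_before(points, n):
    """Sum of the previous n results, excluding the current match.

    Hypothetical helper for illustration only.
    """
    # shift(1) drops the current match; min_periods=1 accepts shorter histories.
    return points.shift(1).rolling(n, min_periods=1).sum()


by_team = games.groupby("team")["pts"]
games["form_l1_before"] = by_team.transform(lambda s: form_before(s, 1))
games["form_l2_before"] = by_team.transform(lambda s: form_before(s, 2))
print(games)
```

A team's first match has no history, so both columns are NaN there. With `min_periods=1`, a team's second match reports the sum of the single earlier result even for `n=2`; drop that argument to require a full window.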
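The problem is that I want to convert this data frame back to one row per match (identified by the index column), with each statistic in its own column.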
# Ex second game for Istanbul and PSG with stats from the previous game
expected_data = [['27-10-2020', 'Istanbul','PSG',0,2,0,0,0,1]]
df_target = pd.DataFrame(expected_data, columns = ['date', 'Home', 'Away', 'HG', 'AG', 'Home_form_l1_before', 'Home_goal_l1_before', 'Away_form_l1_before', 'Away_goal_l1_before'])
print(df_target)
         date      Home Away  HG  AG  Home_form_l1_before  \
0  27-10-2020  Istanbul  PSG   0   2                    0

   Home_goal_l1_before  Away_form_l1_before  Away_goal_l1_before
0                    0                    0                    1
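Solution
Here is one way. We can reshape df_team using the site flag, then take the H (home) view for all information except the fields that are needed for both the home and the away side (ha_fields). Those fields are kept for both sites and joined onto the home data.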
ha_fields = ["form_l1_before", "goal_l1_before"]
unstacked_team = df_team.set_index(["index", "site", "date"]).unstack("site")
ha_df = unstacked_team[ha_fields]
ha_df.columns = ha_df.columns.to_flat_index().map(lambda t: "_".join([t[1], t[0]]))
df_final = (
unstacked_team.swaplevel(axis=1)["H"]
.drop(ha_fields, axis=1)
.join(ha_df)
.reset_index("date")
)
print(df_final)
             date      team  opponent  team_goals  opponent_goals  pts  \
index
0      20-10-2020       PSG     Man U           1               2    0
1      20-10-2020   Leipzig  Istanbul           2               0    3
2      27-10-2020  Istanbul       PSG           0               2    0
3      27-10-2020     Man U   Leipzig           5               0    3

       A_form_l1_before  H_form_l1_before  A_goal_l1_before  H_goal_l1_before
index
0                   NaN               NaN               NaN               NaN
1                   NaN               NaN               NaN               NaN
2                   0.0               0.0               1.0               0.0
3                   3.0               3.0               2.0               2.0
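The output above uses team/opponent and the A_/H_ prefixes, whereas the question asked for the original Home/Away columns with Home_/Away_ prefixes. One possible variation, sketched here as an assumption rather than as part of the answer, pivots only the statistics and joins them back onto the original match table. The `stats` frame below copies the values from the df_team output in the question; `wide`, `prefix` and `result` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Original match table, with the match id as an explicit "index" column.
df = pd.DataFrame(
    [["20-10-2020", "PSG", "Man U", 1, 2],
     ["20-10-2020", "Leipzig", "Istanbul", 2, 0],
     ["27-10-2020", "Istanbul", "PSG", 0, 2],
     ["27-10-2020", "Man U", "Leipzig", 5, 0]],
    columns=["Date", "Home", "Away", "HG", "AG"],
).rename_axis("index").reset_index()

# Per-team statistics in long format (values taken from the df_team output).
stats = pd.DataFrame({
    "index": [0, 1, 0, 1, 2, 3, 2, 3],
    "site": list("HHAAHHAA"),
    "form_l1_before": [np.nan] * 4 + [0.0, 3.0, 0.0, 3.0],
    "goal_l1_before": [np.nan] * 4 + [0.0, 2.0, 1.0, 2.0],
})

# One row per match; columns become (field, site) pairs.
wide = stats.pivot(index="index", columns="site")
# Flatten ("form_l1_before", "H") into "Home_form_l1_before", and so on.
prefix = {"H": "Home", "A": "Away"}
wide.columns = [f"{prefix[site]}_{field}" for field, site in wide.columns]

# Attach the statistics to the original rows by match id.
result = df.join(wide, on="index")
print(result)
```

For the match with index 2 (Istanbul vs PSG) this gives 0, 0, 0 and 1 for the four statistics, which agrees with the expected_data row in the question.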