python - 熊猫内部连接两个数据框并聚合列值
问题描述
我有两个df
s,
df_1
txn creator code y_m count
WP BATCH 16 201908 17
WP BATCH 16 201909 32
FB ID2 06 201905 65
FB ID2 13 201906 77
BA TO 08 201904 99
BA TO 08 201905 76
df_2
txn user code y_m count
WP BATCH 16 201908 10
WP BATCH 16 201909 13
FB ID2 06 201905 23
FB ID2 13 201906 34
HF HUD 01 201904 9
HF HUD 01 201903 8
我想内联df_1
和df_2
,
df_1.merge(df_2, how='inner', left_on=['txn', 'creator', 'code', 'y_m'], right_on=['txn', 'user', 'code', 'y_m'])
count
和上的聚合(总和)df_1
,df_2
同时两个数据帧的特定行也保留在结果中df
;
df
txn creator code y_m count user
WP BATCH 16 201908 27 BATCH
WP BATCH 16 201909 45 BATCH
FB ID2 06 201905 88 ID2
FB ID2 13 201906 111 ID2
BA TO 08 201904 99 NaN
BA TO 08 201905 76 NaN
HF NaN 01 201904 9 HUD
HF NaN 01 201903 8 HUD
解决方案
我认为您需要外部连接,然后DataFrame.pop
用于提取列Series.add
:
df = df_1.merge(df_2, how='outer',
left_on=['txn', 'creator', 'code', 'y_m'],
right_on=['txn', 'user', 'code', 'y_m'])
df['count'] = df.pop('count_x').add(df.pop('count_y'), fill_value=0)
print (df)
txn creator code y_m user count
0 WP BATCH 16 201908 BATCH 27.0
1 WP BATCH 16 201909 BATCH 45.0
2 FB ID2 6 201905 ID2 88.0
3 FB ID2 13 201906 ID2 111.0
4 BA TO 8 201904 NaN 99.0
5 BA TO 8 201905 NaN 76.0
6 HF NaN 1 201904 HUD 9.0
7 HF NaN 1 201903 HUD 8.0
推荐阅读
- ios - iOS 15 破坏了 NSPersistentCloudKitContainer 同步
- spring - How to properly create a new child in oneToMany relationship to avoid managed flush and detached entity errors? [Spring+PostgreSQL]
- node.js - Heroku deployment Parse Error: Expected another key value pait at line 22, colum1
- sql - 正则表达式在 presto SQL 中工作,但在 pyspark expr 中不工作
- sql-server - SQL Server : fix Error 18456, Severity: 14, State: 1. Login failed for user
- javascript - 为什么将项目添加到空白数组中会重复显示项目?
- r - 在r闪亮中访问多个pickerInput值中的值
- html - 电子邮件 HTML 行之间的空间
- intellij-idea - 更改 IntelliJ 边框颜色
- reactjs - 如何在 react-codemirror 编辑器中为 nextjs 中的 javascript 语言启用语法验证