首页 > 解决方案 > 熊猫内部连接两个数据框并聚合列值

问题描述

我有两个dfs,

df_1
txn    creator    code    y_m        count
WP     BATCH      16      201908     17
WP     BATCH      16      201909     32
FB     ID2        06      201905     65
FB     ID2        13      201906     77
BA     TO         08      201904     99
BA     TO         08      201905     76

df_2
txn    user       code    y_m        count
WP     BATCH      16      201908     10
WP     BATCH      16      201909     13
FB     ID2        06      201905     23
FB     ID2        13      201906     34
HF     HUD        01      201904     9
HF     HUD        01      201903     8

我想内联df_1df_2

df_1.merge(df_2, how='inner', left_on=['txn', 'creator', 'code', 'y_m'], right_on=['txn', 'user', 'code', 'y_m'])

count和上的聚合(总和)df_1df_2同时两个数据帧的特定行也保留在结果中df

df
txn    creator    code    y_m        count    user
WP     BATCH      16      201908     27       BATCH
WP     BATCH      16      201909     45       BATCH
FB     ID2        06      201905     88       ID2
FB     ID2        13      201906     111      ID2
BA     TO         08      201904     99       NaN
BA     TO         08      201905     76       NaN
HF     NaN        01      201904     9        HUD 
HF     NaN        01      201903     8        HUD

标签: pythonpython-3.xpandasdataframemerge

解决方案


我认为您需要外部连接,然后DataFrame.pop用于提取列Series.add

df = df_1.merge(df_2, how='outer', 
                      left_on=['txn', 'creator', 'code', 'y_m'], 
                      right_on=['txn', 'user', 'code', 'y_m'])
df['count'] = df.pop('count_x').add(df.pop('count_y'), fill_value=0)
print (df)
  txn creator  code     y_m   user  count
0  WP   BATCH    16  201908  BATCH   27.0
1  WP   BATCH    16  201909  BATCH   45.0
2  FB     ID2     6  201905    ID2   88.0
3  FB     ID2    13  201906    ID2  111.0
4  BA      TO     8  201904    NaN   99.0
5  BA      TO     8  201905    NaN   76.0
6  HF     NaN     1  201904    HUD    9.0
7  HF     NaN     1  201903    HUD    8.0

推荐阅读