Set a DataFrame column's values based on several conditions from another DataFrame

Problem description

I want to set the values of a column in one DataFrame based on values from another DataFrame.

Example:

df1

A   | B    | C
100 | 20.1 |
100 | 21.3 |
100 | 22.0 |
100 | 23.6 |
100 | 24.0 |
100 | 25.8 |

df2

A   | B  | D
100 | 20 | AC1
100 | 22 | AC2
100 | 23 | AC3
100 | 25 | AC4
100 | 29 | AC5
200 | 20 | AC1
200 | 34 | AC2
200 | 37 | AC3

I want df1['C'] to end up as:

AC1
AC1
AC2
AC3
AC3
AC4

i.e., something along the lines of:

df1['C'] = df2['D'].where((df2['A'] == df1['A']) & (df2['B'] < df1['B']))
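The line above is pseudocode rather than working pandas: comparisons between two Series align on their index labels, and since df1 has 6 rows and df2 has 8, pandas raises a ValueError instead of looking up matching rows across the two frames. A minimal check, rebuilding the question's data (df1 without its empty C column):

```python
import pandas as pd

# Sample data from the question (df1 without the empty C column)
df1 = pd.DataFrame({"A": [100] * 6, "B": [20.1, 21.3, 22.0, 23.6, 24.0, 25.8]})
df2 = pd.DataFrame({
    "A": [100, 100, 100, 100, 100, 200, 200, 200],
    "B": [20, 22, 23, 25, 29, 20, 34, 37],
    "D": ["AC1", "AC2", "AC3", "AC4", "AC5", "AC1", "AC2", "AC3"],
})

# Series comparisons align on index labels: df1 has labels 0..5 and df2 has 0..7,
# so pandas refuses the comparison instead of matching rows between the frames.
comparison_failed = False
try:
    _ = df2["A"] == df1["A"]
except ValueError as exc:
    comparison_failed = True
    print("ValueError:", exc)
```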

Tags: python, pandas, dataframe

Solution


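You can use pd.merge and then fill the missing values with ffill. This relies on exact (A, B) matches: the output below was produced with integer B values (20 to 25) in df1 rather than the fractional values from the question.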

import pandas as pd

df1['C'] = pd.merge(df1, df2, how='left', on=['A', 'B']).ffill()['D']

Output

+---+-----+----+-----+
|   |  A  | B  |  C  |
+---+-----+----+-----+
| 0 | 100 | 20 | AC1 |
| 1 | 100 | 21 | AC1 |
| 2 | 100 | 22 | AC2 |
| 3 | 100 | 23 | AC3 |
| 4 | 100 | 24 | AC3 |
| 5 | 100 | 25 | AC4 |
+---+-----+----+-----+
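A self-contained sketch that reproduces this output. It assumes, as the table does, that df1's B values are the integers 20 to 25 (not the fractional values from the question), and it forward-fills only column D, which yields the same C as forward-filling the whole merged frame:

```python
import pandas as pd

# Assumption: df1's B values are the integers 20..25 shown in the output table,
# so that some rows match df2 exactly on (A, B).
df1 = pd.DataFrame({"A": [100] * 6, "B": [20, 21, 22, 23, 24, 25]})
df2 = pd.DataFrame({
    "A": [100, 100, 100, 100, 100, 200, 200, 200],
    "B": [20, 22, 23, 25, 29, 20, 34, 37],
    "D": ["AC1", "AC2", "AC3", "AC4", "AC5", "AC1", "AC2", "AC3"],
})

# Left join keeps every df1 row in its original order;
# rows with no exact (A, B) match in df2 get NaN in D.
merged = pd.merge(df1, df2, how="left", on=["A", "B"])

# Forward-fill D: each NaN takes the label from the row above it.
df1["C"] = merged["D"].ffill()
print(df1)
```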

Edit: explanation

First, we merge df1 and df2 on columns A and B:

pd.merge(df1, df2, how='left', on = ['A', 'B'])
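#output : rows without an exact (A, B) match have NaN in D
+---+-----+----+------+-----+
|   |  A  | B  |  C   |  D  |
+---+-----+----+------+-----+
| 0 | 100 | 20 | None | AC1 |
| 1 | 100 | 21 | None | NaN |
| 2 | 100 | 22 | None | AC2 |
| 3 | 100 | 23 | None | AC3 |
| 4 | 100 | 24 | None | NaN |
| 5 | 100 | 25 | None | AC4 |
+---+-----+----+------+-----+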

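To fill the missing values, we use forward fill (ffill); see the fillna documentation:

method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use NEXT valid observation to fill gap.

Since pandas 2.1 the method argument of fillna is deprecated; DataFrame.ffill() is the equivalent call.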
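To make the difference between the two fill directions concrete, here is a small sketch on a Series shaped like column D after the merge (the values are illustrative, mirroring the merged table):

```python
import numpy as np
import pandas as pd

# A Series with gaps, like column D after the left merge
s = pd.Series(["AC1", np.nan, "AC2", "AC3", np.nan, "AC4"])

# ffill / pad: each gap takes the last valid value above it
forward = s.ffill()

# bfill / backfill: each gap takes the next valid value below it
backward = s.bfill()

print(forward.tolist())
print(backward.tolist())
```

Forward fill is the right choice here because each label in df2 applies from its B value upward until the next label starts.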

pd.merge(df1, df2, how='left', on=['A', 'B']).ffill()
#output : missing values are filled as expected 
+---+-----+----+------+-----+
|   |  A  | B  |  C   |  D  |
+---+-----+----+------+-----+
| 0 | 100 | 20 | None | AC1 |
| 1 | 100 | 21 | None | AC1 |
| 2 | 100 | 22 | None | AC2 |
| 3 | 100 | 23 | None | AC3 |
| 4 | 100 | 24 | None | AC3 |
| 5 | 100 | 25 | None | AC4 |
+---+-----+----+------+-----+

df1['C'] is then simply column D of the merged, forward-filled DataFrame, which is what we want:

df1['C'] = pd.merge(df1, df2, how='left', on=['A', 'B']).ffill()['D']
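The exact-key merge depends on df1's B values matching df2's. With the question's fractional B values (20.1, 21.3, ...), only B = 22.0 finds an exact match, and a plain ffill can also carry a label from one A group into the next. One alternative, a different technique from the answer above, is pandas.merge_asof: for each df1 row it picks the df2 row with the same A (by="A") and the largest B less than or equal to df1's B (direction="backward"). A sketch under these assumptions: merge_asof requires both inputs sorted by the on column, and df2's B is cast to float so both B columns share a dtype:

```python
import pandas as pd

# The question's data: fractional B values in df1, integer B values in df2
df1 = pd.DataFrame({"A": [100] * 6, "B": [20.1, 21.3, 22.0, 23.6, 24.0, 25.8]})
df2 = pd.DataFrame({
    "A": [100, 100, 100, 100, 100, 200, 200, 200],
    "B": [20, 22, 23, 25, 29, 20, 34, 37],
    "D": ["AC1", "AC2", "AC3", "AC4", "AC5", "AC1", "AC2", "AC3"],
})

# merge_asof needs both sides sorted by the "on" column; keep df1's original
# row labels in a column named "index" so the result can be mapped back.
left = df1.reset_index().sort_values("B")
# Cast the right-hand B to float so both "on" columns have the same dtype.
right = df2.assign(B=df2["B"].astype(float)).sort_values("B")

# Same A (by="A"), largest right B <= left B (direction="backward")
result = pd.merge_asof(left, right, on="B", by="A", direction="backward")

# Align back on the original row labels; a row with no earlier B would get NaN.
df1["C"] = result.set_index("index")["D"]
print(df1)
```

Because the lookup is done per A group, a value below every B in its group stays NaN instead of inheriting a label from another group.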
