Replace row values with values from other rows of the same df based on a condition

Problem description

I have the following dataset:

import pandas as pd

df = pd.DataFrame({'user': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2},
    'date': {0: '1995-09-01', 1: '1995-09-02', 2: '1995-10-03', 3: '1995-10-04', 4: '1995-10-05', 5: '1995-11-07', 6: '1995-11-08'},
    'x': {0: '1995-09-02', 1: '1995-09-02', 2: '1995-09-02', 3: '1995-10-05', 4: '1995-10-05', 5: '1995-10-05', 6: '1995-10-05'},
    'y': {0: '1995-10-03', 1: '1995-10-03', 2: '1995-10-03', 3: '1995-11-08', 4: '1995-11-08', 5: '1995-11-08', 6: '1995-11-08'},
    'c1': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'c2': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'c3': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'VTX1': {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0},
    'VTY1': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0}})

This gives me:

   user        date           x           y c1 c2 c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03  1  1  1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03  0  0  0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03  0  0  0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08  2  2  2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08  0  0  0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08  9  9  9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08  0  0  0     0     0

I want to replace df['c1'] as follows.

- When df['date'] == df['x'],
       replace df['c1'] with the df['c1'] value from the same user's row where df['VTX1'] == 1

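In this example, for user 1, df['date'] == df['x'] holds at index 1. There we want df['c1'] to be 1. Note that 1 is the df['c1'] value of user 1 in the row where df['VTX1'] == 1 (index 0). Likewise, for user 2 the condition holds at index 4, where df['c1'] should become 2.

So the final result would be: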

   user        date           x           y c1 c2 c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03  1  1  1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03  1  0  0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03  0  0  0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08  2  2  2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08  2  0  0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08  9  9  9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08  0  0  0     0     0

Tags: python, pandas, replace, conditional-statements, pandas-groupby

Solution


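For each unique user, select the row where the column VTX1 has the value 1. This can be done by setting the index to user and using query to select the required rows, which yields a mapping series d from user to c1. Then mask the values in c1 where date equals x, and replace the masked values using the mapping series d.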

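# Mapping series: user -> c1 value from that user's row where VTX1 == 1
d = df.set_index('user').query('VTX1 == 1')['c1']
# Where date equals x, replace c1 with the mapped value for that row's user
df['c1'] = df['c1'].mask(df['date'].eq(df['x']), df['user'].map(d))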

   user        date           x           y c1 c2 c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03  1  1  1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03  1  0  0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03  0  0  0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08  2  2  2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08  2  0  0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08  9  9  9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08  0  0  0     0     0
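
The two-line answer relies on each user having exactly one row with VTX1 == 1. The sketch below is one possible way to make the same mask/map approach tolerate users with several such rows or none. The helper name replace_c1 is hypothetical, and the choices to keep the first flagged row and to leave c1 untouched for users with no flagged row are assumptions about the intent, not something the question states. Only the columns the rule needs are reproduced.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns the rule uses.
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2, 2],
    'date': ['1995-09-01', '1995-09-02', '1995-10-03', '1995-10-04',
             '1995-10-05', '1995-11-07', '1995-11-08'],
    'x': ['1995-09-02'] * 3 + ['1995-10-05'] * 4,
    'c1': ['1', '0', '0', '2', '0', '9', '0'],
    'VTX1': [1, 0, 0, 1, 0, 0, 0],
})


def replace_c1(frame):
    """Hypothetical helper: return a copy with c1 replaced where date == x."""
    out = frame.copy()
    # One lookup value per user: c1 from the first row where VTX1 == 1.
    # drop_duplicates keeps the lookup index unique, so Series.map does not
    # fail when a user has several flagged rows (assumed: first one wins).
    lookup = (out.loc[out['VTX1'].eq(1)]
                 .drop_duplicates('user')
                 .set_index('user')['c1'])
    mapped = out['user'].map(lookup)
    # Replace only where date == x AND a lookup value exists, so users without
    # a flagged row keep their original c1 instead of receiving NaN.
    cond = out['date'].eq(out['x']) & mapped.notna()
    out['c1'] = out['c1'].mask(cond, mapped)
    return out


result = replace_c1(df)
print(result)
```

On the question's data this should produce the same c1 column as the answer above. The helper works on a copy, so the original df is left unchanged.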
