Unable to update specific rows at specific indices of a DataFrame with different values?

Problem description

I am trying to compute, within each user group, the time difference from the last occurrence of Flag == 1:

import pandas as pd

df = pd.DataFrame({'user': ['x','x','x','x','x','y','y','y','y','y'],
                   'Flag': [0,0,1,0,1,0,1,0,1,0],
                   'time': [10, 34, 40, 43, 44, 12, 20, 46, 51, 71]})

I am computing the difference from the last 1 like this:

(df.assign(mask=df['Flag'].eq(1),
           group=lambda d: d.groupby('user')['mask'].cumsum(),
           # diff from last 1
           diff=lambda d: d.groupby(['user', 'group'])['time'].apply(lambda g: g -(g.iloc[0] if g.name[1]>0 else float('nan'))),
           )
 
   # mask 1s with their own diff
   .assign(diff= lambda d: d['diff'].mask(d['mask'],
                                          (d[d['mask'].groupby(d['user']).cumsum().eq(0)|d['mask']].groupby('user')['time'].diff())
                                         )
          )
)

For some reason, I am not getting NaN values at indices 2 and 6, as I expected:

    user    Flag    time    mask    group   diff
0   x       0       10      False   0       NaN
1   x       0       34      False   0       NaN
2   x       1       40      True    1       6.0
3   x       0       43      False   1       3.0
4   x       1       44      True    2       4.0
5   y       0       12      False   0       NaN
6   y       1       20      True    1       8.0
7   y       0       46      False   1       26.0
8   y       1       51      True    2       31.0
9   y       0       71      False   2       20.0

I tried df.loc[[2,6],'diff'] = 1000000000 but got:

    user    Flag    time    diff
0   x       0       10      NaN
1   x       0       34      NaN
2   x       1       40      100000000.0
3   x       0       43      NaN
4   x       1       44      NaN
5   y       0       12      NaN
6   y       1       20      100000000.0
7   y       0       46      NaN
8   y       1       51      NaN
9   y       0       71      NaN
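
One possible reading of the output above, offered as an assumption because the question does not show how the chained result was stored: assign returns a new DataFrame, so if its result is not assigned back, df has no diff column, and df.loc[[2,6],'diff'] = ... creates one that is NaN everywhere except the rows being set. A minimal sketch of that behaviour (the name result and the simplified diff calculation are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'user': ['x','x','x','x','x','y','y','y','y','y'],
                   'Flag': [0,0,1,0,1,0,1,0,1,0],
                   'time': [10, 34, 40, 43, 44, 12, 20, 46, 51, 71]})

# assign returns a new object; df itself is left unchanged
# (simplified stand-in for the longer chain in the question)
result = df.assign(diff=df.groupby('user')['time'].diff())
had_diff_before = 'diff' in df.columns
print(had_diff_before)

# Setting a column that does not exist yet creates it,
# filled with NaN on every row that is not being set
df.loc[[2, 6], 'diff'] = 1000000000
print(df)
```

If this is what happened, storing the chained result first (for example df = df.assign(...)) and then using df.loc on it would keep the other values in the diff column.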

Tags: python-3.x, pandas, dataframe

Solution


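IIUC, the idea is this: once you replace time with NaN wherever Flag is not 1 (using where), you can groupby user and combine shift with ffill to carry the time of the last 1 down to the rows that follow it. Then subtract that from the original time column. Because of the shift, a row with Flag == 1 is compared against the previous 1 rather than against itself, so the first 1 in each user group gets NaN.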

df['diff'] = (
    df['time'] 
    - df['time'].where(df['Flag'].astype(bool))
        .groupby(df['user'])
        # transform keeps the original row index, so the subtraction aligns row by row
        .transform(lambda x: x.shift().ffill())
)
print(df)
  user  Flag  time  diff
0    x     0    10   NaN
1    x     0    34   NaN
2    x     1    40   NaN
3    x     0    43   3.0
4    x     1    44   4.0
5    y     0    12   NaN
6    y     1    20   NaN
7    y     0    46  26.0
8    y     1    51  31.0
9    y     0    71  20.0
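
To see why indices 2 and 6 come out as NaN, it may help to look at the intermediate columns one step at a time. The sketch below is not the answerer's exact code: it swaps the lambda for the built-in groupby shift and ffill methods, and the names flag_time, shifted, last_flag_time and steps are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'user': ['x','x','x','x','x','y','y','y','y','y'],
                   'Flag': [0,0,1,0,1,0,1,0,1,0],
                   'time': [10, 34, 40, 43, 44, 12, 20, 46, 51, 71]})

# Step 1: keep time only on rows where Flag == 1, NaN elsewhere
flag_time = df['time'].where(df['Flag'].eq(1))

# Step 2: shift down one row within each user, so a flagged row
# does not see its own time
shifted = flag_time.groupby(df['user']).shift()

# Step 3: forward-fill within each user, so every row carries the
# time of the most recent earlier row with Flag == 1
last_flag_time = shifted.groupby(df['user']).ffill()

# Step 4: subtract from the original time column
steps = df.assign(flag_time=flag_time,
                  shifted=shifted,
                  last_flag_time=last_flag_time,
                  diff=df['time'] - last_flag_time)
print(steps)
```

Rows 2 and 6 are the first flagged rows of their user, so after the shift there is no earlier flagged time to fill from, and their diff stays NaN.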
