Computing a cumulative sum while the value of another column stays the same

Problem description

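For the df below, I want to compute the cumulative sum of the column Inst_Dist and store it in Cumu_Dist for as long as the value of WDir_Deg stays the same. Whenever the value in WDir_Deg changes, the cumulative sum has to restart.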

So,

index | WDir_Deg | Inst_Dist | Cumu_Dist
0     | 289      | 20        | NaN
1     | 285      | 17        | NaN
2     | 285      | 19        | NaN
3     | 287      | 19        | NaN
4     | 289      | 10        | NaN

becomes

index | WDir_Deg | Inst_Dist | Cumu_Dist
0     | 289      | 20        | 20
1     | 285      | 17        | 17
2     | 285      | 19        | 36
3     | 287      | 19        | 19
4     | 289      | 10        | 10

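My non-idiomatic (and extremely slow) Python code is given below. I would be grateful if someone could show me how to make it faster and more idiomatic.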

prev_angle = -1  # sentinel; assumes -1 never occurs as a WDir_Deg value
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        # Same direction as the previous row: keep accumulating.
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
    else:
        # Direction changed: restart the running total at this row.
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist

Tags: pandas, cumulative-sum

Solution


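Build a helper Series that labels each run of consecutive equal values: compare the WDir_Deg column with its shifted copy for inequality using ne and shift, then take cumsum of the result. Pass that helper Series to groupby and use DataFrameGroupBy.cumsum: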

s = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()
df['Cumu_Dist'] = df.groupby(s)['Inst_Dist'].cumsum()
print (df)
   WDir_Deg  Inst_Dist  Cumu_Dist
0       289         20         20
1       285         17         17
2       285         19         36
3       287         19         19
4       289         10         10
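
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the two lines above in a helper; the function name cumsum_within_runs and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "WDir_Deg": [289, 285, 285, 287, 289],
    "Inst_Dist": [20, 17, 19, 19, 10],
})


def cumsum_within_runs(frame, key_col, value_col):
    """Cumulative sum of value_col that restarts whenever key_col changes.

    Hypothetical helper for illustration; not a pandas function.
    """
    # A new run starts wherever the key differs from the row above it.
    run_id = frame[key_col].ne(frame[key_col].shift()).cumsum()
    # Accumulate value_col separately inside each run.
    return frame.groupby(run_id)[value_col].cumsum()


df["Cumu_Dist"] = cumsum_within_runs(df, "WDir_Deg", "Inst_Dist")
print(df)
```

The result keeps the original index, so it can be assigned straight back as a new column.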

Details

print (s)
0    1
1    2
2    2
3    3
4    4
Name: WDir_Deg, dtype: int32
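
To see where those labels come from, the helper Series can be broken into its intermediate steps. This is only an illustrative sketch; the column names previous, changed and run_id are chosen here for readability and do not appear in the original answer.

```python
import pandas as pd

wdir = pd.Series([289, 285, 285, 287, 289], name="WDir_Deg")

steps = pd.DataFrame({
    "WDir_Deg": wdir,
    # Value from the row above; the first row has no predecessor, so it is NaN.
    "previous": wdir.shift(),
    # True wherever the value differs from the row above, i.e. a new run starts.
    "changed": wdir.ne(wdir.shift()),
})
# True counts as 1 in cumsum, so the label goes up by one at every change.
steps["run_id"] = steps["changed"].cumsum()
print(steps)
```

Note that the final 289 gets its own label instead of sharing one with the first 289. That is why grouping directly on WDir_Deg would not work here: it would merge rows 0 and 4 into one group and give 30 for the last row instead of 10.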
