首页 > 解决方案 > 具有因子重置的 cumsum 的 Dpylr 解决方案

问题描述

我需要一个创建 cumsum 列的 dpylr 解决方案。

# Input dataframe
df <- data.frame(OilChanged = c("No","No","Yes","No","No","No","No","No","No","No","No","Yes","No"),
Odometer = c(300,350,410,420,430,450,500,600,600,600,650,660,700))

# Create difference column - first row starting with zero
df <- df %>% dplyr::mutate(Odometer_delta = Odometer - lag(Odometer, default = Odometer[1]))

我正在尝试根据累积和的因子列设置重置条件。结果需要完全像这样。

# Wanted result dataframe
df <- data.frame(OilChanged = c("No","No","Yes","No","No","No","No","No","No","No","No","Yes","No"),
                   Odometer = c(300,350,410,420,430,450,500,600,600,600,650,660,700),
                   Diff = c(0,50,60,10,10,20,50,100,0,0,50,10,40),
                   CumSum = c(0,50,110,10,20,40,90,190,190,190,240,250,40))

标签: rdplyrcumsum

解决方案


您可以每次创建一个新组OilChanged == 'Yes'并在每个组cumsum中获取价值。Diff

library(dplyr)

df %>%
  group_by(grp = lag(cumsum(OilChanged == 'Yes'), default = 0)) %>%
  mutate(newcumsum = cumsum(Diff)) %>%
  ungroup %>%
  select(-grp)


#  OilChanged Odometer  Diff CumSum newcumsum
#   <chr>         <dbl> <dbl>  <dbl>     <dbl>
# 1 No              300     0      0         0
# 2 No              350    50     50        50
# 3 Yes             410    60    110       110
# 4 No              420    10     10        10
# 5 No              430    10     20        20
# 6 No              450    20     40        40
# 7 No              500    50     90        90
# 8 No              600   100    190       190
# 9 No              600     0    190       190
#10 No              600     0    190       190
#11 No              650    50    240       240
#12 Yes             660    10    250       250
#13 No              700    40     40        40

推荐阅读