case_when with a condition that sums across different rows

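Problem description

I want to define different crop sequences (CS) for different sites (SiteID), based on the number of years certain crops were grown.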

Crop = data.frame(SiteID=c('A','A','A','B','C','C','D','D'),
                Crop = c('soya','corn','wheat','corn','corn','soya','soya','wheat'),
                Years = c(2,2,1,5,3,2,2,3))

So far I have used case_when for conditions on a single Crop and its Years, but I would like to accumulate Years across different Crops, as in the last two conditions.

Crop  %>%
#  group_by(SiteID)
  mutate(CS = case_when(
             Crop =="corn" &  Years == 5    ~ "CoMo",
             Crop =="wheat" &  Years >= 3     ~ "Whea",
             (Crop =="corn" | Crop =="soya") &  sum(Years) == 5    ~ "CoSo",
#             Years[Crop =="corn"] + Years[Crop =="soya"]  == 5    ~ "CoSo",

       ))

The intermediate result should look like this:

# A tibble: 8 x 4
  SiteID Crop  Years CS   
  <chr>  <chr> <dbl> <chr>
1 A      soya      2 NA   
2 A      corn      2 NA   
3 A      wheat     1 NA   
4 B      corn      5 CoMo 
5 C      corn      3 CoSo   
6 C      soya      2 CoSo   
7 D      soya      2 Whea   
8 D      wheat     3 Whea 

Finally, CS would be summarized by SiteID:

# A tibble: 4 x 2
  SiteID CS   
  <chr>  <chr>
1 A      NA   
2 B      CoMo 
3 C      CoSo 
4 D      Whea 

Thanks!

Tags: r, dplyr

Solution


Here is an attempt, with explanations in the comments:

library(dplyr)

Crop = data.frame(SiteID=c('A','A','A','B','C','C','D','D'),
  Crop = c('soya','corn','wheat','corn','corn','soya','soya','wheat'),
  Years = c(2,2,1,5,3,2,2,3))

Site_crop <- Crop  %>%
  group_by(SiteID) %>%
  # case_when() evaluates its conditions in order and returns the value of
  # the first one that matches. Check whether your conditions are mutually
  # exclusive; if they can overlap, list them in the priority you want.
  mutate(CS = case_when(
    # any() makes the condition apply to every row of a SiteID
    any(Crop =="corn" &  Years == 5)    ~ "CoMo",
    any(Crop =="wheat" &  Years >= 3)     ~ "Whea",
    # length(intersect(...)) == 2 ensures that the site grows
    # both "corn" and "soya"
    length(intersect(unique(Crop), c("corn", "soya"))) == 2 &
      sum(Years[Crop %in% c("corn", "soya")]) == 5 ~ "CoSo",
    # Finally, if no condition matches, the result is NA
    TRUE ~ NA_character_
  ))

Here is the data after case_when:

Site_crop
#> # A tibble: 8 x 4
#> # Groups:   SiteID [4]
#>   SiteID Crop  Years CS   
#>   <chr>  <chr> <dbl> <chr>
#> 1 A      soya      2 <NA> 
#> 2 A      corn      2 <NA> 
#> 3 A      wheat     1 <NA> 
#> 4 B      corn      5 CoMo 
#> 5 C      corn      3 CoSo 
#> 6 C      soya      2 CoSo 
#> 7 D      soya      2 Whea 
#> 8 D      wheat     3 Whea
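
The note about condition order matters when a site satisfies more than one rule. The sketch below is an illustration only: site "E" and its values are hypothetical and not part of the question's data. It shows that the first matching condition wins, so a site that qualifies for both "Whea" and "CoSo" is labelled "Whea" when that condition is listed first.

```r
library(dplyr)

# Hypothetical site "E" (not in the original data): it has wheat for >= 3
# years AND corn + soya summing to 5 years, so two conditions overlap.
overlap <- data.frame(
  SiteID = c("E", "E", "E"),
  Crop   = c("corn", "soya", "wheat"),
  Years  = c(3, 2, 4)
)

overlap_cs <- overlap %>%
  group_by(SiteID) %>%
  mutate(CS = case_when(
    # Listed first, so it takes priority over the "CoSo" rule below
    any(Crop == "wheat" & Years >= 3) ~ "Whea",
    all(c("corn", "soya") %in% Crop) &
      sum(Years[Crop %in% c("corn", "soya")]) == 5 ~ "CoSo",
    TRUE ~ NA_character_
  )) %>%
  ungroup()

overlap_cs
```

Swapping the two conditions would label the same site "CoSo" instead, so the order should reflect whichever sequence you consider more important.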

The final CS output for each SiteID:

Site_crop %>%
  group_by(SiteID) %>%
  summarize(CS = first(CS))
#> # A tibble: 4 x 2
#>   SiteID CS   
#>   <chr>  <chr>
#> 1 A      <NA> 
#> 2 B      CoMo 
#> 3 C      CoSo 
#> 4 D      Whea
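
If the per-row column is not needed, the two steps above could possibly be combined. The following is a sketch, not part of the original answer: because every condition already reduces to a single value per group, case_when can be called directly inside summarise. The name `Site_cs` is made up for this example, and `all(c("corn", "soya") %in% Crop)` is used as an equivalent way to check that both crops are present.

```r
library(dplyr)

Crop <- data.frame(
  SiteID = c('A','A','A','B','C','C','D','D'),
  Crop   = c('soya','corn','wheat','corn','corn','soya','soya','wheat'),
  Years  = c(2,2,1,5,3,2,2,3)
)

# One row per SiteID: each condition is a single TRUE/FALSE per group,
# so case_when returns a single value per group.
Site_cs <- Crop %>%
  group_by(SiteID) %>%
  summarise(
    CS = case_when(
      any(Crop == "corn" & Years == 5) ~ "CoMo",
      any(Crop == "wheat" & Years >= 3) ~ "Whea",
      all(c("corn", "soya") %in% Crop) &
        sum(Years[Crop %in% c("corn", "soya")]) == 5 ~ "CoSo",
      TRUE ~ NA_character_
    ),
    .groups = "drop"
  )

Site_cs
```

This avoids relying on first(CS), at the cost of losing the intermediate table that shows the label on every row.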

Created on 2021-04-16 by the reprex package (v2.0.0)

