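r - case_when with conditions that sum across different rows
Problem description
I want to define different crop sequences (CS) at different sites (SiteID), based on the number of years certain crops were grown.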
Crop = data.frame(SiteID=c('A','A','A','B','C','C','D','D'),
Crop = c('soya','corn','wheat','corn','corn','soya','soya','wheat'),
Years = c(2,2,1,5,3,2,2,3))
So far I have used case_when for conditions on a single Crop and its Years, but I would like to accumulate Years across different Crops, as in the last two conditions below.
Crop %>%
# group_by(SiteID)
mutate(CS = case_when(
Crop =="corn" & Years == 5 ~ "CoMo",
Crop =="wheat" & Years >= 3 ~ "Whea",
(Crop =="corn" | Crop =="soya") & sum(Years) == 5 ~ "CoSo",
# Years[Crop =="corn"] + Years[Crop =="soya"] == 5 ~ "CoSo",
))
The intermediate result would look like this:
# A tibble: 8 x 4
SiteID Crop Years CS
<chr> <chr> <dbl> <chr>
1 A soya 2 NA
2 A corn 2 NA
3 A wheat 1 NA
4 B corn 5 CoMo
5 C corn 3 CoSo
6 C soya 2 CoSo
7 D soya 2 Whea
8 D wheat 3 Whea
Finally, CS would be summarized by SiteID:
# A tibble: 4 x 2
SiteID CS
<chr> <chr>
1 A NA
2 B CoMo
3 C CoSo
4 D Whea
Thanks!
Solution
Here is an attempt, with explanations in the comments:
library(dplyr)
Crop = data.frame(SiteID=c('A','A','A','B','C','C','D','D'),
Crop = c('soya','corn','wheat','corn','corn','soya','soya','wheat'),
Years = c(2,2,1,5,3,2,2,3))
Site_crop <- Crop %>%
group_by(SiteID) %>%
# Note that case_when matches in priority order: the first condition that is
# TRUE determines the value. So check whether your conditions are mutually
# exclusive; if they can overlap, list them in the order of priority you want.
mutate(CS = case_when(
# any() makes the condition cover all records of a SiteID
any(Crop =="corn" & Years == 5) ~ "CoMo",
any(Crop =="wheat" & Years >= 3) ~ "Whea",
# length(intersect(...)) == 2 ensures that the crops of the site
# include both "corn" and "soya"
length(intersect(unique(Crop), c("corn", "soya"))) == 2 &
sum(Years[Crop %in% c("corn", "soya")]) == 5 ~ "CoSo",
# Finally, if no condition matches, the result is NA
TRUE ~ NA_character_
))
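The comment about priority order can be illustrated with a small sketch. The site "E" and the names `overlap` and `overlap_cs` below are hypothetical and not part of the question's data; the site is built so that both the "CoMo" and the "Whea" rule are true for it. Assuming the same rules as above, the rule listed first should decide the label.

```r
library(dplyr)

# Hypothetical site "E" (not in the original data) that satisfies
# both the "CoMo" rule and the "Whea" rule at the same time
overlap <- data.frame(
  SiteID = c("E", "E"),
  Crop   = c("corn", "wheat"),
  Years  = c(5, 3)
)

overlap_cs <- overlap %>%
  group_by(SiteID) %>%
  mutate(CS = case_when(
    any(Crop == "corn" & Years == 5)  ~ "CoMo",  # listed first, so it takes priority
    any(Crop == "wheat" & Years >= 3) ~ "Whea",  # also TRUE, but never reached
    TRUE ~ NA_character_
  )) %>%
  ungroup()

overlap_cs
```

If "Whea" should win for such a site, swapping the first two conditions would be one way to express that.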
Here is the data after the case_when step:
Site_crop
#> # A tibble: 8 x 4
#> # Groups: SiteID [4]
#> SiteID Crop Years CS
#> <chr> <chr> <dbl> <chr>
#> 1 A soya 2 <NA>
#> 2 A corn 2 <NA>
#> 3 A wheat 1 <NA>
#> 4 B corn 5 CoMo
#> 5 C corn 3 CoSo
#> 6 C soya 2 CoSo
#> 7 D soya 2 Whea
#> 8 D wheat 3 Whea
Final CS output for each SiteID:
Site_crop %>%
group_by(SiteID) %>%
summarize(CS = first(CS))
#> # A tibble: 4 x 2
#> SiteID CS
#> <chr> <chr>
#> 1 A <NA>
#> 2 B CoMo
#> 3 C CoSo
#> 4 D Whea
Created on 2021-04-16 by the reprex package (v2.0.0)
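Since every condition in the answer reduces a whole SiteID group to a single TRUE or FALSE, the two steps (mutate, then summarize with first()) could probably be collapsed into one summarize call. The following is a minimal sketch of that variant, not the answer's original code: the name `Site_CS` is introduced here for illustration, and `all(c("corn", "soya") %in% Crop)` is swapped in as an equivalent way of checking that both crops are present.

```r
library(dplyr)

Crop <- data.frame(
  SiteID = c("A", "A", "A", "B", "C", "C", "D", "D"),
  Crop   = c("soya", "corn", "wheat", "corn", "corn", "soya", "soya", "wheat"),
  Years  = c(2, 2, 1, 5, 3, 2, 2, 3)
)

# One row per SiteID: each condition is a single logical value per group,
# so case_when returns a single label per group
Site_CS <- Crop %>%
  group_by(SiteID) %>%
  summarize(
    CS = case_when(
      any(Crop == "corn" & Years == 5)  ~ "CoMo",
      any(Crop == "wheat" & Years >= 3) ~ "Whea",
      # both crops present, and their years add up to 5
      all(c("corn", "soya") %in% Crop) &
        sum(Years[Crop %in% c("corn", "soya")]) == 5 ~ "CoSo",
      TRUE ~ NA_character_
    )
  )

Site_CS
```

The mutate version from the answer is still the better choice when the per-row table with the CS column is needed as an intermediate result.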