r - 合并满足条件的唯一 group_by 行并使用 tidyverse 进行汇总
问题描述
我想使用 tidyverse 工具来执行以下操作:
对于下面的数据框,对于(“<em>yr”、“<em>bin”、“<em>range”)的每个唯一行,我想合并“<em>class”具有值的行“ B182E”或“B182G”通过以下操作合并为一行:保持“ yr ”、“ bin ”、“ range ”和“ miles ”的值不变,否则创建一个新变量“ class2 ”,其值变为“B182E_B182G”它保留了 "类" 如果不是“B182E”或“B182G”,而对于变量“<em>new_count”和“<em>old_count”,通过对各自行求和来更新它们的值,例如“<em>new_count”变成“<em第 5 行中的 >new_count” 加上第 6 行中的“<em>new_count” 即 20 + 12 = 32,而“<em>old_count” 成为第 5 行中的“<em>old_count” 加上第 5 行中的“<em>old_count” 的总和第 6 行,即 30 + 14 = 44。
同样,将“<em>class”的值为“K760”或“K758”的行合并为一行,并以与上述类似的方式将新创建的“<em>class2”的值更新为“K760_K758” .
所有与上面要合并的“<em>class”值不匹配的行都保持不变。
以第 5 行和第 6 行为例,它们分别代表“<em>yr”、“<em>bin”和“<em>range”的唯一行,即 2005、2、40-49,所以这两行是按照上述条件合并为一行。
我怎样才能做到这一点?
df <- structure(list(yr = c(2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2006), bin = c("2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
"2", "2", "2", "2", "2"), range = c("40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "40-49", "40-49", "40-49",
"40-49", "40-49", "40-49", "40-49", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59",
"50-59", "50-59", "50-59", "50-59", "50-59", "50-59", "50-59"
), miles = c(528496, 528496, 528496, 528496, 528496, 528496,
528496, 528496, 528496, 528496, 528496, 528496, 528496, 528496,
528496, 528496, 528496, 604761, 604761, 604761, 604761, 604761,
604761, 604761, 604761, 604761, 604761, 604761, 604761, 604761,
604761, 604761, 604761, 604761, 547232, 547232, 547232, 547232,
547232, 547232, 547232, 547232, 547232, 547232, 547232, 547232,
547232, 547232, 547232, 547232, 547232, 613025, 613025, 613025,
613025, 613025, 613025, 613025, 613025, 613025, 613025, 613025,
613025, 613025, 613025, 613025, 613025, 613025, 557216.5, 557216.5,
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5,
557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5, 557216.5,
557216.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5,
604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5, 604720.5,
604720.5, 604720.5, 604720.5, 604720.5, 537808.5, 537808.5, 537808.5,
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5,
537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5, 537808.5,
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5,
597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5, 597005.5,
597005.5, 597005.5, 597005.5), class = c("B181", "B181E", "B181G",
"B182", "B182E", "B182G", "C220", "C221", "C240", "K70", "K703",
"K743", "K746", "K754", "K758", "K760", "K769", "B181", "B181E",
"B181G", "B182", "B182E", "B182G", "C220", "C221", "C240", "K70",
"K703", "K743", "K746", "K754", "K758", "K760", "K769", "B181",
"B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221", "C240",
"K70", "K703", "K743", "K746", "K754", "K758", "K760", "K769",
"B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220", "C221",
"C240", "K70", "K703", "K743", "K746", "K754", "K758", "K760",
"K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G", "C220",
"C221", "C240", "K70", "K703", "K743", "K746", "K754", "K758",
"K760", "K769", "B181", "B181E", "B181G", "B182", "B182E", "B182G",
"C220", "C221", "C240", "K70", "K703", "K743", "K746", "K754",
"K758", "K760", "K769", "B181", "B181E", "B181G", "B182", "B182E",
"B182G", "C220", "C221", "C240", "K70", "K703", "K743", "K746",
"K754", "K758", "K760", "K769", "B181", "B181E", "B181G", "B182",
"B182E", "B182G", "C220", "C221", "C240", "K70", "K703", "K743",
"K746", "K754", "K758", "K760", "K769"), new_count = c(84, 24,
16, 226, 20, 12, NA, NA, 5, 60, 24, 24, 27, 16, NA, 22, 78, 58,
1, 4, 168, 5, 22, 20, 18, 15, 122, 72, 37, 75, 26, 0, 53, 149,
117, 12, 45, 567, NA, 29, 12, 7, NA, 149, 77, NA, 54, NA, NA,
44, 99, 124, 0, 8, 453, NA, 55, 54, 9, 10, 307, 167, 5, 120,
9, NA, 59, 115, 137, NA, 7, 481, NA, 29, 13, 6, NA, 134, 65,
NA, 62, 7, NA, 57, 93, 135, 23, 7, 444, 5, 79, 61, 18, 13, 323,
190, 5, 135, 12, 0, 57, 117, 71, 0, NA, 204, NA, 7, 6, NA, NA,
52, 22, 13, 24, 17, NA, 23, 86, 47, NA, NA, 172, NA, 24, 16,
9, 15, 128, 87, 33, 77, 31, NA, 65, 158), old_count = c(211,
52, 37, 1095, 30, 14, 8, 6, 8, 111, 52, 82, 53, 37, 8, 27, 102,
120, 3, 4, 865, 5, 32, 30, 23, 16, 282, 168, 230, 149, 50, NA,
71, 214, 349, 54, 67, 2102, NA, 39, 15, 9, 8, 263, 147, 6, 102,
11, NA, 65, 123, 293, NA, 9, 1785, 1, 73, 70, 15, 15, 621, 369,
27, 222, 20, NA, 81, 155, 407, NA, 9, 2065, 5, 46, 19, 8, NA,
263, 143, 7, 113, 17, NA, 76, 126, 335, NA, 11, 2067, 5, 110,
89, 21, 19, 632, 383, 24, 225, 21, NA, 79, 151, 234, 0, 0, 1061,
NA, 9, 11, 5, 5, 106, 52, 85, 52, 41, 6, 31, 118, 133, NA, NA,
957, NA, 40, 29, 14, 20, 282, 180, 224, 137, 59, 6, 81, 212)), row.names = c(NA,
-136L), class = "data.frame")
解决方案
您本质上希望通过yr
, bin
, range
,使您的行独一无二class2
。先创建class2
,然后您可以轻松地进行分组和总结:
df %>%
mutate(class2 = case_when(
class %in% c("B181E", "B181G") ~ "B181E_B181G",
class %in% c("K760", "K758") ~ "K760_K758",
TRUE ~ class)
) %>%
group_by(yr, bin, range, class2) %>%
summarise(
new_count = sum(new_count, na.rm = TRUE),
old_count = sum(old_count, na.rm = TRUE)
)
#> # A tibble: 120 x 6
#> # Groups: yr, bin, range [8]
#> yr bin range class2 new_count old_count
#> <dbl> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2005 1 40-49 B181 117 349
#> 2 2005 1 40-49 B181E_B181G 57 121
#> 3 2005 1 40-49 B182 567 2102
#> 4 2005 1 40-49 B182E 0 0
#> 5 2005 1 40-49 B182G 29 39
#> 6 2005 1 40-49 C220 12 15
#> 7 2005 1 40-49 C221 7 9
#> 8 2005 1 40-49 C240 0 8
#> 9 2005 1 40-49 K70 149 263
#> 10 2005 1 40-49 K703 77 147
#> # … with 110 more rows
推荐阅读
- unity3d - 为什么 unityplayer.SkipPermissionsDialog 不能与统一 2018.3.14 一起使用?
- python - Using list() on cycle object hangs system
- multithreading - Scala 避免异步测试用例的线程休眠
- java - 在 Apache Tomcat 上部署 .war 时如何使用给定的 application.properties 文件?
- python-3.x - 仅从 Imapclient 获取联系人的最后一封邮件的信息
- python - 无法从“pywinauto”导入名称“桌面”
- mongodb - 如何对文档数组进行排序,在 mongodb 聚合中获取其内容和长度?
- bash - 删除特定字符后的行 | awk
- python - 模型通过中间模型有两个多对多关系
- html - 如何始终在一个站点中添加样式