Bind rows within groups in R

Question

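I have data in which some columns define groups, while other columns (a1-a4 in the sample data below) have a value in only one row of each group and NA in the remaining rows.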

structure(list(gp = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 1, 2, 2, 2, 2, 3, 3, 3), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.4, NA, NA, NA, NA, NA, 0.3, NA, NA), a2 = c(NA, NA, NA, 1, NA, NA, NA, NA, NA), a3 = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA), a4 = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -9L), class = "data.frame")

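As you can see, within each group only one row actually has a value in column a1 (and the same holds for a2-a4), so I don't need separate rows; I want to collect all the values within a group into a single row. I expect something like the following: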

structure(list(gp = structure(c(1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 2, 3), name = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.4, NA, 0.3), a2 = c(NA, 1, NA), a3 = c(1.2, NA, NA), a4 = c(NA, 1, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -3L), class = "data.frame")

How can I do this? A tidyverse solution would be great.
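The premise here, that each of a1-a4 holds at most one non-NA value per group, can be checked before collapsing anything. This is a minimal sketch, assuming dplyr 1.0.0 or later (for across() and the .groups argument); the data frame re-types the question's data with data.frame() instead of structure(), and the names non_na_counts and one_value_per_group are illustrative.

```r
library(dplyr)

# The question's data, re-typed with data.frame() for readability
df1 <- data.frame(
  gp   = factor(rep("gp1", 9)),
  id   = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  name = factor(c("A", "A", "B", "B", "B", "B", "C", "C", "C")),
  a1   = c(0.4, NA, NA, NA, NA, NA, 0.3, NA, NA),
  a2   = c(NA, NA, NA, 1, NA, NA, NA, NA, NA),
  a3   = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA),
  a4   = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)
)

# Count the non-NA values of each a-column within each group
non_na_counts <- df1 %>%
  group_by(gp, id, name) %>%
  summarise(across(a1:a4, ~ sum(!is.na(.x))), .groups = "drop")

# Collapsing to one row loses nothing only if every count is 0 or 1
one_value_per_group <- all(as.matrix(select(non_na_counts, a1:a4)) <= 1)
print(one_value_per_group)
```

If one_value_per_group turns out FALSE, a single row per group can only be produced by either combining or discarding values, so that choice should be made before picking a summary function.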

Tags: r, dataframe, tidyverse

Solution


You can try this:

library(tidyverse)
df1 %>% 
 group_by(gp, id, name) %>% 
 # one row per group; sum(na.rm = TRUE) gives 0 for an all-NA column
 summarise(across(a1:a4, ~ sum(.x, na.rm = TRUE)), .groups = "drop") %>% 
 # turn those 0s back into NA (a1:a4 only, so na_if() never sees the factor column name)
 mutate(across(a1:a4, ~ na_if(.x, 0)))
# Result, one row per group:
#   gp   id name   a1  a2   a3  a4
#   gp1   1 A     0.4  NA  1.2  NA
#   gp1   2 B      NA   1   NA   1
#   gp1   3 C     0.3  NA   NA   1

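Without the second step the output would contain 0s instead of NAs, because sum(x, na.rm = TRUE) returns 0 when every value of x is NA; the second step converts those 0s back to NA. This assumes there are no genuine 0s in columns a1 to a4.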
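To see why that assumption matters, here is a small sketch on hypothetical data (the toy data frame and its values are invented for illustration, and across() assumes dplyr 1.0.0 or later): a genuine 0 and an all-NA group both sum to 0, so both come out as NA after the sum-then-na_if() pipeline.

```r
library(dplyr)

# Hypothetical toy data: id 1 has a genuine 0 in a1, id 2 has only NA in a1
toy <- data.frame(
  id = c(1, 1, 2, 2),
  a1 = c(0, NA, NA, NA),
  a2 = c(NA, 2, 5, NA)
)

collapsed <- toy %>%
  group_by(id) %>%
  # an all-NA group sums to 0 here, exactly like a genuine 0 does
  summarise(across(a1:a2, ~ sum(.x, na.rm = TRUE)), .groups = "drop") %>%
  # so converting 0 back to NA wipes out the genuine 0 as well
  mutate(across(a1:a2, ~ na_if(.x, 0)))

print(collapsed$a1)  # [1] NA NA: the real 0 for id 1 is gone
```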


Here is a solution for the case where the initial data set does contain 0s.

# Sum the non-missing values, but return NA (not 0) when every value is missing
sum_NA <- function(x) {
  if (all(is.na(x))) {
    NA_real_
  } else {
    sum(x, na.rm = TRUE)
  }
}

df2 %>% 
 group_by(gp, id, name) %>% 
 summarise_all(sum_NA)
# A tibble: 3 x 7
# Groups:   gp, id [?]
#  gp       id name      a1    a2    a3    a4
#  <fct> <dbl> <fct>  <dbl> <dbl> <dbl> <dbl>
#1 gp1      1. A      0.      NA   1.20   NA 
#2 gp1      2. B     NA        0. NA       1.
#3 gp1      3. C      0.300   NA  NA       1.
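A different technique that avoids arithmetic altogether is to keep the first non-NA value of each column within each group. This sketch is not from the answer above: the helper first_non_na is hypothetical, df2 is re-typed with data.frame(), and across() assumes dplyr 1.0.0 or later. On df2 it gives the same values as sum_NA, keeps genuine 0s, and would also work for non-numeric columns.

```r
library(dplyr)

# Hypothetical helper: first non-NA value of x, or NA if x is all NA
# (indexing a zero-length vector with [1] returns NA of the same type)
first_non_na <- function(x) x[!is.na(x)][1]

# The answer's df2 (contains genuine 0s), re-typed with data.frame()
df2 <- data.frame(
  gp   = factor(rep("gp1", 9)),
  id   = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  name = factor(c("A", "A", "B", "B", "B", "B", "C", "C", "C")),
  a1   = c(0.0, NA, NA, NA, NA, NA, 0.3, NA, NA),
  a2   = c(NA, NA, NA, 0, NA, NA, NA, NA, NA),
  a3   = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA),
  a4   = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)
)

collapsed <- df2 %>%
  group_by(gp, id, name) %>%
  summarise(across(a1:a4, first_non_na), .groups = "drop")

print(collapsed)
```

The trade-off versus summing: if a group ever holds more than one non-NA value in a column, this keeps only the first one instead of adding them, so it is only appropriate when the one-value-per-group premise holds.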

Data

df1 <- structure(list(gp = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 1, 2, 2, 2, 2, 3, 3, 3), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.4, NA, NA, NA, NA, NA, 0.3, NA, NA), a2 = c(NA, NA, NA, 1, NA, NA, NA, NA, NA), a3 = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA), a4 = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -9L), class = "data.frame")

df2 <- structure(list(gp = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "gp1", class = "factor"), id = c(1, 1, 2, 2, 2, 2, 3, 3, 3), name = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), a1 = c(0.0, NA, NA, NA, NA, NA, 0.3, NA, NA), a2 = c(NA, NA, NA, 0, NA, NA, NA, NA, NA), a3 = c(NA, 1.2, NA, NA, NA, NA, NA, NA, NA), a4 = c(NA, NA, 1, NA, NA, NA, NA, NA, 1)), .Names = c("gp", "id", "name", "a1", "a2", "a3", "a4"), row.names = c(NA, -9L), class = "data.frame")
