Collapse rows, some of which are all NA, when the columns are a mix of factor, character and numeric classes

Problem description

There are some similar questions on Stack Overflow, but I haven't found a solution that works for a data frame whose columns have a mix of classes.

I have a data frame, df:

df <- structure(list(ID = c("ID1", "ID1", "ID1", "ID1", "ID1", "ID1", 
"ID1", "ID1", "ID1"), COLOUR = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("BLUE", "RED"), class = "factor"), 
    DATE = structure(c(17378, 17378, 17378, 17378, 17378, 17400, 
    17925, 17925, 17925), class = "Date"), size1 = c(NA, 496.4647, 
    332.4, NA, NA, NA, NA, 23, NA), size2 = c(NA, NA, 90, NA, NA, 
    NA, NA, NA, NA), length1 = c(NA, NA, NA, NA, 343.8446, NA, 
    NA, NA, NA), length2 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), width1 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), width2 = c(NA, NA, NA, NA, NA, NA, NA, 
    34.682, NA), group1 = c(NA, NA, NA, NA, NA, NA, NA, NA, "CAT!"
    )), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

# A tibble: 9 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr> 
1 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA    
2 ID1   RED    2017-07-31  496.    NA     NA       NA     NA   NA   NA    
3 ID1   RED    2017-07-31  332.    90     NA       NA     NA   NA   NA    
4 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA    
5 ID1   RED    2017-07-31   NA     NA    344.      NA     NA   NA   NA    
6 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA    
7 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   NA    
8 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 NA    
9 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   CAT!

I want to collapse it to the following:

# A tibble: 4 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr> 
1 ID1   RED    2017-07-31  496.    90    344.      NA     NA   NA   NA    
2 ID1   RED    2017-07-31  332.    90    344.      NA     NA   NA   NA    
3 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA    
4 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 CAT!  

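Note that when an ID/DATE combination has more than one value in a column, that ID/DATE combination is repeated on several rows. I have tried several approaches without success: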

Approach 1:

sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

df %>%
    group_by(ID, DATE) %>%
    summarise_all(funs(sum_NA))

Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), na.rm = TRUE) : 
  ‘sum’ not meaningful for factors

^ The approach above throws an error.

Approach 2:

df %>%
    group_by(ID, DATE) %>%
    summarise_if(is.numeric, funs(sum_NA))

  ID    DATE       size1 size2 length1 length2 width1 width2
  <chr> <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 ID1   2017-07-31  829.    90    344.      NA     NA   NA  
2 ID1   2017-08-22   NA     NA     NA       NA     NA   NA  
3 ID1   2019-01-29   23     NA     NA       NA     NA   34.7

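^ The above drops the COLOUR and group1 columns because they are not numeric, and it also adds together the size1 values that share the same ID/DATE combination (496.4647 + 332.4 prints as 829).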
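A type-agnostic reducer sidesteps the `sum` problem: instead of adding values, it picks the first non-NA element, which works the same way for factor, character, Date and numeric columns. The helper `first_non_na` below is hypothetical (it is not from the question or the answer), and it only addresses the column-class issue: it still reduces each group to a single value, so 332.4 would be lost for 2017-07-31.

```r
# Hypothetical helper (not from the question or the answer): returns the first
# non-NA element of x. Indexing an empty vector with 1 gives an NA of the same
# class, so an all-NA column yields NA (factor NA, NA_character_, NA_real_, ...).
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

# It behaves the same way for each column class in the question
first_non_na(factor(c(NA, "RED"), levels = c("BLUE", "RED")))  # factor value RED
first_non_na(c(NA, 496.4647, 332.4))      # 496.4647; 332.4 is dropped
first_non_na(c(NA_character_, "CAT!"))    # "CAT!"
first_non_na(as.Date(c(NA, 17378), origin = "1970-01-01"))  # 2017-07-31 (a Date)
first_non_na(c(NA_real_, NA_real_))       # NA_real_
```

With dplyr 1.0.0 or later, it could be plugged in as `summarise(across(everything(), first_non_na))` after `group_by(ID, COLOUR, DATE)`, which returns one row per group.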

Approach 3:

df <- setDT(df)[, lapply(.SD, na.omit), by = c("ID", "DATE")]
Error in `[.data.table`(setDT(df), , lapply(.SD, na.omit), by = c("ID",  : 
  Supplied 2 items for column 2 of group 1 which has 5 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

^ I get the error above.
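The error can be traced by counting how many values `na.omit()` keeps in each column of the first group (2017-07-31). The data frame `grp` below is a hand-built copy of rows 1 to 5 of `df`, limited to a few columns; ID and DATE are left out because the `by` columns are not part of `.SD`.

```r
# Rows 1-5 of the question's df (the 2017-07-31 group), rebuilt by hand
# with only some of the columns
grp <- data.frame(
  COLOUR  = factor(rep("RED", 5), levels = c("BLUE", "RED")),
  size1   = c(NA, 496.4647, 332.4, NA, NA),
  size2   = c(NA, NA, 90, NA, NA),
  length1 = c(NA, NA, NA, NA, 343.8446),
  group1  = rep(NA_character_, 5)
)

# How many values na.omit() keeps in each column of this group:
# COLOUR keeps 5, size1 2, size2 1, length1 1, group1 0
kept <- lengths(lapply(grp, na.omit))
print(kept)
```

data.table needs every column returned for a group to have the same length (a single value is also accepted, as the error message says), so five COLOUR values next to two size1 values fail on "column 2 of group 1".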

Can anyone help me find a solution?

Tags: r

Solution


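Based on the comments above, you could try the following and check whether it returns your expected result when a group can contain several values: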

library(dplyr)

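df %>%
  group_by(ID, COLOUR, DATE) %>%
  # for each column keep its non-NA values, padded with NA up to the largest
  # non-NA count of any column in the group (minimum one row per group)
  summarise(across(everything(), ~ na.omit(.x)[1:pmax(first(max(colSums(!is.na(cur_data())))), 1)]),
            .groups = "drop")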

# A tibble: 3 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr> 
1 ID1   RED    2017-07-31  496.    90    344.      NA     NA   NA   NA    
2 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA    
3 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 CAT!  
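If you need the exact layout from the question, where a shorter column repeats its value on the extra rows (90 and 344 appear on both 2017-07-31 rows), a base R sketch along these lines could work. The helper name `collapse_rows` and the padding rule (repeat the last non-NA value) are assumptions inferred from the expected output, not part of the answer above. The example data is a cut-down copy of `df` without the all-NA length2 and width1 columns.

```r
# Hypothetical helper (not from the answer): collapses rows that share the same
# key values. Each group gets as many rows as its column with the most non-NA
# values; shorter columns repeat their last non-NA value, all-NA columns stay NA.
collapse_rows <- function(df, keys) {
  value_cols <- setdiff(names(df), keys)
  # One string per row identifying its key combination ("\r" assumed absent
  # from the keys); factor levels keep first-appearance order
  key <- do.call(paste, c(df[keys], sep = "\r"))
  groups <- split(seq_len(nrow(df)), factor(key, levels = unique(key)))

  pieces <- lapply(groups, function(rows) {
    sub  <- df[rows, , drop = FALSE]
    vals <- lapply(sub[value_cols], function(x) x[!is.na(x)])
    n    <- max(1L, lengths(vals))            # at least one row per group
    out  <- sub[rep(1L, n), keys, drop = FALSE]
    for (col in value_cols) {
      v <- vals[[col]]
      out[[col]] <- if (length(v) == 0) {
        sub[[col]][rep(NA_integer_, n)]       # NA of the column's own type
      } else {
        v[c(seq_along(v), rep(length(v), n - length(v)))]  # pad with last value
      }
    }
    out[names(df)]                            # restore original column order
  })

  res <- do.call(rbind, unname(pieces))
  rownames(res) <- NULL
  res
}

example_df <- data.frame(
  ID      = rep("ID1", 9),
  COLOUR  = factor(rep("RED", 9), levels = c("BLUE", "RED")),
  DATE    = as.Date(c(rep(17378, 5), 17400, rep(17925, 3)), origin = "1970-01-01"),
  size1   = c(NA, 496.4647, 332.4, NA, NA, NA, NA, 23, NA),
  size2   = c(NA, NA, 90, NA, NA, NA, NA, NA, NA),
  length1 = c(NA, NA, NA, NA, 343.8446, NA, NA, NA, NA),
  width2  = c(NA, NA, NA, NA, NA, NA, NA, 34.682, NA),
  group1  = c(NA, NA, NA, NA, NA, NA, NA, NA, "CAT!")
)

res <- collapse_rows(example_df, keys = c("ID", "COLOUR", "DATE"))
print(res)
```

Grouping on ID, COLOUR and DATE mirrors the `group_by()` call above. Because the key columns are copied from the first row of each group, the sketch assumes they are constant within a group.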
