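r - Collapsing rows where some are all NA, with columns of mixed factor, character and numeric classes
Problem description
There are several questions on Stack Overflow similar to this one, but I have not found a solution that handles a data frame with mixed column classes.
I have a data frame, df: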
df <- structure(list(ID = c("ID1", "ID1", "ID1", "ID1", "ID1", "ID1",
"ID1", "ID1", "ID1"), COLOUR = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("BLUE", "RED"), class = "factor"),
DATE = structure(c(17378, 17378, 17378, 17378, 17378, 17400,
17925, 17925, 17925), class = "Date"), size1 = c(NA, 496.4647,
332.4, NA, NA, NA, NA, 23, NA), size2 = c(NA, NA, 90, NA, NA,
NA, NA, NA, NA), length1 = c(NA, NA, NA, NA, 343.8446, NA,
NA, NA, NA), length2 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), width1 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), width2 = c(NA, NA, NA, NA, NA, NA, NA,
34.682, NA), group1 = c(NA, NA, NA, NA, NA, NA, NA, NA, "CAT!"
)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))
# A tibble: 9 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr>
1 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA
2 ID1   RED    2017-07-31  496.    NA     NA       NA     NA   NA   NA
3 ID1   RED    2017-07-31  332.    90     NA       NA     NA   NA   NA
4 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA
5 ID1   RED    2017-07-31   NA     NA    344.      NA     NA   NA   NA
6 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA
7 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   NA
8 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 NA
9 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   CAT!
I would like to collapse it into the following:
# A tibble: 4 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr>
2 ID1   RED    2017-07-31  496.    90    344.      NA     NA   NA   NA
3 ID1   RED    2017-07-31  332.    90    344.      NA     NA   NA   NA
6 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA
8 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 CAT!
Note that an ID/DATE combination is repeated when it has more than one value. I have tried several approaches without success:
Approach 1:
sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}
df %>%
group_by(ID, DATE) %>%
summarise_all(funs(sum_NA))
Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), na.rm = TRUE) :
‘sum’ not meaningful for factors
^ The approach above fails with this error.
Approach 2:
df %>%
group_by(ID, DATE) %>%
summarise_if(is.numeric, funs(sum_NA))
  ID    DATE       size1 size2 length1 length2 width1 width2
  <chr> <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 ID1   2017-07-31  829.    90    344.      NA     NA   NA
2 ID1   2017-08-22   NA     NA     NA       NA     NA   NA
3 ID1   2019-01-29   23     NA     NA       NA     NA   34.7
^ The above drops the COLOUR and group1 columns because they are not numeric, and it also sums the size1 values that share the same ID / DATE combination.
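Both problems in Approach 2 come from using sum(): it only makes sense for numeric columns, and it adds up values that should stay separate. A minimal sketch of a class-agnostic alternative that picks the first non-missing value per group instead of summing. The helper name first_non_na and the toy data frame are assumptions made for illustration; this keeps only one row per group, so it does not preserve multiple values:

```r
library(dplyr)

# Toy data mirroring the question's column mix (factor, Date, numeric, character).
toy <- data.frame(
  ID     = c("ID1", "ID1", "ID1"),
  COLOUR = factor(c("RED", "RED", "RED")),
  DATE   = as.Date(c("2019-01-29", "2019-01-29", "2019-01-29")),
  size1  = c(NA, 23, NA),
  group1 = c(NA, NA, "CAT!"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value, or an NA of the same class
# when the whole column is missing. Subsetting keeps factor and Date classes.
first_non_na <- function(x) x[!is.na(x)][1]

collapsed <- toy %>%
  group_by(ID, DATE) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

print(collapsed)
```

Because the helper only subsets, it never calls an arithmetic function on a factor or character column, which is what triggered the error in Approach 1.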
Approach 3:
df <- setDT(df)[, lapply(.SD, na.omit), by = c("ID", "DATE")]
Error in `[.data.table`(setDT(df), , lapply(.SD, na.omit), by = c("ID", :
Supplied 2 items for column 2 of group 1 which has 5 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
^ I get the error above.
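The error reports that the columns returned for a group do not share a common length once na.omit() has been applied to each of them separately. One way to sidestep this, sketched here under assumptions, is to pad every column of a group with NA up to the largest non-NA count in that group. The helper pad_non_na and the toy table are hypothetical and only illustrate the idea:

```r
library(data.table)

# Toy table: the first group has two size1 values but only one size2 value.
dt <- data.table(
  ID    = rep("ID1", 4),
  DATE  = as.Date(c("2017-07-31", "2017-07-31", "2017-07-31", "2017-08-22")),
  size1 = c(NA, 496.4647, 332.4, NA),
  size2 = c(NA, NA, 90, NA)
)

# Hypothetical helper: drop NAs, then pad with NA up to length n.
pad_non_na <- function(x, n) x[!is.na(x)][seq_len(n)]

res <- dt[, {
  # Largest number of non-NA values in any column of this group (at least 1),
  # so that every returned column has the same length.
  n <- max(1L, vapply(.SD, function(x) sum(!is.na(x)), integer(1)))
  lapply(.SD, pad_non_na, n = n)
}, by = .(ID, DATE)]

print(res)
```

Every column in a group now has exactly n entries, so data.table has nothing to recycle.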
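Can anyone help me find a solution?
Solution
Following the comments above, you could try the following and check whether it returns your expected result when a group may contain multiple values: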
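library(dplyr)
# Per group: keep each column's non-NA values, padded with NA up to the
# largest non-NA count in the group (at least 1 row).
df %>%
  group_by(ID, COLOUR, DATE) %>%
  summarise(across(everything(),
                   ~ na.omit(.x)[1:pmax(first(max(colSums(!is.na(cur_data())))), 1)]),
            .groups = "drop")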
# A tibble: 3 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr>
1 ID1   RED    2017-07-31  496.    90    344.      NA     NA   NA   NA
2 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA
3 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 CAT!
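The expected output in the question repeats a group's single values (size2 = 90, length1 = 344) on every row of that group, while columns with several values contribute one row each. A minimal sketch of that recycling rule, assuming dplyr's group_modify(); the helper collapse_group and the toy data frame are hypothetical and only illustrate the rule, they are not part of the original answer:

```r
library(dplyr)

# Toy data: the first group has two size1 values and a single size2 value.
toy <- data.frame(
  ID     = rep("ID1", 4),
  DATE   = as.Date(c("2017-07-31", "2017-07-31", "2017-07-31", "2017-08-22")),
  size1  = c(NA, 496.4647, 332.4, NA),
  size2  = c(NA, NA, 90, NA),
  group1 = rep(NA_character_, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper applied to one group at a time.
# g holds the non-grouping columns; key (the group's key values) is unused.
collapse_group <- function(g, key) {
  vals <- lapply(g, function(x) x[!is.na(x)])  # non-NA values per column
  n <- max(1L, lengths(vals))                  # rows this group needs
  out <- lapply(vals, function(v) {
    # A single value is repeated on every row; otherwise pad with NA.
    if (length(v) == 1L) rep(v, n) else v[seq_len(n)]
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

out <- toy %>%
  group_by(ID, DATE) %>%
  group_modify(collapse_group) %>%
  ungroup()

print(out)
```

Whether a lone value should be repeated or padded with NA is a design choice; swapping the rep() branch for v[seq_len(n)] gives the padding behaviour instead.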