r - 用数值和 NA 折叠行的更好方法
问题描述
我试图用数值和 NA 折叠行,并且脚本出现警告。我想知道是否有更好的方法可以在没有警告的情况下执行此操作 - 如果我在更大的数据集上使用它,当前的方法也需要一段时间。
dput(abc)
structure(list(ID = c(12345, 12345, 12345, 23456, 23456, 34567,
34567, 34567, 45678), cohort_0 = c(10.1, NA, NA, 12, NA, 15.5,
NA, NA, NA), cohort_2 = c(NA, 10.1, NA, NA, NA, NA, NA, NA, NA
), cohort_7 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), cohort_9 = c(NA, NA,
NA, NA, 12, NA, NA, NA, NA), cohort_11 = c(NA, NA, NA, NA, NA,
NA, NA, 15.5, NA)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
工作代码
abc2 <- abc %>% group_by(ID) %>%
summarize_all(~ max(as.character(.), na.rm = TRUE)) %>%
ungroup
我收到的警告信息:
> warnings()
Warning messages:
1: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
2: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
3: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
4: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
5: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
6: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
7: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
8: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
9: In max(as.character(.), na.rm = TRUE) : no non-missing arguments, returning NA
10: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
11: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
12: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
13: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
14: In max(as.character(.), na.rm = TRUE) :
no non-missing arguments, returning NA
更新:
所以我尝试了另一篇文章的data.table
解决方案。它适用于我的小数据,但不适用于我的大数据。奇怪的。有任何想法吗?
var2 <- setDT(var)[, lapply(.SD, na.omit), by = ID]
Error in `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) :
Supplied 2 items for column 2 of group 6039 which has 3 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(setDT(var), , lapply(.SD, na.omit), by = ID) :
Item 1 of j's result for group 18 is zero length. This will be filled with 2 NAs to match the longest column in this result. Later groups may have a similar problem but only the first is reported to save filling the warning buffer.
解决方案
也许以下内容也适用于您的较大数据集。我%in%
用来测试该值是否存在。
x <- sort(unique(unlist(abc[-1])))
sapply(abc[-1], function(y) ifelse(x %in% y, x, NA))
# cohort_0 cohort_2 cohort_7 cohort_9 cohort_11
#[1,] 10.1 10.1 NA NA NA
#[2,] 12.0 NA NA 12 NA
#[3,] 15.5 NA NA NA 15.5
推荐阅读
- node.js - 无法在 nodejs 中获取 /uploads/images/
- android - unity原生OpenGL纹理显示四次
- ios - ld:未找到架构 x86_64(领域)的符号
- php - error_reporting(E_ALL) 失败,没有日志或错误消息
- python - 尝试在 pip 的帮助下安装 python 包
- ios - 在 iOS 中调试时推送 viewController 不起作用
- excel - 合并后用工作簿名称命名工作表?
- python - python中声明方法(&self)的问题
- oracle12c - OBIEE 12c。在具有 XML 权限的标题中创建自定义链接
- ssl - 如何识别证书未知错误背后的原因