r - Parallel function over multiple data frames in R
Problem description
In R, I call parLapply() on a list and use each list element to filter two data frames inside the function, for example:
myfunction <- function(id) {
  r1 <- r %>% filter(ID == id)
  b1 <- b %>% filter(ID == id)
  doSomething(r1, b1)
}
result <- parLapply(cluster, listOfIDs, myfunction)
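The SLURM system I'm on runs out of memory because, I think, both large data frames (r and b) are loaded every time myfunction() is called by parLapply(). Smaller datasets don't exceed the memory limit.
So I'd like to load only a chunk of the data frames r and b each time the function is called, to lower the memory requirements. Something like this (serial test):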
library(doParallel)
library(foreach)
foreach(r1 = split(r, rep(1:nrow(r), each = 1))) %do% {
  b1 <- b %>% filter(rowname == as.numeric(r1$rowname))
  print(b1) # doSomething(r1, b1)
}
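But I'd also like to filter b outside the function, so that the whole data frame isn't loaded in every instance. b1 and r1 must have the same rowname. Is this possible?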
Data
> dput(r)
structure(list(ID_DRAIN = c(115504, 115865, 115892, 115955, 115983,
115940, 116033, 116028, 115873, 115905, 115835, 115885, 115452,
115472, 115749, 115900, 115944, 115817, 115860, 115234, 115753,
115505, 115899, 115939, 116015, 115191, 115214, 115339, 115799,
115809, 115898, 115864), rowname = c("1", "7", "8", "9", "10",
"11", "12", "14", "18", "19", "22", "23", "25", "26", "27", "29",
"30", "37", "38", "39", "42", "44", "45", "46", "49", "50", "51",
"57", "59", "60", "61", "63")), row.names = c(1L, 7L, 8L, 9L,
10L, 11L, 12L, 14L, 18L, 19L, 22L, 23L, 25L, 26L, 27L, 29L, 30L,
37L, 38L, 39L, 42L, 44L, 45L, 46L, 49L, 50L, 51L, 57L, 59L, 60L,
61L, 63L), class = "data.frame")
> dput(b)
structure(list(LabelAtlas = structure(c(2L, 2L, 2L, 2L, 4L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Culvert", "dam", "Ford",
"Ramp/bed_sill", "sluice", "unknown", "weir"), class = "factor"),
rowname = c("57", "11", "7", "19", "11", "25", "38", "37",
"57", "57", "25", "25", "7")), row.names = c(325L, 413L,
414L, 1607L, 2382L, 2837L, 2870L, 2945L, 3272L, 3402L, 3433L,
3562L, 4753L), class = "data.frame")
Solution
It turns out you can give foreach more than one argument:
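# groupID is the key column shared with r (rowname in the sample data above)
bgrouped <- b %>% group_by(groupID)
# foreach steps through its arguments in lockstep, so the i-th element of
# each list must refer to the same key
foreach(b1 = group_split(bgrouped),
        r1 = split(r, rep(1:nrow(r), each = 1)), .combine = data.frame) %dopar% {
  doSomething(r1, b1)  # placeholder for the real function
}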
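Because foreach pairs its arguments by position, the two lists of chunks need to be in the same order and cover the same keys. The following is a minimal sketch of one way to guarantee that, assuming rowname is the shared key. The small r and b, the helper split_by_key(), and the body of doSomething() are hypothetical stand-ins for illustration and are not part of the original answer.

```r
library(doParallel)  # also attaches foreach and parallel

# Small stand-ins for the question's r and b (hypothetical values)
r <- data.frame(ID_DRAIN = c(115504, 115865, 115940, 115339),
                rowname  = c("1", "7", "11", "57"),
                stringsAsFactors = FALSE)
b <- data.frame(LabelAtlas = c("dam", "dam", "dam", "Ramp/bed_sill", "weir"),
                rowname    = c("57", "11", "7", "11", "57"),
                stringsAsFactors = FALSE)

# Hypothetical placeholder for the real doSomething()
doSomething <- function(r1, b1) {
  data.frame(rowname = r1$rowname[1], n_r = nrow(r1), n_b = nrow(b1),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: keep only rows whose key is in `keys`, then split by
# that key. Using the same factor levels for both frames means the i-th
# chunk of r and the i-th chunk of b share a rowname.
split_by_key <- function(df, keys) {
  df <- df[df$rowname %in% keys, , drop = FALSE]
  split(df, factor(df$rowname, levels = keys))
}

keys     <- sort(intersect(r$rowname, b$rowname))  # keys present in both
r_chunks <- split_by_key(r, keys)
b_chunks <- split_by_key(b, keys)
stopifnot(identical(names(r_chunks), names(b_chunks)))

cl <- makeCluster(2)   # 2 workers, an arbitrary choice for this sketch
registerDoParallel(cl)
result <- foreach(r1 = r_chunks, b1 = b_chunks, .combine = rbind) %dopar% {
  doSomething(r1, b1)  # the body refers only to the two chunks
}
stopCluster(cl)
print(result)
```

Since the loop body refers only to r1 and b1, foreach should have no reason to export the full r and b to the workers, which is the point of splitting up front. Keys that appear in only one of the two data frames are dropped here; whether that is the right behaviour depends on what doSomething() needs.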