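r - How can I optimize this for loop for larger data in R?
Problem description
I have some reproducible data (my original data set has about 2,000,000 rows). At that size my for loop becomes inefficient and takes a very long time to run. I would like to know whether there is a more efficient way to process this data. My code, with reproducible data, is attached below.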
#----Reproducible data example--------------------#
#Upload first data set#
words1<-c("How","did","Quebec","nationalists","see","their","province","as","a","nation","in","the","1960s")
words2<-c("Why","does","volicty","effect","time",'?',NA,NA,NA,NA,NA,NA,NA)
words3<-c("How","do","I","wash","a","car",NA,NA,NA,NA,NA,NA,NA)
library<-c("The","the","How","see","as","a","for","then","than","example")
embedding1<-c(.5,.6,.7,.8,.9,.3,.46,.48,.53,.42)
embedding2<-c(.1,.5,.4,.8,.9,.3,.98,.73,.48,.56)
df <- data.frame(words1,words2,words3)
names(df)<-c("words1","words2","words3")
#--------Upload 2nd dataset-------#
df2 <- data.frame(library,embedding1, embedding2)
names(df2)<-c("library","embedding1","embedding2")
df2$meanembedding=rowMeans(df2[c("embedding1","embedding2")],na.rm=T)
df2<-df2[,-c(2,3)]
#-----Find columns--------#
l=ncol(df)
names<-names(df)
head(names)
classes<-sapply(df[,c(1:l)],class)
head(classes)
#------Combine and match library to training data------#
require(gridExtra)
List = list()
for (name in names) {
  df1 <- df[, name]
  df1 <- as.data.frame(df1)
  x_train2 <- merge(x = df1, y = df2,
                    by.x = "df1", by.y = 'library', all.x = T, sort = F)
  x_train2 <- x_train2[, -1]
  x_train2 <- as.data.frame(x_train2)
  names(x_train2) <- name
  List[[length(List) + 1]] = x_train2
}
Solution
A better approach is to use lapply:
myList2 <- lapply(names(df), function(x) {
  y <- merge(x = df[, x, drop = FALSE],
             y = df2,
             by.x = x,
             by.y = 'library',
             all.x = TRUE,
             sort = FALSE)[, -1, drop = FALSE]
  names(y) <- x
  return(y)
})
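We loop over the vector names(df), subset one column at a time, and merge it with df2. drop = FALSE prevents the one-column data.frame from being simplified to a vector, and names(y) <- x overwrites the column name. The output is a list.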
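If the goal is only to look up one value per word, a further option is to skip merge() entirely and index the lookup table with match(). This is a different technique from the lapply/merge answer, not a restatement of it, and the sketch below is only a minimal illustration under stated assumptions: it rebuilds the toy data from the question, uses the hypothetical name lookup for the second table, and assumes each word appears at most once in the library column (match() returns only the first hit, whereas merge() would duplicate rows).

```r
# Toy data rebuilt from the question; `lookup` is a hypothetical name for df2.
df <- data.frame(
  words1 = c("How", "did", "Quebec", "nationalists", "see", "their",
             "province", "as", "a", "nation", "in", "the", "1960s"),
  words2 = c("Why", "does", "volicty", "effect", "time", "?", rep(NA, 7)),
  words3 = c("How", "do", "I", "wash", "a", "car", rep(NA, 7)),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  library = c("The", "the", "How", "see", "as", "a",
              "for", "then", "than", "example"),
  meanembedding = rowMeans(cbind(
    c(.5, .6, .7, .8, .9, .3, .46, .48, .53, .42),
    c(.1, .5, .4, .8, .9, .3, .98, .73, .48, .56)
  )),
  stringsAsFactors = FALSE
)

# match() gives, for each word, the row index of its first occurrence in
# lookup$library (NA when the word is absent). Indexing meanembedding with
# those positions replaces one merge() call per column.
embedded <- as.data.frame(lapply(df, function(col) {
  lookup$meanembedding[match(col, lookup$library)]
}))
```

Unlike merge() with sort = FALSE, whose result rows come back in an unspecified order, this keeps every row in its original position and returns a single data.frame rather than a list. Whether it is fast enough for 2,000,000 rows would need to be measured on the real data.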
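Postscript: as @RuiBarradas pointed out, you technically do not need drop = FALSE if you use df[x] instead of df[, x]. But I think it is helpful to know about the drop = FALSE option for cases where you need to subset both rows and columns.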
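The difference between the subsetting forms mentioned above can be checked on a small example. The data frame d below is hypothetical and exists only for illustration.

```r
# Small hypothetical data frame, used only to compare the subsetting forms.
d <- data.frame(a = 1:3, b = c("x", "y", "z"))

v  <- d[, "a"]                # matrix-style subsetting: simplified to a vector
d1 <- d[, "a", drop = FALSE]  # stays a one-column data.frame
d2 <- d["a"]                  # list-style subsetting: always a data.frame

is.data.frame(v)   # FALSE
is.data.frame(d1)  # TRUE
is.data.frame(d2)  # TRUE

# When rows are selected as well, drop = FALSE is what keeps the data.frame
sub <- d[2:3, "a", drop = FALSE]
```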