Assigning by reference in a loop over columns

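Problem description

I have found many similar questions, but none that fits my problem. I have two large data frames that share only a few columns. I am trying to assign values from one data frame to the other by reference, matching rows on the shared column.

I have tried several combinations, but none of them works correctly. For example: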

library(data.table)

#create dfs
set.seed(32)
DB <- data.frame(A=sample(c("A","B","C","D","E"),30,replace = T))
DB2 <- data.frame(A=sample(c("A","B","C","D","E","F","G"),60,replace = T),
                              B=rep(rnorm(60,mean=5)),
                              C=rep(rnorm(60,mean=10)))
#loop 
for (i in c("B","C")){
setDT(DB)[DB2,  i := i, on = .(A == A)]}

In other words, I would like to write the following code as a loop:

setDT(DB)[DB2, B := B, on = .(A == A)]
setDT(DB)[DB2, C := C, on = .(A == A)]
> DB
    A        B         C
 1: C 5.593191 10.697466
 2: C 5.593191 10.697466
 3: E 4.482933  8.726371
 4: D 5.454512 11.054162
 5: A 4.306571 11.427917
 6: E 4.482933  8.726371
 7: D 5.454512 11.054162
 8: E 4.482933  8.726371
 9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933  8.726371
21: D 5.454512 11.054162
22: E 4.482933  8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933  8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
    A        B         C

Any help would be much appreciated.

Tags: r, data.table, assign

Solution

Try:

library(data.table)
#create dfs
set.seed(32)
DB <- data.frame(A=sample(c("A","B","C","D","E"),30,replace = T))
DB2 <- data.frame(A=sample(c("A","B","C","D","E","F","G"),60,replace = T),
                  B=rep(rnorm(60,mean=5)),
                  C=rep(rnorm(60,mean=10)))
#try
setDT(DB)[DB2, c("B", "C") := list(B, C), on = .(A == A)]
DB #output
    A        B         C
 1: C 5.593191 10.697466
 2: C 5.593191 10.697466
 3: E 4.482933  8.726371
 4: D 5.454512 11.054162
 5: A 4.306571 11.427917
 6: E 4.482933  8.726371
 7: D 5.454512 11.054162
 8: E 4.482933  8.726371
 9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933  8.726371
21: D 5.454512 11.054162
22: E 4.482933  8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933  8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
    A        B         C
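
If an explicit loop is still preferred, the attempt in the question can be adapted. As far as I can tell, `i := i` fails because data.table reads the left-hand `i` as a column literally named `i`, not as the loop variable. The sketch below is one possible rewrite, not necessarily what the asker intended: it wraps the loop variable in parentheses on the left and uses `get()` with the `i.` prefix on the right to pick the column from DB2. The variable name `col` and the use of `data.table()` instead of `setDT()` are my own choices.

```r
library(data.table)

# Same toy data as in the question, built directly as data.tables
set.seed(32)
DB  <- data.table(A = sample(c("A", "B", "C", "D", "E"), 30, replace = TRUE))
DB2 <- data.table(A = sample(c("A", "B", "C", "D", "E", "F", "G"), 60, replace = TRUE),
                  B = rnorm(60, mean = 5),
                  C = rnorm(60, mean = 10))

for (col in c("B", "C")) {
  # (col) on the left is evaluated to the column name stored in col;
  # get("i.B") / get("i.C") on the right fetches that column from DB2
  DB[DB2, (col) := get(paste0("i.", col)), on = "A"]
}

DB
```

Each iteration performs one join, so for many columns the single-call forms in this answer should do less work.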

Update

Franck's suggestion should also work, and it is more efficient for a large number of columns (note that mget returns a named list):

cols <- colnames(DB2)[!(colnames(DB2) %in% colnames(DB))]
setDT(DB)[DB2, (cols) := mget(paste0("i.", cols)), on = .(A = A)]
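
To make the behaviour of this update join easier to see, here is a small sketch on hypothetical toy tables (`target` and `lookup` are names I made up for illustration). It shows two things that matter on real data: keys missing from the lookup table are left as NA, and when the lookup table has several rows for the same key, only one value is kept per key. To my understanding that is the last matching row, but it is worth checking on your own data.

```r
library(data.table)

# Hypothetical toy tables: "x" appears twice in lookup, "z" not at all
target <- data.table(A = c("x", "y", "z"))
lookup <- data.table(A = c("x", "x", "y"),
                     B = c(1, 2, 3),
                     C = c(10, 20, 30))

# Columns present in lookup but not in target: "B" and "C"
cols <- setdiff(names(lookup), names(target))

# mget() returns a list of the i.-prefixed columns, one per name in cols
target[lookup, (cols) := mget(paste0("i.", cols)), on = "A"]

target
```

If the lookup table should contribute exactly one row per key, deduplicating it first, for example with `unique(lookup, by = "A")`, makes the result independent of row order.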
