In R: duplicates in a data frame cause the returned values to change

Problem description

I have two data frames that I am pulling columns from to create a new data frame.

DF1
ClaimID    Money     Type
1           500    "Weather"
1           200    "Weather"
2           50     "Non-Weather"

DF2
ClaimID    Money     Type
1           500    "Non-Weather"
1           200    "Non-Weather"
2           50     "Non-Weather"

Using this code:

DF3 <- data.frame("ClaimID" = DF1$ClaimID, "FinalType" = DF1$Type, "OldType" = DF2$Type)

Then, with this code, I add a new column showing whether "FinalType" and "OldType" agree:

DF3 <- cbind(DF3, Agreement = c(ifelse(DF3$OldType == DF3$FinalType, "Agree", "Disagree")))

I expect to get this data frame:

DF3
ClaimID    FinalType    OldType       Agreement
1          Weather      Non-Weather   Disagree
1          Weather      Non-Weather   Disagree
2          Non-Weather  Non-Weather   Agree

However, I get:

DF3
ClaimID    FinalType    OldType       Agreement
1          Weather      Non-Weather   Disagree
1          Weather      weather       Agree
2          Non-Weather  Non-Weather   Agree

So somehow the Type taken from DF2 changes in DF3, even though the Type in DF2 itself stays the same. Thanks.
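With the sample tables exactly as posted (and character columns, the default since R 4.0.0), the code above would actually produce the expected DF3, because the rows of DF1 and DF2 line up. One likely explanation, and it is only an assumption since the full data is not shown, is that in the real data DF2's rows are not in the same order as DF1's: `data.frame()` pairs the columns purely by row position, so the OldType on a given row can come from a different claim. The sketch below uses hypothetical, made-up DF2 values in a shuffled order to show the effect, followed by a quick check of whether the two frames are row-aligned on their keys.

```r
# Hypothetical data: DF2 holds the same ClaimID/Money pairs as DF1, but in a
# different row order and with made-up Type values, to illustrate the problem.
DF1 <- data.frame(ClaimID = c(1, 1, 2), Money = c(500, 200, 50),
                  Type = c("Weather", "Weather", "Non-Weather"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ClaimID = c(2, 1, 1), Money = c(50, 500, 200),
                  Type = c("Weather", "Non-Weather", "Non-Weather"),
                  stringsAsFactors = FALSE)

# data.frame() pairs row i of DF1 with row i of DF2, regardless of ClaimID
DF3 <- data.frame(ClaimID = DF1$ClaimID, FinalType = DF1$Type,
                  OldType = DF2$Type, stringsAsFactors = FALSE)
DF3$Agreement <- ifelse(DF3$OldType == DF3$FinalType, "Agree", "Disagree")
print(DF3)  # row 1 compares claim 1 in DF1 with claim 2 in DF2

# Quick diagnostic: are the two data frames row-aligned on their key columns?
aligned <- nrow(DF1) == nrow(DF2) &&
  all(DF1$ClaimID == DF2$ClaimID & DF1$Money == DF2$Money)
print(aligned)  # FALSE here, so pairing columns by position is unsafe
```

If the check returns `FALSE` on the real data, the columns should be matched by key (a join) rather than by position.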

Tags: r

Solution


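Here is a faster solution using a data.table join, which matches rows on the key columns (ClaimID and Money) instead of relying on their position: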

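# using data.table
library(data.table)

# convert both data frames to data.tables in place (by reference)
setDT(DF1)
setDT(DF2)

# update join: match DF2 rows to DF1 on ClaimID and Money, compare DF1's Type
# with DF2's Type (i.Type), and add the result to DF1 as a new column
DF1[DF2, on = c("ClaimID", "Money"), Agreement := ifelse(Type != i.Type, "Disagree", "Agree")]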
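For readers without data.table, the same key-based matching can be sketched in base R with `merge()`. This swaps in `merge()` for the data.table update join; renaming the Type columns with `setNames()` before joining is a choice made here so the result has the FinalType/OldType layout the question asks for.

```r
# Sample data from the question
DF1 <- data.frame(ClaimID = c(1, 1, 2), Money = c(500, 200, 50),
                  Type = c("Weather", "Weather", "Non-Weather"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ClaimID = c(1, 1, 2), Money = c(500, 200, 50),
                  Type = c("Non-Weather", "Non-Weather", "Non-Weather"),
                  stringsAsFactors = FALSE)

# Rename Type in each frame, then join on the key columns rather than row order
DF3 <- merge(
  setNames(DF1, c("ClaimID", "Money", "FinalType")),
  setNames(DF2, c("ClaimID", "Money", "OldType")),
  by = c("ClaimID", "Money")
)

# Compare the two Type columns row by row, now that rows are matched by key
DF3$Agreement <- ifelse(DF3$FinalType == DF3$OldType, "Agree", "Disagree")
print(DF3)
```

Note that `merge()` performs an inner join by default, so DF1 rows with no matching ClaimID/Money pair in DF2 are dropped; the data.table update join instead keeps every DF1 row and leaves `Agreement` as `NA` for unmatched rows.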
