
Problem description

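I have two data frames with roughly 1 million records. I am trying to check, for rows where city = mum, whether a Uniq_ID that exists in df2 does not exist in df1 (column ID). I then want to mutate df2 with a 1 or 0 column to mark the result as true or false.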

df1 <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city=c("del","mum","mum","pun","bang","mum","triv","vish","mum","mum","bang","vish","mum","kol","noi","mum"))
df2 <- data.frame(Uniq_ID =c("DEV2962","KTN2252","ANA2719","H7236","DEV2692","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2831","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city=c("del","mum","bho","pun","mum","chen","mum","vish","mum","mum","bang","mum","mum","kol","noi","mum"))


Tags: r, dplyr

Solution


Base R is enough in this case. Does this work for you:

> df2$ID_not_in_df1 <- ifelse(!df2$Uniq_ID %in% df1$ID & df2$city == 'mum', 1 ,0)
> df2
   Uniq_ID city ID_not_in_df1
1  DEV2962  del             0
2  KTN2252  mum             0
3  ANA2719  bho             0
4    H7236  pun             0
5  DEV2692  mum             1
6  HRT2921 chen             0
7           mum             0
8  KTN2624 vish             0
9  ANA2548  mum             0
10 ITI2535  mum             0
11 DEV2732 bang             0
12 HRT2831  mum             1
13 ERV2951  mum             0
14 KTN2542  kol             0
15 ANA2813  noi             0
16 ITI2210  mum             0
> 
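Since the question is tagged dplyr and asks to mutate df2, the same check can also be written with `dplyr::mutate`. The following is only a minimal sketch, assuming dplyr is installed. It reuses the sample data and the column name `ID_not_in_df1` from the answer above, and swaps `ifelse()` for `as.integer()`, which converts the logical result to 1/0 directly.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(ID = c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city = c("del","mum","mum","pun","bang","mum","triv","vish","mum","mum","bang","vish","mum","kol","noi","mum"))
df2 <- data.frame(Uniq_ID = c("DEV2962","KTN2252","ANA2719","H7236","DEV2692","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2831","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city = c("del","mum","bho","pun","mum","chen","mum","vish","mum","mum","bang","mum","mum","kol","noi","mum"))

# Flag rows of df2 whose Uniq_ID is absent from df1$ID and whose city is "mum".
# as.integer() turns TRUE/FALSE into 1/0.
df2 <- df2 %>%
  mutate(ID_not_in_df1 = as.integer(!(Uniq_ID %in% df1$ID) & city == "mum"))

# Row numbers of the flagged records
flagged_rows <- which(df2$ID_not_in_df1 == 1L)
```

With the sample data, the empty Uniq_ID in row 7 is treated as present, because df1 also contains an empty ID. If blank IDs should count as missing, they would need to be filtered out or handled separately first.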
