首页 > 解决方案 > R: if values of two columns of two data frames are equal, add column of one of the two data frames

问题描述

I have two data frames

columns df1= a, b, c, d

columns df2= e, f, c, d

I want to say that if df1$a == df2$e and at the same time df1$b == df2$f I want to print in a new column df1$x the values of the corresponding df2$c.

So I wrote this command df1$x <- ifelse(df1$a %in% df2$e & df1$b %in% df2$f, df2$c, ifelse (NULL))

But the values in df1x are not correct. I think I have to do a while loop, but I have no idea from where to start.

Nicoletta

标签: rmergedata-cleaning

解决方案


To merge two data frames by multiple columns you can use merge and corresponding by.x and by.y. Both of those accept vector of column names. Parameter all.x means all data from first df1 will be preserved.

df1 <-  merge(df1, df2[,c("e","f","c")], by.x=c("a","b"), by.y=c("e","f"), all.x=T)

colnames(df1) <- c("a","b","c","d","x")

For more information see ?merge. For more complex operation you can check out package dplyr and its function left_join.

Result on random data

df1 <- data.frame(a = sample(1:5), b= sample(1:5), c=runif(5), d = runif(5))
df2 <- data.frame(e = df1$a, f = df1$b, c = runif(5), d = runif(5))
df2$e[3] <- 7

Output

> df1
  a b          c          d
1 5 4 0.76677063 0.92123552
2 4 1 0.93524320 0.09275425
3 3 2 0.01121468 0.12035981
4 1 5 0.72992427 0.87711572
5 2 3 0.11680937 0.93696597
> df2$e[3] <- 7
> df2
  e f         c         d
1 5 4 0.6251662 0.1549575
2 4 1 0.8464672 0.8702837
3 7 2 0.5394273 0.4290171
4 1 5 0.4061817 0.9072905
5 2 3 0.3376456 0.4291463
> merge(df1, df2[,c("e","f","c")], by.x=c("a","b"), by.y=c("e","f"), all.x=T)
  a b          c          d         x
1 1 5 0.72992427 0.87711572 0.4061817
2 2 3 0.11680937 0.93696597 0.3376456
3 3 2 0.01121468 0.12035981        NA
4 4 1 0.93524320 0.09275425 0.8464672
5 5 4 0.76677063 0.92123552 0.6251662

推荐阅读