首页 > 解决方案 > 在 R 中的数据帧之间减去值

问题描述

我对 R 非常陌生,并且有一个您可能会觉得很简单的问题。我有两个具有相同确切列名的数据框。一个数据框有大约 58k 行(每行是一个文章编号,每列是一个月 - 值是数量)。第二个数据帧是第一个数据帧的一个小得多的子集(大约有 1000 行)。第二个数据框中的行将始终在第一个中具有值。我需要做的是从第一个较大的数据框中减去每个月/文章的第二个数据框数量。它几乎就像对两个值的 vlookup。有任何想法吗?

更新:这就是我认为它在 SQL 中的样子:

SELECT I.Division, 
              ILS.Brand, 
              ILS.Cust #, 
              ILS.Article, 
              ILS.201811change - SLT.201811change AS '201811change', 
              ILS.201812change - SLT.201812change AS '201812change', 
              ILS.201901change - SLT.201901change AS '201901change', 
              ILS.201903change,
              ILS.201904change, 
              ILS.201905change, 
              ILS.201906change, 
              ILS.201907change, 
              ILS.201808change, 
              ILS.201809change

              FROM ILS LEFT OUTER JOIN SLT ON ILS.Article = SLT.Article

标签: rdplyr

解决方案


您可以 在类似于SQL的情况下使用left_join函数。在您的情况下,简化形式是. 请看下面的完整代码:dplyrLEFT JOINISL %>% left_join(SLS, by = "Article")

# data.frame simulation
strs3 <- c("Brand", "Cust", "Article", "201808change", "201809change", "201903change", "201904change", "201905change", 
           "201906change", "201907change", "201811change", "201812change", "201901change")
n <- 1000
total <- cbind(
  as.data.frame(matrix(sample(LETTERS, 3 * n, replace = TRUE), ncol = 3)),
  matrix(rnorm(n * 10), ncol = 10)
)
names(total) <- c("Brand", "Cust", "Article", "201808change", "201809change", "201903change", "201904change", "201905change", 
                "201906change", "201907change", "201811change", "201812change", "201901change")

spl <- ceiling(n * 57 / 58)
ils <- total[1:spl, ]
u <- unique(ils$Article)
ul <- length(u)
slt <- total[(spl + 1): (spl + ul), ]
slt$Article <- u

# left join
z <- ils %>% left_join(slt, by = "Article") %>% 
  mutate(`201811change` = `201811change.x` - `201811change.y`) %>%
  mutate(`201812change` = `201812change.x` - `201812change.y`) %>%
  mutate(`201901change` = `201901change.x` - `201901change.y`) %>%
  select(-ends_with("y")) %>% select(-one_of("201811change.x", "201812change.x", "201901change.x"))

str(z)

输出(结果数据帧的结构):

'data.frame':   983 obs. of  13 variables:
 $ Brand.x       : Factor w/ 26 levels "A","B","C","D",..: 16 23 19 20 19 26 7 21 22 9 ...
 $ Cust.x        : Factor w/ 26 levels "A","B","C","D",..: 21 15 25 3 24 2 1 26 3 23 ...
 $ Article       : Factor w/ 26 levels "A","B","C","D",..: 13 14 2 17 23 13 4 1 17 15 ...
 $ 201808change.x: num  -1.398 -0.357 -1.042 -0.653 -1.037 ...
 $ 201809change.x: num  1.483 0.604 0.276 0.846 -1.245 ...
 $ 201903change.x: num  -0.733 -0.413 0.61 -1.037 1.048 ...
 $ 201904change.x: num  -0.794 -1.0688 0.577 0.3368 0.0472 ...
 $ 201905change.x: num  -0.427 -0.898 1.124 -0.435 -0.304 ...
 $ 201906change.x: num  2.094 0.177 -0.892 -1.655 -1.091 ...
 $ 201907change.x: num  0.228 0.546 0.141 -1.166 -0.687 ...
 $ 201811change  : num  1.5082 0.0148 -0.5335 -0.763 -1.7196 ...
 $ 201812change  : num  1.415 -2.128 -0.576 1.205 -0.631 ...
 $ 201901change  : num  -0.883 -0.892 -2.032 -2.172 0.483 ...

推荐阅读