首页 > 解决方案 > 在R中的列之间减去值

问题描述

我需要通过从 stock 列中添加值并从 sales 列中减去值来创建一个新列。当结果为负时,应将下一个股票重写为示例中的结果。在 ThomasIsCoding 的帮助下,创建的代码适用于示例数据。但是当我为许多产品测试程序时,出现了错误的计算。如下所示。你能检查一下为什么会这样吗?

代码:

TD<-data.frame(product = rep("A",9), data = seq(as.Date("2020-01-01"), as.Date("2020-01-09"), by = "day"),
               sale = c(0, 5, 0, 0, 15, 1, 2, 1, 0), stock = c(0, 0, 50, 0,  10, 0, 10, 0, 100))
TD <- within(TD, result <- ave(stock-sale, product,
                              ave(stock-sale,
                               cumsum(stock>sale),FUN = cumsum)>0,FUN = cumsum))

> TD#works well
  product       data sale stock   result
1       A 2020-01-01    0     0        0
2       A 2020-01-02    5     0       -5
3       A 2020-01-03    0    50       50
4       A 2020-01-04    0     0       50
5       A 2020-01-05   15    10       45
6       A 2020-01-06    1     0       44
7       A 2020-01-07    2    10       52
8       A 2020-01-08    1     0       51
9       A 2020-01-09    0   100      151

不起作用 的例子:我不明白为什么程序有时运行良好,有时却不行。

> TD[1330:1340,]#doesn't work
# A tibble: 4 x 5
# Groups:  data [4]
   product data       stock  sale   result      expected
   <chr>   <fct>      <dbl> <dbl>    <dbl>
 1 B       2020-02-25     0     0      -21           -21
 2 B       2020-02-26     0     0      -21           -21
 3 B       2020-02-27    60     4       35            56
 4 B       2020-02-28     0     2       33            54

> TD[7293:7297,]#works ok
# A tibble: 5 x 5
# Groups:   data [5]
   product  data       stock  sale   result      
   <chr>    <fct>      <dbl> <dbl>    <dbl>
 1 D        2020-01-13     0     2      -22
 2 D        2020-01-14     0     2      -24
 3 D        2020-01-15     0     0      -24
 4 D        2020-01-16   100     2       98
 5 D        2020-01-17     0     4       94


标签: rdataframe

解决方案


好的,也许这是一个解决方案,绝对不是最优雅或可读的,但至少它通过了所有“测试”数据帧:

library(dplyr)
df %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

因此,现在测试具有多个组的数据框:

TD2 = data.frame(product = c("AA","AA","AA","AA","AZ","AZ","AZ","AF","AF","AF","BA","BA","BA"), 
                 data = seq(as.Date("2020-01-01"), as.Date("2020-01-13"), by = "day"),
                 sale = c(0, 5, 0, 0, 15, 1, 2, 1, 0,10,0,1,10), 
                 stock = c(0, 0, 50, 0,  10, 0, 10, 0, 100,0,0,100,0))

TD2 %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

# A tibble: 13 x 4
# Groups:   product [4]
   product  sale stock Result
   <fct>   <dbl> <dbl>  <dbl>
 1 AA          0     0      0
 2 AA          5     0     -5
 3 AA          0    50     50
 4 AA          0     0     50
 5 AZ         15    10     -5
 6 AZ          1     0      0
 7 AZ          2    10      8
 8 AF          1     0     -1
 9 AF          0   100    100
10 AF         10     0     90
11 BA          0     0      0
12 BA          1   100     99
13 BA         10     0     89

现在测试您问题的第二个示例(我假设-21在预期结果中是一个错误,实际上是从以前的计算中继承的,所以我决定忽略这个结果并只关注最后一部分,因为在这一步得到 -21 确实根据您的数据没有任何意义):

Test2 = data.frame(product = c("AA","AA","AA","AA"), 
                 sale = c(0, 0,4,2), 
                 stock = c(0, 0, 60, 0))

Test2 %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

# A tibble: 4 x 4
# Groups:   product [1]
  product  sale stock Result
  <fct>   <dbl> <dbl>  <dbl>
1 AA          0     0      0
2 AA          0     0      0
3 AA          4    60     56
4 AA          2     0     54

最后是你问题的最后一个例子:

Test3 = data.frame(product = c("AA","AA","AA","AA","AA"), 
                   sale = c(2,2,0,2,4), 
                   stock = c(0, 0, 0,100, 0))

Test3 %>% group_by(product) %>% mutate(New = stock - sale) %>%
    mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
    mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
    mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
    mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
    select(product, sale, stock, Result = New4)

# A tibble: 5 x 4
# Groups:   product [1]
  product  sale stock Result
  <fct>   <dbl> <dbl>  <dbl>
1 AA          2     0     -2
2 AA          2     0      0
3 AA          0     0      0
4 AA          2   100     98
5 AA          4     0     94

希望它会让你满意。否则,正如@RonakShah 在评论中提到的,我邀请您重新考虑您的方法,以明确定义您需要什么、如何计算以及输出的基本原理是什么。


推荐阅读