Conditional sumif in R

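Problem description

I want to do a conditional sum in R. I have a table like the one below. With this data, I want a forward-looking projection of the total value for each desk over the next 5 days. A value should count on every date from its start Date through its Out_date.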

+-------+------------+-------+-------+------------+------+
| Index |    Date    | Desk  | Value |  Out_date  | Days |
+-------+------------+-------+-------+------------+------+
|    16 | 2020-07-30 | Desk1 | 1     | 2020-08-17 |   12 |
|    51 | 2020-08-13 | Desk2 | 2.000 | 2020-08-14 |    4 |
|    52 | 2020-08-13 | Desk3 | 2.000 | 2020-08-15 |    4 |
|    53 | 2020-08-13 | Desk3 | 2.000 | 2020-08-16 |    4 |
+-------+------------+-------+-------+------------+------+

How can I solve this?

What the output should look like:

+-------+------------+------------+------------+------------+------------+
| Desk  | 2020-08-14 | 2020-08-15 | 2020-08-16 | 2020-08-17 | 2020-08-18 |
+-------+------------+------------+------------+------------+------------+
| Desk1 |          1 |          1 |      1     |      1     |       0    |
| Desk2 |          2 |          0 |      0     |      0     |       0    |
| Desk3 |          4 |          4 |      2     |      0     |       0    |
+-------+------------+------------+------------+------------+------------+

Tags: r

Solution


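From your description, it sounds as though each row in your table represents a Value associated with a Desk for a given period of time. The Value associated with that desk starts on a particular Date and continues through Out_date. However, these associations can overlap, which means that on any particular day a desk may have several associated values. Your intention is to sum these values.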
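To make this reading concrete, here is a minimal base R sketch of the rule for a single day. The helper name `desk_total_on` is hypothetical (it is not part of the original answer), and the data frame repeats only the columns of the question's data that the rule needs. It assumes both ends of the period are inclusive, which is what the expected output in the question implies.

```r
# Data from the question, reduced to the columns the rule needs.
df <- data.frame(
  Desk     = c("Desk1", "Desk2", "Desk3", "Desk3"),
  Value    = c(1, 2, 2, 2),
  Date     = as.Date(c("2020-07-30", "2020-08-13", "2020-08-13", "2020-08-13")),
  Out_date = as.Date(c("2020-08-17", "2020-08-14", "2020-08-15", "2020-08-16"))
)

# Hypothetical helper: total Value per desk on one day.
# A row counts when Date <= day <= Out_date (both ends inclusive, by assumption).
desk_total_on <- function(data, day) {
  active <- data$Date <= day & data$Out_date >= day
  # Multiplying by the TRUE/FALSE mask keeps desks with no active rows, as 0.
  tapply(data$Value * active, data$Desk, sum)
}

desk_total_on(df, as.Date("2020-08-15"))
```

On 2020-08-15 both Desk3 rows are still active, so their values add up to 4, while Desk2 has already ended and contributes 0.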

If my understanding is correct, then the following code will give you the relevant sums:

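library(dplyr)

df %>%
  # Recompute Days as the inclusive span from Date to Out_date
  mutate(Days = as.numeric(difftime(Out_date, Date, units = "day")) + 1) %>%
  # Add a zero-valued dummy row so the output columns run through 2020-08-18
  add_row(Index = max(df$Index) + 1, Date = max(df$Date), 
          Desk = "Desk1", Value = 0, Out_date = max(df$Date) + 1, 
          Days = 6) %>%
  # Number the rows and keep a copy of Days, because uncount() drops that column
  mutate(entry = seq(nrow(.)), n = Days) %>%
  # Repeat each row once per day it covers
  tidyr::uncount(Days) %>%
  group_by(entry) %>%
  # Give each repeated row its own consecutive calendar day
  mutate(Date_out = seq.Date(min(Date), length.out = max(n), by = "1 day")) %>%
  # Sum the values that are active at each desk on each day
  group_by(Desk, Date_out) %>%
  summarize(Value = sum(Value)) %>%
  # Spread the days into columns, then replace the missing combinations with 0
  tidyr::pivot_wider(names_from = "Date_out", values_from = "Value") %>%
  mutate_if(function(x) any(is.na(x)), function(x) replace(x, is.na(x), 0)) %>%
  as.data.frame()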

#>    Desk 2020-07-30 2020-07-31 2020-08-01 2020-08-02 2020-08-03 2020-08-04
#> 1 Desk1          1          1          1          1          1          1
#> 2 Desk2          0          0          0          0          0          0
#> 3 Desk3          0          0          0          0          0          0
#>   2020-08-05 2020-08-06 2020-08-07 2020-08-08 2020-08-09 2020-08-10 2020-08-11
#> 1          1          1          1          1          1          1          1
#> 2          0          0          0          0          0          0          0
#> 3          0          0          0          0          0          0          0
#>   2020-08-12 2020-08-13 2020-08-14 2020-08-15 2020-08-16 2020-08-17 2020-08-18
#> 1          1          1          1          1          1          1          0
#> 2          0          2          2          0          0          0          0
#> 3          0          4          4          4          2          0          0
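The output above covers every day from the earliest Date, whereas the question asks only for the next 5 days. As a rough base R sketch of that narrower result, assuming the forecast window starts the day after the latest Date in the data (2020-08-14 here), one could compute the per-desk sums directly for each day in the window. The names `days`, `desks` and `forecast` are illustrative and not part of the original answer.

```r
# Data from the question, reduced to the columns the calculation needs.
df <- data.frame(
  Desk     = c("Desk1", "Desk2", "Desk3", "Desk3"),
  Value    = c(1, 2, 2, 2),
  Date     = as.Date(c("2020-07-30", "2020-08-13", "2020-08-13", "2020-08-13")),
  Out_date = as.Date(c("2020-08-17", "2020-08-14", "2020-08-15", "2020-08-16"))
)

# Assumption: the window is the 5 days following the latest Date.
days  <- seq(max(df$Date) + 1, by = "day", length.out = 5)
desks <- sort(unique(df$Desk))

# One column per forecast day, one row per desk.
forecast <- sapply(seq_along(days), function(i) {
  # Rows whose period (Date through Out_date, inclusive) covers this day
  active <- df$Date <= days[i] & df$Out_date >= days[i]
  sapply(desks, function(d) sum(df$Value[active & df$Desk == d]))
})
dimnames(forecast) <- list(desks, format(days))

forecast
```

With the question's data this gives a 3 x 5 matrix whose rows match the desired table: Desk1 is 1 through 2020-08-17 and 0 afterwards, Desk2 is 2 only on 2020-08-14, and Desk3 is 4, 4, 2, 0, 0.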

Data from the question

df <- structure(list(Index = c(16L, 51L, 52L, 53L), Date = structure(c(18473, 
18487, 18487, 18487), class = "Date"), Desk = c("Desk1", "Desk2", 
"Desk3", "Desk3"), Value = c(1, 2, 2, 2), Out_date = structure(c(18491, 
18488, 18489, 18490), class = "Date"), Days = c(12L, 4L, 4L, 
4L)), row.names = c(NA, -4L), class = "data.frame")

Created on 2020-08-14 by the reprex package (v0.3.0)

