How to aggregate a data frame with a chr column in R

Problem description

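I am trying to build a data frame in R that aggregates three columns.

The first is a date, the second is a chr code, and the third is a num. For every promo_code, I need a new column indicating whether that promo code was used (1 = yes, 0 = no).

There are thousands of promo_codes, so I need something that scales on its own. I can't add the columns by hand.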

DATA:
+---------------+------------+--------------+
| Date          | Promo_Code | Sales_Amount | 
+---------------+------------+--------------+
| 10-29-20      |   today20  |   50         |  
+---------------+------------+--------------+
| 10-29-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-29-20      |   today20  |   50         |  
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-27-20      |   pc20     |   25         |
+---------------+------------+--------------+
| 10-28-20      |            |   50         |   
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-27-20      |            |   25         |
+---------------+------------+--------------+
| ....          |      ....  |   ....       |
+---------------+------------+--------------+
| ....          |      ....  |   ....       |
+---------------+------------+--------------+


NEW DATAFRAME
+---------------+------------+--------------+--------------+--------------+
|      Date     | today20    | vip20        |  pc20        | Sales Total  |
+---------------+------------+--------------+--------------+--------------+
| 10-29-20      |   1        |   1          |    0         |  150.00      |
+---------------+------------+--------------+--------------+--------------+
| 10-28-20      |   0        |   1          |    0         |  100.00      |
+---------------+------------+--------------+--------------+--------------+
| 10-27-20      |   0        |   0          |    1         |   25.00      |
+---------------+------------+--------------+--------------+--------------+
| ....          |      ....  |   ....       | ....         |   ....       |
+---------------+------------+--------------+--------------+--------------+
| ....          |      ....  |   ....       | ....         |   ....       |
+---------------+------------+--------------+--------------+--------------+

What I tried:

library(data.table)
out = setDT(data)[,lapply(.SD, sum), by=data$Date]

This fails with an error saying that type 'character' is not supported.

library(dplyr)
out = data %>% group_by(date) %>% summarise_each(funs(sum))

This gives me the same kind of error: invalid 'type' (character) of argument.

Tags: r

Solution


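Here is one approach: count the rows for each Promo_Code on a given date, then pivot wider so that each promo code becomes its own column. Aggregating by date then yields the total Sales_Amount and the per-code counts.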

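library(tidyverse)

df %>%
  # count rows and total the sales for each date/promo code pair
  group_by(Date, Promo_Code) %>%
  summarise(n = n(),
            Sales_Amount = sum(Sales_Amount)) %>%
  # spread the promo codes into their own columns; the result stays grouped by Date
  pivot_wider(id_cols = c(Date, Sales_Amount), names_from = Promo_Code,
              values_from = n, values_fill = 0) %>%
  # collapse to one row per date, summing the sales and the counts
  summarise(across(everything(), sum))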

# A tibble: 5 x 5
  Date       Sales_Amount  pc20 vip20 today20
  <date>            <dbl> <int> <int>   <int>
1 2020-10-27          115     1     1       0
2 2020-10-28           75     1     1       0
3 2020-10-29           60     1     1       0
4 2020-10-30          115     1     1       0
5 2020-10-31          100     1     0       1
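
The output above holds counts, whereas the question asked for 1/0 flags. Below is a minimal sketch of one way to get flags instead. It assumes that a blank Promo_Code means no promo was used and that those sales still count toward the daily total; the question does not settle either point. The `sales` data frame and the `used` and `Sales_Total` names are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data mirroring the rows shown in the question;
# "" marks a sale without a promo code (an assumption).
sales <- data.frame(
  Date = c("10-29-20", "10-29-20", "10-29-20", "10-28-20", "10-28-20",
           "10-27-20", "10-28-20", "10-28-20", "10-27-20"),
  Promo_Code = c("today20", "vip20", "today20", "vip20", "vip20",
                 "pc20", "", "vip20", ""),
  Sales_Amount = c(50, 50, 50, 50, 50, 25, 50, 50, 25),
  stringsAsFactors = FALSE
)

# Total sales per date, counting every row with or without a promo code.
totals <- sales %>%
  group_by(Date) %>%
  summarise(Sales_Total = sum(Sales_Amount), .groups = "drop")

# One 0/1 column per promo code: 1 if the code appears at least once that day.
flags <- sales %>%
  filter(Promo_Code != "") %>%
  distinct(Date, Promo_Code) %>%
  mutate(used = 1L) %>%
  pivot_wider(names_from = Promo_Code, values_from = used, values_fill = 0L)

# Dates with no promo code at all would come back as NA, so fill those with 0.
result <- totals %>%
  left_join(flags, by = "Date") %>%
  mutate(across(-c(Date, Sales_Total), ~ coalesce(.x, 0L)))

result
```

Because the promo code columns come from `names_from`, nothing has to be listed by hand, which is what the question needs with thousands of codes.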

Sample data:

library(lubridate)  # provides ymd(); library(tidyverse) only attaches it from tidyverse 2.0.0 on

set.seed(123)

df <- tibble(Date = rep(seq(ymd('2020-10-27'), ymd('2020-10-31'), by = '1 day'), 2),
             Promo_Code = sample(c("today20", "vip20", "pc20"), replace = TRUE, 10),
             Sales_Amount = sample(c(25, 35, 50, 65, 75), replace = TRUE, 10))

df
# A tibble: 10 x 3
   Date       Promo_Code Sales_Amount
   <date>     <chr>             <dbl>
 1 2020-10-27 pc20                 65
 2 2020-10-28 pc20                 25
 3 2020-10-29 pc20                 35
 4 2020-10-30 vip20                50
 5 2020-10-31 pc20                 75
 6 2020-10-27 vip20                50
 7 2020-10-28 vip20                50
 8 2020-10-29 vip20                25
 9 2020-10-30 pc20                 65
10 2020-10-31 today20              25
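
The first attempt in the question used data.table, where `lapply(.SD, sum)` fails because `.SD` includes the character Promo_Code column. A rough data.table equivalent might look like the sketch below. It sums only the numeric column and builds the 0/1 columns with `dcast`. The `dt` table and the `used` and `Sales_Total` names are hypothetical, and blank codes are again assumed to mean "no promo".

```r
library(data.table)

# Hypothetical data in the shape described in the question.
dt <- data.table(
  Date = c("10-29-20", "10-29-20", "10-28-20", "10-27-20", "10-27-20"),
  Promo_Code = c("today20", "vip20", "vip20", "pc20", ""),
  Sales_Amount = c(50, 50, 50, 25, 25)
)

# Sum only the numeric column instead of every column in .SD.
totals <- dt[, .(Sales_Total = sum(Sales_Amount)), by = Date]

# One row per date/code pair that occurs, flagged with 1.
pairs <- dt[Promo_Code != "", .(used = 1L), by = .(Date, Promo_Code)]

# Spread the codes into columns; pairs that never occur are filled with 0.
flags <- dcast(pairs, Date ~ Promo_Code, value.var = "used", fill = 0L)

# Keep every date, then zero out flags for dates that had no promo code at all.
result <- merge(totals, flags, by = "Date", all.x = TRUE)
for (col in setdiff(names(result), c("Date", "Sales_Total"))) {
  set(result, i = which(is.na(result[[col]])), j = col, value = 0L)
}

result
```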
