Reshaping a data table into a time series with a pivot-table-style reformat - adding levels to variable names

Problem description

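Using R, is there a way to reformat a dataset in a way similar to building a pivot table in Excel? My data has 5 variables in total. Three of the variables are Date, Channel and Category, and there are two measure variables, Views and Spend. Is there a way to produce time series data with the dates in the rows, and with a new variable generated automatically for each measure variable, Views and Spend, for every combination of Channel and Category? This question differs from others because I want the level names to become part of the variable names.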

The starting file looks like this

Date=c("01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020","08/01/2020","08/01/2020","15/01/2020","15/01/2020","15/01/2020","15/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","29/01/2020","29/01/2020","05/02/2020","05/02/2020","05/02/2020")
Channel=c("TV","TV","TV","Internet","TV","Internet","TV","Internet","TV","TV","Internet","TV","Internet","TV","TV","Internet","TV","TV","Internet","TV","Internet","TV","Internet")
Category=c("CatA","CatA","CatA","CatA","CatB","CatB","CatB","CatB","CatA","CatB","CatB","CatA","CatB","CatB","CatB","CatB","CatB","CatB","CatB","CatA","CatA","CatA","CatA")
Views=c(190,320,260,300,240,190,200,190,230,30,370,260,350,240,330,190,290,220,230,180,230,310,270)
Spend=c(34,63,46,53,21,23,17,24,20,5,50,42,46,39,44,31,72,54,58,22,29,41,36)
df <- data.frame(Date,Channel,Category,Views,Spend)
df

> df
         Date  Channel Category Views Spend
1  01/01/2020       TV     CatA   190    34
2  01/01/2020       TV     CatA   320    63
3  01/01/2020       TV     CatA   260    46
4  01/01/2020 Internet     CatA   300    53
5  01/01/2020       TV     CatB   240    21
6  08/01/2020 Internet     CatB   190    23
7  08/01/2020       TV     CatB   200    17
8  15/01/2020 Internet     CatB   190    24
9  15/01/2020       TV     CatA   230    20
10 15/01/2020       TV     CatB    30     5
11 15/01/2020 Internet     CatB   370    50
12 22/01/2020       TV     CatA   260    42
13 22/01/2020 Internet     CatB   350    46
14 22/01/2020       TV     CatB   240    39
15 22/01/2020       TV     CatB   330    44
16 22/01/2020 Internet     CatB   190    31
17 22/01/2020       TV     CatB   290    72
18 22/01/2020       TV     CatB   220    54
19 29/01/2020 Internet     CatB   230    58
20 29/01/2020       TV     CatA   180    22
21 05/02/2020 Internet     CatA   230    29
22 05/02/2020       TV     CatA   310    41
23 05/02/2020 Internet     CatA   270    36

I would like the reformatted data frame to look like this

Date=c("01/01/2020","08/01/2020","15/01/2020","22/01/2020","29/01/2020","05/02/2020")
TV.CatA.Views=c(770,0,230,260,180,310)
TV.CatB.Views=c(240,200,30,1080,0,0)
Internet.CatA.Views=c(300,0,0,0,0,500)
Internet.CatB.Views=c(0,190,560,540,230,0)
TV.CatA.Spend=c(143,0,20,42,22,41)
TV.CatB.Spend=c(21,17,5,209,0,0)
Internet.CatA.Spend=c(53,0,0,0,0,65)
Internet.CatB.Spend=c(0,23,74,77,58,0)
df_result <- data.frame(Date,TV.CatA.Views,TV.CatB.Views,Internet.CatA.Views,Internet.CatB.Views,TV.CatA.Spend,TV.CatB.Spend,Internet.CatA.Spend,Internet.CatB.Spend)
df_result 

> df_result 
        Date TV.CatA.Views TV.CatB.Views Internet.CatA.Views Internet.CatB.Views TV.CatA.Spend
1 01/01/2020           770           240                 300                   0           143
2 08/01/2020             0           200                   0                 190             0
3 15/01/2020           230            30                   0                 560            20
4 22/01/2020           260          1080                   0                 540            42
5 29/01/2020           180             0                   0                 230            22
6 05/02/2020           310             0                 500                   0            41
  TV.CatB.Spend Internet.CatA.Spend Internet.CatB.Spend
1            21                  53                   0
2            17                   0                  23
3             5                   0                  74
4           209                   0                  77
5             0                   0                  58
6             0                  65                   0

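The variable names do not need to be exactly as I specified above, as long as the levels can be identified from the variable name. At the moment I do this in Excel, but after doing more than 50 of these in a row I need to find a more efficient way. Thank you for taking the time to look at my question; any help is much appreciated.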

Tags: r, time-series, pivot-table, reshape, reformatting

Solution


Using the df you added, this code produces something similar to what you want:

library(tidyverse)
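# Sum Views and Spend within each Date/Channel/Category combination,
# then spread the Channel/Category combinations into one column per measure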
mdf <- df %>% group_by(Date,Channel,Category) %>% summarise_all(.funs = sum) %>%
  ungroup() %>% pivot_wider(names_from = c(Channel,Category),values_from = c(Views,Spend))

Output:

        Date Views_Internet_CatA Views_TV_CatA Views_TV_CatB Views_Internet_CatB Spend_Internet_CatA
1 01/01/2020                 300           770           240                  NA                  53
2 05/02/2020                 500           310            NA                  NA                  65
3 08/01/2020                  NA            NA           200                 190                  NA
4 15/01/2020                  NA           230            30                 560                  NA
5 22/01/2020                  NA           260          1080                 540                  NA
6 29/01/2020                  NA           180            NA                 230                  NA
  Spend_TV_CatA Spend_TV_CatB Spend_Internet_CatB
1           143            21                  NA
2            41            NA                  NA
3            NA            17                  23
4            20             5                  74
5            42           209                  77
6            22            NA                  58
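
Note that the output above leaves NA where a Channel/Category combination has no rows on a given date, and because Date is still text, 05/02/2020 sorts ahead of 08/01/2020. The following is a minimal sketch of one way to get closer to the layout requested in the question, not a definitive implementation. It assumes dplyr 1.0 or later (for across()) and a tidyr version whose pivot_wider() supports names_glue and values_fill; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Date = c("01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020","08/01/2020","08/01/2020","15/01/2020","15/01/2020","15/01/2020","15/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","29/01/2020","29/01/2020","05/02/2020","05/02/2020","05/02/2020"),
  Channel = c("TV","TV","TV","Internet","TV","Internet","TV","Internet","TV","TV","Internet","TV","Internet","TV","TV","Internet","TV","TV","Internet","TV","Internet","TV","Internet"),
  Category = c("CatA","CatA","CatA","CatA","CatB","CatB","CatB","CatB","CatA","CatB","CatB","CatA","CatB","CatB","CatB","CatB","CatB","CatB","CatB","CatA","CatA","CatA","CatA"),
  Views = c(190,320,260,300,240,190,200,190,230,30,370,260,350,240,330,190,290,220,230,180,230,310,270),
  Spend = c(34,63,46,53,21,23,17,24,20,5,50,42,46,39,44,31,72,54,58,22,29,41,36)
)

wide <- df %>%
  # Parse the day/month/year text so rows sort chronologically
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  group_by(Date, Channel, Category) %>%
  # Total each measure within every Date/Channel/Category combination
  summarise(across(c(Views, Spend), sum), .groups = "drop") %>%
  pivot_wider(
    names_from  = c(Channel, Category),
    values_from = c(Views, Spend),
    # Build names such as TV.CatA.Views instead of Views_TV_CatA
    names_glue  = "{Channel}.{Category}.{.value}",
    # Use 0 instead of NA for combinations with no rows on a date
    values_fill = 0
  ) %>%
  arrange(Date)

wide
```

Here Date becomes a Date column rather than text, so if the original dd/mm/yyyy strings are needed for display they can be recreated with format(wide$Date, "%d/%m/%Y").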
