Pivot dataset and column names [R]

Problem description

I have a dataset that I want to pivot.


dataset <- data.frame(date = c("01/01/2020","02/01/2020", "02/01/2020", "03/01/2020")
              , camp_type = c("acquisition", "acquisition", "newsletter", "acquisition")
              , channel_type = c("email", "direct_mail","email","email")
              , sent = c(100, 200, 50, 250)
              , open = c(30, NA, 14, 148)
              , click = c(14, NA, 1, 100)
)

Note: my real data has many more camp_type values than shown in this example.

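I want one row per day, with the rest of the information spread across separate columns: the sent, open and click columns should be renamed according to channel_type and camp_type (for example dm_acq_sent for direct_mail acquisition campaigns).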


I tried a rather inelegant, completely manual approach, but I get an error when renaming the variables (code below):

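library(dplyr)

# Manual approach for a single camp_type/channel_type combination
dataset %>%
  filter(camp_type == "acquisition" & channel_type == "direct_mail") %>%  # values are lowercase in the data
  rename(dm_acq_sent = sent,
         dm_acq_open = open,
         dm_acq_click = click)  # the column is `click`; `clicked` does not exist, which caused the error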

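Even once the renaming issue is fixed, the problem with this code is that it is very manual: I would have to repeat the same block for every combination, and someone would need to check regularly that no new combinations of camp_type and channel_type have appeared.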
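Checking which camp_type/channel_type combinations exist does not have to be done by eye. A minimal sketch, assuming dplyr is installed and reusing the question's dataset: count() returns one row per combination present in the data, together with how many rows it covers.

```r
library(dplyr)

# The question's example data
dataset <- data.frame(
  date = c("01/01/2020", "02/01/2020", "02/01/2020", "03/01/2020"),
  camp_type = c("acquisition", "acquisition", "newsletter", "acquisition"),
  channel_type = c("email", "direct_mail", "email", "email"),
  sent = c(100, 200, 50, 250),
  open = c(30, NA, 14, 148),
  click = c(14, NA, 1, 100)
)

# One row per camp_type/channel_type combination, with a row count `n`
combos <- dataset %>%
  count(camp_type, channel_type)

print(combos)
```

A reshaping approach that derives column names from the data (rather than hard-coding them) makes this check unnecessary, because new combinations simply become new columns.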

Any help or suggestions would be greatly appreciated.

Tags: r, pivot

Solution


You can use pivot_wider from tidyr:

library(tidyr)

pivot_wider(df, id_cols = date, names_from = c(camp_type, channel_type), values_from = c(sent, open, click))

Output

# A tibble: 3 x 10
  date       sent_acquisition… sent_acquisition_… sent_newsletter_… open_acquisitio… open_acquisition… open_newsletter… click_acquisiti… click_acquisitio… click_newslette…
  <date>                 <dbl>              <dbl>             <dbl>            <dbl>             <dbl>            <dbl>            <dbl>             <dbl>            <dbl>
1 2020-01-01               100                 NA                NA               30                NA               NA               14                NA               NA
2 2020-02-01                NA                200                50               NA                NA               14               NA                NA                1
3 2020-03-01               250                 NA                NA              148                NA               NA              100                NA               NA
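If you want names closer to the dm_acq_sent pattern from the question, pivot_wider also accepts a names_glue template (available in tidyr 1.1.0 and later), where {.value} stands for the measured column (sent, open, click) and the other placeholders are the names_from columns. A minimal sketch on the question's dataset, with the date left as a character string:

```r
library(tidyr)

# The question's example data
dataset <- data.frame(
  date = c("01/01/2020", "02/01/2020", "02/01/2020", "03/01/2020"),
  camp_type = c("acquisition", "acquisition", "newsletter", "acquisition"),
  channel_type = c("email", "direct_mail", "email", "email"),
  sent = c(100, 200, 50, 250),
  open = c(30, NA, 14, 148),
  click = c(14, NA, 1, 100)
)

# One row per date; one column per measure and channel/campaign combination,
# named e.g. direct_mail_acquisition_sent
wide <- pivot_wider(
  dataset,
  id_cols = date,
  names_from = c(channel_type, camp_type),
  values_from = c(sent, open, click),
  names_glue = "{channel_type}_{camp_type}_{.value}"
)

print(names(wide))
```

Short prefixes such as dm and acq are not produced automatically; the channel_type and camp_type values would have to be recoded to those abbreviations before pivoting.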

Data

df <- structure(list(date = structure(c(18262, 18293, 18293, 18322), class = "Date"), 
    camp_type = structure(c(1L, 1L, 2L, 1L), .Label = c("acquisition", 
    "newsletter"), class = "factor"), channel_type = structure(c(2L, 
    1L, 2L, 2L), .Label = c("direct_email", "email"), class = "factor"), 
    sent = c(100, 200, 50, 250), open = c(30, NA, 14, 148), click = c(14, 
    NA, 1, 100)), class = "data.frame", row.names = c(NA, -4L
))
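The data above stores date as a Date, whereas the question has character strings. A minimal sketch of the conversion with base R's as.Date(); the format string is an assumption, because "02/01/2020" is ambiguous. Month/day/year reproduces the Date values in the df above (2020-02-01), while day/month/year would give 2020-01-02.

```r
# Date strings as they appear in the question
dates <- c("01/01/2020", "02/01/2020", "02/01/2020", "03/01/2020")

# Assumption: month/day/year, matching the Date values in the answer's df
parsed <- as.Date(dates, format = "%m/%d/%Y")
print(parsed)

# Alternative if the strings are day/month/year
parsed_dmy <- as.Date(dates, format = "%d/%m/%Y")
print(parsed_dmy)
```

Note also that the df above labels one channel direct_email, while the question's data uses direct_mail, so the generated column names differ accordingly.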
