r - 如何在 R 中使用 Dataframes 根据日期时间对事件进行计数并按条件分组
问题描述
我正面临着通过对两个不同的数据集进行一些数据整理来构建表格的挑战。
DataSet A:有服装店的购买信息,变量有:客户姓名、购买日期、代理、t时间段内购买的产品。
df1 <- tibble::tribble(
~NAME, ~PRODUCT, ~AGENT, ~DATE_PURCHASE,
"Karen", "M_14", "X_1", "8-25-20021 18:21:28",
"Jean", "M_78", "X_3", "8-26-20021 18:11:06",
"Jean", "M_71", "X_4", "8-26-20021 18:21:01",
"Jean", "M_64", "X_4", "8-27-20021 20:21:59",
"Keith", "M_57", "X_4", "8-27-20021 20:21:02",
"Alba", "M_50", "X_1", "8-28-20021 20:21:03",
"Alba", "M_43", "X_3", "8-29-20021 20:21:04",
"Alex", "M_36", "X_2", "8-25-20021 20:21:05"
)
数据集B:有t时间段内拨打公司CX SERVICE专线的客户信息,存储客户姓名、通话日期、通话类型等变量。
df2 <- tibble::tribble(
~NAME, ~TYPE, ~DATE_OF_CALL,
"Karen", "COMPLAIN", "8-26-20021 18:21:28",
"Jean", "CX_SERVICE", "8-27-20021 18:11:06",
"Jean", "COMPLAIN", "8-28-20021 18:21:01",
"Jean", "CX_SERVICE", "8-29-20021 20:21:59",
"Keith", "CX_SERVICE", "8-29-20021 20:21:02",
"Alba", "COMPLAIN", "8-30-20021 20:21:03",
"Alex", "CX_SERVICE", "8-25-20021 21:21:05",
)
我必须使用以下内容构建一个数据集:我必须创建一个名为“x尝试”的新变量,它会让我知道这是否是在线客户的第一次、第二次、第三次等调用,以及每个客户将在收到的最后一个电话之前购买的最后一个产品带回桌面,包括电话的类型。我知道这可能听起来令人困惑,所以这里是所需表的示例:
NAME | x attempt | product | TYPE | DATE_CALL | DATE_PURCHASE |
Jean| | 3 | M_64 |CX_SERVICE | 8-29-20021 20:21:59 | 8-27-20021 20:21:59 |
这个结果是正确的,因为记录在案......这将是 Jean 的第三次电话,最后一次电话的类型是 CX_SERVICE 于 8-29-20021 20:21:59 和她购买的最后一个产品是 8-27-20021 20:21:59 上的 M_64。
解决方案
我相信您可能在年份上弄错了,所以我删除了年份(2021 年)中多余的零。我看到您正在使用 tibbles,因此我将提供一种tidyverse
解决此问题的方法。
提供的代码的想法是首先单独处理小标题,然后通过公分母将它们连接起来NAME
。这应该这样做:
library(dplyr)
df1 <- tibble::tribble(
~NAME, ~PRODUCT, ~AGENT, ~DATE_PURCHASE,
"Karen", "M_14", "X_1", "8-25-2021 18:21:28",
"Jean", "M_78", "X_3", "8-26-2021 18:11:06",
"Jean", "M_71", "X_4", "8-26-2021 18:21:01",
"Jean", "M_64", "X_4", "8-27-2021 20:21:59",
"Keith", "M_57", "X_4", "8-27-2021 20:21:02",
"Alba", "M_50", "X_1", "8-28-2021 20:21:03",
"Alba", "M_43", "X_3", "8-29-2021 20:21:04",
"Alex", "M_36", "X_2", "8-25-2021 20:21:05"
)
df2 <- tibble::tribble(
~NAME, ~TYPE, ~DATE_OF_CALL,
"Karen", "COMPLAIN", "8-26-2021 18:21:28",
"Jean", "CX_SERVICE", "8-27-2021 18:11:06",
"Jean", "COMPLAIN", "8-28-2021 18:21:01",
"Jean", "CX_SERVICE", "8-29-2021 20:21:59",
"Keith", "CX_SERVICE", "8-29-2021 20:21:02",
"Alba", "COMPLAIN", "8-30-2021 20:21:03",
"Alex", "CX_SERVICE", "8-25-2021 21:21:05",
)
(df1_mod <- df1 %>%
mutate(DATE_PURCHASE = as.POSIXct(DATE_PURCHASE, format = "%m-%d-%Y %H:%M:%S")) %>%
group_by(NAME) %>%
summarise(product = PRODUCT[DATE_PURCHASE == max(DATE_PURCHASE)], # retrieve product with the most recent date_purchase
DATE_PURCHASE = max(DATE_PURCHASE), # retrieve most recent date_purchase
.groups = "drop"))
#> # A tibble: 5 x 3
#> NAME product DATE_PURCHASE
#> <chr> <chr> <dttm>
#> 1 Alba M_43 2021-08-29 20:21:04
#> 2 Alex M_36 2021-08-25 20:21:05
#> 3 Jean M_64 2021-08-27 20:21:59
#> 4 Karen M_14 2021-08-25 18:21:28
#> 5 Keith M_57 2021-08-27 20:21:02
(df2_mod <- df2 %>%
mutate(DATE_OF_CALL = as.POSIXct(DATE_OF_CALL, format = "%m-%d-%Y %H:%M:%S")) %>%
group_by(NAME) %>%
summarise(`x attempt` = n(), # retrieve amount of calls, which is n() (the amount of rows in the group)
TYPE = TYPE[DATE_OF_CALL == max(DATE_OF_CALL)], # retrieve type of call from most recent call
DATE_OF_CALL = max(DATE_OF_CALL), # retrieve most recent date_of_call
.groups = "drop"))
#> # A tibble: 5 x 4
#> NAME `x attempt` TYPE DATE_OF_CALL
#> <chr> <int> <chr> <dttm>
#> 1 Alba 1 COMPLAIN 2021-08-30 20:21:03
#> 2 Alex 1 CX_SERVICE 2021-08-25 21:21:05
#> 3 Jean 3 CX_SERVICE 2021-08-29 20:21:59
#> 4 Karen 1 COMPLAIN 2021-08-26 18:21:28
#> 5 Keith 1 CX_SERVICE 2021-08-29 20:21:02
left_join(df1_mod, df2_mod, by = "NAME")
#> # A tibble: 5 x 6
#> NAME product DATE_PURCHASE `x attempt` TYPE DATE_OF_CALL
#> <chr> <chr> <dttm> <int> <chr> <dttm>
#> 1 Alba M_43 2021-08-29 20:21:04 1 COMPLAIN 2021-08-30 20:21:03
#> 2 Alex M_36 2021-08-25 20:21:05 1 CX_SERVICE 2021-08-25 21:21:05
#> 3 Jean M_64 2021-08-27 20:21:59 3 CX_SERVICE 2021-08-29 20:21:59
#> 4 Karen M_14 2021-08-25 18:21:28 1 COMPLAIN 2021-08-26 18:21:28
#> 5 Keith M_57 2021-08-27 20:21:02 1 CX_SERVICE 2021-08-29 20:21:02
Created on 2021-04-10 by the reprex package (v0.3.0)
推荐阅读
- java - 为什么 List.contains(Object) 的行为不同?
- r - 如何在同一个数据帧上做colsum和average
- powerbi - 使用 DAX 生成日期系列
- javascript - 仅在服务器上需要一个包
- java - Intellij IDEA 仅针对所有未提交的更改运行测试
- regex - 在换行符之前匹配字符,不包括空格?
- macos - MacOS - 使用带有launchd的脚本 - 启动,登录,注销,关闭?
- service-worker - Workbox 的服务人员在更改时未更新
- python - AllenNLP 共指分辨率的多 GPU 训练
- reactjs - 事件处理程序中带有 [name] 的 PrevState