r - Summarising a tibble into multi-row output
Question
Suppose I have the following tibble in R:
activation_date | country | campaign | revenue | users
======================================================
1 | 1 | 1 | R_1 | U_1
2 | 1 | 1 | R_2 | U_2
3 | 1 | 1 | R_3 | U_3
1 | 1 | 2 | R_4 | U_4
2 | 1 | 2 | R_5 | U_5
3 | 1 | 2 | R_6 | U_6
1 | 2 | 3 | R_7 | U_7
2 | 2 | 3 | R_8 | U_8
3 | 2 | 3 | R_9 | U_9
I want to group this tibble by country and summarise its data to get this tibble as output:
country | campaign | ltv
==========================
1 | 1 | ltv_1
1 | 2 | ltv_2
2 | 3 | ltv_3
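However, I want ltv_1 and ltv_2 to be computed jointly from all of R_1 through R_6 and U_1 through U_6, and ltv_3 to be computed from R_7 through R_9 and U_7 through U_9.
I can't group_by "country" and summarise, because that drops the "campaign" column, which I want to keep. But I also can't group_by both "country" and "campaign", because then I couldn't use the first three rows to help compute ltv_2, nor the next three rows to help compute ltv_1.
One possible approach is to group by "country" and use the group_modify function to produce the output tibble for each group. However, that function is marked "experimental", so I would rather not rely on it too heavily. Is there a different, established way to do this?
An example input tibble is: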
# A tibble: 9 x 5
activation_date country campaign revenue users
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 11
2 2 1 1 2 12
3 3 1 1 3 13
4 1 1 2 4 14
5 2 1 2 5 15
6 3 1 2 6 16
7 1 2 3 7 17
8 2 2 3 8 18
9 3 2 3 9 19
Its output is:
# A tibble: 3 x 3
country campaign ltv
<dbl> <dbl> <dbl>
1 1 1 0.213
2 1 2 0.296
3 2 3 0.444
The code that produces it using group_modify is:
library(dplyr)
library(tibble)

test_tibble <- tribble(
  ~activation_date, ~country, ~campaign, ~revenue, ~users,
  1, 1, 1, 1, 11,
  2, 1, 1, 2, 12,
  3, 1, 1, 3, 13,
  1, 1, 2, 4, 14,
  2, 1, 2, 5, 15,
  3, 1, 2, 6, 16,
  1, 2, 3, 7, 17,
  2, 2, 3, 8, 18,
  3, 2, 3, 9, 19
)

# Called once per country: returns one row per campaign in that country
test_function <- function(activation_date, campaign, revenue, users) {
  # revenue per user across every campaign in the country
  total_ltv <- sum(revenue) / sum(users)
  campaign_ltv <- double(0)
  campaign_names <- unique(campaign)
  for (cmp in campaign_names) {
    # revenue per user for this campaign only
    campaign_ltv <- c(campaign_ltv, sum(revenue[campaign == cmp]) / sum(users[campaign == cmp]))
  }
  # final ltv is the average of the campaign-level and country-level values
  return(tibble(campaign = campaign_names,
                ltv = campaign_ltv / 2 + total_ltv / 2))
}

test_tibble %>%
  group_by(country) %>%
  group_modify(~ test_function(.x$activation_date, .x$campaign, .x$revenue, .x$users)) %>%
  ungroup()
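As a quick sanity check on what test_function computes, the first value of the expected output (country 1, campaign 1) can be reproduced by hand. This is only a sketch in base R using the numbers from the example tibble; the variable names are made up for illustration.

```r
# Country 1, campaign 1: rows 1-3 of the example tibble
campaign_ltv <- sum(c(1, 2, 3)) / sum(c(11, 12, 13))   # 6 / 36

# Country 1, all campaigns: rows 1-6 of the example tibble
total_ltv <- sum(1:6) / sum(11:16)                      # 21 / 81

# test_function averages the two ratios
ltv_1 <- campaign_ltv / 2 + total_ltv / 2
round(ltv_1, 3)   # 0.213, matching the first row of the expected output
```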
Answer
Option 1:
A somewhat verbose but transparent approach is to use joins. Then again, given the amount of code in test_function, it is not really that verbose.
test_tibble %>%
  group_by(country, campaign) %>%
  summarise(campaign_ltv = sum(revenue) / sum(users)) %>%
  inner_join(
    test_tibble %>%
      group_by(country) %>%
      summarise(total_ltv = sum(revenue) / sum(users)),
    by = "country"
  ) %>%
  mutate(ltv = (total_ltv + campaign_ltv) / 2) %>%
  ungroup()
# A tibble: 3 x 5
country campaign campaign_ltv total_ltv ltv
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.167 0.259 0.213
2 1 2 0.333 0.259 0.296
3 2 3 0.444 0.444 0.444
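The same two-level calculation can also be written without a join, by first attaching the country-level ratio to every row with a grouped mutate and then summarising per campaign. The following is a sketch rather than part of the original answer; it assumes dplyr 1.0.0 or later (for the `.groups` argument), and the name `no_join_result` is made up for illustration.

```r
library(dplyr)
library(tibble)

test_tibble <- tribble(
  ~activation_date, ~country, ~campaign, ~revenue, ~users,
  1, 1, 1, 1, 11,
  2, 1, 1, 2, 12,
  3, 1, 1, 3, 13,
  1, 1, 2, 4, 14,
  2, 1, 2, 5, 15,
  3, 1, 2, 6, 16,
  1, 2, 3, 7, 17,
  2, 2, 3, 8, 18,
  3, 2, 3, 9, 19
)

no_join_result <- test_tibble %>%
  group_by(country) %>%
  # country-level ratio, repeated on every row of the country
  mutate(total_ltv = sum(revenue) / sum(users)) %>%
  # total_ltv is constant within a country, so grouping by it just carries it along
  group_by(country, campaign, total_ltv) %>%
  summarise(campaign_ltv = sum(revenue) / sum(users), .groups = "drop") %>%
  mutate(ltv = (total_ltv + campaign_ltv) / 2) %>%
  select(country, campaign, ltv)

no_join_result
```

This relies only on group_by, mutate and summarise, so it avoids both the join and group_modify.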
Option 2:
Wrap the output of test_function in a list so that it is stored as a nested tibble, then expand it with unnest.
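test_tibble %>%
  group_by(country) %>%
  mutate(
    ltv = list(test_function(activation_date, campaign, revenue, users))
  ) %>%
  select(country, ltv) %>%
  # every row of a country holds the same nested tibble, so keep only the first
  filter(row_number() == 1) %>%
  # tidyr 1.0.0 and later expect the columns to unnest to be named
  unnest(cols = ltv) %>%
  ungroup()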
# A tibble: 3 x 3
country campaign ltv
<dbl> <dbl> <dbl>
1 1 1 0.213
2 1 2 0.296
3 2 3 0.444
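A slightly shorter variant of the same idea is to build the list column with summarise instead of mutate, which already yields one row per country and so needs no filter step. This is a sketch under the assumption of dplyr 1.0.0 or later and tidyr 1.0.0 or later; test_function is repeated from the question (with the loop variable renamed) so the block runs on its own, and `nested_result` is a name made up here.

```r
library(dplyr)
library(tidyr)
library(tibble)

test_tibble <- tribble(
  ~activation_date, ~country, ~campaign, ~revenue, ~users,
  1, 1, 1, 1, 11,
  2, 1, 1, 2, 12,
  3, 1, 1, 3, 13,
  1, 1, 2, 4, 14,
  2, 1, 2, 5, 15,
  3, 1, 2, 6, 16,
  1, 2, 3, 7, 17,
  2, 2, 3, 8, 18,
  3, 2, 3, 9, 19
)

# Same logic as test_function in the question
test_function <- function(activation_date, campaign, revenue, users) {
  total_ltv <- sum(revenue) / sum(users)
  campaign_names <- unique(campaign)
  campaign_ltv <- double(0)
  for (cmp in campaign_names) {
    campaign_ltv <- c(campaign_ltv, sum(revenue[campaign == cmp]) / sum(users[campaign == cmp]))
  }
  tibble(campaign = campaign_names, ltv = campaign_ltv / 2 + total_ltv / 2)
}

nested_result <- test_tibble %>%
  group_by(country) %>%
  # one row per country, holding that country's result tibble in a list column
  summarise(
    ltv = list(test_function(activation_date, campaign, revenue, users)),
    .groups = "drop"
  ) %>%
  # expand the nested tibbles into country / campaign / ltv rows
  unnest(cols = ltv)

nested_result
```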
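Option 3:
Use tidyr::complete so that every campaign gets a copy of all revenue/users pairs in its country.
# df is the symbolic tibble from the question (values R_1 ... U_9)
df %>%
  group_by(country) %>%
  # complete within each country; newer tidyr releases reject completing
  # on the grouping column itself, so only campaign is listed here
  tidyr::complete(campaign, nesting(revenue, users)) %>%
  # dplyr 1.0.0 renamed the `add` argument to `.add`
  group_by(campaign, .add = TRUE)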
# now you have all revenue and users for each country-campaign
# for total_ltv: use revenue and users as is
# for campaign_ltv: use revenue and users where activation_date is not NA
# A tibble: 15 x 5
# Groups: country, campaign [3]
country campaign revenue users activation_date
<int> <int> <chr> <chr> <int>
1 1 1 R_1 U_1 1
2 1 1 R_2 U_2 2
3 1 1 R_3 U_3 3
4 1 1 R_4 U_4 NA
5 1 1 R_5 U_5 NA
6 1 1 R_6 U_6 NA
7 1 2 R_1 U_1 NA
8 1 2 R_2 U_2 NA
9 1 2 R_3 U_3 NA
10 1 2 R_4 U_4 1
11 1 2 R_5 U_5 2
12 1 2 R_6 U_6 3
13 2 3 R_7 U_7 1
14 2 3 R_8 U_8 2
15 2 3 R_9 U_9 3
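Demo with test_tibble:
test_tibble %>%
  group_by(country) %>%
  # complete within each country (the grouping column is not listed again)
  tidyr::complete(campaign, nesting(revenue, users)) %>%
  group_by(campaign, .add = TRUE) %>%
  summarise(
    # country-level ratio uses every row; campaign-level ratio uses only
    # the rows that originally belonged to the campaign
    ltv = sum(revenue) / sum(users) / 2 +
      sum(revenue[!is.na(activation_date)]) / sum(users[!is.na(activation_date)]) / 2
  ) %>%
  ungroup()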
# A tibble: 3 x 3
country campaign ltv
<dbl> <dbl> <dbl>
1 1 1 0.213
2 1 2 0.296
3 2 3 0.444