How to calculate a ratio within a variable by group

Problem description

I have a data frame built from the following vectors:

HouseholdID <- c("1", "2","2", "3", "3", "4")
ItemNo <- c("23", "25", "23", "26", "23", "24")
ItemPrice <- c(200, 300, 200, 500, 200, 900)
TotalHouseholdExpenses <- c(200, 500, 500, 700, 700, 900)

df  <- data.frame(HouseholdID, ItemNo, ItemPrice, TotalHouseholdExpenses)

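That is, household 1 bought item 23 and spent $200 in total, household 2 bought items 25 and 23 and spent $500 in total, and so on. If items 23 and 24 are treated as "temptation goods", I want to know what percentage of each household's total expenses is made up of temptation goods. How would I calculate this ratio and store it in a new variable? Thank you.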
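Written out, the requested ratio for a household h is its spending on the temptation items (T = {23, 24}) divided by its TotalHouseholdExpenses. The symbols p_{h,i} (price household h paid for item i) and E_h (TotalHouseholdExpenses of household h) are introduced here only for illustration; applying the formula by hand to the sample data gives the values a solution should reproduce:

```latex
\text{share}_h = \frac{\sum_{i \in T} p_{h,i}}{E_h}, \qquad T = \{23, 24\}

\text{share}_1 = \tfrac{200}{200} = 1, \quad
\text{share}_2 = \tfrac{200}{500} = 0.4, \quad
\text{share}_3 = \tfrac{200}{700} \approx 0.286, \quad
\text{share}_4 = \tfrac{900}{900} = 1
```

Multiplying by 100 gives the percentages (100%, 40%, about 28.6%, 100%).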

Tags: r

Solution


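If the ratio you are after is ItemPrice/TotalHouseholdExpenses for each item, reshaping with dcast may work for you; the columns for items 23 and 24 then hold the temptation-good shares: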

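library(data.table)
library(dplyr)  # provides %>%, mutate() and across()

dt <- data.table(HouseholdID, ItemNo, ItemPrice, TotalHouseholdExpenses)

# One column per item, then divide each item column by the household total
dcast(dt, HouseholdID + TotalHouseholdExpenses ~ ItemNo, value.var = 'ItemPrice') %>%
  mutate(across(`23`:`26`, ~ .x / TotalHouseholdExpenses))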

Or, without specifying the columns by hand:

# The first two columns are the ID and the total; the item columns follow them
dcast(dt, HouseholdID + TotalHouseholdExpenses ~ ItemNo, value.var = 'ItemPrice') %>%
  mutate(across(3:(2 + length(unique(dt$ItemNo))), ~ .x / TotalHouseholdExpenses))
output:
   HouseholdID TotalHouseholdExpenses        23 24  25        26
1:           1                    200 1.0000000 NA  NA        NA
2:           2                    500 0.4000000 NA 0.6        NA
3:           3                    700 0.2857143 NA  NA 0.7142857
4:           4                    900        NA  1  NA        NA
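To go from these per-item ratios to the single temptation-goods share the question asks for, one option is to sum the temptation columns row-wise. The sketch below is a data.table-only variant, not the answerer's code; it assumes, as stated in the question, that items 23 and 24 are the temptation goods, and the names `temptation`, `wide` and `TemptationShare` are chosen here for illustration.

```r
library(data.table)

# Sample data from the question
HouseholdID <- c("1", "2", "2", "3", "3", "4")
ItemNo <- c("23", "25", "23", "26", "23", "24")
ItemPrice <- c(200, 300, 200, 500, 200, 900)
TotalHouseholdExpenses <- c(200, 500, 500, 700, 700, 900)
dt <- data.table(HouseholdID, ItemNo, ItemPrice, TotalHouseholdExpenses)

# Assumption: these item numbers are the temptation goods (from the question).
# Each must appear in the data, otherwise .SDcols below cannot find the column.
temptation <- c("23", "24")

# Wide table: one column per item, NA where the household did not buy it
wide <- dcast(dt, HouseholdID + TotalHouseholdExpenses ~ ItemNo,
              value.var = "ItemPrice")

# Sum the temptation columns row-wise (NA treated as 0), divide by the total
wide[, TemptationShare := rowSums(.SD, na.rm = TRUE) / TotalHouseholdExpenses,
     .SDcols = temptation]

print(wide)
```

Summing the prices before dividing is equivalent to adding the per-item ratios from the table above, because every item in a row is divided by the same household total.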

Finally, with the ratios formatted as percentages:

library(data.table)
library(dplyr)   # %>%, mutate(), across()
library(scales)  # label_percent()

dt <- data.table(HouseholdID, ItemNo, ItemPrice, TotalHouseholdExpenses)

# Same reshaping, but each ratio is converted to a percentage string
dcast(dt, HouseholdID + TotalHouseholdExpenses ~ ItemNo, value.var = 'ItemPrice') %>%
  mutate(across(3:(2 + length(unique(dt$ItemNo))), ~ label_percent()(.x / TotalHouseholdExpenses)))


output:
   HouseholdID TotalHouseholdExpenses   23   24   25   26
1:           1                    200 100% <NA> <NA> <NA>
2:           2                    500  40% <NA>  60% <NA>
3:           3                    700  29% <NA> <NA>  71%
4:           4                    900 <NA> 100% <NA> <NA>
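If you would rather keep the original one-row-per-purchase layout and simply add the ratio as a new variable, a base R sketch with `ave()` is another option. This is an alternative technique, not part of the answer above; it assumes items 23 and 24 are the temptation goods, and the column names `TemptationSpend`, `TemptationShare` and `TemptationPct` are hypothetical.

```r
# Sample data from the question
HouseholdID <- c("1", "2", "2", "3", "3", "4")
ItemNo <- c("23", "25", "23", "26", "23", "24")
ItemPrice <- c(200, 300, 200, 500, 200, 900)
TotalHouseholdExpenses <- c(200, 500, 500, 700, 700, 900)
df <- data.frame(HouseholdID, ItemNo, ItemPrice, TotalHouseholdExpenses)

# Assumption: these item numbers are the temptation goods (from the question)
temptation <- c("23", "24")

# Price counted only for temptation items, 0 for everything else
temptation_price <- ifelse(df$ItemNo %in% temptation, df$ItemPrice, 0)

# Household-level sum, repeated on every row belonging to that household
df$TemptationSpend <- ave(temptation_price, df$HouseholdID, FUN = sum)

# New variable: share of the household total spent on temptation goods
df$TemptationShare <- df$TemptationSpend / df$TotalHouseholdExpenses

# Optional percentage label with one decimal place
df$TemptationPct <- sprintf("%.1f%%", 100 * df$TemptationShare)

print(df)
```

Because the data stay in long form, each household's share is repeated on all of its rows; `unique(df[c("HouseholdID", "TemptationShare")])` collapses it to one row per household.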
