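r - Using select, group_by, and mutate to sum across rows within groups with dplyr
Problem
Problem: I am building a total market share variable for a car market in which 286 different car models were sold, 501 cars in total. The group share is based only on car characteristics: cat = "compact", "midsize", "large" and yr = 77, 78, 79, 80, 81, plus the share s, a small double; there are 15 groups in the market.
The closest answer I found: mishabalyasin on community.rstudio, "Calculating rowwise totals and proportions using tidyeval?" (link to the post on community.rstudio).
Applying the select-split-combine principle, the closest I have come to the right answer is a table of the 15 groups (15 x 3: cat, yr, s):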
df <- blp %>%
  select(cat, yr, s) %>%
  group_by(cat, yr) %>%
  summarise(group_share = sum(s))
# in my actual data, this is what fills in the group share to get what I want, but this isn't the desired pipeline-based answer
blp$group_share <- 0 # initializing group_share, the 50th column
for (i in 1:501) {
  for (j in 1:15) {
    if ((blp[i, 31] == df[j, 1]) && (blp[i, 3] == df[j, 2])) { # if (sameCat & sameYr) {blpGS = dfGS}
      blp[i, 50] <- df[j, 3]
    }
  }
}
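This works, but I know it can be done in one fell swoop... Hopefully the idea is clear from what I described above. A simple fix might be a loop with conditions on cat and yr, which would help, but I really want to get better at handling data with dplyr, so any insight toward a pipeline answer along these lines would be wonderful.
Example for the site: the example below does not work with the code I provided, but it is what my data "looks like". There is a problem with the share being a factor.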
#45 obs, 3 cats, 5 yrs
cat=c( "compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large")
yr=c(77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81)
s=c(.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002)
blp=as.data.frame(cbind(unlist(lapply(cat,as.character,stringsAsFactors=FALSE)),as.numeric(yr),unlist(as.numeric(s))))
names(blp)<-c("cat","yr","s")
head(blp)
#note: one example of a group share would be summing the share from
(group_share.blp.large.81.s=(blp[cat== "large" &yr==81,]))
#works thanks to akrun: applying the code I provided for what leads to the 15 groups
df <- blp %>%
select(cat,yr,s) %>%
group_by(cat,yr) %>%
summarise(group_share = sum(as.numeric(as.character(s))))
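# manual fill: this is what I'd want if I didn't want pipelining
blp$group_share <- 0
for (i in 1:45) {
  for (j in 1:15) { # loop over the 15 (cat, yr) groups in df
    # if (sameCat & sameYr) {blpGS = dfGS}; as.character() guards against factor columns
    if (as.character(blp[i, 1]) == as.character(df[[1]][j]) &&
        as.numeric(as.character(blp[i, 2])) == as.numeric(as.character(df[[2]][j]))) {
      blp[i, 4] <- df[[3]][j]
    }
  }
}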
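On the factor issue mentioned above: cbind() builds a matrix, and a matrix can hold only one type, so mixing text with numbers coerces every column to character (which as.data.frame() then converted to factors by default in R versions before 4.0). A minimal sketch on toy values, where cat_toy, s_toy, via_cbind and via_df are illustrative names that are not part of the original post:

```r
# Toy values, not the poster's data
cat_toy <- c("compact", "midsize")
s_toy   <- c(0.001, 0.0005)

# cbind() produces a character matrix, so the share column is no longer numeric
via_cbind <- as.data.frame(cbind(cat_toy, s_toy), stringsAsFactors = FALSE)

# data.frame() keeps each column's own type
via_df <- data.frame(cat = cat_toy, s = s_toy, stringsAsFactors = FALSE)

class(via_cbind$s_toy)  # character
class(via_df$s)         # numeric

# If the data already arrived as text, convert before summing
total_share <- sum(as.numeric(via_cbind$s_toy))
```

Building the frame with data.frame() directly avoids the need for as.numeric(as.character(...)) inside the summary step.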
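Solution
If I understood your question correctly, this should help! The only difference here is that you use mutate, which keeps the original columns and adds the aggregated column to them, instead of summarise, which returns only the grouping columns and the summary column.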
# Sample input
## 45 obs, 3 cats, 5 yrs
cat <- c( "compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large")
yr <- c(77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81)
s <- c(.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002)
# Calculation
library(dplyr) # provides %>%, group_by, mutate and ungroup

blp <-
  data.frame(cat, yr, s, stringsAsFactors = FALSE) %>% # To create dataframe
  group_by(cat, yr) %>% # Grouping by category and year
  mutate(group_share = sum(s, na.rm = TRUE)) %>% # Calculating sum share per category/year
  ungroup()
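For comparison, the loop in the question can also be written as a summarise followed by a join, which copies each group's total back onto every matching row. The sketch below is only an illustration on made-up values: toy, group_totals, via_join and via_mutate are hypothetical names, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Small illustrative data set (toy values, not the poster's data)
toy <- data.frame(
  cat = c("compact", "compact", "large", "large", "large"),
  yr  = c(77, 77, 77, 81, 81),
  s   = c(0.001, 0.002, 0.0005, 0.0001, 0.0002),
  stringsAsFactors = FALSE
)

# One row per (cat, yr) group, as in the question's df
group_totals <- toy %>%
  group_by(cat, yr) %>%
  summarise(group_share = sum(s, na.rm = TRUE), .groups = "drop")

# Copy each group's total back onto every matching row (replaces the double loop)
via_join <- toy %>%
  left_join(group_totals, by = c("cat", "yr"))

# The grouped mutate gets there in a single step
via_mutate <- toy %>%
  group_by(cat, yr) %>%
  mutate(group_share = sum(s, na.rm = TRUE)) %>%
  ungroup()

# Both approaches should agree row by row
all.equal(via_join$group_share, via_mutate$group_share)
```

The join version is handy when the group totals are needed as a table in their own right; otherwise the grouped mutate is shorter.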