r - Group by multiple variables and summarize character frequencies
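Question
I am trying to group my dataset by multiple variables and build a frequency table of how often the values of a character variable occur. Here is a sample dataset (Location is empty in some rows):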
Location  State     County  Job       Pet
          Ohio      Miami   Data      Dog
Urban     Ohio      Miami   Business  Dog, Cat
Urban     Ohio      Miami   Data      Cat
Rural     Kentucky  Clark   Data      Cat, Fish
City      Indiana   Shelby  Business  Dog
Rural     Kentucky  Clark   Data      Dog, Fish
          Ohio      Miami   Data      Dog, Cat
Urban     Ohio      Miami   Business  Dog, Cat
Rural     Kentucky  Clark   Data      Fish
City      Indiana   Shelby  Business  Cat
I would like my output to look like this:
Location  State     County  Job       Frequency  Pet:Cat  Pet:Dog  Pet:Fish
          Ohio      Miami   Data      2          1        2        0
Urban     Ohio      Miami   Business  2          2        2        0
Urban     Ohio      Miami   Data      1          1        0        0
Rural     Kentucky  Clark   Data      3          1        1        3
City      Indiana   Shelby  Business  2          1        1        0
I have tried different variations of the following code. I got close, but it is not quite right:
Output<-df%>%group_by(Location, State, County, Job)%>%
  dplyr::summarise(
    Frequency= dplyr::n(),
    Pet:Cat = count(str_match(Pet, "Cat")),
    Pet:Dog = count(str_match(Pet, "Dog")),
    Pet:Fish = count(str_match(Pet, "Fish")),
  )
Any help would be appreciated! Thanks in advance.
Solution
Try this:
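library(dplyr)
library(tidyr)
# Code
new <- df %>%
  # one row per pet: "Dog, Cat" becomes "Dog" and " Cat"
  separate_rows(Pet, sep = ',') %>%
  # drop the space left after each comma
  mutate(Pet = trimws(Pet)) %>%
  # count each pet within each Location/State/County/Job group
  group_by(Location, State, County, Job, Pet) %>%
  summarise(N = n()) %>%
  # prefix the pet names so the new columns become Pet:Cat, Pet:Dog, ...
  mutate(Pet = paste0('Pet:', Pet)) %>%
  group_by(Location, State, County, Job, .drop = FALSE) %>%
  # rows per group at this point, i.e. one row per distinct pet
  mutate(Freq = n()) %>%
  # one column per pet; combinations that never occur are filled with 0
  pivot_wider(names_from = Pet, values_from = N, values_fill = 0)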
Output:
# A tibble: 5 x 8
# Groups: Location, State, County, Job [5]
Location State County Job Freq `Pet:Cat` `Pet:Dog` `Pet:Fish`
<chr> <chr> <chr> <chr> <int> <int> <int> <int>
1 "" Ohio Miami Data 2 1 2 0
2 "City" Indiana Shelby Business 2 1 1 0
3 "Rural" Kentucky Clark Data 3 1 1 3
4 "Urban" Ohio Miami Business 2 2 2 0
5 "Urban" Ohio Miami Data 1 1 0 0
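One caveat about `Freq`: in the pipeline above, `mutate(Freq = n())` runs after `summarise()` has already collapsed the data to one row per pet type, so it counts the distinct pets in each group rather than the original rows. For this sample the two numbers happen to coincide. A minimal sketch of a variant that counts the original rows, assuming the same dplyr/tidyr packages as the answer, moves the row count before the `Pet` strings are split by using `add_count()`:

```r
library(dplyr)
library(tidyr)

# Sample data (same values as the structure() call below)
df <- data.frame(
  Location = c("", "Urban", "Urban", "Rural", "City",
               "Rural", "", "Urban", "Rural", "City"),
  State  = c("Ohio", "Ohio", "Ohio", "Kentucky", "Indiana",
             "Kentucky", "Ohio", "Ohio", "Kentucky", "Indiana"),
  County = c("Miami", "Miami", "Miami", "Clark", "Shelby",
             "Clark", "Miami", "Miami", "Clark", "Shelby"),
  Job    = c("Data", "Business", "Data", "Data", "Business",
             "Data", "Data", "Business", "Data", "Business"),
  Pet    = c("Dog", "Dog, Cat", "Cat", "Cat, Fish", "Dog",
             "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat"),
  stringsAsFactors = FALSE
)

new <- df %>%
  # Freq = number of original rows per group, counted before any splitting
  add_count(Location, State, County, Job, name = "Freq") %>%
  # "Dog, Cat" becomes two rows; trimws() removes the space after the comma
  separate_rows(Pet, sep = ",") %>%
  mutate(Pet = paste0("Pet:", trimws(Pet))) %>%
  # N = number of rows in each group that mention each pet
  count(Location, State, County, Job, Freq, Pet, name = "N") %>%
  # one column per pet; combinations that never occur become 0
  pivot_wider(names_from = Pet, values_from = N, values_fill = 0)
```

Because `count()` returns an ungrouped tibble, the result has no `# Groups:` header; the counts for this sample are the same as in the output above, but `Freq` now stays correct when a group repeats the same pet.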
Data used:
#Data
df <- structure(list(Location = c("", "Urban", "Urban", "Rural", "City",
"Rural", "", "Urban", "Rural", "City"), State = c("Ohio", "Ohio",
"Ohio", "Kentucky", "Indiana", "Kentucky", "Ohio", "Ohio", "Kentucky",
"Indiana"), County = c("Miami", "Miami", "Miami", "Clark", "Shelby",
"Clark", "Miami", "Miami", "Clark", "Shelby"), Job = c("Data",
"Business", "Data", "Data", "Business", "Data", "Data", "Business",
"Data", "Business"), Pet = c("Dog", "Dog, Cat", "Cat", "Cat, Fish",
"Dog", "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat")), row.names = c(NA,
-10L), class = "data.frame")
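The `summarise()` attempt from the question is also close. Two changes make it run: names such as `Pet:Cat` are not syntactic R names, so they must be wrapped in backticks, and a per-group count of matching rows is `sum(grepl(...))`, because `count()` is a data-frame verb and cannot take the output of `str_match()`. A minimal sketch, assuming dplyr is installed; `fixed = TRUE` makes `grepl()` a plain substring test:

```r
library(dplyr)

# Sample data (same values as the structure() call above)
df <- data.frame(
  Location = c("", "Urban", "Urban", "Rural", "City",
               "Rural", "", "Urban", "Rural", "City"),
  State  = c("Ohio", "Ohio", "Ohio", "Kentucky", "Indiana",
             "Kentucky", "Ohio", "Ohio", "Kentucky", "Indiana"),
  County = c("Miami", "Miami", "Miami", "Clark", "Shelby",
             "Clark", "Miami", "Miami", "Clark", "Shelby"),
  Job    = c("Data", "Business", "Data", "Data", "Business",
             "Data", "Data", "Business", "Data", "Business"),
  Pet    = c("Dog", "Dog, Cat", "Cat", "Cat, Fish", "Dog",
             "Dog, Fish", "Dog, Cat", "Dog, Cat", "Fish", "Cat"),
  stringsAsFactors = FALSE
)

Output <- df %>%
  group_by(Location, State, County, Job) %>%
  summarise(
    Frequency  = n(),                                    # rows in the group
    `Pet:Cat`  = sum(grepl("Cat", Pet, fixed = TRUE)),   # rows mentioning Cat
    `Pet:Dog`  = sum(grepl("Dog", Pet, fixed = TRUE)),   # rows mentioning Dog
    `Pet:Fish` = sum(grepl("Fish", Pet, fixed = TRUE)),  # rows mentioning Fish
    .groups = "drop"
  )
```

The trade-off is that each pet needs its own line here, whereas the `separate_rows()`/`pivot_wider()` answer discovers the pet types from the data. A substring test would also match a longer name that contains the pet (for example a hypothetical "Catfish"), so split the strings first if such values can occur.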