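r - Frequency by group
Problem description
I have a data frame with the variable day (weekday, from 1 to 7) and time variables t1 to t7, which record the activity performed at each time slot.
For each time slot, I want to determine which activity occurs most often across the 7 weekdays. If several activities are tied, all of them should be listed.
Input: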
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
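For reproducibility, the input above can be loaded into a data frame. This is a minimal sketch using base R's read.table(); the name df is an assumption chosen to match the code in the answer.

```r
# Build the example input as a data frame.
# The object name `df` is assumed so that it matches the answer's code.
df <- read.table(
  text = "day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1",
  header = TRUE
)

print(df)
```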
Output:
time Most frequent
t1 1
t2 0,1,3
t3 1
t4 0
t5 0
t6 0
t7 1
Solution
Here is a dplyr solution (pivot_longer() comes from tidyr):
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-day) %>%
  group_by(name, value) %>%
  distinct() %>%
  mutate(freq = n()) %>%
  group_by(name) %>%
  filter(freq == max(freq)) %>%
  select(name, value) %>%
  distinct() %>%
  group_by(name) %>%
  summarise(`Most frequent` = paste(value, collapse = ",")) %>%
  rename(time = name)
This gives:
time `Most frequent`
<chr> <chr>
1 t1 1
2 t2 0,3,1
3 t3 1
4 t4 0
5 t5 0
6 t6 0
7 t7 1
Here is the same code with comments:
df %>%
  pivot_longer(-day) %>%                # Reshape to long format: one row per day, time slot (name) and activity (value)
  group_by(name, value) %>%             # Group by time slot (t#) and activity
  distinct() %>%                        # Drop duplicate rows, so each day counts at most once per time slot + activity
  mutate(freq = n()) %>%                # Count the distinct days on which each time slot + activity pair occurs
  group_by(name) %>%                    # Group by time slot
  filter(freq == max(freq)) %>%         # Keep only the most frequent activities (ties are all kept)
  select(name, value) %>%               # Keep only the columns name and value
  distinct() %>%                        # Keep one row per time slot + activity
  group_by(name) %>%                    # Group by time slot again
  summarise(`Most frequent` = paste(value, collapse = ",")) %>%  # Collapse tied activities into one comma-separated string
  rename(time = name)                   # Rename the column name to time
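The same steps can probably be written more compactly with count(). The following is a sketch of an alternative, not part of the original answer; the column names time, activity and freq are chosen here for illustration, and the input is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example input from the question (object name `df` assumed).
df <- read.table(
  text = "day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1",
  header = TRUE
)

result <- df %>%
  # Long format: one row per day, time slot and activity
  pivot_longer(-day, names_to = "time", values_to = "activity") %>%
  # Drop duplicate (day, time, activity) rows, as in the answer above
  distinct() %>%
  # Count distinct days per time slot and activity; count() sorts by both
  count(time, activity, name = "freq") %>%
  group_by(time) %>%
  # Keep the most frequent activities, including ties
  filter(freq == max(freq)) %>%
  # Collapse tied activities into one comma-separated string
  summarise(`Most frequent` = paste(activity, collapse = ","))

print(result)
```

Because count() returns its rows sorted by time and activity, tied activities are pasted in ascending order, so t2 comes out as 0,1,3, the order shown in the question's desired output.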