Frequency by group

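Problem description

I have a data frame containing the variable day (weekdays from 1 to 7) and the time variables t1 to t7, which record the activity performed at a given time.

For each time slot, I want to determine which activity occurs most often across the 7 weekdays.

Input: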

day t1 t2 t3 t4 t5 t6 t7
   1  1  0  1  0  0  0  1
   1  1  0  1  0  4  0  1
   4  2  3  1  0  1  0  1
   5  1  1  1  0  0  0  1

Output:

time   Most frequent
t1     1    
t2     0,1,3       
t3     1
t4     0
t5     0
t6     0
t7     1

Tags: r, dataframe

Solution


Here is a dplyr solution (pivot_longer() comes from tidyr):

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-day) %>% 
  group_by(name,value) %>% 
  distinct() %>% 
  mutate(freq = n()) %>% 
  group_by(name) %>% 
  filter(freq == max(freq)) %>% 
  select(name, value) %>% 
  distinct() %>% 
  group_by(name) %>% 
  summarise(`Most frequent` = paste(value, collapse = ",")) %>% 
  rename(time = name)

This gives:

  time  `Most frequent`
  <chr> <chr>          
1 t1    1              
2 t2    0,3,1          
3 t3    1              
4 t4    0              
5 t5    0              
6 t6    0              
7 t7    1 
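
The t2 row shows why the pipeline calls distinct() before counting. The sketch below rebuilds the question's input by hand (the data.frame() call and the names long, raw_counts and day_counts are illustrative, not part of the original answer) and compares raw row counts with per-day counts for t2:

```r
library(dplyr)
library(tidyr)

# Input from the question, rebuilt by hand for illustration
df <- data.frame(
  day = c(1, 1, 4, 5),
  t1  = c(1, 1, 2, 1),
  t2  = c(0, 0, 3, 1),
  t3  = c(1, 1, 1, 1),
  t4  = c(0, 0, 0, 0),
  t5  = c(0, 4, 1, 0),
  t6  = c(0, 0, 0, 0),
  t7  = c(1, 1, 1, 1)
)

# Long format with columns day, name (time slot) and value (activity)
long <- pivot_longer(df, -day)

# Raw row counts for t2: activity 0 appears in both rows of day 1
raw_counts <- long %>%
  filter(name == "t2") %>%
  count(value, name = "rows")

# Per-day counts for t2: duplicate (day, name, value) rows are dropped first
day_counts <- long %>%
  filter(name == "t2") %>%
  distinct(day, name, value) %>%
  count(value, name = "days")

print(raw_counts)
print(day_counts)
```

Counted by rows, activity 0 would win t2 on its own. Counted by distinct days, activities 0, 1 and 3 each occur on one day, which is why all three appear in the result.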

Here is the same code with comments:

df %>% 
  pivot_longer(-day) %>% # Reshape to long format: one row per day, time slot (name) and activity (value)
  group_by(name, value) %>% # Group by time slot (t#) and activity
  distinct() %>% # Drop duplicate day + time slot + activity rows, so each activity counts once per day
  mutate(freq = n()) %>% # Count the distinct days on which each activity occurs in each time slot
  group_by(name) %>% # Regroup by time slot
  filter(freq == max(freq)) %>% # Keep only the most frequent activities (ties are kept)
  select(name, value) %>% # Keep only the name and value columns
  distinct() %>% # Keep one row per time slot and activity
  group_by(name) %>% # Group by time slot again
  summarise(`Most frequent` = paste(value, collapse = ",")) %>% # Collapse tied activities into one comma-separated string per time slot
  rename(time = name) # Rename the name column to time
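
As a possible variation rather than the answer's own method, the same counting can be sketched more compactly with distinct() followed by count(). The data.frame() call and the column names time, activity and days are assumptions made for illustration:

```r
library(dplyr)
library(tidyr)

# Input from the question, rebuilt by hand for illustration
df <- data.frame(
  day = c(1, 1, 4, 5),
  t1  = c(1, 1, 2, 1),
  t2  = c(0, 0, 3, 1),
  t3  = c(1, 1, 1, 1),
  t4  = c(0, 0, 0, 0),
  t5  = c(0, 4, 1, 0),
  t6  = c(0, 0, 0, 0),
  t7  = c(1, 1, 1, 1)
)

result <- df %>%
  pivot_longer(-day, names_to = "time", values_to = "activity") %>%
  distinct(day, time, activity) %>%         # one row per day, time slot and activity
  count(time, activity, name = "days") %>%  # number of distinct days per slot and activity
  group_by(time) %>%
  filter(days == max(days)) %>%             # keep the most frequent activities, ties included
  summarise(`Most frequent` = paste(activity, collapse = ","))

print(result)
```

Because count() returns its rows sorted by time and activity, tied activities are pasted in ascending order, so t2 comes out as 0,1,3 as in the question's desired output rather than in order of appearance.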
