Reorder a factor by a group statistic

Question

I know this should be straightforward, but it always bites me.
Suppose I have a factor:

library(dplyr)
library(forcats)
fruits <- as.factor(c("apples", "oranges", "oranges", "pears", "pears", "pears"))
df <- as.data.frame(fruits)

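I want to reorder the factor levels by their frequency (or some other summary statistic), so that pears > oranges > apples. How can I do this without setting the order explicitly? This attempt:

df %>% group_by(fruits) %>% summarise(freq = n()) %>% fct_reorder(fruits, freq, .desc = TRUE)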

Tags: r, dplyr, tidyverse, forcats

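Answer

We may need to call fct_reorder inside mutate, because it expects a factor as its first argument rather than a data frame: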

library(dplyr)
library(forcats)
out <- df %>% 
   group_by(fruits) %>% 
   summarise(freq = n(), .groups = 'drop') %>% 
   mutate(fruits = fct_reorder(fruits, freq, .desc = TRUE))

Check the order of the levels:

levels(out$fruits)
[1] "pears"   "oranges" "apples" 
levels(df$fruits)
[1] "apples"  "oranges" "pears"  

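To do this on the original dataset instead of a summarised one, use add_count to create a frequency column (named n by default), apply fct_reorder, and then drop the helper column: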

df <- df %>% 
    add_count(fruits) %>% 
    mutate(fruits = fct_reorder(fruits, n, .desc = TRUE)) %>% 
    select(-n)
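
If frequency is the only statistic needed, forcats also provides fct_infreq, which orders levels by number of observations (most frequent first) without a helper count column. A minimal sketch, assuming the same example values as above; by_freq is a name introduced here for illustration:

```r
library(forcats)

# Same example values as in the question
fruits <- factor(c("apples", "oranges", "oranges", "pears", "pears", "pears"))

# fct_infreq() reorders levels by how often each one occurs, most frequent first
by_freq <- fct_infreq(fruits)
levels(by_freq)
```

Inside a pipeline this would be used the same way as fct_reorder, for example mutate(fruits = fct_infreq(fruits)).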

Note: group_by in dplyr version 1.0.6 does not have a .desc argument. .desc is an argument of fct_reorder.
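
The question also mentions ordering by some other statistic. fct_reorder accepts a summary function through its .fun argument (the default is median), so a count column is not required. The sketch below assumes a hypothetical weight column with made-up values; df2 and weight are not part of the original example:

```r
library(dplyr)
library(forcats)

# Hypothetical data: 'weight' values are invented purely for illustration
df2 <- data.frame(
  fruits = factor(c("apples", "oranges", "oranges", "pears", "pears", "pears")),
  weight = c(150, 200, 220, 180, 170, 160)
)

# Order the levels by mean weight per fruit, largest mean first
df2 <- df2 %>%
  mutate(fruits = fct_reorder(fruits, weight, .fun = mean, .desc = TRUE))

levels(df2$fruits)
```

With these invented values the group means are 150 (apples), 210 (oranges) and 170 (pears), so the levels come out as oranges, pears, apples.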


In base R, we can do this with table:

out1 <- table(fruits)
factor(fruits, levels = names(out1[order(-out1)]))
[1] apples  oranges oranges pears   pears   pears  
Levels: pears oranges apples
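
The same base R idea can be applied to a data frame column. This is only a sketch: df3 and counts are names introduced here, and sort(..., decreasing = TRUE) is used in place of order(-out1) as an equivalent way to rank the counts:

```r
# Hypothetical data frame (df3) mirroring the example data
df3 <- data.frame(
  fruits = factor(c("apples", "oranges", "oranges", "pears", "pears", "pears"))
)

# Count each level, sort the counts in decreasing order,
# then reuse the sorted names as the new level order
counts <- sort(table(df3$fruits), decreasing = TRUE)
df3$fruits <- factor(df3$fruits, levels = names(counts))

levels(df3$fruits)
```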
