Anomalous factor order with the as_factor function

Problem description

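I am trying to use as_factor() to create factor levels in the order in which the values appear in the underlying data. The function works as expected when the underlying data are character, but not when they are numeric.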

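I tried the example code given in the as_factor documentation. It uses an underlying character variable and returns the levels in the same order as that variable, which is what as_factor is supposed to do. For a numeric variable, however, the levels are sorted, and as.factor and as_factor give the same order.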

# anomaly with as_factor -- is this a feature or a bug?
# Bill Anderson  August 2019
require(tidyverse)
#> Loading required package: tidyverse
#> Warning: package 'tidyverse' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.1
# example from as_factor documentation
x <- c("a", "z", "g")
as_factor(x) # preserves input order, as desired
#> [1] a z g
#> Levels: a z g
as.factor(x) # factor levels obtained by sorting data
#> [1] a z g
#> Levels: a g z
# numeric example 
y <- c(1, 3, 2)
as_factor(y) # factor levels obtained by sorting data -- not what I expected
#> [1] 1 3 2
#> Levels: 1 2 3
as.factor(y) # factor levels obtained by sorting data
#> [1] 1 3 2
#> Levels: 1 2 3
identical(as_factor(y), as.factor(y))
#> [1] TRUE
# explicit character conversion
z <- as.character(y)
as_factor(z) # preserves input order, as desired
#> [1] 1 3 2
#> Levels: 1 3 2
as.factor(z) # factor levels obtained by sorting data
#> [1] 1 3 2
#> Levels: 1 2 3
# the same effect shows up when grouping a data frame,
# where the impact of the factor order is clearly visible
mtcars %>% group_by(cyl) %>%
  summarize(meandisp = mean(disp)) # cylinder order is sorted
#> # A tibble: 3 x 2
#>     cyl meandisp
#>   <dbl>    <dbl>
#> 1     4     105.
#> 2     6     183.
#> 3     8     353.
mtcars %>% group_by(as_factor(cyl)) %>%
  summarize(meandisp = mean(disp)) # cylinder order is still sorted
#> # A tibble: 3 x 2
#>   `as_factor(cyl)` meandisp
#>   <fct>               <dbl>
#> 1 4                    105.
#> 2 6                    183.
#> 3 8                    353.
mtcars %>% group_by(as_factor(as.character(cyl))) %>%
  summarize(meandisp = mean(disp)) # cylinder order follows data
#> # A tibble: 3 x 2
#>   `as_factor(as.character(cyl))` meandisp
#>   <fct>                             <dbl>
#> 1 6                                  183.
#> 2 4                                  105.
#> 3 8                                  353.

Created on 2019-08-15 by the reprex package (v0.3.0)

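There is no error message. The problem is simply that, unless I explicitly convert to character, I do not get the order of the underlying data.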

I am not sure whether this behavior is a feature or a bug. If it is a feature, I suggest that it be mentioned in the function documentation.

Tags: r, forcats

Solution
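One way to get levels in order of first appearance for numeric input, without going through as.character(), is to pass the levels explicitly. The following is a minimal sketch in base R only; the helper name factor_in_order is hypothetical and is not part of forcats.

```r
# Hypothetical helper (not part of forcats): build a factor whose levels
# follow the order in which values first appear in x.
factor_in_order <- function(x) {
  factor(x, levels = unique(x))
}

# numeric example from the question
y <- c(1, 3, 2)
f <- factor_in_order(y)
levels(f)
#> [1] "1" "3" "2"

# grouped means of disp, with cylinder groups in data order
mean_disp <- tapply(mtcars$disp, factor_in_order(mtcars$cyl), mean)
names(mean_disp)
#> [1] "6" "4" "8"
```

The group order 6, 4, 8 matches what the question obtained with as_factor(as.character(cyl)). Within forcats, fct_inorder() applied to an existing factor, for example fct_inorder(factor(y)), should give the same level order.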

