Collapsing factors within a hierarchy

Question

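Is there a pre-built way to collapse factors within a hierarchy? I know I can collapse them manually, and I know I can use the fct_lump functions to automatically lump my rarest levels. But is there a way to do this that "respects" the hierarchy?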

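In the simplest version, suppose I have a factor for pets with 5 levels: cat (9), dog (10), rabbit (4), guinea pig (3) and fish (2). If I use fct_lump_n(3), guinea pig and fish get lumped together. In my data I would rather group rabbit and guinea pig together as "Other mammals" or "Rodents". I can obviously do that manually with fct_collapse. But is there a way to make this automatic by adding the rest of the hierarchy to my data structure?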
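
To make the difference concrete, here is a minimal sketch of the two existing approaches applied to that simple example. The vector name `pets` and the way it is built with `rep()` are just assumptions for illustration; only `fct_lump_n()` and `fct_collapse()` come from forcats.

```r
library(forcats)

# Illustrative factor with the counts from the simple example above
pets <- factor(rep(c("cat", "dog", "rabbit", "guinea pig", "fish"),
                   times = c(9, 10, 4, 3, 2)))

# Automatic lumping: keep the 3 most frequent levels, lump the rest.
# "Other" now holds guinea pig and fish (5 rows), ignoring any hierarchy.
lumped <- fct_lump_n(pets, n = 3)
table(lumped)

# Manual collapsing: the grouping has to be spelled out by hand.
# "Rodents" holds rabbit and guinea pig (7 rows).
collapsed <- fct_collapse(pets, Rodents = c("rabbit", "guinea pig"))
table(collapsed)
```

The first call is automatic but blind to the hierarchy; the second respects it but only because the groups are typed in manually.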

Here is a rough example of my data:

library(tidyverse)

myData <- tibble (
    pet = as_factor(c("dog", "cat", "guinea pig", "rat", "rabbit", "hamster", "mouse", "gerbil", "goldfish", "tropical fish",
            "snake", "gecko", "tortoise", "chicken", "budgie", "parrot", 
            "dog", "cat", "guinea pig",  "rabbit", "hamster", "gerbil", "goldfish", 
            "snake", "gecko", "budgie", 
            "dog", "cat", "guinea pig", "dog", "dog", "dog", "dog", "cat", "cat", "cat",
            "cat", "dog", "cat", "dog", "cat", "dog", "chicken", "rabbit", "rabbit", "hamster")
))

myClassifications <- tibble (
    pet = as_factor(c("dog", "cat", "guinea pig", "rat", "rabbit", "hamster", "mouse", "gerbil", "goldfish", "tropical fish",
            "snake", "gecko", "tortoise", "chicken", "budgie", "parrot")),
    class = as_factor(c("mammal", "mammal", "mammal", "mammal", "mammal", "mammal", "mammal", "mammal",  "fish", "fish", "reptile", "reptile", "reptile", "bird", "bird", "bird")),
    family = as_factor(c("canine", "feline", "rodent","rodent","rodent","rodent","rodent","rodent", "cold water fish", "warm water fish", "legless reptile", "reptile with legs", "reptile with legs", "flightless bird", "flighted bird", "flighted bird"))
)

# Add classifications to my data

myData <- myData %>%
    left_join(myClassifications, by = "pet")

The basic output looks like this:

> myData %>%
     group_by(pet) %>% #class, family, pet) %>%
     summarise(n())
pet            n()
dog            10
cat            9
guinea pig     3
rat            1
rabbit         4
hamster        3
mouse          1
gerbil         2
goldfish       2
tropical fish  1
snake          2
gecko          2
tortoise       1
chicken        2
budgie         2
parrot         1

I would like to end up with something like this:

pet      n()
dog      10
cat      9
rodent   14
fish     3
reptile  5
bird     5

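I could probably do something with a for loop. But if there is a package or function that does this for me, it would probably offer a less subjective way of doing it. In effect, I think what I am doing is starting from mammal, reptile and so on, and expanding a level whenever there is "enough" data for that to be meaningful. How we define meaningful is of course subjective! In my table above there are 46 data points, and I did not expand a group if it accounted for less than 10% of the total (similar to fct_lump_prop(0.10)).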

Is there a simple solution?

Tags: r, hierarchy, forcats

Solution
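
No ready-made function is given here, but the rule described in the question (use the most specific level that still covers at least 10% of the rows) can be sketched without a for loop. The helper `lump_hierarchy()`, the output column `pet_lumped` and the `toy` data below are hypothetical names made up for this illustration, not part of forcats or dplyr; the sketch assumes the data already has `pet`, `family` and `class` columns, as it does after the left_join in the question.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not a forcats function): label each row with the most
# specific level (pet, then family, then class) whose row count is at least
# `prop` of all rows.
lump_hierarchy <- function(data, prop = 0.10) {
  threshold <- prop * nrow(data)
  data %>%
    add_count(pet, name = "n_pet") %>%        # rows per pet
    add_count(family, name = "n_family") %>%  # rows per family
    mutate(
      pet_lumped = case_when(
        n_pet >= threshold    ~ as.character(pet),
        n_family >= threshold ~ as.character(family),
        TRUE                  ~ as.character(class)
      )
    ) %>%
    select(-n_pet, -n_family)
}

# Small made-up data set (27 rows, so the 10% threshold is 2.7 rows)
toy <- tibble(
  pet    = rep(c("dog", "cat", "rabbit", "hamster", "gerbil",
                 "goldfish", "tropical fish"),
               times = c(10, 9, 2, 2, 2, 1, 1)),
  family = rep(c("canine", "feline", "rodent",
                 "cold water fish", "warm water fish"),
               times = c(10, 9, 6, 1, 1)),
  class  = rep(c("mammal", "fish"), times = c(25, 2))
)

result <- lump_hierarchy(toy, prop = 0.10)
count(result, pet_lumped)
```

On the toy data this keeps dog (10) and cat (9), rolls the three small pets up to rodent (6), and rolls both fish up to fish (2). One design caveat: each row is judged on its own, so a family can end up partly expanded (a large pet kept by name while its smaller siblings carry the family label). Whether that is acceptable depends on how "meaningful" is defined.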

