Filter and summarize but keep zeros

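Problem description

I collected data at multiple sites. At each site, species were identified (Species) and counted (Number). I also recorded the distance from me at which they occurred (Distance). A sample dataset is: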

library(dplyr)

Data <- data.frame(
  Site = c("1", "1", "1", "1", "2", "3", "3"),
  Species = c("abc", "bcd", "abc", "kjh", "jh", "abc", "gd"),
  Number = c(10,1,1,1,1,1,1),
  Distance = c("50m", "60m", "In", "In", "Out", "In", "In")
)

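I want to calculate (A) the number of unique species and (B) the total number of individuals at each site. However, I want to filter out all rows where Distance == "Out". I tried the following filter: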

Filtered <- Data %>%
  filter(Distance %in% c(
    "50m", 
    "60m",
    "In"))

I then created my summary:

summary <- Filtered %>%
  group_by(Site) %>% 
  summarize(richness = n_distinct(Species), count = sum(Number))
summary
# A tibble: 2 x 3
  Site  richness count
  <fct>    <int> <dbl>
1 1            3    13
2 3            2     2

But what I actually need is:

# A tibble: 3 x 3
  Site  richness count
  <fct>    <int> <dbl>
1 1            3    13
2 2            0     0
3 3            2     2

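In other words, I don't want the "Out" rows included in the summary calculations, but I do want to show that a site had 0 species at non-"Out" distances.

Is there a better approach that I'm missing?

Tags: r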

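Solution

We can drop the "Out" entries inside the summarize step, after grouping by Site with group_by. Because no rows are removed beforehand, every Site keeps its group; for a site whose rows are all "Out", the subsets are empty, so n_distinct() returns 0 and sum() returns 0.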

library(dplyr)
Data %>%
  group_by(Site) %>%
  summarize(richness = n_distinct(Species[Distance != "Out"]), 
            count = sum(Number[Distance != "Out"]))


#  Site  richness count
#  <fct>    <int> <dbl>
#1 1            3    13
#2 2            0     0
#3 3            2     2
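
If filtering first is preferred, the dropped sites can be put back afterwards. The sketch below is not part of the original answer; it shows two possible variants under stated assumptions. The first assumes the tidyr package is available and uses complete() to re-add any Site missing from the summary, filling its columns with zeros. The second assumes Site can be treated as a factor (the <fct> column type in the output above suggests it already was one) and uses group_by(.drop = FALSE) so that empty factor levels are kept as groups. The names by_complete and by_drop are illustrative only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question.
Data <- data.frame(
  Site = c("1", "1", "1", "1", "2", "3", "3"),
  Species = c("abc", "bcd", "abc", "kjh", "jh", "abc", "gd"),
  Number = c(10, 1, 1, 1, 1, 1, 1),
  Distance = c("50m", "60m", "In", "In", "Out", "In", "In")
)

# Variant 1 (assumes tidyr): filter first, summarize, then re-add
# every Site that is missing from the summary, filled with zeros.
by_complete <- Data %>%
  filter(Distance != "Out") %>%
  group_by(Site) %>%
  summarize(richness = n_distinct(Species), count = sum(Number)) %>%
  complete(Site = unique(Data$Site),
           fill = list(richness = 0L, count = 0))

# Variant 2 (assumes Site is a factor): filtering keeps the factor
# levels, and .drop = FALSE keeps a group for each level, even when
# no rows remain for it.
by_drop <- Data %>%
  mutate(Site = factor(Site)) %>%
  filter(Distance != "Out") %>%
  group_by(Site, .drop = FALSE) %>%
  summarize(richness = n_distinct(Species), count = sum(Number))

by_complete
by_drop
```

Both variants should give one row per site, with zeros for site 2. Subsetting inside summarize, as in the answer above, stays the shortest option; these alternatives may be handier when the filtered data is reused in several later steps.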
