首页 > 解决方案 > 在两个数据框中寻找共同的条目

问题描述

df1:
 state group species
1 CA 2 cat, dog, chicken, mouse
2 CA 1 cat
3 NV 1 dog, chicken
4 NV 2 chicken
5 WA 1 chicken, rat, mouse, lion
6 WA 2 dog, cat
7 WA 3 dog, chicken
8 WA 4 cat, chicken

df2:
 state special_species
1 CA cat
2 CA chicken
3 CA mouse
4 WA cat
5 WA chicken
6 NV dog

我有兴趣确定哪些 special_speciesdf2存在于df1. 我想要一个具有stategroupspecial_species.

预期输出:

state group special_species
CA 2 cat, chicken, mouse
CA 1 cat
NV 1 dog
NV 2 NA
WA 1 chicken
WA 2 cat
WA 3 chicken
WA 4 cat, chicken

标签: rdplyrtidyrsummarize

解决方案


这比我想象的要难。我认为以下工作,但希望有人能想出更漂亮的东西。

首先,我们制作一些数据来处理(请以后自己做),如果其他人想要尝试,我会包括在内:

library(tidyverse)

df1 <- tribble(
  ~state, ~group, ~species,
  "CA", 2, "cat, dog, chicken, mouse",
  "CA", 1, "cat",
  "NV", 1, "dog, chicken",
  "NV", 2, "chicken",
  "WA", 1, "chicken, rat, mouse, lion",
  "WA", 2, "dog, cat",
  "WA", 3, "dog, chicken",
  "WA", 4, "cat, chicken")

df2 <- tribble(
  ~state, ~special_species,
  "CA", "cat",
  "CA", "chicken",
  "CA", "mouse",
  "WA", "cat",
  "WA", "chicken",
  "NV", "dog")

那么解决方案是:

df1 %>% 
  separate_rows(species) %>% 
  full_join(df2, on = "state") %>%
  filter(species == special_species) %>%
  group_by(state, group) %>%
  summarise(species = paste(special_species, collapse = ", ")) %>%
  full_join(df1, by = c("state" = "state", "group" = "group")) %>%
  select(state, group, special_species = species.x) %>%
  arrange(state)
#> Joining, by = "state"
#> # A tibble: 8 x 3
#> # Groups:   state [3]
#>   state group special_species    
#>   <chr> <dbl> <chr>              
#> 1 CA        1 cat                
#> 2 CA        2 cat, chicken, mouse
#> 3 NV        1 dog                
#> 4 NV        2 <NA>               
#> 5 WA        1 chicken            
#> 6 WA        2 cat                
#> 7 WA        3 chicken            
#> 8 WA        4 cat, chicken

如果您接受格式略有不同的所需输出,则可以显着简化代码,例如,以下是正确的保存NA

df1 %>% 
  separate_rows(species) %>% 
  full_join(df2, on = "state") %>%
  filter(species == special_species) %>%
  group_by(state, group) %>%
  summarise(species = paste(special_species, collapse = ", "))
#> Joining, by = "state"
#> # A tibble: 7 x 3
#> # Groups:   state [3]
#>   state group species            
#>   <chr> <dbl> <chr>              
#> 1 CA        1 cat                
#> 2 CA        2 cat, chicken, mouse
#> 3 NV        1 dog                
#> 4 WA        1 chicken            
#> 5 WA        2 cat                
#> 6 WA        3 chicken            
#> 7 WA        4 cat, chicken

reprex 包(v0.3.0)于 2019 年 12 月 3 日创建


推荐阅读