Use R to add strings to a target field based on matches in other columns

Problem description

I have a data frame with three populated columns (Submitted_Name, Status, Accepted_Name) and one empty column (Flag).

data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)

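I want to fill "Flag" with certain strings depending on whether specific phrases occur in the Submitted_Name and Accepted_Name fields. If "hort." or "pre Herbarium Practice" appears in Submitted_Name, I want "submitted name is horticultural" or "submitted name is pre herbarium practice", respectively, to appear in "Flag". If the phrase "var.", "forma", "_x" or "comb.ined" appears in the Accepted_Name field, then "variety", "form", "hybrid" or "accepted name is comb.ined", respectively, should be added to "Flag". If none of the phrases is present, "Flag" stays blank.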

Summary:

From Submitted_Name

hort. = submitted name is horticultural

pre Herbarium Practice = submitted name is pre herbarium practice

From Accepted_Name

var. = variety

forma = form

_x = hybrid

comb.ined = accepted name is comb.ined
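The summary above is essentially a lookup table. One option is to store it as data and collect every matching label per row instead of chaining ifelse calls. The sketch below is a base R illustration that rests on a few assumptions. `rules` and `flag_rows` are hypothetical names. Phrases are matched literally (`fixed = TRUE`) rather than as regular expressions. The Accepted_Name rules are listed first so that the labels come out in the same order as the desired result.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps text columns as character)
data.pre <- data.frame(
  Submitted_Name = c('Aa achalensis Schltr. hort.', 'Aa argyrolepis Rchb.f.',
                     'Aa aurantiaca D.Trujillo pre Herbarium Practice',
                     'Aa brevis Schltr.', 'Aa calceata (Rchb.f.) Schltr.'),
  Status = c('accepted', 'accepted', 'accepted', 'synonym', 'accepted'),
  Accepted_Name = c('Aa achalensis var. alba Schltr.', 'Aa argyrolepis forma beta Rchb.f.',
                    'Aa aurantiaca_x D.Trujillo', 'Myrosmodes breve (Schltr.) Garay',
                    'Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  Flag = c('', '', '', '', ''),
  stringsAsFactors = FALSE
)

# Hypothetical rules table: column to search, literal phrase, label to add to Flag
rules <- data.frame(
  column  = c(rep("Accepted_Name", 4), rep("Submitted_Name", 2)),
  pattern = c("var.", "forma", "_x", "comb.ined", "hort.", "pre Herbarium Practice"),
  label   = c("variety", "form", "hybrid", "accepted name is comb.ined.",
              "submitted name is horticultural",
              "submitted name is pre herbarium practice"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one Flag string per row of df, all matching labels joined by sep
flag_rows <- function(df, rules, sep = "; ") {
  # One column per rule: TRUE where that rule's phrase occurs in the row's target column
  hits <- vapply(seq_len(nrow(rules)), function(i) {
    grepl(rules$pattern[i], df[[rules$column[i]]], fixed = TRUE)
  }, logical(nrow(df)))
  hits <- matrix(hits, nrow = nrow(df))  # keep a matrix even when df has a single row
  # Rows with no match yield "" because paste() of an empty vector gives ""
  apply(hits, 1, function(h) paste(rules$label[h], collapse = sep))
}

data.pre$Flag <- flag_rows(data.pre, rules)
data.pre$Flag
```

With this layout, adding a new rule means adding a row to `rules` rather than writing another ifelse line.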

The desired result is:

data.post <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('variety; submitted name is horticultural','form','hybrid; submitted name is pre herbarium practice','','accepted name is comb.ined.')
)

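When only a single value needs to be added to "Flag", I can manage it with the laborious, repetitive code below (and I'm happy to keep it in this form):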

Master.Taxonomy$Flag <- ifelse(grepl("var.", Master.Taxonomy$Accepted_Name, fixed = TRUE), "variety", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("comb.ined.", Master.Taxonomy$Accepted_Name, fixed = TRUE), "accepted name is comb.ined.", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("_x", Master.Taxonomy$Accepted_Name, fixed = TRUE), "hybrid", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("pre Herbarium Practice", Master.Taxonomy$Submitted_Name, fixed = TRUE), "submitted name is pre herbarium practice", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("hort.", Master.Taxonomy$Submitted_Name, fixed = TRUE), "submitted name is horticultural", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("forma", Master.Taxonomy$Accepted_Name, fixed = TRUE), "form", Master.Taxonomy$Flag)

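However, when two or more values should be added, each later one overwrites the earlier ones, and I'm left with only whatever was added to "Flag" last. I've tried fiddling with it, but tied myself in knots. The order in which the phrases appear in "Flag" doesn't matter. Thanks for any help!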
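The overwriting happens because each ifelse call replaces Flag outright wherever its phrase matches. A minimal change that keeps the one-line-per-phrase style is to append the new label to whatever Flag already holds. The "; " separator is added only when Flag is non-empty. This is a base R sketch and assumes Flag is a character column, not a factor. `add_flag` is a hypothetical helper, and `data.pre` stands in for Master.Taxonomy.

```r
data.pre <- data.frame(
  Submitted_Name = c('Aa achalensis Schltr. hort.', 'Aa argyrolepis Rchb.f.',
                     'Aa aurantiaca D.Trujillo pre Herbarium Practice',
                     'Aa brevis Schltr.', 'Aa calceata (Rchb.f.) Schltr.'),
  Status = c('accepted', 'accepted', 'accepted', 'synonym', 'accepted'),
  Accepted_Name = c('Aa achalensis var. alba Schltr.', 'Aa argyrolepis forma beta Rchb.f.',
                    'Aa aurantiaca_x D.Trujillo', 'Myrosmodes breve (Schltr.) Garay',
                    'Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  Flag = c('', '', '', '', ''),
  stringsAsFactors = FALSE  # a factor Flag would break ifelse()/paste() below
)

# Hypothetical helper: where hit is TRUE, append label to the existing flag
# (inserting sep only if the flag is already non-empty); elsewhere keep the flag
add_flag <- function(flag, hit, label, sep = "; ") {
  ifelse(hit, ifelse(flag == "", label, paste(flag, label, sep = sep)), flag)
}

data.pre$Flag <- add_flag(data.pre$Flag, grepl("var.", data.pre$Accepted_Name, fixed = TRUE), "variety")
data.pre$Flag <- add_flag(data.pre$Flag, grepl("forma", data.pre$Accepted_Name, fixed = TRUE), "form")
data.pre$Flag <- add_flag(data.pre$Flag, grepl("_x", data.pre$Accepted_Name, fixed = TRUE), "hybrid")
data.pre$Flag <- add_flag(data.pre$Flag, grepl("comb.ined.", data.pre$Accepted_Name, fixed = TRUE),
                          "accepted name is comb.ined.")
data.pre$Flag <- add_flag(data.pre$Flag, grepl("hort.", data.pre$Submitted_Name, fixed = TRUE),
                          "submitted name is horticultural")
data.pre$Flag <- add_flag(data.pre$Flag, grepl("pre Herbarium Practice", data.pre$Submitted_Name, fixed = TRUE),
                          "submitted name is pre herbarium practice")
data.pre$Flag
```

The labels accumulate in the order the lines are run. Because the order does not matter here, the lines can be rearranged freely.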

Tags: r, conditional-statements, paste

Solution


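You can use case_when and str_detect for this. Rather than doing everything in one column, create two separate columns, one for the flag derived from Submitted_Name and one for the flag derived from Accepted_Name. Then combine the two with unite to get the desired result.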
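The same two-column idea can be sketched in base R, which also makes two details of the tidyverse version explicit. First, the hypothetical helper `first_match` mimics case_when. It returns the label of the first pattern that matches, or NA when none does, so each helper column holds at most one label. For example, an accepted name containing both "var." and "_x" would only be flagged as "variety". Second, the NA-dropping paste at the end plays the role of `unite(..., na.rm = TRUE)`. This is an illustration under those assumptions, not a drop-in replacement for the answer's code, and the patterns are matched literally with `fixed = TRUE`.

```r
data.pre <- data.frame(
  Submitted_Name = c('Aa achalensis Schltr. hort.', 'Aa argyrolepis Rchb.f.',
                     'Aa aurantiaca D.Trujillo pre Herbarium Practice',
                     'Aa brevis Schltr.', 'Aa calceata (Rchb.f.) Schltr.'),
  Status = c('accepted', 'accepted', 'accepted', 'synonym', 'accepted'),
  Accepted_Name = c('Aa achalensis var. alba Schltr.', 'Aa argyrolepis forma beta Rchb.f.',
                    'Aa aurantiaca_x D.Trujillo', 'Myrosmodes breve (Schltr.) Garay',
                    'Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  Flag = c('', '', '', '', ''),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking case_when(): for each element of x, the label of
# the first pattern (in order) found in it, or NA if no pattern matches
first_match <- function(x, patterns, labels) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(patterns)) {
    # only fill positions that no earlier pattern has already claimed
    hit <- is.na(out) & grepl(patterns[i], x, fixed = TRUE)
    out[hit] <- labels[i]
  }
  out
}

# One helper column per source column, as in the mutate() step
f1 <- first_match(data.pre$Submitted_Name,
                  c("hort.", "pre Herbarium Practice"),
                  c("submitted name is horticultural",
                    "submitted name is pre herbarium practice"))
f2 <- first_match(data.pre$Accepted_Name,
                  c("var.", "comb.ined.", "_x", "forma"),
                  c("variety", "accepted name is comb.ined.", "hybrid", "form"))

# Join f2 and f1 row by row, skipping NAs, like unite(na.rm = TRUE, sep = "; ")
data.pre$Flag <- mapply(function(a, b) {
  parts <- c(a, b)
  paste(parts[!is.na(parts)], collapse = "; ")
}, f2, f1, USE.NAMES = FALSE)

data.pre$Flag
# First-match behaviour: the later "_x" match is ignored, so this returns "variety"
first_match("Aa x var. alba_x", c("var.", "_x"), c("variety", "hybrid"))
```

If every matching label per column is wanted rather than only the first, the per-row paste of all hits is the more suitable design.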

library(tidyverse)

data.pre <- data.frame(
  'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
  'Status' = c('accepted','accepted','accepted','synonym','accepted'),
  'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
  'Flag' = c('','','','','')
)

data.pre %>% 
  mutate(f1 = case_when(str_detect(Submitted_Name, fixed("hort")) ~ "submitted name is horticultural",
                        str_detect(Submitted_Name, fixed("pre Herbarium Practice")) ~ "submitted name is pre herbarium practice"),
         f2 = case_when(str_detect(Accepted_Name, fixed("var.")) ~ "variety",
                        str_detect(Accepted_Name, fixed("comb.ined.")) ~ "accepted name is comb.ined.",
                        str_detect(Accepted_Name, fixed("_x")) ~ "hybrid",
                        str_detect(Accepted_Name, fixed("forma")) ~ "form")) %>% 
  # merge the helper columns into Flag; na.rm = TRUE drops the NAs that
  # case_when returns for rows where no condition matched
  unite("Flag", c(f2, f1), na.rm = TRUE, sep = "; ")
#>                                    Submitted_Name   Status
#> 1                     Aa achalensis Schltr. hort. accepted
#> 2                          Aa argyrolepis Rchb.f. accepted
#> 3 Aa aurantiaca D.Trujillo pre Herbarium Practice accepted
#> 4                               Aa brevis Schltr.  synonym
#> 5                   Aa calceata (Rchb.f.) Schltr. accepted
#>                              Accepted_Name
#> 1          Aa achalensis var. alba Schltr.
#> 2        Aa argyrolepis forma beta Rchb.f.
#> 3               Aa aurantiaca_x D.Trujillo
#> 4         Myrosmodes breve (Schltr.) Garay
#> 5 Aa calceata (Rchb.f.) Schltr. comb.ined.
#>                                               Flag
#> 1         variety; submitted name is horticultural
#> 2                                             form
#> 3 hybrid; submitted name is pre herbarium practice
#> 4                                                 
#> 5                      accepted name is comb.ined.

Created on 2021-01-30 by the reprex package (v0.3.0)
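One caveat applies to both grepl() and str_detect(). Unless told otherwise, they treat the pattern as a regular expression, so the "." in "var.", "hort." or "comb.ined." matches any single character, not just a full stop. On the sample data this makes no difference, but a name such as the made-up "Aa variegata Schltr." would be flagged as a variety. In grepl(), `fixed = TRUE` requests literal matching; in stringr, wrapping the pattern in `fixed()` does the same. A quick base R check:

```r
# One name from the question and one made-up name (illustrative, not from the data)
x <- c("Aa achalensis var. alba Schltr.", "Aa variegata Schltr.")

grepl("var.", x)                # regular expression: "." matches any character -> TRUE TRUE
grepl("var.", x, fixed = TRUE)  # literal match of the four characters "var."   -> TRUE FALSE
```

If some rules genuinely need regular expressions, escaping the dot (for example `"var\\."`) is the alternative to `fixed`.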

