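r - Using R to add strings to a target field based on matches in other columns
Problem description
I have a data frame with three populated columns (Submitted_Name, Status, Accepted_Name) and one empty column (Flag).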
data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)
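I want to populate "Flag" with certain strings based on the presence of specific phrases in the Submitted_Name and Accepted_Name fields. If "hort." or "pre Herbarium Practice" appears in Submitted_Name, then I want "submitted name is horticultural" or "submitted name is pre Herbarium practice" to appear in "Flag". If the phrase "var." or "forma" or "_x" or "comb.ined" appears in the Accepted_Name field, then "variety", "form", "hybrid" or "accepted name is comb.ined" should be added to "Flag". If there is no trigger phrase, "Flag" stays blank.
Summary:
From Submitted_Name
hort. = submitted name is horticultural
pre Herbarium Practice = submitted name is pre Herbarium practice
From Accepted_Name
var. = variety
forma = form
_x = hybrid
comb.ined = accepted name is comb.ined
The desired outcome is: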
data.post <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('variety; submitted name is horticultural','form','hybrid; submitted name is pre herbarium practice','','accepted name is comb.ined.')
)
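For cases where only a single value needs to be added to "Flag", I can manage it with the laborious, repetitive code below (and I can live with it in this form):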
Master.Taxonomy$Flag <- ifelse(grepl("var.", Master.Taxonomy$Accepted_Name), "variety", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("comb.ined.", Master.Taxonomy$Accepted_Name), "accepted name is comb.ined.", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("_x", Master.Taxonomy$Accepted_Name), "hybrid", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("pre Herbarium Practice", Master.Taxonomy$Submitted_Name), "submitted name is pre herbarium practice", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("hort.", Master.Taxonomy$Submitted_Name), "submitted name is horticultural", Master.Taxonomy$Flag)
Master.Taxonomy$Flag <- ifelse(grepl("forma", Master.Taxonomy$Accepted_Name), "form", Master.Taxonomy$Flag)
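However, when two or more values need to be added, each later assignment overwrites the earlier one, and I am left with only whatever was added to "Flag" last. I have tried fiddling around with this but have tied myself in knots. Note that the order in which the phrases appear in "Flag" does not matter. Thanks for any help!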
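The overwriting described above happens because each `ifelse()` call replaces the existing value of Flag instead of extending it. The following is a minimal base R sketch of an appending variant, not the accepted answer's method. The helper name `append_flag` and the two-row demo data frame `df` are hypothetical, introduced only for illustration. It also assumes the trigger phrases should be matched literally (`fixed = TRUE`), so that "." is not treated as a regular-expression wildcard.

```r
# Hypothetical helper: append `label` to `flag` wherever the literal
# text `pattern` occurs in `x`, keeping whatever is already in `flag`.
append_flag <- function(flag, x, pattern, label) {
  hit <- grepl(pattern, x, fixed = TRUE)       # literal match, "." is not a wildcard
  sep <- ifelse(hit & nzchar(flag), "; ", "")  # separator only if flag is non-empty
  ifelse(hit, paste0(flag, sep, label), flag)
}

# Small demo data (a subset of the rows in the question)
df <- data.frame(
  Submitted_Name = c("Aa achalensis Schltr. hort.", "Aa brevis Schltr."),
  Accepted_Name  = c("Aa achalensis var. alba Schltr.", "Myrosmodes breve (Schltr.) Garay"),
  Flag           = c("", ""),
  stringsAsFactors = FALSE
)

df$Flag <- append_flag(df$Flag, df$Accepted_Name, "var.", "variety")
df$Flag <- append_flag(df$Flag, df$Submitted_Name, "hort.", "submitted name is horticultural")
df$Flag
```

Because each call builds on the current contents of Flag, the order of the calls only affects the order of the phrases, which the question says does not matter.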
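Solution
You can use case_when and str_detect for this purpose. Rather than doing everything in the same column, you can create two separate columns, one for the flag derived from the submitted name and one for the flag derived from the accepted name, and then combine the two with unite to get the desired result. Note that case_when keeps only the first matching condition, so each of the two columns contributes at most one phrase per row.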
library(tidyverse)
data.pre <- data.frame(
'Submitted_Name' = c('Aa achalensis Schltr. hort.','Aa argyrolepis Rchb.f.','Aa aurantiaca D.Trujillo pre Herbarium Practice','Aa brevis Schltr.','Aa calceata (Rchb.f.) Schltr.'),
'Status' = c('accepted','accepted','accepted','synonym','accepted'),
'Accepted_Name' = c('Aa achalensis var. alba Schltr.','Aa argyrolepis forma beta Rchb.f.','Aa aurantiaca_x D.Trujillo','Myrosmodes breve (Schltr.) Garay','Aa calceata (Rchb.f.) Schltr. comb.ined.'),
'Flag' = c('','','','','')
)
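data.pre %>%
  # f1: flag derived from Submitted_Name; f2: flag derived from Accepted_Name.
  # fixed() matches the text literally, so "." is not treated as a regex wildcard.
  mutate(f1 = case_when(Submitted_Name %>% str_detect(fixed("hort")) ~ "submitted name is horticultural",
                        Submitted_Name %>% str_detect(fixed("pre Herbarium Practice")) ~ "submitted name is pre herbarium practice"),
         f2 = case_when(Accepted_Name %>% str_detect(fixed("var.")) ~ "variety",
                        Accepted_Name %>% str_detect(fixed("comb.ined.")) ~ "accepted name is comb.ined.",
                        Accepted_Name %>% str_detect(fixed("_x")) ~ "hybrid",
                        Accepted_Name %>% str_detect(fixed("forma")) ~ "form")) %>%
  # Rows without a match are NA; na.rm = TRUE drops them before joining with "; ".
  unite("Flag", c(f2, f1), na.rm = TRUE, sep = "; ")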
#> Submitted_Name Status
#> 1 Aa achalensis Schltr. hort. accepted
#> 2 Aa argyrolepis Rchb.f. accepted
#> 3 Aa aurantiaca D.Trujillo pre Herbarium Practice accepted
#> 4 Aa brevis Schltr. synonym
#> 5 Aa calceata (Rchb.f.) Schltr. accepted
#> Accepted_Name
#> 1 Aa achalensis var. alba Schltr.
#> 2 Aa argyrolepis forma beta Rchb.f.
#> 3 Aa aurantiaca_x D.Trujillo
#> 4 Myrosmodes breve (Schltr.) Garay
#> 5 Aa calceata (Rchb.f.) Schltr. comb.ined.
#> Flag
#> 1 variety; submitted name is horticultural
#> 2 form
#> 3 hybrid; submitted name is pre herbarium practice
#> 4
#> 5 accepted name is comb.ined.
Created on 2021-01-30 by the reprex package (v0.3.0)
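If a single name can contain more than one trigger phrase from the same column (for example both "var." and "_x"), the case_when approach reports only the first match. As an alternative, here is a hedged base R sketch that keeps the phrase-to-label mapping in a lookup table and collects every match per row. The names `rules` and `flag_rows` are hypothetical, and literal (non-regex) matching is an assumption.

```r
# Hypothetical lookup table: which column to search, what to look for, what to report.
rules <- data.frame(
  column  = c("Accepted_Name", "Accepted_Name", "Accepted_Name", "Accepted_Name",
              "Submitted_Name", "Submitted_Name"),
  pattern = c("var.", "forma", "_x", "comb.ined",
              "hort.", "pre Herbarium Practice"),
  label   = c("variety", "form", "hybrid", "accepted name is comb.ined.",
              "submitted name is horticultural",
              "submitted name is pre herbarium practice"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one "; "-separated flag string per row of `data`.
flag_rows <- function(data, rules) {
  # One logical vector per rule: does the literal pattern occur in that column?
  hits <- mapply(function(col, pat) grepl(pat, data[[col]], fixed = TRUE),
                 rules$column, rules$pattern)
  hits <- matrix(hits, nrow = nrow(data))  # keep rows x rules shape for 1-row input
  # Collect the labels of all matching rules; no match gives "".
  apply(hits, 1, function(h) paste(rules$label[h], collapse = "; "))
}

data.pre <- data.frame(
  Submitted_Name = c("Aa achalensis Schltr. hort.", "Aa argyrolepis Rchb.f.",
                     "Aa aurantiaca D.Trujillo pre Herbarium Practice",
                     "Aa brevis Schltr.", "Aa calceata (Rchb.f.) Schltr."),
  Accepted_Name  = c("Aa achalensis var. alba Schltr.", "Aa argyrolepis forma beta Rchb.f.",
                     "Aa aurantiaca_x D.Trujillo", "Myrosmodes breve (Schltr.) Garay",
                     "Aa calceata (Rchb.f.) Schltr. comb.ined."),
  stringsAsFactors = FALSE
)

data.pre$Flag <- flag_rows(data.pre, rules)
data.pre$Flag
```

Adding a new trigger phrase then means adding one row to `rules` rather than another line of code. Labels appear in the order the rules are listed.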