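r - Find values based on partial string matching in R
Problem description
I have a table of cities (Table 1); punctuation, capital letters, and spaces have already been removed.
I want to scan a second table (Table 2) and pull out every record that either exactly matches a city from the first table or contains it anywhere in the string.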
# Table 1
city1
1 waterloo
2 kitchener
3 toronto
4 guelph
5 ottawa
# Table 2
city2
1 waterlookitchener
2 toronto
3 hamilton
4 cityofottawa
This would produce the third table shown below.
# Table 3
city1 city2
1 waterloo waterlookitchener
2 kitchener waterlookitchener
3 toronto toronto
4 guelph <NA>
5 ottawa cityofottawa
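Solution
You can try the fuzzyjoin package. In this case, you can use the function stri_detect_fixed from the stringi package as the match function; it detects whether a fixed pattern occurs at least once in a string.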
library(fuzzyjoin)
library(stringi)
library(dplyr)
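# table2 goes first so that stri_detect_fixed(city2, city1) tests whether
# city2 contains city1; the right join keeps every row of table1.
fuzzy_right_join(table2, table1, by = c("city2" = "city1"), match_fun = stri_detect_fixed) %>%
  select(city1, city2)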
Output
city1 city2
1 waterloo waterlookitchener
2 kitchener waterlookitchener
3 toronto toronto
4 guelph <NA>
5 ottawa cityofottawa
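The argument order in the join matters, and a small standalone check may make it clearer. The sketch below assumes only that stringi is installed; the names haystack, needles, hits, and swapped are made up for illustration. It relies on stri_detect_fixed(str, pattern) asking whether str contains pattern, and on my understanding that fuzzyjoin passes the left table's column as the first argument of match_fun.

```r
library(stringi)

# stri_detect_fixed(str, pattern): does `str` contain `pattern` literally?
haystack <- "waterlookitchener"
needles  <- c("waterloo", "kitchener", "guelph")

# The pattern argument is vectorized, so each needle is tested in turn.
hits <- stri_detect_fixed(haystack, needles)
print(hits)

# Swapping the arguments asks the opposite question:
# does "waterloo" contain "waterlookitchener"?
swapped <- stri_detect_fixed("waterloo", "waterlookitchener")
print(swapped)
```

Here hits is TRUE, TRUE, FALSE and swapped is FALSE, which is why the longer strings (city2) have to be on the left side of the join.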
Data
table1 <- structure(list(city1 = c("waterloo", "kitchener", "toronto",
"guelph", "ottawa")), class = "data.frame", row.names = c(NA,
-5L))
table2 <- structure(list(city2 = c("waterlookitchener", "toronto", "hamilton",
"cityofottawa")), class = "data.frame", row.names = c(NA, -4L
))
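If you would rather avoid extra packages, something similar can probably be done in base R. This is only a sketch of an alternative, not the method from the answer above: it loops over city1 and uses grepl with fixed = TRUE for literal substring matching. The names matches and table3 are hypothetical, and the data is rebuilt with data.frame() so the snippet runs on its own.

```r
table1 <- data.frame(
  city1 = c("waterloo", "kitchener", "toronto", "guelph", "ottawa"),
  stringsAsFactors = FALSE
)
table2 <- data.frame(
  city2 = c("waterlookitchener", "toronto", "hamilton", "cityofottawa"),
  stringsAsFactors = FALSE
)

# For each city1, collect every city2 that contains it as a literal substring.
matches <- lapply(table1$city1, function(city) {
  found <- table2$city2[grepl(city, table2$city2, fixed = TRUE)]
  # Keep unmatched cities as a row with NA, as the right join does.
  if (length(found) == 0) found <- NA_character_
  data.frame(city1 = city, city2 = found, stringsAsFactors = FALSE)
})

# Stack the per-city results into one data frame.
table3 <- do.call(rbind, matches)
print(table3)
```

This compares every city1 against every city2, so it is fine for small tables like these but may be slow for large ones.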