Searching a column for values from a string vector in R

Question

I have a column of addresses. I want to parse it and keep only the state name. Here is my data:

df <- structure(list(BreweryName = c("(512) Brewing Company", "0 Mile Brewing Company", 
"10 Barrel Brewing", "10 Barrel Brewing - Eastside Pub", "10 Barrel Brewing - Portland Pub", 
"10 Barrel Brewing Co."), BreweryAddress = c("407 Radam LnSte F200Austin, Texas, 78745-1197United States(512) 921-1545", 
"11 W 2nd StHummelstown, Pennsylvania, 17036-1506United States(717) 319-0133", 
"1501 E StSan Diego, California, 92101United States", "62950 NE 18th StBend, Oregon, 97701United States(541) 241-7733", 
"1411 NW Flanders StPortland, Oregon, 97209-2620United States(541) 585-1007", 
"830 W Bannock StBoise, Idaho, 83702-5857United States(208) 344-5870"
)), row.names = c(4L, 6L, 8L, 10L, 12L, 14L), class = "data.frame")

I have another vector that I want to compare the column against and use for the replacement.

v<- c("Texas","Pennsylvania","Oregon","Oregon","Idaho")

I did try using match and grep, but they returned NAs.

Tags: r

Answer


Here is a base R option using grepl:

v <- c("Texas","Pennsylvania","Oregon","Oregon","Idaho")
states <- paste0("\\b", v, "\\b", collapse="|")
states

[1] "\\bTexas\\b|\\bPennsylvania\\b|\\bOregon\\b|\\bOregon\\b|\\bIdaho\\b"

df[grepl(states, df$BreweryAddress), ]

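I printed states to make clear which regex pattern we are using to search the brewery addresses. The pattern is an alternation of the state names, each wrapped in word boundary markers (\\b). The boundaries ensure that we do not accidentally match a string that merely contains a state name as part of a longer word. Note that grepl only filters the rows: rows whose address mentions none of the states in v (here, the California row) are dropped, and the state name itself is not extracted.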
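
Since the question asks for the state name itself rather than a row filter, the same pattern can probably be reused with regexpr and regmatches to pull out the matched text. The following is only a minimal sketch, not part of the original answer: the names addr, m, hit, and state are made up for illustration, the two addresses are copied from the question's data, and unique() is added as an assumption to drop the duplicated "Oregon".

```r
# Two sample addresses copied from the question's data.
addr <- c("407 Radam LnSte F200Austin, Texas, 78745-1197United States(512) 921-1545",
          "1501 E StSan Diego, California, 92101United States")

v <- c("Texas", "Pennsylvania", "Oregon", "Oregon", "Idaho")

# Same alternation as in the answer; unique() only removes the repeated "Oregon".
states <- paste0("\\b", unique(v), "\\b", collapse = "|")

# regexpr() gives the position of the first match per element, or -1 if none.
m <- regexpr(states, addr)
hit <- m != -1

# regmatches() returns only the matched substrings, so fill the
# unmatched elements with NA to keep one value per address.
state <- rep(NA_character_, length(addr))
state[hit] <- regmatches(addr, m)
state
```

With the full data frame, the same steps could be applied to df$BreweryAddress and the result stored in a new column. Addresses whose state is not listed in v stay NA, so v would need to contain every state you expect to see.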

