Extract string within first two quotation marks using regular expressions?

Problem description

There is a vector of strings that looks like the following (text with two or more substrings in quotation marks):

vec <- 'ab"cd"efghi"j"kl"m"'

The text within the first pair of quotation marks (cd) contains a useful identifier (cd is the desired output). I have been studying how to use regular expressions but I haven't learned how to find the first and second occurrences of something like quotation marks.

Here's how I have been getting cd:

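tmp <- strsplit(vec, split = "")[[1]]   # split the string into single characters
q <- which(tmp == '"')                   # positions of all quotation marks
paste(tmp[(q[1] + 1):(q[2] - 1)], collapse = "")   # characters between the first two
[1] "cd"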

My question is: is there another way to get "cd" using regular expressions? I am asking in order to learn more about how to use them. I prefer base R solutions but will accept an answer that uses packages if that is the only way. Thanks for your help.

Tags: r, regex

Solution


Match everything up to and including the first ", capture everything up to the next ", match the rest of the string, and replace the whole match with the captured group.

gsub( '[^"]*"([^"]*).*', '\\1', vec)

[1] "cd"

For a detailed explanation of the regex, see this demo.
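As a further base R sketch of the same idea, the first quoted substring can also be located with regexpr() and extracted with regmatches(). The helper name extract_first_quoted and the two extra test strings are hypothetical, added only for illustration; unlike the substitution approach, this version returns NA for strings that contain no quoted part instead of returning them unchanged.

```r
# Hypothetical helper (not from the original answer): returns the text inside
# the first pair of double quotes, or NA when no quoted part exists.
extract_first_quoted <- function(x) {
  out <- rep(NA_character_, length(x))
  m <- regexpr('"[^"]*"', x)            # position of the first "..." in each string
  matched <- !is.na(m) & m != -1        # regexpr() gives -1 where nothing matched
  # regmatches() returns only the matched pieces, in order; strip the quotes
  out[matched] <- gsub('"', '', regmatches(x, m), fixed = TRUE)
  out
}

# First string is from the question; the other two are made-up examples
vec <- c('ab"cd"efghi"j"kl"m"', 'x"first"y"second"', 'no quotes here')
extract_first_quoted(vec)
# [1] "cd"    "first" NA

# Since the pattern consumes the whole string in one match, sub() also suffices
sub('[^"]*"([^"]*)".*', '\\1', vec[1])
# [1] "cd"
```

Whether NA or the unchanged input is the better result for a string with no quotes depends on how the identifiers are used downstream, so treat this as one option rather than a replacement for the answer above.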

