r - Extract string within first two quotation marks using regular expressions?
Problem description
There is a vector of strings that looks like the following (text with two or more substrings in quotation marks):
vec <- 'ab"cd"efghi"j"kl"m"'
The text within the first pair of quotation marks (cd) contains a useful identifier (cd is the desired output). I have been studying how to use regular expressions but I haven't learned how to find the first and second occurrences of something like quotation marks.
Here's how I have been getting cd:
tmp <- strsplit(vec,split="")[[1]]
paste(tmp[(which(tmp=='\"')[1]+1):(which(tmp=='\"')[2]-1)],collapse="")
[1] "cd"
My question: is there another way to find "cd" using regular expressions? I'd like to learn more about how to use them. I prefer base R solutions but will accept an answer using packages if that's the only way. Thanks for your help.
Solution
Match everything up to and including the first ",
then capture everything up to the next ",
then match the rest of the string and replace the whole match with the captured group.
gsub( '[^"]*"([^"]*).*', '\\1', vec)
[1] "cd"
For a detailed explanation of the regex, you can see this demo.
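As a rough sketch of the same idea applied to a whole vector, the two base R variants below may be useful. The second input string and the variable names `first_quoted` and `first_quoted2` are made up for illustration. The first variant anchors the pattern and uses sub(), since the pattern already spans the entire string; the second locates the first quoted substring with regexpr() and regmatches() and then strips the quotes.

```r
# Example input; the second element is a made-up string for illustration
vec <- c('ab"cd"efghi"j"kl"m"', 'x"id42"y"z"')

# Variant 1: the pattern consumes the whole string, so one substitution
# per element is enough and sub() can be used instead of gsub()
first_quoted <- sub('^[^"]*"([^"]*)".*$', '\\1', vec)

# Variant 2: find the first "..." substring, then remove the quote characters
m <- regmatches(vec, regexpr('"[^"]*"', vec))
first_quoted2 <- gsub('"', '', m, fixed = TRUE)

print(first_quoted)
print(first_quoted2)
```

One difference to keep in mind: sub() returns a string unchanged when the pattern does not match, whereas regmatches() with regexpr() drops elements that have no match, so the second variant can return a shorter vector than its input.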