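r - Match URLs containing a string pattern and save them in new columns of an R data frame
Problem description
I have a dataset that contains several URLs as a single string in one column, "urls":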
urls <- "https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx"
id <- 1
df <- data.frame(urls, id)
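I now want to extract the full URL that matches "linkedin.com" and store it in a new column, df$linkedin, and do the same for the URL that matches "medium.com", storing it in a new column, df$medium. So the result would basically be: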
df$linkedin
"https://www.linkedin.com/xx/xxx-xx-xxx/"
df$medium
"https://medium.com/@xxxxx"
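Somehow I'm having a bad hair day today and can't see an elegant solution. It would be great if you could help me out here :)
Solution
I'll make it a bit more interesting by using two rows: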
df2 <- structure(list(urls = c("https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx", "https://www.linkedin.com/yy/yyy-yy-yyy/ https//domain.io https://medium.com/@yyyyy"), id = c(1, 1)), row.names = c(NA, -2L), class = "data.frame")
df2
# urls id
# 1 https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx 1
# 2 https://www.linkedin.com/yy/yyy-yy-yyy/ https//domain.io https://medium.com/@yyyyy 1
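base R
# keywords to search for; each one becomes the name of a new column
baseurls <- c("linkedin", "medium")
# for each keyword, pull every space-delimited URL that contains it
# (unlist() only lines up with the rows if each row has exactly one match)
newcols <- lapply(setNames(nm = baseurls), function(U) {
  pattern <- paste0("http[^ ]*", U, "[^ ]*")
  unlist(regmatches(df2$urls, gregexpr(pattern, df2$urls)))
})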
newcols
# $linkedin
# [1] "https://www.linkedin.com/xx/xxx-xx-xxx/" "https://www.linkedin.com/yy/yyy-yy-yyy/"
# $medium
# [1] "https://medium.com/@xxxxx" "https://medium.com/@yyyyy"
cbind(df2, data.frame(newcols))
# urls id linkedin medium
# 1 https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx 1 https://www.linkedin.com/xx/xxx-xx-xxx/ https://medium.com/@xxxxx
# 2 https://www.linkedin.com/yy/yyy-yy-yyy/ https//domain.io https://medium.com/@yyyyy 1 https://www.linkedin.com/yy/yyy-yy-yyy/ https://medium.com/@yyyyy
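The base R version above depends on every row containing exactly one match per keyword; with zero or several matches, unlist() returns a vector of the wrong length and cbind() fails or misaligns. A minimal sketch of a more forgiving variant, assuming that only the first match per row is wanted and that rows without a match should get NA. The helper name extract_first_url and the sample data df3 are hypothetical and not part of the original answer.

```r
# Hypothetical helper: first URL containing `key` in each string, NA if none.
extract_first_url <- function(x, key) {
  pattern <- paste0("http[^ ]*", key, "[^ ]*")
  m <- regexpr(pattern, x)               # position of the first match, -1 if none
  hit <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)           # regmatches() returns matched elements only
  out
}

# Made-up sample data: the second row has no linkedin URL.
df3 <- data.frame(
  urls = c("https://www.linkedin.com/xx/ https://medium.com/@xxxxx",
           "https://medium.com/@yyyyy"),
  id = 1:2,
  stringsAsFactors = FALSE
)
for (key in c("linkedin", "medium")) {
  df3[[key]] <- extract_first_url(df3$urls, key)
}
df3$linkedin
# [1] "https://www.linkedin.com/xx/" NA
df3$medium
# [1] "https://medium.com/@xxxxx" "https://medium.com/@yyyyy"
```

Because the result always has one element per input row, it can be assigned directly as a column.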
tidyverse
## baseurls <- ...
library(dplyr) # bind_cols, %>%
library(stringr) # str_extract
library(purrr) # map_dfc
# str_extract() returns the first match per row, or NA when there is none
map_dfc(setNames(nm = baseurls), ~ str_extract(df2$urls, paste0("http[^ ]*", .x, "[^ ]*"))) %>%
  bind_cols(df2, .)
# urls id linkedin medium
# 1 https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx 1 https://www.linkedin.com/xx/xxx-xx-xxx/ https://medium.com/@xxxxx
# 2 https://www.linkedin.com/yy/yyy-yy-yyy/ https//domain.io https://medium.com/@yyyyy 1 https://www.linkedin.com/yy/yyy-yy-yyy/ https://medium.com/@yyyyy
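Both patterns above match the keyword anywhere in the URL, so a link such as https://example.com/linkedin-tips would also land in the linkedin column. If the match should be on the host itself, one possible approach is to split each string on whitespace and compare the host part of every token. This is only a sketch: the helper name urls_for_domain is hypothetical, it uses base R only, and it assumes URLs start with http:// or https:// and carry no port or user info.

```r
# Hypothetical helper: first URL per string whose host is `domain` or a
# subdomain of it; NA when no token qualifies.
urls_for_domain <- function(x, domain) {
  tokens <- strsplit(x, "[[:space:]]+")
  vapply(tokens, function(tok) {
    has_scheme <- grepl("^https?://", tok)
    # host = everything between the scheme and the first "/"
    host <- sub("^https?://([^/]+).*$", "\\1", tok)
    hit <- has_scheme & (host == domain | endsWith(host, paste0(".", domain)))
    if (any(hit)) tok[hit][1] else NA_character_
  }, character(1))
}

x <- c("https://www.linkedin.com/xx/xxx-xx-xxx/ https//domain.io https://medium.com/@xxxxx",
       "https://example.com/linkedin-tips")
urls_for_domain(x, "linkedin.com")
# [1] "https://www.linkedin.com/xx/xxx-xx-xxx/" NA
urls_for_domain(x, "medium.com")
# [1] "https://medium.com/@xxxxx" NA
```

Comparing hosts as plain strings also avoids having to escape the "." in "linkedin.com", which a regular expression would otherwise treat as "any character".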