r - How do I keep all the data after using the trimws function in R?
Question
Examples of the 'Referer URL' values are shown below:
https://www.google.com/ | query_string=utm_source=google&utm_medium=cpc&utm_campaign=121434112139&utm_term=&utm_content=Shirts&gclid=CXjadiOcHGGw6JEiJaf5zMhRxFk-AOtiXMOd_1szoBoCUEMQAvD_BwE | ip_address=123.21.62.57 | user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0
https://www.Type2online.com/ | query_string=null | ip_address=113.193.43.211 | user_agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
https://www.google.com/ | query_string=gclid=CjwKCAjwh7H7BRBBEiwAPXjadn8fnPPR6HnqZrsK46JGDHKOo-C2jxHa1JW7V2glY_Lxi6sNo-AAdRoCDAcQAvD_BwE | ip_address=187.11.116.117 | user_agent=Mozilla/5.0 (Linux; Android 8.0.0; SM-C701F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Mobile Safari/537.36
Other URLs have no parameters:
https://m.facebook.com/
instagram.com
https://l.facebook.com
/https://www.google.com/
http://m.facebook.com
I am using the code below to split the URL parameters above and create a new column for each parameter:
Mydata$ref_url<-trimws(matrix(unlist(strsplit(as.character(Mydata$'Referer URL'),'|',fixed=TRUE)),ncol = 4, byrow = TRUE)[,1])
Mydata$query_string<-gsub("query_string=","",trimws(matrix(unlist(strsplit(as.character(Mydata$'Referer URL'),'|',fixed=TRUE)),ncol = 4, byrow = TRUE)[,2]))
Mydata$ip_address<-gsub("ip_address=","",trimws(matrix(unlist(strsplit(as.character(Mydata$'Referer URL'),'|',fixed=TRUE)),ncol = 4, byrow = TRUE)[,3]))
Mydata$user_agent<-gsub("user_agent=","",trimws(matrix(unlist(strsplit(as.character(Mydata$'Referer URL'),'|',fixed=TRUE)),ncol = 4, byrow = TRUE)[,4]))
Each of these statements produces the following error:
Error: Assigned data `trimws(...)` must be compatible with existing data.
x Existing data has 2645 rows.
x Assigned data has 1096 rows.
i Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In matrix(unlist(strsplit(as.character(Mydata$"Referer URL"), "|", :
data length [4382] is not a sub-multiple or multiple of the number of rows [1096]
How can I fix this?
Solution
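The numbers in the error message suggest where the rows go missing: 4382 split pieces poured into a 4-column matrix give ceiling(4382 / 4) = 1096 rows, not 2645. The sketch below reproduces this on a made-up four-row sample (the example.com URLs and the `UA1`/`UA2` values are illustrative assumptions, not the asker's data) and shows one possible base R repair, padding every row to four pieces before building the matrix.

```r
# Hypothetical sample: two full records and two bare URLs.
x <- c(
  "https://a.example.com/ | query_string=null | ip_address=1.2.3.4 | user_agent=UA1",
  "https://b.example.com/ | query_string=q=1 | ip_address=5.6.7.8 | user_agent=UA2",
  "instagram.com",
  "https://l.facebook.com"
)

pieces <- strsplit(x, "|", fixed = TRUE)

# Pieces per row: bare URLs yield a single piece instead of four.
lengths(pieces)

# unlist() discards the row boundaries, so 10 values are laid out
# four per row and the matrix ends up with 3 rows instead of 4.
n_values <- length(unlist(pieces))
bad <- suppressWarnings(matrix(unlist(pieces), ncol = 4, byrow = TRUE))
nrow(bad)

# One possible repair: index each row with 1:4 so that missing pieces
# become NA, which keeps exactly one matrix row per input row.
padded <- t(vapply(pieces, function(p) trimws(p)[1:4], character(4)))
dim(padded)
```

Note that indexing with `1:4` also silently drops any fifth piece, so this sketch assumes no field contains a literal `|`.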
tidyverse
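If you can guarantee that all parameters appear in the same order, the following code gives the desired output:
library(tidyverse)
# `ref` is assumed to be a data frame holding the raw strings in a column named V1
# Split on " | "; fill = "right" pads rows that have no parameters with NA
ref %>% separate(V1, paste0("V", 2:5), sep = " \\| ", fill = "right") -> separated
# Name the columns after the keys found in the first row (the text before "=")
names(separated) <- c("url", gsub("=.+", "", separated[1, 2:4]))
# Strip the leading "key=" from every value
separated %>% mutate_all(~ sub(".+?=", "", .))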
#> url query_string ip_address user_agent
#> 1 https://www.google.com/ utm_source=google&utm_medium=cpc&utm_campaign=121434112139&utm_term=&utm_content=Shirts&gclid=CXjadiOcHGGw6JEiJaf5zMhRxFk-AOtiXMOd_1szoBoCUEMQAvD_BwE 123.21.62.57 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0
#> 2 https://www.Type2online.com/ null 113.193.43.211 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36
#> 3 https://www.google.com/ gclid=CjwKCAjwh7H7BRBBEiwAPXjadn8fnPPR6HnqZrsK46JGDHKOo-C2jxHa1JW7V2glY_Lxi6sNo-AAdRoCDAcQAvD_BwE 187.11.116.117 Mozilla/5.0 (Linux; Android 8.0.0; SM-C701F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Mobile Safari/537.36
#> 4 https://m.facebook.com/ <NA> <NA> <NA>
#> 5 instagram.com <NA> <NA> <NA>
#> 6 https://l.facebook.com <NA> <NA> <NA>
#> 7 /https://www.google.com/ <NA> <NA> <NA>
#> 8 http://m.facebook.com <NA> <NA> <NA>
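When the parameter order cannot be guaranteed, one option is to look each field up by its key rather than by its position. The following is a minimal base R sketch under that assumption; `parse_referer` and its `keys` argument are hypothetical names introduced here, and the sample strings are shortened for illustration.

```r
# Hypothetical helper: extract "key=value" fields by key name,
# so the order of the pieces after the URL does not matter.
parse_referer <- function(x, keys = c("query_string", "ip_address", "user_agent")) {
  # Split on "|" and trim the whitespace around every piece.
  pieces <- lapply(strsplit(x, "|", fixed = TRUE), trimws)

  # The first piece is taken to be the URL itself.
  out <- data.frame(
    url = vapply(pieces, `[`, character(1), 1),
    stringsAsFactors = FALSE
  )

  for (k in keys) {
    prefix <- paste0(k, "=")
    out[[k]] <- vapply(pieces, function(p) {
      hit <- p[startsWith(p, prefix)]
      # NA when the key is absent; otherwise everything after "key=".
      if (length(hit) == 0) NA_character_ else substring(hit[1], nchar(prefix) + 1)
    }, character(1))
  }
  out
}

x <- c(
  "https://www.Type2online.com/ | query_string=null | ip_address=113.193.43.211 | user_agent=UA",
  "instagram.com",
  "http://m.facebook.com | ip_address=1.2.3.4 | query_string=a=b"
)
res <- parse_referer(x)
res
```

The result always has one row per input string, so it can be bound column-wise to the original data without the row-count mismatch. The literal string `null` is left untouched; converting it to `NA` would be a separate step.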