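r - Extract all values from a vector
Problem description
After a lot of research, I have not been able to find a way to extract values from a data frame in R without knowing the keys in advance.
I have a data frame named test with a column common (holding JSON-like values) and a column id. It contains millions of records.
Example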
id common
1 {ename=pageload, pgloc={from=https://m.amazon.com/gift/popular-aloe-vera-plant?gclid=CjwKCAjw8MD7BRArEiwAGZsrBZh6cWJ1-PGvFC1zMutwfjBJuGROHhW4l_ZtcH3n2ZvPSotsTO-sgxoCucAQAvD_BwE, to=https://m.amazon.com/gift/popular-aloe-vera-plant?gclid=CjwKCAjw8MD7BRArEiwAGZsrBZh6cWJ1-PGvFC1zMutwfjBJuGROHhW4l_ZtcH3n2ZvPSotsTO-sgxoCucAQAvD_BwE#/product-page, clikd=}, dev={ver=1.0, blang=en-GB, ip=27.5.192.167, dtype=Mobile, ua=Mozilla/5.0 (Linux; Android 10; HD1901) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.127 Mobile Safari/537.36, did=bc0a2740-d89e-11ea-b35c-567d21f9cbe3, appos=Android, appversion=null, model=null, osver=null, brand=null, pfspec=web}, fngid=81f8036d1099bd59ff93454d1f8, dname=fnp.com, user={cur=, id=sufeb@gmail.com}, wegid=null}
2 {ename=pageload, pgloc={from=https://www.google.com/, to=https://m.amazon.com/gift/red-velvet-fresh-cream-cake?gclid=CjwKCAjw8MD7BRArEiwAGZsrBd464AGGzOLMzzaxggCPNU-onDOZuhUqzz3tB6UOIUneNq6rcduxUxoCjXwQAvD_BwE#/product-page, clikd=}, dev={ver=1.0, blang=en-US, ip=106.217.118.179, dtype=Mobile, ua=Mozilla/5.0 (Linux; Android 8.1.0; vivo 1724) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.92 Mobile Safari/537.36, did=d43e7274-8116-11ea-96ae-b627f142e667, appos=Android, appversion=null, model=null, osver=null, brand=null, pfspec=web}, fngid=aeb8d109630f797980ac4cc4066d4c4b, dname=fnp.com, user={cur=, id=}, wegid=null}
I want to extract all of the values into separate columns of the same data frame. For example, for id 1, user={cur=, id=sufeb@gmail.com} has a value, whereas for id 2, user={cur=, id=} has none. Hence I need all values, irrespective of the keys.
I tried this, but it does not work:
library(tidyverse)
library(rjson)
extract_json_column <- function(test){
  test %>%
    rowwise() %>%
    mutate(data = map(common, fromJSON)) %>%
    split(.$index) %>%
    map(~.$common[[1]]) %>%
    map(~map_if(., function(x) length(x) != 1, list)) %>%
    map(as_data_frame) %>%
    bind_rows(.id = "index")
}
df <- do.call(plyr::rbind.fill, lapply(test[test != ""], function(x) as.data.frame(t(unlist(fromJSON(x))))))
Error message
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
Sample output for the keys ename and did. However, I need all possible keys, such as fngid and dtype, without having to type out every key.
id ename did
1 pageload bc0a2740-d89e-11ea-b35c-567d21f9cbe3
2 pageload d43e7274-8116-11ea-96ae-b627f142e667
Link to data sample: https://docs.google.com/spreadsheets/d/17Gsrq-BCKqRA6csQdQH0duK4olc39TqmnbDYGExhXqo/edit#gid=0
Link to gdrive: https://drive.google.com/file/d/1TxSY97DRT8ih2VJENolar3MJXwtSFTGX/view?usp=sharing
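Solution
The values in common are not valid JSON (keys and values are unquoted and separated by = rather than :), which is why fromJSON rejects them. You can write a function in base R that rewrites each record as a JSON string: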
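convert_to_json_string <- function(x){
  # Quote every key that follows "{" or ", "
  key_update <- gsub("(?<=[{]|, )(\\w+)=", '"\\1"=', x, perl = TRUE)
  # Quote every non-empty value
  update_vals <- gsub('(?<="=)([^{}"]+)(?=(, ")|[}])', '"\\1"', key_update, perl = TRUE)
  # Give empty values an empty string
  update_empty_vals <- gsub('(?<="=)(?=[,}])', '""', update_vals, perl = TRUE)
  # Replace "= with ": and join all records into one JSON array
  sprintf("[%s]", toString(gsub('"=', '":', update_empty_vals)))
}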
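To see what each regex pass does, here is a minimal sketch that applies the same four substitutions one at a time to a single toy record. The record is made up for illustration (it is much shorter than the real data), and the names s1 to s4 are only there for the walkthrough.

```r
# Toy record in the same key=value format (made up for illustration)
x <- "{ename=pageload, user={cur=, id=a@example.com}, wegid=null}"

# Pass 1: quote every key that follows "{" or ", "
s1 <- gsub("(?<=[{]|, )(\\w+)=", '"\\1"=', x, perl = TRUE)
# Pass 2: quote every non-empty value
s2 <- gsub('(?<="=)([^{}"]+)(?=(, ")|[}])', '"\\1"', s1, perl = TRUE)
# Pass 3: give empty values (here: cur) an empty string
s3 <- gsub('(?<="=)(?=[,}])', '""', s2, perl = TRUE)
# Pass 4: replace the separator "= with ":
s4 <- gsub('"=', '":', s3)

# Print the intermediate strings, one per line
cat(s1, s2, s3, s4, sep = "\n")
# Last line printed:
# {"ename":"pageload", "user":{"cur":"", "id":"a@example.com"}, "wegid":"null"}
```

Note that null is quoted like any other value, so it comes back as the string "null" rather than as a missing value.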
Then run:
jsonlite::fromJSON(convert_to_json_string(test$common))
The result is too large to post here, but its structure is as follows:
str(jsonlite::fromJSON(convert_to_json_string(test$common)))
'data.frame': 10 obs. of 7 variables:
$ ename: chr "pageload" "pageload" "pageload" "pageload" ...
$ pgloc:'data.frame': 10 obs. of 3 variables:
..$ from : chr "https://m.amazon.com/gift/popular-aloe-vera-plant?gclid=CjwKCAjw8MD7BRArEiwAGZsrBZh6cWJ1-PGvFC1zMutwfjBJuGROHhW"| __truncated__ "https://www.google.com/" "plp~designer-cakes" "" ...
..$ to : chr "https://m.amazon.com/gift/popular-aloe-vera-plant?gclid=CjwKCAjw8MD7BRArEiwAGZsrBZh6cWJ1-PGvFC1zMutwfjBJuGROHhW"| __truncated__ "https://m.amazon.com/gift/red-velvet-fresh-cream-cake?gclid=CjwKCAjw8MD7BRArEiwAGZsrBd464AGGzOLMzzaxggCPNU-onDO"| __truncated__ "home" "https://www.amazon.com/" ...
..$ clikd: chr "" "" "back" "" ...
$ dev :'data.frame': 10 obs. of 12 variables:
..$ ver : chr "1.0" "1.0" "1.0" "1.0" ...
..$ blang : chr "en-GB" "en-US" "en-US" "en-US" ...
..$ ip : chr "27.5.192.167" "106.217.118.179" "223.237.31.203" "103.240.195.254" ...
..$ dtype : chr "Mobile" "Mobile" "Mobile" "De
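The parsed result keeps pgloc, dev and user as nested data frames, while the question asks for every value in its own column next to id. One possible way to get there, sketched below, is jsonlite::flatten() followed by cbind(). The two-row test data frame is a shortened, made-up stand-in for the real data, and the name result is only illustrative.

```r
library(jsonlite)

# Same conversion function as in the answer above
convert_to_json_string <- function(x){
  key_update <- gsub("(?<=[{]|, )(\\w+)=", '"\\1"=', x, perl = TRUE)
  update_vals <- gsub('(?<="=)([^{}"]+)(?=(, ")|[}])', '"\\1"', key_update, perl = TRUE)
  update_empty_vals <- gsub('(?<="=)(?=[,}])', '""', update_vals, perl = TRUE)
  sprintf("[%s]", toString(gsub('"=', '":', update_empty_vals)))
}

# Toy stand-in for the real data frame (assumption: shortened records)
test <- data.frame(
  id = 1:2,
  common = c(
    "{ename=pageload, dev={dtype=Mobile, did=bc0a2740}, user={cur=, id=a@example.com}, wegid=null}",
    "{ename=pageload, dev={dtype=Mobile, did=d43e7274}, user={cur=, id=}, wegid=null}"
  ),
  stringsAsFactors = FALSE
)

# Parse all records at once; nested objects become nested data frames
parsed <- fromJSON(convert_to_json_string(test$common))

# Expand nested data frames into plain columns such as dev.did and user.id
flat <- flatten(parsed)

# Put the id column back in front of the extracted values
result <- cbind(id = test$id, flat)
str(result)
```

The function pastes every record into a single JSON string, so with millions of rows it may be necessary to process the data frame in chunks and bind the pieces afterwards; that part is not shown here.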