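r - Malformed JSON missing comma separators: inserting commas in R
Question
I'm new to R and have a JSON file containing data that I'd like to convert to an R data frame. The file was scraped in the following format:
The image shows where the data was scraped incorrectly: no commas were inserted to separate the entries. I have tried reading the data in with scan and splitting it into a list (to then read into a data frame) using the following code: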
indices <- grep(":[{",x, fixed=TRUE)
n <- length(indices)
l <- vector("list", n);
for(i in 1:n) {
ps <- substr(x ,indices[[i]], indices[i+1]) ## where i is whatever your Ps is
l[[i]] <- ps
}
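However, I get empty strings and NaN values. I also tried parsing the file with jsonlite, tidyjson, and rjson, but without any luck (which makes sense, since the JSON is malformed). This post seems to match my JSON structure, but its solution does not work because of the missing commas. With the file read in as a single string, how can I insert a comma before each instance of {"entries":[ in R?
Update: the first, second, and third entries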
{"entries":[{"url":"/leonardomso/playground","name":"playground","lang":"TypeScript","desc":"Playground using React, Emotion, Relay, GraphQL, MongoDB.","stars":5,"forks":"2","updated":"2021-03-24T09:35:44Z","info":["react","reactjs","graphql","typescript","hooks","apollo","boilerplate","!DOCTYPE html \"\""],"repo_url":"/leonardomso?tab=repositories"}
{"entries":[{"url":"/leonardomso/playground","name":"playground","lang":"TypeScript","desc":"Playground using React, Emotion, Relay, GraphQL, MongoDB.","stars":5,"forks":"2","updated":"2021-03-24T09:35:44Z","info":["react","reactjs","graphql","typescript","hooks","apollo","boilerplate","!DOCTYPE html \"\""],"repo_url":"/leonardomso?tab=repositories"}
{"entries":[{"url":"/shiffman/Presentation-Manager","name":"Presentation-Manager","lang":"JavaScript","desc":"Simple web app to manage student presentation schedule.","stars":17,"forks":"15","updated":"2021-01-19T15:28:55Z","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"url":"/shiffman/A2Z-F20","name":"A2Z-F20","lang":"JavaScript","desc":"ITP Course Programming from A to Z Fall 2020","stars":40,"forks":"31","updated":"2020-12-21T13:52:58Z","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"url":"/shiffman/RunwayML-Object-Detection","name":"RunwayML-Object-Detection","lang":"JavaScript","desc":"Object detection model with RunwayML, node.js, and p5.js","stars":16,"forks":"2","updated":"2020-11-15T23:36:36Z","info":[]},{"url":"/shiffman/ShapeClassifierCNN","name":"ShapeClassifierCNN","lang":"JavaScript","desc":"test code for new tutorial","stars":11,"forks":"1","updated":"2020-11-06T15:02:26Z","info":[]},{"url":"/shiffman/Bot-Code-of-Conduct","name":"Bot-Code-of-Conduct","desc":"Code of Conduct to guide ethical bot making practices","stars":15,"forks":"1","updated":"2020-10-15T18:30:26Z","info":[]},{"url":"/shiffman/Twitter-Bot-A2Z","name":"Twitter-Bot-A2Z","lang":"JavaScript","desc":"New twitter bot examples","stars":26,"forks":"2","updated":"2020-10-13T16:17:45Z","info":["hacktoberfest","!DOCTYPE html \"\""],"repo_url":"/shiffman?tab=repositories"}
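Answer
You can use
gsub('}{"entries":[', '},{"entries":[', x, fixed=TRUE)
This replaces every occurrence of }{"entries":[ with },{"entries":[.
Note the fixed=TRUE argument: it disables the regex engine, so the pattern is matched as a literal string and characters such as { and [ need no escaping.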
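As a minimal sketch of how that call behaves, the snippet below applies it to a tiny made-up string rather than the real file. The sample data and the variable names x, y, fixed_x, fixed_y, and json_array are assumptions for illustration only. It also shows a regex variant for the case where the objects are separated by whitespace or newlines, which the literal pattern would not match; whether the real file looks like that is not known from the question.

```r
# Hypothetical miniature of the scraped file: two objects with no comma between them.
x <- '{"entries":[{"name":"a"}]}{"entries":[{"name":"b"}]}'

# fixed = TRUE matches the pattern literally, so {, [ and " need no escaping.
fixed_x <- gsub('}{"entries":[', '},{"entries":[', x, fixed = TRUE)

# Assumption: if the objects are separated by newlines or spaces instead,
# the literal pattern never matches, so a regex that allows optional
# whitespace between } and { is used here.
y <- '{"entries":[{"name":"a"}]}\n{"entries":[{"name":"b"}]}'
fixed_y <- gsub('\\}\\s*\\{"entries":\\[', '},{"entries":[', y, perl = TRUE)

# Wrapping the comma-separated objects in [ ] gives a single JSON array,
# which a JSON parser can then read, provided each object is itself valid.
json_array <- paste0("[", fixed_x, "]")
cat(json_array, "\n")
```

The inserted commas only fix the separators between objects. If individual objects are themselves incomplete (for example, missing closing brackets), a parser will still reject the result.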