Malformed JSON missing comma separators: inserting commas in R

Problem description

I am new to R and have a JSON file containing data that I want to convert to an R data frame. The file was scraped in the format shown in the update below.

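The data was scraped incorrectly: no commas were inserted to separate the entries. I tried reading the data with scan and splitting it into a list (to be read into a data frame afterwards) using the following code: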

indices <- grep(":[{",x, fixed=TRUE)

n <- length(indices)
l <- vector("list", n);
for(i in 1:n) {
  ps <- substr(x ,indices[[i]], indices[i+1])  ## where i is whatever your Ps is
  l[[i]] <- ps
}

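However, I get empty strings and NA values. I also tried parsing with jsonlite, tidyjson and rjson, without any luck (which makes sense, since the JSON is malformed). Another post seems to match my JSON structure, but its solution does not work here because of the missing commas. With the file read in as a single string, how can I insert a comma before each instance of {"entries":[ in R?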

Update: the first, second and third entries

{"entries":[{"url":"/leonardomso/playground","name":"playground","lang":"TypeScript","desc":"Playground using React, Emotion, Relay, GraphQL, MongoDB.","stars":5,"forks":"2","updated":"2021-03-24T09:35:44Z","info":["react","reactjs","graphql","typescript","hooks","apollo","boilerplate","!DOCTYPE html \"\""],"repo_url":"/leonardomso?tab=repositories"}
{"entries":[{"url":"/leonardomso/playground","name":"playground","lang":"TypeScript","desc":"Playground using React, Emotion, Relay, GraphQL, MongoDB.","stars":5,"forks":"2","updated":"2021-03-24T09:35:44Z","info":["react","reactjs","graphql","typescript","hooks","apollo","boilerplate","!DOCTYPE html \"\""],"repo_url":"/leonardomso?tab=repositories"}
{"entries":[{"url":"/shiffman/Presentation-Manager","name":"Presentation-Manager","lang":"JavaScript","desc":"Simple web app to manage student presentation schedule.","stars":17,"forks":"15","updated":"2021-01-19T15:28:55Z","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"url":"/shiffman/A2Z-F20","name":"A2Z-F20","lang":"JavaScript","desc":"ITP Course Programming from A to Z Fall 2020","stars":40,"forks":"31","updated":"2020-12-21T13:52:58Z","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"desc":"","stars":null,"forks":"","info":[]},{"url":"/shiffman/RunwayML-Object-Detection","name":"RunwayML-Object-Detection","lang":"JavaScript","desc":"Object detection model with RunwayML, node.js, and p5.js","stars":16,"forks":"2","updated":"2020-11-15T23:36:36Z","info":[]},{"url":"/shiffman/ShapeClassifierCNN","name":"ShapeClassifierCNN","lang":"JavaScript","desc":"test code for new tutorial","stars":11,"forks":"1","updated":"2020-11-06T15:02:26Z","info":[]},{"url":"/shiffman/Bot-Code-of-Conduct","name":"Bot-Code-of-Conduct","desc":"Code of Conduct to guide ethical bot making practices","stars":15,"forks":"1","updated":"2020-10-15T18:30:26Z","info":[]},{"url":"/shiffman/Twitter-Bot-A2Z","name":"Twitter-Bot-A2Z","lang":"JavaScript","desc":"New twitter bot examples","stars":26,"forks":"2","updated":"2020-10-13T16:17:45Z","info":["hacktoberfest","!DOCTYPE html \"\""],"repo_url":"/shiffman?tab=repositories"}

Tags: r, dataframe, gsub

Solution


You can use

gsub('}{"entries":[', '},{"entries":[', x, fixed=TRUE)

That is, it replaces every }{"entries":[ with },{"entries":[.
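
As a minimal sketch of that replacement, the toy string below stands in for the file contents read as a single string. The variable names fixed_x and json_array and the shortened entries are illustrative assumptions, not the real scraped data.

```r
# Toy input: two objects run together with no comma between them
# (shortened, hypothetical stand-ins for the scraped entries).
x <- '{"entries":[{"name":"a"}]}{"entries":[{"name":"b"}]}'

# Insert a comma wherever one object ends and the next one begins.
fixed_x <- gsub('}{"entries":[', '},{"entries":[', x, fixed = TRUE)

# Wrapping the result in brackets gives a JSON array of objects.
json_array <- paste0("[", fixed_x, "]")
cat(json_array, "\n")
```

The sample entries shown in the question also appear to lack the closing ] and } of each outer object, so the real file may need further cleanup before a parser such as jsonlite accepts it.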

Note the fixed=TRUE argument, which disables the regex engine so that the pattern is matched as a literal string.
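
To illustrate why fixed=TRUE matters: {, } and [ are regex metacharacters, so without it they would have to be escaped. The sketch below compares the literal form with an escaped regex form on a hypothetical toy string. The regex variant also tolerates whitespace, such as a line break, between entries; that is an assumption about how the file might look once read in, not something the answer states.

```r
# Hypothetical toy input with a newline between the two objects.
x <- '{"entries":[{"name":"a"}]}\n{"entries":[{"name":"b"}]}'

# Literal match: finds nothing here, because a newline follows the "}".
literal <- gsub('}{"entries":[', '},{"entries":[', x, fixed = TRUE)

# Regex match: metacharacters escaped, optional whitespace allowed.
regex <- gsub('\\}[[:space:]]*\\{"entries":\\[', '},{"entries":[', x)

identical(literal, x)  # TRUE: the input is unchanged
cat(regex, "\n")
```

If the entries are always directly adjacent, the literal form from the answer is simpler and avoids the escaping.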

