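python - Removing \r\n from a quoted JSON string to get multiple lines
Problem description
I have a large text file containing multiple sequential JSON objects. As far as I know, the best way to interpret/load the JSON objects individually is to take them from the text file and put each one on its own line, so that I can iterate over them line by line.
Unfortunately, I can't get Python to split them onto separate lines without mangling the JSON structure beyond recognition. The files are also very large and contain a lot of information. Please let me know the best way to either a) put the different JSON object strings on separate lines in Python, or b) parse the information individually in some better way.
Here is what the text in the file looks like: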
"{\"time\":\"Fri Aug 09 18:55:37 +0000 2019\", \"id\":720,\"text\":\"I'd really like to find a good solution to this problem.\",\"source\":\"href=\\\"http:\\/\\/stackoverflow.com\\\",\"lang\":\"en\",\"timestamp_ms\":\"1565376937344\"}\r\n""{\"time\":\"Sat Aug 10 22:16:00 +0000 2019\", \"id\":721,\"text\":\"And I would appreciate your help!\",\"source\":\"href=\\\"http:\\/\\/stackoverflow.com\\\",\"lang\":\"en\",\"timestamp_ms\":\"156534564531\"}\r\n""{\"time\":\"Sun Aug 09 18:55:37 +0000 2019\", \"id\":720,\"text\":\"Imagine additional text repeating below.\",\"source\":\"href=\\\"http:\\/\\/stackoverflow.com\\\",\"lang\":\"en\",\"timestamp_ms\":\"1565376937344\"}\r\n"
If I assign the text above to a Python object and ask Python to print it, Python returns what I want to see, namely:
{"time":"Fri Aug 09 18:55:37 +0000 2019", "id":720,"text":"I'd really like to find a good solution to this problem.","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"1565376937344"}
{"time":"Sat Aug 10 22:16:00 +0000 2019", "id":721,"text":"And I would appreciate your help!","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"156534564531"}
{"time":"Sun Aug 09 18:55:37 +0000 2019", "id":720,"text":"Imagine additional text repeating below.","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"1565376937344"}
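However, if I read the file into a Python object and print that object, I get the raw text. I have tried f.read(), readline(), readlines(), and splitlines() (which gave me a mess of extra \\s), and I have tried splitstring(). I'm at a loss, and I admit I'm fairly new to coding and never really sat down to learn the basics.
Any help you can give me in taking the text above and eventually turning it into individual JSON objects that I can read (for example, the text of each one) would be great. My end goal is to be able to access dictionary keys from the individual JSON objects, like this: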
for line in f:
    data = json.loads(line)
    print(data['text'])
and get the following list:
"I'd really like to find a good solution to this problem."
"And I would appreciate your help!"
"Imagine additional text repeating below."
Solution
If I understand the question correctly, using literal_eval() might do what you need:
from ast import literal_eval

with open('json_strings.txt') as file:
    for line in file:
        # literal_eval reads the quoted, escaped text as a Python string literal
        for json_line in literal_eval(line).splitlines():
            print(json_line)
Sample output:
{"time":"Fri Aug 09 18:55:37 +0000 2019", "id":720,"text":"I'd really like to find a good solution to this problem.","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"1565376937344"}
{"time":"Sat Aug 10 22:16:00 +0000 2019", "id":721,"text":"And I would appreciate your help!","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"156534564531"}
{"time":"Sun Aug 09 18:55:37 +0000 2019", "id":720,"text":"Imagine additional text repeating below.","source":"href=\"http:\/\/stackoverflow.com\","lang":"en","timestamp_ms":"1565376937344"}
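To get from the printed lines to the dictionary lookups the question asks for, each decoded line can be passed to json.loads(). The following is a minimal sketch under a few assumptions: `iter_records` is a hypothetical helper name, and the sample record is made up and simplified. The `source` field in the sample as posted appears to lose its closing quote once unescaped, so json.loads() may reject those exact lines.

```python
import json
from ast import literal_eval


def iter_records(lines):
    """Yield one dict per JSON object found in an iterable of quoted lines.

    `iter_records` is a hypothetical helper name, not part of any library.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # literal_eval reads the quoted, escaped text as a Python string
        # literal; adjacent literals ("...""...") are joined into one str.
        decoded = literal_eval(raw)
        # The decoded text holds one JSON object per \r\n-terminated line.
        for json_line in decoded.splitlines():
            if json_line.strip():
                yield json.loads(json_line)


# Made-up sample in the same shape as the file, with simplified fields.
sample_line = (
    r'"{\"id\":721,\"text\":\"And I would appreciate your help!\",'
    r'\"source\":\"http:\\/\\/stackoverflow.com\"}\r\n"'
    r'"{\"id\":720,\"text\":\"Imagine additional text repeating below.\"}\r\n"'
)

for record in iter_records([sample_line]):
    print(record["text"])
```

With the real file, passing the open file object (for example `iter_records(file)` inside `with open('json_strings.txt') as file:`) should behave the same way, since the helper only iterates over lines. If the whole file turns out to be one physical line, that line is still read into memory in one go, which may matter for very large files.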