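python - Opening JSON-formatted data stored in .txt files
Problem description
I was assigned to read several .txt files from Twitter that are actually JSON files, but I get an error when I try to load a file with the json package.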
import json
import pandas as pd

with open(files_path+'/tweets.json.2019-01-15.txt') as f:
    string=f.read()
data=json.loads(string)
tweet_df=pd.DataFrame(data)
print(tweet_df)
The error I get is:
  File "C:\ProgramData\Anaconda3\envs\HW1\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9762)
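I tried opening the other files, but the result is the same: the error occurs at line 2, column 1. The file contents look like this: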
{"created_at":"Mon Jan 14 21:59:12 +0000 2019","id":1084932973353467904,"id_str":"1084932973353467904","text":...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"oren_haz","name":"\u05d0\u05d5\u05e8\u05df \u05d7\u05d6\u05df","id":3185038236,"id_str":"3185038236","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503152584"}
{"created_at":"Mon Jan 14 21:59:34 +0000 2019","id":1084933066898968576,"id_str":"1084933066898968576","text":"...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"dudiamsalem","name":"\u05d3\u05d5\u05d3\u05d9 \u05d0\u05de\u05e1\u05dc\u05dd\u2066\ud83c\uddee\ud83c\uddf1\u2069\u2066","id":3221813461,"id_str":"3221813461","indices":[3,15]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503174887"}
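Thank you for your help.
Solution
This is not a single JSON document. It is a series of separate JSON documents, one per line. json.loads parses the first document and then reports "Extra data" when it reaches the start of the second one, which is why the error always points at line 2, column 1. Instead of using string=f.read(), you need to loop over the file and parse each line separately, for example: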
for line in f:
    data = json.loads(line)  # each line is its own JSON document
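The loop above overwrites data on every iteration, so to build a DataFrame the parsed tweets need to be collected first. The following is a minimal sketch of one way to do that, assuming the file is UTF-8 encoded and holds exactly one JSON object per line. The function name load_json_lines is hypothetical and not part of the original answer.

```python
import json


def load_json_lines(path):
    """Parse a file holding one JSON document per line into a list."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, which json.loads would reject
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report which line is malformed instead of a bare decoder error
                raise ValueError(
                    f"line {line_number} is not valid JSON: {exc}"
                ) from exc
    return records
```

The returned list can then be passed to pd.DataFrame(records) as in the question. pandas can also read this layout directly with pd.read_json(path, lines=True).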