Opening JSON-formatted content in .txt files

Problem description

I was assigned to read several .txt files from Twitter that are actually JSON files, but I get an error when I try to load a file with the json package.

    import json
    import pandas as pd

    with open(files_path + '/tweets.json.2019-01-15.txt') as f:
        string = f.read()
        data = json.loads(string)
    tweet_df = pd.DataFrame(data)
    print(tweet_df)

The error I get is:

 File "C:\ProgramData\Anaconda3\envs\HW1\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9762)

I tried opening other files, but the result is the same: the error occurs at line 2, column 1.

{"created_at":"Mon Jan 14 21:59:12 +0000 2019","id":1084932973353467904,"id_str":"1084932973353467904","text":...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"oren_haz","name":"\u05d0\u05d5\u05e8\u05df \u05d7\u05d6\u05df","id":3185038236,"id_str":"3185038236","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503152584"}
{"created_at":"Mon Jan 14 21:59:34 +0000 2019","id":1084933066898968576,"id_str":"1084933066898968576","text":"...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"dudiamsalem","name":"\u05d3\u05d5\u05d3\u05d9 \u05d0\u05de\u05e1\u05dc\u05dd\u2066\ud83c\uddee\ud83c\uddf1\u2069\u2066","id":3221813461,"id_str":"3221813461","indices":[3,15]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503174887"}

Thanks for your help.

Tags: python, json, twitter

Solution


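This is not a single JSON document. It is a sequence of separate JSON documents, one per line, which is why json.loads reports "Extra data" at line 2, column 1: it parses the first object and then finds more text after it. Instead of using string = f.read(), loop over the file and parse each line separately, for example: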

    for line in f:
        data = json.loads(line)
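
Note that the loop above overwrites data on every iteration, so you will probably want to collect the parsed objects in a list. Here is a minimal sketch of that idea. The helper name load_json_lines and the two-record sample file are made up for illustration (they are not real tweets), and the sketch assumes the files are UTF-8 encoded:

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Parse a file holding one JSON document per line into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines, e.g. an empty line at the end of the file
                continue
            records.append(json.loads(line))
    return records


# Hypothetical stand-in for one of the tweet files: two tiny made-up records.
sample = '{"id": 1, "lang": "iw"}\n{"id": 2, "lang": "en"}\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

records = load_json_lines(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(len(records))
```

The resulting list of dicts can be passed straight to pd.DataFrame(records). If you only need the DataFrame, pd.read_json(path, lines=True) should read this one-object-per-line layout directly.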
