JSONDecodeError when opening a multi-line text file in Python

Problem description

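I'm trying to open a text file extracted from HDFS, pull out certain values, and write them to a single-row CSV file. Below is the "content" of the text file, followed by the code I'm using to extract the data and write the output: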

#file.txt
{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}

{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}

...

Initial code:

import csv
import json

wanted_data = []
with open('file.txt', 'r') as f:
  for line in f:
    json_data = json.loads(line)
    wanted_data.append(json_data['videoId'])
    for i in range(6):
      wanted_data.append(json_data['scores'][i]['bucket'])
      wanted_data.append(json_data['scores'][i]['value'])

with open('file.csv', 'w+') as f_out:
  write = csv.writer(f_out)
  write.writerow(wanted_data)

This results in a JSONDecodeError:

/usr/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 2 column 1 (char 1)
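A minimal reproduction, assuming the failing line is an empty separator line (a line containing only "\n"). The sample record below is hypothetical; it shows that `json.loads` accepts a record with a trailing newline but raises on a whitespace-only line, with the same "line 2 column 1 (char 1)" position as the traceback.

```python
import json

# Hypothetical input: one valid record followed by a blank separator line.
lines = ['{"videoId": "abc", "scores": []}\n', '\n']

error_message = None
for line in lines:
    try:
        json.loads(line)
    except json.JSONDecodeError as err:
        # After skipping the leading "\n" there is no JSON value left to decode.
        error_message = str(err)

print(error_message)  # Expecting value: line 2 column 1 (char 1)
```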

What is the correct way to load this text file?

Tags: python json text

Solution


It looks like you have blank lines between the JSON strings. Check that a line actually contains some text before processing it:

import json

wanted_data = []
with open('file.txt', 'r') as f:
  for line in f:
    if line.strip():
      json_data = json.loads(line)
      wanted_data.append(json_data['videoId'])
      for score in json_data['scores']:
        wanted_data.append(score['bucket'])
        wanted_data.append(score['value'])
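A minimal end-to-end sketch that combines the blank-line check above with the CSV step from the question. The function names `extract_row` and `write_single_row_csv` are hypothetical, and UTF-8 encoding is an assumption about the HDFS export. Opening the output with `newline=""` follows the `csv` module's recommendation for files passed to `csv.writer`.

```python
import csv
import json


def extract_row(in_path):
    """Hypothetical helper: collect videoId plus each score's bucket and value."""
    row = []
    # Assumption: the exported file is UTF-8 encoded.
    with open(in_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank separator lines between JSON records
            record = json.loads(line)
            row.append(record["videoId"])
            # Iterate over however many scores exist instead of assuming six.
            for score in record["scores"]:
                row.append(score["bucket"])
                row.append(score["value"])
    return row


def write_single_row_csv(row, out_path):
    """Hypothetical helper: write all collected values as one CSV row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f_out:
        csv.writer(f_out).writerow(row)
```

Usage: `write_single_row_csv(extract_row("file.txt"), "file.csv")`. Splitting reading from writing keeps the parsing step easy to check on its own before anything is written to disk.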
