Iterating over multiple JSON lines

Problem description

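I have a file with hundreds of JSON lines. I wrote a small Python script that lets me extract some data, but it only works for a single line. I'd now like to know how I can loop over all the lines in the file when there is more than one. What I have so far is: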

import json
from pprint import pprint

"""with open('1st_run_fixed.json') as f:"""
with open('fixed.json') as f:
    data = json.load(f)

    print "--------------------------------------------";
    """get number of characters"""
    nchar = data["frames"]["frame"]["lps"]["lp"]["ncharacter"];
    print "Got "+nchar+" characters";
    for x in range (1,int(nchar)+1):
        x = str(x);
        print data["frames"]["frame"]["lps"]["lp"]["characters"]["char"+x]["code_ascii"]+"    "+data["frames"]["frame"]["lps"]["lp"]["characters"]["char"+x]["confidence"];
    print "--------------------------------------------";

It works for the following data:

{"response":{"container":{"id":"41d6efcb-24d6-490d-8880-762255519b5f","timestamp":"2018-Jul-11 19:51:06.461665"},
"id":"00000002-0000-0000-0000-000000000015"},
"frames":{"frame":{"id":"5583","timestamp":"2016-Nov-30 13:05:27","lps":{"lp":{"licenseplate":"15451BBL","text":"15451BBL","wtext":"15451BBL","confidence":"20","bkcolor":"16777215","color":"16777215","type":"0","ntip":"11","cct_country_short":"","cct_state_short":"","tips":{"tip":{"poly":{"p":{"x":"1094","y":"643"},
"p":{"x":"1099","y":"643"},
"p":{"x":"1099","y":"667"},
"p":{"x":"1094","y":"667"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},
"tip":{"poly":{"p":{"x":"1103","y":"642"},
"p":{"x":"1113","y":"642"},
"p":{"x":"1112","y":"667"},
"p":{"x":"1102","y":"667"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},
"tip":{"poly":{"p":{"x":"1112","y":"640"},
"p":{"x":"1122","y":"640"},
"p":{"x":"1122","y":"666"},
"p":{"x":"1112","y":"666"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},
"tip":{"poly":{"p":{"x":"1123","y":"640"},
"p":{"x":"1132","y":"640"},
"p":{"x":"1131","y":"665"},
"p":{"x":"1123","y":"665"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},
"tip":{"poly":{"p":{"x":"1134","y":"640"},
"p":{"x":"1139","y":"640"},
"p":{"x":"1139","y":"664"},
"p":{"x":"1133","y":"664"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},
"tip":{"poly":{"p":{"x":"1154","y":"639"},
"p":{"x":"1163","y":"639"},
"p":{"x":"1163","y":"663"},
"p":{"x":"1153","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},
"tip":{"poly":{"p":{"x":"1164","y":"638"},
"p":{"x":"1173","y":"638"},
"p":{"x":"1173","y":"663"},
"p":{"x":"1163","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},
"tip":{"poly":{"p":{"x":"1191","y":"637"},
"p":{"x":"1206","y":"636"},
"p":{"x":"1205","y":"660"},
"p":{"x":"1190","y":"661"}},
"bkcolor":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"},
"tip":{"poly":{"p":{"x":"1103","y":"655"},
"p":{"x":"1111","y":"655"},
"p":{"x":"1111","y":"667"},
"p":{"x":"1103","y":"667"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},
"tip":{"poly":{"p":{"x":"1103","y":"655"},
"p":{"x":"1111","y":"655"},
"p":{"x":"1111","y":"667"},
"p":{"x":"1103","y":"667"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},
"tip":{"poly":{"p":{"x":"1176","y":"638"},
"p":{"x":"1185","y":"637"},
"p":{"x":"1184","y":"661"},
"p":{"x":"1175","y":"662"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"7"}},
"ncharacter":"8","characters":{"char1":{"poly":{"p":{"x":"1094","y":"643"},
"p":{"x":"1099","y":"643"},
"p":{"x":"1099","y":"667"},
"p":{"x":"1094","y":"667"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},
"char2":{"poly":{"p":{"x":"1103","y":"642"},
"p":{"x":"1113","y":"642"},
"p":{"x":"1112","y":"667"},
"p":{"x":"1102","y":"667"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},
"char3":{"poly":{"p":{"x":"1112","y":"640"},
"p":{"x":"1122","y":"640"},
"p":{"x":"1122","y":"666"},
"p":{"x":"1112","y":"666"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},
"char4":{"poly":{"p":{"x":"1123","y":"640"},
"p":{"x":"1132","y":"640"},
"p":{"x":"1131","y":"665"},
"p":{"x":"1123","y":"665"}},
"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},
"char5":{"poly":{"p":{"x":"1134","y":"640"},
"p":{"x":"1139","y":"640"},
"p":{"x":"1139","y":"664"},
"p":{"x":"1133","y":"664"}},
"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},
"char6":{"poly":{"p":{"x":"1154","y":"639"},
"p":{"x":"1163","y":"639"},
"p":{"x":"1163","y":"663"},
"p":{"x":"1153","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},
"char7":{"poly":{"p":{"x":"1164","y":"638"},
"p":{"x":"1173","y":"638"},
"p":{"x":"1173","y":"663"},
"p":{"x":"1163","y":"663"}},
"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},
"char8":{"poly":{"p":{"x":"1191","y":"637"},
"p":{"x":"1206","y":"636"},
"p":{"x":"1205","y":"660"},
"p":{"x":"1190","y":"661"}},
"bkcolor":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"}},
"det_time_us":"1072592","poly":{"p":{"x":"1088","y":"642"},
"p":{"x":"1210","y":"634"},
"p":{"x":"1210","y":"661"},
"p":{"x":"1087","y":"669"}}}},
"det_time_us":"1720812"}}}

But I also want it to work for the following data:

{"response":{"container":{"id":"80d996a1-c267-4fa4-b3f8-f61ff9fda198","timestamp":"2018-Jul-10 17:00:50.829709"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"398","timestamp":"2016-Nov-30 12:56:47.900000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"67","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"249"},
"p":{"x":"1559","y":"249"},
"p":{"x":"1559","y":"267"},
"p":{"x":"1553","y":"267"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},
"tip":{"poly":{"p":{"x":"1569","y":"248"},
"p":{"x":"1575","y":"248"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1585","y":"248"},
"p":{"x":"1591","y":"248"},
"p":{"x":"1591","y":"267"},
"p":{"x":"1585","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},
"tip":{"poly":{"p":{"x":"1602","y":"248"},
"p":{"x":"1607","y":"248"},
"p":{"x":"1607","y":"266"},
"p":{"x":"1602","y":"266"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},
"ncharacter":"6","characters":{"char1":{"poly":{"p":{"x":"1553","y":"249"},
"p":{"x":"1559","y":"249"},
"p":{"x":"1559","y":"267"},
"p":{"x":"1553","y":"267"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},
"char2":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},
"char3":{"poly":{"p":{"x":"1569","y":"248"},
"p":{"x":"1575","y":"248"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char4":{"poly":{"p":{"x":"1585","y":"248"},
"p":{"x":"1591","y":"248"},
"p":{"x":"1591","y":"267"},
"p":{"x":"1585","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},
"char5":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},
"char6":{"poly":{"p":{"x":"1602","y":"248"},
"p":{"x":"1607","y":"248"},
"p":{"x":"1607","y":"266"},
"p":{"x":"1602","y":"266"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},
"det_time_us":"776874","poly":{"p":{"x":"1543","y":"237"},
"p":{"x":"1618","y":"237"},
"p":{"x":"1618","y":"274"},
"p":{"x":"1543","y":"274"}}}},
"det_time_us":"1883017"}}}
{"response":{"container":{"id":"fa75e8f8-1b44-4f2f-a09b-6fe3b801ca1b","timestamp":"2018-Jul-10 17:00:55.863641"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"399","timestamp":"2016-Nov-30 12:56:48","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"tip":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"tip":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"ncharacter":"6","characters":{"char7":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"char8":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"char9":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char10":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"char11":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"char12":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"det_time_us":"600136","poly":{"p":{"x":"1543","y":"238"},
"p":{"x":"1618","y":"239"},
"p":{"x":"1619","y":"274"},
"p":{"x":"1543","y":"273"}}}},
"det_time_us":"1495308"}}}
{"response":{"container":{"id":"5c9c773c-a72a-488f-bc49-148dcd6cfa0a","timestamp":"2018-Jul-10 17:01:01.756522"},
"id":"00000002-0000-0000-0000-000000000002"},
"frames":{"frame":{"id":"400","timestamp":"2016-Nov-30 12:56:48.100000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"tip":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"tip":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"tip":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"tip":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"tip":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"ncharacter":"6","characters":{"char13":{"poly":{"p":{"x":"1553","y":"248"},
"p":{"x":"1560","y":"248"},
"p":{"x":"1560","y":"266"},
"p":{"x":"1554","y":"266"}},
"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},
"char14":{"poly":{"p":{"x":"1561","y":"248"},
"p":{"x":"1568","y":"248"},
"p":{"x":"1568","y":"267"},
"p":{"x":"1561","y":"267"}},
"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},
"char15":{"poly":{"p":{"x":"1569","y":"247"},
"p":{"x":"1576","y":"247"},
"p":{"x":"1576","y":"267"},
"p":{"x":"1569","y":"267"}},
"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},
"char16":{"poly":{"p":{"x":"1586","y":"248"},
"p":{"x":"1592","y":"248"},
"p":{"x":"1592","y":"267"},
"p":{"x":"1586","y":"267"}},
"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},
"char17":{"poly":{"p":{"x":"1593","y":"248"},
"p":{"x":"1600","y":"248"},
"p":{"x":"1600","y":"267"},
"p":{"x":"1593","y":"267"}},
"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},
"char18":{"poly":{"p":{"x":"1601","y":"249"},
"p":{"x":"1608","y":"249"},
"p":{"x":"1608","y":"265"},
"p":{"x":"1601","y":"265"}},
"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},
"det_time_us":"457492","poly":{"p":{"x":"1543","y":"238"},
"p":{"x":"1618","y":"239"},
"p":{"x":"1619","y":"274"},
"p":{"x":"1543","y":"273"}}}},
"det_time_us":"1311946"}}}

How can I get this to work?

My script currently returns:

Traceback (most recent call last):
  File "read.py", line 8, in <module>
    data = json.load(f)
  File "/usr/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 68 column 1 - line 202 column 1 (char 3182 - 9548)

shell returned 1

when I run it on the large file.

Tags: python, json, python-2.7, loops, dictionary

Solution


"I have a file with hundreds of JSON lines."

No, you don't, and that's the problem.


Hundreds of JSON texts do not make a valid JSON file. A valid JSON file is exactly one text. That's why json.load raises an error.


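Hundreds of JSON texts, each on exactly one line, with newlines between them, make a valid file in other formats such as JSONlines or NDJ. It's still not a valid JSON file, so you can't use json.load, but you can use a JSONlines or NDJ library, or just parse it like this: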

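import json

with open('fixed.json') as f:
    # Iterating over a file object yields one line at a time
    for line in f:
        # Each line must hold one complete JSON text
        data = json.loads(line)
        # do stuff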
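As a slightly more defensive variant of the loop above, the following sketch skips blank lines and records the numbers of malformed lines instead of aborting on the first one. The helper name parse_json_lines and the sample input are made up for illustration, and the code assumes Python 3.

```python
import json


def parse_json_lines(lines):
    """Decode an iterable of lines that hold one JSON text each.

    Returns (values, bad_line_numbers). Hypothetical helper, not a library API.
    """
    values = []
    bad_line_numbers = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            values.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad_line_numbers.append(number)
    return values, bad_line_numbers


# Made-up input: the third line is missing its closing brace
sample = ['{"id": 1}\n', '\n', '{"id": 2\n', '{"id": 3}\n']
values, bad = parse_json_lines(sample)
print(values)
print(bad)
```

A file object opened for reading can be passed in directly, since iterating over it yields lines.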

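Likewise, for writing JSONlines files, you can use a JSONlines library, or you can make sure each JSON text has no embedded newlines (which is actually what happens by default, as long as you don't pass non-default ensure_ascii or indent arguments) and just write out json.dumps(data) + "\n" for each value.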
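A minimal sketch of the writing side, assuming Python 3. The records are invented for illustration, and io.StringIO stands in for a real file opened for writing. Note that a newline inside a string value is escaped by json.dumps, so each record still occupies a single line.

```python
import io
import json

# Invented records; the second one contains a newline inside a string
records = [
    {"plate": "FRJ724", "confidence": 67},
    {"note": "line one\nline two"},
]

out = io.StringIO()  # stands in for open(path, "w")
for record in records:
    # Default json.dumps output has no raw newlines, so one record = one line
    out.write(json.dumps(record) + "\n")

lines = out.getvalue().splitlines()
round_trip = [json.loads(line) for line in lines]
print(len(lines))
print(round_trip == records)
```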


However, hundreds of JSON texts that each span multiple lines are not a valid file in any format.

This is actually explained in the json module documentation:

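Note: Unlike pickle and marshal, JSON is not a framed protocol, so trying to serialize multiple objects with repeated calls to dump() using the same fp will result in an invalid JSON file.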

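What "not a framed protocol" means is basically that the format is ambiguous. For example, if you did a json.dump(2, f) and then a json.dump(3, f), you'd end up with 23 in the file, which is exactly what you'd get from json.dump(23, f).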
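The ambiguity can be reproduced in a few lines. This is only an illustration, assuming Python 3 and using io.StringIO in place of a real file:

```python
import io
import json

# Two separate dumps into the same stream
buf = io.StringIO()
json.dump(2, buf)
json.dump(3, buf)
two_dumps = buf.getvalue()

# A single dump of a different value
buf = io.StringIO()
json.dump(23, buf)
one_dump = buf.getvalue()

# Both streams hold the same characters, so a reader cannot tell them apart
print(two_dumps == one_dump)
decoded = json.loads(two_dumps)
print(decoded)
```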


If you can fix your files to be valid in some format, such as JSONlines, that's the easy solution.


If you can't…

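Well, before standardization, there was the notion of a "JSON document", which basically meant a JSON text that is either an array or an object. And a stream of JSON documents is unambiguous.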

Since this isn't a standard format, you probably won't find a parser for it, so you'll have to write one yourself.

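One way to do this is to use the raw_decode method of json.JSONDecoder in the json module. It decodes a JSON text that may be followed by extra content, and returns the decoded value together with the index where that extra content starts. In your case, the extra content is the next JSON document.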

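Since hundreds of objects of this size aren't too big, it's probably simpler to read the whole file into memory and then parse it, so we don't have to worry about buffering: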

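import json

with open('fixed.json') as f:
    # raw_decode does not skip leading whitespace, so strip it up front
    contents = f.read().lstrip()
decoder = json.JSONDecoder()
while contents:
    # data is the decoded document; idx is the offset where it ended
    data, idx = decoder.raw_decode(contents)
    do_stuff(data)  # placeholder for your own processing
    contents = contents[idx:].lstrip()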

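Keep in mind that this only works if your file is a stream of JSON documents, that is, if the top-level values are always an Array or an Object. Also, if you edit these files by hand, then unlike JSONlines, where you can skip over one bad text and continue parsing the rest, there is no way to recover from an error here, because you don't know where the next document starts.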
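Putting the pieces together, here is a rough sketch of the raw_decode approach as a generator, applied to a trimmed-down imitation of the data in the question. The helper name iter_json_documents and the shortened sample are assumptions made for illustration. The sketch passes a start index to raw_decode instead of re-slicing the string, and it iterates over the values of "characters" rather than building "char" + x keys, because the later records in the question's data number their keys char7, char8 and so on. It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value found in text, in order.

    Hypothetical helper: assumes the values are objects or arrays
    separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # The second argument is the index at which decoding starts
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Shortened stand-in for the multi-line records shown in the question
sample = '''
{"frames": {"frame": {"lps": {"lp": {"ncharacter": "2",
  "characters": {"char1": {"code_ascii": "F", "confidence": "88"},
                 "char2": {"code_ascii": "R", "confidence": "96"}}}}}}}
{"frames": {"frame": {"lps": {"lp": {"ncharacter": "2",
  "characters": {"char7": {"code_ascii": "J", "confidence": "96"},
                 "char8": {"code_ascii": "7", "confidence": "94"}}}}}}}
'''

plates = []
for doc in iter_json_documents(sample):
    lp = doc["frames"]["frame"]["lps"]["lp"]
    # Iterate over the values so the exact key names do not matter
    chars = lp["characters"].values()
    plates.append("".join(c["code_ascii"] for c in chars))
print(plates)
```

With a real file, the text would come from f.read() as in the loop above. A malformed document still raises an error and stops the iteration, for the reason given in the preceding paragraph.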

