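python - How to join two JSON files in Python without nested for loops
Problem description
Each time, I take 500 records from file 1 and join them against file 2, which contains more than 100,000 records. It takes two minutes!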
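import json

# file1, file2 and result are paths to the two input files and the output file
with open(file1, 'r') as f1, open(file2, 'r') as f2:
    a = json.load(f1)
    b = json.load(f2)

list_a = []
# For every record in file 1, scan all of file 2 for a matching id
for i in range(len(a)):
    for n in range(len(b)):
        if b[n]["id"] == a[i]["id"]:
            # Merge the two records; fields from file 1 win on conflict
            list_a.append(dict(b[n], **a[i]))

with open(result, 'w') as f3:
    json.dump(list_a, f3, sort_keys=True, ensure_ascii=False)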
File 1:
[{ "id":"1", "name":"Tom" },
{ "id":"2", "name":"Jim" },
{ "id":"3", "name":"Bob" },
{ "id":"4", "name":"Jeny" },
{ "id":"5", "name":"Lara" },
{ "id":"6", "name":"Lin" },
{ "id":"7", "name":"Kim" },
{ "id":"8", "name":"Jack" },
{ "id":"9", "name":"Tony" }]
File 2:
[ { "id":"1", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" } ] },
{ "id":"2", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ]
Result:
[ { "id":"1", "name":"Tom", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" } ] },
{ "id":"2", "name":"Jim", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ]
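Solution
I don't have the experience to say whether this will speed things up, and the solution offered by Eugene Yarmash seems more reliable. I also don't have a large file to test the speed on, but you could try it and see whether grouping the records in a dictionary from `collections` makes the iteration faster. I would actually be curious whether it changes anything: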
from collections import defaultdict

File1 = [ { "id":"1", "name":"Tom" }, { "id":"2", "name":"Jim" }, { "id":"3", "name":"Bob" }, { "id":"4", "name":"Jeny" }, { "id":"5", "name":"Lara" }, { "id":"6", "name":"Lin" }, { "id":"7", "name":"Kim" }, { "id":"8", "name":"Jack" }, { "id":"9", "name":"Tony" } ]
File2 = [ { "id":"1", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" } ] }, { "id":"2", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ]

# Merge all records that share an id into one dict per id.
# Each record is visited once, so there is no nested scan.
d = defaultdict(dict)
for records in (File1, File2):
    for elem in records:
        d[elem['id']].update(elem)

# Result maps each id to its merged record.
# Ids that appear in only one file (here "3" to "9") are kept as well.
Result = dict(d)
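If you need exactly the output shown in the question (a list containing only the ids present in both files), one possible variation is to index file 2 by id once and then look up each record from file 1. This is only a sketch: the helper name `join_by_id` and the shortened sample data are made up for illustration, and I have not timed it on a large file. The idea is that building the index costs one pass over file 2, after which each of the 500 lookups is a dictionary access instead of a scan over 100,000 records.

```python
import json


def join_by_id(small, large, key="id"):
    """Inner-join two lists of dicts on `key` (hypothetical helper)."""
    # Build a lookup table once: id -> list of records from the large list.
    index = {}
    for rec in large:
        index.setdefault(rec[key], []).append(rec)

    joined = []
    for rec in small:
        # Dictionary lookup replaces the inner for loop.
        for match in index.get(rec[key], []):
            # Same merge rule as dict(b[n], **a[i]) in the question:
            # fields from the small list win on conflict.
            joined.append({**match, **rec})
    return joined


# Shortened sample data taken from the question.
file1 = [
    {"id": "1", "name": "Tom"},
    {"id": "2", "name": "Jim"},
    {"id": "3", "name": "Bob"},
]
file2 = [
    {"id": "1", "Details": [{"label": "jcc", "hooby": "Swimming"},
                            {"label": "hkt", "hooby": "Basketball"}]},
    {"id": "2", "Details": [{"label": "NTC", "hooby": "Games"}]},
]

result = join_by_id(file1, file2)
print(json.dumps(result, sort_keys=True, ensure_ascii=False))
```

With real files you would pass the two lists returned by `json.load` and write `result` out with `json.dump`, as in the question.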