python - How to fix a Python function that iterates over a list of JSON files in a directory and merges them into a single JSON file
Problem
I have a device that continuously generates JSON files - a.json, b.json, c.json, and so on - and stores them in a folder directory like this.
"Data/d/a.json"
"Data/d/b.json"
"Data/d/c.json"
...
"Data/d/g.json"
Sample data in each JSON file
a.json
{"artist":null,"auth":"Logged In","firstName":"Walter","gender":"M","itemInSession":0,"lastName":"Frye","length":null,"level":"free","location":"San Francisco-Oakland-Hayward, CA","method":"GET","page":"Home","registration":1540919166796.0,"sessionId":38,"song":null,"status":200,"ts":1541105830796,"userAgent":"\"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/36.0.1985.143 Safari\/537.36\"","userId":"39"}
{"artist":null,"auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":0,"lastName":"Summers","length":null,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"GET","page":"Home","registration":1540344794796.0,"sessionId":139,"song":null,"status":200,"ts":1541106106796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
b.json
{"artist":"Des'ree","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":1,"lastName":"Summers","length":246.30812,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"You Gotta Be","status":200,"ts":1541106106796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
{"artist":null,"auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":2,"lastName":"Summers","length":null,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"GET","page":"Upgrade","registration":1540344794796.0,"sessionId":139,"song":null,"status":200,"ts":1541106132796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
c.json
{"artist":"Mr Oizo","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":3,"lastName":"Summers","length":144.03873,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"Flat 55","status":200,"ts":1541106352796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
{"artist":"Tamba Trio","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":4,"lastName":"Summers","length":177.18812,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"Quem Quiser Encontrar O Amor","status":200,"ts":1541106496796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
These files can grow to as many as 1,000 JSON files per day and many thousands per week. To process the data further, I have to bulk-insert the data from each JSON file into PostgreSQL, as you can see in the snippet below, but the current process is too manual and inefficient because I insert one file after another.
import json
import psycopg2

connection = psycopg2.connect("host=localhost dbname=devicedb user=#### password=####")
cursor = connection.cursor()
connection.set_session(autocommit=True)
cursor.execute("create table if not exists events_table(artist text, auth text, firstName text, gender varchar, itemInSession int, lastName text, length text, level text, location text, method varchar, page text, registration text, sessionId int, song text, status int, ts bigint, userAgent text, userId int);")
data = []
with open('Data/d/a.json') as f:
    for line in f:
        data.append(json.loads(line))
columns = [
'artist',
'auth',
'firstName',
'gender',
'itemInSession',
'lastName',
'length',
'level',
'location',
'method',
'page',
'registration',
'sessionId',
'song',
'status',
'ts',
'userAgent',
'userId'
]
for item in data:
    my_data = [item[column] for column in columns]
    for i, v in enumerate(my_data):
        if isinstance(v, dict):
            my_data[i] = json.dumps(v)
    insert_query = "INSERT INTO events_table VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
    cursor.execute(insert_query, tuple(my_data))
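The one-file-at-a-time insert above can be batched. Below is a sketch, not the question's original code: it assumes psycopg2 is installed and that the table and connection are set up as in the snippet above, and it uses glob to pick up every file matching a pattern instead of a hard-coded path. The helper names (load_rows, bulk_insert) and the pattern 'Data/d/*.json' are illustrative.

```python
import glob
import json


def load_rows(pattern, columns):
    """Read every newline-delimited JSON file matching *pattern* and
    return one tuple per record, in *columns* order."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                # .get() tolerates a missing key (yields None/NULL)
                # where item[column] would raise KeyError.
                rows.append(tuple(
                    json.dumps(v) if isinstance(v, dict) else v
                    for v in (record.get(c) for c in columns)
                ))
    return rows


def bulk_insert(cursor, rows):
    # execute_values (from psycopg2.extras) sends many rows per round
    # trip instead of one cursor.execute() per record.
    from psycopg2.extras import execute_values
    execute_values(cursor, "INSERT INTO events_table VALUES %s", rows)
```

Usage would look like rows = load_rows('Data/d/*.json', columns) followed by bulk_insert(cursor, rows), replacing both the single open('Data/d/a.json') and the per-record execute loop.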
To improve the current process, I searched online and found the function below, which merges multiple files into one. My understanding is that I can define output_filename as my merged file and input_filenames as the directory containing my input JSON files, and then run the function, but it seems I was wrong. Can anyone tell me what I am doing wrong?
def cat_json(output_filename, input_filenames):
    with file(output_filename, "w") as outfile:
        first = True
        for infile_name in input_filenames:
            with file(infile_name) as infile:
                if first:
                    outfile.write('[')
                    first = False
                else:
                    outfile.write(',')
                outfile.write(mangle(infile.read()))
        outfile.write(']')

output_filename = 'data/d/merged.json'
input_filenames = 'data/d/*.json'
cat_json(output_filename, input_filenames)
I get the following error
TypeError Traceback (most recent call last)
<ipython-input-19-3ff012d91d76> in <module>()
1 output_filename = 'data/d/merged.json'
2 input_filenames = 'data/d/*.json'
----> 3 cat_json(output_filename, input_filenames)
<ipython-input-18-760b670f79b1> in cat_json(output_filename, input_filenames)
1 def cat_json(output_filename, input_filenames):
----> 2 with file(output_filename, "w") as outfile:
3 first = True
4 for infile_name in input_filenames:
5 with file(infile_name) as infile:
TypeError: 'str' object is not callable
@deusxmachine Thanks for your contribution. I changed the function as suggested to:
def cat_json(output_filename, input_filenames):
    with open(output_filename, "w") as outfile:
        first = True
        for infile_name in input_filenames:
            with open(infile_name) as infile:
                if first:
                    outfile.write('[')
                    first = False
                else:
                    outfile.write(',')
                outfile.write(mangle(infile.read()))
        outfile.write(']')
The code now creates the merged.json file, but it is empty, and I get the following error
-------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-16-40d7387f704a> in <module>()
1 output_filename = 'merged.json'
2 input_filenames = 'data/d/*.json'
----> 3 cat_json(output_filename, input_filenames)
<ipython-input-15-951cbaba7765> in cat_json(output_filename, input_filenames)
3 first = True
4 for infile_name in input_filenames:
----> 5 with open(infile_name) as infile:
6 if first:
7 outfile.write('[')
FileNotFoundError: [Errno 2] No such file or directory: 'd'
I cannot figure out why it says there is no such file or directory. a.json, b.json, c.json ... are all located in the directory 'data/d/'. Or do I need to list each file name individually instead of using *.json?
Solution
I don't really understand what you mean by merging JSON, but I can tell you why you are getting this error.
Instead of
with file(output_filename, "w") as outfile:
do this
with open(output_filename, "w") as outfile:
file is not a function here (your traceback shows it is bound to a string in your session); open is the built-in for opening files.
Hope this helps.
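As for the second error (FileNotFoundError: 'd'): input_filenames is the plain string 'data/d/*.json', and iterating over a string yields its individual characters, so the function effectively calls open('d'). The wildcard has to be expanded with glob before the loop. Below is a sketch of a fixed cat_json; it assumes the inputs are newline-delimited JSON like the sample files, so it parses each line and writes one combined JSON array (the undefined mangle() from the original snippet is dropped, since its definition was never shown):

```python
import glob
import json


def cat_json(output_filename, input_pattern):
    # Expand the wildcard here; passing 'data/d/*.json' straight into
    # the loop iterates the string character by character, which is
    # what produced FileNotFoundError: 'd'.
    records = []
    for infile_name in sorted(glob.glob(input_pattern)):
        with open(infile_name) as infile:
            for line in infile:
                line = line.strip()
                if line:
                    records.append(json.loads(line))
    # Write only after reading everything, so a partially written
    # output file is never read back in.
    with open(output_filename, "w") as outfile:
        json.dump(records, outfile)
```

Calling cat_json('data/d/merged.json', 'data/d/*.json') would then work, though note that on a second run merged.json itself matches the pattern; writing the output outside the globbed directory (or giving it a different extension) avoids merging it into itself.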