python - Automatically save the output to a new JSON file each time a counter reaches a given number
Problem description
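I have several folders, each containing several CSV files with a large number of rows and columns. I am trying to concatenate certain columns from the CSV files into a JSON file. My code runs fine on folders with fewer than 100 CSV files. With more than 100 files it becomes very slow and gets stuck after a few files.
I created mock data with 4 data frames that replicate my original data: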
import pandas as pd
import numpy as np
import glob
data_1 = {'host_identity_verified':['t','t','t','t','t','t','t','t','t','t'],
    'neighbourhood':['q', 'q', 'q', 'q', 'q', 'q', 'q', 'q', 'q', 'q'],
    'neighbourhood_cleansed':['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
        'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
    'neighbourhood_group_cleansed': ['NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN'],
    'latitude':[ 52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
data_2 = {'host_identity_verified':['t','t','t','t','t','t','t','t','t','t'],
    'neighbourhood':['w', 'w', 'w', 'w', 'w', 'w', 'w', 'w', 'w', 'w'],
    'neighbourhood_cleansed':['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
        'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
    'neighbourhood_group_cleansed': ['NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN'],
    'latitude':[ 52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
data_3 = {'host_identity_verified':['t','t','t','t','t','t','t','t','t','t'],
    'neighbourhood':['NaN', 'NaN', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US'],
    'neighbourhood_cleansed':['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
        'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
    'neighbourhood_group_cleansed': ['NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN'],
    'latitude':[ 52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
data_4 = {'host_identity_verified':['t','t','t','t','t','t','t','t','t','t'],
    'neighbourhood':['NaN', 'NaN', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US', 'Chicago, US'],
    'neighbourhood_cleansed':['Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West',
        'Oostelijk Havengebied - Indische Buurt', 'Centrum-Oost', 'Centrum-West', 'Centrum-West', 'Centrum-West'],
    'neighbourhood_group_cleansed': ['NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN'],
    'latitude':[ 52.36575, 52.36509, 52.37297, 52.38761, 52.36719, 52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)
df_3 = pd.DataFrame(data_3)
df_4 = pd.DataFrame(data_4)
df_list_1 = []
df_list_2 = []
df_list_3 = []
df_list_4 = []
df_list_1.append(df_1)
df_list_2.append(df_2)
df_list_3.append(df_3)
df_list_4.append(df_4)
df_all = df_list_1 + df_list_2 + df_list_3 +df_list_4
count = 0
li = []
for df in df_all:
    count = count + 1
    print(count)
    # First group: only the first two data frames are collected here.
    if count < 3:
        df_n = df
        li.append(df_n)
        frame_1 = pd.concat(li, axis=0, ignore_index=True)
        def Get_Columns(file_name):
            return file_name[['host_identity_verified', 'latitude']]
        concat_data_1 = Get_Columns(frame_1)
        with open('Booking_Data_%s.json' % count, 'w') as outfile:
            concat_data_j_1 = concat_data_1.to_json()
            outfile.write(concat_data_j_1)
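As you can see, to concatenate x files and save them to one JSON file, I have to do it manually by writing many elif statements. I have a folder with just under 900 files, so I would have to write about 19 conditions to save every 50 CSV files to a JSON file.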
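So I would like to shorten the code and automatically save the output to a new JSON file every time the counter reaches a multiple of 20: the first 20 files in one file, the next 20 in another, and so on.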
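One way to express "save whenever the counter reaches a multiple of 20" without a chain of elif branches is a modulo test on the counter plus a final flush for the leftovers. The sketch below is only an illustration under assumptions: the helper names flush and save_every_n are hypothetical, the data frames are assumed to be already in memory (like df_all in the mock data), and the two kept columns are the ones selected by Get_Columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

BATCH_SIZE = 20  # number of data frames per JSON file, as described in the question


def flush(frames, file_number, out_dir):
    """Concatenate the buffered frames and write them to one numbered JSON file."""
    combined = pd.concat(frames, ignore_index=True)
    out_path = Path(out_dir) / f"Booking_Data_{file_number}.json"
    combined[["host_identity_verified", "latitude"]].to_json(out_path)
    return out_path


def save_every_n(frames, out_dir, batch_size=BATCH_SIZE):
    """Write a new JSON file each time the counter reaches a multiple of batch_size."""
    buffer, written = [], []
    for count, df in enumerate(frames, start=1):
        buffer.append(df)
        if count % batch_size == 0:  # counter reached a multiple of batch_size
            written.append(flush(buffer, len(written) + 1, out_dir))
            buffer = []              # start collecting the next batch
    if buffer:                       # leftover frames that did not fill a whole batch
        written.append(flush(buffer, len(written) + 1, out_dir))
    return written


# Small demonstration with made-up frames: 5 frames, 2 per file, gives 3 files.
demo_frames = [
    pd.DataFrame({"host_identity_verified": ["t"],
                  "neighbourhood": ["q"],
                  "latitude": [52.36575]})
    for _ in range(5)
]
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [path.name for path in save_every_n(demo_frames, tmp, batch_size=2)]
print(demo_files)
```

The final "if buffer" step matters: without it, a folder whose file count is not a multiple of the batch size would silently lose its last partial batch.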
For example, I have 58 files in one folder. If I save every 20 files to one JSON file, I should end up with 3 JSON files: the first 2 with 20 CSVs each and the last one with 18.
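The 58-file example can also be handled by slicing the list of file paths into groups of 20 up front, so that only one group is held in memory at a time. This is a rough sketch, not a tested solution for the real data: batched and save_in_batches are hypothetical names, the file names in the demo are made up, and it assumes every CSV contains the two columns being kept.

```python
import tempfile
from pathlib import Path

import pandas as pd

COLUMNS = ["host_identity_verified", "latitude"]  # columns kept in the question


def batched(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def save_in_batches(csv_paths, out_dir, batch_size=20, columns=COLUMNS):
    """Write one JSON file per batch of CSV files and return the output paths."""
    out_dir = Path(out_dir)
    written = []
    for number, batch in enumerate(batched(sorted(csv_paths), batch_size), start=1):
        # usecols makes pandas parse only the columns that are actually needed.
        frames = [pd.read_csv(path, usecols=columns) for path in batch]
        combined = pd.concat(frames, ignore_index=True)
        out_path = out_dir / f"Booking_Data_{number}.json"
        combined.to_json(out_path)
        written.append(out_path)
    return written


# Demonstration on throwaway data: 58 one-row CSV files, 20 files per JSON file.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for i in range(58):
        pd.DataFrame({"host_identity_verified": ["t"],
                      "neighbourhood": ["q"],
                      "latitude": [52.36575]}).to_csv(tmp / f"listing_{i:03d}.csv", index=False)
    outputs = save_in_batches(list(tmp.glob("*.csv")), tmp)
    rows_per_file = [len(pd.read_json(path)) for path in outputs]
print(rows_per_file)  # one entry per JSON file; the last batch holds the remaining 18 files
```

Reading with usecols and concatenating once per batch, instead of growing one list across the whole folder, is one plausible way to keep memory use bounded when a folder has several hundred files.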
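Also, will I run into problems analysing the JSON files because they are so large? Is JSON the best file type for storing large data? We are talking about nearly a million rows per file, if not more, at hundreds of MB.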
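Whether JSON is the best format depends on how the data will be analysed. If the data stays in JSON, one option worth considering is the line-delimited layout (one record per line), which pandas can read back in chunks instead of loading the whole file at once. The sketch below assumes a small made-up frame and a hypothetical file name; it only shows the mechanism, not a benchmark.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for one concatenated batch.
frame = pd.DataFrame({
    "host_identity_verified": ["t"] * 10,
    "latitude": [52.36575, 52.36509, 52.37297, 52.38761, 52.36719] * 2,
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Booking_Data_1.jsonl"  # hypothetical file name
    # orient="records" with lines=True writes one JSON object per line.
    frame.to_json(path, orient="records", lines=True)

    # With chunksize, read_json returns an iterator of DataFrames, so only
    # `chunksize` rows are parsed into memory at a time.
    total_rows = 0
    latitude_sum = 0.0
    with pd.read_json(path, lines=True, chunksize=4) as reader:
        for chunk in reader:
            total_rows += len(chunk)
            latitude_sum += chunk["latitude"].sum()

print(total_rows, latitude_sum / total_rows)  # row count and mean latitude
```

The default to_json() call used in the question writes each frame as a single nested object, which has to be parsed in one piece; the line-delimited form trades that for a layout that can be streamed.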
Solution