Speeding up pd.read_excel in Python

Problem description

I am writing a Python script that reads Excel data and imports it into a database. It works fine for 10,000-30,000 records, but 150,000+ records take more than 13 seconds. How can I speed it up?

f = request.files['file']

all_data = {}  # grouped row data to insert

df = pd.read_excel(f)
df = df.dropna(how='all')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
df.index.name = excel_conf['config']['default_config']['identification_column']['name']  # column header identification
df.index += 1  # shift the index so it starts at 1
df = df.fillna("")

########## this loop is the bottleneck ##########
for index, row in df.iterrows():
    row_dict = []  # actually a list of one-key dicts
    for key in excel_conf['config']['column_config']:  # configured column header names
        row_dict.append({
            key: row[key]  # key comes from the Excel config, row[key] is that cell's value
        })
    index_key = index_keygen(create_blake2s_signature(...))  # key generation, shortened here
    # add the row's data under its generated key
    all_data[index_key] = row_dict
    # "index_key": [ {"column": "value"}, ... ]
#################################################

insert_db(all_data)  # this part is fast

Tags: python, pandas, dataframe

Solution
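
The slow part is almost certainly not `pd.read_excel` itself but the `iterrows()` loop: `iterrows()` builds a pandas Series for every row, and each `row[key]` lookup goes through pandas indexing machinery. A common fix is to convert the needed columns once with `DataFrame.to_dict('records')` and build the dictionary from plain Python dicts. The sketch below assumes illustrative stand-ins: `columns` replaces `excel_conf['config']['column_config']` and `make_key` replaces `index_keygen(create_blake2s_signature(...))`, which were shortened in the question.

```python
import pandas as pd

# Small stand-in DataFrame; in the real script this comes from pd.read_excel(f).
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "age": [30, 25],
    "city": ["NYC", "LA"],
})

# Stand-in for excel_conf['config']['column_config'] (the configured headers).
columns = ["name", "age"]

def make_key(index):
    # Stand-in for index_keygen(create_blake2s_signature(...)).
    return f"key-{index}"

# One call converts every row to a plain dict; no per-cell pandas indexing
# inside the loop, which is where iterrows() spends most of its time.
records = df[columns].to_dict("records")

all_data = {
    make_key(i): [{k: rec[k]} for k in columns]  # same list-of-one-key-dicts shape as the original
    for i, rec in enumerate(records, start=1)
}

print(all_data["key-1"])  # → [{'name': 'alice'}, {'age': 30}]
```

`itertuples()` is another option that keeps a row-by-row loop but is far cheaper than `iterrows()`; `to_dict('records')` usually fits this case best because the goal is already a dict per row.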
