首页 > 解决方案 > python pandas read_excel engine=openpyxl不关闭文件

问题描述

我正在使用以下方法将数据框加载到熊猫中:

import pandas as pd

df_factor_histories=pd.read_excel("./eco_factor/eco_factor_test_data_builder.xlsx",
                                  engine='openpyxl', sheet_name=0)

engine=openpyxl需要启用read_excel以支持更新的 Excel 文件格式(特别是在我的情况下.xlsx,而不是 jusy .xls)。

数据框加载得很好,但文件保持打开状态

import psutil

p = psutil.Process()
print(p.open_files())

OUTPUT
[popenfile(path='C:\\Users\\xx\\.ipython\\profile_default\\history.sqlite', fd=-1), 
popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1), 
popenfile(path='C:\\Windows\\System32\\en-US\\kernel32.dll.mui', fd=-1), 
popenfile(path='D:\\xxxxx\\data modelling\\eco_factor\\eco_factor_test_data_builder.xlsx', fd=-1)]

这篇Github 帖子表明该错误已修复 - 但不适用于我(运行 Anaconda/Jupyter)。我正在运行的相关版本:

numpy                         1.19.2
openpyxl                      3.0.5
pandas                        1.1.3
Python 3.7.4

我将不胜感激有关如何关闭文件/解决此问题的最佳工作的一些建议,谢谢

标签: pythonpandasopenpyxlfile-handling

解决方案


我建议engine='openpyxl'从您的代码中删除。它实际上是不需要的。我在pd.read_excel没有它的情况下使用它,即使是.xlsx格式也能正常工作。

删除它会导致引擎参数的默认行为接管。引擎会知道使用哪个引擎: https ://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel

engine : str, default None
If io is not a buffer or path, this must be set to identify io. Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”. Engine compatibility :

“xlrd” supports old-style Excel files (.xls).

“openpyxl” supports newer Excel file formats.

“odf” supports OpenDocument file formats (.odf, .ods, .odt).

“pyxlsb” supports Binary Excel files.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files. When engine=None, the following logic will be used to determine the engine:

If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.

Otherwise if path_or_buffer is an xls format, xlrd will be used.

Otherwise if openpyxl is installed, then openpyxl will be used.

Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.

Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.

推荐阅读