python - unknown_codepage_21010 when reading a zipped Excel file in Python 3
Problem description
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests
import xlrd

url = 'http://47.97.204.47/syl/bk20200416.zip'
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
entry = zip_file.namelist()[0]  # first file in the archive
file = zip_file.open(entry)
# This works
my_xls = xlrd.open_workbook(file_contents=zip_file.read(entry), encoding_override="gb2312")
my_xls.sheet_names()
# This doesn't work!
df = pd.read_excel(file, encoding_override='gb2312')
The last line raises an error:
> LookupError: unknown encoding: unknown_codepage_21010 ERROR ***
> codepage 21010 -> encoding 'unknown_codepage_21010' -> LookupError:
> unknown encoding: unknown_codepage_21010
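The message suggests that the codepage number stored in the workbook was turned into a codec name that Python does not recognize. A minimal sketch of that lookup failure, using only the standard library (the helper name codec_known is hypothetical and not part of xlrd or pandas):

```python
import codecs


def codec_known(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# The name from the traceback is not a registered codec,
# while the override used with xlrd is.
print(codec_known("unknown_codepage_21010"))  # False
print(codec_known("gb2312"))                  # True
```

This is why supplying a real codec name through encoding_override lets xlrd decode the file.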
Do you know how to pass encoding_override to the xlrd engine through pandas.read_excel?
I checked the pandas source code, and it does not seem to pass encoding_override on to xlrd:
def load_workbook(self, filepath_or_buffer):
    from xlrd import open_workbook

    if hasattr(filepath_or_buffer, "read"):
        data = filepath_or_buffer.read()
        return open_workbook(file_contents=data)
    else:
        return open_workbook(filepath_or_buffer)
Alternatively, I could use xlrd.open_workbook directly, but I don't know how to convert an xlrd.book.Book into a DataFrame.
Solution
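from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests
import xlrd

url = 'http://47.97.204.47/syl/bk20200416.zip'
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
entry = zip_file.namelist()[0]  # first file in the archive
file_contents = zip_file.read(entry)
# Let xlrd decode the workbook with the right encoding, then hand the Book to pandas
book = xlrd.open_workbook(file_contents=file_contents, encoding_override="gb2312")
xls_file = pd.ExcelFile(book)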
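pd.ExcelFile and pd.read_excel both accept an xlrd Book as their argument. So building the Book first, with encoding_override set, and then passing it to ExcelFile does the trick.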
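To get from the ExcelFile to a DataFrame, calling xls_file.parse(0) on the first sheet is the usual route. For readers who want to see what converting a Book by hand might look like, here is a minimal sketch. The helper names sheet_to_records and sheet_to_frame are hypothetical, and the sketch assumes the first row of the sheet holds the column names.

```python
def sheet_to_records(sheet):
    """Split a sheet into (header, rows).

    `sheet` is anything exposing xlrd's Sheet interface:
    an `nrows` attribute and a `row_values(rowx)` method.
    Assumption: row 0 holds the column names.
    """
    if sheet.nrows == 0:
        return [], []
    header = [str(value) for value in sheet.row_values(0)]
    rows = [sheet.row_values(i) for i in range(1, sheet.nrows)]
    return header, rows


def sheet_to_frame(sheet):
    """Build a DataFrame from one xlrd sheet (pandas is imported lazily)."""
    import pandas as pd

    header, rows = sheet_to_records(sheet)
    return pd.DataFrame(rows, columns=header)
```

With the book built in the solution code, sheet_to_frame(book.sheet_by_index(0)) would give one DataFrame. The pandas route is shorter and also converts date cells, which this sketch leaves as raw numbers.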
The docstring and comments in the pandas source give more detail:
class ExcelFile:
    """
    Class for parsing tabular excel sheets into DataFrame objects.
    Uses xlrd. See read_excel for more documentation

    Parameters
    ----------
    io : string, path object (pathlib.Path or py._path.local.LocalPath),
        file-like object or xlrd workbook
        If a string or path object, expected to be a path to xls or xlsx file.
    engine : string, default None
        If io is not a buffer or path, this must be set to identify io.
        Acceptable values are None or ``xlrd``.
    """

    from pandas.io.excel._odfreader import _ODFReader
    from pandas.io.excel._openpyxl import _OpenpyxlReader
    from pandas.io.excel._xlrd import _XlrdReader

    _engines = {"xlrd": _XlrdReader, "openpyxl": _OpenpyxlReader, "odf": _ODFReader}

    def __init__(self, io, engine=None):
        if engine is None:
            engine = "xlrd"
        if engine not in self._engines:
            raise ValueError("Unknown engine: {engine}".format(engine=engine))

        self.engine = engine
        # could be a str, ExcelFile, Book, etc.
        self.io = io
        # Always a string
        self._io = _stringify_path(io)

        self._reader = self._engines[engine](self._io)