UnicodeDecodeError when using Python 2.7 code on Python 3.7 with cPickle

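Question

I am trying to use cPickle on a .pkl file built from a "parsed" .csv file. The parsing is done with a pre-built Python toolbox ( https://github.com/GEMScienceTools/gmpe-smtk ) that was recently ported from Python 2 to Python 3.

The code I am using is as follows: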

from smtk.parsers.esm_flatfile_parser import ESMFlatfileParser
parser=ESMFlatfileParser.autobuild("Database10","Metadata10","C:/Python37/TestX10","C:/Python37/NorthSea_Inc_SA.csv")
import cPickle
sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","r"))

It returns the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 44: character maps to <undefined>

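From what I can gather, I need to specify the encoding of the .pkl file for cPickle to work, but I do not know what encoding the file produced by parsing the .csv file has, so I cannot currently use cPickle this way.

Using Sublime Text, I found that it is "hexadecimal", but that is not an accepted encoding in Python 3.7, is it?

If anyone knows how to determine the required encoding, or how to make hexadecimal encoding usable in Python 3.7, their help would be much appreciated.

P.S. The modules used (for example "ESMFlatfileParser") are part of the pre-built toolbox. With that in mind, might I need to change the encoding somewhere inside this module?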

Tags: python, encoding, deserialization, pickle

Answer


The code is opening the file in text mode ('r'), but it should be binary mode ('rb').

From the documentation for pickle.load (emphasis mine):

[The] file can be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.
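To make the difference between the two modes concrete, here is a minimal sketch. It does not use the asker's file: `record`, `demo_path`, and the helper `try_load` are hypothetical names introduced for illustration, and `cp1252` is assumed as the text encoding because it is the usual Windows default behind a 'charmap' error.

```python
import os
import pickle
import tempfile


def try_load(path, mode, **open_kwargs):
    """Hypothetical helper: return ("ok", obj) or ("error", exception name)."""
    try:
        with open(path, mode, **open_kwargs) as f:
            return "ok", pickle.load(f)
    except (UnicodeDecodeError, TypeError) as exc:
        # Text mode either fails while decoding the raw bytes or hands
        # pickle a str where it expects bytes.
        return "error", type(exc).__name__


# Stand-in data: "\u00c1" is stored as the UTF-8 bytes C3 81, so the
# pickle contains byte 0x81, which cp1252 cannot decode.
record = {"station": "\u00c1"}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pkl")
with open(demo_path, "wb") as f:
    pickle.dump(record, f, protocol=2)

text_result = try_load(demo_path, "r", encoding="cp1252")  # text mode
binary_result = try_load(demo_path, "rb")                  # binary mode

print(text_result)
print(binary_result)
```

The text-mode attempt fails before pickle ever interprets the data, which is why guessing an encoding for `open` does not help: a pickle is a byte stream, not encoded text.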

Since the file is being opened in binary mode, there is no need to provide an encoding argument to open. It may, however, be necessary to provide an encoding argument to pickle.load. From the same documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.

This ought to prevent the UnicodeDecodeError:

with open("C:/Python37/TestX10/metadatafile.pkl", "rb") as f:
    sm_database = cPickle.load(f)
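If binary mode alone is not enough because the pickle was written by Python 2 and holds non-ASCII 8-bit strings, the encoding argument quoted above comes into play. The following is a rough sketch only: `load_py2_pickle` is a hypothetical helper, not part of pickle or the toolbox, and `py2_style` is a hand-written byte string standing in for a Python 2 pickle.

```python
import io
import pickle


def load_py2_pickle(fileobj, encodings=("ASCII", "latin1")):
    """Hypothetical helper: try each encoding in turn.

    fileobj must be opened in binary mode and be seekable, with the
    pickle starting at offset 0.
    """
    last_error = None
    for enc in encodings:
        fileobj.seek(0)  # rewind before every attempt
        try:
            return pickle.load(fileobj, encoding=enc)
        except UnicodeDecodeError as exc:
            last_error = exc
    raise last_error


# Hand-built protocol 2 stream: PROTO 2, SHORT_BINSTRING with the two
# bytes C3 81, BINPUT 0, STOP. This mimics an 8-bit str from Python 2.
py2_style = b"\x80\x02U\x02\xc3\x81q\x00."

loaded = load_py2_pickle(io.BytesIO(py2_style))       # falls back to latin1
as_bytes = pickle.loads(py2_style, encoding="bytes")  # keep the raw bytes

print(repr(loaded))
print(repr(as_bytes))
```

With a real file, the same helper could be called on a handle opened with "rb". Whether the fallback is needed at all depends on which Python version wrote the file; a pickle produced by the Python 3 port of the toolbox should load without any encoding argument.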
