python - UnicodeEncodeError in IPython but not in the standard REPL
Question
I am using Python 3.6.3 to read a file that contains Unicode characters. In the standard Python REPL, I can read the file without any problem by specifying the UTF-8 encoding:
>>> with open("emoji.csv", encoding='utf-8') as f:
...     lines = f.readlines()
...
>>> lines
['this line has an emoji \U0001f644\n']
No problem there. However, when I try the same thing in IPython 6.1.0, I get the following UnicodeEncodeError:
In [1]: with open('emoji.csv', encoding='utf-8') as f:
   ...:     lines = f.readlines()
   ...:
In [2]: lines
Out[2]: ---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-2-3fb162a4fe05> in <module>()
----> 1 lines
/opt/anaconda/lib/python3.6/site-packages/IPython/core/displayhook.py in __call__(self, result)
259 self.fill_exec_result(result)
260 if format_dict:
--> 261 self.write_format_data(format_dict, md_dict)
262 self.log_output(format_dict)
263 self.finish_displayhook()
/opt/anaconda/lib/python3.6/site-packages/IPython/core/displayhook.py in write_format_data(self, format_dict, md_dict)
188 result_repr = '\n' + result_repr
189
--> 190 print(result_repr)
191
192 def update_user_ns(self, result):
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f644' in position 24: ordinal not in range(128)
Likewise, if I simply encode and decode the Unicode character itself, I get the same error:
In [1]: '\U0001f644'.encode('utf-8').decode('utf-8')
Out[1]: ---------------------------------------------------------------------------
...
...
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f644' in position 1: ordinal not in range(128)
What causes this, and how can I read this file in IPython?
Edit: this appears to be a consequence of IPython using an ASCII encoding by default:
In [1]: from IPython.utils.encoding import get_stream_enc; import sys
In [2]: get_stream_enc(sys.stdout)
Out[2]: 'ANSI_X3.4-1968'
However, I don't see anything in the IPython documentation about how to change it. Is that possible?
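For context, 'ANSI_X3.4-1968' is an alias for ASCII, and both tracebacks above are raised while the result is being printed, not while the file is being read or the string is being encoded. The following is a minimal sketch that reproduces the failure with an in-memory ASCII stream instead of the real terminal; the helper names `stream_report` and `write_to_ascii_stream` are made up for illustration and are not part of IPython. It also shows the `backslashreplace` error handler, which the standard REPL's display hook falls back to when a repr cannot be encoded, and which would explain why the standard REPL printed the escaped form `\U0001f644` rather than failing.

```python
import io
import locale
import sys


def stream_report():
    """Collect the settings that decide how text is written to stdout."""
    return {
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def write_to_ascii_stream(text, errors="strict"):
    """Write text to an in-memory stream that only accepts ASCII.

    Returns the bytes that reached the underlying buffer. With the default
    errors="strict", a non-ASCII character raises UnicodeEncodeError.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    print(stream_report())
    try:
        write_to_ascii_stream("\U0001f644")
    except UnicodeEncodeError as exc:
        print("strict ASCII stream failed:", exc.reason)
    # The escaped form contains only ASCII characters, so it can be written.
    print(write_to_ascii_stream("\U0001f644", errors="backslashreplace"))
```

If `stdout_encoding` reports an ASCII variant in the session where the error occurs, the terminal stream rather than the file handling is the likely culprit.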
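Solution
This was caused by my system using a POSIX locale, under which Python picks an ASCII-based encoding for standard output. Setting the environment variable PYTHONIOENCODING=UTF-8 resolved the issue by overriding the ASCII-based encoding that IPython was using by default.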
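As a rough way to confirm that the variable takes effect, the sketch below starts a child Python interpreter under the C (POSIX) locale with PYTHONIOENCODING set and asks it for its stdout encoding. The helper name `child_stdout_encoding` is hypothetical, and the one-line child script is only an illustration. The result without the variable is not checked here because it depends on the Python version and platform.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child interpreter and return the encoding it reports for stdout.

    extra_env is merged on top of the current environment, so it can be used
    to force a locale or to set PYTHONIOENCODING for the child only.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # POSIX locale plus the override described in the answer.
    overridden = child_stdout_encoding(
        {"LC_ALL": "C", "PYTHONIOENCODING": "UTF-8"}
    )
    print("with PYTHONIOENCODING=UTF-8:", overridden)
```

In a shell, the equivalent is to run `export PYTHONIOENCODING=UTF-8` before launching `ipython`, since the variable is read when the interpreter starts.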