Why does printing these values give different results on different operating systems and versions?

Problem description

Tags: python, python-3.x, python-2.7, unicode

Solution


It is a question of encoding.

In Latin1 or Windows 1252 encoding, you have:

0xef -> ï (LATIN SMALL LETTER I WITH DIAERESIS)
0xbe -> ¾ (VULGAR FRACTION THREE QUARTERS)
0xad -> (SOFT HYPHEN), normally invisible, which is why nothing is printed for it in your examples
0xde -> Þ (LATIN CAPITAL LETTER THORN)
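The table above can be checked with a short Python 3 sketch. It assumes the four bytes under discussion are 0xef, 0xbe, 0xad, 0xde in that order (the original question is not shown here), and it uses the standard unicodedata module to look up the character names.

```python
import unicodedata

# Assumption: these are the four bytes discussed above, in that order.
raw = b"\xef\xbe\xad\xde"

names = {}
for codec in ("latin-1", "cp1252"):
    text = raw.decode(codec)
    names[codec] = [unicodedata.name(ch) for ch in text]
    for ch in text:
        # Print the codec, the code point, and the official Unicode name.
        print(codec, hex(ord(ch)), unicodedata.name(ch))
```

Both codecs give the same four characters for these bytes, because none of them falls in the 0x80-0x9f range.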

In utf-8 encoding, you have:

'\xef\xbe\xad' -> u'\uffad' (HALFWIDTH HANGUL LETTER RIEUL-SIOS)
'\xde' -> should raise a UnicodeDecodeError, because 0xde starts a two-byte sequence that is never completed
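A minimal Python 3 sketch of the UTF-8 case, again assuming the same four bytes. The first three bytes form one valid sequence, and the trailing 0xde is what makes a strict decode fail; errors="replace" is shown only as one way to see where the failure sits.

```python
import unicodedata

# Assumption: the same four bytes as in the Latin-1 table.
raw = b"\xef\xbe\xad\xde"

# The first three bytes are one complete UTF-8 sequence.
head = raw[:3].decode("utf-8")
print(hex(ord(head)), unicodedata.name(head))

# Decoding all four bytes strictly fails on the last one.
try:
    raw.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start
    print("cannot decode byte at index", exc.start, "-", exc.reason)

# A lenient decode substitutes U+FFFD for the undecodable byte.
lenient = raw.decode("utf-8", errors="replace")
print([hex(ord(ch)) for ch in lenient])
```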

On Windows, both Python 2 and Python 3 use the Windows-1252 code page (in your example). On Kali, Python 2 treats the string as a byte string and the terminal decodes those bytes as UTF-8, while Python 3 treats the string as already containing Unicode code points and displays those characters directly.
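The Python 3 side of this can be illustrated as follows. The literal is an assumption about what the original code looked like; the point is only that the same escape sequence means four code points in a str and four raw bytes in a bytes object, and that a UTF-8 terminal receives different bytes in each case.

```python
# Assumption: the question used a literal like this one.
text = "\xef\xbe\xad\xde"   # Python 3 str: code points U+00EF, U+00BE, U+00AD, U+00DE
data = b"\xef\xbe\xad\xde"  # bytes: four raw byte values (what a Python 2 str holds)

code_points = [ord(ch) for ch in text]
byte_values = list(data)
print(code_points == byte_values)  # same numbers, different meaning

# What Python 3 sends to a UTF-8 terminal when printing the str:
encoded = text.encode("utf-8")
print(encoded)
```

The encoded form is eight bytes long, so the terminal shows four Latin-1 characters rather than the Hangul letter that the four raw bytes would produce.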

In Latin-1 (and in Windows-1252 for every byte outside 0x80-0x9f), the byte value is the same as the Unicode code point, and that is enough to explain your outputs.
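The relationship between byte values and code points can be verified exhaustively in Python 3. This is a sketch: errors="replace" is used for Windows-1252 because Python's cp1252 codec leaves a few bytes in the 0x80-0x9f range (0x81, for instance) unassigned.

```python
# Latin-1: every byte value N decodes to the code point with the same number.
latin1_is_identity = all(
    bytes([b]).decode("latin-1") == chr(b) for b in range(256)
)
print(latin1_is_identity)

# Windows-1252: collect the byte values that do NOT decode to the same number.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace") != chr(b)
]
print(hex(differing[0]), hex(differing[-1]), len(differing))

# One example from the differing range: 0x80 is the euro sign in cp1252.
euro = b"\x80".decode("cp1252")
print(hex(ord(euro)))
```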

What to learn: be explicit about whether a string contains Unicode text or bytes, and beware of encodings!
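One way to stay explicit while debugging, sketched for Python 3: describe is a hypothetical helper (not part of any library) that reports whether a value is bytes or text before it reaches the terminal, and sys.stdout.encoding shows which encoding print will use.

```python
import sys

def describe(value):
    """Hypothetical helper: say whether a value is raw bytes or text."""
    if isinstance(value, bytes):
        return "bytes: " + value.hex()
    return "text: " + " ".join("U+%04X" % ord(ch) for ch in value)

raw = b"\xef\xbe\xad\xde"           # assumed input bytes
as_text = raw.decode("latin-1")     # the decoding step is spelled out

print(describe(raw))
print(describe(as_text))
print("stdout encoding:", sys.stdout.encoding)
```

Printing the description instead of the value itself avoids depending on how a particular terminal renders the characters.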

