How to remove certain UTF-8 characters from a string?

Problem description

Tags: python, python-2.7, beautifulsoup

Solution


The line variable in your code is a unicode object. When you call line.replace, Python expects its arguments to be unicode objects as well. If you pass a str (byte string) instead, Python 2 tries to decode it into a unicode string automatically, using the system default encoding (which you can check with sys.getdefaultencoding()).
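Python 2 performs this coercion silently. The sketch below spells it out with two hypothetical helpers, coerce_like_py2 and replace_like_py2, so the mechanism is visible. It only illustrates the behaviour described above and is not the interpreter's actual implementation. It runs on both Python 2 and Python 3; since Python 3's default encoding is UTF-8, pass encoding='ascii' to reproduce the usual Python 2 default.

```python
import sys


def coerce_like_py2(value, encoding=None):
    """Hypothetical helper: decode a byte string the way Python 2 implicitly
    does when a str meets a unicode object; text is returned unchanged."""
    if isinstance(value, bytes):
        # Python 2 uses sys.getdefaultencoding() for this implicit step.
        return value.decode(encoding or sys.getdefaultencoding())
    return value


def replace_like_py2(line, old, new, encoding=None):
    """Hypothetical wrapper: coerce both arguments, then call replace on text."""
    return line.replace(coerce_like_py2(old, encoding),
                        coerce_like_py2(new, encoding))


# Typically 'ascii' on Python 2 and 'utf-8' on Python 3.
print(sys.getdefaultencoding())
```

With encoding='ascii', any byte above 0x7F in old or new makes the decode step raise UnicodeDecodeError before replace is ever reached.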

Apparently, the default encoding is ascii in your case. The byte string '„' cannot be decoded with the ascii codec, because '„' is not an ASCII character, and that is what raises the UnicodeDecodeError you see.
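To see why the ascii codec rejects the character, you can decode the raw bytes explicitly. The source file's real encoding is unknown, so this sketch checks two assumptions: the '„' literal saved as UTF-8 (three bytes, b'\xe2\x80\x9e') or as CP1252 (the single byte b'\x84'). In both cases every byte is above 0x7F, so ascii fails, while the matching codec yields U+201E. The helper name try_decode is hypothetical.

```python
# Two possible byte encodings of the same character U+201E ('„').
# Which one a source file contains depends on how it was saved (an assumption here).
CANDIDATES = {
    'utf-8': b'\xe2\x80\x9e',
    'cp1252': b'\x84',
}


def try_decode(raw, codec):
    """Hypothetical helper: return the decoded text, or None if codec rejects raw."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


for codec, raw in CANDIDATES.items():
    # ascii fails for both candidates, because each contains a byte above 0x7F ...
    assert try_decode(raw, 'ascii') is None
    # ... while the matching codec recovers the intended character.
    assert try_decode(raw, codec) == u'\u201e'
```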

You could fix the problem by changing the default system encoding to the one in which the '„' string was written (CP1252, I guess). However, such a fix is only of academic interest, because it just sweeps the issue under the carpet.

The proper, safe and easy fix is simply to pass a unicode object to the replace method in the first place: replace '„' with u'„' in your code.
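A minimal sketch of that fix, assuming line is already a unicode object as in the question (the sample text is made up; in the original code it comes from BeautifulSoup). The escape u'\u201e' is the same '„' character; using the escape instead of the literal keeps the example independent of the source file's encoding, and it behaves the same on Python 2 and Python 3.

```python
# Hypothetical sample; in the question, `line` is text extracted with BeautifulSoup.
line = u'\u201eHello\u201c, world'

# Pass a unicode pattern, so no implicit decoding is needed.
cleaned = line.replace(u'\u201e', u'')

# If the pattern only exists as bytes, decode it explicitly with the
# encoding it was written in (UTF-8 is an assumption here).
pattern = b'\xe2\x80\x9e'.decode('utf-8')
cleaned_too = line.replace(pattern, u'')
```

Decoding explicitly keeps the choice of codec in your code rather than relying on sys.getdefaultencoding().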

