首页 > 解决方案 > 如何删除在 python3 字符串对象中显示为 `\uxxx` 的特殊字符?

问题描述

python字符串对象如下:

The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833.

我想删除这些\u200b \ufeff显示为原始 unicode 的字符。

标签: python-3.xregexpython-unicode

解决方案


将其编码ascii并忽略错误

>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> s.encode('ascii', 'ignore')
b'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833'

要用空格替换 unicode 字符以保持长度相同,您可以使用

#length of original string

>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> len(s)
179

#to maintain the same length

>>> new_s = s.encode('ascii',errors='ignore').decode('utf-8')
>>> final_s = new_s + ' ' * (len(s) - len(new_s))
>>> final_s
'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833            '
>>> len(final_s)
179

这将最终增加额外的空间以保持长度


推荐阅读