python-3.x - How do I remove special characters that show up as `\uxxx` in a Python 3 string object?
Problem description
The Python string object looks like this:
The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833.
I want to remove the \u200b and \ufeff characters, which are displayed as raw Unicode escapes.
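Solution
Encode it to ASCII and ignore errors. Note that this drops every non-ASCII character, so °, ′ and ″ are removed as well: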
>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> s.encode('ascii', 'ignore')
b'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833'
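If only the invisible characters should go and symbols such as ° need to stay, one possible alternative is `str.translate` with a mapping that deletes specific code points. This is a minimal sketch, not part of the original answer: the helper name `strip_invisible` and the shortened sample string are made up for illustration, and the mapping covers only the two code points mentioned in the question.

```python
# Hypothetical mapping: only the two code points from the question.
# Mapping a code point to None tells str.translate to delete it.
INVISIBLE = {
    0x200B: None,  # ZERO WIDTH SPACE
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}

def strip_invisible(text):
    """Return text with the code points listed in INVISIBLE removed."""
    return text.translate(INVISIBLE)

# Shortened sample, assumed for illustration
sample = 'Bern \u200bis at 46°57′08.66″N\ufeff / \ufeff46.9524056°N'
cleaned = strip_invisible(sample)
print(cleaned)
```

Unlike the ASCII round trip, this returns a `str` rather than `bytes` and leaves all other non-ASCII characters untouched.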
To keep the length the same by padding the result with spaces, you can use:
# length of the original string
>>> s = 'The site of the old observatory in Bern \u200bis the point of origin of the CH1903 coordinate system at 46°57′08.66″N 7°26′22.50″E\ufeff / \ufeff46.9524056°N 7.4395833°E\ufeff / 46.9524056; 7.4395833'
>>> len(s)
179
# pad with spaces to maintain the same length
>>> new_s = s.encode('ascii',errors='ignore').decode('utf-8')
>>> final_s = new_s + ' ' * (len(s) - len(new_s))
>>> final_s
'The site of the old observatory in Bern is the point of origin of the CH1903 coordinate system at 465708.66N 72622.50E / 46.9524056N 7.4395833E / 46.9524056; 7.4395833            '
>>> len(final_s)
179
This ends up appending extra spaces to the end of the string to preserve its length.
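Because the padding goes at the end, every character after a removed one shifts left. If the positions themselves need to stay aligned, a possible variation is to swap each non-ASCII character for a space where it stands. This is a sketch under that assumption rather than the original answer's method; `blank_non_ascii` and the shortened sample are hypothetical names chosen for illustration.

```python
def blank_non_ascii(text):
    """Replace each non-ASCII character with one space, in place.

    The result has the same length as the input, and every ASCII
    character keeps its original index.
    """
    return ''.join(ch if ord(ch) < 128 else ' ' for ch in text)

# Shortened sample, assumed for illustration
sample = 'Bern \u200bis at 46°57′N\ufeff'
blanked = blank_non_ascii(sample)
print(repr(blanked))
print(len(sample) == len(blanked))
```

Like the ASCII round trip, this still blanks out ° and ′, so it suits cases where only the offsets matter.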