How do I use a regular expression to get a string between two characters and remove certain characters inside that string?

Question

I have a long string that I want to filter with a regular expression:

<@961483653468439706> Text to remove, this text is useless, that's why i want it gone!
i want this: `keep the letters and spaces`

I want to keep the text between the ` characters.

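The only problem is that there is an invisible character between every character in the part of the string I want. You can see the invisible characters on regex101: https://regex101.com/r/rAYrMT/1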

`([\'^\w]*)`

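In short: keep everything between the ` characters, except for the invisible character. Information about that character can be found here: https://apps.timwhitlock.info/unicode/inspect?s=%EF%BB%BF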

Tags: python, python-3.x, regex, re

Answer


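You can capture the text between the backticks and then drop every character that is not in string.printable (the ASCII printable characters):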

import re
from string import printable

# The sample string contains invisible U+FEFF characters between the letters
# (they show up in the repr() output).

s='''<@961483653468439706> Text to remove, this text is useless, that's why i want it gone!
Type `keep the letters and spaces` and `this too`'''

for m in re.findall(r'`([^`]*)`', s):
    print(repr(m))
    print(''.join([c for c in m if c in printable]))
    print()

Prints:

'k\ufeffe\ufeffe\ufeffp\ufeff \ufefft\ufeffh\ufeffe\ufeff \ufeffl\ufeffe\ufefft\ufefft\ufeffe\ufeffr\ufeffs a\ufeffn\ufeffd s\ufeffp\ufeffa\ufeffc\ufeffe\ufeffs'
keep the letters and spaces

'this too'
this too
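
One caveat: string.printable only contains ASCII, so the filter above would also drop legitimate non-ASCII letters such as é. As a rough alternative sketch (not part of the original answer), the invisible characters could instead be removed by their Unicode category. The helper name extract_backtick_spans and the sample string are made up for illustration, and the sketch assumes the unwanted characters are format characters (category Cf), which is the case for U+FEFF:

```python
import re
import unicodedata


def extract_backtick_spans(text):
    """Return the text between each pair of backticks, with Unicode
    format characters (category Cf, e.g. U+FEFF) removed."""
    spans = re.findall(r'`([^`]*)`', text)
    return [
        ''.join(ch for ch in span if unicodedata.category(ch) != 'Cf')
        for span in spans
    ]


# Hypothetical sample: U+FEFF is written as an explicit escape so it is visible.
sample = 'junk `k\ufeffe\ufeffe\ufeffp caf\ufeff\u00e9` and `this too`'
print(extract_backtick_spans(sample))
```

If only U+FEFF ever occurs in the input, span.replace('\ufeff', '') would be a simpler choice than the category check.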
