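regex - How do I split a string into sentences while keeping the punctuation?
Question
I want the split sentences to keep their punctuation (for example: ?, !, .), and if a sentence ends with a double quote, I want to keep that too.
I am using the re.split() function in python3 to split my string into sentences. Unfortunately, the resulting strings contain neither the punctuation nor the double quote (when one appears at the end of a sentence).
This is what my current code looks like: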
import re
x = 'This is an example sentence. I want to include punctuation! What is wrong with my code? It makes me want to yell, "PLEASE HELP ME!"'
sentence = re.split(r'[.?!]\s*', x)
The output I get is:
['This is an example sentence', 'I want to include punctuation', 'What is wrong with my code', 'It makes me want to yell, "PLEASE HELP ME', '"']
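Answer
Try splitting on a lookbehind:
# \s+ rather than \s*: since Python 3.7, re.split() also splits on empty matches,
# so \s* would cut between the final '!' and the closing double quote.
sentences = re.split(r'(?<=[.?!])\s+', x)
print(sentences)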
['This is an example sentence.', 'I want to include punctuation!',
'What is wrong with my code?', 'It makes me want to yell, "PLEASE HELP ME!"']
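This regex works by splitting wherever a punctuation mark sits immediately behind the current position. The lookbehind is zero-width, so the punctuation itself is not consumed and stays with its sentence. Only the whitespace that follows is matched and consumed before the scan continues through the input string.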
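To make the difference concrete, the sketch below lists what each pattern actually matches, and therefore what re.split() throws away. The helper name separator_spans and the short sample string are made up for illustration; they are not part of the original answer.

```python
import re

text = 'One. Two! Three?'

def separator_spans(pattern, s):
    """Hypothetical helper: return (start, end, matched_text) for every separator match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, s)]

# Consuming pattern: the punctuation is part of the match, so split() discards it.
consuming = separator_spans(r'[.?!]\s+', text)

# Lookbehind pattern: only the whitespace is part of the match, the punctuation survives.
lookbehind = separator_spans(r'(?<=[.?!])\s+', text)

print(consuming)
print(lookbehind)
print(re.split(r'(?<=[.?!])\s+', text))
```

With the lookbehind, every match is whitespace only, which is why each sentence keeps its terminator.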
Here is my rough attempt at handling sentences wrapped in double quotes:
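x = 'This is an example sentence. I want to include punctuation! "What is wrong with my code?" It makes me want to yell, "PLEASE HELP ME!"'
sentences = re.split(r'((?<=[.?!]")|((?<=[.?!])(?!")))\s*', x)
# The capturing groups add empty strings and None to the result; filter(None, ...) drops them.
# In Python 3, filter() returns an iterator, so wrap it in list() before printing.
print(list(filter(None, sentences)))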
['This is an example sentence.', 'I want to include punctuation!',
'"What is wrong with my code?"', 'It makes me want to yell, "PLEASE HELP ME!"']
Note that it even splits correctly after sentences that end with a double quote.
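As a possible tidy-up of the same idea, the sketch below uses non-capturing groups and \s+ so that re.split() returns only the sentences and no filtering step is needed. The names SENTENCE_BOUNDARY and split_sentences are hypothetical. It assumes sentences are separated by at least one whitespace character and makes no attempt to handle abbreviations such as "Dr.".

```python
import re

# A boundary is either a terminator followed by a closing double quote,
# or a terminator that is NOT followed by a double quote.
# (?:...) groups without capturing, so split() adds no extra items.
SENTENCE_BOUNDARY = re.compile(r'(?:(?<=[.?!]")|(?<=[.?!])(?!"))\s+')

def split_sentences(text):
    """Split text into sentences, keeping punctuation and closing double quotes."""
    return SENTENCE_BOUNDARY.split(text)

x = ('This is an example sentence. I want to include punctuation! '
     '"What is wrong with my code?" It makes me want to yell, "PLEASE HELP ME!"')

for s in split_sentences(x):
    print(s)
```

Requiring \s+ also avoids an empty trailing item, because the end of the string is never matched as a boundary.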