python - Split a string into sentences with Python
Problem description
I have the following string:
string = 'This is one sentence ${w_{1},..,w_{i}}$. This is another sentence. '
Now I want to split it into two sentences.
However, when I do this:
string.split('.')
I get:
['This is one sentence ${w_{1},',
'',
',w_{i}}$',
' This is another sentence',
' ']
Does anyone know how to improve this so that a "." inside $ $ is not treated as a sentence break?
Also, how would you handle this one:
string2 = 'This is one sentence ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe ! '
Edit 1:
The desired output is:
For string 1:
['This is one sentence ${w_{1},..,w_{i}}$','This is another sentence']
For string 2:
['This is one sentence ${w_{1},..,w_{i}}$','This is another sentence', 'Is this a sentence', 'Maybe ! ']
Solution
For the more general case, you can use re.split like this:
import re
mystr = 'This is one sentence ${w_{1},..,w_{i}}$. This is another sentence. '
re.split(r"[.!?]\s{1,}", mystr)  # raw string avoids Python's invalid-escape warning for \s
# ['This is one sentence ${w_{1},..,w_{i}}$', 'This is another sentence', '']
str2 = 'This is one sentence ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe ! '
re.split(r"[.!?]\s{1,}", str2)
# ['This is one sentence ${w_{1},..,w_{i}}$', 'This is another sentence', 'Is this a sentence', 'Maybe ', '']
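Both results end with an empty string because each input finishes with a punctuation mark followed by a space. A minimal sketch that drops those empty pieces so the result matches the desired output more closely; `split_sentences` is just an illustrative helper name, not part of any library:

```python
import re

# Sentence-ending punctuation followed by at least one whitespace character.
SENTENCE_END = re.compile(r"[.!?]\s{1,}")


def split_sentences(text):
    """Split on ., ! or ? followed by whitespace and drop empty pieces."""
    return [part for part in SENTENCE_END.split(text) if part]


mystr = 'This is one sentence ${w_{1},..,w_{i}}$. This is another sentence. '
str2 = 'This is one sentence ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe ! '

print(split_sentences(mystr))
# ['This is one sentence ${w_{1},..,w_{i}}$', 'This is another sentence']
print(split_sentences(str2))
# ['This is one sentence ${w_{1},..,w_{i}}$', 'This is another sentence', 'Is this a sentence', 'Maybe ']
```

Compiling the pattern once is optional; `re.split` with the pattern string behaves the same.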
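The characters in the square brackets are the punctuation marks you choose to split on, and the trailing \s{1,} requires at least one whitespace character after them, so any other "." that is not followed by whitespace (like the ones inside your $...$) is ignored. This also handles your exclamation-mark case.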
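This works here only because the dots inside $...$ happen to be followed by "." or ","; a math span such as `$a. b$` would still be split. If that case matters, one option is to match whole `$...$` spans and treat only punctuation outside them as a boundary. The sketch below assumes math is always delimited by a single pair of `$` with no nesting or escaped dollar signs; `split_outside_math` is a hypothetical helper name:

```python
import re

# Either a whole $...$ math span, or sentence-ending punctuation followed by
# whitespace or the end of the string.
TOKEN = re.compile(r"\$[^$]*\$|[.!?](?:\s+|\Z)")


def split_outside_math(text):
    """Split text into sentences, never breaking inside a $...$ span."""
    sentences = []
    start = 0
    for match in TOKEN.finditer(text):
        if match.group().startswith("$"):
            continue  # math span: punctuation inside it is not a boundary
        piece = text[start:match.start()]
        if piece.strip():  # skip empty pieces such as the one between ". ."
            sentences.append(piece)
        start = match.end()
    tail = text[start:]
    if tail.strip():  # keep trailing text that has no closing punctuation
        sentences.append(tail)
    return sentences


mystr = 'This is one sentence ${w_{1},..,w_{i}}$. This is another sentence. '
tricky = 'Let $a. b$ hold. Done. '

print(split_outside_math(mystr))
# ['This is one sentence ${w_{1},..,w_{i}}$', 'This is another sentence']
print(split_outside_math(tricky))
# ['Let $a. b$ hold', 'Done']
print(re.split(r"[.!?]\s{1,}", tricky))
# ['Let $a', 'b$ hold', 'Done', '']
```

Because `finditer` consumes each `$...$` span as a single match, the punctuation inside it is never examined as a possible sentence break.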
Here is a (somewhat hacky) way to get the punctuation back:
punct = re.findall(r"[.!?]\s{1,}", str2)
# ['! ', '. ', '? ', '! ']
# zip stops at the shorter list, so the trailing '' from re.split is dropped
sent = [x + y for x, y in zip(re.split(r"[.!?]\s{1,}", str2), punct)]
sent
# ['This is one sentence ${w_{1},..,w_{i}}$! ', 'This is another sentence. ', 'Is this a sentence? ', 'Maybe ! ']
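A possibly simpler alternative to pairing re.split with re.findall (a different technique from the one above, offered as a sketch) is to split on the whitespace itself and use a lookbehind assertion, so the punctuation stays attached to each sentence. Unlike the zip version, this drops the space that follows each punctuation mark:

```python
import re

str2 = 'This is one sentence ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe ! '

# Split on whitespace that is immediately preceded by ., ! or ?.
# The lookbehind is zero-width, so the punctuation is not consumed.
parts = re.split(r"(?<=[.!?])\s+", str2)
sentences = [p for p in parts if p]  # drop the empty string after the final "! "
print(sentences)
# ['This is one sentence ${w_{1},..,w_{i}}$!', 'This is another sentence.', 'Is this a sentence?', 'Maybe !']
```

The space between "Maybe" and "!" is not a split point, because the character before it is "e", not punctuation.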