r - How to remove text patterns ending with a colon in R?
Problem description
I have the following sentence:
review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")
I want to remove everything before each `:` (the question labels).
I tried the following code:
gsub("^[^:]+:","",review)
However, it only removes the first sentence that ends with a colon.
Expected output:
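A minimal illustration of why the attempt above stops after the first label, using a shortened, hypothetical string `x` in place of `review`. The `^` anchors the pattern to the start of the string, so `gsub` can only match there once; simply dropping the anchor does not help either, because `[^:]+` then runs across the answers up to the next colon.

```r
# Shortened, hypothetical stand-in for `review`
x <- "1a. Question one?: Answer one. 2f. Question two.: Answer two"

# Anchored: `^` only matches at the very start, so only the first label goes
anchored <- gsub("^[^:]+:", "", x)

# Unanchored: `[^:]+` also swallows "Answer one. 2f. Question two." up to the next colon
unanchored <- gsub("[^:]+:", "", x)

print(anchored)    # " Answer one. 2f. Question two.: Answer two"
print(unanchored)  # " Answer two"
```

This is why the solution needs a pattern that describes what a label looks like rather than just "anything up to a colon".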
Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me. Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)!
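Any help or suggestions would be greatly appreciated. Thank you.
Solution
If the sentences are simple and contain no abbreviations, you can use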
gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
See the regex demo.
Note that you can generalize it further by changing `\\d+[a-zA-Z]` to `[0-9a-zA-Z]+` or `[[:alnum:]]+`, so that it matches 1 or more digits or letters.
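To see what the generalization buys, here is a small sketch with a made-up string `x` whose labels are `Q1.` and `12.`, neither of which fits `\\d+[a-zA-Z]`. It passes `perl = TRUE` so both patterns run under PCRE; that is an assumption of this sketch, not part of the answer.

```r
# Hypothetical sample: labels "Q1." and "12." do not match \d+[a-zA-Z]
x <- "Q1. Name?: Bob 12. Age?: 40"

# Original label prefix: one or more digits followed by exactly one letter
narrow <- gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", x, perl = TRUE)

# Generalized label prefix: any run of letters and/or digits
wide <- gsub("(?:[[:alnum:]]+\\.)?[^.?!:]*[?!.]:\\s*", "", x, perl = TRUE)

narrow  # "Q1.Bob 12.40"  (label numbers left behind)
wide    # "Bob 40"
```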
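Details
- `(?:\d+[a-zA-Z]\.)?` - an optional sequence of:
  - `\d+` - 1+ digits
  - `[a-zA-Z]` - an ASCII letter
  - `\.` - a dot
- `[^.?!:]*` - 0 or more chars other than `.`, `?`, `!` and `:`
- `[?!.]` - a `?`, `!` or `.`
- `:` - a colon
- `\s*` - 0+ whitespace chars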
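One way to check the breakdown above is to list what the pattern actually matches, i.e. the text that `gsub` would delete. The sketch below uses `gregexpr()` and `regmatches()` from base R on a shortened, hypothetical string; `perl = TRUE` is an assumption of this sketch.

```r
# Hypothetical shortened input with two labels
x <- "1a. Question one?: Answer one. 2f. Question two.: Answer two"
pattern <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"

# gregexpr() locates every match; regmatches() extracts the matched substrings
labels <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
labels
# [1] "1a. Question one?: " "2f. Question two.: "

# Removing those matches leaves only the answers
gsub(pattern, "", x, perl = TRUE)
# [1] "Answer one. Answer two"
```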
R test:
> gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
[1] "Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me.Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! "
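The result above has `me.Abel` with no space and ends with a trailing space, while the expected output has `me. Abel`. This happens because the match for the last label starts at the space before `Please`. One possible post-processing step, which is an addition of this edit rather than part of the original answer, is to replace each match with a single space and then collapse and trim whitespace:

```r
review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")
pattern <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"

# Replace each label with a single space instead of deleting it,
# so "me." and "Abel" stay separated
spaced <- gsub(pattern, " ", review, perl = TRUE)

# Collapse runs of whitespace to one space, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", spaced))
cleaned
```

The trade-off is that any intentional double spaces in the answers are also collapsed.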
Extending to handle abbreviations
You can enumerate the exceptions by adding an alternation:
gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", review)
The new part, `(?:i\.?e\.|[^.?!:])*`, matches 0 or more occurrences of either an `ie.` or `i.e.` substring, or any char other than `.`, `?`, `!` and `:`.
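A small check of the alternation with a made-up label that contains `i.e.` (the string `x` is hypothetical, and `perl = TRUE` is an assumption of this sketch):

```r
# Hypothetical label containing the abbreviation "i.e."
x <- "3b. Your preferred contact method, i.e. phone or email?: Email please"

simple   <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
extended <- "(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*"

# The simple pattern cannot cross the dots in "i.e.", so only part of the label goes
before <- gsub(simple, "", x, perl = TRUE)

# The extended pattern consumes "i.e." as one unit and removes the whole label
after <- gsub(extended, "", x, perl = TRUE)

before  # "3b. Your preferred contact method, i.e.Email please"
after   # "Email please"
```

Other abbreviations (for example `e.g.`) would need their own alternatives in the same group.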
See this demo.