r - How to subset text to its first 2 sentences in R?
Problem description
I have the following data frame:
df = data.frame(Text = c("This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.", "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.", "SO is great. You can get many things solve. Additional paragraph."), stringsAsFactors = FALSE)
I have already split the text into sentences with:
library(textshape)
split_sentence(df$Text)
However, I want to keep only the first 2 sentences of each entry in the "Text" column, so that I get a list like this:
This is great.
A really great place to be.
Good morning.
There are very skilled programmers here.
SO is great.
You can get many things solve.
Can anyone help me?
Thanks!
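Solution
Another option is to use strsplit and head: split each string after every period, then keep the first 2 pieces of each element.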
unlist(lapply(strsplit(df$Text, '(?<=\\.)\\s*', perl = TRUE), head, 2))
# [1] "This is great." "A really great place to be."
# [3] "Good morning." "There are very skilled programmers here."
# [5] "SO is great." "You can get many things solve."
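If the goal is to keep the result aligned with the rows of df rather than flattened into one vector, the same idea can be wrapped in a small helper. This is only a sketch: first_n_sentences and the FirstTwo column are hypothetical names, not part of base R or textshape, and the regex assumes every period ends a sentence, so abbreviations such as "e.g." would be split and "?" or "!" are not treated as sentence ends.

```r
# Hypothetical helper (not from any package): keep the first n sentences
# of each string and return one string per input element.
first_n_sentences <- function(text, n = 2) {
  # Split after each period, consuming any whitespace that follows it.
  parts <- strsplit(text, "(?<=\\.)\\s*", perl = TRUE)
  # Keep at most n sentences per element and join them with a space.
  vapply(parts, function(s) paste(head(s, n), collapse = " "), character(1))
}

df <- data.frame(
  Text = c(
    "This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.",
    "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.",
    "SO is great. You can get many things solve. Additional paragraph."
  ),
  stringsAsFactors = FALSE
)

# Assumed column name; one shortened string per row of df.
df$FirstTwo <- first_n_sentences(df$Text, n = 2)
```

Returning one string per row keeps the shortened text next to the original; dropping the paste step and using lapply instead would give a list with one character vector of sentences per row.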