How to subset a text to its first 2 sentences in R?

Question

I have the following data frame:

df <- data.frame(
  Text = c(
    "This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.",
    "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.",
    "SO is great. You can get many things solve. Additional paragraph."
  ),
  stringsAsFactors = FALSE
)

So far I have split the text into sentences with:

library(textshape)

split_sentence(df$Text)

However, I want to keep only the first 2 sentences of each entry in the Text column, so that I get a list like this:

This is great.
A really great place to be.
Good morning.
There are very skilled programmers here. 
SO is great.
You can get many things solve.

Can anyone help me?

Thanks!

Tags: r, dataframe

Solution


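Another option uses strsplit and head. The lookbehind (?<=\\.) splits right after each period, \\s* drops the whitespace that follows, and head(..., 2) keeps the first two sentences of each element: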

unlist(lapply(strsplit(df$Text, '(?<=\\.)\\s*', perl = TRUE), head, 2))
# [1] "This is great."                           "A really great place to be."             
# [3] "Good morning."                            "There are very skilled programmers here."
# [5] "SO is great."                             "You can get many things solve."    
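
If the result should stay aligned with the rows of the data frame rather than being flattened, the same split can be wrapped in a small helper. This is only a sketch: first_sentences and the FirstTwo column are hypothetical names introduced here, not part of the original answer. It also inherits the assumption that every period ends a sentence, so abbreviations such as "e.g." or decimal numbers would be split as well.

```r
df <- data.frame(
  Text = c(
    "This is great. A really great place to be. For sure if you wanna solve R issues. Skilled people.",
    "Good morning. There are very skilled programmers here. They will help sorting this. I am sure.",
    "SO is great. You can get many things solve. Additional paragraph."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first n sentences of each string as a list,
# with one character vector per input element.
first_sentences <- function(text, n = 2) {
  # Split after every period and discard the whitespace that follows it.
  parts <- strsplit(text, "(?<=\\.)\\s*", perl = TRUE)
  lapply(parts, head, n)
}

per_row <- first_sentences(df$Text)

# Flattened, this matches the output of the one-liner above.
flat <- unlist(per_row)

# Store the shortened text back in the data frame, one string per row.
df$FirstTwo <- vapply(per_row, paste, character(1), collapse = " ")
```

Keeping the intermediate list means each row's sentences can still be inspected individually, and changing n gives the first three or more sentences without touching the regular expression.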
