Regex pattern to count lines in poems with random \n or \n\n as line breaks

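Problem description

I need to count the lines of 221 poems, and I tried doing so by counting the line break character \n.

However, some lines are followed by a double line break \n\n that starts a new stanza. I want each of these to count only once. The number and position of the double line breaks in each poem are random.

Minimal working example: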

library("quanteda")

poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"

# corpus() expects a single character vector, so combine the poems with c()
poems <- quanteda::corpus(c(poem1, poem2))

The resulting line counts should be 5 for poem1 and 4 for poem2.

I tried stringi::stri_count_fixed(texts(poems), pattern = "\n"), but that pattern is not refined enough to handle the random double line breaks.

Tags: r, regex, nlp, data-science, quanteda

Solution

You can use stringr::str_count with the \R+ pattern to count the runs of consecutive line breaks in a string:

> poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
> poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
> library(stringr)
> str_count(poem1, "\\R+")
[1] 4
> str_count(poem2, "\\R+")
[1] 3

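So the line count is str_count(x, "\\R+") + 1: a text with n runs of line breaks has n + 1 lines.

The \R pattern matches any line break sequence, such as CRLF, LF, or CR. \R+ matches one or more such sequences in a row, so a double line break counts as a single match.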
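The "+ 1" rule assumes that a poem neither starts nor ends with a line break and is not empty. If the real texts might violate that, a small wrapper can trim the edges first. The following is only a sketch under that assumption; count_lines is a hypothetical helper name, not part of stringr or of the original answer:

```r
library(stringr)

# Hypothetical helper: count the lines of each text,
# ignoring leading and trailing line breaks.
count_lines <- function(x) {
  # Strip runs of line breaks at the very start or end of each string
  trimmed <- str_replace_all(x, "^\\R+|\\R+$", "")
  # Runs of line breaks + 1, except that an empty text has 0 lines
  ifelse(trimmed == "", 0L, str_count(trimmed, "\\R+") + 1L)
}

# Windows-style CRLF endings, one blank line, and a trailing line break:
# the text contains three non-empty lines (a, b, c)
count_lines("a\r\nb\r\n\r\nc\n")
```

Without the trimming step, the trailing line break in this input would be counted as one more run and the result would be one too high.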

R code demo:

poem1 <- "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one"
poem2 <- "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
library(stringr)
str_count(poem1, "\\R+")
# => [1] 4
str_count(poem2, "\\R+")
# => [1] 3
## Line counts:
str_count(poem1, "\\R+") + 1
# => [1] 5
str_count(poem2, "\\R+") + 1
# => [1] 4
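Because str_count is vectorized over its input, all poems can be counted in one call instead of one call per poem. The sketch below assumes the poems are held in a named character vector; the names poem1 and poem2 and the variable line_counts are illustrative only. With a quanteda corpus, the texts would first have to be extracted as a character vector, for example with the texts(poems) call from the question or with as.character(poems) in newer quanteda versions.

```r
library(stringr)

# Illustrative input: a named character vector with one element per poem
poems <- c(
  poem1 = "This is a line\nThis is a line\n\nAnother line\n\nAnd another one\nThis is the last one",
  poem2 = "Some poetry\n\nMore poetic stuff\nAnother very poetic line\n\nThis is the last line of the poem"
)

# One count per poem: runs of line breaks + 1.
# setNames() keeps the poem names attached to the counts explicitly.
line_counts <- setNames(str_count(poems, "\\R+") + 1, names(poems))
line_counts
```

For the two example poems this gives the counts 5 and 4 from the question, and the same call scales to all 221 poems.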
