r - Column `token` must be length 2 (the number of rows) or one, not 3
Problem description
I am trying to tokenize long sentences:
library(dplyr)
dat <- data.frame(text = c("hi i am Apple, not an orange. that is an orange",
                           "hello i am banana, not an pineapple. that is an pineapple"),
                  received = c(1, 0))
dat <- dat %>%
  mutate(token = sent_detect(text, language = "en"))
But I get this error:
Error: Column `token` must be length 2 (the number of rows) or one, not 3
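This happens because mutate() passes the whole text column to sent_detect in a single call, and the vector of sentences it returns (3 here) does not map back to the number of rows in the original data frame (2). sent_detect is defined as: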
library(openNLP)
library(NLP)
sent_detect <- function(text, language) {
  # Function to compute sentence annotations using the Apache OpenNLP Maxent
  # sentence detector employing the default model for language 'en'.
  sentence_token_annotator <- Maxent_Sent_Token_Annotator(language)
  # Convert text to class String from package NLP
  text <- as.String(text)
  # Sentence boundaries in text
  sentence.boundaries <- annotate(text, sentence_token_annotator)
  # Extract sentences
  sentences <- text[sentence.boundaries]
  # Return sentences
  return(sentences)
}
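To see where the 3 comes from, here is a minimal base-R sketch. It assumes that as.String() joins a character vector into a single string separated by newlines (my reading of the NLP package, not something stated in the question), and it uses a hypothetical regex splitter, split_sentences, as a crude stand-in for the OpenNLP detector:

```r
# Stand-in for sent_detect(): NOT the OpenNLP model, just a split on
# whitespace that follows a period (hypothetical helper for illustration).
split_sentences <- function(text) {
  # Assumption: mimic as.String() by joining all elements with newlines
  joined <- paste(text, collapse = "\n")
  strsplit(joined, "(?<=\\.)\\s+", perl = TRUE)[[1]]
}

text <- c("hi i am Apple, not an orange. that is an orange",
          "hello i am banana, not an pineapple. that is an pineapple")

# Whole column in one call: both rows are merged before splitting,
# so the result has 3 pieces for 2 rows
whole_column <- split_sentences(text)
length(whole_column)

# One call per row: each row keeps its own sentences (2 and 2)
per_row <- lapply(text, split_sentences)
lengths(per_row)
```

The first call returns one vector for the whole column, so its length is unrelated to the number of rows. The second returns one result per row, which is the shape mutate() can store as a list column.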
I was looking into purrr::map, but I am not sure how to apply it in this case.
I am expecting a result that looks like this:
text received token
"hi i am Apple, not an orange. that is an orange" 1 "hi i am Apple, not an orange."
"hi i am Apple, not an orange. that is an orange" 1 "that is an orange"
"hello i am banana, not an pineapple. that is an pineapple" 0 "hello i am banana, not an pineapple."
"hello i am banana, not an pineapple. that is an pineapple" 0 "that is an pineapple"
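Solution
Using tidyr + purrr can get you there. map calls sent_detect once per row and creates a nested output (a list column), which you can then lift to the top level with tidyr's unnest.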
library(tidyr)
dat %>%
  mutate(sentences = purrr::map(text, sent_detect, "en")) %>%
  unnest(sentences)
# A tibble: 4 x 3
text received sentences
<chr> <dbl> <chr>
1 hi i am Apple, not an orange. that is an orange 1 hi i am Apple, not an orange.
2 hi i am Apple, not an orange. that is an orange 1 that is an orange
3 hello i am banana, not an pineapple. that is an pineapple 0 hello i am banana, not an pineapple.
4 hello i am banana, not an pineapple. that is an pineapple 0 that is an pineapple
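If it helps to see what map + unnest are doing, the same reshaping can be sketched in base R. This is only an illustration under assumptions: split_one is a hypothetical regex splitter standing in for sent_detect (so it runs without Java or OpenNLP), and the output column is named token to match the question.

```r
dat <- data.frame(
  text = c("hi i am Apple, not an orange. that is an orange",
           "hello i am banana, not an pineapple. that is an pineapple"),
  received = c(1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for sent_detect(): split one string on whitespace
# that follows a period. Much cruder than the OpenNLP model.
split_one <- function(x) strsplit(x, "(?<=\\.)\\s+", perl = TRUE)[[1]]

# The "map" step: one character vector of sentences per row
pieces <- lapply(dat$text, split_one)

# The "unnest" step: repeat each row once per sentence, then attach them
out <- dat[rep(seq_len(nrow(dat)), lengths(pieces)), ]
out$token <- unlist(pieces)
rownames(out) <- NULL
out
```

Each original row is repeated as many times as it has sentences, so text and received stay aligned with their tokens.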