What does the collapse argument of unnest_tokens() in R mean?

Question

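I know that the default for collapse in unnest_tokens is TRUE, but I'm confused about what the collapse argument actually means. I have read the R documentation and I'm still confused. Here is an example I wrote. If I change collapse to TRUE, is the return value any different?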

bigram_freq <- tw %>%
  unnest_tokens(bigram,text,token = "ngrams", n=2, collapse = FALSE)

Tags: r, tidyverse, text-mining, tidytext

Answer


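The collapse argument controls how input text is treated across rows (lines):

whether the text is first combined with newlines, in case tokens (such as sentences or paragraphs) span multiple lines.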

See the difference in behavior between collapse = TRUE and collapse = FALSE:

library(tidyverse)
library(tidytext)

emily <- tibble(text = c("Because I could not stop for Death -",
                         "He kindly stopped for me -"))

## notice the bigram "death he"
emily %>%
  unnest_tokens(word, text, token = "ngrams", n = 2, collapse = TRUE)
#> # A tibble: 11 x 1
#>    word          
#>    <chr>         
#>  1 because i     
#>  2 i could       
#>  3 could not     
#>  4 not stop      
#>  5 stop for      
#>  6 for death     
#>  7 death he      
#>  8 he kindly     
#>  9 kindly stopped
#> 10 stopped for   
#> 11 for me

## notice no "death he"
emily %>%
  unnest_tokens(word, text, token = "ngrams", n = 2, collapse = FALSE)
#> # A tibble: 10 x 1
#>    word          
#>    <chr>         
#>  1 because i     
#>  2 i could       
#>  3 could not     
#>  4 not stop      
#>  5 stop for      
#>  6 for death     
#>  7 he kindly     
#>  8 kindly stopped
#>  9 stopped for   
#> 10 for me

Created on 2020-08-18 by the reprex package (v0.3.0.9001)
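To make the mechanism concrete without depending on tidytext internals, here is a minimal base-R sketch. It assumes that collapse = TRUE amounts to pasting the rows together with newlines before tokenizing, and collapse = FALSE to tokenizing each row on its own. The helper naive_bigrams() is hypothetical and far cruder than the tokenizer unnest_tokens() actually uses (it only lowercases and splits on non-letters), but for this input it reproduces the 11 versus 10 bigrams shown above.

```r
# Hypothetical helper (not part of tidytext): a crude word-bigram tokenizer.
# It replaces every non-letter (including "\n" and "-") with a space,
# lowercases, splits on whitespace and pairs each word with the next one.
naive_bigrams <- function(x) {
  words <- strsplit(tolower(gsub("[^A-Za-z]", " ", x)), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

lines <- c("Because I could not stop for Death -",
           "He kindly stopped for me -")

# Analogue of collapse = TRUE: join the rows with a newline, then tokenize,
# so a bigram can span the boundary between the two rows.
joined <- naive_bigrams(paste(lines, collapse = "\n"))

# Analogue of collapse = FALSE: tokenize each row separately,
# so no bigram crosses a row boundary.
separate <- unlist(lapply(lines, naive_bigrams))

print(joined)
print(separate)
```

The only bigram that differs is the one spanning the row boundary, which is why collapse changes the result only when a token could cross from one row into the next.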
