How to fix this rlang error

Problem description

Code example:

library(quanteda)
library(tidyr)
library(dplyr)
df <- data.frame(id = c(1, 2), text = c("I am loving it", "I am hating it"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart"))) %>%
  dfm()

out <- convert(myDfm, to = "data.frame")
pivot_longer(out, cols = !contains("document"), names_to = "features", values_to = "count") %>%
  mutate(id = as.integer(gsub("[a-z]", "", document))) %>%
  filter(count != 0) %>%
  inner_join(df) %>% # joins on id
  select(id, features) # select only the id and features column

When I run the command above, I get an error.

Here is the backtrace.

What can I do to fix it?

> rlang::last_error()
<error/rlang_error>
`!contains("document")` must evaluate to column positions or names, not a logical vector
Backtrace:
 1. `%>%`(...)
 4. tidyr::pivot_longer(...)
 5. tidyr::build_longer_spec(...)
 6. tidyselect::vars_select(unique(names(data)), !!enquo(cols))
 7. tidyselect:::bad_calls(bad, "must evaluate to { singular(.vars) } positions or names, \\\n       not { first_type }")
 8. tidyselect:::glubort(fmt_calls(calls), ..., .envir = .envir)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
`!contains("document")` must evaluate to column positions or names, not a logical vector
Backtrace:
    x
 1. +-`%>%`(...)
 2. | \-base::eval(lhs, parent, parent)
 3. |   \-base::eval(lhs, parent, parent)
 4. \-tidyr::pivot_longer(...)
 5.   \-tidyr::build_longer_spec(...)
 6.     \-tidyselect::vars_select(unique(names(data)), !!enquo(cols))
 7.       \-tidyselect:::bad_calls(bad, "must evaluate to { singular(.vars) } positions or names, \\\n       not { first_type }")
 8.         \-tidyselect:::glubort(fmt_calls(calls), ..., .envir = .envir)

Tags: r, tidyr, quanteda

Solution


The problem is that you are referencing a column, "document", that does not exist in the out object. The correct column name is doc_id.

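This changed in the upcoming 2.1.0 release, where we renamed the column from "document" to "doc_id" because that is more consistent across the package. So I suspect you are using quanteda v2.0.9000 (the development version) for your example.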
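To check which name your installed version uses, you can inspect the column names directly. The sketch below is base R only and uses a hand-built data frame as a hypothetical stand-in for the result of convert(myDfm, to = "data.frame"); in real use you would run the same two lines on your own out object.

```r
# Hypothetical stand-in for convert(myDfm, to = "data.frame");
# the column values are made up for illustration.
out <- data.frame(
  doc_id = c("text1", "text2"),
  loving = c(1, 0),
  hating = c(0, 1),
  stringsAsFactors = FALSE
)

# Find whichever document-id column is present ("doc_id" or "document").
id_col <- intersect(c("doc_id", "document"), names(out))
stopifnot(length(id_col) == 1)

# Every other column is a feature count.
feature_cols <- setdiff(names(out), id_col)

id_col        # "doc_id"
feature_cols  # "loving" "hating"
```

Looking the name up this way avoids hard-coding either "document" or "doc_id" in later steps.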

The following fixes it under either version:

library(quanteda)
## Package version: 2.0.1

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df <- data.frame(id = c(1, 2), text = c("I am loving it", "I am hating it"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart"))) %>%
  dfm()

out <- convert(myDfm, to = "data.frame")
if ("document" %in% names(out)) {
  out <- rename(out, doc_id = document)
}

pivot_longer(out, cols = !contains("doc_id"), names_to = "features", values_to = "count") %>%
  mutate(id = as.integer(gsub("[a-z]", "", doc_id))) %>%
  filter(count != 0) %>%
  inner_join(df) %>% # joins on id
  select(id, features) # select only the id and features column
## Joining, by = "id"
## # A tibble: 2 x 2
##      id features
##   <dbl> <chr>   
## 1     1 loving  
## 2     2 hating
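
The mutate() step above recovers id by stripping the lowercase letters from the document names. That works here because the documents carry quanteda's default names (text1, text2, ...). As a minimal base R sketch of just that step, using a made-up vector of document names:

```r
# Made-up document names in the default "text<N>" form.
doc_ids <- c("text1", "text2", "text10")

# Strip the lowercase letters and convert the remaining digits to integers.
ids <- as.integer(gsub("[a-z]", "", doc_ids))

ids  # 1 2 10
```

If your documents have custom names, this pattern will not yield the original id values, and carrying the id along some other way would be safer.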
