Merge a data frame from columns

Problem description

I have a data.frame with two variables, ID and text. I am running the following text-analysis command, which outputs a data.frame with 48 columns:

analysis <- textstat_readability(mydata$text,  measure = c("all"), remove_hyphens = TRUE)

How can I add these 48 result columns to mydata as separate columns?
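If the readability function returns exactly one row per input text, in the same order as mydata, the score columns can be bound on directly. Below is a minimal base-R sketch under that assumption. fake_readability() is a hypothetical stand-in for textstat_readability() that only mimics its shape (a document column followed by one numeric column per measure), so the example runs without quanteda installed.

```r
# Hypothetical stand-in for textstat_readability(): one row per text,
# a "document" column, then one numeric column per "measure".
fake_readability <- function(texts) {
  data.frame(
    document = paste0("text", seq_along(texts)),
    n_chars  = nchar(texts),
    n_words  = lengths(strsplit(texts, "\\s+")),
    stringsAsFactors = FALSE
  )
}

mydata <- data.frame(ID   = c(10, 20, 30),
                     text = c("A short one.", "Two words", "Three little words here"),
                     stringsAsFactors = FALSE)

scores <- fake_readability(mydata$text)

# Drop the "document" column and append the score columns to mydata.
# This is only safe because both data frames have the same rows in the same order.
mydata_scores <- cbind(mydata, scores[, setdiff(names(scores), "document"), drop = FALSE])
print(mydata_scores)
```

Positional binding like this silently misaligns rows if any text is dropped or reordered, which is why matching on an ID column is usually the safer choice.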

Currently I am using the following:

analysis <- cbind(mydata$ID[1:100000], textstat_readability(mydata$text[1:100000],  measure = c("all"), remove_hyphens = TRUE))

But it takes forever to finish.

Tags: r, dataframe, calculated-columns, quanteda

Solution


Honestly, I'm not sure why your approach takes forever to finish, but I think the right way to do it is as follows:

# (0.) Load the package and make a random sample dataset (usually this should be
# provided in the question, just saying):

library(quanteda)
library(quanteda.textstats)  # since quanteda 3.0, textstat_readability() lives in this package
mydata <- data.frame(ID = 1:100,
                     text = stringi::stri_rand_strings(
                       n = 100, 
                       length = runif(100, min=1, max=100), 
                       pattern = "[A-Za-z0-9]"),
                     stringsAsFactors = FALSE)

# 1. Make a quanteda corpus, where the ID is stored alongside the text column:

mydata_corpus <- corpus(mydata, docid_field = "ID", text_field = "text")

# 2. Then run the readability command:

analysis <- textstat_readability(mydata_corpus, measure = c("all"), remove_hyphens = TRUE)

# 3. Now you can either keep this, or merge it with your original set based on
# IDs:

mydata_analysis <- merge(mydata, analysis, by.x = "ID", by.y = "document")

This should work, and you don't need cbind() at all.
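To see why merge() is the safer join, here is a minimal base-R sketch using a hand-made analysis data frame as a hypothetical stand-in for the textstat_readability() result (the Flesch values are invented for illustration). It assumes, as in the answer, that the corpus was built with docid_field = "ID", so the document column holds the IDs as character strings. merge() matches rows by value rather than position, and all.x = TRUE keeps rows of mydata that have no score.

```r
# Hypothetical stand-in for the textstat_readability() output: the "document"
# column holds the IDs as text, deliberately out of order, and ID 4 is missing.
mydata <- data.frame(ID   = 1:4,
                     text = c("alpha", "beta gamma", "delta", "epsilon zeta eta"),
                     stringsAsFactors = FALSE)
analysis <- data.frame(document = c("3", "1", "2"),
                       Flesch   = c(50.5, 70.1, 60.2),  # invented example scores
                       stringsAsFactors = FALSE)

# Inner join: rows are matched on ID == document (the integer IDs are compared
# as strings), so the row order of `analysis` does not matter; ID 4 is dropped.
inner <- merge(mydata, analysis, by.x = "ID", by.y = "document")

# Left join: keep every row of mydata, with NA where no score exists.
left <- merge(mydata, analysis, by.x = "ID", by.y = "document", all.x = TRUE)

print(inner)
print(left)
```

By default merge() also sorts the result on the key column, so the output comes back in ID order even though analysis was shuffled.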

