r - How do I avoid a nested for loop in R that takes too much processing time?
Problem description
I have a dataset in which I need to tokenize the words and find the frequency of each word. I can achieve this with a for loop in R:
InputData <- To_Find_Categories
ShtDesc_Token_all <- ""
ShtDesc_Token <- ""
for (i_ID in 1:nrow(InputData))
#for(i_ID in 1:20)
{
  ShtDesc_Token <- regmatches(InputData$short_description,
                              gregexpr("((?![0-9]+)[A-Za-z0-9]+)",
                                       InputData$short_description, perl = TRUE))[[i_ID]]
  ShtDesc_Token_all <- append(ShtDesc_Token_all, ShtDesc_Token)
}
X<- sort(table(unlist(ShtDesc_Token_all)))
write.csv(X, "temp.csv", row.names=FALSE)
#
But it takes a lot of processing time. I want to avoid the for loop; how can I do this? The data is in .csv format, and here are some sample records:
data.table::fread("number,parent , short_description
GECTASK0011264, GECHG0036340 , Restore Request
GECTASK0011265, GECHG0036340 , Restore Request
GECTASK0011748, GECHG0038670, lkj
GECTASK0011797 , GECHG0034985 , vm down-grade
GECTASK0011798, GECHG0034985 , vm down-grade
GECTASK0012252 , GECHG0040437 , remove server from load
GECTASK0012253 , GECHG0040437 , remove server from load
GECTASK0012328 , GECHG0034983 , vm down-grade
GECTASK0012329 , GECHG0034983 , vm down-grade")
Solution
Try this.
You don't need a for loop in this case.
input <- data.table::fread("number,parent , short_description
GECTASK0011264, GECHG0036340 , Restore Request
GECTASK0011265, GECHG0036340 , Restore Request
GECTASK0011748, GECHG0038670, lkj
GECTASK0011797 , GECHG0034985 , vm down-grade
GECTASK0011798, GECHG0034985 , vm down-grade
GECTASK0012252 , GECHG0040437 , remove server from load
GECTASK0012253 , GECHG0040437 , remove server from load
GECTASK0012328 , GECHG0034983 , vm down-grade
GECTASK0012329 , GECHG0034983 , vm down-grade")
# Join all descriptions into one string, split it on spaces, and count the words
tmp <- paste(input$short_description, collapse = " ")
tmp.splt <- stringr::str_split(tmp, pattern = " ")[[1]]
table(tmp.splt)
#> tmp.splt
#> down-grade       from        lkj       load     remove    Request
#>          4          2          1          2          2          2
#>    Restore     server         vm
#>          2          2          4
Created on 2018-08-10 by the reprex package (v0.2.0.9000).
Or
Use this one-liner (from @Onyambu's comment):
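If you would rather keep the regular expression from the question, the loop can probably be dropped without changing the pattern. The loop in the question calls regmatches() on the whole column in every iteration and keeps only element i_ID, so the same work is repeated once per row. The sketch below calls it once instead. It uses base R only; the `descriptions` vector is a hypothetical stand-in for InputData$short_description, and `pattern`, `tokens` and `word_counts` are illustrative names, not from the original post.

```r
# Hypothetical stand-in for InputData$short_description
descriptions <- c("Restore Request", "Restore Request", "lkj",
                  "vm down-grade", "vm down-grade",
                  "remove server from load", "remove server from load",
                  "vm down-grade", "vm down-grade")

# Same pattern as in the question, applied once to the whole column
pattern <- "((?![0-9]+)[A-Za-z0-9]+)"
tokens <- regmatches(descriptions, gregexpr(pattern, descriptions, perl = TRUE))

# regmatches() returns one character vector per row; flatten and count
word_counts <- sort(table(unlist(tokens)))
print(word_counts)
```

Note that this pattern matches runs of letters and digits only, so "down-grade" is counted as the two words "down" and "grade", unlike the space-based split above.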
sort(table(unlist(strsplit(InputData$short_description,"\\W"))))
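One thing to watch with the one-liner above: "\\W" splits at every single non-word character, so text such as "server, from" (a comma followed by a space) produces an empty token that then shows up in the counts. A possible variation, sketched here with base R on a small hypothetical vector `x`, splits on runs of non-word characters and drops any empty strings that remain:

```r
# Hypothetical sample input; the second element contains ", " between words
x <- c("vm down-grade", "remove server, from load")

# "\\W+" treats a run of non-word characters as a single separator
tokens <- unlist(strsplit(x, "\\W+"))

# A leading separator can still yield "", so filter empty strings out
tokens <- tokens[nzchar(tokens)]

word_counts <- sort(table(tokens), decreasing = TRUE)
print(word_counts)
```

As with the one-liner, hyphenated words are split into their parts here, so choose the separator according to whether "down-grade" should count as one word or two.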