How do I avoid a nested for loop in R that takes so much processing time?

Problem description

I have a dataset in which I need to tokenize the words and count the frequency of each word. I can do this with a for loop in R:

InputData <- To_Find_Categories
ShtDesc_Token_all <- character(0)   # tokens collected from every row
for (i_ID in seq_len(nrow(InputData))) {
  # Extract alphanumeric runs that don't begin with a digit, then keep row i_ID
  ShtDesc_Token <- regmatches(InputData$short_description,
                              gregexpr("((?![0-9]+)[A-Za-z0-9]+)",
                                       InputData$short_description, perl = TRUE))[[i_ID]]
  ShtDesc_Token_all <- append(ShtDesc_Token_all, ShtDesc_Token)
}

# Word frequencies, least frequent first
X <- sort(table(unlist(ShtDesc_Token_all)))

write.csv(X, "temp.csv", row.names = FALSE)

But this takes a lot of processing time. How can I avoid the for loop? The data is in .csv format; here are some sample records:

data.table::fread("number,parent , short_description
GECTASK0011264,  GECHG0036340 ,   Restore Request
GECTASK0011265,  GECHG0036340 ,   Restore Request
GECTASK0011748,  GECHG0038670,    lkj
GECTASK0011797 , GECHG0034985 ,   vm down-grade
GECTASK0011798,  GECHG0034985 ,   vm down-grade
GECTASK0012252 , GECHG0040437  ,  remove server from load
GECTASK0012253 , GECHG0040437 ,   remove server from load
GECTASK0012328 , GECHG0034983 ,   vm down-grade
GECTASK0012329 , GECHG0034983 ,   vm down-grade")

Tags: r, for-loop

Solution


Try this.

You don't need a for loop in this case.

input <-    data.table::fread("number,parent , short_description
    GECTASK0011264,  GECHG0036340 ,   Restore Request
    GECTASK0011265,  GECHG0036340 ,   Restore Request
    GECTASK0011748,  GECHG0038670,    lkj
    GECTASK0011797 , GECHG0034985 ,   vm down-grade
    GECTASK0011798,  GECHG0034985 ,   vm down-grade
    GECTASK0012252 , GECHG0040437  ,  remove server from load
    GECTASK0012253 , GECHG0040437 ,   remove server from load
    GECTASK0012328 , GECHG0034983 ,   vm down-grade
    GECTASK0012329 , GECHG0034983 ,   vm down-grade")

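# Join all descriptions into one string, then split it on spaces
tmp <- paste(input$short_description, collapse = " ")

tmp.splt <- stringr::str_split(tmp, pattern = " ")[[1]]

# Count how often each word occurs
table(tmp.splt)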
#> tmp.splt
#> down-grade       from        lkj       load     remove    Request 
#>          4          2          1          2          2          2 
#>    Restore     server         vm 
#>          2          2          4

Created on 2018-08-10 by the reprex package (v0.2.0.9000).
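A minimal variant of the same idea, as a sketch: `strsplit()` is already vectorized over a character vector, so the `paste()` step can be skipped, and `sort(..., decreasing = TRUE)` puts the most frequent words first. The sample data is rebuilt with base `data.frame()` here (an assumption, so the snippet runs without data.table). Like the answer above, it splits on single spaces, so `down-grade` stays one token.

```r
# Sample descriptions from the question (base R, no data.table needed)
input <- data.frame(
  short_description = c("Restore Request", "Restore Request", "lkj",
                        "vm down-grade", "vm down-grade",
                        "remove server from load", "remove server from load",
                        "vm down-grade", "vm down-grade"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorized: it returns a list with one character vector per row
words <- unlist(strsplit(input$short_description, " ", fixed = TRUE))

# Count each word, most frequent first
freq <- sort(table(words), decreasing = TRUE)
freq
```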

Or use this one-liner (from @Onyambu's comment):

sort(table(unlist(strsplit(InputData$short_description,"\\W"))))
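If you want exactly the tokens the loop in the question produces (its regular expression also splits `down-grade` into `down` and `grade`, and skips leading digits), a single `gregexpr()` call on the whole column does the work the loop spread over `nrow()` iterations. Part of what makes the loop slow is that it re-runs `gregexpr()` on the entire column in every iteration and grows `ShtDesc_Token_all` with `append()`. This is a sketch assuming the column is named `short_description` as in the sample; it writes to a `tempfile()` rather than `temp.csv` only so that running it doesn't overwrite anything.

```r
# Sample descriptions from the question (base R, no data.table needed)
InputData <- data.frame(
  short_description = c("Restore Request", "Restore Request", "lkj",
                        "vm down-grade", "vm down-grade",
                        "remove server from load", "remove server from load",
                        "vm down-grade", "vm down-grade"),
  stringsAsFactors = FALSE
)

# Same pattern as the question's loop. gregexpr() and regmatches() are
# vectorized, so one call covers every row and returns a list with one
# character vector of tokens per row.
desc <- InputData$short_description
tokens <- regmatches(desc, gregexpr("((?![0-9]+)[A-Za-z0-9]+)", desc, perl = TRUE))

# Flatten the list and count, least frequent first (as in the question)
X <- sort(table(unlist(tokens)))

# Write word/count pairs (columns Var1 and Freq) to a temporary CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(as.data.frame(X, stringsAsFactors = FALSE), out_file, row.names = FALSE)
```

One difference from the one-liner: `strsplit(x, "\\W")` yields empty strings when two non-word characters are adjacent, whereas the regex above only ever returns non-empty tokens; splitting on `"\\W+"` avoids the empty strings between words.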
