How to speed up the processing time of a long article with StanfordCoreNLP (v3.9.2)

Problem description

I have an article with 8,226 characters from which I want to extract named entities (NERs). (The original article is linked Here.)

Running the command below takes 8.0 sec in the NERCombinerAnnotator step:

 java -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.model edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz -ner.nthreads 4 -file longArticleSample.txt -outputFormat json
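For reference, the same pipeline can also be built programmatically, so that model loading is paid once and only annotate() itself is timed. Below is a minimal sketch, assuming the 3.9.2 jars and models are on the classpath and that longArticleSample.txt is in the working directory:

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    public class NerTiming {
        public static void main(String[] args) throws IOException {
            // Same configuration as the command-line invocation above.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
            props.setProperty("ner.model",
                "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz");
            props.setProperty("ner.nthreads", "4");

            // Models load once here; this cost is not paid again per document.
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            String text = new String(
                Files.readAllBytes(Paths.get("longArticleSample.txt")), "UTF-8");

            // Time only the annotation pass, excluding JVM startup and model loading.
            long start = System.nanoTime();
            Annotation doc = new Annotation(text);
            pipeline.annotate(doc);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            // Print tokens that received a non-"O" NER tag.
            for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                if (!"O".equals(ner)) {
                    System.out.println(token.word() + "\t" + ner);
                }
            }
            System.out.println("annotate() took " + elapsedMs + " ms (model load excluded)");
        }
    }

Reusing one pipeline instance like this is also how you would amortize the load cost across many files, instead of paying it on every java invocation.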


I have also tried a shorter article with 1,973 characters in the same way; it takes 4.2 sec to get the NERs:

java -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.model edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz -ner.nthreads 4 -file mediumArticle.txt -outputFormat json


These timings are much slower than the ones the Stanford authors report (both pipelines use Token+SS+PoS+L+NER):

[My Result]

[Stanford Result]
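One possible factor: by default, NERCombinerAnnotator runs SUTime and the numeric sequence classifiers (and, in 3.9+, fine-grained rule-based NER) on top of the CRF model you specify, and those extra passes add time per document. Below is a sketch of a pipeline with these passes disabled; the property names are my assumption from the 3.9.x documentation, so they are worth verifying against your version:

    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class FastNerPipeline {
        public static StanfordCoreNLP build() {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
            props.setProperty("ner.model",
                "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz");
            props.setProperty("ner.nthreads", "4");
            // Skip the extra passes NERCombinerAnnotator runs on top of the CRF model.
            props.setProperty("ner.useSUTime", "false");               // no SUTime temporal tagging
            props.setProperty("ner.applyNumericClassifiers", "false"); // no numeric classifiers
            props.setProperty("ner.applyFineGrained", "false");        // 3.9+: no fine-grained rules
            return new StanfordCoreNLP(props);
        }
    }

If the 4-class PERSON/LOCATION/ORGANIZATION/MISC labels are all that is needed, dropping these passes should not change the output, but whether it closes the gap to the published timings is untested here.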

Tags: stanford-nlp

Solution

