stanford-nlp - How to speed up the processing time of a long article with StanfordCoreNLP (v3.9.2)
Problem description
I have an article of 8,226 characters from which I want to extract named entities (NER). (The original article was linked from the original post.)
Running the command below takes 8.0 sec in NERCombinerAnnotator:
java -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.model edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz -ner.nthreads 4 -file longArticleSample.txt -outputFormat json
I also tried another article of 1,973 characters in the same way. It takes 4.2 sec to extract the named entities:
java -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.model edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz -ner.nthreads 4 -file mediumArticle.txt -outputFormat json
These results are far slower than the figures published by Stanford (both use Token+SS+PoS+L+NER):
[My Result]
- MediumLengthArticle: 4.5 sec. for 356 tokens at 78.4 tokens/sec.
- LongArticle: 8.4 sec. for 1683 tokens at 200.3 tokens/sec.
[Stanford Result]
- More than 10,000 tokens/sec.
- originalSite
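To make the size of the gap concrete, the throughput figures above can be recomputed from the token counts and timings. This is a minimal sketch rather than part of CoreNLP: the class name `Throughput` and the method `tokensPerSecond` are hypothetical helpers introduced only for illustration, and the inputs are the rounded numbers quoted in this question, so the output can differ slightly from the tokens/sec values that CoreNLP printed.

```java
// Hypothetical helper, not part of CoreNLP: recomputes tokens/sec from the
// rounded token counts and wall-clock times reported in the question.
final class Throughput {

    // Tokens processed per second for a single pipeline run.
    static double tokensPerSecond(int tokens, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return tokens / seconds;
    }

    public static void main(String[] args) {
        double medium = tokensPerSecond(356, 4.5);       // MediumLengthArticle
        double longArticle = tokensPerSecond(1683, 8.4); // LongArticle
        double reference = 10_000.0;                     // lower bound of the Stanford figure

        System.out.printf("medium: %.1f tokens/sec%n", medium);
        System.out.printf("long:   %.1f tokens/sec%n", longArticle);
        // How many times slower the long-article run is than the reference figure.
        System.out.printf("gap:    %.1fx%n", reference / longArticle);
    }
}
```

With these inputs, even the faster long-article run is roughly 50 times slower than the 10,000 tokens/sec reference. Both runs are single short documents timed in a fresh JVM, which may not be directly comparable to a published throughput figure.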
Solution