java - Stanford CoreNLP identifies 2 separate entities as the same when it shouldn't
Question
Here's the code:
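import java.util.Map.Entry;
import java.util.Properties;

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.coref.data.CorefChain.CorefMention;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class CorefExample {
    public static void main(String[] args) {
        String text = "Loryn lives across the street from me. "
                + "She is 19 years old. "
                + "Sydney goes to my school. "
                + "She graduated last year. ";
        // Configure the pipeline; coref needs the annotators listed before it.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
        props.setProperty("coref.algorithm", "neural");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        CoreDocument document = new CoreDocument(text);
        pipeline.annotate(document);
        // Print each coreference chain with its ID, then its mentions in textual order.
        for (Entry<Integer, CorefChain> e : document.corefChains().entrySet()) {
            System.out.println(e.getValue() + " - " + e.getKey());
            for (CorefMention s : e.getValue().getMentionsInTextualOrder()) {
                System.out.println(" - " + s);
            }
        }
    }
}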
This is what the code outputs:
CHAIN7-["me" in sentence 1, "my" in sentence 3] - 7
- "me" in sentence 1
- "my" in sentence 3
CHAIN8-["Loryn" in sentence 1, "She" in sentence 2, "She" in sentence 4] - 8
- "Loryn" in sentence 1
- "She" in sentence 2
- "She" in sentence 4
Why does the "She" in sentence 4 refer to Loryn? How can I make it refer to Sydney instead?
The desired output should look something like this:
CHAIN7-["me" in sentence 1, "my" in sentence 3] - 7
- "me" in sentence 1
- "my" in sentence 3
CHAIN8-["Loryn" in sentence 1, "She" in sentence 2] - 8
- "Loryn" in sentence 1
- "She" in sentence 2
CHAIN9-["Sydney" in sentence 3, "She" in sentence 4] - 9
- "Sydney" in sentence 3
- "She" in sentence 4
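Answer
"Sydney" is being tagged as CITY, so there is an NER error here.
That said, it still appears to fail even if you just change the name to "Jane" or something similar.
Unfortunately, coreference is far from solved, and even state-of-the-art systems make a lot of mistakes. This is an interesting problem case, and we will try to use it to add more training data for the model!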
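Until the model handles this case, one possible workaround is to post-process the mentions yourself. The following is only a minimal sketch of a naive recency heuristic (attach each pronoun to the closest preceding name); it is not how CoreNLP's coref works, and it ignores gender, number, and syntax. The Mention class and groupByNearestName method are hypothetical names made up for this illustration, and the mention list is typed in by hand rather than read from CoreNLP. First-person mentions such as "me" and "my" are left out on purpose, since this heuristic would attach them to the wrong name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecencyHeuristic {

    /** Hypothetical mention type: surface text, 1-based sentence index, pronoun flag. */
    static final class Mention {
        final String text;
        final int sentence;
        final boolean pronoun;

        Mention(String text, int sentence, boolean pronoun) {
            this.text = text;
            this.sentence = sentence;
            this.pronoun = pronoun;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" in sentence " + sentence;
        }
    }

    /**
     * Groups mentions (given in textual order) so that each pronoun joins the
     * group of the closest preceding non-pronoun mention. Pronouns that appear
     * before any name are skipped.
     */
    static Map<String, List<Mention>> groupByNearestName(List<Mention> mentions) {
        Map<String, List<Mention>> groups = new LinkedHashMap<>();
        String currentName = null;
        for (Mention m : mentions) {
            if (!m.pronoun) {
                currentName = m.text; // a new name becomes the current antecedent
            }
            if (currentName == null) {
                continue; // pronoun with no antecedent seen yet
            }
            groups.computeIfAbsent(currentName, k -> new ArrayList<>()).add(m);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Third-person mentions from the question's text, entered by hand.
        List<Mention> mentions = Arrays.asList(
                new Mention("Loryn", 1, false),
                new Mention("She", 2, true),
                new Mention("Sydney", 3, false),
                new Mention("She", 4, true));
        groupByNearestName(mentions)
                .forEach((name, group) -> System.out.println(name + " -> " + group));
    }
}
```

On the four mentions above this puts the "She" in sentence 4 with "Sydney", but a rule this simple will break on many ordinary texts (for example, when a pronoun refers back past an intervening name), so treat it as a stopgap for narrow inputs rather than a fix.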