nlp - Test whether a token is the head of a conjunction in spaCy
Problem description
Question
Is there a way to detect whether a token is the head of a conjunction in spaCy?
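I would like the following sentence:
"Both Americans and Muslim friends and citizens, tax-paying citizens, and Muslims in nations were just appalled and could not believe what we saw on our TV screens."
...to return the following custom_chunks: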
custom_chunks = [
    Americans,
    Muslim friends,
    citizens,  # `poss` modifier Muslim to be added as a hidden element
    tax-paying citizens,
    Muslims in nations,
    what,
    what,
    we,
    TV screens]
The sentence contains a sub-conjunction inside its main conjunction, which makes the task more complicated:
# main conjunction
"Americans" : Conjuncts(friends, citizens, Muslims, citizens)
"Americans" : Children(Both, and, friends, ,, citizens, ,, and, Muslims)
# sub-conjunction
"Friends" : Conjuncts(Americans, citizens, Muslims, citizens)
"Friends" : Children(Muslim, and, citizens) # `children` attribute correctly identifies the sub-conjunction tokens
Currently, I am using the following code to produce the desired answer:
# Fragment from inside a generator; `conj`, `np_deps` and `cc_label` are defined outside it.
# If the word has conjuncts but no `conj` dependency, it is the head of the main conjunction.
if word.conjuncts and word.dep != conj:
    # prev_end is the current word index
    prev_end = word.i
    yield word.left_edge.i, word.i + 1, cc_label
# If the word has a `conj` dependency and one of its right children also has a `conj`
# dependency, it is the head of a sub-conjunction inside a main conjunction.
elif word.dep == conj and any(t.dep == conj for t in word.rights):
    # prev_end is the current word index
    prev_end = word.i
    yield word.left_edge.i, word.i + 1, cc_label
# For when the word is not the head of a conjunction
elif word.dep in np_deps:  # `conj` added to np_deps for the other tokens of a conjunction
    # prev_end marks the right edge of the token subtree
    prev_end = word.right_edge.i
    yield word.left_edge.i, word.right_edge.i + 1, cc_label
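The head branches and the fallback branch differ only in where the yielded span ends. A minimal sketch of that difference, using a plain list of words instead of a parsed `Doc`; the indices below are assumptions read off the children listing for "Friends" (one left child, two right children):

```python
# Fragment of the sentence around the sub-conjunction head "friends"
words = ["Muslim", "friends", "and", "citizens"]

# Assumed positions for the token "friends" within this fragment:
# its own index, the start of its subtree, and the end of its subtree.
i, left_edge_i, right_edge_i = 1, 0, 3

# Conjunction-head branches: stop at the head itself, so the
# coordinated items on its right are left for later iterations.
head_span = words[left_edge_i:i + 1]

# Fallback branch: cover the whole subtree of the token.
subtree_span = words[left_edge_i:right_edge_i + 1]

print(head_span)     # ['Muslim', 'friends']
print(subtree_span)  # ['Muslim', 'friends', 'and', 'citizens']
```

Ending at `word.i + 1` is what keeps "Muslim friends" and "citizens" as separate chunks rather than one span covering the whole coordination.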
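The if and elif statements used to identify the conjunction head and the sub-conjunction head feel a bit hacky. Is there a more definitive way to tell whether a token is the head of a conjunction, or could such an attribute be requested?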
Solution
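One possible approach, offered as a sketch under assumptions rather than a confirmed answer: treat a token as a conjunction head when at least one of its direct children is attached with the `conj` dependency, and as a sub-conjunction head when it is also a `conj` itself. The helper names `is_conj_head`, `is_sub_conj_head` and `classify`, and the stand-in class `Tok`, are hypothetical. The helpers rely only on the `dep_` and `children` attributes, which spaCy tokens also expose, so the demonstration runs without a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in exposing the two attributes the helpers use."""
    text: str
    dep_: str
    children: List["Tok"] = field(default_factory=list)


def is_conj_head(token):
    """True if at least one direct child is attached with `conj`."""
    return any(child.dep_ == "conj" for child in token.children)


def is_sub_conj_head(token):
    """True if the token is itself a conjunct and heads a nested conjunction."""
    return token.dep_ == "conj" and is_conj_head(token)


def classify(token):
    """Label a token as 'main', 'sub', or None (heads no conjunction)."""
    if is_sub_conj_head(token):
        return "sub"
    if is_conj_head(token):
        return "main"
    return None


# Reduced tree from the example: Americans -> (friends -> citizens), Muslims
citizens = Tok("citizens", "conj")
friends = Tok("friends", "conj", [Tok("and", "cc"), citizens])
muslims = Tok("Muslims", "conj")
americans = Tok("Americans", "nsubj", [friends, muslims])  # `nsubj` is assumed

labels = {t.text: classify(t) for t in (americans, friends, citizens, muslims)}
print(labels)
# {'Americans': 'main', 'friends': 'sub', 'citizens': None, 'Muslims': None}
```

With spaCy installed, the same function could presumably be registered as a custom attribute with `Token.set_extension("is_conj_head", getter=is_conj_head)` and then read as `token._.is_conj_head`. The check assumes the parse attaches coordinated items directly to the first conjunct, as in the listing above. If a parse instead chains conjuncts, each attached to the previous one, intermediate items would also match.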