How to get entities with direction in relation extraction?

Question

I have been working on relation extraction for a week, but what I need is the direction of the relation between two entities. For example, given "Company_x got bought by Company_y", the model should predict Company_y -> bought -> Company_x. Are there any models you think would help with this?

Tags: python, tensorflow, nlp, data-science

Answer


Passive voice is often a good indicator of the direction of a relation.
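To make the idea concrete: once you know whether the verb phrase linking the two entities is passive, directing the relation is just a matter of swapping the pair. The sketch below assumes the two entities and a relation label (here simply the main verb) have already been extracted; `orient_relation` and `Relation` are hypothetical names, not part of NLTK or any other library.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    head: str      # the agent, e.g. the buying company
    relation: str  # the relation label, here just the verb
    tail: str      # the patient, e.g. the company being bought


def orient_relation(left_entity, relation, right_entity, is_passive):
    """Return a directed (head, relation, tail) triple.

    left_entity and right_entity are given in reading order. In active voice
    ("Y bought X") the left entity is the agent; in passive voice
    ("X got bought by Y") the agent is the right entity, so the pair is swapped.
    """
    if is_passive:
        return Relation(right_entity, relation, left_entity)
    return Relation(left_entity, relation, right_entity)


print(orient_relation("Company_x", "bought", "Company_y", is_passive=True))
```

For the question's example this yields `Relation(head='Company_y', relation='bought', tail='Company_x')`, i.e. Company_y -> bought -> Company_x.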

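You can extract a pattern that starts with a verb from the context between the two entities, and then detect whether that pattern is in the passive voice.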
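A minimal sketch of the extraction step, assuming the sentence is already POS-tagged as `(token, tag)` pairs (for example by `nltk.pos_tag`) and that the entity spans come from some earlier NER step. `extract_verb_pattern` and its index parameters are hypothetical, and the tags in the example are written by hand for illustration, so a real tagger's output may differ.

```python
def extract_verb_pattern(tagged, left_end, right_start):
    """Return the (token, tag) pairs between two entities, starting at the first verb.

    tagged      -- the whole sentence as (token, tag) pairs
    left_end    -- index just past the last token of the left entity
    right_start -- index of the first token of the right entity
    Returns an empty list when the context between the entities has no verb.
    """
    context = tagged[left_end:right_start]
    for i, (_, tag) in enumerate(context):
        # Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ)
        if tag.startswith("VB"):
            return context[i:]
    return []


# "Company_x was bought by Company_y ." with hand-written tags
tagged = [("Company_x", "NNP"), ("was", "VBD"), ("bought", "VBN"),
          ("by", "IN"), ("Company_y", "NNP"), (".", ".")]
print(extract_verb_pattern(tagged, 1, 4))
```

This prints `[('was', 'VBD'), ('bought', 'VBN'), ('by', 'IN')]`, which is the kind of pattern the passive-voice check works on.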

Some simple proof-of-concept code (using NLTK's RegexpParser might actually be simpler):

from nltk import pos_tag
from nltk import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer

# Needs NLTK data for the tokenizer, the POS tagger and WordNet;
# download them once with nltk.download().

lmtzr = WordNetLemmatizer()
aux_verbs = ['be']  # auxiliaries (as lemmas) that can introduce a passive

def detect_passive_voice(pattern):
    """Return True if a POS-tagged pattern looks like a passive construction.

    pattern is a list of (token, tag) pairs, e.g. the output of pos_tag().
    """
    passive_voice = False

    if len(pattern) >= 3:
        if pattern[0][1].startswith('V'):
            verb = lmtzr.lemmatize(pattern[0][0], 'v')
            if verb in aux_verbs:
                # form of "be" + past participle / past tense ... + "by"
                if pattern[1][1] in ('VBN', 'VBD') and pattern[-1][0] == 'by':
                    passive_voice = True

            # past verb + by
            elif pattern[-2][1] in ('VBN', 'VBD') and pattern[-1][0] == 'by':
                passive_voice = True

        # past verb + by
        elif pattern[-2][1] in ('VBN', 'VBD') and pattern[-1][0] == 'by':
            passive_voice = True

    # past verb + by
    elif len(pattern) >= 2:
        if pattern[-2][1] in ('VBN', 'VBD') and pattern[-1][0] == 'by':
            passive_voice = True

    return passive_voice

Running a few examples:

In [4]: tokens = word_tokenize("was bought by")
   ...: tags = pos_tag(tokens)
   ...: detect_passive_voice(tags)
Out[4]: True

In [5]: tokens = word_tokenize("mailed the letter")
   ...: tags = pos_tag(tokens)
   ...: detect_passive_voice(tags)
Out[5]: False

In [7]: tokens = word_tokenize("was mailed by")
   ...: tags = pos_tag(tokens)
   ...: detect_passive_voice(tags)
Out[7]: True

You can add more auxiliary verbs, and also allow adverbs or adjectives to appear in between.
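One way to follow that suggestion, as a sketch rather than a tested recipe: drop adverbs (`RB*`) and adjectives (`JJ*`) before checking, so "was quickly bought by" is treated like "was bought by", and accept forms of "get" as well as "be" (the question's own example is "got bought by"). To keep the sketch runnable without NLTK data, it uses a hard-coded set of inflected auxiliary forms instead of the WordNet lemmatizer; that set and the name `detect_passive_voice_ext` are assumptions for illustration.

```python
# Hypothetical stand-in for the lemmatizer: inflected forms of "be" and "get".
AUX_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being",
             "get", "gets", "got", "gotten", "getting"}
PAST_TAGS = ("VBN", "VBD")
MODIFIER_PREFIXES = ("RB", "JJ")  # adverbs and adjectives


def detect_passive_voice_ext(pattern, aux_forms=AUX_FORMS):
    """Passive check on (token, tag) pairs that tolerates adverbs and adjectives."""
    if len(pattern) < 2 or pattern[-1][0].lower() != "by":
        return False
    # Drop modifiers so "was quickly bought by" looks like "was bought by".
    core = [(w.lower(), t) for w, t in pattern[:-1]
            if not t.startswith(MODIFIER_PREFIXES)]
    if not core:
        return False
    # Case 1: an auxiliary directly followed by a past form, somewhere before "by".
    for (w1, _), (_, t2) in zip(core, core[1:]):
        if w1 in aux_forms and t2 in PAST_TAGS:
            return True
    # Case 2: a past verb immediately before "by".
    return core[-1][1] in PAST_TAGS


print(detect_passive_voice_ext(
    [("was", "VBD"), ("quickly", "RB"), ("bought", "VBN"), ("by", "IN")]))
```

With these hand-written tags the call prints `True`, whereas the original `detect_passive_voice` returns False for the same input because it expects the participle right after the auxiliary. Filtering out `JJ*` has a cost: if a tagger labels a participle as an adjective, it is dropped too.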

