How to use the nltk.Text findall() API to determine success and failure in client logic

Problem description

I can use the findall() API without any problem. Here is a simple case:

import nltk

raw = "Management Discussion and Analysis"
raw = raw.lower()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)
text.findall(r"<.*> <.*> <.*> <analysis>")

Output

management discussion and analysis


Now suppose I change the raw variable so that findall() finds nothing:

import nltk

raw = "Management Discussion and Analysisss"
raw = raw.lower()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)
text.findall(r"<.*> <.*> <.*> <analysis>")

Output (nothing is printed)


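So the question is: how can the caller distinguish success from failure?

I also inspected and debugged the library code. The implementation only prints the matches and returns nothing. I find that a bit odd, and I don't know why the API doesn't return anything.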

hits = self._token_searcher.findall(regexp)
hits = [" ".join(h) for h in hits]
print(tokenwrap(hits, "; "))

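Please advise.

Tags: python, python-3.x, nltk

Solution


After looking into this further, I came up with logic that meets my requirement. I am posting it as an answer so that others can refer to it in the future.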

import os
import sys

def nltk_text_findall_object(nltkText, regexp):
    # Redirect stdout to a temporary file so that the text printed by
    # nltk's findall() can be captured.
    tempFileName = "tempFile.txt"
    orig_out = sys.stdout
    sys.stdout = open(tempFileName, "w")
    try:
        nltkText.findall(regexp)
    finally:
        # Always restore the original stdout handle, even if findall() raises.
        sys.stdout.close()
        sys.stdout = orig_out

    # Read the captured output back from the file.
    with open(tempFileName, "r") as file:
        raw = file.read()
    # We are done with the file, so delete it.
    os.remove(tempFileName)

    # In its current implementation, findall() prints the matches
    # separated by "; ", so split on ";" and drop empty entries.
    outList = [item.strip() for item in raw.split(";")]
    return [item for item in outList if item]
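
The same capture idea can probably be done without a temporary file. The following is a minimal sketch, not a definitive implementation, that uses the standard library's contextlib.redirect_stdout with an in-memory buffer. The helper name captured_findall is hypothetical, and it assumes the object passed in has a findall() method that prints its hits separated by ";", as the library snippet in the question does.

```python
import contextlib
import io


def captured_findall(text_obj, regexp):
    """Call text_obj.findall(regexp) and return the printed hits as a list.

    Hypothetical helper: assumes findall() prints hits separated by ";".
    """
    buffer = io.StringIO()
    # Everything printed inside this block goes to the buffer instead of
    # the console; stdout is restored automatically on exit.
    with contextlib.redirect_stdout(buffer):
        text_obj.findall(regexp)

    # The printed output may be wrapped across several lines, so collapse
    # all whitespace (including newlines) before splitting on ";".
    printed = " ".join(buffer.getvalue().split())
    return [hit.strip() for hit in printed.split(";") if hit.strip()]
```

With the objects from the question this would be called as captured_findall(text, regex), and an empty list means nothing matched. Like the file-based version, it would split incorrectly if a matched token is itself a ";".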

Client logic using the method above

raw = "Management Discussion and Analysis"
raw = raw.lower()
tokens = nltk.word_tokenize(raw)
# Build the Text object that is passed to the helper.
text = nltk.Text(tokens)
regex = r"<.*> <.*> <.*> <analysis>"
outList = nltk_text_findall_object(text, regex)
if len(outList) == 0:
    print("Not found")
else:
    print("Found")

If there is a better way to implement the use case from my question, please let me know.
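
One alternative that avoids capturing printed output altogether: the library snippet quoted in the question shows Text.findall() delegating to a token searcher whose own findall() returns the hits. A sketch along those lines, assuming nltk.text.TokenSearcher can be constructed directly from a token list (the helper name find_hits is hypothetical):

```python
from nltk.text import TokenSearcher


def find_hits(tokens, regexp):
    """Return the matches as a list of token lists (empty if nothing matched)."""
    # TokenSearcher.findall() returns the hits instead of printing them.
    return TokenSearcher(tokens).findall(regexp)


tokens = "management discussion and analysis".split()
hits = find_hits(tokens, r"<.*> <.*> <.*> <analysis>")
if hits:
    print("Found:", [" ".join(hit) for hit in hits])
else:
    print("Not found")
```

Here str.split() stands in for nltk.word_tokenize() only to keep the sketch free of tokenizer data downloads. The caller can then test the returned list directly instead of parsing text.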

