I'm new to NLTK and need my collocation output to match the expected output format

Problem description

import os
import zipfile

import nltk


def performBigramsAndCollocations(textcontent, word):
    from nltk.corpus import stopwords

    # Tokenize on word characters, drop empty matches, and lowercase.
    tokenizedword = nltk.regexp_tokenize(textcontent, pattern=r'\w*', gaps=False)
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']
    # Build bigrams and keep only pairs where neither word is a stop word.
    tokenizedwordsbigrams = nltk.bigrams(tokenizedwords)
    stop_words = stopwords.words('english')
    tokenizednonstopwordsbigrams = [(w1, w2) for w1, w2 in tokenizedwordsbigrams
                                    if w1 not in stop_words and w2 not in stop_words]
    # Count, for each first word, which words follow it.
    cfd_bigrams = nltk.ConditionalFreqDist(tokenizednonstopwordsbigrams)
    mostfrequentwordafter = cfd_bigrams[word].most_common(3)
    # In this NLTK version, collocation_list() returns (w1, w2) tuples.
    tokenizedwords = nltk.Text(tokenizedwords)
    collocationwords = tokenizedwords.collocation_list()
    return mostfrequentwordafter, collocationwords


if __name__ == '__main__':
    textcontent = input()
    word = input()
    # Unpack the bundled NLTK data on first run.
    if not os.path.exists(os.getcwd() + "/nltk_data"):
        with zipfile.ZipFile("nltk_data.zip", 'r') as zip_ref:
            zip_ref.extractall(os.getcwd())
    mostfrequentwordafter, collocationwords = performBigramsAndCollocations(textcontent, word)
    print(sorted(mostfrequentwordafter, key=lambda element: (element[1], element[0]), reverse=True))
    print(sorted(collocationwords))

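Input:

35 sports disciplines and 4 cultural activities will be offered during the 7 days of competition. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympics could be in for the high jump if they do not pay their TV licence fee. Such invitationals will arouse the interest of sports fans and thereby draw more viewership from sports fans. She barely noticed a flashy sports car almost mow them down, until Eddie lunged forward and grabbed her body. He flattered his mother, she was a little angry, and he persuaded her to go for a ride in a sports car.
sports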

Your output (stdout)

[('fans', 3), ('car', 3), ('disciplines', 1)]
[('sports', 'car'), ('sports', 'fans')]

Expected output

[('fans', 3), ('car', 3), ('disciplines', 1)]
['sports car', 'sports fans']

Tags: python, nlp

Solution
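
The two outputs differ only in the second line: each collocation comes back as a `(w1, w2)` tuple, while the expected output wants a single space-separated string. Joining each pair with a space should be enough. The following is a minimal sketch, assuming `collocation_list()` returns tuples as in the output shown in the question (some older NLTK releases appear to have returned plain strings instead, which the helper also accepts). `format_collocations` is a hypothetical helper name, not part of NLTK.

```python
def format_collocations(collocations):
    """Return each collocation as one space-joined string.

    Accepts (w1, w2) tuples as well as strings that are already joined,
    so it should work whichever form collocation_list() produces.
    """
    formatted = []
    for item in collocations:
        if isinstance(item, str):
            # Already in the "w1 w2" form: keep it unchanged.
            formatted.append(item)
        else:
            # A tuple of words: join them with a single space.
            formatted.append(" ".join(item))
    return formatted


# Example using the tuples from the question's output.
collocationwords = [("sports", "car"), ("sports", "fans")]
print(sorted(format_collocations(collocationwords)))
# prints ['sports car', 'sports fans']
```

In the question's function, this would mean returning `format_collocations(collocationwords)` instead of `collocationwords`, or equivalently building the list with `[" ".join(pair) for pair in tokenizedwords.collocation_list()]` if the result is known to be tuples.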

