python - New to NLTK: how do I get the collocation output in the expected format?
Problem description
import os
import zipfile

import nltk


def performBigramsAndCollocations(textcontent, word):
    from nltk.corpus import stopwords

    # Tokenize on word characters, drop empty matches, and lowercase.
    tokenizedword = nltk.regexp_tokenize(textcontent, pattern=r'\w*', gaps=False)
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']

    # Build bigrams and keep only pairs in which neither word is a stop word.
    tokenizedwordsbigrams = nltk.bigrams(tokenizedwords)
    stop_words = stopwords.words('english')
    tokenizednonstopwordsbigrams = [(w1, w2) for w1, w2 in tokenizedwordsbigrams
                                    if w1 not in stop_words and w2 not in stop_words]

    # The three words that most often follow `word`.
    cfd_bigrams = nltk.ConditionalFreqDist(tokenizednonstopwordsbigrams)
    mostfrequentwordafter = cfd_bigrams[word].most_common(3)

    # Collocations of the whole text.
    tokenizedwords = nltk.Text(tokenizedwords)
    collocationwords = tokenizedwords.collocation_list()
    return mostfrequentwordafter, collocationwords


if __name__ == '__main__':
    textcontent = input()
    word = input()
    # Unpack the bundled NLTK data on first run.
    if not os.path.exists(os.getcwd() + "/nltk_data"):
        with zipfile.ZipFile("nltk_data.zip", 'r') as zip_ref:
            zip_ref.extractall(os.getcwd())
    mostfrequentwordafter, collocationwords = performBigramsAndCollocations(textcontent, word)
    print(sorted(mostfrequentwordafter, key=lambda element: (element[1], element[0]), reverse=True))
    print(sorted(collocationwords))
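Input: In a seven-day competition, 35 sports disciplines and 4 cultural activities will be offered. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympics could be in for a high jump if they do not pay the TV licence fee. Such invitationals will evoke the interest of sports fans and thereby attract more viewership from sports fans. She barely noticed a flashy sports car almost knocking them down, until Eddie lunged forward and grabbed her body. He flattered the mother, she got a little angry, and he persuaded her to go for a ride in the sports car.
sports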
Your output (stdout)
[('fans', 3), ('car', 3), ('disciplines', 1)]
[('sports', 'car'), ('sports', 'fans')]
Expected output
[('fans', 3), ('car', 3), ('disciplines', 1)]
['sports car', 'sports fans']
Solution
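The two outputs differ only in the second line: the code prints each collocation as a tuple such as ('sports', 'car'), while the expected output wants a single string such as 'sports car'. This suggests that collocation_list() returns (w1, w2) tuples in the NLTK version being used. A minimal sketch of one possible fix is to join each tuple with a space before sorting and printing. The helper name format_collocations is hypothetical and not part of NLTK, and the sample list is copied from the output shown in the question so that the snippet runs without NLTK installed.

```python
def format_collocations(collocations):
    """Return each collocation as a single space-separated string.

    Hypothetical helper (not an NLTK function). It accepts either
    (w1, w2) tuples or already-joined strings.
    """
    formatted = []
    for item in collocations:
        if isinstance(item, str):
            # Assumption: some NLTK releases may already return plain strings.
            formatted.append(item)
        else:
            formatted.append(" ".join(item))
    return formatted


# Sample data copied from the "your output" listing in the question.
collocationwords = [('sports', 'car'), ('sports', 'fans')]
print(sorted(format_collocations(collocationwords)))
# prints ['sports car', 'sports fans']
```

In the question's function, the same idea would amount to wrapping the call, for example collocationwords = format_collocations(tokenizedwords.collocation_list()), and leaving the rest of the code unchanged.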