python - Exclude phrases from a text
Problem description
Suppose I have a sentence like this:
text = 'Romeo and Juliet is a tragedy written by William Shakespeare early in his career about two young star-crossed lovers whose deaths ultimately reconcile their feuding families'
and a list of phrases:
phrases = ['Romeo and Juliet', 'William Shakespeare', 'career', 'lovers', 'deaths', 'feuding families']
Is it possible to exclude these phrases from the text to get:
result = ['is', 'a', 'tragedy', 'written', 'by', 'early', 'in', 'his', 'about', 'two', 'young', 'star-crossed', 'whose', 'ultimately', 'reconcile', 'their']
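I have used filter before, but only with single words, not with phrases.
Solution
You can use str.replace to replace every phrase with an empty string, and then use str.split to split the resulting string on whitespace.
For example: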
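for phrase in phrases:
    # Remove each phrase; the spaces around it stay behind
    text = text.replace(phrase, '')
# split() with no argument splits on runs of whitespace and drops empty strings
result = text.split()
print(result)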
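One caveat: str.replace matches substrings, so a phrase such as 'career' would also be cut out of a word like 'careers', leaving a stray 's' behind. A minimal sketch of a stricter variant uses re.sub with \b word boundaries so that only whole-word matches are removed. It assumes every phrase starts and ends with a letter or digit (true for the list above); the helper name exclude_phrases is hypothetical.

```python
import re


def exclude_phrases(text, phrases):
    """Hypothetical helper: remove whole-word phrase matches, then split into words."""
    # Longest first, so 'Romeo and Juliet' wins over a shorter overlapping phrase like 'Romeo'.
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape makes each phrase match literally; \b stops 'career' matching inside 'careers'.
    pattern = r'\b(?:' + '|'.join(re.escape(p) for p in ordered) + r')\b'
    # Replace with a space so the words on either side can never be glued together.
    return re.sub(pattern, ' ', text).split()


text = ('Romeo and Juliet is a tragedy written by William Shakespeare early in his career '
        'about two young star-crossed lovers whose deaths ultimately reconcile their feuding families')
phrases = ['Romeo and Juliet', 'William Shakespeare', 'career', 'lovers', 'deaths', 'feuding families']

print(exclude_phrases(text, phrases))
print(exclude_phrases('careers in career', ['career']))  # prints ['careers', 'in']
```

For the sentence in the question both approaches return the same list; the difference only shows up when a phrase also occurs inside a longer word.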