Merge 3 or more repeated words into a single word in Python

Problem description

I have a DataFrame:

0         i only need uxy to hit 20 eod to make up for a...
1                                        oh this isn’t good
2         account account account has a lot of issues...
3         i'm tempted to drop my last 800 into some stup...
4         the sell offs will will will will will continue until moral improves.

I have a list of words:

names = ['is','account','will']

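If a row contains a word from the list three or more times in a row, I want to merge the repeats into a single occurrence. For example, the row "account account account has a lot of issues..." should become "account has a lot of issues...". I want my rows to look like this: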

0         i only need uxy to hit 20 eod to make up for a...
1                                        oh this isn’t good
2         account has a lot of issues...
3         i'm tempted to drop my last 800 into some stup...
4         the sell offs will continue until moral improves.

Tags: python, dataframe

Solution


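Not sure if this is what you meant, but the following code replaces every run of two or more consecutive occurrences of a listed word with a single occurrence. Note that each repetition must be followed by whitespace, so a run at the very end of a string is not collapsed.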

import re
for name in ['is', 'account', 'will']:
    # \b stops "is" from matching inside "this"; {2,} collapses runs of two or more
    yourList = [re.sub(r"\b(" + name + r"\s){2,}", name + " ", txt) for txt in yourList]

Input used:

yourList = ["the sell offs will will will will will continue until moral improves.",
            "the sell offs will will will will will is is continue until moral improves.",
            "will will is is is is account account"]

Output received:

['the sell offs will continue until moral improves.',
 'the sell offs will is continue until moral improves.', 
 'will is account account']
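
The loop above collapses runs of two or more, and it leaves a run at the end of a string untouched (see the third output). If the goal really is "three or more", one possible variant is sketched below. The helper name `collapse_repeats` and its `min_run` parameter are made up for this illustration and are not part of the original answer. It uses a backreference so that each repeat must be the same whole word as the first one.

```python
import re

def collapse_repeats(text, names, min_run=3):
    """Collapse runs of `min_run` or more consecutive copies of a listed word into one."""
    # Alternation of the escaped words, e.g. is|account|will
    words = "|".join(re.escape(name) for name in names)
    # \b(...) matches one whole listed word; (?:\s+\1\b){n,} then requires at least
    # min_run - 1 further copies of that same word, separated by whitespace.
    pattern = re.compile(r"\b(" + words + r")(?:\s+\1\b){" + str(min_run - 1) + r",}")
    return pattern.sub(r"\1", text)

names = ['is', 'account', 'will']
rows = ["account account account has a lot of issues...",
        "the sell offs will will will will will continue until moral improves.",
        "will will is is is is account account"]
cleaned = [collapse_repeats(row, names) for row in rows]
# Runs shorter than three ("will will", "account account") are left as they are.
```

On a DataFrame this could be applied per row with something like df['text'].apply(lambda s: collapse_repeats(s, names)), assuming the strings live in a column called 'text' (the question does not show the column name).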
