python - Merge 3 or more repeated words into a single word
Problem description
I have a DataFrame:
0 i only need uxy to hit 20 eod to make up for a...
1 oh this isn’t good
2 account account account has a lot of issues...
3 i'm tempted to drop my last 800 into some stup...
4 the sell offs will will will will will continue until moral improves.
I have a list of words:
names = ['is','account','will']
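If a row contains a word from the list repeated 3 or more times in a row, I want to merge the repeats into a single word. For example, row 2 has "account account account has a lot of issues...", and I want it to become "account has a lot of issues...". The expected result is: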
0 i only need uxy to hit 20 eod to make up for a...
1 oh this isn’t good
2 account has a lot of issues...
3 i'm tempted to drop my last 800 into some stup...
4 the sell offs will continue until moral improves.
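Solution
Not sure if this is what you mean, but the following code replaces consecutive repeats of a word with a single occurrence. Note that it collapses runs of two or more, and that it only counts a word when it is followed by whitespace, so a run at the very end of a string is left unchanged.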
import re
for name in ['is', 'account', 'will']:
    # collapse two or more consecutive "name<whitespace>" repeats into "name "
    yourList = [re.sub(r"(" + re.escape(name) + r"\s){2,}", name + " ", txt) for txt in yourList]
Input used:
yourList = ["the sell offs will will will will will continue until moral improves.",
"the sell offs will will will will will is is continue until moral improves.",
"will will is is is is account account"]
Output received:
['the sell offs will continue until moral improves.',
'the sell offs will is continue until moral improves.',
'will is account account']
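The question asks for runs of 3 or more, while the pattern above collapses runs of 2 or more and can also match inside a longer word (for example the "is" at the end of "this"). A minimal sketch of a stricter variant follows, assuming whole-word matching is wanted. The helper name `collapse_repeats`, its `min_run` parameter, and the `rows` sample are hypothetical and not part of the original answer.

```python
import re

names = ['is', 'account', 'will']

# Hypothetical helper (not from the original answer): collapse a run of
# `min_run` or more consecutive copies of a listed word into one copy.
def collapse_repeats(text, words=names, min_run=3):
    alternation = "|".join(re.escape(w) for w in words)
    # \b(word) followed by at least (min_run - 1) whitespace-separated
    # repeats of that same word, each ending on a word boundary.
    pattern = re.compile(r"\b(%s)(?:\s+\1\b){%d,}" % (alternation, min_run - 1))
    return pattern.sub(r"\1", text)

# Sample rows, partly taken from the question and the answer's input.
rows = [
    "oh this isn’t good",
    "account account account has a lot of issues...",
    "the sell offs will will will will will continue until moral improves.",
    "will will is is is is account account account",
]
cleaned = [collapse_repeats(r) for r in rows]
for line in cleaned:
    print(line)
```

Because the last repeat only needs to end on a word boundary, a run at the end of a string is collapsed too, and "will will" in the last row is kept since it is a run of only two. If the text lives in a pandas column, something like `df['text'].map(collapse_repeats)` should apply it row by row, where `'text'` is an assumed column name.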