database - How to remove words that occur only once
Problem description
This is my dataset:
Id Text
1. Dear Mr. John, your bag order is delivered
2. Dear Mr. Brick, your ball order is delivered
3. Dear Mrs. Blue, your ball purchase is delivered
What I need is:
Id Text
1. Dear Mr. your order is delivered
2. Dear Mr. your ball order is delivered
3. Dear your ball is delivered
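So every word that occurs only once in the whole Text column is removed (for example "bag" and "John,"), while words such as "Mr." and "ball", which occur in more than one row, are kept.
Solution
Use: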
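To make the task concrete, here is a small sketch that rebuilds the sample data as a DataFrame and counts every whitespace-separated token across all rows. It assumes Id is stored as a float, which matches the 1.0, 2.0, 3.0 in the printed output of the solutions. Splitting on whitespace keeps punctuation attached, so the token that gets removed is "John," rather than "John".

```python
from collections import Counter

import pandas as pd

# Sample data from the question (Id assumed to be float, as in the printed output)
df = pd.DataFrame({
    'Id': [1.0, 2.0, 3.0],
    'Text': ['Dear Mr. John, your bag order is delivered',
             'Dear Mr. Brick, your ball order is delivered',
             'Dear Mrs. Blue, your ball purchase is delivered'],
})

# Count every whitespace-separated token across all rows of the column
counts = Counter(word for text in df['Text'] for word in text.split())

# Tokens that occur exactly once in the whole column are the ones to drop
once = sorted(word for word, n in counts.items() if n == 1)
print(once)
```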
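import pandas as pd

#sample data from the question
df = pd.DataFrame({'Id': [1.0, 2.0, 3.0],
                   'Text': ['Dear Mr. John, your bag order is delivered',
                            'Dear Mr. Brick, your ball order is delivered',
                            'Dear Mrs. Blue, your ball purchase is delivered']})

#split to words and create Series (MultiIndex: original row, word position)
all_val = df['Text'].str.split(expand=True).stack()
#keep only words occurring more than once and join them back per first level of MultiIndex
df['Text'] = all_val[all_val.duplicated(keep=False)].groupby(level=0).apply(' '.join)
print(df)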
Id Text
0 1.0 Dear Mr. your order is delivered
1 2.0 Dear Mr. your ball order is delivered
2 3.0 Dear your ball is delivered
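The same idea can also be sketched with Series.explode (available since pandas 0.25) instead of expand=True plus stack. This variant is not part of the original answer: it counts each word with value_counts, which for this purpose is equivalent to duplicated(keep=False), and it reindexes the result so that a row whose words are all unique gets an empty string instead of NaN (with the groupby approach above, such a row would be missing from the grouped result and become NaN on assignment).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': [1.0, 2.0, 3.0],
    'Text': ['Dear Mr. John, your bag order is delivered',
             'Dear Mr. Brick, your ball order is delivered',
             'Dear Mrs. Blue, your ball purchase is delivered'],
})

# One entry per word; the index still points back to the original row
words = df['Text'].str.split().explode()

# Total number of occurrences of each word across the whole column
word_freq = words.map(words.value_counts())

# Keep words seen more than once, rejoin them per original row, and
# give rows that lost every word an empty string instead of NaN
df['Text'] = (words[word_freq > 1]
              .groupby(level=0)
              .agg(' '.join)
              .reindex(df.index, fill_value=''))
print(df)
```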
Or:
#join all text together and split by whitespace
all_val = ' '.join(df['Text']).split()
#get words that occur exactly once
once = [x for x in all_val if all_val.count(x) == 1]
#remove them from each text by nested list comprehension
df['Text'] = [' '.join([y for y in x.split() if y not in once]) for x in df['Text']]
#apply alternative
#df['Text'] = df['Text'].apply(lambda x: ' '.join([y for y in x.split() if y not in once]))
print(df)
Id Text
0 1.0 Dear Mr. your order is delivered
1 2.0 Dear Mr. your ball order is delivered
2 3.0 Dear your ball is delivered
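In the list version, all_val.count(x) rescans the whole word list for every word and "y not in once" scans a list, so the work grows roughly quadratically with the number of words. A sketch that keeps the same logic but counts all words in one pass with collections.Counter and stores the rare words in a set; texts is a hypothetical plain list standing in for df['Text']:

```python
from collections import Counter

# Hypothetical plain list standing in for df['Text']
texts = ['Dear Mr. John, your bag order is delivered',
         'Dear Mr. Brick, your ball order is delivered',
         'Dear Mrs. Blue, your ball purchase is delivered']

# Count all words in a single pass instead of calling list.count() per word
counts = Counter(word for text in texts for word in text.split())

# A set gives constant-time average membership checks in the filter below
once = {word for word, n in counts.items() if n == 1}

# Rebuild each text without the words that occur only once
cleaned = [' '.join(w for w in text.split() if w not in once) for text in texts]
print(cleaned)
```

With a DataFrame, the last line becomes the same list comprehension over df['Text'], assigned back to df['Text'] as in the answer above.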