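python - Remove all substrings that are not in a list
Problem description
I want to remove every substring in a df column that does not appear in a defined list. For example: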
mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}
Current:                                Desired:
index  content                          index  content
0      a very good idea, I like it      0      good like
1      was the bad thing to do          1      bad
2      I hated it, it was terrible      2      hated terrible
...                                     ...
k      Why do you think she liked it    k      liked
I have managed to write something that keeps all the words that are not in the list, but I don't know how to invert it to get what I want:
pat = r'\b(?:{})\b'.format('|'.join(mylist))
# str.replace (not str.contains) strips the matched words; pandas 2.0+ needs regex=True
df['column1'] = df['column1'].str.replace(pat, '', regex=True)
Any help would be appreciated.
Solution
Use str.findall together with str.join:
df['column1'] = df['content'].str.findall('(' + pat + ')').str.join(' ')
print(df)
   content                        column1
0  a very good idea, I like it    good like
1  was the bad thing to do        bad
2  I hated it, it was terrible    hated terrible
3  Why do you think she liked it  liked
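For reference, here is a self-contained sketch of the str.findall approach that can be run end to end. The sample DataFrame is rebuilt from the rows shown in the question. Using re.escape and a list instead of a set are assumptions added for safety, not part of the original answer.

```python
import re
import pandas as pd

# Sample data copied from the rows shown in the question.
df = pd.DataFrame({
    'content': [
        'a very good idea, I like it',
        'was the bad thing to do',
        'I hated it, it was terrible',
        'Why do you think she liked it',
    ]
})

# A list keeps the alternation order deterministic (assumption; a set also works).
mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

# re.escape is an added precaution in case a word contains regex metacharacters.
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in mylist))

# findall returns the list of matches for each row; join glues them with spaces.
df['column1'] = df['content'].str.findall(pat).str.join(' ')
result = df['column1'].tolist()
print(df)
```

Because the pattern is wrapped in \b word boundaries, "like" does not match inside "liked", so the order of the words in the alternation should not matter here.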
Or use a list comprehension that splits, filters, and joins:
df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print(df)
   content                        column1
0  a very good idea, I like it    good like
1  was the bad thing to do        bad
2  I hated it, it was terrible    hated terrible
3  Why do you think she liked it  liked
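One caveat with the split-based version: x.split() only splits on whitespace, so a token such as "good," keeps its comma and would not be found in mylist. A minimal sketch of one way around that, assuming word tokens are extracted with re.findall instead of split; the helper name keep_listed_words is hypothetical:

```python
import re

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}

def keep_listed_words(text, allowed=mylist):
    """Hypothetical helper: keep only the tokens of `text` found in `allowed`."""
    # \w+ extracts word tokens, so trailing punctuation ("good,") is dropped.
    tokens = re.findall(r'\w+', text)
    return ' '.join(t for t in tokens if t in allowed)

# With a DataFrame this could be applied as:
#   df['column1'] = df['content'].apply(keep_listed_words)
sample = keep_listed_words('it was good, but I hated the ending')
print(sample)
```

Membership tests against a set are constant time on average, which may matter when the word list is long.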