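python - Extract all matching keywords from a list of words and create a new column in a pandas DataFrame
Question
I want to extract every word in the opinions column that matches a word in the keywords list, and put all of the matched words (including repeated ones) in a new column. My current code extracts only the first matching word, so any further matches and repeats are lost.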
import pandas as pd
df = pd.DataFrame({
'opinions':[
"I think the movie is fantastic. Shame it's so short!",
"How did they make it?",
"I had a fantastic time at the cinema last night!",
"I really disliked the cast",
"the film was sad and boring",
"Absolutely loved the movie! Can't wait to see part 2",
]
})
keywords = ['movie', 'great', 'fantastic', 'loved']
query = '|'.join(keywords)
df['word'] = df['opinions'].str.extract( '({})'.format(query) )
print(df)
Current output
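Answer
You should replace extract with findall:
Find all occurrences of pattern or regular expression in the Series/Index.
Equivalent to applying re.findall() to all the elements in the Series/Index.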
df['word'] = df['opinions'].str.findall('({})'.format(query))
print(df)
opinions word
0 I think the movie is fantastic. Shame it's so ... [movie, fantastic]
1 How did they make it? []
2 I had a fantastic time at the cinema last night! [fantastic]
3 I really disliked the cast []
4 the film was sad and boring []
5 Absolutely loved the movie! Can't wait to see ... [loved, movie]
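To make the difference between the two methods concrete, here is a minimal self-contained sketch that runs both on a shortened version of the data above. Two things in it are my own assumptions and not part of the original answer: the keywords are passed through re.escape so that any regex metacharacters would be matched literally, and the word_str column is a hypothetical extra that joins each list into a plain string.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'opinions': [
        "I think the movie is fantastic. Shame it's so short!",
        "How did they make it?",
        "Absolutely loved the movie! Can't wait to see part 2",
    ]
})
keywords = ['movie', 'great', 'fantastic', 'loved']

# Assumption: escape each keyword so regex metacharacters match literally.
# The original code joins the raw keywords, which is fine for plain words.
pattern = '|'.join(re.escape(k) for k in keywords)

# str.extract keeps only the first match per row (NaN when nothing matches).
first_match = df['opinions'].str.extract('({})'.format(pattern))[0]

# str.findall returns a list of every match per row, in order of appearance.
all_matches = df['opinions'].str.findall(pattern)

df['word'] = all_matches
# Hypothetical extra column: the same matches as a comma-separated string.
df['word_str'] = all_matches.str.join(', ')
print(df)
```

Note that the pattern matches substrings, so a keyword such as movie would also match inside movies. If whole-word matching is needed, wrapping the alternation in \b word boundaries is one option.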