Extract all matching keywords from a list of words and create a new pandas DataFrame

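Question

I want to extract every keyword from the opinions column that matches a word in the keywords list, and store all matched words (including repeats) in a new column. My current code extracts only the first matching word in each row, so any further matches are lost.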

import pandas as pd

df = pd.DataFrame({
    'opinions':[
        "I think the movie is fantastic. Shame it's so short!",
        "How did they make it?",
        "I had a fantastic time at the cinema last night!",
        "I really disliked the cast",
        "the film was sad and boring",
        "Absolutely loved the movie! Can't wait to see part 2",
    ]
})

keywords = ['movie', 'great', 'fantastic', 'loved']

query = '|'.join(keywords)
df['word'] = df['opinions'].str.extract( '({})'.format(query) )

print(df)

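Current output

(The original screenshot is not available. As described above, the word column contains only the first matching keyword for each row.)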

Tags: python, regex, pandas, dataframe

Solution


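You should replace extract with findall. The pandas documentation describes Series.str.findall as follows:

Find all occurrences of pattern or regular expression in the Series/Index.

Equivalent to applying re.findall() to all the elements in the Series/Index.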
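Applied to the data from the question, the change might look like the sketch below. It reuses the opinions column and the keywords list from the question; the pattern variable is a name introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'opinions': [
        "I think the movie is fantastic. Shame it's so short!",
        "How did they make it?",
        "I had a fantastic time at the cinema last night!",
        "I really disliked the cast",
        "the film was sad and boring",
        "Absolutely loved the movie! Can't wait to see part 2",
    ]
})

keywords = ['movie', 'great', 'fantastic', 'loved']

# Same alternation pattern as in the question: (movie|great|fantastic|loved)
pattern = '({})'.format('|'.join(keywords))

# findall returns a list of every match per row; rows without a match get []
df['word'] = df['opinions'].str.findall(pattern)
```

Note that this pattern also matches inside longer words (for example, movie within movies). If only whole words should count, wrapping the alternation in \b word boundaries is one way to restrict it.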

print(df)
                                                opinions                word
    0  I think the movie is fantastic. Shame it's so ...  [movie, fantastic]
    1                              How did they make it?                  []
    2   I had a fantastic time at the cinema last night!         [fantastic]
    3                         I really disliked the cast                  []
    4                        the film was sad and boring                  []
    5  Absolutely loved the movie! Can't wait to see ...      [loved, movie]
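
If a separate data frame with one row per matched keyword is wanted, as the title suggests, one possible approach is to explode the lists returned by findall. This is only a sketch and not part of the original answer; the names matches, new_df and opinion_id are hypothetical, and Series.explode requires pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'opinions': [
        "I think the movie is fantastic. Shame it's so short!",
        "How did they make it?",
        "I had a fantastic time at the cinema last night!",
        "I really disliked the cast",
        "the film was sad and boring",
        "Absolutely loved the movie! Can't wait to see part 2",
    ]
})

keywords = ['movie', 'great', 'fantastic', 'loved']

# List of all matched keywords per row (empty list when nothing matches)
matches = df['opinions'].str.findall('|'.join(keywords))

# explode() gives one row per list element; empty lists become NaN and are dropped
new_df = (
    matches.explode()
    .dropna()
    .rename('word')
    .reset_index()
    .rename(columns={'index': 'opinion_id'})  # opinion_id: row label in df
)
```

Here opinion_id refers back to the row in df that each keyword came from, so repeated keywords and multiple matches per opinion are kept as separate rows.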
