How to match words or characters in a text column of a pandas Series?

Problem description

Suppose I have these keywords, and I want to find the sentences that contain all three of them.

keywords_to_track = ["crypto exchange", "loses", "$"] 
# "$" is treated as a literal character here because it can appear attached to a number, as in "30m$"


0       The $600M Crypto Heist (And How It Impacts ...
1             What Is a Decentralized Crypto Exchange?
2    Crypto Breaches And Fraud Increasing 41% Every...
3            Crypto Exchange Binance Loses $21M in Hack
4    Cryptocurrency hacks and fraud are on track fo...
Name: title, dtype: object

As you can see, the row at index 3 contains all of the keywords I need to track. My desired output is:

0       False
1       False
2       False
3       True
4       False
Name: title, dtype: bool

I tried the following, but I don't want an OR operation, and I don't think my attempt is correct:

dataframe.title.str.lower().str.match("crypto exchange|loses|$")

Tags: python, pandas

Solution


Change your keywords_to_track like this:

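# add \ before $ so the regex treats it as a literal character;
# the raw string avoids Python's invalid-escape warning for "\$"
keywords_to_track = ["crypto exchange", "loses", r"\$"]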
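The escape matters because an unescaped `$` in a regex is the end-of-string anchor, so it matches (as an empty string) in every title. If you would rather not escape by hand, here is a minimal sketch using the standard library's `re.escape`; the names `escaped` and `pattern` are only illustrative and not part of the original answer.

```python
import re

# the keywords exactly as written in the question, with a plain "$"
keywords_to_track = ["crypto exchange", "loses", "$"]

# re.escape backslash-escapes regex metacharacters such as "$"
escaped = [re.escape(k) for k in keywords_to_track]

# join the escaped keywords into a single alternation pattern
pattern = "|".join(escaped)

# each keyword is now matched literally
found = re.findall(pattern, "crypto exchange binance loses $21m in hack")
print(found)
```

With this approach the original list stays readable, and any other special characters in future keywords (such as `.` or `+`) are handled as well.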

Now use str.findall:

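# build one alternation pattern: (crypto exchange|loses|\$)
words = fr"({'|'.join(keywords_to_track)})"

# a row matches only if the number of distinct keywords found
# equals the number of keywords being tracked
df['match_all'] = df['title'].str.lower() \
                             .str.findall(words) \
                             .apply(lambda x: len(set(x)) == len(keywords_to_track))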

Output:

>>> df
                                               title  match_all
0     The $600M Crypto Heist (And How It Impacts ...      False
1           What Is a Decentralized Crypto Exchange?      False
2  Crypto Breaches And Fraud Increasing 41% Every...      False
3         Crypto Exchange Binance Loses $21M in Hack       True
4  Cryptocurrency hacks and fraud are on track fo...      False
5                    Crypto exchange Crypto exchange      False
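As an alternative that avoids regex escaping altogether, you could test each keyword as a plain substring and AND the results together. This is only a sketch, not the method from the answer above: the `titles` Series below reuses the three titles that appear untruncated in the output (rows 1, 3 and 5), and the names `lowered` and `match_all` are illustrative.

```python
import pandas as pd

# only the titles shown in full above; the truncated ones are left out
titles = pd.Series([
    "What Is a Decentralized Crypto Exchange?",
    "Crypto Exchange Binance Loses $21M in Hack",
    "Crypto exchange Crypto exchange",
], name="title")

# plain "$" is fine here because regex matching is switched off below
keywords_to_track = ["crypto exchange", "loses", "$"]

lowered = titles.str.lower()

# start with all True, then require every keyword in turn
match_all = pd.Series(True, index=titles.index)
for kw in keywords_to_track:
    match_all &= lowered.str.contains(kw, regex=False)

print(match_all)
```

Only the Binance title ends up True. The repeated "Crypto exchange Crypto exchange" row stays False here as well, since it contains neither "loses" nor "$".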
