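python - How to match words or characters in a text column of a pandas Series?
Problem description
Suppose I have these keywords, and I want to find the titles that contain all three of them.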
keywords_to_track = ["crypto exchange", "loses", "$"]
# "$" is treated as a single character here because it can appear inside a token such as "30m$"
0 The $600M Crypto Heist (And How It Impacts ...
1 What Is a Decentralized Crypto Exchange?
2 Crypto Breaches And Fraud Increasing 41% Every...
3 Crypto Exchange Binance Loses $21M in Hack
4 Cryptocurrency hacks and fraud are on track fo...
Name: title, dtype: object
As you can see, the row at index 3 contains all of the keywords I need to track. The output I want is:
0 False
1 False
2 False
3 True
4 False
Name: title, dtype: bool
I tried the following, but I don't want an OR operation, and I don't think my attempt is correct:
dataframe.title.str.lower().str.match("crypto exchange|loses|$")
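For reference, a minimal sketch of how this attempt behaves. `str.match` only tests the start of each string, and `|` accepts any single alternative, so a title passes whenever it merely begins with one keyword. The second title below is a hypothetical example that is not part of the original data.

```python
import pandas as pd

titles = pd.Series([
    "Crypto exchange Crypto exchange",           # only one of the three keywords
    "Hack: Crypto Exchange Binance Loses $21M",  # hypothetical title with all three
])

# str.match anchors at the beginning of the string and "|" means OR,
# so this checks "starts with any one keyword", not "contains all of them".
attempt = titles.str.lower().str.match("crypto exchange|loses|$")
print(attempt.tolist())
```

The first title is accepted although it contains a single keyword, and the second is rejected although it contains all three.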
Solution
Change your keywords_to_track like this:
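# add \ before $ so the regex treats it as a literal character
# (a raw string avoids Python's invalid escape sequence warning)
keywords_to_track = ["crypto exchange", "loses", r"\$"]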
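The backslash matters because an unescaped `$` is a regex anchor for the end of the string, not a dollar sign. The sketch below compares the two patterns on one lowercased title with the standard `re` module. Using `re.escape` to build the pattern is an assumption of this sketch, offered as an alternative to writing the backslash by hand; it is not part of the original answer.

```python
import re

keywords = ["crypto exchange", "loses", "$"]
text = "crypto exchange binance loses $21m in hack"

# Unescaped: "$" matches the empty string at the end of the text,
# so the dollar sign in "$21m" is never found.
unescaped_hits = re.findall("|".join(keywords), text)

# Escaped: re.escape turns "$" into "\$", which matches a literal dollar sign.
escaped_pattern = "(" + "|".join(re.escape(kw) for kw in keywords) + ")"
escaped_hits = re.findall(escaped_pattern, text)

print(unescaped_hits)
print(escaped_hits)
```

With the unescaped pattern, the third hit is an empty string that would appear for every title, so counting distinct hits would wrongly report that `$` was found.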
Now use str.findall:
words = fr"({'|'.join(keywords_to_track)})"
df['match_all'] = df['title'].str.lower() \
.str.findall(words) \
.apply(lambda x: len(set(x)) == len(keywords_to_track))
Output:
>>> df
title match_all
0 The $600M Crypto Heist (And How It Impacts ... False
1 What Is a Decentralized Crypto Exchange? False
2 Crypto Breaches And Fraud Increasing 41% Every... False
3 Crypto Exchange Binance Loses $21M in Hack True
4 Cryptocurrency hacks and fraud are on track fo... False
5 Crypto exchange Crypto exchange False
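Row 5 shows why the answer compares the size of the set of matches rather than the raw count: a repeated keyword is counted only once. As an alternative sketch (not the original answer's method), the same AND logic can be written with one `str.contains` check per keyword; passing `regex=False` makes `$` literal, so no escaping is needed. The titles are copied as printed above, including their truncation marks.

```python
import pandas as pd

keywords_to_track = ["crypto exchange", "loses", "$"]

titles = pd.Series([
    "The $600M Crypto Heist (And How It Impacts ...",
    "What Is a Decentralized Crypto Exchange?",
    "Crypto Breaches And Fraud Increasing 41% Every...",
    "Crypto Exchange Binance Loses $21M in Hack",
    "Cryptocurrency hacks and fraud are on track fo...",
    "Crypto exchange Crypto exchange",
], name="title")

lowered = titles.str.lower()

# Start with all True and AND in one literal substring check per keyword.
match_all = pd.Series(True, index=titles.index)
for kw in keywords_to_track:
    match_all &= lowered.str.contains(kw, regex=False)

print(match_all.tolist())
```

This trades the single regex pass for one pass per keyword, which is usually acceptable for a short keyword list.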