python - Remove rows from a dataframe that don't contain a word from a list
Problem description
I have a dataframe:
29 tech is a fucking bloodbath.
219 only 3 things guaranteed in life ATH taxes a...
255 market is at ath in zombie economy\n\nmarket c...
276 my aapl watch reminding me to breathe while i...
I have a list:
names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']
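My code looks like this:
import pandas as pd

# item (the source DataFrame) and top_tickers are defined elsewhere in my code
for ticker in top_tickers:
    # plain substring matching: 'ath' also matches inside 'bloodbath'
    df_ticker_lower = item[item.text.str.contains(ticker.lower())]
    df_ticker_upper = item[item.text.str.contains(ticker.upper())]
    df_ticker = pd.concat([df_ticker_lower, df_ticker_upper], axis=0)
    df_ticker['dt'] = pd.to_datetime(df_ticker.dt)

def dedup(sentence, to_dedup):
    # keep at most 3 occurrences of each word, removing the extras from the end
    for word in to_dedup:
        while sentence.split().count(word) > 3:
            # drop the last occurrence, then collapse the double space it leaves behind
            sentence = ''.join(sentence.rsplit(word, 1)).replace('  ', ' ')
    return sentence

def foo(row):
    global names
    sentence = row['text']
    return dedup(sentence, names)

df_ticker['text'] = df_ticker.apply(foo, axis=1)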
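dedup() counts whole tokens with split() but removes the last occurrence with rsplit(), which works on raw substrings, so a longer token such as 'GMEX' can lose its 'GME' part. A minimal sketch of a word-boundary variant, assuming the goal is still "keep the first 3 occurrences of each ticker"; dedup_words and _keep_first_n are hypothetical names, not part of the question. Unlike split(), the \b version also counts a ticker followed by punctuation, e.g. 'GME,'.

```python
import re

def _keep_first_n(max_count):
    """Hypothetical helper: build a re.sub callback that keeps the first max_count matches."""
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        # keep the match while under the cap, otherwise replace it with nothing
        return match.group(0) if seen <= max_count else ''

    return repl

def dedup_words(sentence, to_dedup, max_count=3):
    """Hypothetical variant of dedup(): cap each word at max_count whole-word occurrences."""
    for word in to_dedup:
        # \b...\b so 'GME' never matches inside a longer token such as 'GMEX' (case-sensitive, like dedup)
        pattern = re.compile(r"\b{}\b".format(re.escape(word)))
        sentence = pattern.sub(_keep_first_n(max_count), sentence)
    # collapse the runs of spaces left behind by removed words
    return re.sub(r" {2,}", " ", sentence)

print(dedup_words("GME GME GME GME GME to the moon", ["GME"]))
```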
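What I want is to keep only the rows that contain at least one word from the list. The important part is that the list word must stand on its own: if it is part of a longer word, the row should be removed. In this case row 29 has to go, because bloodbath is a word that contains ath. If ath appeared as a separate word I would keep the row, but here I want it removed. Thanks for your help.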
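The difference between a substring test (roughly what str.contains(ticker.lower()) does in the code above) and a whole-word test can be seen on plain strings. A small sketch; contains_substring and contains_word are hypothetical helper names, and the sample texts are shortened from the question's rows.

```python
import re

def contains_substring(text, word):
    """Plain substring test: 'ath' is found inside 'bloodbath'."""
    return word.lower() in text.lower()

def contains_word(text, word):
    """Whole-word, case-insensitive test using regex word boundaries."""
    pattern = r"\b{}\b".format(re.escape(word))
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

for text in ["tech is a bloodbath.", "market is at ath in zombie economy"]:
    print(repr(text), contains_substring(text, "ATH"), contains_word(text, "ATH"))
# 'tech is a bloodbath.' True False
# 'market is at ath in zombie economy' True True
```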
Solution
Use word boundaries \b...\b around each word in the pattern passed to Series.str.contains, so that words like bloodbath are skipped:
pat = '|'.join(r"\b{}\b".format(x) for x in names)
df = df[df['text'].str.contains(pat,case=False,na=True)]
print (df)
text
1 only 3 things guaranteed in life ATH taxes
2 market is at ath in zombie economy
4 my aapl watch reminding me to breathe while i...
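A self-contained version of the filter above, on shortened sample rows whose index mirrors the question's row labels. Two small changes from the answer, both assumptions about what you want: re.escape is added as a precaution in case a ticker ever contains a regex metacharacter, and na=False drops rows with missing text (the answer's na=True keeps them).

```python
import re
import pandas as pd

# Shortened sample rows based on the question (not the full original texts)
df = pd.DataFrame(
    {"text": [
        "tech is a bloodbath.",
        "only 3 things guaranteed in life ATH taxes",
        "market is at ath in zombie economy",
        "my aapl watch reminding me to breathe",
    ]},
    index=[29, 219, 255, 276],
)

names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']

# One \b...\b alternative per ticker, joined with |
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in names)

# case=False makes the regex case-insensitive; na=False treats missing text as "no match"
filtered = df[df['text'].str.contains(pat, case=False, na=False)]
print(filtered)
```

Row 29 is dropped because ath only occurs inside bloodbath.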
If you need to extract the first matching value:
import re
names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']
pat = '|'.join(r"\b{}\b".format(x) for x in names)
df['new'] = df['text'].str.extract(f'({pat})', flags=re.I)
print (df)
text new
0 tech is a fucking bloodbath. NaN
1 only 3 things guaranteed in life ATH taxes ATH
2 market is at ath in zombie economy ath
3 market c. NaN
4 my aapl watch reminding me to breathe while i... aapl
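When a row mentions several tickers, the alternation returns the leftmost match in the text, not the first entry of names. A sketch on hypothetical rows; the .str.upper() step is an addition so that 'ath' and 'ATH' end up as the same value.

```python
import re
import pandas as pd

names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in names)

# Hypothetical rows: GME comes before AMC in names, but AMC comes first in the last text
s = pd.Series(["tech is a bloodbath.", "market is at ath", "amc beats gme"])

# expand=False returns a Series for the single capture group; rows without a match become NaN
first = s.str.extract(f"({pat})", flags=re.I, expand=False).str.upper()
print(first.tolist())
```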
Or get all matching values as a list:
import re
names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']
pat = '|'.join(r"\b{}\b".format(x) for x in names)
df['new'] = df['text'].str.findall(pat, flags=re.I)
print (df)
text new
0 tech is a fucking bloodbath. []
1 only 3 things guaranteed in life ATH taxes [ATH]
2 market is at ath in zombie economy [ath]
3 market c. []
4 my aapl watch reminding me to breathe while i... [aapl]
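If the per-row lists are later turned into mention counts (the question's top_tickers suggests a ranking like this, though how it is built is not shown), one option is explode plus value_counts. A sketch on hypothetical rows, upper-casing matches so different spellings are counted together.

```python
import re
import pandas as pd

names = ['ATH', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE', 'CLNE', 'AAPL', 'WKHS']
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in names)

s = pd.Series([
    "tech is a bloodbath.",
    "only 3 things guaranteed in life ATH taxes",
    "market is at ath in zombie economy",
    "gme and amc, then more gme",  # hypothetical row
])

# findall gives a list per row; explode puts each match on its own row (empty lists become NaN)
mentions = s.str.findall(pat, flags=re.I).explode().dropna().str.upper()
counts = mentions.value_counts()
print(counts)
```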