Pandas: drop rows that do not contain any string from a list

Problem description

I have a CSV file with this data structure:

timestamp   message         name     destinationUserName  sourceUserName
time        login           hello
time        logout          hello
time        successful      hello1
time        hello           no
time        notsuccessful   no

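In my current code I can filter rows based on whether the name column contains hello or hello1, but what I want is to check not only name but also the message column, and return only the messages that contain successful or notsuccessful.

So far I have this code: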

f=pd.read_csv('file.csv')
f = f[f['name'].isin(names_to_keep)]

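This works perfectly and returns every row whose name is in the list I declared in names_to_keep. So I tried to update the code to add the message check using: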

f = f[f['name'].isin(names_to_keep & f[f['message'].isin(message_to_keep)])]

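In this case, using & returns an empty document, because the current file has no message with that string. That is fine, but I want the script to return the names even when there is no matching message.

I hope my example is clear enough; let me know if you need more information.

Expected result: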

timestamp   message         name     destinationUserName  sourceUserName
time        login           hello
time        logout          hello
time        successful      hello1
time        notsuccessful   no

Tags: python-3.x, pandas, dataframe

Solution


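If you want to return the rows where either the name column holds a value from one list or the message column holds a value from another list, you can combine the two isin checks with | (element-wise OR) instead of &: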

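import pandas as pd

df = pd.read_csv('test.csv')

# Values to match exactly in the name and message columns
names_to_keep = ['hello', 'hello1', 'hello2']
messages_to_keep = ['successful', 'notsuccessful']

print(df)

# Keep a row if its name OR its message is in the corresponding list
df = df[df['name'].isin(names_to_keep)
        | df['message'].isin(messages_to_keep)]

print(df)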
Sample Input
  timestamp        message    name destinationuserna
0      time          login   hello             user1
1      time         logout  hello1             user2
2      time     successful  hello2             user3
3      time          hello      no             user3
4      time  notsuccessful   don't            random
Sample Output
0      time          login   hello             user1         8-8103
1      time         logout  hello1             user2         8-8103
2      time     successful  hello2             user3         8-8103
4      time  notsuccessful   don't            random         8-8103
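
To make the difference between | and & concrete, here is a minimal sketch that does not need a CSV file. The DataFrame is built in memory from the rows shown in the question (only the first three columns, since the others are empty there), and the variable names name_mask, message_mask, either and both are made up for illustration.

```python
import pandas as pd

# Hypothetical in-memory copy of the question's table (not the real file).
df = pd.DataFrame({
    "timestamp": ["time"] * 5,
    "message": ["login", "logout", "successful", "hello", "notsuccessful"],
    "name": ["hello", "hello", "hello1", "no", "no"],
})

names_to_keep = ["hello", "hello1"]
messages_to_keep = ["successful", "notsuccessful"]

# Each isin() call returns a boolean Series aligned with df's index.
name_mask = df["name"].isin(names_to_keep)
message_mask = df["message"].isin(messages_to_keep)

# OR: keep a row when at least one of the two conditions holds.
either = df[name_mask | message_mask]

# AND: keep a row only when both conditions hold at the same time.
both = df[name_mask & message_mask]

print(either)
print(both)
```

With this data, the | version keeps every row except the one with message hello and name no, which matches the expected result in the question, while the & version keeps only the successful / hello1 row. Building each mask on its own line also avoids the misplaced bracket in the question's attempt, where the & ended up inside the isin(...) call.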
