Dropping rows from a pandas DataFrame based on a condition on a column

Problem description

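I am a working professional pursuing an MTech, and I am trying to build a machine learning project. I am new to Python and ML. I have a column named Found that holds several different values. I want to drop every row whose Found value does not match a specific condition.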

    df['Found']

0          developement
1          func-test
2          func-test
3         regression
4          func-test
5        integration
6          func-test
7          func-test
8         regression
9          func-test

I want to keep only the rows whose Found value contains "test" or "regression".

I wrote the following code.

remove_list = []
for x in range(df.shape[0]):
    text = df.iloc[x]['Found']
    if not re.search('test|regression', text, re.I):
        remove_list.append(x)
print(remove_list) 
df.drop(remove_list, inplace = True)
print(df)

But remove_list is empty. Am I doing something wrong here, or is there a better way to achieve this?

[]
      Identifier Status  Priority  Severity         Found       Age  \
0     Bug 1      V       NaN         2         development        1   
1     Bug 2      R       NaN         6         func-test         203   
2     Bug 3      V       NaN         2         func-test          9   
3     Bug 4      D       NaN         3        regression          4   
4     Bug 5      V       NaN         2        func-test           9   

I even tried this, but I get the following error:

for x in range(df.shape[0]):
    if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
        df.drop(x, inplace = True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-14f97ad6d00a> in <module>
      1 for x in range(df.shape[0]):
----> 2     if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
      3         df.drop(x, inplace = True)

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/re.py in search(pattern, string, flags)
    183     """Scan through string looking for a match to the pattern, returning
    184     a Match object, or None if no match was found."""
--> 185     return _compile(pattern, flags).search(string)
    186 
    187 def sub(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

Tags: python, pandas

Solution


You can do this concisely using .str.contains() with boolean indexing:

df = df[df['Found'].str.contains('test|regression')]

#   Identifier Status  Priority  Severity       Found  Age
# 1      Bug 2      R       NaN         6   func-test  203
# 2      Bug 3      V       NaN         2   func-test    9
# 3      Bug 4      D       NaN         3  regression    4
# 4      Bug 5      V       NaN         2   func-test    9
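
To see what the one-liner is doing, it can help to build the boolean mask as a separate step. The following is a minimal sketch, assuming a small hypothetical frame modelled on the first five rows printed in the question (only the Identifier and Found columns are reproduced):

```python
import pandas as pd

# Hypothetical sample data modelled on the output shown in the question.
df = pd.DataFrame({
    "Identifier": ["Bug 1", "Bug 2", "Bug 3", "Bug 4", "Bug 5"],
    "Found": ["development", "func-test", "func-test", "regression", "func-test"],
})

# str.contains treats the pattern as a regular expression by default and
# returns a boolean Series: True where "test" or "regression" occurs.
mask = df["Found"].str.contains("test|regression")

# Indexing with the mask keeps only the rows where it is True.
filtered = df[mask]
print(filtered)
```

Because the mask is computed for the whole column at once, no explicit loop or list of rows to remove is needed.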

If you need to handle NaN values, add replace(np.nan, '') in front:

import numpy as np  # needed for np.nan
df = df[df['Found'].replace(np.nan, '').str.contains('test|regression')]
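
As an alternative sketch, str.contains also accepts an na argument that sets the result for missing values, so they can be treated as "no match" without replacing them first. The Series below is hypothetical data, not taken from the question. Missing values may also explain the TypeError in the question: re.search only accepts strings or bytes, and a missing value in an object column is a float NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing one missing value.
found = pd.Series(["func-test", np.nan, "regression", "integration"])

# na=False makes missing entries count as non-matching, so the mask
# is purely boolean and safe to use for indexing.
mask = found.str.contains("test|regression", na=False)

kept = found[mask]
print(kept)
```

Whether a row with a missing Found value should be kept or dropped is a design choice; na=True would keep such rows instead.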

As @sophocles mentioned, you can also make the match case-insensitive with case=False:

df = df[df['Found'].str.contains('test|regression', case=False)]
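
If you would rather keep the drop-based approach from the question, the same mask can be inverted with ~ to select the rows to remove. This is a sketch on hypothetical data with a deliberately non-default index, to illustrate that drop works on index labels while iloc works on positions. The two only coincide when the index is 0..n-1, which is one way a positional loop combined with drop can go wrong.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
df = pd.DataFrame(
    {"Found": ["Development", "Func-Test", "REGRESSION", "integration"]},
    index=[10, 11, 12, 13],
)

# case=False makes the regex match regardless of letter case.
mask = df["Found"].str.contains("test|regression", case=False)

# ~mask selects the non-matching rows; their index labels are what drop() expects.
df = df.drop(df[~mask].index)
print(df)
```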
