How do I combine n boolean indexes to filter a Pandas DataFrame?

Problem description

Pandas boolean indexes are usually combined with logical operators:

vdf = (df1['status'] == 'DENIED') | (df1['status'] == 'VOIDED') | (df1['void?'] == True)

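I'm working with a variety of DataFrames. A given table may have zero or more columns that I want to filter on. By "filter" I mean dropping the rows where a condition is true: if a transaction is voided, I want to drop it; if a transaction matches a particular category, I want to drop it.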

How can I combine n boolean indexes?

import pandas as pd

table = [('2019-01-01', 10.00, False, 'CAPTURED'),
         ('2019-01-04', 10.00, False, 'CAPTURED'),
         ('2019-01-05', 10.00, False, 'DENIED'),
         ('2019-01-06', 10.00, True, 'VOIDED')]
cols = ['date', 'amount', 'void?', 'status']
df1 = pd.DataFrame.from_records(table, columns=cols)

filter_headers = ['void?', 'status']
status_vals = ['VOIDED', 'DENIED']

try:
    if filter_headers:
        vdfs = []
        for fcol in filter_headers:
            if df1[fcol].dtype == 'bool':
                vdfs.append(df1[fcol] == True)
            elif df1[fcol].dtype == 'object':
                vdfs.append(df1[fcol].isin(status_vals))
            else:
                print("Unhandled type.")
        # Obviously wrong...
        df2 = df1[~sum(vdfs)]
    else:
        df2 = df1
except Exception as e:
    print("(%s) Filter Headers produced no results." % e)
    pass

Tags: pandas

Solution


Instead of sum, you can use np.any with axis=0, which is True for every row where at least one of the masks is True. For example:

import numpy as np
# Keep the rest of your code as is; replace the line df2 = df1[~sum(vdfs)] with:
df2 = df1[~np.any(vdfs, axis=0)]

For your example, the resulting df2 is:

         date  amount  void?    status
0  2019-01-01    10.0  False  CAPTURED
1  2019-01-04    10.0  False  CAPTURED
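
As a possible alternative, the masks can also be OR-ed together with functools.reduce, which stays within pandas and lets you supply a starting value for the case where no filter columns apply. The sketch below is only one way to do it: combine_masks is a hypothetical helper name, not a pandas function, and the all-False starting Series is an assumption about what "zero filters" should mean (keep every row). For context, sum(vdfs) adds the boolean Series into an integer Series, so ~ performs a bitwise complement on integers rather than a logical NOT, which is why it does not yield a usable mask.

```python
from functools import reduce
import operator

import pandas as pd

table = [('2019-01-01', 10.00, False, 'CAPTURED'),
         ('2019-01-04', 10.00, False, 'CAPTURED'),
         ('2019-01-05', 10.00, False, 'DENIED'),
         ('2019-01-06', 10.00, True, 'VOIDED')]
cols = ['date', 'amount', 'void?', 'status']
df1 = pd.DataFrame.from_records(table, columns=cols)


def combine_masks(masks, index):
    """OR together any number of boolean Series (hypothetical helper).

    With an empty list the result is all False, so no rows are dropped.
    """
    no_rows = pd.Series(False, index=index)
    return reduce(operator.or_, masks, no_rows)


# One boolean mask per filter column, as in the question.
masks = [df1['void?'], df1['status'].isin(['VOIDED', 'DENIED'])]

# Rows matching any mask are dropped; the rest are kept.
drop = combine_masks(masks, df1.index)
df2 = df1[~drop]
print(df2)
```

Because the starting value is an all-False Series, combine_masks([], df1.index) leaves the DataFrame unchanged, so the separate if filter_headers / else branch becomes optional.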
