Group by columns, then filter by a condition

Problem description

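I have a df that I want to filter based on a groupby. I want to group by the combination of cc, odd, tree1, and tree2: if a group contains any row with day > 4, keep the whole group, otherwise drop it.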

import pandas as pd

df = pd.DataFrame()
df['cc'] = ['BB', 'BB', 'BB', 'BB','BB', 'BB','BB', 'BB', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ']
df['odd'] = [3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435]
df['tree1'] = ['ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP']
df['tree2'] = ['ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK']
df['day'] = [1, 2, 3, 4, 3, 4, 5, 6, 2, 3, 4, 5, 1, 3, 5, 7, 1, 2, 6, 8, 2, 4, 6, 8]
df

I tried the following, but it would remove any row whose day value is less than 4:

df_grouped = df.groupby(['cc', 'odd', 'tree1', 'tree2']).filter(df['day'] > 4)

I get this error: TypeError: 'Series' object is not callable

I also tried this:

df_grouped = df.groupby(['cc', 'odd', 'tree1', 'tree2']).filter(lambda x: x['day'] > 4)

I get this error: TypeError: filter function returned a Series, but expected a scalar bool

I searched for these errors and tried the suggested fixes, but none of them worked for me. I want to get a df like the following:

df1 = pd.DataFrame()
df1['cc'] = ['BB', 'BB','BB', 'BB', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ']
df1['odd'] = [3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435]
df1['tree1'] = ['SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP']
df1['tree2'] = ['ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK']
df1['day'] = [3, 4, 5, 6, 2, 3, 4, 5, 1, 3, 5, 7, 1, 2, 6, 8, 2, 4, 6, 8]
df1

I also tried to use the logical function any, but I could not get it to work: it only returns True or False rather than the filtered DataFrame.

Tags: python, pandas, dataframe, group-by, pandas-groupby

Solution


Now that I understand what you want, let's try transform + any:

df[df.assign(key=df.day > 4)
     .groupby(['cc', 'odd', 'tree1', 'tree2']).key.transform('any')
]
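
To see why this works, it may help to look at the intermediate mask on a smaller frame. The sketch below is only an illustration: the frame `demo` and the names `row_flag` and `group_flag` are made up for this example and do not appear in the question.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the df from the question).
demo = pd.DataFrame({
    "cc":  ["BB", "BB", "BB", "DD", "DD"],
    "odd": [3434, 3434, 3435, 3434, 3434],
    "day": [1, 4, 6, 2, 5],
})

# Row-level condition: True where day > 4.
row_flag = demo["day"] > 4

# transform("any") answers "does any row in this group satisfy the condition?"
# and broadcasts that single answer back to every row of the group.
group_flag = row_flag.groupby([demo["cc"], demo["odd"]]).transform("any")

# Boolean indexing then keeps or drops whole groups at once.
kept = demo[group_flag]
print(kept)
```

Because the mask has the same length and index as the original frame, it can be used directly for boolean indexing.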

Alternatively,

df[df.day.gt(4).groupby([df.cc, df.odd, df.tree1, df.tree2]).transform('any')]

    cc   odd tree1 tree2  day
4   BB  3435   SAP   ATK    3
5   BB  3435   SAP   ATK    4
6   BB  3435   SAP   ATK    5
7   BB  3435   SAP   ATK    6
8   DD  3434   ASP   ATK    2
9   DD  3434   ASP   ATK    3
10  DD  3434   ASP   ATK    4
11  DD  3434   ASP   ATK    5
12  DD  3435   SAP   ATK    1
13  DD  3435   SAP   ATK    3
14  DD  3435   SAP   ATK    5
15  DD  3435   SAP   ATK    7
16  ZZ  3434   ASP   ATK    1
17  ZZ  3434   ASP   ATK    2
18  ZZ  3434   ASP   ATK    6
19  ZZ  3434   ASP   ATK    8
20  ZZ  3435   SAP   ATK    2
21  ZZ  3435   SAP   ATK    4
22  ZZ  3435   SAP   ATK    6
23  ZZ  3435   SAP   ATK    8
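
As for the two errors in the question: `filter` expects a function, so passing the Series `df['day'] > 4` raises "'Series' object is not callable", and a lambda that returns `x['day'] > 4` yields one boolean per row where `filter` wants a single boolean per group. A minimal sketch of a `filter`-based variant, reducing the per-row result with `.any()`, is shown below. The small frame `demo` is hypothetical and only stands in for the question's df; on large frames this version is usually slower than the `transform` approach.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the df from the question).
demo = pd.DataFrame({
    "cc":  ["BB", "BB", "BB", "DD", "DD"],
    "odd": [3434, 3434, 3435, 3434, 3434],
    "day": [1, 4, 6, 2, 5],
})

# The function passed to filter must return one boolean per group,
# so the row-wise comparison is collapsed with .any().
kept = demo.groupby(["cc", "odd"]).filter(lambda g: (g["day"] > 4).any())
print(kept)
```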
