python - Group by columns, then filter based on a condition
Problem description
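I have a df that I want to filter by group. I want to group by the combination of (cc, odd, tree1, and tree2) and, if a group contains a day > 4, keep the whole group; otherwise drop it.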
import pandas as pd

df = pd.DataFrame()
df['cc'] = ['BB', 'BB', 'BB', 'BB','BB', 'BB','BB', 'BB', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ']
df['odd'] = [3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435]
df['tree1'] = ['ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP']
df['tree2'] = ['ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK']
df['day'] = [1, 2, 3, 4, 3, 4, 5, 6, 2, 3, 4, 5, 1, 3, 5, 7, 1, 2, 6, 8, 2, 4, 6, 8]
df
I tried this, but it would remove any row with a day value less than 4:
df_grouped = df.groupby(['cc', 'odd', 'tree1', 'tree2']).filter(df['day'] > 4)
I get this error: TypeError: 'Series' object is not callable
I also tried this:
df_grouped = df.groupby(['cc', 'odd', 'tree1', 'tree2']).filter(lambda x: x['day'] > 4)
I get this error: TypeError: filter function returned a Series, but expected a scalar bool.
I searched and tried to resolve these errors, but the suggested solutions did not work for me. I want to get a df like the following:
df1 = pd.DataFrame()
df1['cc'] = ['BB', 'BB','BB', 'BB', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'DD', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ', 'ZZ']
df1['odd'] = [3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435, 3434, 3434, 3434, 3434, 3435, 3435, 3435, 3435]
df1['tree1'] = ['SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP', 'ASP', 'ASP', 'ASP', 'ASP', 'SAP', 'SAP', 'SAP', 'SAP']
df1['tree2'] = ['ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK', 'ATK', 'ATK','ATK','ATK','ATK','ATK','ATK','ATK']
df1['day'] = [3, 4, 5, 6, 2, 3, 4, 5, 1, 3, 5, 7, 1, 2, 6, 8, 2, 4, 6, 8]
df1
I tried to use the logical function any, but I could not get it to work: it only returns True or False rather than the filtered dataframe.
Solution
Now that I understand what you want, let's try transform + any:
df[df.assign(key=df.day > 4)
     .groupby(['cc', 'odd', 'tree1', 'tree2']).key.transform('any')
]
Alternatively,
df[df.day.gt(4).groupby([df.cc, df.odd, df.tree1, df.tree2]).transform('any')]
cc odd tree1 tree2 day
4 BB 3435 SAP ATK 3
5 BB 3435 SAP ATK 4
6 BB 3435 SAP ATK 5
7 BB 3435 SAP ATK 6
8 DD 3434 ASP ATK 2
9 DD 3434 ASP ATK 3
10 DD 3434 ASP ATK 4
11 DD 3434 ASP ATK 5
12 DD 3435 SAP ATK 1
13 DD 3435 SAP ATK 3
14 DD 3435 SAP ATK 5
15 DD 3435 SAP ATK 7
16 ZZ 3434 ASP ATK 1
17 ZZ 3434 ASP ATK 2
18 ZZ 3434 ASP ATK 6
19 ZZ 3434 ASP ATK 8
20 ZZ 3435 SAP ATK 2
21 ZZ 3435 SAP ATK 4
22 ZZ 3435 SAP ATK 6
23 ZZ 3435 SAP ATK 8
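For comparison, the groupby(...).filter attempt from the question can also be made to work once the lambda returns a single bool per group instead of a Series. The following is only a sketch, not part of the original answer: it rebuilds the question's data in a compact form, and the names keys, mask, result_transform and result_filter are introduced here purely for illustration.

```python
import pandas as pd

# Rebuild the question's data (same values, written compactly).
df = pd.DataFrame({
    'cc': ['BB'] * 8 + ['DD'] * 8 + ['ZZ'] * 8,
    'odd': ([3434] * 4 + [3435] * 4) * 3,
    'tree1': (['ASP'] * 4 + ['SAP'] * 4) * 3,
    'tree2': ['ATK'] * 24,
    'day': [1, 2, 3, 4, 3, 4, 5, 6, 2, 3, 4, 5,
            1, 3, 5, 7, 1, 2, 6, 8, 2, 4, 6, 8],
})

keys = ['cc', 'odd', 'tree1', 'tree2']  # illustrative name for the grouping columns

# transform('any') broadcasts one True/False per group back onto every row,
# so the result is a row-aligned boolean mask usable for indexing.
mask = df['day'].gt(4).groupby([df[k] for k in keys]).transform('any')
result_transform = df[mask]

# filter() expects the function to return ONE bool per group;
# calling .any() reduces the per-row comparison to that single bool.
result_filter = df.groupby(keys).filter(lambda g: (g['day'] > 4).any())

# Both approaches keep the same rows, in the original order.
print(result_filter.equals(result_transform))
```

The transform version stays vectorised, while filter calls the Python lambda once per group, so the former is usually the safer choice when there are many groups.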