首页 > 解决方案 > 从 {condition} 的 pandas 中删除组

问题描述

我有这样的数据框:

+---+--------------------------------------+-----------+
|   |              envelopeid              |  message  |
+---+--------------------------------------+-----------+
| 1 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.00002 |
| 2 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.00004 |
| 3 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.11001 |
| 4 | 5cb72b9c-adb8-4e1c-9296-db2080cb3b6d | CMN.00002 |
| 5 | 5cb72b9c-adb8-4e1c-9296-db2080cb3b6d | CMN.00001 |
| 6 | f4260b99-6579-4607-bfae-f601cc13ff0c | CMN.00202 |
| 7 | 8f673ae3-0293-4aca-ad6b-572f138515e6 | CMN.00002 |
| 8 | fee98470-aa8f-4ec5-8bcd-1683f85727c2 | TKP.00001 |
| 9 | 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00002 |
| 10| 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00004 |
+---+--------------------------------------+-----------+

我已将其分组,grouped = df.groupby('envelopeid') 并且我需要从数据框中删除所有组,并仅保留那些仅包含消息 (CMN.00002) 或 (CMN.00002 和 CMN.00004) 的组。所需的数据框:

+---+--------------------------------------+-----------+
|   |              envelopeid              |  message  |
+---+--------------------------------------+-----------+
| 7 | 8f673ae3-0293-4aca-ad6b-572f138515e6 | CMN.00002 |
| 9 | 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00002 |
| 10| 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00004 |
+---+--------------------------------------+-----------+ 

试过了

(grouped.message.transform(lambda x: x.eq('CMN.00001').any() or (x.eq('CMN.00002').any() and x.ne('CMN.00002' or 'CMN.00004').any()) or x.ne('CMN.00002').all()))

但它不能正常工作

标签: python-3.xpandas

解决方案


尝试:

grouped = df.loc[df['message'].isin(['CMN.00002', 'CMN.00002', 'CMN.00004'])].groupby('envelopeid')

推荐阅读