pandas - Filtering a pandas dataframe by aggregating on two columns
Problem description
I have a pandas dataframe. Here are the first five rows:
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate          UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         2010-12-01 08:26:00  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         2010-12-01 08:26:00  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         2010-12-01 08:26:00  3.39       17850.0     United Kingdom
I would like to group by StockCode and CustomerID, and sum Quantity. Then, I'd like to throw out all of the StockCode/CustomerID pairs where this sum is negative. The desired final product is the original dataframe with the rows corresponding to these StockCode/CustomerID pairs removed.
I have a working solution:
retail_df.groupby(['CustomerID','StockCode']).filter(lambda x: x['Quantity'].sum() >= 0)
However, it takes my laptop four minutes to run it. There are 406829 rows. Is there a faster way?
Solution
This should do the trick:
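# Boolean Series indexed by (CustomerID, StockCode): True where the summed Quantity is >= 0
df2=retail_df.groupby(['CustomerID','StockCode'])["Quantity"].sum().ge(0)
# Select only the rows whose (CustomerID, StockCode) pair is marked True, then restore the key columns
retail_df=retail_df.set_index(['CustomerID','StockCode']).loc[df2.loc[df2].index].reset_index(drop=False)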
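As a possible alternative, not part of the answer above, the same filtering can be sketched with groupby(...).transform('sum'), which broadcasts each group's total back onto its rows so that a plain boolean mask can be applied. This keeps the original row and column order and avoids calling a Python lambda once per group. The toy data below is hypothetical and only stands in for retail_df; whether this is faster on the real 406829 rows would need to be timed.

```python
import pandas as pd

# Hypothetical toy data standing in for retail_df (values are made up)
retail_df = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 1.0, 2.0, 2.0],
    "StockCode": ["A", "A", "B", "A", "A"],
    "Quantity": [5, -8, 3, -1, 4],
})

# For every row, the summed Quantity of that row's (CustomerID, StockCode) group
group_sum = retail_df.groupby(["CustomerID", "StockCode"])["Quantity"].transform("sum")

# Keep only the rows whose group total is non-negative
filtered = retail_df[group_sum >= 0]
print(filtered)
```

In this toy frame the pair (1.0, 'A') sums to -3, so its two rows are dropped and the other three rows are kept unchanged.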