How to delete duplicate records and keep only the first two using pandas

Problem description

I have a dataset of customer transaction records, each carrying a customer ID. For every customer I need to delete all duplicate records except the first TWO. I know how drop_duplicates works, but it keeps only one record per key, and I can't figure out how to keep the first two.

Example

cust_ID  transaction_Date
------   ---------------
abc         01/01/2013
abc         02/09/2013
abc         06/06/2015
abc         09/09/2019
def         02/01/2015
ghi         09/09/2013
def         09/02/2014

My result should be:

cust_ID  transaction_Date
------   ---------------
abc         01/01/2013
abc         02/09/2013
def         02/01/2015
ghi         09/09/2013
def         09/02/2014

Here the first two records of abc are kept and the rest are deleted. def has only two records, so both are kept and nothing is deleted.

Is there any way to do this? Any help is appreciated. Thanks in advance.

Tags: python, pandas, dataframe, duplicates, pandas-groupby

Solution


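A simple head(2) on the grouped frame does it: grouping by cust_ID and calling head(2) returns the first two rows of each customer, with the original row order and index preserved.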

df.groupby('cust_ID').head(2)

Out[8]:
  cust_ID transaction_Date
0     abc       01/01/2013
1     abc       02/09/2013
4     def       02/01/2015
5     ghi       09/09/2013
6     def       09/02/2014
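
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It rebuilds the example frame from the question by hand (the construction of df is an assumption, since the original post does not show how the data was loaded) and also shows an equivalent boolean-mask variant based on cumcount, which is not part of the original answer.

```python
import pandas as pd

# Rebuild the example data from the question (dates kept as plain strings).
df = pd.DataFrame({
    "cust_ID": ["abc", "abc", "abc", "abc", "def", "ghi", "def"],
    "transaction_Date": [
        "01/01/2013", "02/09/2013", "06/06/2015", "09/09/2019",
        "02/01/2015", "09/09/2013", "09/02/2014",
    ],
})

# The answer's approach: first two rows of each customer, original order kept.
first_two = df.groupby("cust_ID").head(2)

# Equivalent variant: number the rows within each customer (0, 1, 2, ...)
# and keep only those numbered below 2.
mask = df.groupby("cust_ID").cumcount() < 2
first_two_alt = df[mask]

print(first_two)
```

Note that "first two" here means the first two rows in the frame's current order, not the two earliest dates. If the earliest transactions are what is wanted, the frame would need to be sorted by a parsed date column before grouping.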
