python - How to drop duplicate records but keep the first two using pandas
Problem description
I have a dataset with multiple customer transaction records, each carrying a customer ID. I need to delete all duplicate records for each customer except the first TWO. I know how drop_duplicates works, but I need to figure out how to delete all except the first two.
Example
cust_ID transaction_Date
------ ---------------
abc 01/01/2013
abc 02/09/2013
abc 06/06/2015
abc 09/09/2019
def 02/01/2015
ghi 09/09/2013
def 09/02/2014
My result should be:
cust_ID transaction_Date
------ ---------------
abc 01/01/2013
abc 02/09/2013
def 02/01/2015
ghi 09/09/2013
def 09/02/2014
Here the first two records of abc are kept and the others are deleted. def has only two records, so both are kept and nothing is deleted.
Is there any way to do this? Any help is appreciated. Thanks in advance.
Solution
A simple head(2) after grouping by cust_ID keeps the first two rows of each customer:
df.groupby('cust_ID').head(2)
Out[8]:
cust_ID transaction_Date
0 abc 01/01/2013
1 abc 02/09/2013
4 def 02/01/2015
5 ghi 09/09/2013
6 def 09/02/2014
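For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example frame from the question (the dates are kept as plain strings, which is an assumption since the question does not say how they are stored) and also shows a boolean-mask variant using cumcount(), which is not part of the original answer but should select the same rows. The variable names first_two, mask and first_two_mask are made up for illustration.

```python
import pandas as pd

# Example data copied from the question; dates are left as strings (assumption).
df = pd.DataFrame({
    "cust_ID": ["abc", "abc", "abc", "abc", "def", "ghi", "def"],
    "transaction_Date": [
        "01/01/2013", "02/09/2013", "06/06/2015", "09/09/2019",
        "02/01/2015", "09/09/2013", "09/02/2014",
    ],
})

# Keep the first two rows of each customer, in the original row order.
first_two = df.groupby("cust_ID").head(2)

# Alternative: cumcount() numbers the rows within each group starting at 0,
# so "< 2" marks the first two rows per customer.
mask = df.groupby("cust_ID").cumcount() < 2
first_two_mask = df[mask]

print(first_two)
```

Note that "first two" here means the first two rows in the frame's current order. If the two earliest transactions are wanted instead, the frame would need to be sorted by a parsed date column beforehand; the date format in the example is ambiguous, so that step is left out.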