python - Filtering pandas dataframe groups based on groups comparison
Problem description
I am trying to remove corrupted data from my pandas DataFrame. I want to remove every group whose value differs from the previous group's value by more than one. Here is an example:
Value
0 1
1 1
2 1
3 2
4 2
5 2
6 8 <- if I group by Value, this group's value is larger than the
7 8    previous group's value by 6, so I want to remove this
8 3    group from the dataframe
9 3
Expected result:
Value
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
Edit: jezrael's solution is great, but in my case duplicate group values are possible:
Value
0 1
1 1
2 1
3 3
4 3
5 3
6 1
7 1
Sorry if I was not clear about this.
Solution
First drop duplicate values so that each group is represented by a single row, then compare the difference between consecutive values with the shifted (previous) values, and finally filter with boolean indexing:
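# one row per distinct value, in order of first appearance
s = df['Value'].drop_duplicates()
# flag values whose increase over the previous value exceeds that previous value
v = s[s.diff().gt(s.shift())]
# keep only the rows whose value was not flagged
df = df[~df['Value'].isin(v)]
print(df)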
Value
0 1
1 1
2 1
3 2
4 2
5 2
8 3
9 3
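For the case in the edit, where the same value can appear in more than one group, drop_duplicates and isin work on values rather than on consecutive runs, so they cannot tell a valid run from a corrupted run that has the same value. The following is a minimal sketch of a run-based variant, not the original answer's method. It rests on assumptions the question does not confirm: the first run is always kept, each run is compared with the last kept run rather than with the immediately preceding one, and a jump in either direction larger than max_step counts as corrupted. The names drop_jump_runs and max_step are hypothetical.

```python
import pandas as pd


def drop_jump_runs(df, col="Value", max_step=1):
    """Drop consecutive runs whose value is more than `max_step`
    away from the value of the last run that was kept (assumed rule)."""
    values = df[col]
    # Label consecutive runs: a new run starts whenever the value changes.
    run_id = values.ne(values.shift()).cumsum()
    # One representative value per run, in order of appearance.
    run_values = values.groupby(run_id).first()

    keep_runs = []
    last_kept = None
    for rid, val in run_values.items():
        # Assumption: the first run is trusted and always kept.
        if last_kept is None or abs(val - last_kept) <= max_step:
            keep_runs.append(rid)
            last_kept = val

    # reset_index gives the consecutive index shown in the expected result.
    return df[run_id.isin(keep_runs)].reset_index(drop=True)


df = pd.DataFrame({"Value": [1, 1, 1, 2, 2, 2, 8, 8, 3, 3]})
print(drop_jump_runs(df)["Value"].tolist())
# [1, 1, 1, 2, 2, 2, 3, 3]
```

Under these assumptions the second example from the question (1, 1, 1, 3, 3, 3, 1, 1) loses only the run of 3s, and both runs of 1s are kept. If only upward jumps should count as corrupted, replacing abs(val - last_kept) with val - last_kept would be one way to express that.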