Filtering pandas dataframe groups based on groups comparison

Problem description

I am trying to remove corrupted data from my pandas dataframe. I want to remove any group whose value differs by more than one from the previous group's value. Here is an example:

   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      8 <- if I group by Value, this group's value is larger than
7      8    the previous group's value by 6, so I want to remove this
8      3    group from the dataframe
9      3
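
For anyone who wants to reproduce this, the example frame above can be built as in the following minimal sketch. It assumes a single integer column named Value and the default integer index; the variable name distinct is only illustrative.

```python
import pandas as pd

# Example data from the table above; the default RangeIndex gives labels 0..9
df = pd.DataFrame({"Value": [1, 1, 1, 2, 2, 2, 8, 8, 3, 3]})

# Distinct group values in order of first appearance
distinct = df["Value"].unique().tolist()
print(distinct)
```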

Expected result:

   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      3
7      3

Edit: jezrael's solution is great, but in my case duplicate group values are possible:

   Value
0      1
1      1
2      1
3      3
4      3
5      3
6      1
7      1

Sorry if I was not clear about this.

Tags: python, pandas, dataframe

Solution


First drop duplicate values so that each distinct value appears once, then compare each value's difference from the previous one against the shifted values, and finally filter the original rows with boolean indexing:

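# One row per distinct Value, in order of first appearance
s = df['Value'].drop_duplicates()
# Flag values whose jump from the previous distinct value exceeds that previous value
v = s[s.diff().gt(s.shift())]

# Drop every row whose Value was flagged
df = df[~df['Value'].isin(v)]
print(df)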
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
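
For the case in the question's edit, where the same value can come back in a later group, dropping duplicates no longer gives one row per group. The sketch below is one possible alternative, not part of the answer above. It treats each run of consecutive equal values as a group and compares each run with the last run that was kept. The function name drop_outlier_runs and the max_step parameter are hypothetical, and the sketch assumes that the first run is always valid and that "bigger than one" means an absolute difference greater than 1.

```python
import pandas as pd

def drop_outlier_runs(df, col="Value", max_step=1):
    """Hypothetical helper: drop runs whose value jumps too far from the last kept run."""
    # Give each run of consecutive equal values its own id (1, 2, 3, ...)
    run_id = df[col].ne(df[col].shift()).cumsum()
    # One value per run, in order of appearance
    run_values = df[col].groupby(run_id).first()

    keep_ids = []
    last_kept = None
    for rid, val in run_values.items():
        # Assumption: the first run is trusted; later runs are compared
        # with the most recent run that was kept
        if last_kept is None or abs(val - last_kept) <= max_step:
            keep_ids.append(rid)
            last_kept = val
    return df[run_id.isin(keep_ids)].reset_index(drop=True)

# Original example from the question
df = pd.DataFrame({"Value": [1, 1, 1, 2, 2, 2, 8, 8, 3, 3]})
result = drop_outlier_runs(df)

# Example from the edit, where the value 1 appears in two separate runs
df_dup = pd.DataFrame({"Value": [1, 1, 1, 3, 3, 3, 1, 1]})
result_dup = drop_outlier_runs(df_dup)

print(result)
print(result_dup)
```

Under these assumptions the first frame keeps the runs of 1, 2 and 3, and the second keeps both runs of 1 while dropping the run of 3. The reset_index(drop=True) call renumbers the rows from 0, as in the expected result; omit it to keep the original index labels.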
