Get groups that contain all needed values

Problem description

import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
>>> df
     A  B    C
0  bar  1  2.0
1  bar  2  5.0
2  bar  3  8.0
3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0

How can I get the groups that contain both values of neededVals = [1.0, 2.0] in column C when I groupby('A')? Expected output:

3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0 

And how can I also get just the rows with those values:

3  foo  4  1.0
4  foo  5  2.0

Tags: python, pandas

Solution


I think you need to compare sets inside GroupBy.transform and then filter with boolean indexing:

neededVals = [1.0, 2.0]
# keep every row of each group whose C values include all of neededVals
df1 = df[df.groupby('A')['C'].transform(lambda x: set(x) >= set(neededVals))]
print(df1)
     A  B    C
3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0
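
For reuse, the same idea can be wrapped in a small helper. The sketch below is only an illustration: the function name rows_in_groups_with_all and its parameters are hypothetical (they are not part of pandas or of the original answer), and the .astype(bool) call is a defensive assumption in case transform returns a non-boolean dtype in some pandas version.

```python
import pandas as pd


def rows_in_groups_with_all(frame, key, col, needed):
    """Hypothetical helper: keep all rows of each `key` group whose
    `col` values include every value in `needed`."""
    needed = set(needed)
    # transform broadcasts the per-group True/False back to every row
    mask = frame.groupby(key)[col].transform(lambda s: set(s) >= needed)
    # astype(bool) is a precaution so the mask is usable for boolean indexing
    return frame[mask.astype(bool)]


df = pd.DataFrame({'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

result = rows_in_groups_with_all(df, 'A', 'C', [1.0, 2.0])
print(result)
```

Because the helper returns a new DataFrame instead of overwriting df, the original data stays available for further checks.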

Details

print (df.groupby('A')['C'].transform(lambda x: set(x) >= set(neededVals)))
0    False
1    False
2    False
3     True
4     True
5     True
Name: C, dtype: bool
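
The mask works because >= between two Python sets is a superset test. A minimal plain-Python sketch, using the C values of the two groups from the example (the variable names are made up for illustration):

```python
needed = {1.0, 2.0}

bar_values = {2.0, 5.0, 8.0}   # C values of group 'bar'
foo_values = {1.0, 2.0, 9.0}   # C values of group 'foo'

# superset test: does the group contain every needed value?
bar_ok = bar_values >= needed   # 1.0 is missing from 'bar'
foo_ok = foo_values >= needed   # both 1.0 and 2.0 are present

# equality is stricter: the extra 9.0 makes the sets differ
foo_equal = foo_values == needed

print(bar_ok, foo_ok, foo_equal)
```

This is why the superset comparison keeps the whole foo group, including the row with 9.0, while an equality comparison would reject it unless the extra rows are removed first.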

For the second result, first filter out the unnecessary rows with isin, then compare the sets for equality:

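# step 1: keep only the rows whose C value is in neededVals
df2 = df[df['C'].isin(neededVals)]
# step 2: keep only the groups whose remaining C values are exactly neededVals
df2 = df2[df2.groupby('A')['C'].transform(lambda x: set(x) == set(neededVals))]
print(df2)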
     A  B    C
3  foo  4  1.0
4  foo  5  2.0
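
The equality step is not redundant: in the sample data the bar group also has a row with C = 2.0, so isin alone would keep it. A self-contained sketch that shows both steps separately (the names candidates and only_needed are chosen here for illustration, and .astype(bool) is a defensive assumption about the dtype returned by transform):

```python
import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
neededVals = [1.0, 2.0]

# Step 1: keep only rows whose C value is one of the needed values.
# Row 0 (bar, C = 2.0) survives this step even though bar lacks 1.0.
candidates = df[df['C'].isin(neededVals)]
print(candidates)

# Step 2: keep only groups whose remaining C values are exactly neededVals.
mask = candidates.groupby('A')['C'].transform(
    lambda s: set(s) == set(neededVals))
only_needed = candidates[mask.astype(bool)]
print(only_needed)
```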
