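python - Applying a filter to a groupby result
Question
I have a set of cells, each of which can have several antennas mounted at different heights. I need to build a DataFrame containing only the cells that have more than one antenna and whose antennas are at different heights.
I tried using groupby, which gives me the number of rows per cell, but I can't figure out how to use that for filtering.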
import pandas as pd

df1 = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
    "antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
    "height": ["5", "30", "30", "45", "45", "30", "15"],
    "function": ["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"],
})

# Number of rows (antennas) per cell, repeated on every row of that cell
df1['count'] = df1.groupby('Cell')['Cell'].transform('count')
This returns:
   Cell antenna height function  count
0  AAAA      A1      5  LTE1800      1
1  BBBB      A1     30   LTE700      2
2  BBBB      A1     30   LTE700      2
3  CCCC      A2     45   LTE700      2
4  CCCC      A4     45   LTE700      2
5  DDDD      A1     30  LTE2100      2
6  DDDD      A2     15  LTE2100      2
The output I want is:
   Cell antenna height function  count
1  DDDD      A1     30  LTE2100      2
2  DDDD      A2     15  LTE2100      2
Or the opposite:
   Cell antenna height function  count
0  AAAA      A1      5  LTE1800      1
1  BBBB      A1     30   LTE700      2
2  BBBB      A1     30   LTE700      2
3  CCCC      A2     45   LTE700      2
4  CCCC      A4     45   LTE700      2
I have limited experience with groupby queries, so I don't know how to achieve this.
Answer
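So what you basically want is the equivalent of a GROUP BY with a HAVING clause, if this were SQL. In pandas you can get that with groupby followed by filter, which keeps only the groups for which the function returns True. Here that means the cells with at least two distinct heights: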
df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() >= 2)
   Cell antenna height function
5  DDDD      A1     30  LTE2100
6  DDDD      A2     15  LTE2100
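For the opposite selection, flip the comparison so that only cells with a single distinct height are kept:
df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() < 2)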
   Cell antenna height function
0  AAAA      A1      5  LTE1800
1  BBBB      A1     30   LTE700
2  BBBB      A1     30   LTE700
3  CCCC      A2     45   LTE700
4  CCCC      A4     45   LTE700
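Since the question already uses transform, the same selection can probably also be written as a boolean mask instead of a filter call. The following is only a sketch of that alternative, not part of the original answer; the names n_heights, mixed_heights and single_height are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
    "antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
    "height": ["5", "30", "30", "45", "45", "30", "15"],
    "function": ["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"],
})

# Number of distinct heights per cell, broadcast back onto every row of that cell
# (n_heights is a hypothetical helper name, aligned with df1's index).
n_heights = df1.groupby("Cell")["height"].transform("nunique")

# Cells whose antennas sit at two or more different heights
mixed_heights = df1[n_heights >= 2]

# The opposite: cells where every antenna shares one height
single_height = df1[n_heights < 2]

print(mixed_heights)
print(single_height)
```

Computing the mask once makes it easy to get both halves of the split without grouping twice, and it avoids calling a Python lambda per group, which may matter on larger frames.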