Applying a filter to a groupby function

Problem description

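I have a set of cells, each of which can have many antennas at different heights. I need to build a DataFrame containing only the cells that have more than one antenna, with those antennas at different heights.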

I tried the groupby function, which gives me the number of rows per cell, but I can't figure out how to use it for filtering.

import pandas as pd
df1 = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
    "antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
    "height": ["5", "30", "30", "45", "45", "30", "15"],
    "function": ["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"],
})

df1['count'] = df1.groupby('Cell')['Cell'].transform('count')

This returns:

    Cell    antenna height  function    count
0   AAAA    A1      5       LTE1800     1
1   BBBB    A1      30      LTE700      2
2   BBBB    A1      30      LTE700      2
3   CCCC    A2      45      LTE700      2
4   CCCC    A4      45      LTE700      2
5   DDDD    A1      30      LTE2100     2
6   DDDD    A2      15      LTE2100     2

The output I want is:

    Cell    antenna height  function    count
1   DDDD    A1      30      LTE2100     2
2   DDDD    A2      15      LTE2100     2

Or the opposite:

    Cell    antenna height  function    count
0   AAAA    A1      5       LTE1800     1
1   BBBB    A1      30      LTE700      2
2   BBBB    A1      30      LTE700      2
3   CCCC    A2      45      LTE700      2
4   CCCC    A4      45      LTE700      2

My experience with groupby queries is limited, so I don't know how to achieve this.

Tags: python, pandas

Solution


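So you basically want what would be a GROUP BY with a HAVING clause if this were SQL. In pandas you can do it with groupby followed by filter, keeping only the groups whose number of distinct heights meets the condition: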

df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() >= 2)

   Cell antenna height function
5  DDDD      A1     30  LTE2100
6  DDDD      A2     15  LTE2100


df1.groupby(['Cell'], as_index=False).filter(lambda g: g['height'].nunique() < 2)

   Cell antenna height function
0  AAAA      A1      5  LTE1800
1  BBBB      A1     30   LTE700
2  BBBB      A1     30   LTE700
3  CCCC      A2     45   LTE700
4  CCCC      A4     45   LTE700
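
Since the question already uses transform to attach a per-cell count, the same filtering can also be written as a boolean mask. This is only a sketch of an alternative, not part of the original answer: the names n_heights, mixed_heights, and single_height are made up for illustration, and it assumes the df1 from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Cell": ["AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "DDDD", "DDDD"],
    "antenna": ["A1", "A1", "A1", "A2", "A4", "A1", "A2"],
    "height": ["5", "30", "30", "45", "45", "30", "15"],
    "function": ["LTE1800", "LTE700", "LTE700", "LTE700", "LTE700", "LTE2100", "LTE2100"],
})

# Number of distinct heights within each cell, broadcast back to every row
# (n_heights is a hypothetical helper name, not from the original post).
n_heights = df1.groupby("Cell")["height"].transform("nunique")

# Cells whose antennas sit at two or more different heights.
mixed_heights = df1[n_heights >= 2]

# The opposite selection: cells where every antenna shares one height.
single_height = df1[n_heights < 2]

print(mixed_heights)
print(single_height)
```

Both forms should select the same rows here. The mask version avoids calling a Python lambda once per group, which may matter on larger frames, while filter reads closer to the SQL HAVING analogy.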
