Pandas: filter duplicated rows within a groupby based on another column

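Problem description

I have the following DataFrame and have grouped it by patientid. For patients with the same (duplicated) visit date, I now want to show the rows whose medication columns are not NA.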

import pandas as pd    
df = pd.DataFrame({'patientid':["s1001","s1002","s1001","s1003","s1001","s1002","s1003","s1001","s1002","s1003"],
                   'visitdate':["2016/01/01","2017/05/01","2016/01/01","2016/08/01","2019/01/01","2016/01/01","2016/01/01","2015/01/01","2016/03/01","2016/05/01"],
                   'medication1':["Copaxone","Copaxone","NA","NA","NA","NA","Rituximab","Rituximab","Rebif","Copaxone"],
                   'medication2':["NA","NA","Rebif","Rituximab","Copaxone","NA","NA","NA","NA","Copaxone"]
                  })

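For example, patient s1001 has two rows with the duplicated visit date 2016/01/01, and each of those rows has a medication recorded: medication1 in one and medication2 in the other.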

grouped = df.groupby("patientid")
for key, group in grouped:
    print(key)
    print(group)
s1001
  patientid   visitdate medication1 medication2
0     s1001  2016/01/01    Copaxone          NA
2     s1001  2016/01/01          NA       Rebif
4     s1001  2019/01/01          NA    Copaxone
7     s1001  2015/01/01   Rituximab          NA
s1002
  patientid   visitdate medication1 medication2
1     s1002  2017/05/01    Copaxone          NA
5     s1002  2016/01/01          NA          NA
8     s1002  2016/03/01       Rebif          NA
s1003
  patientid   visitdate medication1 medication2
3     s1003  2016/08/01          NA   Rituximab
6     s1003  2016/01/01   Rituximab          NA
9     s1003  2016/05/01    Copaxone    Copaxone

How can I filter the groupby so that it shows only the duplicated visit dates? I tried the code below:

df.groupby(by= 'patientid', dropna=False).filter(lambda x: (x.visitdate.duplicated()).any())

  patientid   visitdate medication1 medication2
0     s1001  2016/01/01    Copaxone          NA
2     s1001  2016/01/01          NA       Rebif
4     s1001  2019/01/01          NA    Copaxone
7     s1001  2015/01/01   Rituximab          NA

But it shows all of patient s1001's visit dates, not just the duplicated one. Any ideas?

Tags: python, pandas-groupby

Solution
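The attempt in the question returns every row for s1001 because `GroupBy.filter` keeps or drops whole groups: the lambda returns a single boolean per patient, so any patient with at least one duplicated visit date is kept in full. One possible approach is to build a row-level mask instead. The sketch below assumes that the literal string "NA" marks a missing medication (as in the sample data) and that "not NA" means at least one of the two medication columns is filled in. The names `is_dup_visit`, `has_medication` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "patientid": ["s1001", "s1002", "s1001", "s1003", "s1001",
                  "s1002", "s1003", "s1001", "s1002", "s1003"],
    "visitdate": ["2016/01/01", "2017/05/01", "2016/01/01", "2016/08/01", "2019/01/01",
                  "2016/01/01", "2016/01/01", "2015/01/01", "2016/03/01", "2016/05/01"],
    "medication1": ["Copaxone", "Copaxone", "NA", "NA", "NA",
                    "NA", "Rituximab", "Rituximab", "Rebif", "Copaxone"],
    "medication2": ["NA", "NA", "Rebif", "Rituximab", "Copaxone",
                    "NA", "NA", "NA", "NA", "Copaxone"],
})

# True for every row whose (patientid, visitdate) pair occurs more than once.
# keep=False flags all members of a duplicate set, not only the later ones.
is_dup_visit = df.duplicated(subset=["patientid", "visitdate"], keep=False)

# Assumption: the string "NA" means "no medication recorded".
# True when at least one of the two medication columns holds a real value.
has_medication = df[["medication1", "medication2"]].ne("NA").any(axis=1)

result = df[is_dup_visit & has_medication]
print(result)
```

On the sample data this keeps only rows 0 and 2, the two 2016/01/01 visits of s1001. If the requirement is instead that both medication columns must be filled in, replacing `.any(axis=1)` with `.all(axis=1)` would express that.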

