Keep rows where one column's value is not None when dropping duplicates in Pandas

Problem description

Given a toy dataframe as follows:

id       type      name     purpose
1       retail    tower a    sell
        retail    tower a    rent
        office      t1       sell  
2       office      t1       rent
        retail      t2       sell
        retail      t2       rent
        retail      s1       sell
5       office      s1       rent
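
To make the example reproducible, the table above can be built as follows. This is a minimal sketch that assumes the blank id cells are missing values (NaN); the question does not show how the frame was created.

```python
import numpy as np
import pandas as pd

# Assumption: blank id cells in the table are missing values (NaN).
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})
print(df)
```

Because the column contains NaN, pandas stores id as floats, which is why it prints as 1.0, 2.0 and 5.0.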

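I want to drop duplicates based on the subset columns type and name. Instead of simply keeping first or last (df.drop_duplicates(subset = ['type', 'name'], keep= 'last')), I want to keep the row whose id column is not None.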
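
To see why the plain call is not enough, here is a small sketch that rebuilds the toy frame (assuming blank id cells are NaN) and applies drop_duplicates with keep='last'. For the (retail, tower a) group it keeps the row with the missing id and discards the row with id 1.

```python
import numpy as np
import pandas as pd

# Assumption: blank id cells in the table are missing values (NaN).
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

# keep='last' looks only at row position, not at whether id is present.
naive = df.drop_duplicates(subset=["type", "name"], keep="last")
print(naive)  # row 1 (id NaN) is kept for "tower a"; row 0 (id 1) is lost
```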

The expected result would look like this:

id       type      name     purpose
1       retail    tower a    sell
2       office      t1       rent
        retail      t2       rent
        retail      s1       sell
5       office      s1       rent

How can I do this in Python? Thanks.

Tags: python, python-3.x, pandas, dataframe

Solution


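You can create a helper column by testing for non-missing values, reverse the row order with iloc, and get the index label of the maximum value per group with SeriesGroupBy.idxmax. Because True is the maximum and the rows are reversed, this selects the last non-missing row of each group (or the last row if every id in the group is missing). Finally, pass the labels to loc: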

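# Flag non-missing ids, reverse the rows, take the first True per group; sort=False keeps groups in order of appearance
idx = df.assign(tmp = df['id'].notna()).iloc[::-1].groupby(['type','name'], sort=False)['tmp'].idxmax()
# Reverse the labels again so the result follows the original row order
df = df.loc[idx.iloc[::-1]]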
print (df)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
5  NaN  retail       t2    rent
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
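
The approach relies on how idxmax treats booleans: True counts as the maximum, and the label of its first occurrence is returned. The sketch below shows this on two small, made-up Series (the index labels 10 to 12 and 20 to 21 are hypothetical and stand in for one group each).

```python
import pandas as pd

# Hypothetical flags for one group; True marks a non-missing id.
flags = pd.Series([False, True, True], index=[10, 11, 12])
first_true = flags.idxmax()            # label of the first True
last_true = flags.iloc[::-1].idxmax()  # reversed, so label of the last True

# If no flag is True, every value ties and the first label wins.
all_false = pd.Series([False, False], index=[20, 21])
fallback_first = all_false.idxmax()
fallback_last = all_false.iloc[::-1].idxmax()

print(first_true, last_true, fallback_first, fallback_last)
```

This is why a group with no id at all still contributes exactly one row: the first row in whichever order the group is scanned.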

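If you want to keep the first non-missing row of each group instead (or the first row if every id in the group is missing):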

# No reversal needed: idxmax already returns the first True per group
idx = df.assign(tmp = df['id'].notna()).groupby(['type','name'], sort=False)['tmp'].idxmax()
df = df.loc[idx]
print (df)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
4  NaN  retail       t2    sell
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
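
As a cross-check, the keep-last result can also be reproduced with a different technique that is not part of the answer above: stable-sort the rows so that those with an id come last, call drop_duplicates with keep='last', then restore the original order. This is only a sketch; the helper column name tmp and the assumption that blank ids are NaN are carried over from the example.

```python
import numpy as np
import pandas as pd

# Assumption: blank id cells in the table are missing values (NaN).
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

kept = (
    df.assign(tmp=df["id"].notna())
      # Stable sort: rows without an id first, rows with an id last,
      # original order preserved within each half.
      .sort_values("tmp", kind="stable")
      # The last row of each group is now a row with an id whenever one exists.
      .drop_duplicates(subset=["type", "name"], keep="last")
      .sort_index()           # back to the original row order
      .drop(columns="tmp")    # remove the helper column
)
print(kept)
```

The stable sort matters: without it, the relative order of rows inside each half is not guaranteed, so "last" would no longer mean the last row in the original frame.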
