Replace column values after removing duplicates

Question

I have a DataFrame:

id    time
Uk6   year
36h   year
Uk6   two-year
rf5   month
gg7   year
rf5   half-year

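I need to drop duplicate rows based on the "id" column and, for every id that had duplicates, replace the value in "time" with "unknown". The result should be: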

id    time
Uk6   unknown
36h   year
rf5   unknown
gg7   year

I tried the answers suggested for an earlier question (like_this), but they did not work.

Tags: python, pandas, dataframe, duplicates

Answer


Try the following:

import pandas as pd

# create the dataframe
df = pd.DataFrame(data={'id': ['Uk6', '36h', 'Uk6', 'rf5', 'gg7', 'rf5'],
                        'time': ['year', 'year', 'two-year', 'month', 'year', 'half-year']})

# collect the ids that occur more than once (duplicated() flags the second and later occurrences)
dups_id = df[df.duplicated(subset='id')]['id']

# keep only the first row for each id
df = df.drop_duplicates(subset='id')

# set 'time' to 'unknown' for the remaining rows whose id had duplicates
df.loc[:, 'time'] = df['time'].where(~df['id'].isin(dups_id), other='unknown')

Output:

    id     time
0  Uk6  unknown
1  36h     year
3  rf5  unknown
4  gg7     year
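
As an alternative, the same result can probably be reached without collecting the duplicated ids first. The sketch below is not part of the original answer: it assumes the same sample data, and the names `is_repeated` and `result` are chosen here purely for illustration. It uses `duplicated(keep=False)` to flag every row whose id appears more than once, overwrites "time" on those rows with `Series.mask`, and only then drops the duplicates.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["Uk6", "36h", "Uk6", "rf5", "gg7", "rf5"],
    "time": ["year", "year", "two-year", "month", "year", "half-year"],
})

# keep=False flags every row whose id occurs more than once,
# including the first occurrence
is_repeated = df["id"].duplicated(keep=False)

# overwrite 'time' on the flagged rows, then keep the first row per id
result = (
    df.assign(time=df["time"].mask(is_repeated, "unknown"))
      .drop_duplicates(subset="id")
)
print(result)
```

Because the replacement happens before the rows are dropped, this version builds a new frame instead of assigning into the filtered one. The original index labels are kept, as in the output above; appending `.reset_index(drop=True)` would renumber them if that is preferred.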
