How to find unique records by row count in Python?

Problem description

df:

  Country  state      item 
0 Germany  Augsburg   Car
1 Spain    Madrid     Bike
2 Italy    Milan      Steel
3 Paris    Lyon       Bike
4 Italy    Milan      Steel
5 Germany  Augsburg   Car

In the DataFrame above, counting how many times each unique record has appeared so far gives:

  Country  state      item  Appeared
0 Germany  Augsburg   Car     1
1 Spain    Madrid     Bike    1
2 Italy    Milan      Steel   1
3 Paris    Lyon       Bike    1
4 Italy    Milan      Steel   2
5 Germany  Augsburg   Car     2

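Rows 4 and 5 are records appearing for the second time, so I want to change their item names to tell the two occurrences apart. If a record appears more than once in the data, its item name should be renamed to Item_A for the first appearance, Item_B for the second appearance, and so on. Expected output:

  Country  state      item     Appeared
0 Germany  Augsburg   Car_A    1
1 Spain    Madrid     Bike     1
2 Italy    Milan      Steel_A  1
3 Paris    Lyon       Bike     1
4 Italy    Milan      Steel_B  2
5 Germany  Augsburg   Car_B    2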

Tags: python, python-3.x, pandas, python-2.7

Solution


You can first compute Appeared with groupby().cumcount(), then add the suffixes:

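import numpy as np

# df is the DataFrame from the question (columns: Country, state, item)
# mark every row that occurs more than once (all occurrences, not only the repeats);
# this must run before the Appeared column is added
duplicates = df.duplicated(keep=False)

# appearance count: 1 for the first occurrence of a row, 2 for the second, ...
df['Appeared'] = df.groupby([*df]).cumcount().add(1)

# add the suffixes to the duplicated rows only ('ABC' covers up to three appearances)
suffixes = np.array(list('ABC'))
df.loc[duplicates, 'item'] = df['item'] + '_' + suffixes[df['Appeared'] - 1]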

Output:

   Country     state     item  Appeared
0  Germany  Augsburg    Car_A         1
1    Spain    Madrid     Bike         1
2    Italy     Milan  Steel_A         1
3    Paris      Lyon     Bike         1
4    Italy     Milan  Steel_B         2
5  Germany  Augsburg    Car_B         2
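
For reference, the same steps can be wrapped into a self-contained script that rebuilds the sample data from the question. This is only a sketch under a few assumptions: the function name add_appearance_suffix and its column parameter are hypothetical, and string.ascii_uppercase is swapped in for the 'ABC' array so that up to 26 appearances get a letter (more than that would raise an IndexError).

```python
import string

import pandas as pd


def add_appearance_suffix(df, column="item"):
    """Hypothetical helper: return a copy with an Appeared count and
    _A/_B/... suffixes on rows that occur more than once."""
    out = df.copy()
    key_columns = list(df.columns)
    # True for every row that has at least one identical copy elsewhere
    repeated = out.duplicated(subset=key_columns, keep=False)
    # 1 for the first occurrence of a row, 2 for the second, ...
    out["Appeared"] = out.groupby(key_columns).cumcount() + 1
    # Map 1 -> "A", 2 -> "B", ... (assumption: at most 26 appearances)
    letters = out["Appeared"].map(lambda n: string.ascii_uppercase[n - 1])
    # Only the repeated rows get a suffix; unique rows keep their item name
    out.loc[repeated, column] = out.loc[repeated, column] + "_" + letters[repeated]
    return out


# Sample data copied from the question
df = pd.DataFrame({
    "Country": ["Germany", "Spain", "Italy", "Paris", "Italy", "Germany"],
    "state": ["Augsburg", "Madrid", "Milan", "Lyon", "Milan", "Augsburg"],
    "item": ["Car", "Bike", "Steel", "Bike", "Steel", "Car"],
})

result = add_appearance_suffix(df)
print(result)
```

Working on a copy leaves the original df untouched, which makes it easier to compare the data before and after the renaming.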
