Count and assign categories based on majority voting

Problem description

I have a pandas dataframe in the following format:

 Class   Category
 XYZ     ABC
 XYZ     ABC
 XYZ     DEF
 XYZ1    ABC
 XYZ1    ABC
 XYZ1    ABC
 XYZ1    HLR
 XYZ2    ABC

For every unique class that has duplicate rows, I want to assign the category chosen by "majority voting". For example, for "XYZ" the category should be "ABC". For "XYZ1" the category should also be "ABC", because "HLR" appears only once. If there are no discrepancies, it's straightforward (for "XYZ2" it would be "ABC").

Is there a way to achieve this without storing the value counts in a table and then looping over it to group by class and assign categories based on majority voting?

Any leads would be appreciated.

Tags: pandas, group-by

Solution


Try using mode:

from statistics import mode
# Select the 'Category' column so that transform returns a Series aligned with df
df['New_Category'] = df.groupby('Class')['Category'].transform(mode)

Output:
  Class Category New_Category
0   XYZ      ABC          ABC
1   XYZ      ABC          ABC
2   XYZ      DEF          ABC
3  XYZ1      ABC          ABC
4  XYZ1      ABC          ABC
5  XYZ1      ABC          ABC
6  XYZ1      HLR          ABC
7  XYZ2      ABC          ABC
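
For reference, the same majority vote can probably be written with pandas alone, without importing statistics. The sketch below rebuilds the sample frame from the question. The helper name majority and the column name Majority_Category are made up for illustration and are not from the original answer. Tie handling is not addressed in the question, so this simply takes whichever top label pandas returns first.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Class": ["XYZ", "XYZ", "XYZ", "XYZ1", "XYZ1", "XYZ1", "XYZ1", "XYZ2"],
    "Category": ["ABC", "ABC", "DEF", "ABC", "ABC", "ABC", "HLR", "ABC"],
})


def majority(values: pd.Series) -> str:
    """Return the most frequent label in one group (hypothetical helper)."""
    # value_counts() counts each label; idxmax() picks the label with the top count
    return values.value_counts().idxmax()


# Broadcast each group's majority label back onto every row of that group
df["Majority_Category"] = df.groupby("Class")["Category"].transform(majority)

# Alternatively, one row per class instead of a per-row column
majority_by_class = df.groupby("Class")["Category"].agg(majority)

print(df)
print(majority_by_class)
```

Use transform when the result should line up with the original rows, and agg when a compact Class-to-Category lookup is enough.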
