Rename category of a categorical variable that's less than 0.5% of the mode count, Value_counts()

Problem description

I have a very large df with many rows and columns. For a categorical variable, I want to rename a category as "other" if its count is less than 0.5% of the count of the mode (the most frequent category).

I know df[colname].value_counts(normalize=True) gives me the distribution of all categories. How do I extract the ones below 0.5% of the mode, and how do I rename them as "other"?

  apple
large 100
medium 50
small  3

Desired output:

  apple
large 100
medium 50
other  3

Tags: python, pandas, dataframe

Solution


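Use Series.map with Series.value_counts, and compare against the threshold with Series.lt, to build a boolean mask with the same length as the original column; then set the new value with Series.mask. Note that value_counts(normalize=True) returns each category's share of all rows, so lt(0.005) flags categories below 0.5% of the total row count, not below 0.5% of the mode's count.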

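# True for rows whose category makes up less than 0.5% of all rows
m = df['apple'].map(df['apple'].value_counts(normalize=True).lt(0.005))
# Replace the flagged rows with 'other'
df['apple'] = df['apple'].mask(m, 'other')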
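The question asks for a cutoff relative to the mode's count, whereas normalize=True measures each category's share of all rows. Below is a minimal sketch of a mode-relative variant, assuming a plain object/string column. The helper name rename_rare_vs_mode and the sample data are hypothetical, and the rare index it computes is the "extract" step the question asks about:

```python
import pandas as pd

def rename_rare_vs_mode(s: pd.Series, frac: float = 0.005, other: str = "other") -> pd.Series:
    """Hypothetical helper: replace categories whose count is below frac * (count of the mode)."""
    counts = s.value_counts()                  # rows per category
    threshold = frac * counts.max()            # counts.max() is the mode's count
    rare = counts[counts < threshold].index    # the categories to collapse
    return s.mask(s.isin(rare), other)         # rows in a rare category become `other`

# Hypothetical data: 'small' has 3 rows, i.e. 0.3% of the mode's 1000 rows
df = pd.DataFrame({"apple": ["large"] * 1000 + ["medium"] * 500 + ["small"] * 3})
df["apple"] = rename_rare_vs_mode(df["apple"])
print(df["apple"].value_counts())
```

If the column uses pandas' category dtype, "other" may need to be added to its categories first (for example with Series.cat.add_categories) before it can be assigned.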

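To check the counts (the output below is the desired result from the question; with exactly 100/50/3 rows, small is about 2% of all rows, so a 0.005 cutoff would leave it unchanged):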

s = df['apple'].value_counts()
print(s)
large     100
medium     50
other       3
Name: apple, dtype: int64
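A quick arithmetic check on the question's own counts shows what each reading of the 0.5% rule flags. The Series below is rebuilt by hand from the example, and the column names in summary are made up for illustration:

```python
import pandas as pd

# Counts copied from the question's example
counts = pd.Series({"large": 100, "medium": 50, "small": 3}, name="apple")

share_of_total = counts / counts.sum()  # what value_counts(normalize=True) reports
share_of_mode = counts / counts.max()   # what the question actually asks about

summary = pd.DataFrame({
    "count": counts,
    "share_of_total": share_of_total.round(4),
    "share_of_mode": share_of_mode.round(4),
    "below_0.5%_of_total": share_of_total.lt(0.005),
    "below_0.5%_of_mode": share_of_mode.lt(0.005),
})
print(summary)
```

Neither ratio puts small below 0.005 (it is about 2% of all rows and 3% of the mode), so with these exact numbers the cutoff would have to be larger for small to become other.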
