python - Rename category of a categorical variable that's less than 0.5% of the mode count, Value_counts()
Problem description
I have a very large DataFrame with many rows and columns. I want to rename a category of a categorical variable to "other" if its count is less than 0.5% of the count of the mode.
I know that df[colname].value_counts(normalize=True) gives me the distribution of all categories. How do I extract the categories that are below 0.5% of the mode, and how do I rename them to "other"?
apple
large 100
medium 50
small 3
desired output
apple
large 100
medium 50
other 3
Solution
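Use Series.map with Series.value_counts, and compare with "less than" via Series.lt. The resulting mask has the same size as the original column, so you can set the new value with Series.mask. Note that normalize=True returns each category's share of all rows, so the 0.005 threshold below is relative to the total row count, not to the count of the mode: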
m = df['apple'].map(df['apple'].value_counts(normalize=True).lt(0.005))
df['apple'] = df['apple'].mask(m, 'other')
To check the counts:
s = df['apple'].value_counts()
print(s)
large 100
medium 50
other 3
Name: apple, dtype: int64
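The question asks for a threshold relative to the count of the mode rather than to the total number of rows. A minimal sketch of that variant follows, assuming the column has object (string) dtype and not pandas' category dtype. The helper name lump_rare, its parameters, and the sample data are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical sample data: 'small' is far rarer than the mode 'large'.
df = pd.DataFrame({"apple": ["large"] * 1000 + ["medium"] * 500 + ["small"] * 3})


def lump_rare(col: pd.Series, frac: float = 0.005, label: str = "other") -> pd.Series:
    """Replace categories whose count is below `frac` times the mode's count."""
    counts = col.value_counts()
    # counts.max() is the count of the most frequent category (the mode).
    rare = counts.index[counts.lt(frac * counts.max())]
    # isin() yields a boolean mask aligned with the column, one flag per row.
    return col.mask(col.isin(rare), label)


# Apply to every column that should be treated this way (only one here).
for name in ["apple"]:
    df[name] = lump_rare(df[name])

print(df["apple"].value_counts())
```

Here the threshold is 0.005 * 1000 = 5, so 'small' (3 rows) is renamed while 'medium' (500 rows) is kept. If the column uses the category dtype, the new label would probably need to be added first (for example with Series.cat.add_categories) before masking.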