Count categories and apply listwise deletion if a condition is not met

Problem description

Given:

import pandas as pd

lis1= ('apple','orange','strawberry','strawberry','strawberry','apple','orange','orange','orange','strawberry')
lis2= ("lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review")

df = pd.DataFrame({'category':lis1, 'review': lis2})
df

     category              review
0       apple  lorem ipsum review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
5       apple  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review

Desired result:

lis1= ('orange','strawberry','strawberry','strawberry','orange','orange','orange','strawberry')
lis2= ("lorem ipsum review","lorem ipsum review", "lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review","lorem ipsum review")

pd.DataFrame({'category':lis1, 'review': lis2})

     category              review
0      orange  lorem ipsum review
1  strawberry  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4      orange  lorem ipsum review
5      orange  lorem ipsum review
6      orange  lorem ipsum review
7  strawberry  lorem ipsum review

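I need code that counts the unique categories (nunique()) and drops every category that occurs fewer than 3 times. In the example, apple is the only category that occurs just twice, so its rows are removed (listwise deletion).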

Tags: python, pandas, dataframe

Solution


You can filter on the result of groupby and transform:

df[df.groupby('category')['category'].transform('count').gt(2)]

     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
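
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names counts, mask and result are only illustrative, and the DataFrame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'category': ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
                 'apple', 'orange', 'orange', 'orange', 'strawberry'],
    'review': ['lorem ipsum review'] * 10,
})

# For every row, the size of the group that row belongs to.
# transform() returns a Series aligned with df's index.
counts = df.groupby('category')['category'].transform('count')

# True where the row's category occurs more than 2 times (i.e. at least 3).
mask = counts.gt(2)

# Boolean indexing keeps only the rows where mask is True.
result = df[mask]
print(result)
```

Because transform returns one value per original row, the mask can be used directly for boolean indexing without any merge or join.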

Another solution is value_counts + map:

df[df.category.map(df['category'].value_counts()).gt(2)]

     category              review
1      orange  lorem ipsum review
2  strawberry  lorem ipsum review
3  strawberry  lorem ipsum review
4  strawberry  lorem ipsum review
6      orange  lorem ipsum review
7      orange  lorem ipsum review
8      orange  lorem ipsum review
9  strawberry  lorem ipsum review
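
Both outputs above keep the original row labels (1, 2, 3, 4, 6, ...), whereas the desired result in the question is numbered 0 to 7. If the renumbered index matters, one possible approach is to add reset_index(drop=True) after filtering. The sketch below shows this with the value_counts + map variant; the names freq, kept and result are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'category': ['apple', 'orange', 'strawberry', 'strawberry', 'strawberry',
                 'apple', 'orange', 'orange', 'orange', 'strawberry'],
    'review': ['lorem ipsum review'] * 10,
})

# Occurrences per category, indexed by category name.
freq = df['category'].value_counts()

# Map each row's category to its frequency, then keep rows with more than 2.
kept = df[df['category'].map(freq).gt(2)]

# Discard the old row labels and renumber from 0.
result = kept.reset_index(drop=True)
print(result)
```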
