How to group items into buckets of 1-10?

Problem description

I am testing a very basic line of code.

modDF['RatingDecile'] = pd.cut(modDF['RatingScore'], 10)

This gives me ranges of rating scores in 10 buckets. Instead of the range, how can I see 1, 2, 3, etc., up to 10?

So, instead of this.

      Score RatingQuantile  
0     (26.3, 29.0]  
6     (23.6, 26.3]  
7     (23.6, 26.3]  
8     (26.3, 29.0]  
10    (18.2, 20.9]  
       ...       ...  
9763  (23.6, 26.3]  
9769  (20.9, 23.6]  
9829  (20.9, 23.6]  
9889  (23.6, 26.3]  
9949  (20.9, 23.6] 

How can I get something like this?

      Score RatingQuantile  
0     10  
6     8 
7     8 
8     10  
10    6  
       ...      ...  
9763  8  
9769  5  
9829  5 
9889  5  
9949  5 

I tried this.

modDF['DecileRank'] = pd.qcut(modDF['RatingScore'],10,labels=False)

I got this error.

ValueError: Bin edges must be unique: array([ 2., 20., 25., 27., 27., 27., 27., 27., 27., 27., 29.]).
You can drop duplicate edges by setting the 'duplicates' kwarg

The error makes sense to me. I just don't know the work-around for this issue. Thoughts?

Tags: python, python-3.x, pandas, dataframe

Solution


I have no problem with qcut() when I pass it a Series. I am assuming your data looks like the data I am using here.

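import pandas as pd
import numpy as np

# Random integer scores in [1, 29] as stand-in data
data = {'values': np.random.randint(1, 30, size=1000)}
df = pd.DataFrame(data)
# labels=False returns the integer bin index (0 to 9) instead of the interval
df['ranks'] = pd.qcut(df['values'], 10, labels=False)
print(df)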

Output:

     values  ranks
0        18      5
1        22      7
2         5      1
3        12      3
4        14      4
..      ...    ...
995      22      7
996      13      4
997      26      8
998       3      0
999      22      7
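
Note that the ranks above run from 0 to 9, so adding 1 gives the 1 to 10 numbering asked for. The ValueError in the question appears when many scores are tied and several quantile edges coincide. Below is a minimal sketch of three possible workarounds. The `scores` data is made up to force heavy ties, and the column names `WidthBucket`, `DecileDropped` and `DecileRanked` are hypothetical. Which option fits depends on whether equal-width or equal-sized buckets are wanted.

```python
import pandas as pd

# Hypothetical data with many ties, so several decile edges coincide
scores = pd.Series([2, 20, 25] + [27] * 15 + [29, 29])
df = pd.DataFrame({"RatingScore": scores})

# Option 1: equal-width bins (pd.cut), renumbered from 1 to 10
df["WidthBucket"] = pd.cut(df["RatingScore"], 10, labels=False) + 1

# Option 2: quantile bins with duplicate edges dropped;
# this can leave fewer than 10 distinct buckets
df["DecileDropped"] = pd.qcut(
    df["RatingScore"], 10, labels=False, duplicates="drop"
) + 1

# Option 3: rank first so ties are broken by order of appearance,
# then cut the ranks into 10 equal-sized buckets
df["DecileRanked"] = pd.qcut(
    df["RatingScore"].rank(method="first"), 10, labels=False
) + 1

print(df)
```

Option 3 always produces ten buckets, but identical scores can end up in different buckets, which may or may not be acceptable for a rating decile.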

Afterwards, you can use groupby() or other Series functions to run simple checks, such as inspecting the limits of each bin:

df_info = df.groupby('ranks').agg(
        min_score=pd.NamedAgg(column='values',aggfunc='min'),
        max_score=pd.NamedAgg(column='values',aggfunc='max'),
        count_cases=pd.NamedAgg(column='values',aggfunc='count'))
print(df_info)

Output:

       min_score  max_score  count_cases
ranks                                   
0              1          3          137
1              4          5           72
2              6          8          105
3              9         11           96
4             12         14           98
5             15         17          107
6             18         20           91
7             21         23           99
8             24         27          121
9             28         29           74
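
As an alternative way to inspect the bin limits, qcut() can return the edges it computed through its `retbins` parameter. The sketch below assumes the same kind of random stand-in data as above; the seed and the `edges` variable name are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded random scores in [1, 29] so the run is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame({"values": rng.integers(1, 30, size=1000)})

# retbins=True returns the bin codes together with the computed edges
df["ranks"], edges = pd.qcut(df["values"], 10, labels=False, retbins=True)

# 11 edges delimit the 10 buckets
print(edges)
```

The groupby() summary shows the smallest and largest value that actually landed in each bucket, while `edges` shows the cut points themselves.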
