
Binning with pd.cut beyond range (replacing NaN with "Max_val")

Problem description

import pandas as pd

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
df['range'] = pd.cut(df.days, [0, 30, 60])
df

The code above uses pd.cut to convert a numerical column into a categorical column. pd.cut assigns categories according to the list of bin edges passed in, here [0, 30, 60]. Rows 0, 5 and 6 are categorized as NaN because their values fall outside the bins defined by [0, 30, 60]. What I want is for 0 to be categorized as <0, and for 70 and 80 both to be categorized as >60. If possible, I would also like dynamic text labels (A, B, C, D, E) depending on the number of categories created.

Tags: python-3.x, pandas, binning

Solution


For the first part, adding -np.inf and np.inf as the outermost bin edges ensures that every non-missing value falls into a bin (this needs import numpy as np):

In [5]: df= pd.DataFrame({'days': [0,31,45,35,19,70,80]})
   ...: df['range'] = pd.cut(df.days, [-np.inf, 0, 30, 60, np.inf])
   ...: df
   ...:
Out[5]:
   days         range
0     0   (-inf, 0.0]
1    31  (30.0, 60.0]
2    45  (30.0, 60.0]
3    35  (30.0, 60.0]
4    19   (0.0, 30.0]
5    70   (60.0, inf]
6    80   (60.0, inf]
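
If you would rather see readable text such as ">60" instead of interval objects, one option is to pass the labels argument of pd.cut. This is only a sketch: the label strings below are my own choice, not something from the question, and there must be exactly one label per bin.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"days": [0, 31, 45, 35, 19, 70, 80]})

# Open-ended edges so that values at or below 0 and above 60 still get a bin.
edges = [-np.inf, 0, 30, 60, np.inf]

# Hypothetical label text, one entry per interval (len(edges) - 1 of them).
# Intervals are closed on the right by default: (-inf, 0], (0, 30], (30, 60], (60, inf].
labels = ["<=0", "0-30", "30-60", ">60"]

df["range"] = pd.cut(df["days"], bins=edges, labels=labels)
print(df)
```

Because the intervals are closed on the right, the value 0 lands in the first bin, which is why that label reads "<=0" rather than "<0".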

For the second, you can use .cat.codes to get the bin index and do some tweaking from there:

In [8]: df['range'].cat.codes.apply(lambda x: chr(x + ord('A')))
Out[8]:
0    A
1    C
2    C
3    C
4    B
5    D
6    D
dtype: object
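
An alternative sketch for the letter labels is to build them up front from the number of bins and hand them to pd.cut directly. The variable names here are made up for illustration, and the approach assumes no more than 26 bins.

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({"days": [0, 31, 45, 35, 19, 70, 80]})
edges = [-np.inf, 0, 30, 60, np.inf]

# One uppercase letter per interval; the number of intervals is len(edges) - 1.
# Assumption: at most 26 bins, otherwise the alphabet runs out.
letters = list(string.ascii_uppercase[: len(edges) - 1])

df["label"] = pd.cut(df["days"], bins=edges, labels=letters)
print(df)
```

One difference from the .cat.codes approach: a missing value has code -1, so chr(x + ord('A')) would turn it into '@', whereas with labels it simply stays NaN.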
