How to bin an age column separately for each category

Problem description

import numpy as np
import pandas as pd

# Random sample data; column names match the table and the solution below
df = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C', 'D'], 20),
    'Age': np.random.choice([12, 15, 17, 95, 13], 20)
})

Category Age
A        12
A        95
B        17
B        14
D        12
C        14
B        16

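I want to bin the age values separately for each category. For example, for category A, take its minimum and maximum age and compute the bins from that range. How can I compute the bins for each category? For the whole column I currently use this line:

bins = np.linspace(df[col_name].min(), df[col_name].max(), 11)

and then group like this:

grp = df.groupby(pd.cut(df[col_name], bins))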
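For reference, the whole-column approach described in the question can be written as the runnable sketch below. It reuses the sample rows from the table above. Setting `col_name` to 'Age' is an assumption, and `include_lowest=True` and `observed=False` are additions so that the minimum age is kept and empty bins still appear in the counts.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'D', 'C', 'B'],
    'Age': [12, 95, 17, 14, 12, 14, 16],
})

col_name = 'Age'  # assumption: the column being binned

# 11 evenly spaced edges over the whole column give 10 bins
bins = np.linspace(df[col_name].min(), df[col_name].max(), 11)

# include_lowest=True keeps the minimum value inside the first bin
whole_column_bins = pd.cut(df[col_name], bins, include_lowest=True)

# observed=False keeps empty bins in the result
counts = df.groupby(whole_column_bins, observed=False).size()
print(counts)
```

On this sample, six of the seven ages fall into the first bin because the single value 95 stretches the range. This is why computing the bin edges per category is more useful here.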

Tags: python, python-3.x, pandas, data-analysis

Solution


A first approach could be:

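def bin_age(sr, num=11):
    # Evenly spaced edges between this category's minimum and maximum age.
    # np.unique collapses them to a single edge when min == max, which
    # avoids pd.cut's duplicate-edge error.
    edges = np.unique(np.linspace(sr.min(), sr.max(), num))
    # Open-ended outer bins so every value falls inside some interval
    bins = [-np.inf, *edges, np.inf]
    return pd.cut(sr, bins=bins, include_lowest=True)

# group_keys=False keeps the original row index so the result aligns with df
df['Bins'] = df.groupby('Category', group_keys=False)['Age'].apply(bin_age)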

Output:

>>> df
  Category  Age           Bins
0        A   12   (-inf, 12.0]
1        A   95   (86.7, 95.0]
2        B   17   (16.7, 17.0]
3        B   14   (-inf, 14.0]
4        D   12   (-inf, 12.0]
5        C   14   (-inf, 14.0]
6        B   16   (15.8, 16.1]
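If integer bin positions are more convenient than interval labels, a variation on the approach above might look like the sketch below. The helper name `bin_index` and the column name `BinIndex` are hypothetical. The sample frame is copied from the question's table, and position 0 is the open-ended bin that ends at the category's minimum age.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'D', 'C', 'B'],
    'Age': [12, 95, 17, 14, 12, 14, 16],
})

def bin_index(sr, num=11):
    # Hypothetical helper: position of each age among its category's bins
    edges = np.unique(np.linspace(sr.min(), sr.max(), num))
    bins = [-np.inf, *edges, np.inf]
    # labels=False returns integer bin positions instead of Interval labels
    return pd.cut(sr, bins=bins, labels=False, include_lowest=True)

# transform returns a result aligned with the original row index
df['BinIndex'] = df.groupby('Category')['Age'].transform(bin_index)
print(df)
```

Because the edges are computed per category, the same index can correspond to different age ranges in different categories.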
