How to make a new dataframe to store the average values of the original dataframe's columns' bins?

Problem description

Say I have a dataframe, df:

>>> df

Age    Score
19     1
20     2
24     3
19     2
24     3
24     1
24     3
20     1
19     1
20     3
22     2
22     1

I want to construct a new dataframe that bins Age and stores the average Score of each bin:

Age       Score
19-21     1.6667
22-24     2.1667

This is my way of doing it, which I feel is kind of convoluted:

import numpy as np
import pandas as pd

data = pd.DataFrame(columns=['Age', 'Score'])
data['Age'] = [19,20,24,19,24,24,24,20,19,20,22,22]
data['Score'] = [1,2,3,2,3,1,3,1,1,3,2,1]

_, bins = np.histogram(data['Age'], 2)

# Compare against the middle edge itself (21.5 here) so that no age is dropped between the two halves
df1 = data[data['Age'] < bins[1]]
df2 = data[data['Age'] >= bins[1]]

new_df = pd.DataFrame(columns=['Age', 'Score'])
new_df['Age'] = [str(int(bins[0]))+'-'+str(int(bins[1])), str(int(bins[1]))+'-'+str(int(bins[2]))]
new_df['Score'] = [np.mean(df1.Score), np.mean(df2.Score)]

Apart from being lengthy, this way doesn't scale well for more bins (as we'd need to write each entry for each bin in new_df).

Is there a more efficient, clean way of doing this?

Tags: python, pandas, dataframe, grouping, binning

Solution


Use cut to convert the Age values into discrete intervals, then aggregate with mean:

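bins = [19, 21, 24]
# Build the labels dynamically: each bin starts one above the previous edge...
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
# ...except the first bin, which includes its own lower edge
labels[0] = '{}-{}'.format(bins[0], bins[1])
print(labels)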
['19-21', '22-24']

# include_lowest=True closes the first interval on the left, so Age == 19 is kept
binned = pd.cut(data['Age'], bins=bins, labels=labels, include_lowest=True)
df = data.groupby(binned)['Score'].mean().reset_index()
print(df)
     Age     Score
0  19-21  1.666667
1  22-24  2.166667
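
To address the scaling concern from the question, the same steps can be wrapped in a small function that accepts any list of integer bin edges. This is only a sketch: the helper name mean_by_bins and its parameters are made up for illustration and are not part of pandas, and it assumes the edges are sorted integers. Passing observed=False explicitly is also an assumption about the desired behavior: it keeps empty bins in the result (with a NaN mean) and avoids the warning that recent pandas versions emit when grouping by a categorical without it.

```python
import pandas as pd


def mean_by_bins(frame, value_col, target_col, edges):
    """Hypothetical helper: average target_col within integer bins of value_col."""
    # The first bin includes its lower edge; every later bin starts one above
    # the previous edge, e.g. edges [19, 21, 24] give '19-21' and '22-24'.
    labels = ['{}-{}'.format(edges[0], edges[1])] + [
        '{}-{}'.format(lo + 1, hi) for lo, hi in zip(edges[1:-1], edges[2:])
    ]
    binned = pd.cut(frame[value_col], bins=edges, labels=labels, include_lowest=True)
    # observed=False keeps bins that contain no rows (their mean is NaN)
    return frame.groupby(binned, observed=False)[target_col].mean().reset_index()


data = pd.DataFrame({
    'Age': [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
    'Score': [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1],
})

result = mean_by_bins(data, 'Age', 'Score', [19, 21, 24])
print(result)
```

Adding more bins then only means passing a longer list of edges, for example [19, 20, 22, 24], with no per-bin code to write.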
