python - How to make a new dataframe to store the average values of the original dataframe's columns' bins?
Problem description
Say I have a dataframe, df:
>>> df
Age Score
19 1
20 2
24 3
19 2
24 3
24 1
24 3
20 1
19 1
20 3
22 2
22 1
I want to construct a new dataframe that bins Age and stores the average Score of each bin in Score:
Age Score
19-21 1.6667
22-24 2.1667
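For reference, each target value is simply the arithmetic mean of the Score values whose Age falls in that bin (six rows per bin in this sample):

```latex
\bar{s}_{19\text{-}21} = \frac{1 + 2 + 2 + 1 + 1 + 3}{6} = \frac{10}{6} \approx 1.6667
\qquad
\bar{s}_{22\text{-}24} = \frac{3 + 3 + 1 + 3 + 2 + 1}{6} = \frac{13}{6} \approx 2.1667
```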
This is my way of doing it, which I feel is kind of convoluted:
import numpy as np
import pandas as pd
data = pd.DataFrame(columns=['Age', 'Score'])
data['Age'] = [19,20,24,19,24,24,24,20,19,20,22,22]
data['Score'] = [1,2,3,2,3,1,3,1,1,3,2,1]
_, bins = np.histogram(data['Age'], 2)
df1 = data[data['Age']<int(bins[1])]
df2 = data[data['Age']>int(bins[1])]
new_df = pd.DataFrame(columns=['Age', 'Score'])
new_df['Age'] = [str(int(bins[0]))+'-'+str(int(bins[1])), str(int(bins[1]))+'-'+str(int(bins[2]))]
new_df['Score'] = [np.mean(df1.Score), np.mean(df2.Score)]
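A small sketch of what the code above actually computes, assuming the same sample data. With 2 bins, np.histogram splits [19, 24] into edges 19.0, 21.5 and 24.0; int() truncates 21.5 to 21, so the second label comes out as 21-24 rather than the 22-24 in the desired output. Because both filters are strict (< and >), a row with Age == 21 would fall into neither subset. The extra Age == 21 row below is hypothetical, added only to expose that gap.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
data = pd.DataFrame({'Age': [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
                     'Score': [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1]})

# np.histogram with 2 bins splits [min, max] = [19, 24] into equal widths,
# so the edges are 19.0, 21.5 and 24.0
_, bins = np.histogram(data['Age'], 2)

# int() truncates 21.5 to 21, so the second label starts at 21, not 22
cut_point = int(bins[1])
labels = [f'{int(bins[0])}-{cut_point}', f'{cut_point}-{int(bins[2])}']
print(labels)  # ['19-21', '21-24']

# Hypothetical extra row with Age == 21 (not in the question's data);
# min and max stay 19 and 24, so the edges would be unchanged
extra = pd.concat([data, pd.DataFrame({'Age': [21], 'Score': [2]})],
                  ignore_index=True)
left = extra[extra['Age'] < cut_point]
right = extra[extra['Age'] > cut_point]
# Strict < and > mean the Age == 21 row lands in neither subset
print(len(extra), len(left) + len(right))  # 13 12
```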
Apart from being lengthy, this approach doesn't scale well to more bins, since we'd need to write out an entry for each bin in new_df by hand.
Is there a more efficient, clean way of doing this?
Solution
Use cut to convert the Age values into discrete intervals (bins), then aggregate with mean:
bins = [19, 21, 24]
# dynamically create labels: for integer ages, interval (i, j] covers i+1 through j
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
# include_lowest=True closes the first bin on the left, so it starts at bins[0]
labels[0] = '{}-{}'.format(bins[0], bins[1])
print(labels)
# ['19-21', '22-24']
binned = pd.cut(data['Age'], bins=bins, labels=labels, include_lowest=True)
df = data.groupby(binned, observed=False)['Score'].mean().reset_index()
print(df)
#      Age     Score
# 0  19-21  1.666667
# 1  22-24  2.166667
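Because the labels are derived from the edge list, this approach scales to any number of bins. A minimal sketch wrapping it in a reusable function; the name bin_means and its parameters are hypothetical, and it assumes sorted integer edges and integer ages (so the label a+1-b matches the interval (a, b]). Passing observed=False keeps empty bins in the result and avoids the deprecation warning recent pandas versions emit when grouping by a categorical without it.

```python
import pandas as pd


def bin_means(df, col, value_col, edges):
    """Hypothetical helper: average value_col within integer bins of col.

    edges must be sorted integers. Each bin is (a, b], labelled a+1-b,
    except the first, which include_lowest=True closes on the left.
    """
    labels = [f'{a + 1}-{b}' for a, b in zip(edges[:-1], edges[1:])]
    labels[0] = f'{edges[0]}-{edges[1]}'
    binned = pd.cut(df[col], bins=edges, labels=labels, include_lowest=True)
    # observed=False keeps every label, even bins with no rows
    return (df.groupby(binned, observed=False)[value_col]
              .mean()
              .reset_index())


data = pd.DataFrame({'Age': [19, 20, 24, 19, 24, 24, 24, 20, 19, 20, 22, 22],
                     'Score': [1, 2, 3, 2, 3, 1, 3, 1, 1, 3, 2, 1]})

# Two bins, as in the question
two = bin_means(data, 'Age', 'Score', [19, 21, 24])
print(two)

# Three bins: only the edge list changes
three = bin_means(data, 'Age', 'Score', [19, 20, 22, 24])
print(three)
```

With [19, 20, 22, 24] the result has three rows (19-20, 21-22, 23-24) with means of about 1.666667, 1.5 and 2.5, and no other code changes.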