python - Extracting the mean value from a column of categorical dtype
Problem description
For example, given this table:
import pandas as pd

# Each row: id, whether an order was placed, hours spent
list_1=[['1','y',474.0],
['2','n',482.0],
['3','n',564.0],
['4','y',549.0],
['5','y',551.0],
['6','y',555.0],
['7','n',600.0],
['8','y',357.0],
['9','y',542.0],
['10','n',462.0],
['11','n',513.0],
['12','y',526.0]]
labels=['id','order_?','hours_spend']
df=pd.DataFrame(list_1,columns=labels)
df
Result:
id order_? hours_spend
0 1 y 474.0
1 2 n 482.0
2 3 n 564.0
3 4 y 549.0
4 5 y 551.0
5 6 y 555.0
6 7 n 600.0
7 8 y 357.0
8 9 y 542.0
9 10 n 462.0
10 11 n 513.0
11 12 y 526.0
I split the hours_spend column into 3 groups, NTILE-style, using pd.qcut without labels:
df['ntile']=pd.qcut(df['hours_spend'],3)
df
Result:
id order_? hours_spend ntile
0 1 y 474.0 (356.999, 502.667]
1 2 n 482.0 (356.999, 502.667]
2 3 n 564.0 (549.667, 600.0]
3 4 y 549.0 (502.667, 549.667]
4 5 y 551.0 (549.667, 600.0]
5 6 y 555.0 (549.667, 600.0]
6 7 n 600.0 (549.667, 600.0]
7 8 y 357.0 (356.999, 502.667]
8 9 y 542.0 (502.667, 549.667]
9 10 n 462.0 (356.999, 502.667]
10 11 n 513.0 (502.667, 549.667]
11 12 y 526.0 (502.667, 549.667]
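To see where the interval boundaries above come from, here is a minimal sketch that asks pd.qcut for the bin edges as well. The variable names (hours, edges, counts) are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Same hours_spend values as in the table above
hours = pd.Series([474.0, 482.0, 564.0, 549.0, 551.0, 555.0,
                   600.0, 357.0, 542.0, 462.0, 513.0, 526.0])

# retbins=True additionally returns the numeric edges of the quantile bins
ntile, edges = pd.qcut(hours, 3, retbins=True)

# Count rows per bin, in bin order rather than by frequency
counts = ntile.value_counts(sort=False)

print(edges)
print(counts)
```

With 12 rows and 3 quantile bins, each bin ends up holding 4 rows, which matches the table above.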
Now I have the column "ntile" with dtype "category":
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 4 columns):
id 12 non-null object
order_? 12 non-null object
hours_spend 12 non-null float64
ntile 12 non-null category
dtypes: category(1), float64(1), object(2)
memory usage: 556.0+ bytes
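How can I add a new column holding the mean of each range in the "ntile" column?
Solution
You can simply define it as follows (note that astype(int) truncates each bound to an integer before averaging, so the result is only an approximate midpoint):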
df['mean_ntile'] = (df['ntile'].apply(lambda x: x.left).astype(int) + df['ntile'].apply(lambda x: x.right).astype(int))/2
print(df)
Output:
id order_? hours_spend ntile mean_ntile
0 1 y 474.0 (356.999, 502.667] 429.0
1 2 n 482.0 (356.999, 502.667] 429.0
2 3 n 564.0 (549.667, 600.0] 574.5
3 4 y 549.0 (502.667, 549.667] 525.5
4 5 y 551.0 (549.667, 600.0] 574.5
5 6 y 555.0 (549.667, 600.0] 574.5
6 7 n 600.0 (549.667, 600.0] 574.5
7 8 y 357.0 (356.999, 502.667] 429.0
8 9 y 542.0 (502.667, 549.667] 525.5
9 10 n 462.0 (356.999, 502.667] 429.0
10 11 n 513.0 (502.667, 549.667] 525.5
11 12 y 526.0 (502.667, 549.667] 525.5
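As @ALlolz suggested, a simpler way is to use the mid attribute of each interval, which returns the exact midpoint without truncation: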
df['mean_ntile'] = df['ntile'].apply(lambda x: x.mid)
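As a possible vectorized alternative to calling apply row by row, the midpoints can be computed once per category and then looked up through the category codes. This is only a sketch under assumptions: the names mids and codes are illustrative, and it assumes the column contains no missing values (a code of -1 would otherwise index the wrong element).

```python
import pandas as pd

df = pd.DataFrame({'hours_spend': [474.0, 482.0, 564.0, 549.0, 551.0, 555.0,
                                   600.0, 357.0, 542.0, 462.0, 513.0, 526.0]})
df['ntile'] = pd.qcut(df['hours_spend'], 3)

# The categories of a qcut result form an IntervalIndex, which exposes .mid
mids = df['ntile'].cat.categories.mid.to_numpy()

# Integer position of each row's category (assumes no NaN, i.e. no -1 codes)
codes = df['ntile'].cat.codes.to_numpy()

# Look up the midpoint of each row's interval; the result is a plain float column
df['mean_ntile'] = mids[codes]
print(df)
```

Because the interval bounds are not truncated here, the values differ slightly from the astype(int) version above (about 429.83 instead of 429.0 for the first interval).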