python - Creating multiple calculated columns in Pandas with groupby and aggregate functions
Problem description
import pandas as pd
df = pd.DataFrame({'zip,company': ["46062|A","11236|B","11236|C","11236|C","11236|C","11236|A","11236|A","11236|A","11236|B","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A"],
'goodbadscore': ["good","bad","bad","good","good","bad","bad","good","good","good","bad","good","good","good","good","bad","bad","good"],
'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
)
print(df)
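-----Updated with the sample DataFrame above, as suggested-----
I tried to produce the result in Excel, but using COUNTIF and COUNTIFS crashes my desktop, and even when it works the task takes several minutes. I would appreciate some help and guidance.
Here is what I am trying to achieve:
I want to score the reputation of companies across several zip codes, based on the data collected. The columns I need to produce, in order:
- countinzipcode
- countgoodscoreinzip
- dividegoodscore%(2/1)
- rank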
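I was able to produce 1:
op = df.groupby(['zip,company'])['zip,company'].count()
I am having trouble with 2: I want to keep the output from 1, but after the apply everything becomes 0. I only want column 2 to show the number of "good" scores.
op = op.groupby(['zip,company'])[['zip,company','countgoodscoreunderzip']].apply(lambda x: x[x=='good'].count())
Then for 3, I think it is a matter of taking 2 and dividing it by 1.
For 4, I do not yet know how to rank in pandas; it is probably a simple ranking.
The Excel screenshot shows the desired output (updated using the sample DataFrame).
Thanks for reading.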
Solution
Named aggregation should help with the first two columns:
op = df.groupby('zip,company', as_index=False).aggregate(
countinzipcode=('zip,company', 'count'),
goodscoreinzip=('goodbadscore', lambda s: s.eq('good').sum())
)
op:
   zip,company  countinzipcode  goodscoreinzip
0      11236|A               7               4
1      11236|B               5               3
2      11236|C               3               2
3      46062|A               3               2
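As a variation on the named aggregation above, the lambda can be replaced by a precomputed boolean column that is then summed. This is only a minimal sketch: the `is_good` helper column and the five-row `sample` frame are hypothetical and are not part of the original question or answer.

```python
import pandas as pd

# Hypothetical five-row sample, only for illustration
sample = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|B", "46062|A", "11236|B"],
    'goodbadscore': ["good", "bad", "good", "bad", "good"],
})

counts = (
    # is_good is a hypothetical helper column: True where the score is "good"
    sample.assign(is_good=sample['goodbadscore'].eq('good'))
          .groupby('zip,company', as_index=False)
          .agg(
              countinzipcode=('goodbadscore', 'count'),  # rows per group
              goodscoreinzip=('is_good', 'sum'),         # True counts as 1
          )
)
print(counts)
```

Summing a boolean column tends to be faster than a Python lambda on large frames, because the sum runs in pandas' built-in groupby code path.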
Simple arithmetic then gives the percentage for 3:
op['goodscore%'] = op['goodscoreinzip'] / op['countinzipcode'] * 100
   zip,company  countinzipcode  goodscoreinzip  goodscore%
0      11236|A               7               4   57.142857
1      11236|B               5               3   60.000000
2      11236|C               3               2   66.666667
3      46062|A               3               2   66.666667
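If only the percentage is needed, it can also be computed in one step: the mean of a boolean Series is the share of `True` values. The sketch below assumes a small hypothetical `sample` frame rather than the data used in the answer.

```python
import pandas as pd

# Hypothetical five-row sample, only for illustration
sample = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|B", "46062|A", "11236|B"],
    'goodbadscore': ["good", "bad", "good", "bad", "good"],
})

share = (
    sample['goodbadscore'].eq('good')       # boolean Series
          .groupby(sample['zip,company'])   # group by the key column
          .mean()                           # fraction of True per group
          .mul(100)                         # convert to a percentage
          .rename('goodscore%')
          .reset_index()
)
print(share)
```

This gives the same figure as dividing the two counts, but it drops the counts themselves, so it only fits when columns 1 and 2 are not needed in the output.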
Then rank can be used to get the ranking for 4:
op['ranking'] = op['goodscore%'].rank(ascending=False, method='dense').astype(int)
op:
   zip,company  countinzipcode  goodscoreinzip  goodscore%  ranking
0      11236|A               7               4   57.142857        3
1      11236|B               5               3   60.000000        2
2      11236|C               3               2   66.666667        1
3      46062|A               3               2   66.666667        1
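The `method` argument decides how ties are numbered. The sketch below reuses the four percentages from the output above to compare `method='dense'` (used in this answer) with `method='min'`, which behaves more like Excel's RANK. The variable names are illustrative only.

```python
import pandas as pd

# Percentages copied from the result table above
pct = pd.Series(
    [57.142857, 60.0, 66.666667, 66.666667],
    index=["11236|A", "11236|B", "11236|C", "46062|A"],
)

# dense: tied values share a rank and the next rank is not skipped
dense = pct.rank(ascending=False, method='dense').astype(int)
# min: tied values share the lowest rank and the following ranks are skipped
minimum = pct.rank(ascending=False, method='min').astype(int)

print(pd.DataFrame({'goodscore%': pct, 'dense': dense, 'min': minimum}))
```

With `dense` the ranks are 3, 2, 1, 1. With `min` they are 4, 3, 1, 1, because the two tied leaders take up positions 1 and 2.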
Sample data used (based on the numbers in the image rather than the code constructor):
df = pd.DataFrame({
'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
"11236|C", "11236|A", "11236|A", "11236|A",
"11236|B", "11236|B", "11236|A", "11236|A",
"11236|B", "11236|A", "11236|A", "11236|B",
"46062|A", "46062|A"],
'goodbadscore': ["good", "bad", "bad", "good", "good", "bad",
"bad", "good", "good", "good", "bad",
"good", "good", "good", "good", "bad",
"bad", "good"],
'postlcode': ["46062", "11236", "11236", "11236", "11236",
"46062", "11236", "46062", "11236", "11236",
"11236", "11236", "11236", "11236", "11236",
"11236", "11236", "11236"],
'companyname': ["A", "B", "C", "C", "C", "A", "A", "A", "B",
"B", "A", "A", "B", "A", "A", "B", "A", "A"]
})
    zip,company  goodbadscore  postlcode  companyname
0       46062|A          good      46062            A
1       11236|B           bad      11236            B
2       11236|C           bad      11236            C
3       11236|C          good      11236            C
4       11236|C          good      11236            C
5       11236|A           bad      46062            A
6       11236|A           bad      11236            A
7       11236|A          good      46062            A
8       11236|B          good      11236            B
9       11236|B          good      11236            B
10      11236|A           bad      11236            A
11      11236|A          good      11236            A
12      11236|B          good      11236            B
13      11236|A          good      11236            A
14      11236|A          good      11236            A
15      11236|B           bad      11236            B
16      46062|A           bad      11236            A
17      46062|A          good      11236            A
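For reuse, the three steps of this answer can be wrapped in a single function. The following is a sketch under stated assumptions: the function name `reputation_table` and its parameters are hypothetical, the final sort is an addition for readability, and only the two columns the calculation actually uses are rebuilt from the sample data above.

```python
import pandas as pd

def reputation_table(df, key='zip,company', score='goodbadscore', good='good'):
    """Hypothetical helper: count, good count, percentage and dense rank per key."""
    out = df.groupby(key, as_index=False).agg(
        countinzipcode=(score, 'count'),
        goodscoreinzip=(score, lambda s: s.eq(good).sum()),
    )
    out['goodscore%'] = out['goodscoreinzip'] / out['countinzipcode'] * 100
    out['ranking'] = (
        out['goodscore%'].rank(ascending=False, method='dense').astype(int)
    )
    # Sorting is an extra step, not in the original answer; ties are ordered by key
    return out.sort_values(['ranking', key], ignore_index=True)

# The two relevant columns of the sample data shown above
sample_df = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
                    "11236|C", "11236|A", "11236|A", "11236|A",
                    "11236|B", "11236|B", "11236|A", "11236|A",
                    "11236|B", "11236|A", "11236|A", "11236|B",
                    "46062|A", "46062|A"],
    'goodbadscore': ["good", "bad", "bad", "good", "good", "bad",
                     "bad", "good", "good", "good", "bad",
                     "good", "good", "good", "good", "bad",
                     "bad", "good"],
})

table = reputation_table(sample_df)
print(table)
```

The values match the tables earlier in the answer; only the row order differs because of the added sort.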