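pandas - Pandas: select rows and get the highest column value based on another column
Problem description
I have a dataset, and I want to get the highest score for each "topic" within each "path". Basically, I am trying to get the top scores for students.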
import pandas as pd

l1 = ['/some/path/A','/some/path/A','/some/path/A','/some/path/2','/some/path/2']
l2 = ['paper1','paper1','paper1','paper2','paper2']
l3 = ['Tree','Tree','Axe','Leaf','Leaf']
l4 = [.68,.90,.80,.65,.30]
df = pd.DataFrame(
    {'path': l1,
     'paper_name': l2,
     'topic': l3,
     'score': l4})
df
Current df:
path paper_name topic score
/some/path/A paper1 Tree 0.68
/some/path/A paper1 Tree 0.90
/some/path/A paper1 Axe 0.80
/some/path/2 paper2 Leaf 0.65
/some/path/2 paper2 Leaf 0.30
I tried this logic:
df[df['topic']=='Tree' & df['score'].max()] # hard-coded; I need something simpler that is not hard-coded like this. It also errors out because of data types. How do I get this?
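The attempt above fails mainly because of operator precedence: in Python, `&` binds more tightly than `==`, so `'Tree' & df['score'].max()` is evaluated first, and combining a string with a float via `&` raises a `TypeError`. Parenthesizing each comparison removes the error, but the filter is still hard-coded to one topic and compares against the maximum of the whole column. A minimal sketch of a non-hard-coded boolean mask, assuming the goal is to keep every row whose score equals the maximum of its (path, topic) group, uses `groupby(...).transform('max')`:

```python
import pandas as pd

df = pd.DataFrame({
    'path': ['/some/path/A', '/some/path/A', '/some/path/A', '/some/path/2', '/some/path/2'],
    'paper_name': ['paper1', 'paper1', 'paper1', 'paper2', 'paper2'],
    'topic': ['Tree', 'Tree', 'Axe', 'Leaf', 'Leaf'],
    'score': [.68, .90, .80, .65, .30],
})

# Parenthesized version of the original attempt: no TypeError, but still
# hard-coded to 'Tree' and compared with the maximum of the entire column.
tree_only = df[(df['topic'] == 'Tree') & (df['score'] == df['score'].max())]

# Non-hard-coded mask: broadcast each (path, topic) group's maximum back onto
# its rows, then keep the rows whose score equals that group maximum.
group_max = df.groupby(['path', 'topic'])['score'].transform('max')
best = df[df['score'] == group_max]
print(best)
```

Unlike the two solutions below, this mask keeps every tied row when several rows share a group's maximum score.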
Desired output:
path paper_name topic score
/some/path/A paper1 Tree 0.90
/some/path/A paper1 Axe 0.80
/some/path/2 paper2 Leaf 0.65
Solution
You can sort by score and then drop_duplicates:
# An ascending sort puts each group's highest score last, so keep='last' retains it
df_final = df.sort_values('score').drop_duplicates(['path', 'topic'], keep='last').sort_index()
Output:
  path paper_name topic score
1 /some/path/A paper1 Tree 0.90
2 /some/path/A paper1 Axe 0.80
3 /some/path/2 paper2 Leaf 0.65
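The subset passed to `drop_duplicates` decides what counts as one group. A minimal sketch with one hypothetical extra row (a `Tree` score under `/some/path/2`, which is not in the original data) shows why the subset should include `path` when the highest score is wanted per topic within each path:

```python
import pandas as pd

# Original data plus one hypothetical row: topic 'Tree' also appears under /some/path/2.
df = pd.DataFrame({
    'path': ['/some/path/A', '/some/path/A', '/some/path/A',
             '/some/path/2', '/some/path/2', '/some/path/2'],
    'paper_name': ['paper1', 'paper1', 'paper1', 'paper2', 'paper2', 'paper2'],
    'topic': ['Tree', 'Tree', 'Axe', 'Leaf', 'Leaf', 'Tree'],
    'score': [.68, .90, .80, .65, .30, .50],
})

# Ascending sort: within any group, the highest score ends up last.
ranked = df.sort_values('score')

# Keyed on topic alone: both 'Tree' groups collapse into a single row.
by_topic = ranked.drop_duplicates('topic', keep='last').sort_index()

# Keyed on (path, topic): one row per topic within each path.
by_path_topic = ranked.drop_duplicates(['path', 'topic'], keep='last').sort_index()

print(by_topic)
print(by_path_topic)
```

Sorting with `ascending=False` and using `keep='first'` selects the same rows.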
Or use groupby.idxmax and slice with .loc:
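# idxmax returns the index label of each group's highest score; .loc selects those rows
df_final = df.loc[df.groupby(['path', 'topic'], sort=False)['score'].idxmax()]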
Output:
  path paper_name topic score
1 /some/path/A paper1 Tree 0.90
2 /some/path/A paper1 Axe 0.80
3 /some/path/2 paper2 Leaf 0.65
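Because `idxmax` returns index labels rather than positions, the `.loc` slice assumes those labels are unique. A minimal sketch, grouping on `['path', 'topic']` so the maximum is taken per topic within each path, shows the intermediate labels and what happens with a duplicated index; `dup` is a hypothetical frame built with `pd.concat` purely to create repeated labels:

```python
import pandas as pd

df = pd.DataFrame({
    'path': ['/some/path/A', '/some/path/A', '/some/path/A', '/some/path/2', '/some/path/2'],
    'paper_name': ['paper1', 'paper1', 'paper1', 'paper2', 'paper2'],
    'topic': ['Tree', 'Tree', 'Axe', 'Leaf', 'Leaf'],
    'score': [.68, .90, .80, .65, .30],
})

# One index label per (path, topic) group: the row holding the highest score.
labels = df.groupby(['path', 'topic'], sort=False)['score'].idxmax()
print(labels)
best = df.loc[labels]

# Hypothetical frame whose index labels each appear twice.
dup = pd.concat([df, df])
# .loc matches every row carrying a selected label, so this returns 6 rows, not 3.
unsafe = dup.loc[dup.groupby(['path', 'topic'], sort=False)['score'].idxmax()]

# Resetting to a fresh RangeIndex first makes the labels unique again.
safe = dup.reset_index(drop=True)
safe_best = safe.loc[safe.groupby(['path', 'topic'], sort=False)['score'].idxmax()]
```

When a group has tied maximum scores, `idxmax` keeps only the first occurrence.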