pandas - Fast implementation of max value per user
Problem Description
Following is a piece of code I'm using. It takes each user and selects a single value per user according to a sorting scheme. The problem is that it runs too slowly for my needs; I was wondering whether it can be implemented faster:
import pandas as pd

df1 = pd.DataFrame({'user': ['a', 'b', 'c', 'd'],
                    'user_info': [1, 3, 5, 6]},
                   columns=['user', 'user_info'])
df2 = pd.DataFrame({'user': ['a', 'b', 'f', 'h'],
                    'user_info': [3, 5, 5, 6]},
                   columns=['user', 'user_info'])

data_frames_dict_with_importance_score = {2: df2, 1: df1}

def apply_importance(df, importance):
    df['tag_max'] = importance
    return df

join_list = ['user', 'user_info']
final_recommendations = pd.concat([apply_importance(df[join_list], importance)
                                   for importance, df in data_frames_dict_with_importance_score.items()])
final_recommendations = final_recommendations.sort_values(['user', 'tag_max'], ascending=False) \
                                             .groupby(['user'], as_index=False).head(1)
final_recommendations.reset_index(inplace=True)
Any help on that one would be awesome!
Solution
You can assign the tag_max in a generator expression, then concat with sort_values followed by drop_duplicates:
out = pd.concat(v.assign(tag_max=k) for
                k, v in data_frames_dict_with_importance_score.items()) \
        .sort_values(['user', 'tag_max'], ascending=False).drop_duplicates('user')
Or:
out = pd.concat(data_frames_dict_with_importance_score, names=['tag_max', 'Index']) \
        .reset_index().sort_values(['user', 'tag_max'], ascending=False).drop_duplicates('user')
   user  user_info  tag_max
3     h          6        2
2     f          5        2
3     d          6        1
2     c          5        1
1     b          5        2
0     a          3        2
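As a variation on the same idea, you could also skip the full sort and pick the row with the highest tag_max per user via groupby + idxmax, which only needs a per-group scan rather than sorting the whole frame. A minimal runnable sketch using the sample data from the question (`frames` here stands in for data_frames_dict_with_importance_score):

```python
import pandas as pd

df1 = pd.DataFrame({'user': ['a', 'b', 'c', 'd'],
                    'user_info': [1, 3, 5, 6]})
df2 = pd.DataFrame({'user': ['a', 'b', 'f', 'h'],
                    'user_info': [3, 5, 5, 6]})
frames = {2: df2, 1: df1}

# Tag each frame with its importance score and stack them.
# ignore_index=True gives a unique index, so idxmax labels are unambiguous.
stacked = pd.concat(
    (v.assign(tag_max=k) for k, v in frames.items()),
    ignore_index=True)

# For each user, keep the row whose tag_max is largest.
best = stacked.loc[stacked.groupby('user')['tag_max'].idxmax()]
print(best)
```

For user 'b', which appears in both frames, this keeps the row from the importance-2 frame (user_info 5, tag_max 2), matching the accepted answer's result; whether it is actually faster than sort + drop_duplicates depends on the data size, so it is worth timing both on your real frames.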