python - How to evaluate a hybrid recommender system?
Problem description
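I am building a hybrid recommender system on the MovieLens data. More precisely, I first build a content-based model and then a collaborative filtering (CF) model. Finally, in the hybrid approach, I first run content-based filtering to decide which movies to recommend to the user, and then use the ratings predicted by the SVD (CF) model to filter and rank those recommendations.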
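The two-stage pipeline described above (content-based candidates, then re-ranking by a CF rating predictor) can be sketched in plain Python. This is a minimal illustration under assumptions, not the code from this post: the titles, genre vectors and ratings are made up, the user is represented by a single movie they liked, and `predict_rating` is a hypothetical stand-in for something like `svd.predict(userId, movieId).est`.

```python
import math

# Hypothetical item features: title -> genre indicators (Action, Comedy, Drama)
ITEM_FEATURES = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 1, 0],
    "Movie D": [0, 0, 1],
    "Movie E": [1, 0, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_candidates(liked_title, k):
    """Stage 1: the k titles whose features are most similar to a liked movie."""
    target = ITEM_FEATURES[liked_title]
    others = [t for t in ITEM_FEATURES if t != liked_title]
    others.sort(key=lambda t: cosine(target, ITEM_FEATURES[t]), reverse=True)
    return others[:k]


def hybrid_recommend(liked_title, predict_rating, k=2, n=2):
    """Stage 2: re-rank the content candidates by a CF rating predictor.

    predict_rating: any callable title -> estimated rating (hypothetical
    stand-in for an SVD model's prediction).
    """
    candidates = content_candidates(liked_title, k)
    scored = [(t, predict_rating(t)) for t in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up "SVD" predictions for illustration only
fake_svd = {"Movie B": 3.1, "Movie C": 4.5, "Movie D": 2.0, "Movie E": 4.0}.get
print(hybrid_recommend("Movie A", fake_svd, k=2, n=2))  # [('Movie E', 4.0), ('Movie B', 3.1)]
```

Here the content model scores Movie B and Movie E equally, and the predicted ratings decide the final order; that re-ranking step is what the hybrid adds on top of the content model.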
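I am also trying to evaluate all of the models, and for that I am computing the hit rate. I can get it for the individual models, but I am not sure how to compute it for the hybrid approach, or whether that even makes sense. Any help is greatly appreciated!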
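Hit rate only looks at the final top-N list, not at how that list was produced, so it can in principle be computed for the hybrid the same way as for the individual models. One common leave-one-out form, where one rating per user u in the user set U is hidden before training and i_u denotes that hidden item:

$$
\mathrm{HR@}N = \frac{1}{|U|} \sum_{u \in U} \mathbb{1}\left[\, i_u \in \mathrm{Top}N(u) \,\right]
$$

Here TopN(u) is whatever list the evaluated model (content-based, CF or hybrid) returns for u, trained without the hidden ratings.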
Thanks in advance!
Here is what I have (the commented-out lines are my attempt at a hit rate for the hybrid model):
user_id = 50
df_movies = movies

def hybrid_content_svd_model(userId):
    """
    Hybrid of the content-based and SVD-based models: recommend the user's top 10 movies.
    :param userId: userId of the user
    :return: DataFrame of the recommended movies with the rating predicted by the SVD model
    """
    recommended_titles = get_recommendation_content_model(userId)
    # Keep the rows whose title the content-based model recommended;
    # .copy() avoids SettingWithCopyWarning when svd_rating is added below
    recommended_movies_by_content_model = df_movies[df_movies["title"].isin(recommended_titles)].copy()
    for key, columns in recommended_movies_by_content_model.iterrows():
        predict = svd.predict(userId, columns["movieId"])
        recommended_movies_by_content_model.loc[key, "svd_rating"] = predict.est
    # count = recommended_movies_by_content_model[recommended_movies_by_content_model["svd_rating"] >= 3]["movieId"].count()
    # total = recommended_movies_by_content_model.shape[0]
    # hit_ratio = count / total
    # Rank by SVD-predicted rating and keep the top 10
    return recommended_movies_by_content_model.sort_values("svd_rating", ascending=False).head(10)

hybrid_content_svd_model(user_id)
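A minimal sketch of a model-agnostic, leave-one-out hit rate, assuming ratings are available as plain `(user, item, rating)` tuples. The helper names `leave_one_out_split` and `hit_rate` are hypothetical; any recommender (content, CF or hybrid) is passed in as a callable that returns a ranked list of item ids and is assumed to be fit on the training split only. The Surprise library also provides a `LeaveOneOut` splitter in `surprise.model_selection` that could replace the split helper.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings, seed=0):
    """Hold out one rated item per user for testing.

    ratings: list of (user, item, rating) tuples.
    Returns (train, held_out): training tuples and a dict user -> held-out item.
    Users with a single rating are kept entirely in train.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    train, held_out = [], {}
    for user, rows in by_user.items():
        if len(rows) < 2:
            train.extend(rows)
            continue
        idx = rng.randrange(len(rows))
        held_out[user] = rows[idx][1]
        train.extend(rows[:idx] + rows[idx + 1:])
    return train, held_out


def hit_rate(recommend, held_out, n=10):
    """HR@n: fraction of users whose held-out item is in their top-n list.

    recommend: callable user -> ranked list of item ids (content model,
    CF model or hybrid), trained on the training split only.
    """
    if not held_out:
        return 0.0
    hits = sum(1 for user, item in held_out.items() if item in recommend(user)[:n])
    return hits / len(held_out)


# Toy check with a hard-coded recommender (hypothetical data)
held = {1: "m3", 2: "m9"}
recs = {1: ["m1", "m3", "m5"], 2: ["m2", "m4"]}
print(hit_rate(recs.get, held, n=3))  # 0.5
```

With the function from the question, the hybrid could be wrapped as `lambda u: list(hybrid_content_svd_model(u)["movieId"])`, provided both `svd` and the content model were fit without the held-out ratings. Because the hybrid only re-ranks content-based candidates, its hit rate cannot exceed the share of held-out items that appear among those candidates.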
Here is how I compute the hit rate for CF:
import pandas as pd

def evaluation_collaborative_svd_model(userId, userOrItem):
    """
    Combine the collaborative (user-user or item-item) recommendations with the SVD model
    and measure the share of recommended movies whose SVD-predicted rating is at least 3.
    :param userId: userId of the user
    :param userOrItem: boolean; True for User-User similarity, False for Item-Item similarity
    :return: hit ratio (float) of the recommended movies
    """
    movieRatingList = list()
    if userOrItem:
        movieIdsList = getRecommendedMoviesAsperUserSimilarity(userId)
    else:
        # Use the function argument, not the global user_id
        movieIdsList = recommendedMoviesAsperItemSimilarity(userId)
    for movieId in movieIdsList:
        predict = svd.predict(userId, movieId)
        movieRatingList.append([movieId, predict.est])
    # Build the DataFrame from the list directly so movieId keeps its integer type
    movieIdRating = pd.DataFrame(movieRatingList, columns=['movieId', 'rating'])
    count = movieIdRating[movieIdRating['rating'] >= 3]['movieId'].count()
    total = movieIdRating.shape[0]
    hit_ratio = count / total if total > 0 else 0.0
    return hit_ratio
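One caveat about the metric in both functions: the lists are scored with the SVD model's own predicted ratings (and in the hybrid, the same predictions also do the ranking), so the share of predictions at or above 3 mostly reflects how optimistic the SVD model is rather than whether the user actually liked the movies. A sketch under assumptions: the helper names are hypothetical, and "relevant" means a true held-out rating of at least 3, with movies the user never rated counted as not relevant (a common convention, not the only one).

```python
def share_predicted_above(predicted_ratings, threshold=3.0):
    """Share of recommended items whose *predicted* rating is >= threshold
    (the quantity computed in the functions above)."""
    if not predicted_ratings:
        return 0.0
    return sum(r >= threshold for r in predicted_ratings) / len(predicted_ratings)


def precision_at_k(recommended, test_ratings, k=10, threshold=3.0):
    """Share of the top-k recommended items that the user actually rated
    >= threshold in held-out test data.

    recommended: ranked list of item ids for one user.
    test_ratings: dict item id -> true rating from that user's test set.
    Items missing from test_ratings count as not relevant.
    """
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    relevant = sum(1 for item in top_k if test_ratings.get(item, 0) >= threshold)
    return relevant / len(top_k)


print(share_predicted_above([4.2, 2.5, 3.0, 3.9]))  # 0.75
print(precision_at_k(["m1", "m2", "m3", "m4"], {"m1": 4.0, "m3": 2.0, "m4": 5.0}, k=3))  # 0.3333333333333333
```

Both helpers take plain lists, so they apply equally to the CF lists and to the hybrid's output (for example the `movieId` column of the DataFrame the hybrid returns).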