Getting recommended news_id values with their lead_paragraph and web_url using cosine similarity

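Problem description

I am trying to use cosine similarity to get recommended news_id values together with their lead_paragraph and web_url. My dataset has three columns: news_id, lead_paragraph, and web_url (the ID column is named _id in the code).

The following code returns only the recommended news_id values: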

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

ds = pd.read_csv("nytimes.csv")

# Build TF-IDF vectors from the lead paragraphs, ignoring English stop words
tfidf = TfidfVectorizer(stop_words='english')
ds['lead_paragraph'] = ds['lead_paragraph'].fillna('')
tfidf_matrix = tfidf.fit_transform(ds['lead_paragraph'])

# TF-IDF rows are L2-normalized by default, so the dot product equals the cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Map each _id to its row position in ds
indices = pd.Series(ds.index, index=ds['_id']).drop_duplicates()

def get_recommendations(news_id, cosine_sim=cosine_sim):
    idx = indices[news_id]
    # Pair every row position with its similarity to the query article
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the first entry (normally the article itself) and keep the next 10
    sim_scores = sim_scores[1:11]
    news_indices = [i[0] for i in sim_scores]
    return ds['_id'].iloc[news_indices]

get_recommendations('4fd2b43e8eb7c8105d8a67e8')

How can I get the recommended news_id values together with their lead_paragraph and web_url in the returned list?

Tags: python, pandas, cosine-similarity

Solution
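
One possible approach, offered as a sketch rather than a definitive answer: the function in the question selects only the _id column in its return statement, so selecting all three columns with the same row positions should return the paragraphs and URLs as well. The example below assumes the column names _id, lead_paragraph, and web_url from the question. The small in-memory DataFrame is a hypothetical stand-in for nytimes.csv, and the top_n parameter and score column are additions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical stand-in for nytimes.csv with the same three columns.
ds = pd.DataFrame({
    "_id": ["a1", "a2", "a3", "a4"],
    "lead_paragraph": [
        "The senate passed the budget after a long debate.",
        "Lawmakers argued over the budget in the senate for hours.",
        "The team won the championship game in overtime.",
        None,
    ],
    "web_url": [
        "https://example.com/a1",
        "https://example.com/a2",
        "https://example.com/a3",
        "https://example.com/a4",
    ],
})

ds["lead_paragraph"] = ds["lead_paragraph"].fillna("")
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(ds["lead_paragraph"])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Map each _id to its row position (0..n-1) in ds.
indices = pd.Series(range(len(ds)), index=ds["_id"])

def get_recommendations(news_id, top_n=10):
    idx = indices[news_id]
    # Similarity of the query article to every row, labelled by row position.
    scores = pd.Series(cosine_sim[idx])
    # Drop the query article explicitly instead of assuming it sorts first.
    scores = scores.drop(idx)
    top = scores.sort_values(ascending=False).head(top_n)
    # Select all three columns for the chosen row positions.
    result = ds.iloc[top.index][["_id", "lead_paragraph", "web_url"]].copy()
    result["score"] = top.values
    return result.reset_index(drop=True)

print(get_recommendations("a1", top_n=2))
```

If a list of records is preferred over a DataFrame, calling .to_dict("records") on the result should give one dictionary per recommended article.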

