How to make recommendations with SVD

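Problem description

I am trying to use SVD to recommend books based on purchases. I am using a matrix whose columns are books and whose rows are usernames; each cell records whether that user bought the book in question (1 if the user bought it, 0 if not; I don't know whether this is the best choice). I also created a cosine_similarity matrix in which I can see how similar users are to each other based on their purchases.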

import pandas as pd
# Assumed source of cosine_similarity (the original snippet omitted its imports)
from sklearn.metrics.pairwise import cosine_similarity

# Matrix with users as rows, products as columns and count as content
df_matrix = pd.pivot_table(order_df, values='count', index='username', columns='product')
# Replace all NaN contents with 0
df_matrix_dummy = df_matrix.copy().fillna(0)
# Compute the cosine similarity matrix using the dummy matrix
cosine_sim = cosine_similarity(df_matrix_dummy, df_matrix_dummy)
# Convert into pandas dataframe
cosine_sim = pd.DataFrame(cosine_sim, index=df_matrix.index, columns=df_matrix.index)


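from surprise import Dataset, Reader, SVDpp
from surprise.model_selection import train_test_split

reader = Reader()
# SVD++, an extension of SVD that also takes implicit feedback into account
algo = SVDpp()
data = Dataset.load_from_df(order_df[['username', 'product', 'count']], reader)
# Divide the data into train and test (80-20)
trainset, testset = train_test_split(data, test_size=0.2)
# fit() replaces train(), which was deprecated in Surprise 1.0.5
algo.fit(trainset)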

Since I am new to this technique, my question is how to get a list of recommendations after training the SVD.

Tags: python, python-2.7, machine-learning, svd, recommender-systems

Solution


You can follow this example and integrate it with your code.

import pandas as pd
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

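# Load the movielens-100k dataset (download it if needed)
data = Dataset.load_builtin('ml-100k')

# Hold out 25% of the ratings for testing
trainset, testset = train_test_split(data, test_size=.25)

# Fit SVD on the training set and predict the held-out ratings
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)

# Each prediction is a (uid, iid, r_ui, est, details) tuple
test = pd.DataFrame(predictions)
test = test.rename(columns={'uid':'userId', 'iid': 'movieId',
                            'r_ui':'actual', 'est':'prediction'})
# User x movie matrix of predicted ratings; pairs absent from the test set become 0
cf_model = test.pivot_table(index='userId',
                            columns='movieId', values='prediction').fillna(0)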

def get_users_predictions(user_id, n, model):
    recommended_items = pd.DataFrame(model.loc[user_id])
    recommended_items.columns = ["predicted_rating"]
    recommended_items = recommended_items.sort_values('predicted_rating', ascending=False)    
    recommended_items = recommended_items.head(n)
    return recommended_items.index.tolist()

def get_recs(model, k):
    recs = []
    for user in model.index:
        cf_predictions = get_users_predictions(user, k, model)
        recs.append(cf_predictions)
    return recs    

# Top-10 recommendations for each user
k = 10
recs = get_recs(cf_model, k)
preds = pd.DataFrame(index=cf_model.index)
preds[f'Top-{k} Recommendation'] = recs
preds.head()

Output:

|   userId | Top-10 Recommendation                                                     |
|---------:|:--------------------------------------------------------------------------|
|        1 | ['50', '174', '173', '268', '183', '89', '64', '251', '100', '136']       |
|       10 | ['174', '178', '98', '530', '474', '511', '478', '132', '493', '709']     |
|      100 | ['887', '292', '1235', '268', '1238', '348', '1234', '271', '326', '886'] |
|      101 | ['181', '742', '237', '596', '405', '118', '255', '1028', '717', '928']   |
|      102 | ['50', '98', '96', '187', '168', '79', '530', '435', '228', '185']        |
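
One caveat with the example above: it ranks only the pairs in the test set, that is, items each user has already rated (or, in the question's setting, already bought). To recommend items a user has not seen, the predictions would need to cover unseen (user, item) pairs; in Surprise these can be generated with `trainset.build_anti_testset()` and passed to `algo.test()`. The following is a minimal sketch of the ranking step only, assuming predictions arrive as `(uid, iid, r_ui, est, details)` tuples like the ones `algo.test()` returns. The helper name `get_top_n`, the user names, the book IDs and the scores are all made up for illustration.

```python
from collections import defaultdict


def get_top_n(predictions, n=10):
    """Return {user_id: [(item_id, estimated_rating), ...]} with the n best items per user.

    `predictions` is an iterable of 5-tuples (uid, iid, r_ui, est, details),
    the layout of the Prediction objects returned by Surprise's algo.test().
    """
    top_n = defaultdict(list)
    for uid, iid, _true_rating, est, _details in predictions:
        top_n[uid].append((iid, est))

    # Sort each user's candidates by estimated rating and keep the n highest.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda pair: pair[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return dict(top_n)


# With Surprise, predictions for unseen (user, item) pairs could be obtained as:
#     anti_testset = trainset.build_anti_testset()
#     predictions = algo.test(anti_testset)
# The hand-written tuples below are made-up values that stand in for that output.
sample_predictions = [
    ("alice", "book_a", 0.0, 0.91, {}),
    ("alice", "book_b", 0.0, 0.35, {}),
    ("alice", "book_c", 0.0, 0.78, {}),
    ("bob", "book_a", 0.0, 0.12, {}),
    ("bob", "book_d", 0.0, 0.66, {}),
]

top_2 = get_top_n(sample_predictions, n=2)
for user, items in top_2.items():
    print(user, [item for item, _ in items])
```

Because the anti-testset contains every (user, item) pair missing from the training set, it can grow large for big catalogues; restricting it to a subset of users before calling `algo.test()` may be more practical.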
