r - 当我执行 sapply 时从 return() 语句中获取一个值,当我手动执行语句时获取另一个值
问题描述
SO用户-我正在为电影推荐算法练习编写一些代码。我写了一个 sapply 语句来评估最佳惩罚 (lambda),以减少小组评级的权重。我期望 RMSE 值在 0.8 和 1 之间的输出,但我从我的 sapply 语句中得到的值超过 1000。当我手动执行 RMSE 函数时,我得到的输出符合我的期望。为什么我从 sapply 语句与手动执行相同的 RMSE 函数得到如此截然不同的结果?
# Load packages
if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(caret)) install.packages("caret", repos = "http://cran.us.r-project.org")
if(!require(data.table)) install.packages("data.table", repos = "http://cran.us.r-project.org")
# Determine the average rating across all movies, from all users
mu <- mean(edx_train$rating)
# Generate a sequence of lambdas for tuning, to determine what penalty value produces the least RMSE with training data
lambdas <- seq(0, 10, 0.25)
# Test various values of lambda
rmses <- sapply(lambdas, function(l){
# Determine how each movie's average rating deviates from the average for all movies (movie effect)
b_i <- edx_train %>%
group_by(movieId) %>%
summarize(b_i = sum(rating - mu)/(n()+l))
# Join the movie effect data back to the training data and determine user effect
b_u <- edx_train %>%
left_join(b_i, by="movieId") %>%
group_by(userId) %>%
summarize(b_u = sum(rating - b_i - mu/(n()+l)))
# Join movie effect and user effect data to the partitioned test data (from the training data set)
predicted_ratings <-
edx_test %>%
left_join(b_i, by = "movieId") %>%
left_join(b_u, by = "userId") %>%
mutate(pred = mu + b_i + b_u) %>%
.$pred
# Identify any N/A values resulting from cases in the test data where there is no corresponding training data
# from which to base a prediction
na_index <- which(is.na(predicted_ratings))
# Create data set for predicted ratings of test data, removing any N/A values
predicted_ratings <- predicted_ratings[-na_index]
# Remove any test records that correspond with the predictions identified as having N/A values
edx_test <- edx_test[-na_index]
# Evaluate RMSE of remaining predictions and corresponding actual ratings from the test data
return(RMSE(edx_test$rating, predicted_ratings))
})
edx_test 和 edx_train 数据集的样本:
edx_test_ex <- structure(
list(userId = c(3L, 3L, 3L, 4L, 4L, 4L),
movieId = c(7155, 8783, 27821, 39, 153, 344),
rating = c(3.5, 5, 4.5, 3, 5, 2),
timestamp = c(1164885564L, 1136075857L, 1136418616L, 844417037L,844416699L, 844416699L),
title = c("Calendar Girls (2003)",
"Village, The (2004)",
"Interpreter, The (2005)",
"Clueless (1995)",
"Batman Forever (1995)",
"Ace Ventura: Pet Detective (1994)"),
genres = c("Comedy",
"Drama|Mystery|Thriller",
"Drama|Thriller",
"Comedy|Romance",
"Action|Adventure|Comedy|Crime",
"Comedy")),
row.names = c(NA, -6L),
class = "data.frame"
)
edx_train_ex <- structure(
list(userId = c(1L, 1L, 1L, 1L, 1L, 1L),
movieId = c(377, 520, 539, 588, 589, 594),
rating = c(5, 5, 5, 5, 5, 5),
timestamp = c(838983834L, 838984679L, 838984068L, 838983339L, 838983778L, 838984679L),
title = c("Speed (1994)",
"Robin Hood: Men in Tights (1993)",
"Sleepless in Seattle (1993)",
"Aladdin (1992)",
"Terminator 2: Judgment Day (1991)",
"Snow White and the Seven Dwarfs (1937)"),
genres = c("Action|Romance|Thriller",
"Comedy",
"Comedy|Drama|Romance",
"Adventure|Animation|Children|Comedy|Musical",
"Action|Sci-Fi",
"Animation|Children|Drama|Fantasy|Musical")),
row.names = c(NA, -6L),
class = "data.frame"
)
我收到一系列错误消息:“在 true_ratings - predict_ratings 中:较长的对象长度不是较短对象长度的倍数”但我已经确认真实评分和预测评分输出的长度相同。正如我所提到的,当我突出显示并仅运行该行时,RMSE 函数工作正常
RMSE(edx_test$rating, predicted_ratings)
这是我的第一篇文章,如果我提供的信息过多、信息不足或其他内容迟钝,请见谅。
琥珀色
解决方案
推荐阅读
- python - 尝试在应用程序之间使用 pyserial,但我得到了终端
- java - Java Spring 国际化(i18n)问题
- angular - 无法将 Angular Google Maps (AGM) 添加到 Jhipster
- pandas - 如何在另一个数据帧上通过类似 vlookup 的过程创建新的 pandas 列
- python - Pyautogui - 需要按住 shift 并单击
- angularjs - AngularJS 控制器不与 HTML 绑定
- asp.net - TransferRequestHandler 性能问题
- postgresql - 用于 spark/hadoop 的 Postgres 适配器增加了什么价值?
- c# - 为什么会抛出 SafeArrayTypeMismatchException?
- javascript - chart.js - 鼠标悬停时重新启动值