首页 > 解决方案 > 当我执行 sapply 时从 return() 语句中获取一个值,当我手动执行语句时获取另一个值

问题描述

SO用户-我正在为电影推荐算法练习编写一些代码。我写了一个 sapply 语句来评估最佳惩罚 (lambda),以减少小组评级的权重。我期望 RMSE 值在 0.8 和 1 之间的输出,但我从我的 sapply 语句中得到的值超过 1000。当我手动执行 RMSE 函数时,我得到的输出符合我的期望。为什么我从 sapply 语句与手动执行相同的 RMSE 函数得到如此截然不同的结果?

# Load packages
if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(caret)) install.packages("caret", repos = "http://cran.us.r-project.org")
if(!require(data.table)) install.packages("data.table", repos = "http://cran.us.r-project.org")

# Determine the average rating across all movies, from all users  
mu <- mean(edx_train$rating)

# Generate a sequence of lambdas for tuning, to determine what penalty value produces the least RMSE with training data
lambdas <- seq(0, 10, 0.25)

# Test various values of lambda
rmses <- sapply(lambdas, function(l){
  # Determine how each movie's average rating deviates from the average for all movies (movie effect)
  b_i <- edx_train %>% 
    group_by(movieId) %>%
    summarize(b_i = sum(rating - mu)/(n()+l))
  # Join the movie effect data back to the training data and determine user effect
  b_u <- edx_train %>% 
    left_join(b_i, by="movieId") %>%
    group_by(userId) %>%
    summarize(b_u = sum(rating - b_i - mu/(n()+l)))
  # Join movie effect and user effect data to the partitioned test data (from the training data set)
  predicted_ratings <- 
    edx_test %>% 
    left_join(b_i, by = "movieId") %>%
    left_join(b_u, by = "userId") %>%
    mutate(pred = mu + b_i + b_u) %>%
    .$pred

  # Identify any N/A values resulting from cases in the test data where there is no corresponding training data 
  # from which to base a prediction
  na_index <- which(is.na(predicted_ratings))
  # Create data set for predicted ratings of test data, removing any N/A values
  predicted_ratings <- predicted_ratings[-na_index]
  # Remove any test records that correspond with the predictions identified as having N/A values
  edx_test <- edx_test[-na_index]

  # Evaluate RMSE of remaining predictions and corresponding actual ratings from the test data
  return(RMSE(edx_test$rating, predicted_ratings))
})

edx_test 和 edx_train 数据集的样本:

edx_test_ex <- structure(
    list(userId = c(3L, 3L, 3L, 4L, 4L, 4L),
         movieId = c(7155, 8783, 27821, 39, 153, 344),
         rating = c(3.5, 5, 4.5, 3, 5, 2),
         timestamp = c(1164885564L, 1136075857L, 1136418616L, 844417037L,844416699L, 844416699L),
         title = c("Calendar Girls (2003)",
                   "Village, The (2004)",
                   "Interpreter, The (2005)",
                   "Clueless (1995)",
                   "Batman Forever (1995)",
                   "Ace Ventura: Pet Detective (1994)"),
         genres = c("Comedy",
                    "Drama|Mystery|Thriller",
                    "Drama|Thriller", 
                    "Comedy|Romance",
                    "Action|Adventure|Comedy|Crime",
                    "Comedy")),
    row.names = c(NA, -6L),
    class = "data.frame"
)

edx_train_ex <- structure(
    list(userId = c(1L, 1L, 1L, 1L, 1L, 1L),
         movieId = c(377, 520, 539, 588, 589, 594),
         rating = c(5, 5, 5, 5, 5, 5),
         timestamp = c(838983834L, 838984679L, 838984068L, 838983339L, 838983778L, 838984679L), 
         title = c("Speed (1994)",
                   "Robin Hood: Men in Tights (1993)",
                   "Sleepless in Seattle (1993)",
                   "Aladdin (1992)",
                   "Terminator 2: Judgment Day (1991)",
                   "Snow White and the Seven Dwarfs (1937)"),
         genres = c("Action|Romance|Thriller",
                    "Comedy",
                    "Comedy|Drama|Romance",
                    "Adventure|Animation|Children|Comedy|Musical",
                    "Action|Sci-Fi",
                    "Animation|Children|Drama|Fantasy|Musical")),
    row.names = c(NA, -6L),
    class = "data.frame"
)

我收到一系列错误消息:“在 true_ratings - predict_ratings 中:较长的对象长度不是较短对象长度的倍数”但我已经确认真实评分和预测评分输出的长度相同。正如我所提到的,当我突出显示并仅运行该行时,RMSE 函数工作正常 RMSE(edx_test$rating, predicted_ratings)

这是我的第一篇文章,如果我提供的信息过多、信息不足或其他内容迟钝,请见谅。

琥珀色

标签: rdplyrmagrittr

解决方案


推荐阅读