Applying a two-step prediction model built on a training set to a test set

Problem description

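I'm trying to get R to apply a 2-step prediction, modeled on training data, to a test set. First I want to predict whether there is a payout, using logistic regression (model fitted on the training data, applied to the test data). Then, if there is a payout, I want to predict how large it is, using a linear model fitted only on the training rows that have a payout. A payout is coded as "1". I feel both steps could be done with an indexing command or an ifelse statement, but I can't quite figure it out. Any help is appreciated.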

Sample code below

library(pROC)
library(dplyr)
library(tidyverse)
set.seed(1)
#Training Set
train_a <- runif(100)
train_b <- runif(100, min=2, max=8)
train_c <- runif(100, min=25, max = 75)
train_payout_indicator <- sample(c(0,1),100, replace = TRUE)
train_df <- data.frame(train_a,train_b,train_c,train_payout_indicator)
# Draw one candidate amount per row (100, not 50) so ifelse() does not silently recycle values
train_df$payout <- ifelse(train_payout_indicator==0,0,runif(100,200,350))

#Create logistic regression probabilities based on the training set
sample_logistic <- glm(formula = train_payout_indicator~.-payout, data=train_df, family=binomial)
sample_log_probs <-predict(sample_logistic, type="response")

#Build linear model for amount of payout, based only on rows from the training set for which there is a payout

subset_train_df <- train_df %>% filter(train_payout_indicator == 1)
payout_amount_lm <-lm(payout~train_a+train_b+train_c,  data=subset_train_df)

#Create a test set

test_a <- runif(100)
test_b <- runif(100, min=2, max=8)
test_c <- runif(100, min=25, max = 75)
test_payout_indicator <- rep(0,100)
test_payout <- rep(0,100)
test_df <- data.frame(test_a,test_b,test_c)
# Apply the logistic regression model based on the training data to the test data to predict whether there is a payout - how to do this?
# Then apply the lm model based on the training data to the test data, but only to the rows for which a payout has been predicted.

Tags: r, linear-regression, logistic-regression

Solution
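
No answer was recorded on this page, so what follows is only one possible sketch of the two steps described in the question, not a confirmed solution. It rests on a few assumptions: `predict()` matches `newdata` columns by name, so the test columns are copied under the training names (the helper `test_newdata` is introduced here for that purpose); a probability cutoff of 0.5 (`threshold`) is chosen arbitrarily to turn probabilities into a payout indicator; and base R indexing is used in place of dplyr. The extra column `payout_prob` and the logical vector `has_payout` are also illustrative names that do not appear in the original code.

```r
set.seed(1)

# Training data as in the question
train_df <- data.frame(
  train_a = runif(100),
  train_b = runif(100, min = 2, max = 8),
  train_c = runif(100, min = 25, max = 75),
  train_payout_indicator = sample(c(0, 1), 100, replace = TRUE)
)
train_df$payout <- ifelse(train_df$train_payout_indicator == 0, 0,
                          runif(100, 200, 350))

# Step 1 model: logistic regression for whether a payout occurs
sample_logistic <- glm(train_payout_indicator ~ train_a + train_b + train_c,
                       data = train_df, family = binomial)

# Step 2 model: linear model for payout size, fitted on payout rows only
subset_train_df <- train_df[train_df$train_payout_indicator == 1, ]
payout_amount_lm <- lm(payout ~ train_a + train_b + train_c,
                       data = subset_train_df)

# Test data as in the question
test_df <- data.frame(
  test_a = runif(100),
  test_b = runif(100, min = 2, max = 8),
  test_c = runif(100, min = 25, max = 75)
)

# Assumption: copy the test predictors under the training column names,
# because predict() looks up newdata columns by name
test_newdata <- setNames(test_df, c("train_a", "train_b", "train_c"))

# Assumption: 0.5 is an arbitrary cutoff for "payout predicted"
threshold <- 0.5

# Step 1: predicted payout probability and 0/1 indicator for each test row
test_df$payout_prob <- predict(sample_logistic, newdata = test_newdata,
                               type = "response")
test_df$test_payout_indicator <- as.integer(test_df$payout_prob > threshold)

# Step 2: predict the amount only for rows flagged in step 1; others stay 0
has_payout <- test_df$test_payout_indicator == 1
test_df$test_payout <- 0
test_df$test_payout[has_payout] <- predict(
  payout_amount_lm,
  newdata = test_newdata[has_payout, , drop = FALSE]
)

head(test_df)
```

Logical indexing keeps the second `predict()` call restricted to the flagged rows, which is the "indexing command" route mentioned in the question. Building the training and test data frames with the same predictor names from the start would make the renaming step unnecessary.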

