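r - Applying a two-step prediction model built on a training set to a test set
Problem description
I am trying to get R to apply a two-step prediction, modeled on training data, to a test set. First I want to predict whether there is a payout, using logistic regression (model fit on the training data, applied to the test data). Then, if there is a payout, I want to predict how large it is, using a linear model built from the training rows that have a payout. A payout is coded as "1". I feel both steps could be done with an indexing command or an ifelse statement, but I can't quite figure it out. Any help is appreciated.
Sample code below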
library(pROC)
library(dplyr)
library(tidyverse)
set.seed(1)
#Training Set
train_a <- runif(100)
train_b <- runif(100, min=2, max=8)
train_c <- runif(100, min=25, max = 75)
train_payout_indicator <- sample(c(0,1),100, replace = TRUE)
train_df <- data.frame(train_a,train_b,train_c,train_payout_indicator)
train_df$payout <- ifelse(train_df$train_payout_indicator == 0, 0, runif(100, 200, 350))
#Create logistic regression probabilities based on the training set
sample_logistic <- glm(formula = train_payout_indicator~.-payout, data=train_df, family=binomial)
sample_log_probs <-predict(sample_logistic, type="response")
#Build linear model for the payout amount, based only on training rows for which there is a payout
subset_train_df <- train_df %>% filter(train_payout_indicator == 1)
payout_amount_lm <-lm(payout~train_a+train_b+train_c, data=subset_train_df)
#Create a test set
test_a <- runif(100)
test_b <- runif(100, min=2, max=8)
test_c <- runif(100, min=25, max = 75)
test_payout_indicator <- rep(0,100)
test_payout <- rep(0,100)
test_df <- data.frame(test_a,test_b,test_c)
# Apply the logistic regression model fit on the training data to the test data to predict whether there is a payout. How to do this?
# Then apply the lm model fit on the training data to the test data, but only to the rows for which a payout has been predicted.
Solution
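One possible approach, offered as a minimal sketch rather than a definitive answer: `predict()` takes a `newdata` argument and looks up predictors by column name, so the test set needs columns named exactly like the training predictors. With the `train_a`/`test_a` naming from the question, `predict()` would not find the predictors in `test_df`, and if the `train_*` vectors still exist in the workspace it may silently use those instead. The sketch below therefore assumes neutral column names (`a`, `b`, `c`) shared by both sets. The 0.5 cutoff and the names `payout_glm`, `payout_lm`, `cutoff`, and `has_payout` are illustrative assumptions, not part of the original question.

```r
set.seed(1)
n <- 100

# Training set, rebuilt with neutral column names (assumption: a, b, c)
# so the test set can reuse the same names.
train_df <- data.frame(
  a = runif(n),
  b = runif(n, min = 2, max = 8),
  c = runif(n, min = 25, max = 75),
  payout_indicator = sample(c(0, 1), n, replace = TRUE)
)
train_df$payout <- ifelse(train_df$payout_indicator == 0, 0, runif(n, 200, 350))

# Step 1 model: logistic regression for whether there is a payout.
payout_glm <- glm(payout_indicator ~ a + b + c, data = train_df, family = binomial)

# Step 2 model: linear model for the amount, fit only on rows with a payout.
payout_lm <- lm(payout ~ a + b + c, data = subset(train_df, payout_indicator == 1))

# Test set with the same predictor names as the training set.
test_df <- data.frame(
  a = runif(n),
  b = runif(n, min = 2, max = 8),
  c = runif(n, min = 25, max = 75)
)

# Step 1: predicted payout probability, turned into a 0/1 indicator.
# The 0.5 cutoff is an assumption; choose one that suits the application.
cutoff <- 0.5
test_df$payout_prob <- predict(payout_glm, newdata = test_df, type = "response")
test_df$payout_indicator <- as.integer(test_df$payout_prob > cutoff)

# Step 2: predict the amount only for rows flagged as having a payout,
# using logical indexing; all other rows keep a payout of 0.
has_payout <- test_df$payout_indicator == 1
test_df$payout <- 0
test_df$payout[has_payout] <- predict(payout_lm, newdata = test_df[has_payout, ])

head(test_df)
```

Logical indexing is used here instead of `ifelse()` so that the linear model is only evaluated on the rows that need it. An `ifelse()` version would also work, at the cost of predicting an amount for every row and then discarding the ones without a predicted payout.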