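r - Problem with an R loop: I want to generate a specific output containing predictions and confidence intervals from linear models fitted to different samples
Problem description
I have a population of N observations like this: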
Y    X    ID
...  ...  1
...  ...  2
...  ...  3
...  ...  .
...  ...  .
I wrote this code to draw different samples and fit a linear model to each of them:
N=1000
X=rnorm(N,2,1)
Y=8*X+rnorm(N,0,1)
POP=cbind(X,Y)
POPULATION=as.data.frame(POP)
POPULATION$ID=seq.int(nrow(POPULATION)) # R is case-sensitive: the object is POPULATION, not population
J=10
n=100
PREDICTIONS=matrix(,nrow = n,ncol=J)
for (i in 1:J) {
  SAMPLE=POPULATION[sample(nrow(POPULATION),size = n,replace = F),]
  Y1=SAMPLE$Y
  X1=SAMPLE$X
  LM=lm(Y1~X1)
  PREDICTIONS[,i]=as.array(predict(LM,SAMPLE))
}
I want to merge the predictions and confidence intervals into the population data frame. That is, I want something like this:
ID  Estimate1  LW   UP    Estimate2  LW   UP   ...  ...  ...
1   NA         NA   NA    8.25       4.3  5.7  NA   NA   NA
2   3.5        1.2  4.2   NA         NA   NA   NA   NA   NA
3   NA         NA   NA    NA         NA   NA   NA   NA   NA
4   7.8        4.2  10.5  7.14       6.2  8.1  NA   NA   NA
5   .          .    .     .          .    .    .    .    .
.   .          .    .     .          .    .    .    .    .
How can I adjust the loop to get something like this?
Solution
Here is one way to do it.
set.seed(2) # with a seed your example is reproducible!
N <- 1000
X <- rnorm(N,2,1)
Y <- 8*X + rnorm(N,0,1)
POPULATION <- data.frame(X = X, Y = Y, ID = seq_len(N))
J <- 10
n <- 100
for (i in 1:J) {
  rows <- sample(nrow(POPULATION), size = n, replace = FALSE)
  SAMPLE <- POPULATION[rows,]
  LM <- lm(Y~X, SAMPLE)
  PR <- predict(LM, SAMPLE, interval = "confidence")
  cols <- paste(colnames(PR), i, sep = "_")
  POPULATION[rows,cols] <- asplit(PR,2)
}
head(POPULATION)[1:9]
#> X Y ID fit_1 lwr_1 upr_1 fit_2 lwr_2 upr_2
#> 1 1.1030855 9.290884 1 NA NA NA NA NA NA
#> 2 2.1848492 18.433460 2 NA NA NA NA NA NA
#> 3 3.5878453 27.755556 3 NA NA NA NA NA NA
#> 4 0.8696243 6.995558 4 NA NA NA NA NA NA
#> 5 1.9197482 14.527104 5 NA NA NA 15.57564 15.38493 15.76634
#> 6 2.1324203 17.616534 6 NA NA NA NA NA NA
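The step that does the work in the loop is the assignment into new, per-sample columns. As a minimal sketch of that step in isolation, the code below uses a small hypothetical data frame `d` (not the POPULATION from the answer) and a slightly more explicit variant of the assignment: it creates the three columns filled with NA first, then fills only the sampled rows from `as.data.frame(PR)` instead of `asplit(PR, 2)`. The names `d` and `fit1`, and the sample size of 5, are assumptions made for illustration.

```r
set.seed(1)

# Small hypothetical data set, only for illustrating the assignment step
d <- data.frame(X = rnorm(20, 2, 1))
d$Y <- 8 * d$X + rnorm(20, 0, 1)

# Draw one sample of row indices and fit the model on those rows
rows <- sample(nrow(d), size = 5, replace = FALSE)
fit1 <- lm(Y ~ X, data = d[rows, ])

# predict() with interval = "confidence" returns a matrix
# with the columns fit, lwr and upr
PR <- predict(fit1, d[rows, ], interval = "confidence")
cols <- paste(colnames(PR), 1, sep = "_")

# Create the new columns as NA, then fill only the sampled rows
d[cols] <- NA_real_
d[rows, cols] <- as.data.frame(PR)

head(d)
```

Rows that were not drawn keep NA in the three new columns, which is the pattern visible in the output above.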
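However, this leaves a lot of missing data in POPULATION: each set of fit/lwr/upr columns is filled only for the n = 100 rows drawn in that sample and is NA for the other 900.
Are you sure you don't want to apply predict to the whole data set?
Like this: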
for (i in 1:J) {
  rows <- sample(nrow(POPULATION), size = n, replace = FALSE)
  SAMPLE <- POPULATION[rows,]
  LM <- lm(Y~X, SAMPLE)
  PR <- predict(LM, POPULATION, interval = "confidence")
  cols <- paste(colnames(PR), i, sep = "_")
  POPULATION[,cols] <- asplit(PR,2)
}
head(POPULATION)[1:9]
#> X Y ID fit_1 lwr_1 upr_1 fit_2 lwr_2 upr_2
#> 1 1.1030855 9.290884 1 8.782858 8.498869 9.066846 9.018652 8.741132 9.296172
#> 2 2.1848492 18.433460 2 17.395832 17.181911 17.609754 17.704131 17.516264 17.891998
#> 3 3.5878453 27.755556 3 28.566451 28.186621 28.946281 28.968784 28.610305 29.327263
#> 4 0.8696243 6.995558 4 6.924046 6.606492 7.241600 7.144193 6.829553 7.458833
#> 5 1.9197482 14.527104 5 15.285106 15.072070 15.498142 15.575636 15.384929 15.766343
#> 6 2.1324203 17.616534 6 16.978395 16.765726 17.191063 17.283179 17.096002 17.470357
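If 3 * J extra columns become unwieldy, one possible alternative (not part of the original answer) is to collect the results in long format, with one row per ID and sample. The sketch below rebuilds the same simulated population, predicts on the whole population for every sample, and records in a hypothetical `in_sample` column whether each row was part of the sample the model was fitted on. The names `results`, `LONG`, `sample` and `in_sample` are assumptions introduced here.

```r
set.seed(2)
N <- 1000
J <- 10
n <- 100
X <- rnorm(N, 2, 1)
POPULATION <- data.frame(X = X, Y = 8 * X + rnorm(N, 0, 1), ID = seq_len(N))

# One data frame per sample: fit on the sampled rows, predict for every row
results <- lapply(seq_len(J), function(i) {
  rows <- sample(nrow(POPULATION), size = n, replace = FALSE)
  LM <- lm(Y ~ X, data = POPULATION[rows, ])
  PR <- predict(LM, POPULATION, interval = "confidence")
  data.frame(
    ID = POPULATION$ID,
    sample = i,                        # index of the sample / model
    in_sample = seq_len(N) %in% rows,  # TRUE if the row was used for fitting
    PR,                                # expands to the columns fit, lwr, upr
    row.names = NULL
  )
})

# Stack the J pieces into one long data frame with N * J rows
LONG <- do.call(rbind, results)
head(LONG)
```

Filtering with `LONG[LONG$in_sample, ]` would recover only the predictions for rows that belonged to each sample, which corresponds to the non-NA cells of the first approach.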