首页 > 解决方案 > 为什么这个线性模型的拟合值不连续?

问题描述

我想根据他们的年龄和性别估计个人的长度,使用 R 中的 OLS。为此,我建立了以下模型:m1 <- lm(length ~ age + sex, data = data.frame). 接下来,我通过运行创建了一些简单的残差图:

op <- par(mfrow = c(2,2))
plot(resid(m1) ~ fitted(m1))
plot(resid(m1) ~ data.frame$age)
plot(resid(m1) ~ data.frame$sex)
qqnorm(resid(m1)); qqline(resid(m1))
par(op)

产生这个图: 在此处输入图像描述 奇怪的是,拟合值似乎没有范围[165,180],而是[165,170] ∪ [175,180](左上图)。我不明白为什么会这样。

下面是一些产生上述图的样本数据:

    structure(list(length = c(173, 170, 172, 160, 162.5, 180, 179.5, 
175, 168, 186.5, 163.5, 170.5, 160, 175.5, 186.5, 176.5, 168, 
180.5, 179, 167, 183, 188.5, 176, 165, 170, 171, 176, 172, 187, 
189, 180, 175.5, 162.5, 187, 164, 177, 170.5, 159.5, 161.5, 167, 
178, 180.5, 168.5, 162, 171, 173, 171.5, 174.5, 177, 158, 175, 
170, 183.5, 166, 174.5, 174, 176, 165, 163.5, 171.5, 161, 173, 
165, 186, 171, 164.5, 182.5, 156.5, 156, 175, 168.5, 195, 164, 
167.5, 168, 165.5, 172.5, 167, 175, 190, 170.5, 166, 155, 179.5, 
175, 185, 174, 158.5, 172.5, 172.5, 173, 177, 161.5, 173.5, 159, 
181, 176, 181.5, 167.5, 170.5), age = c(31.0965092402464, 67.7481177275838, 
60.9062286105407, 54.776180698152, 57.8316221765914, 42.0287474332649, 
47.1786447638604, 51.315537303217, 68.0876112251882, 32.3613963039014, 
52.1259411362081, 50.7652292950034, 53.6947296372348, 64.6242299794661, 
66.9733059548255, 66.8829568788501, 63.668720054757, 73.533196440794, 
57.7659137577002, 43.7262149212868, 51.2416153319644, 30.7953456536619, 
73.0403832991102, 52.2984257357974, 46.2614647501711, 35.7618069815195, 
74.0670773442847, 35.6878850102669, 43.3894592744695, 65.0458590006845, 
55.9671457905544, 71.2306639288159, 58.5653661875428, 40.0520191649555, 
39.9698836413415, 44.0109514031485, 34.4722792607803, 47.5400410677618, 
51.8822724161533, 46.9596167008898, 39.0143737166324, 49.0349075975359, 
39.3812457221081, 48.2518822724162, 37.0376454483231, 30.425735797399, 
31.5838466803559, 74.9459274469541, 46.3353867214237, 56.0602327173169, 
54.4476386036961, 58.4120465434634, 47.64681724846, 39.047227926078, 
45.2183436002738, 48.0246406570842, 41.5140314852841, 61.0732375085558, 
52.2600958247776, 62.9760438056126, 70.715947980835, 70.5735797399042, 
40.2436687200548, 35.0198494182067, 41.1772758384668, 57.2210814510609, 
64.2710472279261, 59.6221765913758, 63.0088980150582, 48.5366187542779, 
30.0369609856263, 48.8898015058179, 49.7741273100616, 54.7624914442163, 
61.284052019165, 37.0102669404517, 58.4695414099932, 55.3483915126626, 
39.4579055441478, 49.3333333333333, 37.9712525667351, 57.388090349076, 
70.8199863107461, 37.0212183436003, 51.3675564681725, 48.3860369609856, 
35.895961670089, 39.5208761122519, 37.4209445585216, 46.8692676249144, 
65.3826146475017, 51.9425051334702, 33.2594113620808, 55.1156741957563, 
33.9493497604381, 33.2895277207392, 42.0369609856263, 29.4976043805613, 
54.9514031485284, 36.2327173169062), sex = structure(c(1L, 2L, 
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 
1L, 2L), .Label = c("0", "1"), class = "factor")), row.names = c(1L, 
2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L, 
18L, 19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 32L, 
33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 
47L, 48L, 49L, 50L, 51L, 53L, 54L, 56L, 57L, 58L, 59L, 60L, 62L, 
63L, 64L, 65L, 66L, 67L, 68L, 70L, 73L, 74L, 75L, 76L, 77L, 78L, 
79L, 81L, 82L, 85L, 86L, 87L, 88L, 90L, 92L, 94L, 95L, 96L, 97L, 
98L, 99L, 100L, 102L, 103L, 104L, 105L, 106L, 107L, 109L, 110L, 
111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L), class = "data.frame")

有谁知道缺陷吗?

标签: rregressionlmleast-squares

解决方案


遵循@Roland 的评论:该图显示预测值的组内变化远小于观察值。

library(ggplot2); theme_set(theme_bw())
lm1 <- lm(length~age+sex,data=dd)
pp <- expand.grid(age=30:75,sex=factor(0:1))
pp$length <- predict(lm1,newdata=pp)
ggplot(dd,aes(age,length,colour=sex))+
    geom_point()+
    geom_point(data=pp,shape=2)

在此处输入图像描述


推荐阅读