r - 使用 rbind 为回归准备数据
问题描述
我有 2 个数据集,我想用它们来进行回归。prep_data 是一个项目列表,每个项目都有一个 24 小时的值:
device_id h city mean_t
1 2C7970 00 Auckland 18.60507
2 2C7970 01 Auckland 18.56266
3 2C7970 02 Auckland 18.46109
4 2C7970 03 Auckland 18.38914
5 2C7970 04 Auckland 18.29095
6 2C7970 05 Auckland 18.28618
7 2C7970 06 Auckland 18.28742
8 2C7970 07 Auckland 18.28725
9 2C7970 08 Auckland 18.34685
10 2C7970 09 Auckland 18.52965
11 2C7970 10 Auckland 18.78257
12 2C7970 11 Auckland 19.05493
13 2C7970 12 Auckland 19.13627
14 2C7970 13 Auckland 19.30245
15 2C7970 14 Auckland 19.49134
16 2C7970 15 Auckland 19.49379
17 2C7970 16 Auckland 19.42745
18 2C7970 17 Auckland 19.30474
19 2C7970 18 Auckland 19.25752
20 2C7970 19 Auckland 19.15033
21 2C7970 20 Auckland 19.02484
22 2C7970 21 Auckland 18.88421
23 2C7970 22 Auckland 18.79243
24 2C7970 23 Auckland 18.70263
25 2C797D 00 Auckland 16.35536
26 2C797D 01 Auckland 16.13241
27 2C797D 02 Auckland 15.91293
28 2C797D 03 Auckland 15.64948
29 2C797D 04 Auckland 15.34483
30 2C797D 05 Auckland 15.19293
31 2C797D 06 Auckland 15.16948
32 2C797D 07 Auckland 15.23397
33 2C797D 08 Auckland 15.69017
34 2C797D 09 Auckland 16.18345
35 2C797D 10 Auckland 16.60586
36 2C797D 11 Auckland 16.80259
37 2C797D 12 Auckland 16.92842
38 2C797D 13 Auckland 17.31712
39 2C797D 14 Auckland 17.57376
40 2C797D 15 Auckland 17.63069
41 2C797D 16 Auckland 17.72552
42 2C797D 17 Auckland 17.69621
43 2C797D 18 Auckland 17.56276
44 2C797D 19 Auckland 17.45534
45 2C797D 20 Auckland 17.29776
46 2C797D 21 Auckland 17.07296
47 2C797D 22 Auckland 16.83681
48 2C797D 23 Auckland 16.65069
49 2C7985 00 Auckland 17.29410
50 2C7985 01 Auckland 16.88867
51 2C7985 02 Auckland 16.71007
52 2C7985 03 Auckland 16.54133
53 2C7985 04 Auckland 16.27081
54 2C7985 05 Auckland 16.02517
55 2C7985 06 Auckland 15.98891
56 2C7985 07 Auckland 15.93858
57 2C7985 08 Auckland 15.87868
58 2C7985 09 Auckland 15.91705
59 2C7985 10 Auckland 16.57119
60 2C7985 11 Auckland 17.14488
61 2C7985 12 Auckland 17.53762
62 2C7985 13 Auckland 18.23825
63 2C7985 14 Auckland 18.69074
64 2C7985 15 Auckland 18.90296
65 2C7985 16 Auckland 19.11993
66 2C7985 17 Auckland 19.10441
67 2C7985 18 Auckland 18.76062
68 2C7985 19 Auckland 18.52761
69 2C7985 20 Auckland 18.33885
70 2C7985 21 Auckland 17.98059
71 2C7985 22 Auckland 17.66859
72 2C7985 23 Auckland 17.58604
73 2C7A55 00 Auckland 17.82138
74 2C7A55 01 Auckland 17.67769
75 2C7A55 02 Auckland 17.43013
76 2C7A55 03 Auckland 17.28240
77 2C7A55 04 Auckland 17.13849
78 2C7A55 05 Auckland 16.93355
79 2C7A55 06 Auckland 16.71875
80 2C7A55 07 Auckland 16.63651
81 2C7A55 08 Auckland 16.58069
82 2C7A55 09 Auckland 16.60646
83 2C7A55 10 Auckland 16.70066
84 2C7A55 11 Auckland 16.82496
85 2C7A55 12 Auckland 17.16405
86 2C7A55 13 Auckland 17.52680
87 2C7A55 14 Auckland 17.78660
88 2C7A55 15 Auckland 18.03415
89 2C7A55 16 Auckland 18.26683
90 2C7A55 17 Auckland 18.39262
91 2C7A55 18 Auckland 18.41612
92 2C7A55 19 Auckland 18.39901
93 2C7A55 20 Auckland 18.35181
94 2C7A55 21 Auckland 18.25581
95 2C7A55 22 Auckland 18.11842
96 2C7A55 23 Auckland 17.94852
要绑定的表如下所示:
city_name h outdoor_hum outdoor_temp
1 Auckland 00 87.00000 11.62608
2 Auckland 01 86.87500 11.52487
3 Auckland 02 87.10596 11.34609
4 Auckland 03 87.82895 11.29961
5 Auckland 04 88.20395 11.15151
6 Auckland 05 88.07237 11.06513
7 Auckland 06 88.80921 10.96934
8 Auckland 07 88.46053 10.96316
9 Auckland 08 87.50658 11.59500
10 Auckland 09 85.03947 12.58395
11 Auckland 10 79.73684 13.80546
12 Auckland 11 75.23684 14.79230
13 Auckland 12 73.01974 15.25612
14 Auckland 13 71.24837 15.50320
15 Auckland 14 69.88889 15.62948
16 Auckland 15 70.81699 15.51739
17 Auckland 16 71.79739 15.14314
18 Auckland 17 74.93464 14.30928
19 Auckland 18 79.57516 13.50771
20 Auckland 19 81.19608 13.03699
21 Auckland 20 82.69281 12.56595
22 Auckland 21 84.43791 12.28150
23 Auckland 22 85.07843 12.05542
24 Auckland 23 85.75163 11.82458
25 Wellington 00 87.00000 11.62608
26 Wellington 01 86.87500 11.52487
27 Wellington 02 87.10596 11.34609
.. Wellington 03 87.82895 11.29961
.. Wellington 04 88.20395 11.15151
.. Wellington 05 88.07237 11.06513
.. Wellington 06 88.80921 10.96934
.. Wellington 07 88.46053 10.96316
.. Wellington 08 87.50658 11.59500
.. Wellington 09 85.03947 12.58395
.. Wellington 10 79.73684 13.80546
.. Wellington 11 75.23684 14.79230
.. Wellington 12 73.01974 15.25612
.. Wellington 13 71.24837 15.50320
.. Wellington 14 69.88889 15.62948
.. Wellington 15 70.81699 15.51739
.. Wellington 16 71.79739 15.14314
.. Wellington 17 74.93464 14.30928
.. Wellington 18 79.57516 13.50771
.. Wellington 19 81.19608 13.03699
.. Wellington 20 82.69281 12.56595
.. Wellington 21 84.43791 12.28150
.. Wellington 22 85.07843 12.05542
.. Wellington 23 85.75163 11.82458
对于顶部 data.table 中的每个唯一 device_id,我需要对第二个 data.table 中给出的匹配城市的 mean_t 与 Outdoor_temp 进行回归。至关重要的是,有些物品需要使用惠灵顿天气套装,有些则需要使用奥克兰套装。实际上,在全套中,有几十个地点。我的想法是,如果我可以使用 rbind 输出如下格式:
h 2C7985 2C7985_outdoor 2C797D 2C797D_outdoor ... 0 1 2 ... 23
其中 _outdoor 是来自匹配城市的数据。然后我可以通过搜索每个不包含 _outdoor 的标题并将其与相应的列进行比较来进行回归测试......这是只有一个城市的情况的代码:
table_4 <- c()
# for each column (except h and "Auckland")
for (i in names(reg_data)[!(names(reg_data) %in% c("h", "Auckland"))]) {
# make a line of linear model comparing the column to the weather.
model <- lm(reg_data[[i]] ~ Auckland, data = reg_data)
# bind this new chunk of model action to the table thingy.
table_4 <- rbind(table_4_1_1, dust(model) %>% as.tibble())
}
这是一个棘手的问题,但我认为那里有一位 R 大师愿意帮助这个可怜的旧嵌入式 C 编码器!
> dput(reg_data[1:48, ])
structure(list(device_id = c("2C7970", "2C7970", "2C7970", "2C7970",
"2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C7970",
"2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C7970",
"2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C7970", "2C797D",
"2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D",
"2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D",
"2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D", "2C797D",
"2C797D", "2C797D"), h = c("00", "01", "02", "03", "04", "05",
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "00", "01", "02", "03",
"04", "05", "06", "07", "08", "09", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20", "21", "22", "23"), city = c("Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland"),
mean_t = c(18.6050736497545, 18.5626644736842, 18.4610927152318,
18.3891447368421, 18.2909539473684, 18.2861842105263, 18.287417218543,
18.2872516556291, 18.3468543046358, 18.5296540362438, 18.7825657894737,
19.0549342105263, 19.1362745098039, 19.3024509803922, 19.491339869281,
19.4937908496732, 19.4274509803922, 19.3047385620915, 19.2575163398693,
19.1503267973856, 19.0248366013072, 18.8842105263158, 18.7924342105263,
18.7026315789474, 16.3553633217993, 16.1324137931034, 15.9129310344828,
15.6494827586207, 15.3448275862069, 15.1929310344828, 15.1694827586207,
15.2339655172414, 15.6901724137931, 16.1834482758621, 16.6058620689655,
16.8025862068966, 16.9284246575342, 17.3171232876712, 17.573756432247,
17.6306896551724, 17.7255172413793, 17.6962068965517, 17.5627586206897,
17.4553448275862, 17.2977586206897, 17.0729636048527, 16.8368055555556,
16.6506944444444)), row.names = c(NA, -48L), groups = structure(list(
device_id = c("2C7970", "2C797D"), .rows = structure(list(
1:24, 25:48), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
dput(Hourly_Weather)
structure(list(city_name = c("Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland", "Auckland", "Auckland", "Auckland",
"Auckland", "Auckland", "Auckland"), h = c("00", "01", "02",
"03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23"),
outdoor_hum = c(87, 86.875, 87.1059602649007, 87.8289473684211,
88.2039473684211, 88.0723684210526, 88.8092105263158, 88.4605263157895,
87.5065789473684, 85.0394736842105, 79.7368421052632, 75.2368421052632,
73.0197368421053, 71.2483660130719, 69.8888888888889, 70.8169934640523,
71.797385620915, 74.9346405228758, 79.5751633986928, 81.1960784313726,
82.6928104575163, 84.437908496732, 85.078431372549, 85.7516339869281
), outdoor_temp = c(11.6260784313725, 11.5248684210526, 11.3460927152318,
11.2996052631579, 11.1515131578947, 11.0651315789474, 10.9693421052632,
10.9631578947368, 11.595, 12.5839473684211, 13.8054605263158,
14.7923026315789, 15.2561184210526, 15.5032026143791, 15.629477124183,
15.517385620915, 15.143137254902, 14.3092810457516, 13.5077124183007,
13.0369934640523, 12.5659477124183, 12.2815032679739, 12.0554248366013,
11.8245751633987)), row.names = c(NA, -24L), groups = structure(list(
city_name = "Auckland", .rows = structure(list(1:24), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
解决方案
一种tidyverse
方法是为每个device_id
.
library(tidyverse)
library(pixiedust)
inner_join(reg_data, weather, by = c('city' = 'city_name', 'h')) %>%
group_by(device_id) %>%
summarise(model = list(lm(mean_t ~ outdoor_temp)),
data = purrr::map(model, ~dust(.x) %>% as.tibble)) %>%
unnest(data) %>%
select(-model)
# device_id term estimate std.error statistic p.value
# <chr> <chr> <chr> <chr> <chr> <chr>
#1 2C7970 (Intercept) 15.861 0.284 55.831 0
#2 2C7970 outdoor_temp 0.23 0.022 10.526 0
#3 2C797D (Intercept) 11.146 0.85 13.112 0
#4 2C797D outdoor_temp 0.422 0.065 6.449 0
推荐阅读
- android - CryptographyManager 使用的 secretKeyName 是什么?
- c++ - 我不知道为什么它不打印链接列表
- vue.js - 从 vue.js 中的 slot 访问组件数据(没有在郊区声明模板)
- reactjs - 使用 ThreeJS 和 React 三纤执行 React 应用程序时出现“编译失败”错误
- python - python sort numpy ValueError:具有多个元素的数组的真值不明确。使用 a.any() 或 a.all()
- vba - 使用 VBA 自动化 AutoCAD 中的数据
- database - 两个连续的mongodb文档字段之间的区别
- performance - 在项目中同时使用 mokito 和 mockk
- javascript - 不处理响应数据只想使用 fetch 获取响应状态
- blockchain - 我该如何解决这个错误?“找不到命令:傀儡”