r - NA values and extra rows when spreading repeated measures of multiple variables into wide format?
问题描述
With the data below (included with dput
), I have a varying number of repeat Lat and Long locations for three individuals and would like to spread them into wide format using dplyr
.
The data look like so:
> head(Dat)
IndIDII IndYear WintLat WintLong
1 BHS_265 BHS_265-2015 47.61025 -112.7210
2 BHS_265 BHS_265-2016 47.59884 -112.7089
3 BHS_770 BHS_770-2016 42.97379 -109.0400
4 BHS_770 BHS_770-2017 42.97129 -109.0367
5 BHS_770 BHS_770-2018 42.97244 -109.0509
6 BHS_377 BHS_377-2015 43.34744 -109.4821
This post provided a slick solution that has been a big help. Nonetheless, I am unable to obtain my desired result. Modifying the code I have the following:
Dat %>%
group_by(IndIDII) %>%
#Make YearNum (as intiger not calnader year) for each IndIDII
mutate(YearNum = row_number()) %>%
gather(Group, LatLong, c(WintLat, WintLong)) %>%
unite(GroupNew, YearNum, Group, sep = "-") %>%
spread(GroupNew, LatLong) %>%
as.data.frame()
Which produces a nearly correct result, but has multiple rows for each IndIDII
, each containing the lat and long for a single year.
IndIDII IndYear 1-WintLat 1-WintLong 2-WintLat 2-WintLong 3-WintLat 3-WintLong 4-WintLat 4-WintLong
1 BHS_265 BHS_265-2015 47.61025 -112.7210 NA NA NA NA NA NA
2 BHS_265 BHS_265-2016 NA NA 47.59884 -112.7089 NA NA NA NA
3 BHS_377 BHS_377-2015 43.34744 -109.4821 NA NA NA NA NA NA
4 BHS_377 BHS_377-2016 NA NA 43.35559 -109.4445 NA NA NA NA
5 BHS_377 BHS_377-2017 NA NA NA NA 43.35195 -109.4566 NA NA
6 BHS_377 BHS_377-2018 NA NA NA NA NA NA 43.34765 -109.4892
7 BHS_770 BHS_770-2016 42.97379 -109.0400 NA NA NA NA NA NA
8 BHS_770 BHS_770-2017 NA NA 42.97129 -109.0367 NA NA NA NA
9 BHS_770 BHS_770-2018 NA NA NA NA 42.97244 -109.0509 NA NA
I am trying to have all lat and longs for an IndIDII
in a single row (i.e. wide format) as shown below. NA
values will appear when individuals have fewer than the max number of years. I suspect the issue is with the GroupNew
variable and have tried different options, but to no avail...
Dat <- structure(list(IndIDII = c("BHS_265", "BHS_265", "BHS_770", "BHS_770",
"BHS_770", "BHS_377", "BHS_377", "BHS_377", "BHS_377"), IndYear = c("BHS_265-2015",
"BHS_265-2016", "BHS_770-2016", "BHS_770-2017", "BHS_770-2018",
"BHS_377-2015", "BHS_377-2016", "BHS_377-2017", "BHS_377-2018"
), WintLat = c(47.6102519805014, 47.5988417247191, 42.9737859090909,
42.9712914772727, 42.9724390816327, 43.3474354347826, 43.3555934579439,
43.3519543396226, 43.3476466990291), WintLong = c(-112.720994832869,
-112.708887595506, -109.039964727273, -109.036693522727, -109.050923061224,
-109.482114456522, -109.444522149533, -109.45659254717, -109.489241553398
)), class = "data.frame", row.names = c(NA, -9L))
解决方案
你快到了。lat
和long
进入不同的行,因为它们是IndYear
不同的。由于您仅在 final 中保留IndYear
for each的第一个值,因此 add将为您提供所需的结果。IndiDII
data.frame
IndYear = first(IndYear)
Dat %>%
group_by(IndIDII) %>%
mutate(YearNum = row_number(), IndYear = first(IndYear)) %>%
gather(Group, LatLong, c(WintLat, WintLong)) %>%
unite(GroupNew, YearNum, Group, sep = "-") %>%
spread(GroupNew, LatLong) %>%
as.data.frame()
# IndIDII IndYear 1-WintLat 1-WintLong 2-WintLat 2-WintLong 3-WintLat 3-WintLong 4-WintLat 4-WintLong
# 1 BHS_265 BHS_265-2015 47.61025 -112.7210 47.59884 -112.7089 NA NA NA NA
# 2 BHS_377 BHS_377-2015 43.34744 -109.4821 43.35559 -109.4445 43.35195 -109.4566 43.34765 -109.4892
# 3 BHS_770 BHS_770-2016 42.97379 -109.0400 42.97129 -109.0367 42.97244 -109.0509 NA NA
推荐阅读
- android - Button 的背景是 colorPrimary 着色的,而不是 colorAccent 着色的吗?
- google-chrome-extension - 开发了一个 Chrome 扩展程序,可以从上下文菜单点击发送消息到弹出窗口。寻找重构建议
- python - Flask:For循环在HTML模板中不起作用
- c++ - 为什么Redis使用skiplist来实现zset
- angular - 我无法在 Angular 11 中使用 @ViewChild、渲染和渲染器隐藏 mat-menu 元素
- mysql - pmadb 的配置...不正常 PhpMyAdmin 中的错误
- cognos - Cognos Report Studio:CASE 和 IF 语句
- c# - 为什么会发生 System.TypeInitializationException?
- django - 如何维护认证数据
- r - 如何通过 sparklyr 将 sf 数据导出到 Spark?