r - Force .name_repair to create names
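Problem description
I have some data, df_single and df_multi. The code below works well on df_multi, but when I apply the same code to df_single the column names are not what I want.
I run the following code: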
df_single %>%
as_tibble(., .name_repair = "universal") %>%
summarise_at(.vars = 8:ncol(.), .funs = c(mean = "mean", sd = "sd"))
This gives me the following:
# A tibble: 1 x 2
mean sd
<dbl> <dbl>
1 42.4 0.380
This is fine, but it is not in the format I want. If I run the following:
df_multi %>%
as_tibble(., .name_repair = "universal") %>%
summarise_at(.vars = 8:ncol(.), .funs = c(mean = "mean", sd = "sd"))
I get:
# A tibble: 1 x 8
pza_del_carmen_… pza_de_espana_m… escuelas_aguirr… retiro_mean pza_del_carmen_… pza_de_espana_sd
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 29.5 23.8 31.8 11.8 21.2 18.3
# … with 2 more variables: escuelas_aguirre_sd <dbl>, retiro_sd <dbl>
This is the format I want.
My expected output for df_single would be:
# A tibble: 1 x 2
tres_olivos_mean tres_olivos_sd
<dbl> <dbl>
1 42.4 0.380
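Here the names should come from the column name. I found that the "problem" comes from .name_repair =, since there is no conflict among the column names in df_single. Looking at df_single: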
# A tibble: 6 x 8
date day month year quarter semester weekday tres_olivos
<date> <int> <dbl> <dbl> <int> <int> <dbl> <dbl>
1 2010-01-01 1 1 2010 1 1 0 42.9
2 2010-01-02 2 1 2010 1 1 0 42.7
3 2010-01-03 3 1 2010 1 1 0 42.5
4 2010-01-04 4 1 2010 1 1 0 42.3
5 2010-01-05 5 1 2010 1 1 0 42.1
6 2010-01-06 6 1 2010 1 1 0 41.9
I want tres_olivos to be taken from the column of interest. df_multi looks like:
# A tibble: 6 x 11
date day month year quarter semester weekday pza_del_carmen pza_de_espana escuelas_aguirre retiro
<date> <int> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2010-01-01 1 1 2010 1 1 0 6 4 18 3
2 2010-01-02 2 1 2010 1 1 0 26 20 28 9
3 2010-01-03 3 1 2010 1 1 0 51 50 41 22
4 2010-01-04 4 1 2010 1 1 0 57 39 48 21
5 2010-01-05 5 1 2010 1 1 0 29 25 37 12
6 2010-01-06 6 1 2010 1 1 0 8 5 19 4
Data:
df_single <- structure(list(date = structure(c(14610, 14611, 14612, 14613,
14614, 14615), class = "Date"), day = 1:6, month = c(1, 1, 1,
1, 1, 1), year = c(2010, 2010, 2010, 2010, 2010, 2010), quarter = c(1L,
1L, 1L, 1L, 1L, 1L), semester = c(1L, 1L, 1L, 1L, 1L, 1L), weekday = c(0,
0, 0, 0, 0, 0), tres_olivos = c(42.8840939928959, 42.6809748158197,
42.4778556387312, 42.2747364616426, 42.0716172845541, 41.8684981074656
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
df_multi <- structure(list(date = structure(c(14610, 14611, 14612, 14613,
14614, 14615), class = "Date"), day = 1:6, month = c(1, 1, 1,
1, 1, 1), year = c(2010, 2010, 2010, 2010, 2010, 2010), quarter = c(1L,
1L, 1L, 1L, 1L, 1L), semester = c(1L, 1L, 1L, 1L, 1L, 1L), weekday = c(0,
0, 0, 0, 0, 0), pza_del_carmen = c(6, 26, 51, 57, 29, 8), pza_de_espana = c(4,
20, 50, 39, 25, 5), escuelas_aguirre = c(18, 28, 41, 48, 37,
19), retiro = c(3, 9, 22, 21, 12, 4)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L))
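Edit: from the documentation
The .name_repair argument of tibble() and as_tibble() refers to these levels. Alternatively, the user can pass their own name repair function. It should anticipate minimal names as input and should, likewise, return names that are at least minimal.
Passing my own name repair function might be interesting.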
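As a rough illustration of what a custom repair function looks like, here is a minimal sketch. The function add_suffix and the toy data frame are hypothetical and not part of the question. Note that name repair acts on the column names of the input at the moment the tibble is created, so on its own it would not rename the columns that summarise_at() produces afterwards.

```r
library(tibble)

# Hypothetical repair function: receives the (minimal) input names and
# returns one repaired name per input name.
add_suffix <- function(nms) paste0(nms, "_raw")

# Toy data frame, assumed for illustration only
df <- data.frame(tres_olivos = c(42.9, 42.7))

# The function is applied to the input names while the tibble is built
repaired <- as_tibble(df, .name_repair = add_suffix)
names(repaired)
```

So a repair function can change tres_olivos into something else, but the mean and sd names still have to be dealt with after summarising.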
Edit:
Here is what the data looks like:
my_list <- list(list(structure(list(date = structure(c(14610, 14611, 14612,
14613, 14614, 14615), class = "Date"), day = 1:6, month = c(1,
1, 1, 1, 1, 1), year = c(2010, 2010, 2010, 2010, 2010, 2010),
quarter = c(1L, 1L, 1L, 1L, 1L, 1L), semester = c(1L, 1L,
1L, 1L, 1L, 1L), weekday = c(0, 0, 0, 0, 0, 0), pza_del_carmen = c(6,
26, 51, 57, 29, 8), pza_de_espana = c(4, 20, 50, 39, 25,
5), escuelas_aguirre = c(18, 28, 41, 48, 37, 19), retiro = c(3,
9, 22, 21, 12, 4)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L)), structure(list(date = structure(c(14611,
14612, 14613, 14614, 14615, 14616), class = "Date"), day = 2:7,
month = c(1, 1, 1, 1, 1, 1), year = c(2010, 2010, 2010, 2010,
2010, 2010), quarter = c(1L, 1L, 1L, 1L, 1L, 1L), semester = c(1L,
1L, 1L, 1L, 1L, 1L), weekday = c(0, 0, 0, 0, 0, 0), pza_del_carmen = c(26,
51, 57, 29, 8, 22), pza_de_espana = c(20, 50, 39, 25, 5,
12), escuelas_aguirre = c(28, 41, 48, 37, 19, 26), retiro = c(9,
22, 21, 12, 4, 7)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))), list(structure(list(date = structure(c(14610,
14611, 14612, 14613, 14614, 14615), class = "Date"), day = 1:6,
month = c(1, 1, 1, 1, 1, 1), year = c(2010, 2010, 2010, 2010,
2010, 2010), quarter = c(1L, 1L, 1L, 1L, 1L, 1L), semester = c(1L,
1L, 1L, 1L, 1L, 1L), weekday = c(0, 0, 0, 0, 0, 0), tres_olivos = c(42.8840939928959,
42.6809748158197, 42.4778556387312, 42.2747364616426, 42.0716172845541,
41.8684981074656)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L)), structure(list(date = structure(c(14611,
14612, 14613, 14614, 14615, 14616), class = "Date"), day = 2:7,
month = c(1, 1, 1, 1, 1, 1), year = c(2010, 2010, 2010, 2010,
2010, 2010), quarter = c(1L, 1L, 1L, 1L, 1L, 1L), semester = c(1L,
1L, 1L, 1L, 1L, 1L), weekday = c(0, 0, 0, 0, 0, 0), tres_olivos = c(42.6809748158197,
42.4778556387312, 42.2747364616426, 42.0716172845541, 41.8684981074656,
41.6653789303771)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))))
I tried to replicate the original list as closely as possible using:
mylist <- list(
list(head(map(rolled_splits[[2]]$splits, ~ analysis(.x))[[1]]),
head(map(rolled_splits[[2]]$splits, ~ analysis(.x))[[2]])),
list(head(map(rolled_splits[[3]]$splits, ~ analysis(.x))[[1]]),
head(map(rolled_splits[[3]]$splits, ~ analysis(.x))[[2]]))
)
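Solution
Here we can use a small trick: when only one column is summarised, the result columns are named after the functions alone by default, see ?summarise_at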
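library(dplyr)
# Summarising a single column names the results after the functions only
# ("mean", "sd"), so prefix them with the column name afterwards.
df_single %>%
  summarise_at(.vars = 8:ncol(.), .funs = c(mean = "mean", sd = "sd")) %>%
  rename_all(~ paste0(names(df_single)[8], "_", .))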
# A tibble: 1 x 2
tres_olivos_mean tres_olivos_sd
<dbl> <dbl>
1 42.4 0.380
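From the Naming section of ?summarise_at:
The names of the created columns are derived from the names of the input variables and the names of the functions.
- if there is only one unnamed variable, the names of the functions are used to name the created columns.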
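The naming rule quoted above can be checked on a toy example. This is only a sketch: the tibble toy and its columns a and b are made up for illustration and do not come from the question.

```r
library(dplyr)

# Toy data (hypothetical, not from the question)
toy <- tibble(a = c(1, 2, 3), b = c(4, 5, 6))
funs <- c(mean = "mean", sd = "sd")

# One variable, several functions: columns are named after the functions only
one_var <- toy %>% summarise_at(.vars = "a", .funs = funs)
names(one_var)   # "mean" "sd"

# Several variables: names are built as <variable>_<function>
two_vars <- toy %>% summarise_at(.vars = c("a", "b"), .funs = funs)
names(two_vars)  # "a_mean" "b_mean" "a_sd" "b_sd"
```

This is the same difference seen between df_single and df_multi: the variable name only disappears when exactly one column is selected.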
library(purrr)
# Positional version: data frames with more than 8 columns hold several stations
map(my_list, ~ map(., ~ if (ncol(.x) > 8) {
  .x %>% summarise_at(.vars = 8:ncol(.x), .funs = c(mean = "mean", sd = "sd"))
} else {
  # Include weekday (column 7) so the variable names are kept, then drop its columns
  .x %>% summarise_at(.vars = 7:ncol(.x), .funs = c(mean = "mean", sd = "sd")) %>% select(2, 4)
}))
# A more robust solution is to depend on names rather than positions
summarise_fun <- function(df) {
  # Columns to summarise: everything except the calendar columns
  nms <- setdiff(names(df), c("date", "day", "month", "year", "quarter", "semester", "weekday"))
  out <- df %>% summarise_at(.vars = nms, .funs = c(mean = "mean", sd = "sd"))
  # A single column yields bare "mean"/"sd" names, so add the column name back
  if (length(nms) == 1) out <- out %>% rename_all(~ paste0(nms, "_", .))
  out
}
map(my_list, ~ map(., summarise_fun))
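As an alternative that is not part of the original answer, across() with an explicit .names pattern avoids the special case altogether, because the output names no longer depend on how many columns are selected. The sketch below assumes dplyr 1.0.0 or later; summarise_across is a hypothetical helper name and the small tibble is toy data.

```r
library(dplyr)

# Hypothetical helper; assumes dplyr >= 1.0.0 for across()
summarise_across <- function(df) {
  calendar_cols <- c("date", "day", "month", "year", "quarter", "semester", "weekday")
  df %>%
    summarise(across(
      -any_of(calendar_cols),          # every column except the calendar ones
      list(mean = mean, sd = sd),
      .names = "{.col}_{.fn}"          # always <column>_<function>
    ))
}

# Toy single-station data, assumed for illustration only
single <- tibble(weekday = c(0, 0, 0), tres_olivos = c(42.9, 42.7, 42.5))
result <- summarise_across(single)
names(result)   # "tres_olivos_mean" "tres_olivos_sd"
```

It could be applied to the nested list in the same way, with map(my_list, ~ map(., summarise_across)). One difference from summarise_at(): with several columns, across() orders the output by column first (a_mean, a_sd, b_mean, b_sd) rather than by function.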