r - Calculate summary statistics for measurements and pivot them into columns in R
Question
I have a data frame like this
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step,Length,Breadth,Height,Width)
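I am trying to calculate the max, min, mean, median and standard deviation of the measurements grouped by Step, and then pivot the result so that the measurement names end up in a single column.
My desired output is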
Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2 median_2 sd_2 max_3 min_3 mean_3 median_3 sd_3 max_4 min_4 mean_4 median_4 sd_4
Length 0.50 0.10 0.3050 0.31 0.17058722 0.28 0.10 0.1900 0.190 0.07393691 0.80 0.15 0.334 0.27 0.2693139 0.90 0.61 0.7525 0.750 0.12526638
Breadth 0.47 0.13 0.3500 0.40 0.15577760 0.80 0.28 0.5775 0.615 0.23012680 0.70 0.11 0.354 0.38 0.2383904 0.90 0.11 0.6700 0.835 0.37567720
Height 0.35 0.27 0.3200 0.33 0.03829708 0.51 0.22 0.3575 0.350 0.12120919 0.53 0.15 0.310 0.32 0.1570032 0.90 0.11 0.4650 0.425 0.32888701
Width 0.70 0.21 0.4275 0.40 0.23669601 0.80 0.20 0.3675 0.235 0.28952547 0.80 0.30 0.524 0.50 0.2040343 0.31 0.27 0.2875 0.285 0.01707825
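I am calculating the summary statistics this way, but it is not efficient: every statistic has to be written out by hand for every column.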
library(dplyr)
df1 <- df %>%
  group_by(Step) %>%
  summarise(Length_Mean = mean(Length),
            Breadth_Mean = mean(Breadth),
            Height_Mean = mean(Height),
            Width_Mean = mean(Width))
How can I get my desired output efficiently with the least amount of code? Can someone point me in the right direction?
Answer
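You can use the "scoped" variants of summarize to calculate the same summary statistics for multiple columns at once. From ?scoped:
The variants suffixed with _if, _at or _all apply an expression (sometimes several) to all variables within a specified subset. This subset can contain all variables (_all variants), a vars() selection (_at variants), or variables selected with a predicate (_if variants).
Here summarize_all is probably a good choice; it selects all columns except the grouping columns. You can also supply several summary functions, each of which is calculated for every variable in the selection.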
library(tidyverse)
# Calculate the summary statistics
sums <- df %>%
  group_by(Step) %>%
  summarize_all(funs(max, min, mean, median, sd))
sums
#> # A tibble: 4 x 21
#> Step Length_max Breadth_max Height_max Width_max Length_min Breadth_min
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.5 0.47 0.35 0.7 0.1 0.13
#> 2 2 0.28 0.8 0.51 0.8 0.1 0.28
#> 3 3 0.8 0.7 0.53 0.8 0.15 0.11
#> 4 4 0.9 0.9 0.9 0.31 0.61 0.11
#> # ... with 14 more variables: Height_min <dbl>, Width_min <dbl>,
#> # Length_mean <dbl>, Breadth_mean <dbl>, Height_mean <dbl>,
#> # Width_mean <dbl>, Length_median <dbl>, Breadth_median <dbl>,
#> # Height_median <dbl>, Width_median <dbl>, Length_sd <dbl>,
#> # Breadth_sd <dbl>, Height_sd <dbl>, Width_sd <dbl>
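On current dplyr releases, funs() is deprecated (since dplyr 0.8.0) and the scoped verbs are superseded by across() (dplyr 1.0.0 and later). The following is a minimal sketch of the same summary written with across(), assuming dplyr >= 1.0.0; the name sums_across is only illustrative. Note that the columns come out grouped by measurement (Length_max, Length_min, ...) rather than by statistic as in the output above.

```r
library(dplyr)

# Same data as in the question
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step, Length, Breadth, Height, Width)

# sums_across is an illustrative name, not part of the original answer.
# Inside a grouped summarise(), everything() skips the grouping column Step.
sums_across <- df %>%
  group_by(Step) %>%
  summarise(
    across(everything(),
           list(max = max, min = min, mean = mean, median = median, sd = sd)),
    .groups = "drop"
  )
```

The named list supplies the suffixes, so the default naming scheme produces the same Length_max style column names that the reshaping step relies on.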
Now that we have the summary statistics, all that is left is to reshape the data into the desired output. For this, gather, spread, separate and unite from tidyr come in handy:
sums %>%
  # Reshape to long format
  gather(col, val, -Step) %>%
  # Separate the measurement and the summary statistic
  separate(col, into = c("Measurement", "stat")) %>%
  arrange(Step) %>%
  # Create the desired column headings
  unite(col, stat, Step) %>%
  # Need to use factors to preserve order
  mutate_at(vars(col, Measurement), fct_inorder) %>%
  # Reshape back to wide format
  spread(col, val)
#> # A tibble: 4 x 21
#> Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Length 0.5 0.1 0.305 0.31 0.171 0.28 0.1 0.19
#> 2 Breadth 0.47 0.13 0.35 0.4 0.156 0.8 0.28 0.578
#> 3 Height 0.35 0.27 0.32 0.330 0.0383 0.51 0.22 0.358
#> 4 Width 0.7 0.21 0.428 0.4 0.237 0.8 0.2 0.368
#> # ... with 12 more variables: median_2 <dbl>, sd_2 <dbl>, max_3 <dbl>,
#> # min_3 <dbl>, mean_3 <dbl>, median_3 <dbl>, sd_3 <dbl>, max_4 <dbl>,
#> # min_4 <dbl>, mean_4 <dbl>, median_4 <dbl>, sd_4 <dbl>
Created on 2018-05-24 by the reprex package (v0.2.0).
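gather and spread still work, but tidyr now documents them as superseded by pivot_longer and pivot_wider. As a rough sketch of the same result with the newer verbs, assuming dplyr >= 1.0.0 and tidyr >= 1.2.0 (needed for the names_vary argument), one option is to go long first, summarise per Step and Measurement, and then widen. The names wide and val are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step, Length, Breadth, Height, Width)

wide <- df %>%
  # One row per Step / Measurement / value
  pivot_longer(-Step, names_to = "Measurement", values_to = "val") %>%
  # Keep the measurements in their original column order
  mutate(Measurement = factor(Measurement, levels = setdiff(names(df), "Step"))) %>%
  group_by(Step, Measurement) %>%
  summarise(max = max(val), min = min(val), mean = mean(val),
            median = median(val), sd = sd(val), .groups = "drop") %>%
  # names_vary = "slowest" keeps all statistics of one Step next to each other
  pivot_wider(names_from = Step,
              values_from = c(max, min, mean, median, sd),
              names_vary = "slowest")
```

With several values_from columns, pivot_wider builds the headings as statistic_Step (max_1, min_1, ...), so no separate unite step is needed here.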