r - 如何计算同一组中同一列中 2 个非日期值之间的差异?
问题描述
这是这篇文章的后续问题:在 R 中,我如何计算其中一行的列值大于另一行的列值的分组对的数量?
这是我对数据集 df1 的输入:
structure(list(Name = c("A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski",
"A.J. Pierzynski", "Aaron Boone", "Adam Kennedy", "Adam Melhuse",
"Adrian Beltre", "Adrian Beltre", "Adrian Gonzalez", "Alan Zinter",
"Albert Pujols", "Albert Pujols"), Age = c(37, 36, 37, 36, 36,
36, 36, 37, 36, 36, 36, 37, 36), Year = c(2018, 2017, 2014, 2013,
2009, 2012, 2008, 2016, 2015, 2018, 2004, 2017, 2016), Tm = c("SDP",
"MIA", "TOT", "TEX", "HOU", "LAD", "TOT", "TEX", "TEX", "NYM",
"ARI", "LAA", "LAA"), Lg = c("NL", "NL", "ML", "AL", "NL", "NL",
"ML", "AL", "AL", "NL", "NL", "AL", "AL"), G = c(66, 51, 102,
134, 10, 86, 15, 153, 143, 54, 28, 149, 152), PA = c(183, 163,
362, 529, 14, 201, 32, 640, 619, 187, 40, 636, 650)), row.names = c(NA,
13L), class = "data.frame")
这是我之前的问题的代码正确匹配对:
df1 %>%
arrange(Name, Age) %>%
group_by(Name) %>%
filter(last(G) < first(G))
每个分组对有两个观察值。每个还有一个名为 G 的列和一个名为 Year 的列。
以下是使用上述代码对数据进行分组后的样子:https ://www.dropbox.com/s/hh2qgkbn4cy4k4l/Data%20after%20grouping.png?dl=0
现在,我想知道每个匹配对的是“37 岁”值和“36 岁”值之间“G 列”值的差异:(36 岁值)-(37 岁)价值)。阴性结果是可以的。
此外,对于数据集中所有匹配的对,我想要这些差异的总和。
解决方案
如果我理解正确:
df <- structure(list(Name = c("A.J. Ellis", "A.J. Ellis", "A.J. Pierzynski",
"A.J. Pierzynski", "Aaron Boone", "Adam Kennedy", "Adam Melhuse",
"Adrian Beltre", "Adrian Beltre", "Adrian Gonzalez", "Alan Zinter",
"Albert Pujols", "Albert Pujols"), Age = c(37, 36, 37, 36, 36,
36, 36, 37, 36, 36, 36, 37, 36), Year = c(2018, 2017, 2014, 2013,
2009, 2012, 2008, 2016, 2015, 2018, 2004, 2017, 2016), Tm = c("SDP",
"MIA", "TOT", "TEX", "HOU", "LAD", "TOT", "TEX", "TEX", "NYM",
"ARI", "LAA", "LAA"), Lg = c("NL", "NL", "ML", "AL", "NL", "NL",
"ML", "AL", "AL", "NL", "NL", "AL", "AL"), G = c(66, 51, 102,
134, 10, 86, 15, 153, 143, 54, 28, 149, 152), PA = c(183, 163,
362, 529, 14, 201, 32, 640, 619, 187, 40, 636, 650)), row.names = c(NA,
13L), class = "data.frame")
df1 <- df %>%
arrange(Name, Age) %>%
group_by(Name) %>%
filter(last(G) < first(G)) %>%
mutate(g_diff = G[1] - G[2]) %>%
ungroup() %>%
mutate(sum_g_diff = sum(unique(g_diff)))
> df1
# A tibble: 4 x 9
Name Age Year Tm Lg G PA g_diff sum_g_diff
<chr> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 A.J. Pierzynski 36 2013 TEX AL 134 529 32 35
2 A.J. Pierzynski 37 2014 TOT ML 102 362 32 35
3 Albert Pujols 36 2016 LAA AL 152 650 3 35
4 Albert Pujols 37 2017 LAA AL 149 636 3 35
或者,如果 的累积总和(运行总和)g_diff
是所需的输出(不汇总数据):
df1 %>%
group_by(Name) %>%
mutate(cols = c(g_diff[1], rep(0, n() -1))) %>%
ungroup() %>%
mutate(cum_sum = cumsum(cols)) %>%
select(-cols)
# A tibble: 4 x 9
Name Age Year Tm Lg G PA g_diff cum_sum
<chr> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 A.J. Pierzynski 36 2013 TEX AL 134 529 32 32
2 A.J. Pierzynski 37 2014 TOT ML 102 362 32 32
3 Albert Pujols 36 2016 LAA AL 152 650 3 35
4 Albert Pujols 37 2017 LAA AL 149 636 3 35
(此解决方案基于此问题)
推荐阅读
- javascript - 如何将组件 props 持久化为 redux 状态?
- spring-batch - SpringBatch 处理器:在这种情况下是什么对象?
- wordpress - Woocommerce - 如果未选择产品变体选项,如何添加文本?
- python - pipenv 同步和 pipenv install --system --ignore-pipfile 在 docker 环境中
- python - 沿每个轴将 n 维数组拆分为 2D 数组
- android - Android 数据绑定/MVVM 模式:基于 Enum 计算视图颜色
- ios - iOS:显示 UILabel 时如何在字符串中设置 0.5 个空格?
- list - 如何访问球拍列表中的特定位置
- angular - 为什么 webpack 不构建?
- nlp - 伯克利解析器英语 - 在线与离线的不同结果