r - column mean for subset of rows in R data.table
问题描述
I have a data.table like this:
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)
I need to compute the mean across multiple days, but different ones between id's.
To get the mean across days 1 to 2 for the first individual I've tried the following (inspired by this post):
example[id==1, mean_over_days := mean(mean_day,na.rm=T), by=(day %in% 1:2)]
> example
id day time mean_day mean_over_days
1: 1 1 1 -1.53685184 -0.8908466
2: 1 1 2 0.77445521 -0.8908466
3: 1 1 3 -0.56048917 -0.8908466
4: 1 2 1 -1.78388960 -0.8908466
5: 1 2 2 0.01787129 -0.8908466
6: 1 2 3 -2.25617538 -0.8908466
7: 1 3 1 -0.44886190 -0.0955414
8: 1 3 2 -1.31086985 -0.0955414
9: 1 3 3 1.47310754 -0.0955414
10: 2 1 1 0.53560356 NA
11: 2 1 2 1.16654996 NA
12: 2 1 3 -0.06704728 NA
13: 2 2 1 -0.83897719 NA
14: 2 2 2 -0.85209939 NA
15: 2 2 3 -0.41392341 NA
16: 2 3 1 -0.03014190 NA
17: 2 3 2 0.43835822 NA
18: 2 3 3 -1.62432188 NA
I want all the lines for id==1 of column mean_over_days to have the same value (-0.8908466
), but it happens that for day 3 this column has the mean over that day only.
How can I change the code to correct this?
解决方案
Don't subset in by
it would create different groups for TRUE
and FALSE
values.
library(data.table)
example[id==1, mean_over_days := mean(mean_day[day %in% 1:2],na.rm=TRUE)][]
# id day time mean_day mean_over_days
# 1: 1 1 1 -0.56047565 0.4471527
# 2: 1 1 2 -0.23017749 0.4471527
# 3: 1 1 3 1.55870831 0.4471527
# 4: 1 2 1 0.07050839 0.4471527
# 5: 1 2 2 0.12928774 0.4471527
# 6: 1 2 3 1.71506499 0.4471527
# 7: 1 3 1 0.46091621 0.4471527
# 8: 1 3 2 -1.26506123 0.4471527
# 9: 1 3 3 -0.68685285 0.4471527
#10: 2 1 1 -0.44566197 NA
#11: 2 1 2 1.22408180 NA
#12: 2 1 3 0.35981383 NA
#13: 2 2 1 0.40077145 NA
#14: 2 2 2 0.11068272 NA
#15: 2 2 3 -0.55584113 NA
#16: 2 3 1 1.78691314 NA
#17: 2 3 2 0.49785048 NA
#18: 2 3 3 -1.96661716 NA
data
set.seed(123)
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)
推荐阅读
- angular - 在`中追加“%”
- ios - 如何在 iPhone SE 和 5S 上设置自动布局?
- visual-studio - 如何轻松更改 XAML 代码样式(仅列出一行)
- drake - 逆运动轨迹上的速度级约束
- c++ - 为什么我们不能在 C++ 中初始化一个类的成员变量
- c# - 更新 X、Y、Z 浮点数和特定的 While 循环场景 C#
- java - 注册名为“dispatcher”的 servlet 失败。请检查是否有另一个以相同名称注册的 servlet。
- reactjs - CORS 标头“Access-Control-Allow-Origin”缺失 - Access-Control-Allow-Origin 已给出
- javafx - 为 Java 8 构建的 JavaFX 应用程序 - 如何继续使用 Java 11 运行?
- python - Python Tkinter - 不确定如何使用 messagebox.askretrycancel() 函数