r - How to use data.table to report the sample size "n" used to compute different statistics
Question
Data frame df1 summarizes one-hour time intervals for different people:
df1<- data.frame(Round_datetime=c("2016-08-23 11:00:00","2016-08-23 11:00:00","2016-08-23 12:00:00","2016-08-23 12:00:00"),
Person= c("Sophie","Anna","Sophie","Anna"))
df1$Round_datetime<-as.POSIXct(df1$Round_datetime, format="%Y-%m-%d %H:%M:%S",tz="UTC")
df1
Round_datetime Person
1 2016-08-23 11:00:00 Sophie
2 2016-08-23 11:00:00 Anna
3 2016-08-23 12:00:00 Sophie
4 2016-08-23 12:00:00 Anna
Data frame df2 gives some information about these people over time:
df2<- data.frame(DateTime=c("2016-08-23 10:29:08.324","2016-08-23 10:39:36.326","2016-08-23 10:44:08.724","2016-08-23 10:59:46.324","2016-08-23 11:19:22.324","2016-08-23 11:29:53.324","2016-08-23 11:34:14.324","2016-08-23 11:47:49.324","2016-08-23 11:54:58.324","2016-08-23 11:59:13.324","2016-08-23 12:12:34.324","2016-08-23 12:23:43.324","2016-08-23 12:32:14.324","2016-08-23 12:29:28.324"),
Person=c("Sophie","Anna","Sophie","Anna","Sophie","Anna","Sophie","Anna","Sophie","Anna","Sophie","Anna","Sophie","Anna"),
Value=c(10,15,5,10,20,15,10,5,25,15,10,5,10,20))
df2$DateTime<-as.POSIXct(df2$DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")
df2
DateTime Person Value
1 2016-08-23 10:29:08.323 Sophie 10
2 2016-08-23 10:39:36.325 Anna 15
3 2016-08-23 10:44:08.723 Sophie 5
4 2016-08-23 10:59:46.323 Anna 10
5 2016-08-23 11:19:22.323 Sophie 20
6 2016-08-23 11:29:53.323 Anna 15
7 2016-08-23 11:34:14.323 Sophie 10
8 2016-08-23 11:47:49.323 Anna 5
9 2016-08-23 11:54:58.323 Sophie 25
10 2016-08-23 11:59:13.323 Anna 15
11 2016-08-23 12:12:34.323 Sophie 10
12 2016-08-23 12:23:43.323 Anna 5
13 2016-08-23 12:32:14.323 Sophie 10
14 2016-08-23 12:29:28.323 Anna 20
I use the code below to add the statistics mean, standard deviation and standard error, computed from the Value column of df2 for each person and hour listed in df1:
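library(data.table) # setDT(), := and joins with by = .EACHI
library(lubridate)  # round_date()
library(plotrix)    # std.error()
setDT(df1) # Round_datetime is already POSIXct, so no re-parsing is needed
# round_date() assigns each reading to the NEAREST hour (e.g. 10:39 becomes 11:00)
setDT(df2)[, dt_floor := round_date(DateTime, unit = "hour")]
df2[df1, .(mean = mean(Value),
sd = sd(Value),
se = std.error(Value)),
on = .(Person, dt_floor = Round_datetime), by = .EACHI]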
Person dt_floor mean sd se
1: Sophie 2016-08-23 11:00:00 12.50000 10.606602 7.500000
2: Anna 2016-08-23 11:00:00 13.33333 2.886751 1.666667
3: Sophie 2016-08-23 12:00:00 15.00000 8.660254 5.000000
4: Anna 2016-08-23 12:00:00 11.25000 7.500000 3.750000
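Note that the column name dt_floor is slightly misleading: lubridate's round_date() rounds each timestamp to the nearest hour, so the 11:00 interval collects readings from roughly 10:30 to 11:30. That is why Sophie has only two values (10:44 and 11:19) at 11:00. The sketch below uses four of the df2 timestamps (fractional seconds dropped) to contrast round_date() with floor_date(); whether nearest-hour or start-of-hour bins are intended is an assumption about the question, not something it states.

```r
library(lubridate)

# Four of the df2 timestamps from the question, fractional seconds dropped
ts <- as.POSIXct(c("2016-08-23 10:29:08", "2016-08-23 10:39:36",
                   "2016-08-23 11:34:14", "2016-08-23 12:32:14"),
                 tz = "UTC")

# round_date(): nearest hour, so 10:39 is assigned to 11:00
rounded <- round_date(ts, unit = "hour")

# floor_date(): start of the hour, so 10:39 is assigned to 10:00
floored <- floor_date(ts, unit = "hour")

print(data.frame(ts, rounded, floored))
```

If start-of-hour bins are what is wanted, replacing round_date() with floor_date() in the dt_floor line would change which readings, and therefore how many, fall into each interval.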
However, I also need another variable, n, giving the number of samples taken in each one-hour interval. What I expect is:
Person dt_floor mean sd se n
1: Sophie 2016-08-23 11:00:00 12.50000 10.606602 7.500000 2
2: Anna 2016-08-23 11:00:00 13.33333 2.886751 1.666667 3
3: Sophie 2016-08-23 12:00:00 15.00000 8.660254 5.000000 3
4: Anna 2016-08-23 12:00:00 11.25000 7.500000 3.750000 4
Does anyone know how to do this?
Solution
Just add .N to the last part:
df2[df1, .(mean = mean(Value),
sd = sd(Value),
n = .N), # .N = number of df2 rows matched by each df1 row
on = .(Person, dt_floor = Round_datetime), by = .EACHI]
Output:
Person dt_floor mean sd n
1: Sophie 2016-08-23 11:00:00 12.50000 10.606602 2
2: Anna 2016-08-23 11:00:00 13.33333 2.886751 3
3: Sophie 2016-08-23 12:00:00 15.00000 8.660254 3
4: Anna 2016-08-23 12:00:00 11.25000 7.500000 4
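For reference, here is a self-contained sketch that combines the question's data with the answer's .N and also keeps the standard error. It rebuilds df1 and df2 directly as data.tables (fractional seconds dropped, which does not change the hourly rounding) and computes the standard error as sd(Value) / sqrt(.N) instead of calling plotrix::std.error(); the two agree when Value has no missing values. These substitutions are assumptions made to keep the example small, not the asker's original code.

```r
library(data.table)
library(lubridate)

# df1 from the question: one row per person and hourly interval
df1 <- data.table(
  Round_datetime = as.POSIXct(rep(c("2016-08-23 11:00:00", "2016-08-23 12:00:00"), each = 2),
                              tz = "UTC"),
  Person = rep(c("Sophie", "Anna"), times = 2)
)

# df2 from the question (assumption: fractional seconds dropped)
df2 <- data.table(
  DateTime = as.POSIXct(paste("2016-08-23", c(
    "10:29:08", "10:39:36", "10:44:08", "10:59:46", "11:19:22", "11:29:53", "11:34:14",
    "11:47:49", "11:54:58", "11:59:13", "12:12:34", "12:23:43", "12:32:14", "12:29:28")),
    tz = "UTC"),
  Person = rep(c("Sophie", "Anna"), times = 7),
  Value  = c(10, 15, 5, 10, 20, 15, 10, 5, 25, 15, 10, 5, 10, 20)
)

# Assign each reading to the nearest hour, as in the question
df2[, dt_floor := round_date(DateTime, unit = "hour")]

# by = .EACHI evaluates j once per row of df1, on the df2 rows it matches
res <- df2[df1,
           .(mean = mean(Value),
             sd   = sd(Value),
             se   = sd(Value) / sqrt(.N), # equals plotrix::std.error() when Value has no NAs
             n    = .N),                  # number of df2 rows matched by this df1 row
           on = .(Person, dt_floor = Round_datetime), by = .EACHI]
print(res)
```

Because j runs once per df1 row, readings that round to an hour not listed in df1 (for example Sophie's 12:32 value, which rounds to 13:00) are not counted in any n.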