r - How can I reduce the code needed to create a reproducible data frame with set.seed() and sample()?
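Problem description
I want to create a fairly large, reproducible dataset, Activity, so that I can ask a question on Stack Overflow. My data frame will contain the following variables:
DateTime: date and time with millisecond resolution, at a data rate of 11 values per second, i.e. 11 rows per second.
ID: identifies the individual. I want to create a dataset with data for 3 individuals (A, B and C).
x: random data ranging from -1 to +1.
y: random data ranging from -1 to +1.
z: random data ranging from -1 to +1.
I initially used this code: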
set.seed(100)
fmt <- "%Y-%m-%d %H:%M:%OS"
DateTime = seq(from=as.POSIXct("2017-08-05 14:03:55.300", format=fmt, tz="UTC"), by=1/11, length.out=67)
ID = rep("A", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity1<- data.frame(DateTime,ID, x, y, z)
DateTime = seq(from=as.POSIXct("2017-08-05 16:18:12.100", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("B", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity2<- data.frame(DateTime,ID, x, y, z)
DateTime = seq(from=as.POSIXct("2017-08-05 20:34:31.540", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("C", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity3<- data.frame(DateTime,ID, x, y, z)
Activity<- rbind(Activity1,Activity2,Activity3)
head(Activity)
DateTime ID x y z
1 2017-08-05 14:03:55.29999 A 0.01 0.82 -0.56
2 2017-08-05 14:03:55.39090 A 0.11 0.74 0.07
3 2017-08-05 14:03:55.48182 A 0.50 0.95 -0.64
4 2017-08-05 14:03:55.57273 A 0.97 -0.89 0.95
5 2017-08-05 14:03:55.66364 A -0.97 0.78 -0.01
6 2017-08-05 14:03:55.75454 A -0.46 0.20 1.00
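How can I create the same data frame with less code? I need a reproducible data frame for another post on Stack Overflow, and other users told me I should use less code to create my example.
Solution
There are many different ways to achieve the same result. This is how I would do it with my preferred tools: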
library(data.table)
# define parameters to control the process
base_data <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
, DateTime := lubridate::ymd_hms(DateTime)]
# expand sequences rowwise
Activity <- base_data[, .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)),
by = .(rn = seq(nrow(base_data)), ID)][
, rn := NULL][]
# create x, y, z columns by sampling
cols <- c("x", "y", "z")
set.seed(100)
Activity[, (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)]
Activity
     ID            DateTime     x     y     z
  1:  A 2017-08-05 14:03:55 -0.38  0.91 -0.28
  2:  A 2017-08-05 14:03:55 -0.48  0.83 -0.12
  3:  A 2017-08-05 14:03:55  0.10  0.65  0.61
  4:  A 2017-08-05 14:03:55 -0.89 -0.36  0.04
  5:  A 2017-08-05 14:03:55 -0.06  0.76  0.39
 ---
197:  C 2017-08-05 20:34:37 -0.76 -0.52 -0.81
198:  C 2017-08-05 20:34:37  0.20  0.44 -0.59
199:  C 2017-08-05 20:34:37 -0.76 -0.41 -0.94
200:  C 2017-08-05 20:34:37  0.58  0.02  0.16
201:  C 2017-08-05 20:34:37 -0.26 -0.44 -0.69
Fractional seconds are not printed by default, but the 1/11-second increments can be verified with
head(diff(Activity$DateTime))
Time differences in secs
[1] 0.09090900 0.09090924 0.09090900 0.09090900 0.09090924 0.09090900
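To actually see the fractional seconds rather than only their differences, base R offers the `%OS3` format specifier and the `digits.secs` option. The sketch below is an illustration only; the vector `ts` is a made-up example and not part of the answer above. Because of floating-point representation, the printed milliseconds may be truncated (for example .299 instead of .300).

```r
# `ts` is a hypothetical example vector, not part of the original answer
ts <- seq(from = as.POSIXct("2017-08-05 14:03:55.300", tz = "UTC"),
          by = 1/11, length.out = 4)

# Option 1: format explicitly, %OS3 prints seconds with 3 decimal places
formatted <- format(ts, "%Y-%m-%d %H:%M:%OS3")
print(formatted)

# Option 2: change the session-wide option, print, then restore the old value
old <- options(digits.secs = 3)
print(ts)
options(old)

# Consecutive timestamps are 1/11 s apart, up to floating-point error
steps <- as.numeric(diff(ts), units = "secs")
print(all(abs(steps - 1/11) < 1e-6))
```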
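Since the OP did not ask for his results to be reproduced exactly with the given seed value, I replaced
sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
with
round(runif(.N, -1, +1), 2)
If sample() is a requirement, the seq() part can be skipped:
sample((-100:100)/100, .N, replace = TRUE)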
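As a rough check of how these variants relate, the following base R sketch compares them. The variable names (`grid_seq`, `grid_int`, `a`, `b`, `u`) are illustrative assumptions. It shows that both grids contain the same 201 values, that sample() with the same seed picks the same elements from either grid, and that round(runif()) stays within the range but consumes the random stream differently, so it yields different numbers.

```r
# Two ways to build the grid -1.00, -0.99, ..., 1.00
grid_seq <- seq(from = -1, to = 1, by = 0.01)
grid_int <- (-100:100) / 100

# Same number of points, equal up to floating-point rounding
print(length(grid_seq) == length(grid_int))
print(isTRUE(all.equal(grid_seq, grid_int)))

# With the same seed, sample() picks the same positions from either grid
set.seed(100)
a <- sample(grid_seq, size = 67, replace = TRUE)
set.seed(100)
b <- sample(grid_int, size = 67, replace = TRUE)
print(isTRUE(all.equal(a, b)))

# round(runif()) also lands on the grid, but the end points -1 and 1
# are only half as likely as the interior values
set.seed(100)
u <- round(runif(67, -1, +1), 2)
print(all(u >= -1 & u <= 1))
```

If exact agreement with the question's output matters, the sample() form is the safer choice; the runif() form is shorter but only statistically similar.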
Using data.table chaining, the code can be written more concisely as
library(data.table)
cols <- c("x", "y", "z")
set.seed(100)
Activity <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
, DateTime := lubridate::ymd_hms(DateTime)][
, .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)),
by = ID][  # each ID occurs once in the parameter table, so no helper row number (and no base_data) is needed
, (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)][]
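For readers who prefer to avoid extra packages, the same idea (describe each individual once, then expand) can be sketched in base R. This is an alternative illustration, not the method of the answer above; `make_activity` and `starts` are hypothetical names. Since the helper draws x, y and z in the same order as the question's code, it should consume the random stream in the same way, but that has not been verified against the question's output here.

```r
# Hypothetical helper: build the rows for one individual
make_activity <- function(id, start, n = 67) {
  grid <- (-100:100) / 100  # same values as seq(-1, 1, by = 0.01)
  data.frame(
    DateTime = seq(from = as.POSIXct(start, tz = "UTC"), by = 1/11, length.out = n),
    ID = id,
    x = sample(grid, n, replace = TRUE),
    y = sample(grid, n, replace = TRUE),
    z = sample(grid, n, replace = TRUE)
  )
}

set.seed(100)
# Start time per individual, taken from the question
starts <- c(A = "2017-08-05 14:03:55.300",
            B = "2017-08-05 16:18:12.100",
            C = "2017-08-05 20:34:31.540")

# Build one data frame per individual and stack them
Activity <- do.call(rbind, unname(Map(make_activity, names(starts), starts)))
rownames(Activity) <- NULL
head(Activity)
```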