Need help creating a column based on a calculation from three other columns in R

Problem description

I have a data file like this:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"), t = c(60,
120, 180, 240, 300, 360), activity = c(0L, 0L, 0L, 0L, 0L, 0L
), moving = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), asleep = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), Day = c(1, 1, 1, 1, 1, 1)), row.names = c(NA,
-6L), class = c("behavr", "data.table", "data.frame"), sorted = "id", metadata = structure(list(
id = structure(1L, .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"),
file_info = list(list(path = "C:/Users/ariji/Desktop/ShinyWrapperForCircadianAnalysis/Monitor1.txt",
file = "Monitor1.txt")), region_id = 1L, experiment_id = "2020-07-26 00:00:00|Monitor1.txt",
start_datetime = structure(1595721600, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), stop_datetime = structure(1596326400, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), genotype = "Early", replicate = 1L,
uid = 1L), sorted = "id", class = c("data.table", "data.frame"
), row.names = c(NA, -1L)))

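The full file is here: https://anonymousfiles.io/1vypEs9u/ (read it with fread). What I need to do is create another column in this data.table called noramct. The values in noramct should be activity divided by the sum of all activity on that day (days 1, 2, 3 ... 7), and this has to be done separately for each id. So basically, for each id, I want a normalized activity, i.e. the activity of a particular id divided by that id's total activity on that particular day. Keep in mind that the daily activity sum has two levels, a particular id and a particular day, which can be confusing because the activity for a day will have counts from multiple ids. Any help would be greatly appreciated! Thanks in advance. This question was also asked on Reddit.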
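To make the two levels concrete, here is a minimal sketch on a made-up toy table (the ids "a"/"b" and the activity values are hypothetical, not taken from the real file). Summing by Day alone pools every id together, while summing by id and Day gives each individual's own daily total, which is the denominator described above.

```r
library(data.table)

# Hypothetical toy data: two ids, each observed twice on Day 1 and Day 2
toy <- data.table(
  id       = rep(c("a", "b"), each = 4),
  Day      = rep(c(1, 1, 2, 2), times = 2),
  activity = c(2L, 3L, 1L, 1L, 4L, 6L, 0L, 5L)
)

# Level 1: total activity per Day, pooled across all ids
toy[, day_total_all_ids := sum(activity), by = .(Day)]

# Level 2: total activity per id per Day (the denominator for noramct)
toy[, day_total_this_id := sum(activity), by = .(id, Day)]

print(toy)
```

In this toy table the pooled Day 1 total is 15 for every row, while id "a" alone has a Day 1 total of 5 and id "b" has 10.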

Tags: r, data.table

Solution


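Not sure if this is what you mean, but I think it is enough to first create the denominator for each observation (which I understand to be the total activity of its corresponding id and day) and then simply divide each value by its corresponding denominator. Fortunately, this is very simple in data.table: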

# Denominator: total activity per id per Day (the column is named Day, with a capital D)
data[, day_activity_for_id := sum(activity), by = .(id, Day)
   ][, noramct := activity/day_activity_for_id]
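A possible variant, sketched here on made-up toy data (all values hypothetical): the helper column is not strictly needed, because sum(activity) can be evaluated inside the grouped := directly. One caveat worth checking: if an id has zero total activity on a day (as in the six sample rows in the question, where every activity value is 0), the ratio is 0/0, which R returns as NaN. The sketch converts those to NA, which is only an assumption about how such days should be treated.

```r
library(data.table)

# Hypothetical toy data; id "b" is inactive on Day 2
dt <- data.table(
  id       = c("a", "a", "a", "b", "b", "b"),
  Day      = c(1, 1, 2, 1, 2, 2),
  activity = c(1L, 3L, 2L, 5L, 0L, 0L)
)

# One step: divide each activity by its (id, Day) total, by reference
dt[, noramct := activity / sum(activity), by = .(id, Day)]

# 0/0 gives NaN on days with no activity; mark them NA instead
# (assumption: NA is the desired value for such days)
dt[is.nan(noramct), noramct := NA_real_]

# Sanity check: noramct sums to 1 within each (id, Day) that had activity
print(dt[, .(total = sum(noramct)), by = .(id, Day)])
```

Here noramct comes out as 0.25, 0.75, 1, 1, NA, NA, and the per-group totals are 1 except for id "b" on Day 2, which is NA.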

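Also, a friendly tip for next time: your question is easier to understand if you show us a printout of the head of your table rather than that cumbersome structure! data.table prints it very cleanly in your console: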

head(data)
