r - How do I select only the rows whose dates are common to all data frames in R?
Problem
I have a list of two cities; each city is a data frame with 5 columns. Here is an example:
date nh1_min nh2_min nh3_min cbh
2020-10-28 01:00:00 1 99999 99999 6
2020-10-28 02:00:00 1 99999 99999 6
2020-10-28 03:00:00 1 99999 99999 6
2020-10-28 04:00:00 1 99999 99999 6
2020-10-28 05:00:00 1 99999 99999 6
2020-10-28 06:00:00 1 99999 99999 6
2020-10-28 07:00:00 1 99999 99999 6
2020-10-28 08:00:00 1 99999 99999 6
2020-11-02 04:00:00 1 99999 99999 6
date nh1_min nh2_min nh3_min cbh
2020-10-25 07:00:00 1 99999 99999 6
2020-10-28 00:00:00 1 99999 99999 6
2020-10-28 01:00:00 1 99999 99999 6
2020-10-28 02:00:00 1 99999 99999 6
2020-10-28 03:00:00 1 99999 99999 6
2020-10-28 06:00:00 1 99999 99999 6
2020-10-28 07:00:00 1 99999 99999 6
2020-10-28 08:00:00 1 99999 99999 6
2020-11-02 06:00:00 1 99999 99999 6
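I need to select the rows whose dates appear in both data frames. Here, for example, 2020-10-28 02:00:00, 2020-10-28 03:00:00, and so on.
In the end I should have two shorter data frames containing only the common dates. How can I select the data this way? I tried some complicated if and for constructs, but they did not work.
(In my real project I have 10 cities and many more dates; this is just a shortened version.)
Sample list (dput output; the answer below assumes it is stored in a variable named data):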
list(structure(list(ido = structure(c(1591840800, 1591858800, 1592449200, 1592452800, 1592456400, 1592463600, 1595120400, 1602529200, 1602835200, 1602993600, 1603602000, 1603609200, 1603843200, 1603846800, 1603850400, 1603854000, 1603864800, 1603868400, 1603872000, 1604296800, 1604332800, 1604358000, 1604383200, 1604430000, 1604703600, 1605290400, 1605297600, 1605301200, 1605502800, 1605510000, 1605819600, 1605823200, 1605826800, 1605830400, 1605834000, 1605837600, 1605841200, 1605844800, 1605852000, 1605859200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), nh1_min = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), nh2_min = c(6766, 662, 4572, 3720, 3635, 3737, 2915, 1144, 654, 99999, 778, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 540, 99999, 99999, 629, 1225, 971, 947, 508, 2154, 2128, 1059, 99999, 99999, 483, 390, 999, 99999, 99999, 99999, 1308), nh3_min = c(99999, 99999, 99999, 4200, 99999, 99999, 99999, 2332, 99999, 99999, 1105, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 6371, 99999, 99999, 99999, 667, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 2804), cbh = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6)), row.names = c(643L, 648L, 770L, 771L, 772L, 774L, 844L, 1011L, 1099L, 1145L, 1321L, 1323L, 1391L, 1392L, 1393L, 1394L, 1397L, 1398L, 1399L, 1510L, 1521L, 1528L, 1535L, 1549L, 1628L, 1798L, 1800L, 1801L, 1859L, 1861L, 1951L, 1952L, 1953L, 1954L, 1955L, 1956L, 1957L, 1958L, 1960L, 1962L), class = "data.frame"), structure(list(ido = structure(c(1581706800, 1581717600, 1581742800, 1581746400, 1603846800, 1603850400, 1603854000, 1603857600, 1603861200, 1603864800, 1603868400, 1603872000, 1604289600, 1604696400, 1604700000, 1604703600, 1605232800, 1605301200, 1605308400, 1605333600, 1605506400, 1605510000, 1605826800, 
1605830400, 1605834000, 1605841200, 1605844800, 1607389200, 1607392800, 1607396400, 1607896800, 1607900400, 1607904000, 1608156000, 1608163200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), nh1_min = c(1, 1, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), nh2_min = c(1576, 99999, 1641, 1615, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 508, 321, 163, 99999, 99999, 1508, 99999, 99999, 99999, 420, 1084, 99999, 1070, 324, 253, 99999, 99999, 449, 473 ), nh3_min = c(99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 871, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 638, 99999, 99999, 99999, 99999), cbh = c(6, 6, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6)), row.names = c(215L, 218L, 225L, 226L, 1392L, 1393L, 1394L, 1395L, 1396L, 1397L, 1398L, 1399L, 1508L, 1626L, 1627L, 1628L, 1781L, 1801L, 1803L, 1810L, 1860L, 1861L, 1953L, 1954L, 1955L, 1957L, 1958L, 2155L, 2156L, 2157L, 2302L, 2303L, 2304L, 2377L, 2379L), class = "data.frame"))
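Solution
Extract the two data frames from the list, add a date column, find the common dates, and subset both data frames to keep only the rows with common dates. Note that as.Date() drops the time of day, so rows are matched by calendar day.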
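# assumes the sample list above has been assigned to `data`
one <- data[[1]]
two <- data[[2]]
# add a calendar-date column derived from the timestamp column `ido`
one <- transform(one, date = as.Date(ido))
two <- transform(two, date = as.Date(ido))
# intersect() can drop the Date class, so convert the result back
common_dates <- as.Date(intersect(one$date, two$date), origin = '1970-01-01')
# keep only the rows whose date occurs in both data frames
one <- subset(one, date %in% common_dates)
two <- subset(two, date %in% common_dates)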
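To see what matching on calendar day keeps, here is a minimal sketch on made-up data. The toy frames `a` and `b` are hypothetical and only borrow the column name `ido` from the sample list. Comparing formatted date strings is a choice made here to avoid relying on how intersect() handles the Date class.

```r
# Toy data (hypothetical), with hourly POSIXct timestamps in column `ido`
a <- data.frame(ido = as.POSIXct(c("2020-10-28 01:00:00", "2020-10-28 05:00:00",
                                   "2020-11-02 04:00:00"), tz = "UTC"))
b <- data.frame(ido = as.POSIXct(c("2020-10-25 07:00:00", "2020-10-28 00:00:00",
                                   "2020-11-02 06:00:00"), tz = "UTC"))

# Calendar days present in both frames, compared as "YYYY-MM-DD" strings
common_days <- intersect(format(as.Date(a$ido)), format(as.Date(b$ido)))

# Keep every row that falls on a shared day, whatever its hour
a_common <- a[format(as.Date(a$ido)) %in% common_days, , drop = FALSE]
b_common <- b[format(as.Date(b$ido)) %in% common_days, , drop = FALSE]
```

In this sketch no timestamp is identical between `a` and `b`, yet rows are kept because the days overlap. If only identical timestamps should survive, compare the full timestamps instead of the dates.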
For more than two data frames, we can keep the data in the list itself.
# dates present in every data frame of the list
common_dates <- as.Date(
  Reduce(intersect, lapply(data, function(x) as.Date(x$ido))),
  origin = '1970-01-01'
)
data <- lapply(data, function(x) subset(x, as.Date(ido) %in% common_dates))
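The question's examples are full timestamps (for instance 2020-10-28 02:00:00), so a variant that matches on the exact time rather than the calendar day may be closer to what is wanted. The following is a sketch under that assumption, not the answer's method. The helper `mk` and the list `cities` are hypothetical toy data; only the column name `ido` comes from the sample list.

```r
# Hypothetical helper that builds a small "city" data frame
mk <- function(times) {
  data.frame(ido = as.POSIXct(times, tz = "UTC"), cbh = 6)
}

cities <- list(
  mk(c("2020-10-28 01:00:00", "2020-10-28 02:00:00", "2020-11-02 04:00:00")),
  mk(c("2020-10-28 00:00:00", "2020-10-28 02:00:00", "2020-11-02 06:00:00"))
)

# Intersect on the underlying numeric seconds since the epoch,
# so the result does not depend on class handling in intersect()
common_times <- Reduce(intersect, lapply(cities, function(x) as.numeric(x$ido)))

# Keep only the rows whose timestamp occurs in every data frame
cities_common <- lapply(cities, function(x) {
  x[as.numeric(x$ido) %in% common_times, , drop = FALSE]
})
```

Because the comparison uses the numeric value of the timestamps, it works for any number of data frames in the list and does not depend on the time zone used for display.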