How do I select only the data with common dates in R?

Problem description

I have a list of two cities, where each city is a data frame with 5 columns. Here is an example:

        date        nh1_min  nh2_min  nh3_min    cbh

2020-10-28 01:00:00   1      99999     99999      6
2020-10-28 02:00:00   1      99999     99999      6
2020-10-28 03:00:00   1      99999     99999      6
2020-10-28 04:00:00   1      99999     99999      6
2020-10-28 05:00:00   1      99999     99999      6
2020-10-28 06:00:00   1      99999     99999      6
2020-10-28 07:00:00   1      99999     99999      6
2020-10-28 08:00:00   1      99999     99999      6
2020-11-02 04:00:00   1      99999     99999      6

    
        date        nh1_min  nh2_min  nh3_min    cbh

2020-10-25 07:00:00  1       99999     99999      6
2020-10-28 00:00:00  1       99999     99999      6
2020-10-28 01:00:00  1       99999     99999      6
2020-10-28 02:00:00  1       99999     99999      6
2020-10-28 03:00:00  1       99999     99999      6
2020-10-28 06:00:00  1       99999     99999      6
2020-10-28 07:00:00  1       99999     99999      6
2020-10-28 08:00:00  1       99999     99999      6
2020-11-02 06:00:00  1       99999     99999      6  

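I need to select the rows whose dates appear in both data frames, for example 2020-10-28 02:00:00, 2020-10-28 03:00:00, and so on.
In the end I would get two shorter data frames that contain only the common dates. How can I select the data this way? I tried fairly complicated if and for constructs, but they did not work.
(In my real project I have 10 cities and many more dates; this is just a shortened version.)

Example list: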
list(structure(list(ido = structure(c(1591840800, 1591858800, 1592449200, 1592452800, 1592456400, 1592463600, 1595120400, 1602529200, 1602835200, 1602993600, 1603602000, 1603609200, 1603843200, 1603846800, 1603850400, 1603854000, 1603864800, 1603868400, 1603872000, 1604296800, 1604332800, 1604358000, 1604383200, 1604430000, 1604703600, 1605290400, 1605297600, 1605301200, 1605502800, 1605510000, 1605819600, 1605823200, 1605826800, 1605830400, 1605834000, 1605837600, 1605841200, 1605844800, 1605852000, 1605859200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), nh1_min = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), nh2_min = c(6766, 662, 4572, 3720, 3635, 3737, 2915, 1144, 654, 99999, 778, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 540, 99999, 99999, 629, 1225, 971, 947, 508, 2154, 2128, 1059, 99999, 99999, 483, 390, 999, 99999, 99999, 99999, 1308), nh3_min = c(99999, 99999, 99999, 4200, 99999, 99999, 99999, 2332, 99999, 99999, 1105, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 6371, 99999, 99999, 99999, 667, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 2804), cbh = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6)), row.names = c(643L, 648L, 770L, 771L, 772L, 774L, 844L, 1011L, 1099L, 1145L, 1321L, 1323L, 1391L, 1392L, 1393L, 1394L, 1397L, 1398L, 1399L, 1510L, 1521L, 1528L, 1535L, 1549L, 1628L, 1798L, 1800L, 1801L, 1859L, 1861L, 1951L, 1952L, 1953L, 1954L, 1955L, 1956L, 1957L, 1958L, 1960L, 1962L), class = "data.frame"), structure(list(ido = structure(c(1581706800, 1581717600, 1581742800, 1581746400, 1603846800, 1603850400, 1603854000, 1603857600, 1603861200, 1603864800, 1603868400, 1603872000, 1604289600, 1604696400, 1604700000, 1604703600, 1605232800, 1605301200, 1605308400, 1605333600, 1605506400, 1605510000, 1605826800, 
1605830400, 1605834000, 1605841200, 1605844800, 1607389200, 1607392800, 1607396400, 1607896800, 1607900400, 1607904000, 1608156000, 1608163200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), nh1_min = c(1, 1, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), nh2_min = c(1576, 99999, 1641, 1615, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 508, 321, 163, 99999, 99999, 1508, 99999, 99999, 99999, 420, 1084, 99999, 1070, 324, 253, 99999, 99999, 449, 473 ), nh3_min = c(99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 871, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 638, 99999, 99999, 99999, 99999), cbh = c(6, 6, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6)), row.names = c(215L, 218L, 225L, 226L, 1392L, 1393L, 1394L, 1395L, 1396L, 1397L, 1398L, 1399L, 1508L, 1626L, 1627L, 1628L, 1781L, 1801L, 1803L, 1810L, 1860L, 1861L, 1953L, 1954L, 1955L, 1957L, 1958L, 2155L, 2156L, 2157L, 2302L, 2303L, 2304L, 2377L, 2379L), class = "data.frame"))

Tags: r, select

Solution


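Extract the two data frames from the list, add a date column to each, find the common dates, and subset both data frames to keep only the rows with a common date. Note that as.Date() drops the time of day, so rows are matched by calendar day.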

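# `data` is assumed to hold the example list from the question
one <- data[[1]]
two <- data[[2]]
# add a calendar-date column derived from the POSIXct column `ido`
one <- transform(one, date = as.Date(ido))
two <- transform(two, date = as.Date(ido))
# intersect() may drop the Date class (older R versions), so convert back
common_dates <- as.Date(intersect(one$date, two$date), origin = '1970-01-01')
# keep only the rows whose date occurs in both data frames
one <- subset(one, date %in% common_dates)
two <- subset(two, date %in% common_dates)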
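
The question asks for identical timestamps (for example 2020-10-28 02:00:00), whereas the code above matches by calendar day. As a minimal sketch, assuming the timestamp column is called ido as in the example list, exact matching can be done with %in% directly on the POSIXct column. The toy data frames and the helper ts() below are made up for illustration and only mimic the layout of the question's data.

```r
# Hypothetical helper: parse timestamps as UTC, like the example list.
ts <- function(x) as.POSIXct(x, tz = "UTC")

# Toy data (not the question's real data); only `ido` matters here.
one <- data.frame(
  ido = ts(c("2020-10-28 01:00:00", "2020-10-28 02:00:00",
             "2020-10-28 04:00:00", "2020-11-02 04:00:00")),
  cbh = 6
)
two <- data.frame(
  ido = ts(c("2020-10-25 07:00:00", "2020-10-28 01:00:00",
             "2020-10-28 02:00:00", "2020-11-02 06:00:00")),
  cbh = 6
)

# Keep only rows whose exact timestamp occurs in the other data frame.
one_exact <- one[one$ido %in% two$ido, ]
two_exact <- two[two$ido %in% one$ido, ]
```

Here 2020-11-02 is present in both toy data frames but at different hours, so it is dropped by the exact match, while day-level matching with as.Date() would keep it.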

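For more than two data frames, we can keep the data in the list itself.

# intersect the calendar dates of every data frame in the list
common_dates <- as.Date(Reduce(intersect, lapply(data, function(x) 
                        as.Date(x$ido))), origin = '1970-01-01')

# subset each data frame to the rows that fall on a common date
data <- lapply(data, function(x) subset(x, as.Date(ido) %in% common_dates))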
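
If exact timestamps rather than calendar days should be common to every city, the same Reduce() pattern can be applied to the raw time values. This is only a sketch under assumptions: the list cities and the helper ts() are hypothetical stand-ins for the real data, and the timestamps are compared as numeric seconds so that the result does not depend on whether intersect() keeps the POSIXct class.

```r
# Hypothetical helper: parse timestamps as UTC.
ts <- function(x) as.POSIXct(x, tz = "UTC")

# Toy list of three "cities" (illustrative data only).
cities <- list(
  data.frame(ido = ts(c("2020-10-28 01:00:00", "2020-10-28 02:00:00",
                        "2020-10-28 03:00:00")), cbh = 6),
  data.frame(ido = ts(c("2020-10-28 02:00:00", "2020-10-28 03:00:00",
                        "2020-10-28 04:00:00")), cbh = 6),
  data.frame(ido = ts(c("2020-10-28 02:00:00", "2020-10-28 03:00:00",
                        "2020-10-28 05:00:00")), cbh = 6)
)

# Timestamps (as seconds since the epoch) present in every data frame.
common_times <- Reduce(intersect,
                       lapply(cities, function(x) as.numeric(x$ido)))

# Keep only the rows with a timestamp shared by all data frames.
cities_common <- lapply(cities, function(x)
  x[as.numeric(x$ido) %in% common_times, ])
```

Each element of cities_common keeps its original columns and row names; only the rows are filtered.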
