How to take the maximum range between two consecutive columns out of 4 and divide it by the maximum range found between any of the 4 columns

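Problem description

df1 lists a series of datetimes, and df2 records water temperature at four depths over time. I want to add a new column to df1 called Prop_rangeT that, for each datetime, equals the maximum range between two consecutive columns divided by the maximum range between any two of the four columns, taken from the df2 row that corresponds to that df1 datetime (in the example below, the hourly reading at or before it). As an example: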

df1<- data.frame(DateTime=c("2016-08-01 08:01:17","2016-08-01 09:17:14","2016-08-01 10:29:31","2016-08-01 11:35:02","2016-08-01 12:22:45","2016-08-01 13:19:27","2016-08-01 14:58:17","2016-08-01 15:30:10"))
df1$DateTime<- as.POSIXct(df1$DateTime, format = "%Y-%m-%d %H:%M:%S", tz= "UTC")
df2<- data.frame(DateTime=c("2016-08-01 08:00:00","2016-08-01 09:00:00","2016-08-01 10:00:00","2016-08-01 11:00:00","2016-08-01 12:00:00","2016-08-01 13:00:00","2016-08-01 14:00:00","2016-08-01 15:00:00"),T5=c(27.0,27.5,27.1,27.0,26.8,26.3,26.0,26.3),T15=c(23.0,23.4,23.1,22.7,22.5,21.5,22.0,22.3),T25=c(19.0,20.0,19.5,19.6,16.0,16.3,16.2,16.7),T35=c(16.0,16.0,16.5,16.7,16.3,16.7,16.9,16.7))
df2$DateTime<- as.POSIXct(df2$DateTime, format = "%Y-%m-%d %H:%M:%S", tz= "UTC")

df1
             DateTime
1 2016-08-01 08:01:17
2 2016-08-01 09:17:14
3 2016-08-01 10:29:31
4 2016-08-01 11:35:02
5 2016-08-01 12:22:45
6 2016-08-01 13:19:27
7 2016-08-01 14:58:17
8 2016-08-01 15:30:10

df2
             DateTime   T5  T15  T25  T35
1 2016-08-01 08:00:00 27.0 23.0 19.0 16.0 # Here the max range is between T35 and T5 (11), and the max range between two consecutive columns is either T15 and T5 or T25 and T15 (4).
2 2016-08-01 09:00:00 27.5 23.4 20.0 16.0
3 2016-08-01 10:00:00 27.1 23.1 19.5 16.5
4 2016-08-01 11:00:00 27.0 22.7 19.6 16.7
5 2016-08-01 12:00:00 26.8 22.5 16.0 16.3
6 2016-08-01 13:00:00 26.3 21.5 16.3 16.7
7 2016-08-01 14:00:00 26.0 22.0 16.2 16.9 # In this case, the max range is between T25 and T5 (9.8), and the max range between two consecutive columns corresponds to T25 and T15 (5.8).
8 2016-08-01 15:00:00 26.3 22.3 16.7 16.7

How can I get the df1$Prop_rangeT column with the simplest possible code? As an example of what I want to get:

df1
             DateTime Prop_rangeT
1 2016-08-01 08:01:17   0.3636364 # For instance, this is 4/11
2 2016-08-01 09:17:14   0.3565217
3 2016-08-01 10:29:31   0.3773585
4 2016-08-01 11:35:02   0.4174757
5 2016-08-01 12:22:45   0.6018519
6 2016-08-01 13:19:27   0.5200000
7 2016-08-01 14:58:17   0.5918367 # For instance, this is 5.8/9.8
8 2016-08-01 15:30:10   0.5833333

Tags: r

Solution


We can use fuzzy_left_join

library(tidyverse)
library(fuzzyjoin)
df1 %>%
    fuzzy_left_join(
        df2 %>%
            # Wide to long: one row per DateTime and depth, in column order T5..T35
            gather(key, val, -DateTime) %>%
            group_by(DateTime) %>%
            arrange(DateTime) %>%
            # Largest jump between consecutive depths over the largest pairwise difference
            summarise(ratio = max(abs(c(0, diff(val))) / max(abs(diff(combn(val, 2)))))),
        # Match every df2 reading taken strictly before the df1 time ...
        by = "DateTime", match_fun = list(`>`)) %>%
    group_by(DateTime.x) %>%
    # ... then keep only the most recent of them
    filter(DateTime.x - DateTime.y == min(DateTime.x - DateTime.y))
## A tibble: 8 x 3
## Groups:   DateTime.x [8]
#  DateTime.x          DateTime.y          ratio
#  <dttm>              <dttm>              <dbl>
#1 2016-08-01 08:01:17 2016-08-01 08:00:00 0.364
#2 2016-08-01 09:17:14 2016-08-01 09:00:00 0.357
#3 2016-08-01 10:29:31 2016-08-01 10:00:00 0.377
#4 2016-08-01 11:35:02 2016-08-01 11:00:00 0.417
#5 2016-08-01 12:22:45 2016-08-01 12:00:00 0.602
#6 2016-08-01 13:19:27 2016-08-01 13:00:00 0.520
#7 2016-08-01 14:58:17 2016-08-01 14:00:00 0.592
#8 2016-08-01 15:30:10 2016-08-01 15:00:00 0.583

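Explanation: We reshape df2 from wide to long and, for each DateTime, compute the ratio of the largest (absolute) temperature difference between consecutive depths to the largest temperature difference between any two depth measurements. The rest is a fuzzy join, in which for each DateTime in df1 we keep only the df2 entry with the nearest earlier DateTime.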
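
For comparison, the same two steps can also be sketched in base R, without the reshape or the join. This is a minimal sketch and not part of the original answer: prop_range and depth_cols are names introduced here for illustration, and it assumes df2$DateTime is sorted in increasing order. Note that findInterval() matches the df2 reading at or before each df1 time, whereas the `>` in the join above is strict; the two agree on this sample data.

```r
# Example data copied from the question
df1 <- data.frame(DateTime = as.POSIXct(
  c("2016-08-01 08:01:17", "2016-08-01 09:17:14", "2016-08-01 10:29:31",
    "2016-08-01 11:35:02", "2016-08-01 12:22:45", "2016-08-01 13:19:27",
    "2016-08-01 14:58:17", "2016-08-01 15:30:10"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
df2 <- data.frame(
  DateTime = as.POSIXct(sprintf("2016-08-01 %02d:00:00", 8:15),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  T5  = c(27.0, 27.5, 27.1, 27.0, 26.8, 26.3, 26.0, 26.3),
  T15 = c(23.0, 23.4, 23.1, 22.7, 22.5, 21.5, 22.0, 22.3),
  T25 = c(19.0, 20.0, 19.5, 19.6, 16.0, 16.3, 16.2, 16.7),
  T35 = c(16.0, 16.0, 16.5, 16.7, 16.3, 16.7, 16.9, 16.7))

# Hypothetical helper (not from the original answer): largest jump between
# adjacent depths divided by the overall temperature range of one row.
prop_range <- function(temps) {
  max(abs(diff(temps))) / diff(range(temps))
}

# Depth columns in depth order; the order matters for "consecutive".
depth_cols <- c("T5", "T15", "T25", "T35")
ratio <- apply(as.matrix(df2[depth_cols]), 1, prop_range)

# For each df1 time, index of the last df2 time that is <= it
# (assumes df2$DateTime is sorted in increasing order).
idx <- findInterval(as.numeric(df1$DateTime), as.numeric(df2$DateTime))
idx[idx == 0] <- NA  # df1 time earlier than every df2 reading

df1$Prop_rangeT <- ratio[idx]
```

On the sample data this gives 4/11 for the first row and 5.8/9.8 for the seventh, matching the values the question asks for. A row whose four temperatures are all equal would produce NaN (0/0), which may need separate handling.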

