sql - Impala Last_Value() not giving the expected result
Problem description
I have a table in Impala with Unix-time timestamps (at a frequency of 1 millisecond) and three variables, as shown below:
ts Val1 Val2 Val3
1.60669E+12 7541.76 0.55964607 267.1613
1.60669E+12 7543.04 0.5607262 267.27805
1.60669E+12 7543.04 0.5607241 267.22308
1.60669E+12 7543.6797 0.56109643 267.25974
1.60669E+12 7543.6797 0.56107396 267.30624
1.60669E+12 7543.6797 0.56170875 267.2643
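I want to resample the data and take the last value of each new time window. For example, if I resample to a 10-second frequency, the output should be the last value of each 10-second window, like this: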
ts val1_Last Val2_Last Val3_Last
2020-11-29 22:30:00 7541.76 0.55964607 267.1613
2020-11-29 22:30:10 7542.3994 0.5613486 267.31238
2020-11-29 22:30:20 7542.3994 0.5601791 267.22842
2020-11-29 22:30:30 7544.32 0.56069416 267.20248
To get this result, I am running the following query:
select distinct *
from (
select ts,
last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,
last_value(Val2) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val2,
last_value(Val3) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val3
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts ,
Val1 as Val1,
Val2 as Val2,
Val3 as Val3
FROM Sensor_Data.Table where unit='Unit1'
and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt
order by ts
I read on some forums that LAST_VALUE() sometimes causes problems, so I tried to achieve the same thing using FIRST_VALUE with ORDER BY DESC. The query is as follows:
select distinct *
from (
select ts,
first_value(Val1) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val1,
first_value(Val2) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val2,
first_value(Val3) over (partition by ts order by ts desc rows between unbounded preceding and unbounded following) as Val3
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts ,
Val1 as Val1,
val2 as Val2,
Val3 as Val3
FROM product_sofcdtw_ops.as_operated_full_backup where unit='FCS05-09'
and cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00') as ttt) as tttt
order by ts
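But in both cases I do not get the expected result. The resampled time ts appears as expected (in 10-second windows), but for Val1, Val2, and Val3 I get random values from within the 0-9 s, 10-19 s, ... windows.
Logically the query looks fine to me, and I cannot find anything wrong with it. Can anyone explain why this query does not give me the correct answer?
Thanks!!!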
Solution
The problem is this line:
last_value(Val1) over (partition by ts order by ts rows between unbounded preceding and unbounded following) as Val1,
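You are partitioning and ordering by the same column, ts, so there is effectively no ordering (or, more specifically, ordering by a value that is constant throughout the partition results in an arbitrary ordering). To make this work, you need to keep the original ts and use it for the ordering: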
select ts,
last_value(Val1) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val1,
last_value(Val2) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val2,
last_value(Val3) over (partition by ts_10 order by ts rows between unbounded preceding and unbounded following) as Val3
from (SELECT cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/10 as bigint)*10 as TIMESTAMP) as ts_10,
t.*
FROM Sensor_Data.Table t
WHERE unit = 'Unit1' AND
cast(ts/1000 as TIMESTAMP) BETWEEN '2020-11-29 22:30:00' and '2020-12-01 01:51:00'
) t
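To get the one-row-per-window output shown in the question, the query above can be combined with DISTINCT on the bucket column. This is a minimal sketch, not a tested solution: the table, column, and filter values are copied from the question, the output aliases (ts_window, Val1_Last, and so on) and the subquery alias b are made up for illustration, and it assumes no two rows within a window share the same original ts (ties on ts would again make the choice arbitrary).

```sql
-- Sketch only: names and filter values come from the question;
-- ts_window, Val*_Last and the alias b are hypothetical.
SELECT DISTINCT
       ts_10 AS ts_window,  -- start of the 10-second window
       last_value(Val1) OVER (PARTITION BY ts_10 ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val1_Last,
       last_value(Val2) OVER (PARTITION BY ts_10 ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val2_Last,
       last_value(Val3) OVER (PARTITION BY ts_10 ORDER BY ts
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val3_Last
FROM (SELECT cast(cast(unix_timestamp(cast(ts/1000 AS TIMESTAMP))/10 AS BIGINT)*10 AS TIMESTAMP) AS ts_10,
             t.*
      FROM Sensor_Data.Table t
      WHERE unit = 'Unit1'
        AND cast(ts/1000 AS TIMESTAMP) BETWEEN '2020-11-29 22:30:00' AND '2020-12-01 01:51:00'
     ) b
ORDER BY ts_window;
```

Partitioning by the bucket (ts_10) while ordering by the original millisecond ts is what gives "last" a well-defined meaning inside each window.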
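Incidentally, the problem with last_value() is that it behaves unexpectedly when you leave out the window frame (the rows or range part of the window function specification).
The issue is that the default specification is range between unbounded preceding and current row, which means that last_value() only picks up the value from the current row (or from its last peer, if several rows tie on the order by key).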
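The effect of the default frame can be seen on a tiny inline data set. The following is an illustrative sketch with hypothetical names (demo, k, v, last_default, last_full) rather than anything from the original question:

```sql
-- Hypothetical inline data: three rows in a single partition, ordered by k.
WITH demo AS (
  SELECT 1 AS k, 'a' AS v UNION ALL
  SELECT 2, 'b' UNION ALL
  SELECT 3, 'c'
)
SELECT k,
       v,
       -- Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
       -- the frame ends at the current row, so this should give a, b, c.
       last_value(v) OVER (ORDER BY k) AS last_default,
       -- Explicit full-partition frame: this should give c on every row.
       last_value(v) OVER (ORDER BY k
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full
FROM demo
ORDER BY k;
```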
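first_value(), on the other hand, works fine with the default frame. However, if you include an explicit frame, the two are equivalent: last_value() with an ascending sort returns the same result as first_value() with a descending sort.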
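As a rough illustration of that equivalence, here is a sketch over hypothetical inline data (demo, k, v, via_last, via_first are made-up names). Both columns should hold the value from the row with the largest k:

```sql
-- Hypothetical inline data: three rows in a single partition.
WITH demo AS (
  SELECT 1 AS k, 'a' AS v UNION ALL
  SELECT 2, 'b' UNION ALL
  SELECT 3, 'c'
)
SELECT k,
       v,
       -- last_value() needs the explicit frame to see the whole partition.
       last_value(v) OVER (ORDER BY k ASC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS via_last,
       -- first_value() with a descending sort works even with the default frame,
       -- because the frame always starts at the first row of the partition.
       first_value(v) OVER (ORDER BY k DESC) AS via_first
FROM demo
ORDER BY k;
```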