What is ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING in Spark?

Problem description

I am trying to use rowsBetween(-1, 0) on a column, but it does not seem to work. When I use lag instead, it works fine.

My question is: are lag and rowsBetween(-1, 0) the same?

Update: the reason I am using rowsBetween is that I am trying to convert Teradata's [ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING].

Is lag equivalent to ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING?

Sample code:

from pyspark.sql import Window
from pyspark.sql.functions import lag, max  # Spark's max, which shadows Python's builtin

df.show()
# Frame spanning the previous row through the current row (1 PRECEDING to CURRENT ROW)
w = Window.partitionBy(df['ResId']).orderBy(df['vrsn_strt_dts']).rowsBetween(-1, 0)
lagWin = Window.partitionBy(df['ResId']).orderBy(df['vrsn_strt_dts'])
df.withColumn("vrsn_end_dts", max(df['vrsn_strt_dts']).over(w)).show()
df.withColumn("vrsn_end_dts", lag(df['vrsn_strt_dts']).over(lagWin)).show()

Output:

+--------+-------------------+---------+------------+-------+----------+
|   ResId|      vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt|
+--------+-------------------+---------+------------+-------+----------+
|27608171|2018-07-17 04:00:00| Standard|         11B|   1000|        10|
|27608174|2018-08-17 04:00:00| Standard|         11C|   2000|        20|
|27608173|2018-09-17 04:00:00| Standard|         11D|   3000|        30|
|27608171|2018-09-17 04:00:00| Standard|         11B|   1000|        40|
|27608174|2018-09-17 04:00:00| Standard|         11D|   5000|        50|
|27608171|2018-09-18 04:00:00| Standard|         11B|   1000|        10|
|27608171|2018-09-19 04:00:00| Standard|         11B|   1000|        10|
+--------+-------------------+---------+------------+-------+----------+

rowsBetween(-1, 0) output:

+--------+-------------------+---------+------------+-------+----------+-------------------+
|   ResId|      vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt|       vrsn_end_dts|
+--------+-------------------+---------+------------+-------+----------+-------------------+
|27608171|2018-07-17 04:00:00| Standard|         11B|   1000|        10|2018-07-17 04:00:00|
|27608171|2018-09-17 04:00:00| Standard|         11B|   1000|        40|2018-09-17 04:00:00|
|27608171|2018-09-18 04:00:00| Standard|         11B|   1000|        10|2018-09-18 04:00:00|
|27608171|2018-09-19 04:00:00| Standard|         11B|   1000|        10|2018-09-19 04:00:00|
|27608174|2018-08-17 04:00:00| Standard|         11C|   2000|        20|2018-08-17 04:00:00|
|27608174|2018-09-17 04:00:00| Standard|         11D|   5000|        50|2018-09-17 04:00:00|
|27608173|2018-09-17 04:00:00| Standard|         11D|   3000|        30|2018-09-17 04:00:00|
+--------+-------------------+---------+------------+-------+----------+-------------------+

lag output:

+--------+-------------------+---------+------------+-------+----------+-------------------+
|   ResId|      vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt|       vrsn_end_dts|
+--------+-------------------+---------+------------+-------+----------+-------------------+
|27608171|2018-07-17 04:00:00| Standard|         11B|   1000|        10|               null|
|27608171|2018-09-17 04:00:00| Standard|         11B|   1000|        40|2018-07-17 04:00:00|
|27608171|2018-09-18 04:00:00| Standard|         11B|   1000|        10|2018-09-17 04:00:00|
|27608171|2018-09-19 04:00:00| Standard|         11B|   1000|        10|2018-09-18 04:00:00|
|27608174|2018-08-17 04:00:00| Standard|         11C|   2000|        20|               null|
|27608174|2018-09-17 04:00:00| Standard|         11D|   5000|        50|2018-08-17 04:00:00|
|27608173|2018-09-17 04:00:00| Standard|         11D|   3000|        30|               null|
+--------+-------------------+---------+------------+-------+----------+-------------------+

Tags: python, pyspark, apache-spark-sql

Solution
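
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING defines a frame that contains exactly one row, the previous one, so any aggregate over it (max, min, first) returns the same value as lag(col, 1). rowsBetween(-1, 0) is a different frame, ROWS BETWEEN 1 PRECEDING AND CURRENT ROW: it also includes the current row, which is why max over it returns each row's own vrsn_strt_dts in the output above (the ordering is ascending, so the current row's date is the frame maximum). Below is a minimal sketch of the direct PySpark counterpart of the Teradata frame, reusing the column names from the question:

from pyspark.sql import Window
from pyspark.sql.functions import max  # Spark's max, which shadows Python's builtin

# ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING: the frame holds only the previous row
w_prev = (Window.partitionBy(df['ResId'])
                .orderBy(df['vrsn_strt_dts'])
                .rowsBetween(-1, -1))

# An aggregate over a one-row frame simply returns that row's value,
# so this matches lag(df['vrsn_strt_dts'], 1) over the same ordering
df.withColumn("vrsn_end_dts", max(df['vrsn_strt_dts']).over(w_prev)).show()

So lag and rowsBetween(-1, 0) are not the same; the frame equivalent to lag(col, 1) is rowsBetween(-1, -1).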


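To sanity-check the equivalence end to end, one could run something like the following self-contained sketch (the data, names, and local SparkSession setup are made up for illustration):

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, lag, max as sql_max

spark = SparkSession.builder.master("local[1]").appName("frame-check").getOrCreate()

df = spark.createDataFrame(
    [("A", 1), ("A", 2), ("A", 3), ("B", 10), ("B", 20)],
    ["grp", "ts"],
)

ordered = Window.partitionBy("grp").orderBy("ts")
prev_only = ordered.rowsBetween(-1, -1)  # ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING

result = (df.withColumn("via_lag", lag("ts", 1).over(ordered))
            .withColumn("via_frame", sql_max("ts").over(prev_only)))

result.show()
# Both columns agree on every row, including the leading null in each partition
assert result.filter(~col("via_lag").eqNullSafe(col("via_frame"))).count() == 0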