python - Spark 中的 ROWS BETWEEN 1 PRECEDING 和 1 PRECEDING 是什么?
问题描述
我正在尝试在列上使用 rowsBetween(-1, 0) ,但它似乎不起作用。但是当我使用滞后时它工作正常。
我的问题是两者lag
是否rowsBetween(-1,0)
相同?
更新:
我使用 rowsBetween 的原因是我正在尝试转换Teradata [ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING]
lag
相当于ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
?_
示例代码:
df.show()
w = Window.partitionBy(df['ResId']).orderBy(df['vrsn_strt_dts']).rowsBetween(-1, 0)
lagWin = Window.partitionBy(df['ResId']).orderBy(df['vrsn_strt_dts'])
df.withColumn("vrsn_end_dts", max(df['vrsn_strt_dts']).over(w)).show()
df.withColumn("vrsn_end_dts", lag(df['vrsn_strt_dts']).over(lagWin)).show()
输出:
+--------+-------------------+---------+------------+-------+----------+
| ResId| vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt|
+--------+-------------------+---------+------------+-------+----------+
|27608171|2018-07-17 04:00:00| Standard| 11B| 1000| 10|
|27608174|2018-08-17 04:00:00| Standard| 11C| 2000| 20|
|27608173|2018-09-17 04:00:00| Standard| 11D| 3000| 30|
|27608171|2018-09-17 04:00:00| Standard| 11B| 1000| 40|
|27608174|2018-09-17 04:00:00| Standard| 11D| 5000| 50|
|27608171|2018-09-18 04:00:00| Standard| 11B| 1000| 10|
|27608171|2018-09-19 04:00:00| Standard| 11B| 1000| 10|
+--------+-------------------+---------+------------+-------+----------+
与输出之间的行
+--------+-------------------+---------+------------+-------+----------+-------------------+
| ResId| vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt| vrsn_end_dts|
+--------+-------------------+---------+------------+-------+----------+-------------------+
|27608171|2018-07-17 04:00:00| Standard| 11B| 1000| 10|2018-07-17 04:00:00|
|27608171|2018-09-17 04:00:00| Standard| 11B| 1000| 40|2018-09-17 04:00:00|
|27608171|2018-09-18 04:00:00| Standard| 11B| 1000| 10|2018-09-18 04:00:00|
|27608171|2018-09-19 04:00:00| Standard| 11B| 1000| 10|2018-09-19 04:00:00|
|27608174|2018-08-17 04:00:00| Standard| 11C| 2000| 20|2018-08-17 04:00:00|
|27608174|2018-09-17 04:00:00| Standard| 11D| 5000| 50|2018-09-17 04:00:00|
|27608173|2018-09-17 04:00:00| Standard| 11D| 3000| 30|2018-09-17 04:00:00|
+--------+-------------------+---------+------------+-------+----------+-------------------+
有滞后输出
+--------+-------------------+---------+------------+-------+----------+-------------------+
| ResId| vrsn_strt_dts|instnc_nm|ship_strm_nb|vyge_id|onhand_cnt| vrsn_end_dts|
+--------+-------------------+---------+------------+-------+----------+-------------------+
|27608171|2018-07-17 04:00:00| Standard| 11B| 1000| 10| null|
|27608171|2018-09-17 04:00:00| Standard| 11B| 1000| 40|2018-07-17 04:00:00|
|27608171|2018-09-18 04:00:00| Standard| 11B| 1000| 10|2018-09-17 04:00:00|
|27608171|2018-09-19 04:00:00| Standard| 11B| 1000| 10|2018-09-18 04:00:00|
|27608174|2018-08-17 04:00:00| Standard| 11C| 2000| 20| null|
|27608174|2018-09-17 04:00:00| Standard| 11D| 5000| 50|2018-08-17 04:00:00|
|27608173|2018-09-17 04:00:00| Standard| 11D| 3000| 30| null|
解决方案
推荐阅读
- javascript - 使用 ReactJS 时如何将 JWT 存储在 cookie 中
- c# - 带有角度错误的 web api
- javascript - Stylelint 作为 npm 脚本静默失败
- java - 如何从 Firebase (java) 中检索最后一个孩子?
- javascript - 插入 4 个空格而不是制表符
- python - 将 Pandas DataFrame 保存为没有 pdfkit 的 PDF 文件格式
- java - 表 'acme_ms.hibernate_sequence' 不存在
- python - 如何为一组 xml 文档添加新的 xml 属性?
- windows - 创建 VBS,使用当前用户的用户名、计算机名称和多个 IP 地址创建一个消息框(有线/无线)
- go - 如何在 Gin 中使用 go-bindata?