Spark DataFrame: filter a timestamp column by its date part only

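Problem description

How can I filter a Spark DataFrame that has a timestamp-typed column, matching on the date part only? I tried the code below, but it only matches rows whose time is 00:00:00.

Basically, I want the filter to match every row with the date 2020-01-01 (3 rows).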

import java.sql.Timestamp

val df = Seq(
  (1, Timestamp.valueOf("2020-01-01 23:00:01")),
  (2, Timestamp.valueOf("2020-01-01 00:00:00")),
  (3, Timestamp.valueOf("2020-01-01 12:54:00")),
  (4, Timestamp.valueOf("2019-12-15 09:54:00")),
  (5, Timestamp.valueOf("2019-12-09 10:12:43"))
).toDF("someCol","someTimeStamp")

df.filter(df("someTimeStamp") === "2020-01-01").show

+-------+-------------------+
|someCol|      someTimeStamp|
+-------+-------------------+
|      2|2020-01-01 00:00:00|   // ONLY MATCHED with time 00:00
+-------+-------------------+

Tags: scala, apache-spark

Solution


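When a timestamp column is compared with the string "2020-01-01", the string is interpreted as the timestamp 2020-01-01 00:00:00, which is why only the midnight row matched. Use the to_date function to extract the date from the timestamp before comparing: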

scala> df.filter(to_date(df("someTimeStamp")) === "2020-01-01").show
+-------+-------------------+
|someCol|      someTimeStamp|
+-------+-------------------+
|      1|2020-01-01 23:00:01|
|      2|2020-01-01 00:00:00|
|      3|2020-01-01 12:54:00|
+-------+-------------------+
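
As an alternative to the to_date filter above, the same rows can be selected with a half-open timestamp range, which leaves the column itself unwrapped; depending on the data source, that may help predicate pushdown. This is only a sketch, not part of the original answer: the local SparkSession setup and the names start, end and sameDay are assumptions made for illustration.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-range-filter")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val df = Seq(
  (1, Timestamp.valueOf("2020-01-01 23:00:01")),
  (2, Timestamp.valueOf("2020-01-01 00:00:00")),
  (3, Timestamp.valueOf("2020-01-01 12:54:00")),
  (4, Timestamp.valueOf("2019-12-15 09:54:00")),
  (5, Timestamp.valueOf("2019-12-09 10:12:43"))
).toDF("someCol", "someTimeStamp")

// Hypothetical bounds: start of the target day (inclusive) and start of the next day (exclusive).
val start = Timestamp.valueOf("2020-01-01 00:00:00")
val end   = Timestamp.valueOf("2020-01-02 00:00:00")

// Keeps rows with start <= someTimeStamp < end, i.e. rows 1, 2 and 3.
val sameDay = df.filter(col("someTimeStamp") >= lit(start) && col("someTimeStamp") < lit(end))
sameDay.show()
```

Both approaches return the same three rows here; to_date is shorter to write, while the range form compares the raw timestamp values directly.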
