How to convert a timestamp column to a long column of milliseconds in Spark SQL

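Question

What is the shortest and most efficient way in Spark SQL to convert a Timestamp column to a Long column holding the timestamp in milliseconds?

Here is an example of the conversion from a timestamp to milliseconds: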

scala> val ts = spark.sql("SELECT now() as ts")
ts: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> ts.show(false)
+-----------------------+                                                       
|ts                     |
+-----------------------+
|2019-06-18 12:32:02.41 |
+-----------------------+

scala> val tss = ts.selectExpr(
 |   "ts",
 |   "BIGINT(ts) as seconds_ts",
 |   "BIGINT(ts) * 1000 + BIGINT(date_format(ts, 'SSS')) as millis_ts"
 | )
tss: org.apache.spark.sql.DataFrame = [ts: timestamp, seconds_ts: bigint ... 1 more field]

scala> tss.show(false)
+----------------------+----------+-------------+                               
|ts                    |seconds_ts|millis_ts    |
+----------------------+----------+-------------+
|2019-06-18 12:32:02.41|1560861122|1560861122410|
+----------------------+----------+-------------+

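As you can see, the most direct way to get milliseconds from a timestamp does not work: casting to long returns seconds, even though the millisecond information is preserved in the timestamp itself.

The only way I have found to extract the millisecond part is the date_format function, which is nowhere near as simple as I would expect.

Does anyone know a simpler way than this to get UNIX time in milliseconds from a Timestamp column?

Tags: apache-spark, apache-spark-sql

Answer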


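According to the code in Spark's DateTimeUtils:

"Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision."

Therefore, if you define a UDF that takes a java.sql.Timestamp as input, you can simply call getTime to obtain a Long in milliseconds.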

import org.apache.spark.sql.functions.{col, to_timestamp, udf}
val tsConversionToLongUdf = udf((ts: java.sql.Timestamp) => ts.getTime)
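To see what getTime returns independently of Spark, here is a minimal plain-Scala sketch. The object name GetTimeSketch and the helper utcTimestamp are hypothetical and only serve the illustration; the timestamp is built from a UTC wall-clock time so the result does not depend on the JVM's default time zone.

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneOffset}

object GetTimeSketch {
  // Hypothetical helper: interpret an ISO-8601 local date-time as UTC
  // and wrap the resulting instant in a java.sql.Timestamp.
  def utcTimestamp(text: String): Timestamp =
    Timestamp.from(LocalDateTime.parse(text).toInstant(ZoneOffset.UTC))

  def main(args: Array[String]): Unit = {
    val ts = utcTimestamp("2017-01-18T11:00:00.111")

    val millis = ts.getTime       // milliseconds since the UNIX epoch
    val seconds = millis / 1000   // whole seconds, as a plain cast to long yields
    val fraction = millis % 1000  // the millisecond part that the cast drops

    println(s"millis=$millis seconds=$seconds fraction=$fraction")
  }
}
```

The UDF in this answer does nothing more than call getTime on each row's value, which is why the millisecond part survives.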

将此应用于各种时间戳:

import org.apache.spark.sql.types.LongType
val df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.111", "2017-01-18 11:00:00.110", "2017-01-18 11:00:00.100")
  .toDF("timestampString")
  .withColumn("timestamp", to_timestamp(col("timestampString")))
  .withColumn("timestampConversionToLong", tsConversionToLongUdf(col("timestamp")))
  .withColumn("timestampCastAsLong", col("timestamp").cast(LongType))

df.printSchema()
df.show(false)

// returns
root
 |-- timestampString: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampConversionToLong: long (nullable = false)
 |-- timestampCastAsLong: long (nullable = true)

+-----------------------+-----------------------+-------------------------+-------------------+
|timestampString        |timestamp              |timestampConversionToLong|timestampCastAsLong|
+-----------------------+-----------------------+-------------------------+-------------------+
|2017-01-18 11:00:00.000|2017-01-18 11:00:00    |1484733600000            |1484733600         |
|2017-01-18 11:00:00.111|2017-01-18 11:00:00.111|1484733600111            |1484733600         |
|2017-01-18 11:00:00.110|2017-01-18 11:00:00.11 |1484733600110            |1484733600         |
|2017-01-18 11:00:00.100|2017-01-18 11:00:00.1  |1484733600100            |1484733600         |
+-----------------------+-----------------------+-------------------------+-------------------+

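Note that the "timestampCastAsLong" column is included only to show that a direct cast to Long does not return the desired result in milliseconds, but only in seconds.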
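As an alternative that is not part of the original answer, newer Spark versions provide a built-in SQL function, unix_millis (available in Spark SQL since 3.1.0), which avoids the UDF altogether. The following is a sketch under that assumption; the object name UnixMillisSketch, the helper millisOf, and the local SparkSession settings are hypothetical, and the session time zone is pinned to UTC only to make string parsing reproducible.

```scala
import org.apache.spark.sql.SparkSession

object UnixMillisSketch {
  // Hypothetical helper: parse a timestamp string in the session time zone
  // and return its epoch milliseconds using the built-in unix_millis function.
  def millisOf(spark: SparkSession, text: String): Long = {
    import spark.implicits._
    Seq(text).toDF("timestampString")
      .selectExpr("CAST(timestampString AS TIMESTAMP) AS ts")
      .selectExpr("unix_millis(ts) AS millis_ts") // assumes Spark 3.1.0 or later
      .head()
      .getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unix-millis-sketch")
      .getOrCreate()
    // Pin the session time zone so the string is interpreted as UTC.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    println(millisOf(spark, "2017-01-18 11:00:00.111"))
    spark.stop()
  }
}
```

On Spark versions older than 3.1.0 this function is not available, and the UDF shown above remains a workable option.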

