How to convert a timestamp such as 2019-03-25T00:27:46.985-0500 to 2019-03-25 00:27:46 in Spark with Scala

Problem description

I want to convert a timestamp that looks like 2019-03-25T00:27:46.985-0500 into this format: 2019-03-25 00:27:46

Using Spark v2.3.0 and Scala v2.11.8.

time                          ColA  ColB  ColC
2019-03-25T00:27:46.985-0500  ABC
2019-03-25T00:27:46.960-0500  ABC
2019-03-25T00:27:46.839-0500  ABC
2019-03-25T00:27:46.596-0500  ABC
2019-03-25T00:27:46.559-0500  ABC
2019-03-25T00:27:46.535-0500  ABC
2019-03-25T00:27:46.453-0500  ABC
2019-03-25T00:27:46.405-0500  ABC
2019-03-25T00:27:46.393-0500  ABC
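
If the CSV file is not at hand, an equivalent in-memory DataFrame can be built as a quick sketch. The column names follow the table above; sampleLog is a hypothetical name (so it does not clash with the log DataFrame loaded below), and ColB/ColC are left null because their values are not shown in the question:

import spark.implicits._ // assumes an existing SparkSession named spark, as in the spark-shell

// Hypothetical stand-in for time.csv, using two of the rows shown above
val sampleLog = Seq(
  ("2019-03-25T00:27:46.985-0500", "ABC", null: String, null: String),
  ("2019-03-25T00:27:46.960-0500", "ABC", null: String, null: String)
).toDF("time", "ColA", "ColB", "ColC")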

val log = spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("sep", ",")
      .option("quote", "\"")
      .option("multiLine", "true")
      .load("time.csv")

scala> log.printSchema
root
 |-- time: string (nullable = true)
 |-- ColA: string (nullable = true)
 |-- ColB: string (nullable = true)
 |-- ColC: string (nullable = true)

import org.apache.spark.sql.functions.monotonically_increasing_id
val logs = log.withColumn("Id", monotonically_increasing_id() + 1)
logs.createOrReplaceTempView("logs") // register the view so spark.sql can see "logs"
val df = spark.sql("select Id, time, ColA from logs")

Input: 2019-03-25T00:27:46.985-05:00  Expected output: 2019-03-25 00:27:46

Tags: scala, apache-spark-sql, timestamp-with-timezone

Solution


You can use .selectExpr with the date_format function:

// date_format casts the string to a timestamp, then renders it in the target pattern
val log2 = log.selectExpr(
  "date_format(time, 'yyyy-MM-dd HH:mm:ss') as time"
)
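
If the implicit string-to-timestamp cast inside date_format does not handle the -0500 offset in your Spark build, a variation is to parse the value explicitly with to_timestamp first and then re-format it. This is a sketch that assumes the column is named time, as above, and that Spark 2.x's legacy SimpleDateFormat patterns are in effect (XX matches offsets written like -0500):

import org.apache.spark.sql.functions.{col, date_format, to_timestamp}

// Parse the ISO-style string with an explicit pattern, then render it
// without the fractional seconds and the UTC offset.
val log3 = log.withColumn(
  "time",
  date_format(
    to_timestamp(col("time"), "yyyy-MM-dd'T'HH:mm:ss.SSSXX"),
    "yyyy-MM-dd HH:mm:ss"
  )
)

Note that to_timestamp stores the instant internally and date_format renders it in the session time zone, so the printed local time depends on spark.sql.session.timeZone.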
