sql datediff in terms of days

Problem description

I am trying to calculate the number of days between current_timestamp() and max(timestamp_field) from a table.

maxModifiedDate = spark.sql("""
    select date_format(max(lastmodifieddate), 'MM/dd/yyyy hh:mm:ss') as maxModifiedDate,
           date_format(current_timestamp(), 'MM/dd/yyyy hh:mm:ss') as CurrentTimeStamp,
           datediff(current_timestamp(), date_format(max(lastmodifieddate), 'MM/dd/yyyy hh:mm:ss')) as daysDiff
    from db.tbl
""")

But I am getting null in daysDiff. Why is that, and how can I fix it?

+-------------------+-------------------+--------+
|    maxModifiedDate|   CurrentTimeStamp|daysDiff|
+-------------------+-------------------+--------+
|01/29/2020 05:07:51|06/29/2020 08:36:28|    null|
+-------------------+-------------------+--------+

Tags: pyspark, apache-spark-sql

Solution


Check this: datediff() implicitly casts its arguments to dates, and Spark can only cast ISO-style 'yyyy-MM-dd' strings, so the 'MM/dd/yyyy hh:mm:ss' string produced by date_format casts to null and the difference comes out null. I used to_timestamp to parse the strings into timestamps and then the datediff function to compute the day difference.
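
For illustration, here is a minimal sketch (mine, not part of the original answer) of the failing cast, using the literal values from the question's output. It assumes ANSI mode is off (the Spark 3.x default), where a failed cast yields null rather than an error:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # datediff() implicitly casts string arguments to DateType, and the
    # string-to-date cast only understands ISO-style 'yyyy-MM-dd' input.
    # An 'MM/dd/yyyy hh:mm:ss' string therefore casts to null, and
    # datediff(current_timestamp(), null) is null.
    spark.sql("""
        select cast('01/29/2020 05:07:51' as date)                  as castToDate,
               datediff(current_timestamp(), '01/29/2020 05:07:51') as daysDiff
    """).show()

    # +----------+--------+
    # |castToDate|daysDiff|
    # +----------+--------+
    # |      null|    null|
    # +----------+--------+

Parsing the strings explicitly with to_timestamp gives datediff real dates to work with: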

    from pyspark.sql import functions as F

    # InputDF
    # +-------------------+-------------------+
    # |    maxModifiedDate|   CurrentTimeStamp|
    # +-------------------+-------------------+
    # |01/29/2020 05:07:51|06/29/2020 08:36:28|
    # +-------------------+-------------------+

    # Parse each string into a timestamp (the date part of the pattern is
    # enough for a day-level difference) and diff the results.
    df.select(
        "maxModifiedDate",
        "CurrentTimeStamp",
        F.datediff(
            F.to_timestamp("CurrentTimeStamp", format="MM/dd/yyyy"),
            F.to_timestamp("maxModifiedDate", format="MM/dd/yyyy"),
        ).alias("datediff"),
    ).show()


    # +-------------------+-------------------+--------+
    # |    maxModifiedDate|   CurrentTimeStamp|datediff|
    # +-------------------+-------------------+--------+
    # |01/29/2020 05:07:51|06/29/2020 08:36:28|     152|
    # +-------------------+-------------------+--------+
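
One caveat (my note, not from the original answer): Spark 3.x replaced the lenient legacy date parser, and depending on the spark.sql.legacy.timeParserPolicy setting, to_timestamp with the bare 'MM/dd/yyyy' pattern may raise an error or return null on strings that carry a trailing time part. Supplying the full pattern sidesteps this; 'HH' (24-hour) is an assumption about the source data here:

    df.select(
        "maxModifiedDate",
        "CurrentTimeStamp",
        F.datediff(
            # Full pattern so the strict Spark 3.x parser consumes the
            # whole string; datediff truncates to the date part anyway.
            F.to_timestamp("CurrentTimeStamp", format="MM/dd/yyyy HH:mm:ss"),
            F.to_timestamp("maxModifiedDate", format="MM/dd/yyyy HH:mm:ss"),
        ).alias("datediff"),
    ).show()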

Using Spark SQL

spark.sql("select maxModifiedDate,CurrentTimeStamp, datediff(to_timestamp(CurrentTimeStamp,  'MM/dd/yyyy'), to_timestamp(maxModifiedDate, 'MM/dd/yyyy')) as datediff from table ").show()
