Collect timestamp column

Problem description

When I execute the following statement:

spark.sql("SELECT CAST('0001-01-01' AS TIMESTAMP)").show()

I get:

+-----------------------------+
|CAST(0001-01-01 AS TIMESTAMP)|
+-----------------------------+
|          0001-01-01 00:00:00|
+-----------------------------+

But when I run spark.sql("SELECT CAST('0001-01-01' AS TIMESTAMP)").collect(), I get the following error:

Fail to execute line 1: spark.sql("SELECT CAST('0001-01-01' AS TIMESTAMP)").collect()
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6127737743421449115.py", line 380, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 535, in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 147, in load_stream
    yield self._read_with_length(stream)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 172, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 580, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1396, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 633, in fromInternal
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 633, in <listcomp>
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 445, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 199, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year 0 is out of range

Tags: python, pyspark

Solution


In your time zone, the stored instant for 0001-01-01 lands on 0000-12-31 during the conversion that datetime.datetime.fromtimestamp() performs, and a year value of 0 is invalid for ymd_to_ord(), the internal function in CPython's datetime implementation: Python's datetime only supports years from datetime.MINYEAR (1) upward. show() never hits this because the rows are formatted on the JVM side; collect() has to rebuild every value as a Python datetime on the driver (the fromInternal() call in the traceback above), and that is where the conversion fails.
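A minimal sketch of the underlying limitation, independent of Spark. The epoch constant below is derived from the 719,162 days between 0001-01-01 and 1970-01-01; reproducing the ValueError assumes a platform whose gmtime() accepts negative timestamps (e.g. Linux or macOS):

import datetime

# Python's datetime cannot represent years before 1.
print(datetime.MINYEAR)       # 1
print(datetime.datetime.min)  # 0001-01-01 00:00:00

# Seconds from 0001-01-01 00:00:00 UTC back to the Unix epoch:
# 719162 days * 86400 seconds/day.
YEAR_1_EPOCH = -719162 * 86400  # -62135596800

# Year 1 itself is still representable...
print(datetime.datetime.fromtimestamp(YEAR_1_EPOCH, tz=datetime.timezone.utc))
# 0001-01-01 00:00:00+00:00

# ...but one second earlier falls on 0000-12-31 23:59:59, which fails
# with the same error as the traceback above.
try:
    datetime.datetime.fromtimestamp(YEAR_1_EPOCH - 1, tz=datetime.timezone.utc)
except ValueError as e:
    print(e)  # year 0 is out of range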

This does not happen in the Scala API: there the value stays on the JVM as a java.sql.Timestamp, which can represent dates in year 1 without trouble, so the failure is specific to rebuilding the row in Python. A workaround is sketched below.
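One possible workaround, assuming you only need the formatted value on the driver rather than a Python datetime object, is to cast the timestamp to a string before collecting:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Casting to STRING makes Spark format the timestamp on the JVM side;
# collect() then returns plain Python strings, so no out-of-range
# datetime object is ever constructed on the driver.
rows = spark.sql(
    "SELECT CAST(CAST('0001-01-01' AS TIMESTAMP) AS STRING) AS ts"
).collect()
print(rows[0].ts)  # 0001-01-01 00:00:00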

