S3 Select - how to query with comparisons on a non-standard timestamp

Problem description

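I am working with an S3 bucket in which the data is organized into files by ID and year/month, meaning one file per ID per month. In each (csv.gz) file, every record has a timestamp in the format YYYY-MM-dd HH:mm:ss (note the missing T).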
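Because the data is split into one file per ID and month, a time range first determines which monthly objects need to be queried at all. A minimal Python sketch of that pruning step follows. The key layout {id}/{year}/{month}.csv.gz, the function name month_keys and the ID sensor-42 are hypothetical, since the question does not show the real object names.

```python
from datetime import datetime, timedelta

# Hypothetical object-key layout: one csv.gz file per ID and month.
KEY_TEMPLATE = "{id}/{year:04d}/{month:02d}.csv.gz"

def month_keys(record_id: str, start: datetime, end: datetime) -> list[str]:
    """Keys of the monthly files that may hold records with start <= ts < end."""
    if end <= start:
        return []
    last = end - timedelta(microseconds=1)  # end is exclusive
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (last.year, last.month):
        keys.append(KEY_TEMPLATE.format(id=record_id, year=year, month=month))
        # Step to the next calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys

print(month_keys("sensor-42", datetime(2020, 11, 15, 8, 0, 0), datetime(2021, 2, 1)))
# ['sensor-42/2020/11.csv.gz', 'sensor-42/2020/12.csv.gz', 'sensor-42/2021/01.csv.gz']
```

Treating the end as exclusive means a range ending exactly at midnight on the 1st does not pull in that month's file.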

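Now, when querying the data, I want to support datetime granularity down to the second, so naturally I would like to filter the data in S3 before handling it in Python.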
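One way to filter at one-second granularity without converting anything might be to compare the raw strings: the stated layout is fixed width and zero padded, so lexicographic order of the strings matches chronological order of the timestamps. The Python sketch below checks that property locally and builds such a filter. It assumes that S3 Select compares string values lexicographically with >= and <, that the CSV is read with its header row (FileHeaderInfo set to USE), and that every value follows exactly this layout. range_filter is a hypothetical helper; my_timestamp_column is the column name used in the question.

```python
import random
from datetime import datetime, timedelta

# Layout stated in the question: fixed width, zero padded, space separator.
FMT = "%Y-%m-%d %H:%M:%S"

def range_filter(column: str, start: datetime, end: datetime) -> str:
    """Hypothetical helper: a WHERE clause for start <= ts < end at one-second
    granularity that compares the raw strings instead of casting them."""
    return (f"SELECT * FROM s3object s WHERE s.{column} >= '{start.strftime(FMT)}' "
            f"AND s.{column} < '{end.strftime(FMT)}'")

# Why plain string comparison can work: for this layout, sorting the strings
# gives the same order as sorting the underlying datetimes.
rng = random.Random(0)
base = datetime(2020, 1, 1)
moments = [base + timedelta(seconds=rng.randint(0, 10**8)) for _ in range(1000)]
assert sorted(m.strftime(FMT) for m in moments) == [m.strftime(FMT) for m in sorted(moments)]

print(range_filter("my_timestamp_column",
                   datetime(2020, 3, 15, 8, 0, 0), datetime(2020, 3, 15, 8, 0, 30)))
```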

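But I cannot find any way to do this. The TO_TIMESTAMP function does not accept a user-supplied format (it requires the T date/time separator), and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)) produces a "The query cannot be evaluated" error.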
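Modeling that SUBSTRING call in Python, under the assumption of standard SQL semantics (1-based start position), shows what CAST actually receives: a bare date such as 2020-03-15, with no T and no time. So even if that cast succeeded, the seconds the filter is meant to use would already be gone. sql_substring is a hypothetical stand-in for illustration, not an S3 Select API.

```python
def sql_substring(value: str, start: int, length: int) -> str:
    """Model of SQL SUBSTRING(value, start, length) for start >= 1 (1-based positions)."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only models start >= 1 and length >= 0")
    return value[start - 1:start - 1 + length]

raw = "2020-03-15 08:05:09"
print(sql_substring(raw, 1, 10))   # 2020-03-15  (date only: no 'T', no time)
print(sql_substring(raw, 12, 8))   # 08:05:09
```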

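Is there any way around this? The documentation states that TO_TIMESTAMP is "the inverse operation of TO_STRING", which is not entirely accurate, since the latter supports a time_format_pattern.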

Tags: amazon-web-services, amazon-s3, amazon-s3-select

Solution


I had the same issue. I went a step further and changed my CSV file so that the date field uses the format required by the TIMESTAMP data type in S3 Select. The required format is described here: S3 data types
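The preprocessing described above could look like the following Python sketch for a single csv.gz object. It assumes the timestamp column is named Date (the header this answer mentions) and that every non-empty value is exactly YYYY-MM-dd HH:mm:ss. Whether S3 Select also wants an explicit UTC offset such as Z after the time is not established here, so the optional suffix parameter is empty by default. rewrite_timestamp_column is a hypothetical name.

```python
import csv
import gzip
import io

def rewrite_timestamp_column(src: bytes, column: str = "Date", suffix: str = "") -> bytes:
    """Hypothetical preprocessing step for one csv.gz object.

    Replaces the space in 'YYYY-MM-dd HH:mm:ss' with 'T' in `column` and appends
    `suffix` (e.g. 'Z', only if an explicit offset turns out to be required).
    """
    text = gzip.decompress(src).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row[column]
        if value:  # leave empty cells untouched
            # Only the first space (the date/time separator) is replaced.
            row[column] = value.replace(" ", "T", 1) + suffix
        writer.writerow(row)
    return gzip.compress(out.getvalue().encode("utf-8"))

sample = gzip.compress(b"id,Date,value\n1,2020-03-15 08:05:09,42\n")
print(gzip.decompress(rewrite_timestamp_column(sample)).decode("utf-8"))
```

The rewritten bytes would then be uploaded in place of the original object, which costs one extra pass over the data but leaves the S3 Select queries free of string manipulation.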

So first, to answer the question: based on the S3 Select documentation, I don't think it is possible to work with a date that has no T. Once you correct that, you will be able to use the CAST function. Here is what I do:

select * from s3object as s where CAST('2020-01-01T' AS TIMESTAMP) < CAST('2021-01-01T' AS TIMESTAMP)

That works fine. However, as you can see, I'm not passing s."Date" (the field header in my CSV file), because doing so produces the following error:

Attempt to convert from one data type to another failed at line 1, column 39: cast from STRING to TIMESTAMP.
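This error suggests that at least one value in the column could not be parsed as a TIMESTAMP. A quick local check of what the Date column really holds is to count its distinct shapes with every digit masked: a single shape means the column is uniform, while extra shapes (a space separator, an empty cell, a stray header row) point at the offending rows. value_shapes is a hypothetical helper, it runs on the decompressed CSV text, and the column name Date is taken from this answer.

```python
import csv
import io
import re
from collections import Counter

def value_shapes(csv_text: str, column: str = "Date") -> Counter:
    """Count the distinct layouts of one CSV column, with every digit replaced by 'd'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # row.get(...) may be None for short rows; treat that as an empty cell.
    return Counter(re.sub(r"\d", "d", row.get(column) or "") for row in reader)

sample = "id,Date\n1,2020-03-15T08:05:09\n2,2020-03-15 08:06:10\n3,\n"
print(value_shapes(sample))
```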

I hope this helps a little, and I hope someone can help with this error.

