How can I make my Athena SQL query faster?

Problem description

I am running this on AWS Athena, which is based on PrestoDB. My original plan was to query the past 3 months of data for analysis. However, even a query covering only the past 2 hours takes more than 30 minutes, at which point it times out. Is there a more efficient way to run this query?

SELECT column1, dt, column2
FROM database1
WHERE date_parse(dt, '%Y%m%d%H%i%s') > CAST(now() - interval '1' hour AS timestamp)

The date column (dt) is stored as a string in the format YYYYmmddhhmmss.

Tags: string, datetime, query-optimization, where-clause, presto

Solution


Most likely, the problem is that the query applies a function to the column being filtered. This is inefficient because the database has to convert every value in the column before it can filter on it. Such a predicate is said to be non-SARGable.

Your primary effort should go into fixing your data model and storing dates as dates rather than strings.

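That said, the string format that you are using to represent dates still makes direct filtering possible: because the values are fixed-width and zero-padded, with fields ordered from year down to second, comparing them as strings gives the same result as comparing them chronologically. The idea is to convert the filter value to the target string format (rather than converting the column value to a date):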

where dt > date_format(now() - interval '1' hour, '%Y%m%d%H%i%s')
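
For the three-month window mentioned in the question, the same pattern should carry over. A minimal sketch, reusing the placeholder table and column names from the question and assuming every dt value really is a 14-character, zero-padded string:

SELECT column1, dt, column2
FROM database1
-- Plain string comparison; the column itself is left untouched.
-- Only valid if every dt value is a fixed-width YYYYmmddHHMMSS string.
WHERE dt > date_format(now() - interval '3' month, '%Y%m%d%H%i%s')

If some rows deviate from that format (for example, a missing leading zero), the string comparison would silently give wrong results, so it may be worth spot-checking the data first.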
