首页 > 解决方案 > 我想用亚马逊雅典娜按日期总结 http 状态

问题描述

我可以像这样访问 Athena 中的日志。

time                  hostname    user_agent      http_status  col_0    col_1     col_2    col_3
2018-06-01T10:00:00   host01      Mozilla/5.0     200          2018     6         1        10
2019-06-01T10:00:00   host01      Mozilla/5.0     200          2019     6         1        10
2020-06-01T10:00:00   host01      Mozilla/5.0     404          2020     6         1        10
2021-06-08T10:00:00   host01      Mozilla/5.0     404          2021     6         8        10
2021-06-09T10:00:00   host01      Mozilla/5.0     200          2021     6         9        10
2021-06-10T10:00:00   host01      Mozilla/5.0     404          2021     6         10        10
2021-06-10T11:00:00   host01      Mozilla/5.0     503          2021     6         10        11
2021-06-11T10:00:00   host01      Mozilla/5.0     200          2021     6         11        10
2021-06-12T10:00:00   host01      Mozilla/5.0     503          2021     6         12        10
2021-06-13T10:00:00   host01      Mozilla/5.0     200          2021     6         13        10
2021-06-14T10:00:00   host01      Mozilla/5.0     404          2021     6         14        10
2021-06-15T10:00:00   host01      Mozilla/5.0     200          2021     6         15        10
2021-06-15T11:00:00   host01      Mozilla/5.0     503          2021     6         15        11

我想按日期计算http状态。那时,需要显示从现在开始一周的数据。

date        status_40x  status_50x
2021-06-09  0           0
2021-06-10  1           1
2021-06-11  0           0
2021-06-12  0           1
2021-06-13  0           0
2021-06-14  1           0
2021-06-15  0           1

如何进行 sql 查询?

类型是...

time (string) 
hostname (string) 
user_agent (string) 
http_status (string) 
col_0 (string)(Partitioned) 
col_1 (string)(Partitioned) 
col_2 (string)(Partitioned) 
col_3 (string)(Partitioned)

询问

select d.dte, coalesce(status_404, 0), coalesce(status_503, 0)
from (select date '2021-06-09' + n.n * interval '1' day as dte
      from unnest(sequence(0, 6, 1)) as n(n)
     ) d left join
     (select DATE_TRUNC('DAY', cast(from_iso8601_timestamp(time) as date)) as dte, 
             count_if(http_status = '404') AS status_404,
             count_if(http_status = '503') AS status_503
      from access_logs al
      group by dte
     ) al
     on al.dte = d.dte

错误

Error running query: SYNTAX_ERROR: line 9:16: Column 'dte' cannot be resolved

标签: sqlamazon-athena

解决方案


您可以按日期分组(例如,将时间戳转换为日期)并使用count_if

WITH dataset AS
  (SELECT *
   FROM (VALUES (TIMESTAMP '2018-06-01 10:00:00', 401), 
                (TIMESTAMP '2018-06-01 11:00:00', 400),
                (TIMESTAMP '2018-06-01 10:00:00', 500), 
                (TIMESTAMP '2018-06-02 11:00:00', 400)) AS t (TIME, http_status))

SELECT cast(TIME AS date),
       count_if(http_status BETWEEN 400 AND 499) AS "40x",
       count_if(http_status BETWEEN 500 AND 599) AS "50x"
FROM dataset
GROUP BY cast(TIME AS date)

这给出了下一个结果:

_col0 40x 50倍
2018-06-01 2 1
2018-06-02 1 0

推荐阅读