sql - 我想用亚马逊雅典娜按日期总结 http 状态
问题描述
我可以像这样访问 Athena 中的日志。
time hostname user_agent http_status col_0 col_1 col_2 col_3
2018-06-01T10:00:00 host01 Mozilla/5.0 200 2018 6 1 10
2019-06-01T10:00:00 host01 Mozilla/5.0 200 2019 6 1 10
2020-06-01T10:00:00 host01 Mozilla/5.0 404 2020 6 1 10
2021-06-08T10:00:00 host01 Mozilla/5.0 404 2021 6 8 10
2021-06-09T10:00:00 host01 Mozilla/5.0 200 2021 6 9 10
2021-06-10T10:00:00 host01 Mozilla/5.0 404 2021 6 10 10
2021-06-10T11:00:00 host01 Mozilla/5.0 503 2021 6 10 11
2021-06-11T10:00:00 host01 Mozilla/5.0 200 2021 6 11 10
2021-06-12T10:00:00 host01 Mozilla/5.0 503 2021 6 12 10
2021-06-13T10:00:00 host01 Mozilla/5.0 200 2021 6 13 10
2021-06-14T10:00:00 host01 Mozilla/5.0 404 2021 6 14 10
2021-06-15T10:00:00 host01 Mozilla/5.0 200 2021 6 15 10
2021-06-15T11:00:00 host01 Mozilla/5.0 503 2021 6 15 11
我想按日期计算http状态。那时,需要显示从现在开始一周的数据。
date status_40x status_50x
2021-06-09 0 0
2021-06-10 1 1
2021-06-11 0 0
2021-06-12 0 1
2021-06-13 0 0
2021-06-14 1 0
2021-06-15 0 1
如何进行 sql 查询?
类型是...
time (string)
hostname (string)
user_agent (string)
http_status (string)
col_0 (string)(Partitioned)
col_1 (string)(Partitioned)
col_2 (string)(Partitioned)
col_3 (string)(Partitioned)
询问
select d.dte, coalesce(status_404, 0), coalesce(status_503, 0)
from (select date '2021-06-09' + n.n * interval '1' day as dte
from unnest(sequence(0, 6, 1)) as n(n)
) d left join
(select DATE_TRUNC('DAY', cast(from_iso8601_timestamp(time) as date)) as dte,
count_if(http_status = '404') AS status_404,
count_if(http_status = '503') AS status_503
from access_logs al
group by dte
) al
on al.dte = d.dte
错误
Error running query: SYNTAX_ERROR: line 9:16: Column 'dte' cannot be resolved
解决方案
您可以按日期分组(例如,将时间戳转换为日期)并使用count_if
:
WITH dataset AS
(SELECT *
FROM (VALUES (TIMESTAMP '2018-06-01 10:00:00', 401),
(TIMESTAMP '2018-06-01 11:00:00', 400),
(TIMESTAMP '2018-06-01 10:00:00', 500),
(TIMESTAMP '2018-06-02 11:00:00', 400)) AS t (TIME, http_status))
SELECT cast(TIME AS date),
count_if(http_status BETWEEN 400 AND 499) AS "40x",
count_if(http_status BETWEEN 500 AND 599) AS "50x"
FROM dataset
GROUP BY cast(TIME AS date)
这给出了下一个结果:
_col0 | 40x | 50倍 |
---|---|---|
2018-06-01 | 2 | 1 |
2018-06-02 | 1 | 0 |
推荐阅读
- javascript - 在onPressEnter之后反应KeyboardEvent更改输入值
- database - 将数据从 servlet 显示到 jsp 文件时出现问题
- r - R - 具有 O(log(n)) 复杂度的 'length' 等效函数
- angular - NullInjectorError:在添加 mat 对话框时没有获取 MatDialog 的提供程序
- windows - ansible aws windows ec2 - 内部错误:此连接模块不支持通过 sudo 运行命令
- html - 全高左浮动子元素
- excel - 如何标记两组不同的重复值?
- visual-studio - Nuget 依赖解决方案
- excel - 使用excel VBA的复选框上的可变标题
- openmodelica - fmi - 忽略未知资源 URI