amazon-web-services - 查询分区 athena 表以获取 aws alb 日志时出现问题
问题描述
我在 Athena 中添加了一个用于查询应用程序负载均衡器日志的表。我使用下面的查询创建了表,然后根据 s3 中的数据存储添加了分区。但我仍然无法通过查询获得所需的数据。
建表查询:
CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
type string,
time string,
elb string,
client_ip string,
client_port string,
target string,
request_processing_time int,
target_processing_time int,
response_processing_time int,
elb_status_code int,
target_status_code string,
received_bytes int,
sent_bytes int,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';
分区查询:
ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
当我试图运行一个简单的查询说“选择”时,它给了我找到的零结果。
我希望根据年或月进行分区。
解决方案
AWS 似乎再次改变了他们的 alb 日志格式。AWS 文档有新的正则表达式来处理新的日志格式。下面的查询对我有用。
表创建:
CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code string,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
new_field string
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)')
LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';
添加分区:
ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
参考:
https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html https://docs.aws.amazon.com/athena/latest/ug/partitions.html
推荐阅读
- elasticsearch - Logstash :sql_last_value 显示错误的垃圾日期(显示 6 个月的旧日期作为上次运行时间)
- eclipse - eclipse maven构建java.lang.ExceptionInInitializerError
- python - 无法使用 bs4、python、selenium 抓取细节
- javascript - 使用 Javascript 进行分阶段表单验证
- vue.js - Vue.js点击组件多次触发
- python - 在 tkinter python 中的 OptionMenu 内定位文本
- python - Pandas:基于一个公共列组合两个不同形状的数据框
- javascript - 如何在 JavaScript 函数中添加 jQuery?
- flutter - Flutter 命令在终端中不起作用
- functional-programming - F# 嵌套记录