Problem querying a partitioned Athena table for AWS ALB logs

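Question

I added a table in Athena to query Application Load Balancer logs. I created the table with the query below, then added a partition based on how the data is stored in S3. I still cannot get the data I need from queries.

Create table query: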

CREATE EXTERNAL TABLE IF NOT EXISTS {{DATABASE_NAME.TABLE_NAME}} (
  type string,
  time string,
  elb string,
  client_ip string,
  client_port string,
  target string,
  request_processing_time int,
  target_processing_time int,
  response_processing_time int,
  elb_status_code int,
  target_status_code string,
  received_bytes int,
  sent_bytes int,
  request_verb string,
  request_url string,
  request_proto string,
  user_agent string,
  ssl_cipher string,
  ssl_protocol string,
  target_group_arn string,
  trace_id string
)
PARTITIONED BY(year string, month string, day string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([-0-9]*) ([^ ]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) ([^ ]*)\" \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)'
) LOCATION 's3://{{BUCKET_NAME}}/AWSLogs/{{ACCOUNT_ID}}/elasticloadbalancing/us-west-2/';

Partition query:

ALTER TABLE alb_webapp add partition (year="2018", month="*", day="*")
location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";

When I run a simple query such as a SELECT, it returns zero results.

I would like to partition by year or month.

Tags: amazon-web-services, amazon-elb

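Answer

AWS appears to have changed the ALB log format again. The AWS documentation has a new regular expression that handles the new log format. The queries below worked for me.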
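One visible difference between the regex in the question and the newer one is the character class used for the three processing-time fields: `([-0-9]*)` versus `([-.0-9]*)`. ALB writes those times as decimal seconds, so a class without `.` cannot match them. A small check that can be run in Athena without any table (the sample value `0.001` is only an illustration, and the anchors `^` and `$` are added here because `regexp_like` otherwise matches substrings):

```sql
-- Old character class: no '.', so a decimal time does not match as a whole field.
SELECT regexp_like('0.001', '^[-0-9]*$')  AS old_class_matches,   -- false
       -- Newer character class: '.' is allowed, so the same value matches.
       regexp_like('0.001', '^[-.0-9]*$') AS new_class_matches;   -- true
```

This only illustrates one field-level difference. It is not a claim that it is the sole reason the original table returned no rows.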

Table creation:

CREATE EXTERNAL TABLE IF NOT EXISTS webapp_alb (
    type string,
    time string,
    elb string,
    client_ip string,
    client_port int,
    target_ip string,
    target_port int,
    request_processing_time double,
    target_processing_time double,
    response_processing_time double,
    elb_status_code string,
    target_status_code string,
    received_bytes bigint,
    sent_bytes bigint,
    request_verb string,
    request_url string,
    request_proto string,
    user_agent string,
    ssl_cipher string,
    ssl_protocol string,
    target_group_arn string,
    trace_id string,
    domain_name string,
    chosen_cert_arn string,
    matched_rule_priority string,
    request_creation_time string,
    actions_executed string,
    redirect_url string,
    new_field string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\"($| \"[^ ]*\")(.*)'
)
LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/';

Add partition:

ALTER TABLE webapp_alb add partition (year="2018", month="*", day="*") location "s3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/";
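As far as I can tell, the `*` in `month="*"` and `day="*"` is stored as a literal partition value and is not a wildcard. The statement above appears to work because its location covers the whole `2018/09/` prefix, so queries still scan the full month. If you want filters on `month` or `day` to prune data, a rough sketch is to register one partition per day with real values. This assumes the standard ALB layout of `.../region/yyyy/mm/dd/` under the table location; the two dates are only examples, and Athena runs one statement per query, so submit these separately:

```sql
-- Register two daily partitions with concrete values (dates are illustrative).
ALTER TABLE webapp_alb ADD IF NOT EXISTS
  PARTITION (year = '2018', month = '09', day = '01')
    LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/01/'
  PARTITION (year = '2018', month = '09', day = '02')
    LOCATION 's3://{{bucket-name}}/{{directory-name}}/AWSLogs/{{account-id}}/elasticloadbalancing/us-east-1/2018/09/02/';

-- Confirm which partitions the table now knows about.
SHOW PARTITIONS webapp_alb;

-- Filter on the partition columns so only one day's prefix is scanned.
SELECT elb_status_code, count(*) AS requests
FROM webapp_alb
WHERE year = '2018' AND month = '09' AND day = '01'
GROUP BY elb_status_code
ORDER BY requests DESC;
```

If the last query still returns nothing, check that the partition locations match the actual S3 prefixes, including the region segment.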

References:

https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html
https://docs.aws.amazon.com/athena/latest/ug/partitions.html

