首页 > 解决方案 > 如何创建红移卸载标题行

问题描述

我有一个来自 redshift 的卸载查询,它提供了所需的数据。我希望在将写入每个文件的 python 脚本中添加一个标题行。提前致谢。

标题行:应该看起来完全像这样。~id ~from ~to ~label starttime:Date endtime:Date accountid:String srcaddr:String dstaddr:String srcport:String dstport:String protocol:String packet:Int bytes:Int action:String

    def handler(event, context):
        now   = datetime.utcnow().replace(second=0, microsecond=0)
        past  = now - timedelta(minutes=120)

        cmd = """unload (SELECT 'pk' AS id,
        'version' AS version,
        'accountid' AS accountid,
        'interfaceid' AS interfaceid,
        'srcaddr' AS srcaddr,
        'dstaddr' AS dstaddr,
        'srcport' AS srcport,
        'dstport' AS dstport,
        'protocol' AS protocol,
        'packets' AS packets,
        'bytes' AS bytes,
        'starttime' AS startime,
        'endtime' AS endtime,
        'action' AS action,
        'logstatus' AS logstatus,
        'fromeni' AS fromeni,
        'toeni' AS toeni
    UNION ALL SELECT
        cast(pk AS varchar(255)) AS id,
        cast(version AS varchar(255)) AS version,
        cast(accountid AS varchar(255)) AS accountid,
        cast(interfaceid AS varchar(255)) AS interfaceid,
        cast(srcaddr AS varchar(255)) AS srcaddr,
        cast(dstaddr AS varchar(255)) AS dstaddr,
        cast(srcport AS varchar(255)) AS srcport,
        cast(dstport AS varchar(255)) AS dstport,
        cast(protocol AS varchar(255)) AS protocol,
        cast(packets AS varchar(255)) AS packets,
        cast(bytes AS varchar(255)) AS bytes,
        cast(starttime AS varchar(255)) AS starttime,
        cast(endtime AS varchar(255)) AS endtime,
        action,
        logstatus,
        cast(fromeni AS varchar(255)) AS fromeni,
        cast(toeni AS varchar(255)) AS toeni
    FROM dimension.vpc_flow_logs)\
            to 's3://xxxxxxxxxx/' \
            iam_role 'arn:aws:iam::xxxxxxxxx:role/task' \
            delimiter as ',' \
            PARALLEL OFF \
            ESCAPE \
            manifest allowoverwrite"""

        try:
            cursor = conn.cursor()
            cursor.execute(cmd)
            conn.commit()
        finally:
            cursor.close()
            conn.close()

    if __name__ == '__main__':
        handler({},{})

标签: pythonpython-2.7amazon-web-servicesamazon-redshift

解决方案


好消息!自 2018 年 10 月 2 日起,Redshift 开始在命令中推出对HEADER选项的支持,该选项UNLOAD会将列标题导出为 csv 的标题。

这是他们的公告: https ://forums.aws.amazon.com/ann.jspa?annID=6136

文档也已更新: https ://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

例如

UNLOAD ('SELECT * FROM customers')
TO 's3://<bucket-name>/customers.csv'
CREDENTIALS 'aws_iam_role=arn:aws:iam::<IAM_USER_ID#>:role/myRedshiftRole'
PARALLEL OFF DELIMITER ',' HEADER;

我刚刚在一个有 100-300 列的数据库上试了一下,效果很好。列定义的日子已经一去不复返了UNION!为使用 Redshift 并需要为第三方平台导出数据的任何数据科学家节省大量时间!

PARALLEL ON也适用于具有多个输出文件的默认值。


推荐阅读