首页 > 解决方案 > 如何将嵌套的 Avro 字段作为单个字段复制到 Redshift?

问题描述

我有以下 Avro 模式作为记录,我想向 Redshift 发出一份副本:

"fields": [{
    "name": "id",
    "type": "long"
}, {
    "name": "date",
    "type": {
        "type": "record",
        "name": "MyDateTime",
        "namespace": "com.mynamespace",
        "fields": [{
            "name": "year",
            "type": "int"
        }, {
            "name": "monthOfYear",
            "type": "int"
        }, {
            "name": "dayOfMonth",
            "type": "int"
        }, {
            "name": "hourOfDay",
            "type": "int"
        }, {
            "name": "minuteOfHour",
            "type": "int"
        }, {
            "name": "secondOfMinute",
            "type": "int"
        }, {
            "name": "millisOfSecond",
            "type": ["int", "null"],
            "default": 0
        }, {
            "name": "zone",
            "type": {
                "type": "string",
                "avro.java.string": "String"
            },
            "default": "America/New_York"
        }],
        "noregistry": []
    }
}]

我想MyDateTime在复制期间将对象压缩到 Redshift 中的单个列。我看到您可以将嵌套的 JSON 数据映射到顶级列:https ://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-json-jsonpaths ,但我还没有找到直接在 COPY 命令中连接字段的方法。

也就是说,有没有办法转换下面的记录(原来是Avro格式)

{
    "id": 6,
    "date": {
        "year": 2010,
        "monthOfYear": 10,
        "dayOfMonth": 12,
        "hourOfDay": 14,
        "minuteOfHour": 26,
        "secondOfMinute": 42,
        "millisOfSecond": {
            "int": 0
        },
        "zone": "America/New_York"
    }
}

在 Redshift 中排成一行,如下所示:

id | date
---------------------------------------------
6  | 2010-10-12 14:26:42:000 America/New_York

我想直接用 COPY 做这个

标签: amazon-redshiftavrojsonpath

解决方案


您需要将 Avro 文件声明为 Redshift Spectrum 外部表,然后对其使用查询将数据插入到本地 Redshift 表中。

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html


推荐阅读