首页 > 解决方案 > Redshift COPY 命令 - 某些字段的值未复制

问题描述

我创建了一个临时表,我想使用 COPY 命令从 S3 存储桶中的 JSON 文件填充该表。它有效,但firstName, itemInSession, lastName, sessionId, userAgent,userId的值没有被复制。

表.py

from dwh_conn import *

drop_staging_event = "DROP TABLE IF EXISTS staging_event;"

create_staging_event = ("""
CREATE TABLE IF NOT EXISTS staging_event(
    artist VARCHAR,
    auth VARCHAR,
    firstName VARCHAR,
    gender TEXT,
    itemInSession INTEGER,
    lastName VARCHAR,
    length FLOAT,
    level TEXT,
    location VARCHAR,
    method TEXT,
    page TEXT,
    registration FLOAT,
    sessionId INTEGER,
    song VARCHAR,
    status INTEGER,
    ts BIGINT,
    userAgent VARCHAR,
    userId VARCHAR
    );""")

load_staging_event = ("""COPY staging_event FROM 's3://dwh-training/data/log_data/2018/11/'
    credentials 'aws_iam_role={}' region 'us-east-2' JSON 'auto';""").format(ROLE_ARN)

以下返回空行:

SELECT firstName, itemInSession, lastName, sessionId, userAgent, userId FROM staging_event LIMIT 5;
 firstname | iteminsession | lastname | sessionid | useragent | userid 
-----------+---------------+----------+-----------+-----------+--------
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
           |               |          |           |           | 
(5 rows)

但其他字段具有值:

SELECT artist, auth, gender, length, level, location, method, page, registration, song, status, ts FROM staging_event LIMIT 5;
          artist           |   auth    | gender |  length   | level |         location         | method |   page   | registration  |              song               | status |      ts       
---------------------------+-----------+--------+-----------+-------+--------------------------+--------+----------+---------------+---------------------------------+--------+---------------
 N.E.R.D. FEATURING MALICE | Logged In | M      |  288.9922 | free  | New Orleans-Metairie, LA | PUT    | NextSong | 1541033612796 | Am I High (Feat. Malice)        |    200 | 1541121934796
                           | Logged In | F      |           | free  | Lubbock, TX              | GET    | Home     | 1540708070796 |                                 |    200 | 1541122176796
 Death Cab for Cutie       | Logged In | F      | 216.42404 | free  | Lubbock, TX              | PUT    | NextSong | 1540708070796 | A Lack Of Color (Album Version) |    200 | 1541122241796
 Tracy Gang Pussy          | Logged In | F      | 221.33506 | free  | Lubbock, TX              | PUT    | NextSong | 1540708070796 | I Have A Wish                   |    200 | 1541122457796
 Skillet                   | Logged In | M      | 178.02404 | free  | Harrisburg-Carlisle, PA  | PUT    | NextSong | 1540006905796 | Monster (Album Version)         |    200 | 1541126568796
(5 rows)

有谁知道为什么没有填充空字段?谢谢。

标签: pythonpostgresqlamazon-redshift

解决方案


Banty - 带有“json 'auto'”的 Redshift COPY 只会加载它可以与表匹配的 json 值。这可以在与顶级 json 键匹配的列名上完成,但它们需要完全匹配。检查这些名称是否完全匹配。

更好的匹配方法是使用 json_paths 文件,将 json 键映射到 Redshift 列。可以在那里进行名称更改以及从非顶级 json 键加载数据。

这只是我所知道的发生这种情况的最常见原因。如果您需要更多信息回复,并且您可能需要提供正在加载的 json 之外的内容。


推荐阅读