Spectrum Scan Error when copying Parquet files into Redshift

Problem description

I have a COPY statement that loads from Parquet files, for example:

COPY schema.table
FROM 's3://bucket/folder/'
IAM_ROLE 'MyRole'
FORMAT AS PARQUET ;

The policy attached to MyRole is:

resource "aws_iam_policy" "PolicyMyRole" {
  name = "MyRole"
  path = "/"
  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::other/*",
                "arn:aws:s3:::folder"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::folder/*",
                "arn:aws:s3:::folder"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:*"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}
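
One thing that may be worth checking is whether the S3 resource ARNs in the policy actually cover the location the COPY reads from. The sketch below assumes the names in the question are literal (bucket `bucket`, key prefix `folder/`); if they are placeholders, substitute the real names. The object key `data.parquet` is hypothetical, and `fnmatchcase` only approximates IAM wildcard matching.

```python
from fnmatch import fnmatchcase

# Resource ARNs copied from the two S3 statements in the policy above.
POLICY_RESOURCES = [
    "arn:aws:s3:::other/*",
    "arn:aws:s3:::folder",
    "arn:aws:s3:::folder/*",
]


def covered(arn, patterns=POLICY_RESOURCES):
    """Return True if any policy resource pattern matches the ARN.

    fnmatchcase is only an approximation of IAM's '*' and '?' wildcards.
    """
    return any(fnmatchcase(arn, pattern) for pattern in patterns)


# ARNs that a COPY from 's3://bucket/folder/' would touch,
# assuming 'bucket' and 'folder' are the literal names.
bucket_arn = "arn:aws:s3:::bucket"                       # listing the prefix
object_arn = "arn:aws:s3:::bucket/folder/data.parquet"   # hypothetical object

print(covered(bucket_arn))   # False
print(covered(object_arn))   # False
```

Under that assumption the policy grants access to buckets named `other` and `folder`, not to `bucket`, which would be consistent with the 403. A resource list along the lines of `arn:aws:s3:::bucket` and `arn:aws:s3:::bucket/folder/*` would match both ARNs in this check.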

The COPY returns an error like this:

sqlalchemy.exc.InternalError: (psycopg2.InternalError) Spectrum Scan Error
DETAIL:  
  -----------------------------------------------
  error:  Spectrum Scan Error
  code:      15001
  context:   
Error: HTTP response error code: 403 Message: AccessDenied Access Denied
x-amz-request-id: 9A5F3F8BB1C6AD5C
x-amz-id-2: 1JwcGdQFUJMec7s97plTFEvaw0EldAsDnYrg56bTpz/QVzbclIiVf/bK4ynGF/T7VNJIcf01PbQ=

  query:     20027980
  location:  dory_util.cpp:929
  process:   fetchtask_thread [pid=527]
  -----------------------------------------------

The Parquet files were created from a pandas DataFrame:

df.to_parquet(path, compression="gzip", engine="fastparquet", index=False)

The files were uploaded to S3 successfully using:

import os
import boto3

os.environ['AWS_PROFILE'] = profile
s3 = boto3.client('s3')
s3.upload_file(path, 'bucket', "folder/" + path)  # upload_file returns None
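
A side note on the object key: `"folder/" + path` keeps any directory part of the local path in the key, so the objects may not sit where the COPY prefix expects them. A minimal sketch of building the key from the file name only, where `s3_key_for` is a hypothetical helper and not part of boto3:

```python
import os


def s3_key_for(local_path, prefix="folder/"):
    """Build an object key under `prefix` using only the file name."""
    return prefix + os.path.basename(local_path)


print(s3_key_for("/tmp/export/data.parquet"))  # folder/data.parquet
```

The result could then be passed as the third argument to `upload_file` in place of `"folder/" + path`.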


Solution

