AWS S3 sync fails inconsistently when syncing to a bucket in a different account with KMS involved

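Problem description

Executive summary of the problem. I have a bucket, call it bucket A, in an account we will call 123, set up with a default customer-managed KMS key (call its ID 1111111). The bucket contains two objects, both under the same path within the bucket. They have the same KMS key ID and the same owner. When I try to sync them to a new bucket B in another account, call it 456, one object syncs successfully and the other does not. Instead I get: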

An error occurred (AccessDenied) when calling the CopyObject operation: Access Denied

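Has anyone seen this kind of inconsistent behavior? I call it inconsistent because there is absolutely no difference in access permissions between the two objects, yet one succeeds and the other fails. Note: for simplicity my summary describes two objects, but one of my real cases has 30 objects, of which 2 are copied and the rest fail, and the results differ in some other paths.

Some of the conditions are described below. Some data has been obfuscated for security, but in a consistent way:

Bucket A (com.mycompany.datalake.us-east-1) bucket policy: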

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123:root",
                    "arn:aws:iam::456:root"
                ]
            },
            "Action": [
                "s3:PutObjectTagging",
                "s3:PutObjectAcl",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::com.mycompany.datalake.us-east-1/security=0/*",
                "arn:aws:s3:::com.mycompany.datalake.us-east-1"
            ]
        },
        {
            "Sid": "DenyIfNotGrantingFullAccess",
            "Effect": "Deny",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123:root",
                    "arn:aws:iam::456:root"
                ]
            },
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::com.mycompany.datalake.us-east-1/security=0/*",
                "arn:aws:s3:::com.mycompany.datalake.us-east-1"
            ],
            "Condition": {
                "StringNotLike": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        },
        {
            "Sid": "DenyIfNotUsingExpectedKmsKey",
            "Effect": "Deny",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123:root",
                    "arn:aws:iam::456:root"
                ]
            },
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::com.mycompany.datalake.us-east-1/security=0/*",
                "arn:aws:s3:::com.mycompany.datalake.us-east-1"
            ],
            "Condition": {
                "StringNotLike": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:us-east-1:123:key/1111111"
                }
            }
        }
    ]
}

Also in the source account, I created a role to be assumed, which I call datalake_full_access_role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::com.mycompany.datalake.us-east-1/security=0/*",
                "arn:aws:s3:::com.mycompany.datalake.us-east-1"
            ]
        }
    ]
}

It has a trust relationship with account 456. It is also worth mentioning that the policy on KMS key 1111111 is currently wide open:

{
    "Version": "2012-10-17",
    "Id": "key-default-1",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "kms:Encrypt*",
                "kms:Decrypt*",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:Describe*"
            ],
            "Resource": "*"
        }
    ]
}

Now for the destination bucket B (mycompany-us-west-2-datalake) in account 456, the bucket policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccountBasedAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::456:root"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::mycompany-us-west-2-datalake",
                "arn:aws:s3:::mycompany-us-west-2-datalake/*"
            ]
        }
    ]
}

To perform the migration (sync), I provisioned an EC2 instance in account 456 and attached an instance profile to it, with the following policies attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::123:role/datalake_full_access_role"
        }
    ]
}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:DescribeKey",
                "kms:ReEncrypt*",
                "kms:CreateGrant",
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:123:key/1111111"
            ]
        }
    ]
}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::com.mycompany.datalake.us-east-1",
                "arn:aws:s3:::com.mycompany.datalake.us-east-1/security=0/*"
            ]
        }
    ]
}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::mycompany-us-west-2-datalake",
                "arn:aws:s3:::mycompany-us-west-2-datalake/*"
            ]
        }
    ]
}

On the EC2 instance, I installed the latest AWS CLI version:

$ aws --version
aws-cli/1.16.297 Python/3.5.2 Linux/4.4.0-1098-aws botocore/1.13.33

Then I ran my sync command:

aws s3 sync s3://com.mycompany.datalake.us-east-1 s3://mycompany-us-west-2-datalake --source-region us-east-1 --region us-west-2 --acl bucket-owner-full-control --exclude '*' --include '*/zone=raw/Event/*' --no-progress

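I believe I have done my homework and this should all work. It does work for a few objects, but not for all of them, and I have run out of things to try. Note that I was 100% successful syncing to a local directory on the EC2 instance and then from that local directory to the new bucket, using the following two calls: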

aws s3 sync s3://com.mycompany.datalake.us-east-1 datalake --source-region us-east-1 --exclude '*' --include '*/zone=raw/Event/*' --no-progress
aws s3 sync datalake s3://mycompany-us-west-2-datalake --region us-west-2 --acl bucket-owner-full-control --exclude '*' --include '*/zone=raw/Event/*' --no-progress

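This makes absolutely no sense, because from an access point of view there is no difference. Below are the properties of two objects in the source bucket, one that succeeds and one that fails:

Successful object: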

Owner
Dev.Awsmaster

Last modified
Jan 12, 2019 10:11:48 AM GMT-0800

Etag
12ab34

Storage class
Standard

Server-side encryption
AWS-KMS

KMS key ID
arn:aws:kms:us-east-1:123:key/1111111

Size
9.2 MB

Key
security=0/zone=raw/Event/11_96152d009794494efeeae49ed10da653.avro

Failed object:

Owner
Dev.Awsmaster

Last modified
Jan 12, 2019 10:05:26 AM GMT-0800

Etag
45cd67

Storage class
Standard

Server-side encryption
AWS-KMS

KMS key ID
arn:aws:kms:us-east-1:123:key/1111111

Size
3.2 KB

Key
security=0/zone=raw/Event/05_6913583e47f457e9e25e9ea05cc9c7bb.avro

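Addendum: after going through several cases, I am starting to see a pattern. I think the problem occurs when the object is too small. In all 10 directories I analyzed, some but not all objects synced successfully; every successful object was 8MB or larger, and every failed object was under 8MB. Could this be a bug in aws s3 sync when KMS is involved? I wonder whether I can tweak ~/.aws/config to work around it.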
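The 8MB boundary lines up with the AWS CLI's default multipart_threshold of 8MB. A minimal Python sketch of that decision follows, on the assumption that objects below the threshold are copied with a single CopyObject call while objects at or above it go through a multipart copy (UploadPartCopy). The function name copy_operation and the byte sizes (approximated from the console values above) are illustrative only and are not part of the AWS CLI.

```python
# Assumption: the AWS CLI default multipart_threshold of 8MB decides which
# S3 API call a bucket-to-bucket copy uses.
DEFAULT_MULTIPART_THRESHOLD = 8 * 1024 * 1024


def copy_operation(size_bytes, threshold=DEFAULT_MULTIPART_THRESHOLD):
    """Return the S3 call assumed to be used for an object of this size."""
    if size_bytes < threshold:
        return "CopyObject"  # single-request copy, the call named in the error
    return "UploadPartCopy"  # multipart copy


# Sizes approximated from the "9.2 MB" and "3.2 KB" shown in the console.
objects = {
    "security=0/zone=raw/Event/11_96152d009794494efeeae49ed10da653.avro": int(9.2 * 1024 * 1024),
    "security=0/zone=raw/Event/05_6913583e47f457e9e25e9ea05cc9c7bb.avro": int(3.2 * 1024),
}

for key, size in objects.items():
    print(key, size, copy_operation(size))
```

Under this assumption the failing object is the only one of the two that would go through CopyObject, which matches the operation named in the AccessDenied message.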

Tags: amazon-s3

Solution


I found a solution, although I still think this is a bug in aws s3 sync. With the following set in ~/.aws/config, all objects synced successfully:

[default]
output = json
s3 =
    signature_version = s3v4
    multipart_threshold = 1

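I already had signature_version set before, but I am including it for completeness in case someone has a similar need. The new entry is multipart_threshold = 1, which means an object of any size triggers a multipart transfer. I did not specify multipart_chunksize, so it falls back to its default (the AWS CLI documentation lists 8MB as the default, with 5MB as the minimum chunk size).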
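To double-check that the nested s3 settings are laid out the way the file above shows, the section can be parsed with Python's standard configparser. This is only a sketch: read_s3_settings is a hypothetical helper, and the way it splits the indented sub-keys is an assumption about the file layout, not the AWS CLI's own loader.

```python
import configparser

# Same content as the ~/.aws/config fragment above.
SAMPLE = """\
[default]
output = json
s3 =
    signature_version = s3v4
    multipart_threshold = 1
"""


def read_s3_settings(text, profile="default"):
    """Return the indented sub-keys under 's3 =' as a dict of strings."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    # configparser treats the indented lines as one multi-line value.
    raw = parser.get(profile, "s3")
    settings = {}
    for line in raw.splitlines():
        if "=" not in line:
            continue  # skip the empty first line of the multi-line value
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


print(read_s3_settings(SAMPLE))
```

To inspect a real file, its contents can be read from disk and passed to read_s3_settings in place of SAMPLE.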

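Honestly, this requirement makes no sense, because whether an object is transferred to S3 using multipart should not matter. I know it does not matter when KMS is not involved, but apparently it does when KMS is involved.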

