Python boto3: loading a model tar file from S3 and unpacking it

Problem description

I'm using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load in sklearn. I've been testing list_objects with a delimiter to get at the tar.gz files:

response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)


for i in response['Contents']:
    print(i['Key'])

Then I plan to use

import tarfile
tf = tarfile.open(model.read())
tf.extractall()

But how do I get the actual tar.gz file from S3, rather than some boto3 object?

Tags: amazon-s3, boto3, tar, amazon-sagemaker

Solution


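You can use s3.download_file(). It writes the object to a local file path, which you can then pass to tarfile.open(). That would make your code look like: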

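import os

import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects matching your criteria. With Delimiter='.csv', keys that
# contain '.csv' after the prefix are grouped under 'CommonPrefixes',
# so 'Contents' holds only the remaining keys.
response = s3.list_objects(
    Bucket=bucket,
    Prefix=prefix,
    Delimiter='.csv'
)

# Iterate over each file found and download it
# ('Contents' is absent from the response when nothing matches)
for i in response.get('Contents', []):
    key = i['Key']
    if key.endswith('/'):
        continue  # skip folder placeholder keys
    dest = os.path.join('/tmp', key)
    # The key contains slashes, so create the local directories first
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket=bucket,
        Key=key,
        Filename=dest
    )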
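
If you would rather not write the archive to disk first, one possible alternative is to read the object body into memory and hand it to tarfile through the fileobj argument. The sketch below is only an illustration under some assumptions: extract_model_archive is a hypothetical helper name, s3_client is expected to be a boto3 S3 client (or anything with a compatible get_object method), and the archive is assumed to be small enough to fit in memory and to come from a source you trust.

```python
import io
import os
import tarfile


def extract_model_archive(s3_client, bucket, key, dest_dir):
    """Fetch a .tar.gz object from S3 and unpack it into dest_dir.

    Hypothetical helper; returns the member names found in the archive.
    """
    # get_object returns a dict whose 'Body' is a file-like streaming object
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response['Body'].read()

    os.makedirs(dest_dir, exist_ok=True)

    # tarfile.open() expects a path as its first positional argument, which
    # is why tarfile.open(model.read()) does not work. In-memory bytes need
    # to be wrapped in BytesIO and passed as fileobj instead.
    with tarfile.open(fileobj=io.BytesIO(data), mode='r:gz') as tf:
        names = tf.getnames()
        # Only extract archives you trust; extractall writes whatever
        # paths the archive contains.
        tf.extractall(path=dest_dir)
    return names
```

With a real client this would be called as extract_model_archive(boto3.client('s3'), bucket, key, '/tmp/model'). What you do with the extracted files depends on how the model was saved; if your training script wrote the estimator with joblib (an assumption, since the question does not say), joblib.load on the extracted file would be the usual next step.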
