amazon-s3 - Reading a pretrained Hugging Face Transformer directly from S3
Problem description
Loading a pretrained Hugging Face Transformer model appears to require saving the model locally first (as described here), so that you can simply pass a local path for your model and config:
model = PreTrainedModel.from_pretrained('path/to/model', local_files_only=True)
Is this possible when the model is stored on S3?
Solution
Answering my own question... (apparently this is encouraged.)
I got this working with a temporary file (NamedTemporaryFile), which does the trick. I was hoping to find an in-memory solution (i.e. passing a BytesIO directly to from_pretrained), but that would require a patch to the transformers codebase.
import boto3
import json
from contextlib import contextmanager
from io import BytesIO
from tempfile import NamedTemporaryFile
from transformers import PretrainedConfig, PreTrainedModel
@contextmanager
def s3_fileobj(bucket, key):
    """
    Yields a file object for the object at {bucket}/{key}

    Args:
        bucket (str): Name of the S3 bucket where your model is stored
        key (str): Relative path from the base of your bucket, including the
            filename and extension of the object to be retrieved.
    """
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    yield BytesIO(obj["Body"].read())
def load_model(bucket, path_to_model, model_name='pytorch_model'):
    """
    Load a model at the given S3 path. It is assumed that your model is stored at the key:

        '{path_to_model}/{model_name}.bin'

    and that a config has also been generated at the same path named:

        f'{path_to_model}/config.json'
    """
    tempfile = NamedTemporaryFile()
    with s3_fileobj(bucket, f'{path_to_model}/{model_name}.bin') as f:
        tempfile.write(f.read())
    tempfile.flush()  # ensure the weights hit disk before reading them back by name

    with s3_fileobj(bucket, f'{path_to_model}/config.json') as f:
        dict_data = json.load(f)
        config = PretrainedConfig.from_dict(dict_data)

    model = PreTrainedModel.from_pretrained(tempfile.name, config=config)
    return model
model = load_model('my_bucket', 'path/to/model')