python - Error when trying to copy to the AWS Glue tmp folder in a Python shell job
Problem description
I am trying to copy some files to the tmp folder in a Glue job using boto3. Here is my code:
import os
import pandas as pd
import numpy as np
import boto3

bucketname = "<bucket_name>"
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucketname)
print('line 9')
source = "stuff/20210223/"
# target = temp directory the job is running in
target = os.path.dirname(os.path.realpath(__file__))

for obj in my_bucket.objects.filter(Prefix=source):
    print('line 15')
    # Keep only the file name part of the object key
    source_filename = (obj.key).split('/')[-1]
    copy_source = {
        'Bucket': bucketname,
        'Key': obj.key
    }
    print(obj.key)
    print('line 21')
    target_filename = "/{}/{}".format(target, source_filename)
    print('target_filename')
    print(target_filename)
    # This is the call that raises the error shown below
    s3.meta.client.copy(copy_source, bucketname, target_filename)
    print('line 27')

print('curr dir')
curr_dir = os.path.dirname(os.path.realpath(__file__))
print('\n----------------\n')
dir_path = os.path.dirname(os.path.realpath(__file__))
# List the regular files in the current working directory
files = [f for f in os.listdir('.') if os.path.isfile(f)]
print(files)
This produces the following error:
  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-hcnfmxbn/CDF parser.py", line 26, in <module>
  File "/usr/local/lib/python3.6/site-packages/boto3/s3/inject.py", line 379, in copy
    return future.result()
  File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/usr/local/lib/python3.6/site-packages/s3transfer/tasks.py", line 255, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/s3transfer/copies.py", line 110, in _submit
    **head_object_request)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
TypeError: __init__() missing 1 required positional argument: 'operation_name'
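Almost every SO thread about this error message points to a region-related fix, but I have already checked that the Glue job runs in the same region as my bucket. The error occurs on this line of code: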
s3.meta.client.copy(copy_source, bucketname, target_filename)
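that is, when I actually try to copy the file to the tmp folder. I don't see how this could be a permissions problem with the Glue service IAM role's write access to the folder, because I can save a CSV to that folder with pandas to_csv.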
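Solution
Resolved by updating the policy of the IAM role I use for the Glue job so that it allows write access to the bucket the job is running against.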
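For illustration, here is a minimal sketch of what such a policy document might look like, built as a Python dict so it can be printed and pasted into the role. The helper name build_bucket_rw_policy and the bucket name "my-example-bucket" are hypothetical, and the exact set of actions is an assumption: the answer only says that write access was added. Since s3.meta.client.copy is a managed S3-to-S3 copy whose third argument is a destination key, the role would plausibly need read access on the source object and write access on the destination.

```python
import json


def build_bucket_rw_policy(bucket_name):
    """Return an IAM policy document (as a dict) granting list/read/write on one bucket.

    Hypothetical helper for illustration; adjust actions and resources to your setup.
    """
    bucket_arn = "arn:aws:s3:::{}".format(bucket_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys under a prefix, as my_bucket.objects.filter(...) does
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading the source objects and writing the copied objects
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


# "my-example-bucket" is a placeholder bucket name
policy_json = json.dumps(build_bucket_rw_policy("my-example-bucket"), indent=2)
print(policy_json)
```

If the goal is to land the files on the job's local disk rather than under another S3 key, the boto3 call that writes to a local path is download_file(Bucket, Key, Filename), which needs read access on the source objects.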