google-cloud-storage - ML-Engine unable to access job_dir directory in bucket
Problem description
I am attempting to submit a job for training in ML-Engine using gcloud but am running into an error with service account permissions that I can't figure out. The model code exists on a Compute Engine instance from which I am running gcloud ml-engine jobs submit
as part of a bash script. I have created a service account (ai-platform-developer@....iam.gserviceaccount.com) for gcloud authentication on the VM instance and have created a bucket for the job and model data. The service account has been granted Storage Object Viewer and Storage Object Creator roles for the bucket and the VM and bucket all belong to the same project.
When I try to submit a job per this tutorial, the following commands are executed:
time_stamp=`date +"%Y%m%d_%H%M"`
job_name='ObjectDetection_'${time_stamp}
gsutil cp object_detection/samples/configs/faster_rcnn_resnet50.config \
gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
gcloud ml-engine jobs submit training ${job_name} \
--project [project-name] \
--runtime-version 1.12 \
--job-dir=gs://[bucket-name]/jobs/${job_name} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region us-central1 \
--config object_detection/training-config.yml \
-- \
--model_dir=gs://[bucket-name]/output/${job_name} \
--pipeline_config_path=gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
where [bucket-name] and [project-name] are placeholders for the bucket created above and the project it and the VM are contained in.
The config file is successfully uploaded to the bucket; I can confirm it exists in the cloud console. However, the job fails to submit with the following error:
ERROR: (gcloud.ml-engine.jobs.submit.training) User [ai-platform-developer@....iam.gserviceaccount.com] does not have permission to access project [project-name] (or it may not exist): Field: job_dir Error: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
- '@type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
field: job_dir
If I look in the cloud console, the files specified by --packages
exist in that location, and I've ensured the service account ai-platform-developer@....iam.gserviceaccount.com
has been given Storage Object Viewer and Storage Object Creator roles for the bucket, which has bucket-level permissions set. After ensuring the service account is activated and set as the default, I can also run
gsutil ls gs://[bucket-name]/jobs/ObjectDetection_20190709_2001
which successfully returns the contents of the folder without a permission error. In the project, there exists a managed service account service-[project-number]@cloud-ml.google.com.iam.gserviceaccount.com
and I have also granted this account Storage Object Viewer and Storage Object Creator roles on the bucket.
To confirm this VM is able to submit a job, I am able to switch the gcloud user to my personal account and the script runs and submits a job without any error. However, since this exists in a shared VM, I would like to rely on service account authorization instead of my own user account.
Solution
I had a similar problem with exactly the same error.
I found the easiest way to track down these errors is to go to Logging and search for the text "PERMISSION DENIED".
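The same search can also be done from the command line. A minimal sketch, assuming the Cloud Logging API is enabled for the project (the filter string here is illustrative, not the exact one from my logs):

```shell
# Show recent log entries whose status message mentions a permission failure.
# --freshness limits the search window; adjust as needed.
gcloud logging read 'protoPayload.status.message:"PERMISSION_DENIED"' \
  --limit=10 \
  --freshness=1d
```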
In my case, the service account was missing the permission storage.buckets.get. You then need to find a role that contains this permission, which you can do from IAM -> Roles. In that view, you can filter roles by permission name. It turns out that only the following roles have the required permission:
- Storage Admin
- Storage Legacy Bucket Owner
- Storage Legacy Bucket Reader
- Storage Legacy Bucket Writer
I added the Storage Legacy Bucket Writer role to the service account on the bucket, and was then able to submit the job.
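For reference, both steps can be done from the command line as well. A sketch, with [bucket-name] and the service account address as placeholders for your own values:

```shell
# Confirm the role actually contains the missing permission
# (look for storage.buckets.get under includedPermissions).
gcloud iam roles describe roles/storage.legacyBucketWriter

# Grant the role to the service account at the bucket level.
gsutil iam ch \
  serviceAccount:ai-platform-developer@[project-name].iam.gserviceaccount.com:roles/storage.legacyBucketWriter \
  gs://[bucket-name]
```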