Connecting to a Google Cloud AI Platform Notebook instance with a custom Docker image results in "Connection refused"

Problem description

I am trying to launch a notebook on Google Cloud AI Platform using a custom image. I followed the procedure described here:

https://cloud.google.com/ai-platform/deep-learning-containers/docs/derivative-container

So, to build and push the Docker image, I ran:

gcloud auth configure-docker
export PROJECT=$(gcloud config list project --format "value(core.project)")
docker build . -f Dockerfile -t "gcr.io/${PROJECT}/my-custom-image:latest"
docker push "gcr.io/${PROJECT}/my-custom-image:latest"
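
Before creating the instance, it may be worth confirming that the push actually produced the latest tag in the registry. The following is a minimal sketch, assuming the same project and image name as in the commands above; the helper names `image_ref` and `list_pushed_tags` are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: build the image reference (without tag) for a project.
# The image name defaults to the one used in the build command above.
image_ref() {
  printf 'gcr.io/%s/%s\n' "$1" "${2:-my-custom-image}"
}

# List the tags stored in Container Registry for the pushed image.
# Needs an authenticated gcloud; it is only defined here, not run.
list_pushed_tags() {
  project="$(gcloud config list project --format "value(core.project)")"
  gcloud container images list-tags "$(image_ref "$project")"
}
```

Calling `list_pushed_tags` after the push should list the image digests together with their tags, which shows whether latest points at the image that was just built.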

However, when I try to connect to the notebook instance that uses this image with

gcloud compute --project "myproject" ssh --zone "myzone" "custom-test" -- -L 8080:localhost:8080

I get:

ssh: connect to host XXX.XXX.XXX.XXX port 22: Connection refused

This happens even if I use the base image without any changes, for example with this Dockerfile:

FROM gcr.io/deeplearning-platform-release/base-cpu:latest

If I launch a notebook instance directly from gcr.io/deeplearning-platform-release/base-cpu:latest, I can connect to it as expected.

Edit 1: From the serial port 1 log:

May  9 16:51:31 custom-test GCEGuestAgent[673]: 2020-05-09T16:51:31.7524Z GCEGuestAgent Info: Updating keys for user MYUSER.
[  206.144111] google_guest_agent[673]: 2020/05/09 16:51:33 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
May  9 16:51:33 custom-test google_guest_agent[673]: 2020/05/09 16:51:33 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
May  9 16:53:25 custom-test ntpd[707]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
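
For anyone reproducing this, a log like the one above can be fetched from the command line. This is a sketch only: the instance and zone defaults are the placeholders used in the ssh command earlier, and `serial_log` and `permission_errors` are hypothetical helper names, not part of gcloud.

```shell
# Print the serial port 1 output of an instance.
# $1 = instance name, $2 = zone (defaults are the placeholders from above).
serial_log() {
  gcloud compute instances get-serial-port-output "${1:-custom-test}" \
    --zone "${2:-myzone}" --port 1
}

# Hypothetical filter: keep only lines that mention a permission problem.
permission_errors() {
  grep -i -E 'PermissionDenied|permission denied|pull access denied'
}
```

Running `serial_log | permission_errors` narrows the output to the lines that matter here.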

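This looks like a permissions error, but I am not sure why I would lack permission to deploy an image that was pushed from the same account. Could it be related to the line custom-test ntpd[707]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized?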

Edit 2: Now, about an hour later, I can connect (without having changed anything). But when I access localhost:8080 I get:

channel 4: open failed: connect failed: Connection refused
channel 3: open failed: connect failed: Connection refused

as output in the attached console (the SSH session).
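
These "channel N: open failed" messages are what ssh prints when the forwarded destination, here localhost:8080 on the VM, refuses the connection. A quick check from inside the SSH session is whether anything is listening on that port at all. The sketch below assumes the `ss` tool is available on the VM; `has_listener` is a hypothetical helper that only parses `ss -ltn` output.

```shell
# Read `ss -ltn` output on stdin and succeed if some socket is
# listening on the given port (default 8080).
has_listener() {
  awk -v p=":${1:-8080}" '
    $1 == "LISTEN" {
      # $4 is the local address:port column; compare its suffix with ":port".
      n = length($4)
      if (substr($4, n - length(p) + 1) == p) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

Inside the VM this could be used as `ss -ltn | has_listener 8080 && echo "listening" || echo "nothing on 8080"`. If nothing is listening, the problem is probably with the notebook server in the container rather than with the tunnel itself.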

From the serial port 1 log:

May  9 18:04:36 custom-test systemd[1]: Started Session 4 of user MYUSER.
May  9 18:04:36 custom-test GCEGuestAgent[673]: 2020-05-09T18:04:36.5636Z GCEGuestAgent Info: Updating keys for user MYUSER.
May  9 18:04:37 custom-test google_guest_agent[673]: 2020/05/09 18:04:37 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission
[ 4590.862794] google_guest_agent[673]: 2020/05/09 18:04:37 logging client: rpc error: code = PermissionDenied desc = The caller does not have permission

Edit 3: Launching the image as a VM results in:

[   26.315675] konlet-startup[535]: 2020/05/09 19:34:57 Launching user container 'gcr.io/myproject/my-custom-image:latest'
[   26.315713] konlet-startup[535]: 2020/05/09 19:34:57 Configured container 'instance-1-test' will be started with name 'klt-instance-1-test-azmb'.
[   26.315740] konlet-startup[535]: 2020/05/09 19:34:57 Pulling image: 'gcr.io/myproject/my-custom-image:latest'
[   26.839555] konlet-startup[535]: 2020/05/09 19:34:57 Error: Failed to start container: Error response from daemon: {"message":"pull access denied for gcr.io/myproject/my-custom-image, repository does not exist or may require 'docker login': denied: Permission denied for \"latest\" from request \"/v2/myproject/my-custom-image/manifests/latest\". "}
[   26.839839] konlet-startup[535]: 2020/05/09 19:34:57 Saving welcome script to profile.d
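
One possible reading of the "pull access denied" error is that the identity the VM runs as cannot read the Cloud Storage bucket that backs gcr.io for the project. That is an assumption, not something the log confirms. The sketch below shows how it could be checked; the function names are hypothetical, and the instance name and zone must be supplied by the caller.

```shell
# Container Registry keeps gcr.io images in this Cloud Storage bucket.
gcr_bucket() {
  printf 'gs://artifacts.%s.appspot.com\n' "$1"
}

# Print the service account attached to an instance.
# $1 = instance name, $2 = zone.
instance_service_account() {
  gcloud compute instances describe "$1" --zone "$2" \
    --format "value(serviceAccounts[0].email)"
}

# Show the IAM policy of the registry bucket for a project ($1),
# to see whether that service account is allowed to read objects.
show_registry_policy() {
  gsutil iam get "$(gcr_bucket "$1")"
}
```

If the service account printed by `instance_service_account` does not appear in the bucket policy with at least object read access, that would be consistent with the failure above. The VM's access scopes may also need to allow read access to Cloud Storage.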

Tags: google-cloud-platform, gcp-ai-platform-notebook, google-dl-platform

Solution
