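kubernetes - Dask-gateway on Kubernetes cannot create a new cluster
Problem description
I have integrated Dask into JupyterHub on a Kubernetes server, and I can execute code on it. However, I cannot scale the cluster or start a new one. My guess is that Dask cannot access the resources on Kubernetes. My setup is similar to the "Install on a Kubernetes Cluster" guide.
What works: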
from dask_gateway import Gateway
gateway = Gateway(address="http://xx.xx.xx.xx", auth="jupyterhub")
gateway.list_clusters()
This returns []. From there I can compute things on Dask, so it seems to be able to communicate with traefik-dask-gateway. But running:
cluster = gateway.new_cluster()
hangs indefinitely, and
options = gateway.cluster_options()
returns nothing. The resources are set up with two Helm charts. Here is jupyterhub_config.yaml:
proxy:
  secretToken: abc...
imagePullSecret:
  create: true
  username: xx
  password: xx
  email: xx
singleuser:
  defaultUrl: "/lab"
  image:
    name: xx
    tag: latest
hub:
  services:
    dask-gateway:
      apiToken: def...
  config:
    Authenticator:
      admin_users:
        - some-users
      allowed_users:
        - some-users
    GitHubOAuthenticator:
      client_id: "some_client_id"
      client_secret: "some_client_secret"
      oauth_callback_url: "https://my_jhub_server/hub/oauth_callback"
    JupyterHub:
      authenticator_class: github
This was created by running:
helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.11.1 --values jupyterhub_config.yaml
And here is dask_config.yaml:
gateway:
  replicas: 1
  annotations: {}
  resources: {}
  prefix: /
  loglevel: INFO
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent
  imagePullSecrets: []
  service:
    annotations: {}
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: def
    kerberos:
      keytab: null
    custom:
      class: null
      options: {}
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3
  backend:
    image:
      name: daskgateway/dask-gateway
      tag: 0.9.0
      pullPolicy: IfNotPresent
    namespace: null
    environment: null
    scheduler:
      extraPodConfig: {}
      extraContainerConfig: {}
      cores:
        request: null
        limit: null
      memory:
        request: null
        limit: null
    worker:
      extraPodConfig: {}
      extraContainerConfig: {}
      cores:
        request: null
        limit: null
      memory:
        request: null
        limit: null
  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraConfig: {}
controller:
  enabled: true
  annotations: {}
  resources: {}
  imagePullSecrets: []
  loglevel: INFO
  completedClusterMaxAge: 86400
  completedClusterCleanupPeriod: 600
  backoffBaseDelay: 0.1
  backoffMaxDelay: 300
  k8sApiRateLimit: 50
  k8sApiRateLimitBurst: 100
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent
  nodeSelector: {}
  affinity: {}
  tolerations: []
traefik:
  replicas: 1
  annotations: {}
  resources: {}
  image:
    name: traefik
    tag: 2.1.3
  additionalArguments: []
  loglevel: WARN
  dashboard: false
  service:
    type: LoadBalancer
    annotations: {}
    spec: {}
    ports:
      web:
        port: 80
        nodePort: null
      tcp:
        port: web
        nodePort: null
  nodeSelector: {}
  affinity: {}
  tolerations: []
rbac:
  enabled: true
  controller:
    serviceAccountName: null
  gateway:
    serviceAccountName: null
  traefik:
    serviceAccountName: null
It was installed with:
helm upgrade --install \
--namespace=jhub \
--version=0.9.0 \
--values=dask_config.yaml \
dask-gateway \
daskgateway/dask-gateway
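With JupyterHub authentication, the token registered under hub.services.dask-gateway.apiToken in the JupyterHub chart has to be the same value as gateway.auth.jupyterhub.apiToken in the dask-gateway chart. A small consistency check over the two values files can rule this out quickly. The sketch below assumes the files have already been parsed into plain dicts; `check_gateway_token` and `_dig` are hypothetical helpers, not part of either chart.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, List, Optional


def _dig(cfg: Mapping[str, Any], *keys: str) -> Optional[Any]:
    """Follow nested keys through a parsed Helm values dict; None if any key is missing."""
    node: Any = cfg
    for key in keys:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def check_gateway_token(jhub_values: Mapping[str, Any],
                        dask_values: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty list means no problem found)."""
    problems: List[str] = []
    hub_token = _dig(jhub_values, "hub", "services", "dask-gateway", "apiToken")
    gw_token = _dig(dask_values, "gateway", "auth", "jupyterhub", "apiToken")
    auth_type = _dig(dask_values, "gateway", "auth", "type")

    if auth_type != "jupyterhub":
        problems.append(f"gateway.auth.type is {auth_type!r}, expected 'jupyterhub'")
    if hub_token is None:
        problems.append("hub.services.dask-gateway.apiToken is not set")
    if gw_token is None:
        problems.append("gateway.auth.jupyterhub.apiToken is not set")
    # Only compare when both sides are present, so a missing key is reported once.
    if hub_token is not None and gw_token is not None and hub_token != gw_token:
        problems.append("apiToken differs between the two charts")
    return problems
```

Assuming PyYAML is installed, the dicts could come from `yaml.safe_load(open("jupyterhub_config.yaml"))` and `yaml.safe_load(open("dask_config.yaml"))`. The check only covers the token; it says nothing about network reachability between the two releases.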
Update: I have read the logs of the traefik-dask-gateway pod. They say:
time="2021-07-13T04:33:10Z" level=error msg="Error while Peeking first byte: read tcp 10.120.5.89:8000->10.128.0.38:19143: read: connection reset by peer"
Any insight into how to fix this is welcome. Thanks.
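While debugging, it can help to turn the indefinite hang of `gateway.new_cluster()` (or `gateway.cluster_options()`) into an explicit error, so that retries and log inspection can be scripted. The following is a minimal stdlib sketch; `call_with_timeout` is a hypothetical helper, not part of dask_gateway. It only stops *waiting*: the request itself is not cancelled, and the worker thread is a daemon so a stuck call does not block interpreter exit.

```python
import threading
from typing import Any, Callable, Dict


def call_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it has not returned in time."""
    outcome: Dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # captured here, re-raised in the caller's thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise TimeoutError(f"call did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Usage, with the `gateway` object from the question: `cluster = call_with_timeout(gateway.new_cluster, timeout=120)`. The 120-second limit is an arbitrary choice; a TimeoutError then points at the gateway/controller side (for example, their pod logs) rather than at client code.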