Dask-gateway on Kubernetes cannot create new clusters

Problem description

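I have integrated Dask into JupyterHub on a Kubernetes server, and I can execute code on it. However, I cannot scale the cluster or start a new one. My guess is that Dask cannot access the resources on Kubernetes. My setup follows the approach for installing on a Kubernetes cluster.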

What works:

from dask_gateway import Gateway
gateway = Gateway(address="http://xx.xx.xx.xx", auth="jupyterhub")
gateway.list_clusters()

This returns [], and from there I can compute things on Dask, so it seems to be able to communicate with traefik-dask-gateway. But running:

cluster = gateway.new_cluster()

hangs indefinitely, and

options = gateway.cluster_options()

returns nothing either. The resources are set up with two Helm charts. Here is jupyterhub_config.yaml:

proxy:
  secretToken: abc...
imagePullSecret:
  create: true
  username: xx
  password: xx
  email: xx
singleuser:
  defaultUrl: "/lab"
  image:
    name: xx
    tag: latest

hub:
  services:
    dask-gateway:
      apiToken: def...
  config:
    Authenticator:
      admin_users:
        - some-users
      allowed_users:
        - some-users
    GitHubOAuthenticator:
      client_id: "some_client_id"
      client_secret: "some_client_secret"
      oauth_callback_url: "https://my_jhub_server/hub/oauth_callback"
    JupyterHub:
      authenticator_class: github

This was deployed by running:

helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.11.1 --values jupyterhub_config.yaml
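The JupyterHub values above rely on random shared secrets: proxy.secretToken (elided as abc...) and the dask-gateway service apiToken (elided as def...). The Zero to JupyterHub documentation suggests generating such tokens with openssl rand -hex 32. A minimal Python equivalent using the standard-library secrets module is sketched below; the helper name make_token is hypothetical, and the 32-byte default simply mirrors that openssl command.

```python
import secrets


def make_token(nbytes: int = 32) -> str:
    """Return a random hex token, like `openssl rand -hex <nbytes>` (hypothetical helper)."""
    # token_hex encodes each random byte as two hexadecimal characters.
    return secrets.token_hex(nbytes)


if __name__ == "__main__":
    # Generate one value and use it for BOTH
    # hub.services.dask-gateway.apiToken (jupyterhub_config.yaml) and
    # gateway.auth.jupyterhub.apiToken (dask_config.yaml).
    print(make_token())
```

A 32-byte token comes out as 64 hex characters; proxy.secretToken should get its own, separate value.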

And dask_config.yaml:

gateway:
  replicas: 1
  annotations: {}
  resources: {}
  prefix: /
  loglevel: INFO
  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent
  imagePullSecrets: []
  service:
    annotations: {}
  auth:
    type: jupyterhub
    jupyterhub:
      apiToken: def
    kerberos:
      keytab: null
    custom:
      class: null
      options: {}
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3

  backend:
    image:
      name: daskgateway/dask-gateway
      tag: 0.9.0
      pullPolicy: IfNotPresent
    namespace: null
    environment: null

    scheduler:
      extraPodConfig: {}
      extraContainerConfig: {}
      cores:
        request: null
        limit: null
      memory:
        request: null
        limit: null

    worker:
      extraPodConfig: {}
      extraContainerConfig: {}
      cores:
        request: null
        limit: null
      memory:
        request: null
        limit: null

  nodeSelector: {}
  affinity: {}
  tolerations: []
  extraConfig: {}

controller:

  enabled: true
  annotations: {}
  resources: {}
  imagePullSecrets: []
  loglevel: INFO
  completedClusterMaxAge: 86400
  completedClusterCleanupPeriod: 600
  backoffBaseDelay: 0.1
  backoffMaxDelay: 300
  k8sApiRateLimit: 50
  k8sApiRateLimitBurst: 100

  image:
    name: daskgateway/dask-gateway-server
    tag: 0.9.0
    pullPolicy: IfNotPresent

  nodeSelector: {}
  affinity: {}
  tolerations: []

traefik:
  replicas: 1
  annotations: {}
  resources: {}
  image:
    name: traefik
    tag: 2.1.3
  additionalArguments: []
  loglevel: WARN
  dashboard: false
  service:
    type: LoadBalancer
    annotations: {}
    spec: {}
    ports:
      web:
        port: 80
        nodePort: null
      tcp:
        port: web
        nodePort: null
  nodeSelector: {}
  affinity: {}
  tolerations: []

rbac:
  enabled: true
  controller:
    serviceAccountName: null
  gateway:
    serviceAccountName: null
  traefik:
    serviceAccountName: null

Installed with:

helm upgrade --install \
 --namespace=jhub \
 --version=0.9.0 \
 --values=dask_config.yaml \
dask-gateway \
daskgateway/dask-gateway
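With JupyterHub authentication, the two charts have to agree on the dask-gateway service token: hub.services.dask-gateway.apiToken in jupyterhub_config.yaml and gateway.auth.jupyterhub.apiToken in dask_config.yaml (elided above as def... and def). A quick sanity check is sketched below, assuming both values files have already been loaded into Python dicts (for example with PyYAML's yaml.safe_load); the helper names dig and dask_tokens_match are hypothetical. Because list_clusters() succeeds, authentication is probably working, so this mainly rules out one possible cause.

```python
from typing import Any, Mapping, Optional


def dig(values: Mapping[str, Any], *keys: str) -> Optional[Any]:
    """Follow a path of keys through nested dicts; return None if any key is missing."""
    node: Any = values
    for key in keys:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def dask_tokens_match(jhub_values: Mapping[str, Any], dask_values: Mapping[str, Any]) -> bool:
    """True if both charts carry the same, non-empty dask-gateway API token (hypothetical helper)."""
    hub_token = dig(jhub_values, "hub", "services", "dask-gateway", "apiToken")
    gw_token = dig(dask_values, "gateway", "auth", "jupyterhub", "apiToken")
    return bool(hub_token) and hub_token == gw_token


if __name__ == "__main__":
    # Trimmed-down versions of the two values files, with placeholder tokens.
    jhub = {"hub": {"services": {"dask-gateway": {"apiToken": "def"}}}}
    dask = {"gateway": {"auth": {"type": "jupyterhub", "jupyterhub": {"apiToken": "def"}}}}
    print(dask_tokens_match(jhub, dask))  # True
```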

Update: I have read the logs of the traefik-dask-gateway pod. They say:

time="2021-07-13T04:33:10Z" level=error msg="Error while Peeking first byte: read tcp 10.120.5.89:8000->10.128.0.38:19143: read: connection reset by peer"
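Traefik writes its logs as key=value pairs, and this message embeds a Go network error of the form "read tcp A->B". In Go's net error strings, the address before -> is the local end (here the traefik pod, port 8000) and the one after it is the remote peer that reset the connection. A small standard-library sketch for pulling these fields out of such lines, for example to see which remote addresses keep resetting connections, is below; the function names and the regex are assumptions tailored to this one message format.

```python
import re
import shlex
from typing import Dict, Optional, Tuple

# Matches "tcp <local ip:port>-><remote ip:port>" inside a Go net error message.
_TCP_PAIR = re.compile(r"tcp (\d+(?:\.\d+){3}:\d+)->(\d+(?:\.\d+){3}:\d+)")


def parse_traefik_line(line: str) -> Dict[str, str]:
    """Split a key=value log line (values optionally double-quoted) into a dict."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):  # shlex removes the quotes around values
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def tcp_endpoints(msg: str) -> Optional[Tuple[str, str]]:
    """Return (local, remote) addresses from a Go 'read tcp A->B' error, if present."""
    match = _TCP_PAIR.search(msg)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    line = ('time="2021-07-13T04:33:10Z" level=error msg="Error while Peeking first byte: '
            'read tcp 10.120.5.89:8000->10.128.0.38:19143: read: connection reset by peer"')
    fields = parse_traefik_line(line)
    print(fields["level"])               # error
    print(tcp_endpoints(fields["msg"]))  # ('10.120.5.89:8000', '10.128.0.38:19143')
```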

Any insights on how to fix this are welcome. Thanks.
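While debugging, it can help to make the hanging gateway.new_cluster() call fail loudly instead of blocking the notebook forever. The generic standard-library sketch below runs any callable in a daemon thread and raises TimeoutError if it has not returned within the deadline. The helper name call_with_timeout is hypothetical, and it does not cancel anything: the underlying call (and any cluster the gateway may still be starting) keeps running in the background.

```python
import threading
import time
from typing import Any, Callable


def call_with_timeout(fn: Callable[..., Any], timeout: float, *args: Any, **kwargs: Any) -> Any:
    """Run fn(*args, **kwargs) in a daemon thread; raise TimeoutError if it takes too long."""
    outcome: dict = {}

    def target() -> None:
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)  # daemon: won't block interpreter exit
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The call is still running; we stop waiting but cannot cancel it.
        raise TimeoutError(f"{getattr(fn, '__name__', fn)} did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    print(call_with_timeout(lambda: 42, timeout=1.0))  # 42
    try:
        call_with_timeout(time.sleep, 0.2, 5)  # stands in for a hanging gateway.new_cluster()
    except TimeoutError as err:
        print("timed out:", err)
```

With the gateway object from above, call_with_timeout(gateway.new_cluster, 120) would raise after two minutes instead of hanging; the 120-second value is arbitrary.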

Tags: kubernetes, dask, jupyterhub

Solution

