Python conflict in Composer crashes the whole Kubernetes cluster. How do I fix it?

Problem description

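Let me preface this by saying that I am not proficient in Kubernetes.

Yesterday we had to install/update a Python dependency for one of our DAGs in Google Cloud Composer. I am not sure that was the cause, but afterwards the whole Composer environment crashed.

Browsing the Logs Explorer, I found the following error in the scheduler and the workers: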

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 4, in <module>
    __import__('pkg_resources').require('apache-airflow===1.10.2-composer')
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'urllib3<1.25,>=1.21.1' distribution was not found and is required by requests
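
The last line of the traceback says that requests needs a urllib3 version that satisfies both `<1.25` and `>=1.21.1`, and that no installed version does. As an illustration only, here is a simplified stand-in for that range check. It is not the pkg_resources implementation: the helper names `parse_version` and `satisfies` are hypothetical, and only plain numeric versions such as `1.24.3` are handled.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); pre-release tags are not supported."""
    return tuple(int(part) for part in text.split("."))


def _compare(op, left, right):
    """Compare two version tuples after zero-padding them to equal length."""
    width = max(len(left), len(right))
    left = left + (0,) * (width - len(left))
    right = right + (0,) * (width - len(right))
    return _OPS[op](left, right)


def satisfies(version, specifier):
    """Return True if version meets every comma-separated clause."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, bound = match.groups()
        if not _compare(op, parse_version(version), parse_version(bound)):
            return False
    return True


REQUIRED = "<1.25,>=1.21.1"           # taken from the error message
print(satisfies("1.24.3", REQUIRED))  # the version pinned later in this post
print(satisfies("1.25.0", REQUIRED))  # illustrative version outside the range
```

Under these assumptions, a pin such as `urllib3==1.24.3` falls inside the range requests asks for, while anything from 1.25 upward does not.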

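So I tried adding urllib3==1.24.3 to the Composer PyPI packages, and got this error:

The update failed with this message:

The UPDATE operation on this environment failed 39 minutes ago with the following error message: Could not create the web server in the new version. Check the Airflow web server logs for details.

In any case, it is clear that I need to resolve the Python library dependency conflict, so I followed this article.

One of its steps is to connect to a worker and run pip freeze, which is a problem, because when I try the following: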

kubectl exec -itn composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm -- /bin/bash

I get:

Defaulting container name to airflow-worker.
Use 'kubectl describe pod/airflow-worker-6cdfc68fd4-4k4jm -n composer-1-7-5-airflow-1-10-2-2d974007' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("airflow-worker")
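
`kubectl exec` reports this error when the target container is not currently running, which is consistent with a pod stuck in CrashLoopBackOff. As a minimal sketch, assuming `kubectl` is on the PATH and already configured for this cluster, the commands below read the pod description and the logs of the previous (crashed) container instance instead of attaching to it. The helper names `diagnostic_commands` and `run_all` are hypothetical; the namespace, pod, and container names are copied from the output above.

```python
import subprocess


def diagnostic_commands(namespace, pod, container):
    """Build kubectl argument lists for inspecting a crashing container."""
    return [
        # Events and per-container state (restart count, last exit code).
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Logs of the previous container instance, i.e. the one that crashed.
        ["kubectl", "logs", pod, "-n", namespace, "-c", container, "--previous"],
    ]


def run_all(commands):
    """Run each command and print its combined output; needs cluster access."""
    for argv in commands:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        print("$ " + " ".join(argv))
        print(result.stdout)


commands = diagnostic_commands(
    "composer-1-7-5-airflow-1-10-2-2d974007",
    "airflow-worker-6cdfc68fd4-4k4jm",
    "airflow-worker",
)
# run_all(commands)  # uncomment on a machine with access to the cluster
```

The commands are built as argument lists rather than a shell string, so the pod and namespace names are passed through without any shell quoting concerns.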

This is the output of kubectl get pods --all-namespaces:

NAMESPACE                                NAME                                                             READY   STATUS             RESTARTS   AGE
composer-1-7-5-airflow-1-10-2-2d974007   airflow-scheduler-574bcfbd47-gqnkp                               1/2     CrashLoopBackOff   234        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-4k4jm                                  1/2     CrashLoopBackOff   233        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-fwz5h                                  1/2     CrashLoopBackOff   232        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-vl25g                                  1/2     CrashLoopBackOff   233        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-75ff8dbb56-qxc7j                                  0/2     Evicted            0          21d
default                                  airflow-monitoring-5bd5f64896-g6q8v                              1/1     Running            0          21d
default                                  airflow-redis-0                                                  1/1     Running            0          21d
default                                  airflow-sqlproxy-577bbc7577-mxv5p                                1/1     Running            0          21d
default                                  composer-agent-7c388f77-840c-40c8-be09-66303d721742-xxqlf        0/1     Completed          0          66m
default                                  composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s        0/1     Completed          0          19h
default                                  composer-fluentd-daemon-gm7vr                                    1/1     Running            0          21d
default                                  composer-fluentd-daemon-srw2h                                    1/1     Running            4          21d
default                                  composer-fluentd-daemon-swzgc                                    1/1     Running            0          21d
kube-system                              heapster-gke-7b4f99dd5f-8d2fx                                    3/3     Running            0          21d
kube-system                              kube-dns-5995c95f64-7hn2s                                        4/4     Running            0          21d
kube-system                              kube-dns-5995c95f64-dwlfv                                        4/4     Running            0          21d
kube-system                              kube-dns-autoscaler-8687c64fc-fpvm9                              1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-7zcs   1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-h2u4   1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-i3kz   1/1     Running            1          7d20h
kube-system                              l7-default-backend-fd59995cd-hkk6z                               1/1     Running            0          21d
kube-system                              metrics-server-v0.3.1-5c6fbf777-27hgk                            2/2     Running            0          21d
kube-system                              prometheus-to-sd-5z9sw                                           2/2     Running            0          21d
kube-system                              prometheus-to-sd-8dsr8                                           2/2     Running            2          21d
kube-system                              prometheus-to-sd-f55cl                                           2/2     Running            0          21d
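
With this many rows it can help to filter the listing down to the pods that are not healthy. A small sketch, assuming the six-column whitespace-separated format shown above (newer kubectl versions may print extra text in the RESTARTS column, which this does not handle). The function name `unhealthy_pods` is hypothetical, and the sample rows are a subset copied from the listing.

```python
LISTING = """\
NAMESPACE                                NAME                                                        READY   STATUS             RESTARTS   AGE
composer-1-7-5-airflow-1-10-2-2d974007   airflow-scheduler-574bcfbd47-gqnkp                          1/2     CrashLoopBackOff   234        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-4k4jm                             1/2     CrashLoopBackOff   233        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-75ff8dbb56-qxc7j                             0/2     Evicted            0          21d
default                                  airflow-redis-0                                             1/1     Running            0          21d
default                                  composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s   0/1     Completed          0          19h
"""


def unhealthy_pods(listing, healthy=("Running", "Completed")):
    """Return (namespace, name, status, restarts) for pods not in a healthy state."""
    lines = listing.strip().splitlines()
    header = lines[0].split()
    pods = []
    for line in lines[1:]:
        # Pair each column name with the value in the same position.
        row = dict(zip(header, line.split()))
        if row["STATUS"] not in healthy:
            pods.append(
                (row["NAMESPACE"], row["NAME"], row["STATUS"], int(row["RESTARTS"]))
            )
    return pods


for namespace, name, status, restarts in unhealthy_pods(LISTING):
    print(namespace, name, status, restarts)
```

In this listing every unhealthy pod is in the Composer namespace, while the pods in default and kube-system are running or completed.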

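After some googling, I found that CrashLoopBackOff errors in Kubernetes can be hard to diagnose and resolve. Since I am barely familiar with this technology, I am asking for your help with this.

  1. How can I connect to a worker?
  2. How do I install/update libraries in the Python environment from which that worker runs Airflow? Is this even the right way to resolve Python dependency problems in Google Cloud Composer?

If you can help, as much detail as possible would be appreciated. Thank you.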

Tags: kubernetes, google-cloud-platform, google-kubernetes-engine, google-cloud-composer

Solution
