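kubernetes - A Python dependency conflict in Composer crashed the whole Kubernetes cluster. How do I fix it?
Problem description
Let me say up front that I'm not well versed in Kubernetes.
Yesterday we had to install/update a Python dependency for one of our DAGs in Google Cloud Composer. I'm not sure whether this is the cause, but after that the entire Composer environment crashed.
Browsing Logs Explorer, I found the following error in both the scheduler and the workers: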
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 4, in <module>
    __import__('pkg_resources').require('apache-airflow===1.10.2-composer')
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'urllib3<1.25,>=1.21.1' distribution was not found and is required by requests
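The last line carries the actual information: `requests` declares `urllib3<1.25,>=1.21.1`, and when the `airflow` entry point starts, `pkg_resources` cannot find a urllib3 distribution inside that range. This happens either when urllib3 is missing or when only a version outside the range is installed. Which version the worker actually has is not shown in the logs; the sketch below assumes, purely for illustration, a 1.25.x release. It is a minimal version-range check that assumes plain dotted numeric versions (no `rc`/`dev`/`post` tags); `pkg_resources` and `pip` implement the full PEP 440 rules instead.

```python
# Minimal sketch of the range check behind the DistributionNotFound above.
# Assumption: versions are plain dotted integers; real tools follow PEP 440.
import operator

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); trailing zeros are dropped so 1.25 == 1.25.0."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    v = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so '<=' is not mistaken for '<'.
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                if not _OPS[symbol](v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


REQUESTS_PIN = "<1.25,>=1.21.1"  # the constraint quoted in the traceback
print(satisfies("1.24.3", REQUESTS_PIN))  # True: inside the range
print(satisfies("1.25.3", REQUESTS_PIN))  # False: too new for this requests pin
```

This is why pinning `urllib3==1.24.3` looks like the natural fix: it is the newest 1.24 release that still satisfies the pin, though any other installed package that needs urllib3 1.25+ would then conflict in turn.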
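So I tried adding urllib3==1.24.3 to Composer's PyPI packages, and got this error:
The update failed with this message:
The UPDATE operation on this environment failed 39 minutes ago with the following error message: Failed to create the web server in the new version. Check the Airflow webserver logs for details.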
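Either way, it's clear that I need to resolve the Python library dependency conflict, so I followed this article.
One of its steps is to connect to a worker and run pip freeze, which is a problem, because when I try the following: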
kubectl exec -itn composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm -- /bin/bash
I get:
Defaulting container name to airflow-worker.
Use 'kubectl describe pod/airflow-worker-6cdfc68fd4-4k4jm -n composer-1-7-5-airflow-1-10-2-2d974007' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("airflow-worker")
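For context, the `pip freeze` step only lists the installed distributions and their versions, so that conflicting pins can be spotted. Once a shell inside a running worker container is available, a rough stand-in restricted to the packages from the traceback could look like the sketch below. It assumes Python 3.8+ for `importlib.metadata`; the traceback shows the workers run Python 3.6, where running `pip freeze` itself (or using the `importlib_metadata` backport) is the simpler option. The function name `installed_versions` is hypothetical.

```python
# Rough stand-in for `pip freeze`, limited to the packages involved in the conflict.
# Assumption: Python 3.8+ (importlib.metadata); the Composer workers run 3.6.
from importlib import metadata


def installed_versions(names=None):
    """Map lower-cased distribution name -> version string.

    If `names` is given (a set of lower-cased names), only those are reported.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip entries with broken or missing metadata
        key = name.lower()
        if names is None or key in names:
            found[key] = dist.version
    return found


if __name__ == "__main__":
    # Print in the same "name==version" shape that pip freeze uses.
    for name, version in sorted(installed_versions({"requests", "urllib3"}).items()):
        print(f"{name}=={version}")
```

Names are only lower-cased, not fully normalized, so a package published as `apache_airflow` would need to be queried under that spelling.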
Here is the output of kubectl get pods --all-namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE
composer-1-7-5-airflow-1-10-2-2d974007 airflow-scheduler-574bcfbd47-gqnkp 1/2 CrashLoopBackOff 234 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm 1/2 CrashLoopBackOff 233 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-fwz5h 1/2 CrashLoopBackOff 232 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-vl25g 1/2 CrashLoopBackOff 233 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-75ff8dbb56-qxc7j 0/2 Evicted 0 21d
default airflow-monitoring-5bd5f64896-g6q8v 1/1 Running 0 21d
default airflow-redis-0 1/1 Running 0 21d
default airflow-sqlproxy-577bbc7577-mxv5p 1/1 Running 0 21d
default composer-agent-7c388f77-840c-40c8-be09-66303d721742-xxqlf 0/1 Completed 0 66m
default composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s 0/1 Completed 0 19h
default composer-fluentd-daemon-gm7vr 1/1 Running 0 21d
default composer-fluentd-daemon-srw2h 1/1 Running 4 21d
default composer-fluentd-daemon-swzgc 1/1 Running 0 21d
kube-system heapster-gke-7b4f99dd5f-8d2fx 3/3 Running 0 21d
kube-system kube-dns-5995c95f64-7hn2s 4/4 Running 0 21d
kube-system kube-dns-5995c95f64-dwlfv 4/4 Running 0 21d
kube-system kube-dns-autoscaler-8687c64fc-fpvm9 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-7zcs 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-h2u4 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-i3kz 1/1 Running 1 7d20h
kube-system l7-default-backend-fd59995cd-hkk6z 1/1 Running 0 21d
kube-system metrics-server-v0.3.1-5c6fbf777-27hgk 2/2 Running 0 21d
kube-system prometheus-to-sd-5z9sw 2/2 Running 0 21d
kube-system prometheus-to-sd-8dsr8 2/2 Running 2 21d
kube-system prometheus-to-sd-f55cl 2/2 Running 0 21d
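With this many pods, it helps to filter the listing down to the ones that are actually unhealthy: here, the scheduler and the three current workers (1/2 ready, CrashLoopBackOff) plus one old Evicted worker. The sketch below parses the plain-text output of `kubectl get pods --all-namespaces`, assuming the column order shown above and using only the first four columns (so a RESTARTS value containing spaces would not break it). Completed pods are skipped because finished Job pods normally report 0/N ready. The names `Pod` and `unhealthy_pods` are hypothetical.

```python
# Sketch: pick out unhealthy pods from `kubectl get pods --all-namespaces` text.
# Assumption: column order NAMESPACE NAME READY STATUS RESTARTS AGE, header first.
from typing import List, NamedTuple


class Pod(NamedTuple):
    namespace: str
    name: str
    ready: str
    status: str


def unhealthy_pods(listing: str) -> List[Pod]:
    """Return pods that are neither Completed nor Running with all containers ready."""
    flagged = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        pod = Pod(*fields[:4])
        if pod.status == "Completed":
            continue  # finished Job pods legitimately report 0/N ready
        ready, total = (int(n) for n in pod.ready.split("/"))
        if pod.status != "Running" or ready < total:
            flagged.append(pod)
    return flagged


sample = """NAMESPACE NAME READY STATUS RESTARTS AGE
composer-1-7-5-airflow-1-10-2-2d974007 airflow-scheduler-574bcfbd47-gqnkp 1/2 CrashLoopBackOff 234 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-75ff8dbb56-qxc7j 0/2 Evicted 0 21d
default airflow-redis-0 1/1 Running 0 21d
default composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s 0/1 Completed 0 19h
"""

for pod in unhealthy_pods(sample):
    print(pod.name, pod.status, pod.ready)
# airflow-scheduler-574bcfbd47-gqnkp CrashLoopBackOff 1/2
# airflow-worker-75ff8dbb56-qxc7j Evicted 0/2
```

The 1/2 READY count is also consistent with the kubectl exec error above: each worker pod has two containers, and the one named airflow-worker is the container that keeps crashing.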
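Googling around, I found that CrashLoopBackOff errors can be hard to diagnose and resolve in Kubernetes. Since I'm barely familiar with this technology, I'm asking for your help with this.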
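- How do I connect to a worker?
- How do I install/update libraries in the Python environment that Airflow runs from on that worker? And is that even the right way to fix Python dependency problems in Google Cloud Composer?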
If you can help, please include as much detail as possible. Thank you.