c# - Determining why an application shuts down in Kubernetes
Problem Description
I have several .NET Core applications that shut down for no apparent reason. This seems to have started happening since health checks were implemented, but I cannot see a kill command anywhere in Kubernetes.
Command
kubectl describe pod mypod
Output (the restart count is this high because it shuts down every night; staging environment)
Name: mypod
...
Status: Running
...
Controlled By: ReplicaSet/mypod-deployment-6dbb6bcb65
Containers:
myservice:
State: Running
Started: Fri, 01 Nov 2019 09:59:40 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 01 Nov 2019 07:19:07 +0100
Finished: Fri, 01 Nov 2019 09:59:37 +0100
Ready: True
Restart Count: 19
Liveness: http-get http://:80/liveness delay=10s timeout=1s period=5s #success=1 #failure=10
Readiness: http-get http://:80/hc delay=10s timeout=1s period=5s #success=1 #failure=10
...
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 18m (x103 over 3h29m) kubelet, aks-agentpool-40946522-0 Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 18m (x29 over 122m) kubelet, aks-agentpool-40946522-0 Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
These are the pod logs.
Command
kubectl logs mypod --previous
Output
Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
Application is shutting down...
Command
kubectl get events
Output (what I am missing here is a kill event. My assumption is that the pod is not being restarted as a result of the repeatedly failing health checks)
LAST SEEN TYPE REASON OBJECT MESSAGE
39m Normal NodeHasSufficientDisk node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasSufficientDisk
39m Normal NodeHasSufficientMemory node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasSufficientMemory
39m Normal NodeHasNoDiskPressure node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasNoDiskPressure
39m Normal NodeReady node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeReady
39m Normal CREATE ingress/my-ingress Ingress default/ebizsuite-ingress
39m Normal CREATE ingress/my-ingress Ingress default/ebizsuite-ingress
7m2s Warning Unhealthy pod/otherpod2 Readiness probe failed: Get http://10.244.0.158:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
7m1s Warning Unhealthy pod/otherpod2 Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m Warning Unhealthy pod/otherpod2 Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
44m Warning Unhealthy pod/otherpod1 Liveness probe failed: Get http://10.244.0.151:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m35s Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
8m8s Warning Unhealthy pod/mypod Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
8m7s Warning Unhealthy pod/mypod Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
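One way to confirm whether the kubelet killed the container or the process exited on its own is to read the container's last termination state directly from the pod status. A sketch, assuming the pod name above:

```shell
# Show reason, exit code, and timestamps of the previous container termination.
# Reason "Completed" with exit code 0 means the process shut down gracefully
# (e.g. on SIGTERM); a liveness-probe kill typically shows "Error"/137.
kubectl get pod mypod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Kill decisions are also recorded as per-pod events with reason "Killing":
kubectl describe pod mypod | grep -i -A 2 killing
```

Note that the describe output above already shows Last State: Terminated, Reason: Completed, Exit Code: 0, which is consistent with a graceful shutdown rather than a hard kill.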
curl from another pod (I ran this every second in a long loop and never received anything other than 200 OK)
kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/hc
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
{"status":"Healthy","totalDuration":"00:00:00.0647250","entries":{"self":{"data":{},"duration":"00:00:00.0000012","status":"Healthy"},"warmup":{"data":{},"duration":"00:00:00.0000007","status":"Healthy"},"TimeDB-check":{"data":{},"duration":"00:00:00.0341533","status":"Healthy"},"time-blob-storage-check":{"data":{},"duration":"00:00:00.0108192","status":"Healthy"},"time-rabbitmqbus-check":{"data":{},"duration":"00:00:00.0646841","status":"Healthy"}}}100 454 0 454 0 0 6579 0 --:--:-- --:--:-- --:--:-- 6579
curl
kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/liveness
Healthy % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7 0 7 0 0 7000 0 --:--:-- --:--:-- --:--:-- 7000
Solution
From the logs it appears the problem lies with the liveness and readiness probes. They are failing, which is why the application is being restarted (note the Restart Count of 19). Remove the probes and check whether the application stays up. Exec into the pod and hit the liveness and readiness endpoints yourself to investigate why they fail.
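To investigate, the probe endpoints can be timed against the configured 1-second probe timeout, and the probes can be removed temporarily without editing the full manifest. A sketch, reusing the pod IP and deployment name shown above (container index 0 is an assumption):

```shell
# Time the health endpoints from a neighbouring pod; anything approaching 1s
# will trip the probes configured with timeout=1s.
kubectl exec -t otherpod1 -- curl -s -o /dev/null \
  -w 'hc: %{time_total}s\n' http://10.244.0.146:80/hc
kubectl exec -t otherpod1 -- curl -s -o /dev/null \
  -w 'liveness: %{time_total}s\n' http://10.244.0.146:80/liveness

# Temporarily remove the liveness probe to check whether the restarts stop:
kubectl patch deployment mypod-deployment --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'
```

A less invasive experiment than removing the probes entirely is to raise timeoutSeconds and failureThreshold in the probe spec and observe whether the Unhealthy events disappear.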