kubernetes - What triggers the SyncLoop DELETE api call in k8s?
Question
I have an nginx-ingress replicaset running in my cluster with two instances. Two days ago, both pods were deleted at the same time (within milliseconds of each other) and two new instances were created in the same replicaset. I don't know what triggered the deletion. In the kubelet logs I can see the following:
kubelet[13317]: I0207 22:01:36.843804 13317 kubelet.go:1918] SyncLoop (DELETE, "api"): "nginx-ingress-public-controller-6bf8d59c4c
Later in the logs, a failed liveness probe is listed:
kubelet[13317]: I0207 22:01:42.596603 13317 prober.go:116] Liveness probe for "nginx-ingress-public-controller-6bf8d59c4c (60c3f9e5-e228-44c8-abd5-b0a4a8507b5c):nginx-ingress-controller" failed (failure): HTTP probe failed with statuscode: 500
In theory this could explain the pod deletion, but I'm confused about the ordering. Did this liveness probe fail because the delete command had already killed the underlying docker container, or was the failed probe what triggered the deletion?
Solution
It is hard to guess what exactly caused the deletion of your nginx pod without full logs. Also, as you mention it is a customer environment, there might be many reasons. As I asked in the comments, it might be HPA (Horizontal Pod Autoscaler) or CA (Cluster Autoscaler) activity, preemptible nodes, temporary network issues, etc. A few quick checks are sketched below.
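If you still have access to the cluster, commands like these can help narrow it down (a sketch; the Cluster Autoscaler deployment name and namespace depend on how it was installed, and events usually expire after about an hour):
# does an HPA target the ingress controller's deployment/replicaset?
$ kubectl get hpa --all-namespaces
# recent events mention evictions, scale-downs and node problems
$ kubectl get events --all-namespaces --sort-by=.lastTimestamp
# if the Cluster Autoscaler runs in-cluster, its logs show scale-down decisions
$ kubectl -n kube-system logs deployment/cluster-autoscaler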
Regarding the second part about pod deletion and the Liveness probe: the Liveness probe failed because the nginx pod was already in the deletion process.
One of the Kubernetes default settings is a grace period equal to 30 seconds. In short, this means the Pod will stay in Terminating status for up to 30 seconds, and after this time it will be removed.
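For reference, this grace period can be set per Pod via terminationGracePeriodSeconds (a minimal sketch; the pod name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: grace-demo                    # placeholder name
spec:
  terminationGracePeriodSeconds: 30   # the default; raise it if shutdown needs more time
  containers:
  - name: app
    image: nginx                      # placeholder image
You can also override it for a single deletion with kubectl delete pod <name> --grace-period=<seconds>.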
Tests
If you would like to verify this yourself, you can do some testing to confirm it. It requires a kubeadm master and a change of the kubelet log verbosity. You can do that by editing the /var/lib/kubelet/kubeadm-flags.env file (you must have root rights) and adding --v=X, where X is a number from 0-9. Details on which level shows which logs can be found here.
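For example (a sketch; the flags already present in this file vary by setup, so keep them and only append --v):
# /var/lib/kubelet/kubeadm-flags.env (edit as root)
KUBELET_KUBEADM_ARGS="--network-plugin=cni --v=8"   # existing flags vary; --v=8 appended
# restart kubelet so the new verbosity takes effect
$ sudo systemctl restart kubelet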
- Set the verbosity level to at least level=5; I tested with level=8
- Deploy the Nginx Ingress Controller
- Delete the Nginx Ingress Controller pod manually
- Check the logs using $ journalctl -u kubelet; you can use grep to narrow the output and save it to a file ($ journalctl -u kubelet | grep ingress-nginx-controller-s2kfr > nginx.log)
Below are examples from my tests:
# Liveness and Readiness probes work properly:
Feb 24 14:18:35 kubeadm kubelet[11922]: I0224 14:18:35.399156 11922 prober.go:126] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
Feb 24 14:18:40 kubeadm kubelet[11922]: I0224 14:18:40.587129 11922 prober.go:126] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
# Once the deletion process starts, you can find the DELETE api call and other information
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.900957 11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901057 11922 kubelet_pods.go:1482] Generating status for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901914 11922 round_trippers.go:422] GET https://10.154.15.225:6443/api/v1/namespaces/ingress-nginx/pods/ingress-nginx-controller-s2kfr
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.909123 11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container controller"
# This entry occurs because the default grace period was kept
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.947193 11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
# As the Pod was being deleted, the probes failed.
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584208 11922 prober.go:117] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584338 11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: HTTP probe failed with statuscode: 500"
Feb 24 14:18:52 kubeadm kubelet[11922]: I0224 14:18:52.045155 11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
Feb 24 14:18:55 kubeadm kubelet[11922]: I0224 14:18:55.398025 11922 prober.go:117] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
In these logs, the time between SyncLoop (DELETE, "api") and the Liveness probe failure is 4 seconds. In other tests, the gap was similar (a 4-7 second difference).
If you would like to perform your own test, you can change the Readiness and Liveness probe period to 1 second (not 10, as set by default); you would then get probe failures in the same second as the DELETE api call, as in the probe configuration sketched after the log excerpt below.
Feb 24 15:09:40 kubeadm kubelet[11922]: I0224 15:09:40.865718 11922 prober.go:126] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" succeeded
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.488819 11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6)"
...
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.865422 11922 prober.go:117] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" failed (failure): HTTP probe failed with statuscode: 500
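A probe configuration for such a test could look like this (a sketch of the relevant container fields; /healthz on port 10254 follows the ingress-nginx controller defaults and may differ in your deployment):
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254      # ingress-nginx controller health endpoint (may differ)
  periodSeconds: 1   # default is 10; at 1s the failure shows up within a second of DELETE
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
  periodSeconds: 1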
A good explanation of syncLoop can be found in the Alibaba docs:
As indicated in the comments, the syncLoop function is the major cycle of Kubelet. This function listens for updates, obtains the latest Pod configurations, and synchronizes the running state and the desired state. In this way, all Pods on the local node can run in the expected states. Actually, syncLoop only encapsulates syncLoopIteration, while the synchronization operation is carried out by syncLoopIteration.
Conclusion
If you don't have additional logging that saved output from the pods before termination, it is hard to determine the root cause this long after the event.
In the setup you have described, the Liveness probe failed because the nginx-ingress pod was already in the termination process. The Liveness probe failure did not trigger the pod deletion; it was a result of that deletion.
In addition, you can check the Kubelet and Prober source code.