What triggers a SyncLoop DELETE api call in k8s?

Problem Description

I have an nginx-ingress replicaset running in my cluster with two instances. Two days ago both pods were deleted at the same time (a few milliseconds apart) and two new instances were created in the same replicaset. I don't know what triggered the deletion. In the kubelet logs I can see the following:

kubelet[13317]: I0207 22:01:36.843804 13317 kubelet.go:1918] SyncLoop (DELETE, "api"): "nginx-ingress-public-controller-6bf8d59c4c

Later in the logs a failed liveness probe is listed:

kubelet[13317]: I0207 22:01:42.596603 13317 prober.go:116] Liveness probe for "nginx-ingress-public-controller-6bf8d59c4c (60c3f9e5-e228-44c8-abd5-b0a4a8507b5c):nginx-ingress-controller" failed (failure): HTTP probe failed with statuscode: 500

In theory this could explain the pod deletion, but I'm confused about the order of events. Did this liveness probe fail because the delete command had already killed the underlying docker container, or was it what triggered the deletion?

Tags: kubernetes, kubelet

Solution


It's hard to guess what exactly caused the deletion of your nginx pod without the full logs. Also, as you mention it's a customer environment, there might be many reasons. As I asked in the comments, it might be HPA or Cluster Autoscaler, preemptible nodes, temporary network issues, etc.

Regarding the second part about pod deletion and liveness: the liveness probe failed because the nginx pod was already in the deletion process.

One of Kubernetes' default settings is a grace period of 30 seconds. In short, this means the Pod will stay in Terminating status for up to 30 seconds, and after this time it will be removed.
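If you want to see this grace period yourself, a minimal sketch (replace <namespace>/<pod> with your own pod; these are just placeholders):

# Delete in the background and watch the pod sit in Terminating for up to the grace period:
$ kubectl -n <namespace> delete pod <pod> --wait=false
$ kubectl -n <namespace> get pod <pod> -w

# The effective value comes from spec.terminationGracePeriodSeconds (defaults to 30):
$ kubectl -n <namespace> get pod <pod> -o jsonpath='{.spec.terminationGracePeriodSeconds}'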

Tests

If you would like to verify it yourself, you can run a quick test. It requires a kubeadm master and a change of kubelet verbosity. You can do this by editing the /var/lib/kubelet/kubeadm-flags.env file (you must have root rights) and adding --v=X, where X is a number from 0 to 9. Details of which level shows which logs can be found here. A sketch of these steps follows the list below.

  • Set the verbosity level to at least level=5; I tested with level=8
  • Deploy the Nginx Ingress Controller
  • Delete the Nginx Ingress Controller pod manually
  • Check the logs using $ journalctl -u kubelet; you can use grep to narrow the output and save it to a file ($ journalctl -u kubelet | grep ingress-nginx-controller-s2kfr > nginx.log)
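For reference, one rough way to script these steps on a kubeadm node (assuming the standard kubeadm-flags.env format; the verbosity level and pod name are just the ones from my test):

# Append --v=8 to the kubelet flags and restart kubelet (requires root):
$ sudo sed -i 's/^KUBELET_KUBEADM_ARGS="/&--v=8 /' /var/lib/kubelet/kubeadm-flags.env
$ sudo systemctl restart kubelet

# Delete the controller pod and collect its kubelet log entries:
$ kubectl -n ingress-nginx delete pod ingress-nginx-controller-s2kfr
$ journalctl -u kubelet | grep ingress-nginx-controller-s2kfr > nginx.log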

Below are examples from my tests:

# Liveness and Readiness probes working properly:
Feb 24 14:18:35 kubeadm kubelet[11922]: I0224 14:18:35.399156   11922 prober.go:126] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
Feb 24 14:18:40 kubeadm kubelet[11922]: I0224 14:18:40.587129   11922 prober.go:126] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded

# Once the deletion process starts, you can find the DELETE api call and other information:

Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.900957   11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901057   11922 kubelet_pods.go:1482] Generating status for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901914   11922 round_trippers.go:422] GET https://10.154.15.225:6443/api/v1/namespaces/ingress-nginx/pods/ingress-nginx-controller-s2kfr
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.909123   11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container controller"

# This entry occurs because the default grace period was kept
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.947193   11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running

# As the Pod was being deleted, the probes failed.
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584208   11922 prober.go:117] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584338   11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: HTTP probe failed with statuscode: 500"
Feb 24 14:18:52 kubeadm kubelet[11922]: I0224 14:18:52.045155   11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
Feb 24 14:18:55 kubeadm kubelet[11922]: I0224 14:18:55.398025   11922 prober.go:117] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500

In these logs, the time between SyncLoop (DELETE, "api") and the failed liveness probe is 4 seconds. In other test runs the difference was also a few seconds (4-7 seconds).

If you would like to perform your own test, you can change the readiness and liveness probe period to 1 second (not 10, as is set by default); you would then see probe failures in the same second as the DELETE api call. A sketch of such a change is shown below.
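For example, a hedged sketch using kubectl patch (the deployment name ingress-nginx-controller and container index 0 are assumptions based on a standard ingress-nginx install; adjust the namespace, name and index to your setup):

# Set both probe periods on the first container to 1 second (assumes the probes are already defined in the spec):
$ kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/periodSeconds", "value": 1},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 1}
]'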

Feb 24 15:09:40 kubeadm kubelet[11922]: I0224 15:09:40.865718   11922 prober.go:126] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" succeeded
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.488819   11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6)"
...
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.865422   11922 prober.go:117] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" failed (failure): HTTP probe failed with statuscode: 500

A good explanation of syncLoop can be found in the Alibaba docs:

As indicated in the comments, the syncLoop function is the major cycle of Kubelet. This function listens on the updates, obtains the latest Pod configurations, and synchronizes the running state and desired state. In this way, all Pods on the local node can run in the expected states. Actually, syncLoop only encapsulates syncLoopIteration, while the synchronization operation is carried out by syncLoopIteration.

Conclusion

If you don't have additional logging that saves pod output before termination, it's hard to determine the root cause this long after the event.

In the setup you have described, the liveness probe failed because the nginx-ingress pod was already in the termination process. The liveness probe failure did not trigger the pod deletion; it was the result of that deletion.

In addition, you can also check the Kubelet and Prober source code.
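The file names in the log lines above (kubelet.go, prober.go) point you to the right places in the kubernetes/kubernetes repository; for example:

# syncLoop / syncLoopIteration and the prober live in the kubelet package:
$ git clone --depth 1 https://github.com/kubernetes/kubernetes.git
$ less kubernetes/pkg/kubelet/kubelet.go          # syncLoop, syncLoopIteration
$ less kubernetes/pkg/kubelet/prober/prober.go    # liveness/readiness prober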

