FailedCreatePodSandBox events in an EKS cluster

Problem description

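In an EKS Kubernetes cluster, I have a cronjob that creates a pod every 5 minutes. The cronjob always runs fine, but occasionally a FailedCreatePodSandBox event shows up, and I cannot work out why. The cronjob still completes normally even when this event occurs. The event log is as follows: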

39m         Warning   FailedCreatePodSandBox   Pod          Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565344380-bg97c": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"read init-p: connection reset by peer\"": unknown
34m         Warning   FailedCreatePodSandBox   Pod          Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565344680-xq9rl": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"read init-p: connection reset by peer\"": unknown
24m         Warning   FailedCreatePodSandBox   Pod          Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565345280-v5pz9": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"\"": unknown
9m39s       Warning   FailedCreatePodSandBox   Pod          Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "synthetic-test-cronjob-1565346180-xxpmc": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: killed\"": unknown

As you can see, two different line numbers appear in the error messages: process_linux.go:402 and process_linux.go:301.
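
To see how often each variant occurs, the event messages can be grouped by source location and innermost cause. The following is a minimal sketch, assuming the messages have already been collected as plain strings in the exact form shown above (including the literal backslash-escaped quotes); the function names `classify` and `summarize` are hypothetical helpers, not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Matches the runc source location, e.g. "process_linux.go:402".
LOCATION_RE = re.compile(r"(process_linux\.go:\d+)")
# Matches the innermost cause, which is wrapped in backslash-escaped quotes,
# e.g. caused \"signal: killed\". The cause may be an empty string.
CAUSE_RE = re.compile(r'caused \\"(.*?)\\"')


def classify(message):
    """Return (location, cause) extracted from one event message.

    Either element is None when the corresponding pattern is not found.
    """
    loc = LOCATION_RE.search(message)
    cause = CAUSE_RE.search(message)
    return (
        loc.group(1) if loc else None,
        cause.group(1) if cause else None,
    )


def summarize(messages):
    """Count how many messages fall into each (location, cause) bucket."""
    return Counter(classify(m) for m in messages)
```

Passing the four messages from the log above to `summarize` would group them into three buckets: two for process_linux.go:402 (one with the "connection reset by peer" cause, one with an empty cause) and one for process_linux.go:301.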

What are the possible causes of this warning? How can I prevent it, or should I ignore it since it does not affect the cronjob?

Tags: kubernetes, amazon-eks

Solution


There appear to be some known issues related to the error messages in your example. See the following GitHub issues:

https://github.com/kubernetes/kubernetes/issues/68190

https://github.com/opencontainers/runc/issues/1914

We believe this error can also occur when any of the container's cgroup limits (for example memory, cpu, pids) is exceeded.
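
One way to test the cgroup hypothesis is to compare current usage against the configured limit on the affected node. The sketch below is only an illustration: it assumes you have already located the cgroup directory for the pod (the path varies with the node image and cgroup driver and is not given here), and it reads a counter/limit file pair such as `pids.current` and `pids.max` from the pids controller. The helper names `read_limit` and `near_limit` and the 0.9 threshold are hypothetical choices.

```python
from pathlib import Path


def read_limit(cgroup_dir, current_file="pids.current", limit_file="pids.max"):
    """Read a usage/limit pair from a cgroup directory.

    Returns (current, limit) as integers. limit is None when the limit
    file contains "max", which means no limit is set.
    """
    base = Path(cgroup_dir)
    current = int((base / current_file).read_text().strip())
    raw_limit = (base / limit_file).read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit


def near_limit(current, limit, threshold=0.9):
    """Return True when usage is at or above threshold * limit."""
    if limit is None or limit <= 0:
        return False
    return current / limit >= threshold
```

If `near_limit` returns True around the time the events are logged, raising the corresponding limit or reducing load on the node is a reasonable next thing to try; a False result does not rule out the other known issues linked above.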

