kubernetes - k8s pod readiness probe failed: read tcp xxx -> yyy: read: connection reset by peer
Problem description
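I am running Fargate on EKS with about 20 to 30 pods. After a few days (5 to 7 days; this has happened twice), the pods start refusing the readiness probe HTTP requests. I captured the pod description at that point. I want to point out the first event: connection reset by peer.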
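I have seen this issue reported for Istio, and the root cause may be the same. However, I am not using Istio, so I am stuck on where to look next. Partial data for my ingress, service, and deployment is attached below.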
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 56m kubelet Readiness probe failed: Get "http://10.104.4.xxx:20001/health_readiness": read tcp 169.254.175.xxx:36978->10.104.4.xxx:20001: read: connection reset by peer
Warning Unhealthy 55m (x3 over 56m) kubelet Liveness probe failed: dial tcp 10.104.4.xxx:20001: connect: connection refused
Normal Killing 55m kubelet Container hybrid-server-logic failed liveness probe, will be restarted
Warning FailedPreStopHook 55m kubelet Exec lifecycle hook ([/bin/bash -c kill -SIGTERM $(ps -ef | grep node | grep -v grep | awk '{print $1}')]) for Container "hybrid-server-logic" in Pod "hybrid-server-logic-745bf8ffc4-479x6_jpj-prod(c4acfaef-a8a6-41e8-9d89-3c03336388b3)" failed - error: rpc error: code = Unknown desc = failed to exec in container: failed to create exec "e92f0b6c6f1dcfa680a03ed3d2dc9b5176980d7b6dce371a8bcbb2c5eb2368fe": mkdir /run/containerd/io.containerd.grpc.v1.cri/containers/hybrid-server-logic/io/168763600: no space left on device, message: ""
Warning Unhealthy 72s (x331 over 56m) kubelet Readiness probe failed: Get "http://10.104.4.xxx:20001/health_readiness": dial tcp 10.104.4.xxx:20001: connect: connection refused
// ingress
http {
  path {
    path = "/*"
    backend {
      service_name = "my-app-service"
      service_port = 20001
    }
  }
}
// service
name = "my-app-service"
spec {
  port {
    port        = 20001
    protocol    = "TCP"
    target_port = "my-app-port"
  }
  selector = {
    "app" = "my-app"
  }
  type = "NodePort"
}
// deployment
...
ports:
  - containerPort: 20001
    name: logic-port
    protocol: TCP
...
readinessProbe: # on failure, k8s will not forward traffic.
  httpGet:
    path: /health_readiness
    port: my-app-port
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 5
livenessProbe: # on failure, k8s will restart the container.
  tcpSocket:
    port: my-app-port
  initialDelaySeconds: 10
  periodSeconds: 20
  timeoutSeconds: 5
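To tell "connection reset by peer" apart from "connection refused" while debugging, it can help to repeat the two probes by hand from somewhere that can reach the pod. The following is a minimal sketch, not the kubelet's actual implementation: the function names http_probe and tcp_probe are made up for illustration, the 5 second timeout mirrors timeoutSeconds above, and the HTTP check treats any status from 200 to 399 as success, as the kubelet does for httpGet probes.

```python
import http.client
import socket
import urllib.error
import urllib.request

# Ignore any proxy settings from the environment so the request goes
# straight to the target address, as a probe would.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url, timeout=5.0):
    """Roughly mimic an httpGet probe; returns (ok, detail)."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a status urllib treats as an error.
        return 200 <= exc.code < 400, f"HTTP {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, connection reset, timeout, malformed reply.
        return False, f"{type(exc).__name__}: {exc}"


def tcp_probe(host, port, timeout=5.0):
    """Roughly mimic a tcpSocket probe: success means the connect succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

Calling http_probe with the pod's /health_readiness URL and tcp_probe with the pod IP and port 20001 reports which error class occurs. A reset means something accepted the connection and then dropped it, while a refusal means nothing is listening on the port.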
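Solution
I checked the instance: the disk was full because of log files on the machine. This matches the "no space left on device" error in the FailedPreStopHook event above.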
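A quick way to confirm this kind of problem is to check how full the filesystem is and which files take the most space. The sketch below assumes Python is available wherever the logs are written (inside the container or on the host); the function names and the paths you pass in are illustrative and do not come from the original setup.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the used share, in percent, of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) tuples under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    sizes.append((os.path.getsize(full), full))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(sizes, reverse=True)[:top_n]
```

For example, printing disk_usage_percent("/") and then largest_files on the application's log directory shows whether log files are what is filling the disk. If they are, rotating the logs or writing them to stdout are the usual remedies.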