GKE Kubernetes LoadBalancer returns "Connection reset by peer"

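Problem description

I have run into a strange problem with my cluster.

In my cluster I have a Deployment and a LoadBalancer Service that exposes it. It worked like a charm, but suddenly the load balancer started returning an error: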

curl: (56) Recv failure: Connection reset by peer

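The error appears even though the pod and the load balancer are both running and there are no errors in the logs.

What I have already tried:

My Service YAML: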

apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"RELEASE-NAME","app.kubernetes.io/name":"APP-NAME","app.kubernetes.io/version":"latest"},"name":"APP-NAME","namespace":"namespacex"},"spec":{"ports":[{"name":"web","port":3000}],"selector":{"app.kubernetes.io/instance":"RELEASE-NAME","app.kubernetes.io/name":"APP-NAME"},"type":"LoadBalancer"}}
  creationTimestamp: "2021-08-03T07:55:00Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/name: APP-NAME
    app.kubernetes.io/version: latest
  name: APP-NAME
  namespace: namespacex
  resourceVersion: "14583904"
  uid: 7fb4d7e6-4316-44e5-8f9b-7a466bc776da
spec:
  clusterIP: 10.4.18.36
  clusterIPs:
  - 10.4.18.36
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    nodePort: 30970
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/name: APP-NAME
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: xx.xxx.xxx.xxx

My Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: APP-NAME
  labels:
    app.kubernetes.io/name: APP-NAME
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/version: "latest"
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: APP-NAME
      app.kubernetes.io/instance: RELEASE-NAME
  template:
    metadata:
      annotations:
        checksum/config: 5e6ff0d6fa64b90b0365e9f3939cefc0a619502b32564c4ff712067dbe44ab90
        checksum/secret: 76e0a1351da90c0cef06851e3aa9e7c80b415c29b11f473d4a2520ade9c892ce
      labels:
        app.kubernetes.io/name: APP-NAME
        app.kubernetes.io/instance: RELEASE-NAME
    spec:
      serviceAccountName: APP-NAME
      containers:
        - name: APP-NAME
          image: 'docker.io/xxxxxxxx:latest'
          imagePullPolicy: "Always"
          ports:
            - name: http
              containerPort: 3000
          livenessProbe:
            httpGet:
              path: /balancer/
              port: http
          readinessProbe:
            httpGet:
              path: /balancer/
              port: http
          env:
            ...
          volumeMounts:
            - name: config-volume
              mountPath: /home/app/config/
          resources:
            limits:
              cpu: 400m
              memory: 256Mi
            requests:
              cpu: 400m
              memory: 256Mi
      volumes:
        - name: config-volume
          configMap:
            name: app-config
      imagePullSecrets:
        - name: secret

Tags: kubernetes, google-kubernetes-engine, google-cloud-load-balancer

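Solution

In my case, the problem turned out to be a network component (such as a firewall) that started blocking outbound connections after classifying the cluster as "insecure" for no apparent reason.

So in essence this was not a Kubernetes problem but an IT (network) problem.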
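
A minimal sketch of how one might narrow down this kind of failure, assuming Python 3 is available on the machines involved. The `probe` function is a hypothetical helper, not part of the original answer: it opens a TCP connection, sends a bare HTTP request, and reports whether the peer answered, reset the connection, refused it, or timed out. Running it against the load balancer IP and port 3000 from inside the corporate network and again from an unrelated network may help show whether something on the path, rather than the cluster, is resetting the connection.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Open a TCP connection, send a minimal HTTP request, and classify the outcome.

    Hypothetical helper for illustration only; returns one of
    "ok", "closed", "reset", "refused", "timeout", or "error: ...".
    """
    request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            data = sock.recv(1024)
            # Empty bytes mean the peer closed the connection cleanly without replying.
            return "ok" if data else "closed"
    except ConnectionResetError:
        # Same condition that curl reports as "(56) Recv failure: Connection reset by peer".
        return "reset"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe("xx.xxx.xxx.xxx", 3000)` with the real load balancer IP substituted in. A "reset" result from one network and an "ok" result from another would be consistent with an intermediate device such as a firewall interfering, although it does not prove it.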

