Need to apply Prometheus rules from the alertmanager repo in k8s

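Problem description

There are 2 GitLab repositories: gitlab a and gitlab b.

gitlab a - contains the StatefulSet and pods for Prometheus and the Prometheus Pushgateway.

gitlab b - contains the Alertmanager service, the Alertmanager pod, and the Prometheus rules.

All pods and containers are up and running. I am trying to apply the Prometheus rules to the Prometheus StatefulSet.

I need to apply the kind: PrometheusRule resource to the Prometheus StatefulSet. Can anyone help?

Rule YAML being applied: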

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
  namespace: cmp-monitoring
spec:
 groups:
  - name: node-exporter.rules
    rules:
    - expr: |
        count without (cpu) (
          count without (mode) (
            node_cpu_seconds_total{job="node-exporter"}
          )
        )
      record: instance:node_num_cpu:sum
    - expr: |
        1 - avg without (cpu, mode) (
          rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
        )
      record: instance:node_cpu_utilisation:rate1m

Prometheus StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  serviceName: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: prometheus
          image: prom/prometheus
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - name: prometheus-config
              mountPath: "/etc/prometheus/prometheus.yml"
              subPath: prometheus.yml
            - name: prometheus-data
              mountPath: "/prometheus"
            #- name: rules-general
            #  mountPath: "/etc/prometheus/prometheus.rules.yml"
            #  subPath: prometheus.rules.yml
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 120
            periodSeconds: 40
            successThreshold: 1
            timeoutSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 120
            periodSeconds: 40
            successThreshold: 1
            timeoutSeconds: 10
            failureThreshold: 3
      securityContext:
        fsGroup: 1000
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-server-conf
        #- name: rules-general
        #  configMap:
        #    name: prometheus-server-conf    
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: rbd-default
        resources:
          requests:
            storage: 10Gi

Tags: kubernetes, prometheus, prometheus-alertmanager

Solution
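
One possible approach, offered as a sketch rather than a confirmed fix: PrometheusRule is a custom resource defined by the Prometheus Operator, and only Prometheus instances managed by the Operator load it. A hand-written StatefulSet running prom/prometheus reads rules only from the files listed under rule_files in prometheus.yml. Assuming the Operator is not in use here, the same two recording rules could be placed in an ordinary ConfigMap and mounted into the pod. The ConfigMap name prometheus-rules, the key node-exporter.rules.yml, and the mount path /etc/prometheus/rules are hypothetical names chosen for illustration.

```yaml
# Hypothetical ConfigMap holding the rules in plain Prometheus rule-file format.
# It must live in the same namespace as the Prometheus StatefulSet.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules            # assumed name, not from the original post
data:
  node-exporter.rules.yml: |        # assumed file name
    groups:
      - name: node-exporter.rules
        rules:
          - record: instance:node_num_cpu:sum
            expr: |
              count without (cpu) (
                count without (mode) (
                  node_cpu_seconds_total{job="node-exporter"}
                )
              )
          - record: instance:node_cpu_utilisation:rate1m
            expr: |
              1 - avg without (cpu, mode) (
                rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
              )

# Addition to prometheus.yml (inside the existing prometheus-server-conf ConfigMap):
#   rule_files:
#     - /etc/prometheus/rules/*.yml
#
# Additions to the StatefulSet pod template:
#   under the prometheus container's volumeMounts:
#     - name: prometheus-rules
#       mountPath: /etc/prometheus/rules
#   under volumes:
#     - name: prometheus-rules
#       configMap:
#         name: prometheus-rules
```

Mounting the ConfigMap as a directory (without subPath) lets the kubelet refresh the files when the ConfigMap changes, but Prometheus still has to reload its configuration before new rules take effect, for example by restarting the pod. If the Prometheus Operator is installed instead, the PrometheusRule above would be picked up through the ruleSelector of a Prometheus custom resource rather than through a StatefulSet written by hand.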
