Can I deploy Prometheus outside Kubernetes and monitor Kubernetes?

Problem description

I have deployed Prometheus outside my Kubernetes cluster and want to use it to monitor Kubernetes. Unfortunately, I have run into a lot of problems.

For example:

Here is a screenshot

Here is my deployment script:

docker run -it -d --name prometheus -p 9090:9090 \
  --user 1000 \
  -v /home/prometheus/prometheus:/etc/prometheus/ \
  -v /home/prometheus/data:/prometheus \
  quay.io/prometheus/prometheus:v2.20.1

Here is my prometheus.yml:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']


  - job_name: kubernetes-apiservers
    metrics_path: /metrics
    # metrics_path: /

    scrape_interval: 10s
    # scrape_interval: 1m
    scrape_timeout: 10s
    scheme: https

    tls_config:
      insecure_skip_verify: true 
      # ca_file: /etc/prometheus/ca.crt

    kubernetes_sd_configs:
    - api_server: https://192.168.0.146:6443
      role: endpoints
      bearer_token_file: /etc/prometheus/prome.token
      tls_config:
        insecure_skip_verify: true  
      # namespaces:
      #   names: []

    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: default;kubernetes;https
      replacement: $1
      action: keep


  - job_name: kubernetes-nodes
    metrics_path: /metrics

    scrape_interval: 10s
    scrape_timeout: 10s
    scheme: https

    tls_config:
      insecure_skip_verify: true
      # ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

    kubernetes_sd_configs:
    - api_server: https://192.168.0.146:6443
      role: node
      bearer_token_file: /etc/prometheus/prome.token
      tls_config:
        insecure_skip_verify: true
        # ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      namespaces:
        names: []

    relabel_configs:
    - separator: ;
      regex: __meta_kubernetes_node_label_(.+)
      replacement: $1
      action: labelmap
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: kubernetes.default.svc:443
      action: replace
    - source_labels: [__meta_kubernetes_node_name]
      separator: ;
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
      action: replace

I also checked the Prometheus logs and found nothing suspicious:

[root@company-server-121 prometheus]# docker logs -f prometheus --tail 100
level=info ts=2020-09-04T02:55:49.571Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599170400000 maxt=1599177600000 ulid=01EHBA1FG79CSJGAEFKEBKX8WA
level=info ts=2020-09-04T02:55:49.577Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T02:55:49.578Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=1.465365ms
level=info ts=2020-09-04T02:55:49.579Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T02:55:49.583Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T02:55:49.613Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=35 maxSegment=39
level=info ts=2020-09-04T02:55:49.644Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=36 maxSegment=39
level=info ts=2020-09-04T02:55:49.676Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=37 maxSegment=39
level=info ts=2020-09-04T02:55:49.703Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=39
level=info ts=2020-09-04T02:55:49.704Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=39
level=info ts=2020-09-04T02:55:49.704Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=4.494795ms wal_replay_duration=121.056294ms total_replay_duration=127.049792ms
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T02:55:49.707Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T02:55:49.708Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T02:55:49.708Z caller=main.go:652 msg="Server is ready to receive web requests."
level=info ts=2020-09-04T03:00:01.264Z caller=compact.go:495 component=tsdb msg="write block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z duration=23.544803ms
level=info ts=2020-09-04T03:00:01.269Z caller=head.go:804 component=tsdb msg="Head GC completed" duration=1.195693ms
level=info ts=2020-09-04T03:00:01.269Z caller=checkpoint.go:96 component=tsdb msg="Creating checkpoint" from_segment=35 to_segment=37 mint=1599184800000
level=info ts=2020-09-04T03:00:01.302Z caller=head.go:884 component=tsdb msg="WAL checkpoint complete" first=35 last=37 duration=32.653413ms
level=info ts=2020-09-04T03:00:01.328Z caller=compact.go:441 component=tsdb msg="compact blocks" count=3 mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK sources="[01EHAWA1070CVB7X1BZ8CE9TXH 01EHB35R87CQVVJYNHBB2T76G6 01EHBA1FG79CSJGAEFKEBKX8WA]" duration=23.473664ms
level=warn ts=2020-09-04T03:38:34.452Z caller=main.go:530 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:553 msg="Stopping scrape discovery manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:567 msg="Stopping notify discovery manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:589 msg="Stopping scrape manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:549 msg="Scrape discovery manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:563 msg="Notify discovery manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=manager.go:888 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-09-04T03:38:34.453Z caller=manager.go:898 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-09-04T03:38:34.453Z caller=main.go:583 msg="Scrape manager stopped"
level=info ts=2020-09-04T03:38:34.454Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-09-04T03:38:34.454Z caller=main.go:755 msg="Notifier manager stopped"
level=info ts=2020-09-04T03:38:34.454Z caller=main.go:767 msg="See you next time!"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:308 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:343 msg="Starting Prometheus" version="(version=2.20.1, branch=HEAD, revision=983ebb4a513302315a8117932ab832815f85e3d2)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:344 build_context="(go=go1.14.6, user=root@7cbd4d1c15e0, date=20200805-17:26:58)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:345 host_details="(Linux 4.14.15-1.el7.elrepo.x86_64 #1 SMP Tue Jan 23 20:28:26 EST 2018 x86_64 34ba7bdc34ce (none))"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:346 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-09-04T03:38:34.952Z caller=main.go:347 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-09-04T03:38:34.954Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-09-04T03:38:34.954Z caller=main.go:684 msg="Starting TSDB ..."
level=info ts=2020-09-04T03:38:34.955Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599023110806 maxt=1599069600000 ulid=01EH8GS198RCSZBAPC1Z7P629X
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599069600000 maxt=1599134400000 ulid=01EHA7PVAGDKSJTEJE1572FC7J
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599134400000 maxt=1599156000000 ulid=01EHAWA116KSFTR969PJ5MKAQ2
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z
level=info ts=2020-09-04T03:38:34.956Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK
level=info ts=2020-09-04T03:38:34.963Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T03:38:34.964Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=899.254µs
level=info ts=2020-09-04T03:38:34.964Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T03:38:34.967Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T03:38:34.997Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=41
level=info ts=2020-09-04T03:38:35.002Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=40 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=41 maxSegment=41
level=info ts=2020-09-04T03:38:35.029Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=2.805909ms wal_replay_duration=62.637276ms total_replay_duration=66.40127ms
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T03:38:35.031Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:38:35.032Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:38:35.032Z caller=main.go:652 msg="Server is ready to receive web requests."
level=warn ts=2020-09-04T03:47:13.411Z caller=main.go:530 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:553 msg="Stopping scrape discovery manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:567 msg="Stopping notify discovery manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:589 msg="Stopping scrape manager..."
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:549 msg="Scrape discovery manager stopped"
level=info ts=2020-09-04T03:47:13.411Z caller=main.go:563 msg="Notify discovery manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=manager.go:888 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-09-04T03:47:13.412Z caller=manager.go:898 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:583 msg="Scrape manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:755 msg="Notifier manager stopped"
level=info ts=2020-09-04T03:47:13.412Z caller=main.go:767 msg="See you next time!"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:308 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:343 msg="Starting Prometheus" version="(version=2.20.1, branch=HEAD, revision=983ebb4a513302315a8117932ab832815f85e3d2)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:344 build_context="(go=go1.14.6, user=root@7cbd4d1c15e0, date=20200805-17:26:58)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:345 host_details="(Linux 4.14.15-1.el7.elrepo.x86_64 #1 SMP Tue Jan 23 20:28:26 EST 2018 x86_64 34ba7bdc34ce (none))"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:346 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-09-04T03:47:13.939Z caller=main.go:347 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-09-04T03:47:13.941Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-09-04T03:47:13.941Z caller=main.go:684 msg="Starting TSDB ..."
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599023110806 maxt=1599069600000 ulid=01EH8GS198RCSZBAPC1Z7P629X
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599069600000 maxt=1599134400000 ulid=01EHA7PVAGDKSJTEJE1572FC7J
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599134400000 maxt=1599156000000 ulid=01EHAWA116KSFTR969PJ5MKAQ2
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599177600000 maxt=1599184800000 ulid=01EHBGWZ2SH6X5N3JK6T4NRM2Z
level=info ts=2020-09-04T03:47:13.942Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1599156000000 maxt=1599177600000 ulid=01EHBGWZ4S5F3GJ259C7070CYK
level=info ts=2020-09-04T03:47:13.948Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-09-04T03:47:13.949Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=727.383µs
level=info ts=2020-09-04T03:47:13.949Z caller=head.go:661 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-09-04T03:47:13.952Z caller=head.go:687 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2020-09-04T03:47:13.983Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=38 maxSegment=42
level=info ts=2020-09-04T03:47:13.987Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=39 maxSegment=42
level=info ts=2020-09-04T03:47:14.017Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=40 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=41 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:713 component=tsdb msg="WAL segment loaded" segment=42 maxSegment=42
level=info ts=2020-09-04T03:47:14.023Z caller=head.go:716 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=2.967015ms wal_replay_duration=71.121903ms total_replay_duration=74.856822ms
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:700 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:701 msg="TSDB started"
level=info ts=2020-09-04T03:47:14.025Z caller=main.go:805 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:47:14.026Z caller=main.go:833 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-09-04T03:47:14.026Z caller=main.go:652 msg="Server is ready to receive web requests."

I hope you can help. Thank you.

Tags: kubernetes, monitoring, prometheus

Solution


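Yes, you can monitor Kubernetes from outside the cluster, but it is strongly discouraged. Prometheus works best when it runs inside the k8s cluster. A cleaner solution is to run Prometheus + Thanos/Cortex inside the cluster and use a secondary, central Prometheus to monitor everything.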

To resolve the problem you are seeing, you also need to provide the certificates; see: https://github.com/prometheus/prometheus/blob/release-2.20/documentation/examples/prometheus-kubernetes.yml
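
As a rough sketch of what that could look like for the kubernetes-nodes job from the question when Prometheus runs outside the cluster: the credentials under kubernetes_sd_configs are used only for service discovery, so the scrape requests need their own bearer_token_file and tls_config at the job level. Also, kubernetes.default.svc:443 is a cluster-internal name, so __address__ would point at the externally reachable API server instead. The ca.crt path is an assumption (the cluster CA copied next to the token, as hinted by the commented-out line in the question); the API server address and token path are taken from the question.

```yaml
scrape_configs:
  - job_name: kubernetes-nodes
    scheme: https

    # Credentials for the scrape requests themselves (job level).
    bearer_token_file: /etc/prometheus/prome.token
    tls_config:
      # Assumed path: the cluster CA certificate mounted into the container.
      ca_file: /etc/prometheus/ca.crt

    kubernetes_sd_configs:
      - api_server: https://192.168.0.146:6443
        role: node
        # Credentials for service discovery only; they are not reused for scraping.
        bearer_token_file: /etc/prometheus/prome.token
        tls_config:
          ca_file: /etc/prometheus/ca.crt

    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Scrape through the API server proxy at its external address,
      # not the in-cluster service name kubernetes.default.svc:443.
      - target_label: __address__
        replacement: 192.168.0.146:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
```

This only works if the service account behind the token is allowed to read nodes and nodes/proxy, and if the API server certificate is valid for 192.168.0.146. If it is not, keeping insecure_skip_verify: true as in the question is the fallback, at the cost of not verifying the server.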

