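python - How to implement Prometheus monitoring in conjunction with multiple Celery tasks?
Problem description
I have a setup in which I run multiple (3) Celery workers and have 8 different tasks:
- celery
- high-frequency jobs: task 1, task 2
- low-frequency jobs: tasks 3-8
Each runs in its own Kubernetes pod.
I want to implement monitoring with Prometheus. For this I am using the prometheus_client library.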
import os

from celery import Celery, signals
from prometheus_client import Summary
from prometheus_client import start_http_server as start_prometheus_http_server

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
BROKER_URL = f"redis://{REDIS_HOST}:6379/0"

app = Celery("tasks", broker=BROKER_URL)
app.conf.task_routes = {
    "hifreq.main": {"queue": "main_queue"},
    "hifreq.final": {"queue": "final_queue"},
    "lowfreq.*": {"queue": "lowfreq_queue"},
}


@signals.celeryd_after_setup.connect
def setup_direct_queue(sender, instance, **kwargs):
    # Expose the metrics endpoint once the worker has finished setting up.
    start_prometheus_http_server(9090)


@app.task(name="hifreq.main")
def long_running_task():
    data_loading()


# Custom metric: time spent in data_loading().
DATA_LOADING_TIME = Summary(
    "data_loading_seconds",
    "Time spent loading the data",
)


@DATA_LOADING_TIME.time()
def data_loading():
    pass
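This starts the Prometheus server (I believe it starts one per worker). I have exposed it through an ingress/service so that I can reach the server, and when I navigate to the pod running the "hifreq" worker, I get: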
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 5828.0
python_gc_objects_collected_total{generation="1"} 1643.0
python_gc_objects_collected_total{generation="2"} 294.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 152.0
python_gc_collections_total{generation="1"} 13.0
python_gc_collections_total{generation="2"} 2.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="6",patchlevel="9",version="3.6.9"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.19164416e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.4453888e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58876788561e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.1
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 30.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
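These are the default Python metrics, but the data_loading_seconds metric that I defined myself and expected to see is missing. I suspect something goes wrong because each of the several workers has its own server, but I am not sure what exactly. Any help is appreciated!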
Solution
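One plausible explanation, not confirmed by the question itself, is that a Celery worker using the default prefork pool runs tasks in child processes, while the HTTP server started from celeryd_after_setup lives in the parent process and only sees the parent's own registry. The prometheus_client library has a multiprocess mode intended for this kind of layout. The following is a minimal sketch under that assumption. The temporary directory, the helper names build_registry and serve_metrics, and the choice of the PROMETHEUS_MULTIPROC_DIR variable (older releases used the lowercase prometheus_multiproc_dir) are illustrative and should be checked against the installed version.

```python
import os
import tempfile

# Assumption: multiprocess mode is switched on by this environment variable,
# and it has to be set before prometheus_client is imported. In a real
# deployment the directory would be set in the pod spec and emptied on start.
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", tempfile.mkdtemp())

from prometheus_client import (
    CollectorRegistry,
    Summary,
    generate_latest,
    multiprocess,
    start_http_server,
)

# Same metric as in the question; in multiprocess mode its values are
# written to files in the shared directory instead of process memory.
DATA_LOADING_TIME = Summary(
    "data_loading_seconds",
    "Time spent loading the data",
)


@DATA_LOADING_TIME.time()
def data_loading():
    pass


def build_registry():
    """Hypothetical helper: a registry that aggregates all processes' files."""
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return registry


def serve_metrics(port=9090):
    """Hypothetical helper: call this from the celeryd_after_setup handler
    in place of the plain start_http_server(9090) call."""
    start_http_server(port, registry=build_registry())


# Local check without Celery or an HTTP server: record one observation and
# render what the aggregated registry would expose.
data_loading()
output = generate_latest(build_registry()).decode()
print(output)
```

Note that in multiprocess mode the default python_gc_* and process_* collectors are not part of the aggregated registry, so the endpoint output will look different from the listing in the question.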