Apache Airflow scheduler not running workers on its own

Problem Description

I am currently trying out Apache Airflow on my system (Ubuntu 18) and I have set it up with PostgreSQL and RabbitMQ to use the CeleryExecutor.
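For reference, my configuration looks roughly like this; the hostnames and credentials below are placeholders rather than my real values, and each environment variable mirrors the corresponding airflow.cfg key:

    # [core] executor
    export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    # [core] sql_alchemy_conn -- PostgreSQL metadata database
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
    # [celery] broker_url -- RabbitMQ as the Celery broker
    export AIRFLOW__CELERY__BROKER_URL=amqp://guest:guest@localhost:5672//
    # [celery] result_backend -- Celery task results stored back in PostgreSQL
    export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@localhost:5432/airflow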

I run airflow webserver and airflow scheduler in separate consoles, but the scheduler only marks tasks as queued; no worker actually runs them.

I tried opening a different terminal and running airflow worker on its own, and that seemed to do the trick.

Now the scheduler puts tasks on a queue and the worker I ran manually actually executes them.
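In case it helps, these are the exact commands I am running now, one per terminal:

    airflow webserver
    airflow scheduler
    airflow worker    # started manually; without this, tasks just stay queued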

From what I have read, that should not be the case: the scheduler should run the workers on its own, right? What can I do to make this work?

I have checked the logs from the consoles and I don't see any errors.

Tags: python-3.x, postgresql, airflow

Solution


This is expected. If you look at the docs for airflow worker, it exists specifically to bring up a Celery worker when you're using the CeleryExecutor; the other executors do not require a separate process for tasks to run.

  • LocalExecutor: uses multiprocessing to run tasks within the scheduler.
  • SequentialExecutor: runs just one task at a time, so that happens within the scheduler as well.
  • CeleryExecutor: scales out by having N workers, so having it as a separate command lets you run a worker on as many machines as you'd like (see the sketch after this list).
  • KubernetesExecutor: I imagine it talks to your Kubernetes cluster to tell it to run tasks.
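As a rough sketch (the queue name and concurrency value here are just example values, and the -q/--queues and -c/--concurrency flags are optional), a typical CeleryExecutor deployment starts a worker on every machine that should execute tasks, all pointing at the same broker and metadata database:

    # run this on each machine that should execute tasks
    airflow worker -q default -c 4

The scheduler never launches these processes itself; it only puts task messages on the broker, and whichever workers are subscribed to a matching queue pick them up.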
