Can you specify the number of threads for certain tasks in a DAG?

Problem description

I'm very new to Airflow. I have read the docs and some answers about Airflow's parallelism configuration, but I have not yet found how to specify the number of threads used within a task.

My current case: I have 5 tasks (each a Python script) that only make API calls (each to a different API service) and transform the data. Each task can make 1000+ calls, so I use multithreading in the script. Unfortunately, when I run the multithreaded script in Airflow, it doesn't seem to use the script's multithreading mechanism. I suspect an Airflow configuration is overriding the child script, or am I wrong? Any help or answer is appreciated, thank you.

Tags: airflow

Solution


Run your script with the KubernetesPodOperator.

You can use a Python base image and run your script as-is. This should be very similar to how you execute the script locally, except that it now runs inside a Kubernetes pod.
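For reference, a multithreaded script of the kind described in the question might look roughly like the sketch below. It is only an illustration: `fetch`, `run_all`, the worker count of 8, and the `api` thread-name prefix are hypothetical names and values, and `fetch` makes no real network call. Each result records which thread handled it, which gives a simple way to check whether the thread pool is actually used wherever the script runs.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def fetch(item_id):
    # Hypothetical stand-in for one API call plus its transformation.
    # A real script would issue an HTTP request here.
    return {"id": item_id, "thread": threading.current_thread().name}


def run_all(item_ids, max_workers=8):
    # The pool size is chosen here, inside the script itself.
    with ThreadPoolExecutor(max_workers=max_workers,
                            thread_name_prefix="api") as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fetch, item_ids))


if __name__ == "__main__":
    results = run_all(range(1000))
    used_threads = {r["thread"] for r in results}
    print(len(results), "calls completed on", len(used_threads), "threads")
```

Running this locally and then inside the pod and comparing the reported thread counts is one way to confirm that the script behaves the same in both places.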

