Example of workers with correct resource allocation in dask distributed

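Problem description

Does anyone have a working code example showing how to use the client.submit API provided by dask distributed to selectively target CPU and GPU workers?

I am trying to train xgboost in a distributed fashion with dask-cudf on GPU machines, but I cannot get it to respect the resource labels I provide for different tasks.

Tags: dask, xgboost, dask-distributed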

Solution

My friend and colleague @pentschev (GitHub) wanted to point you to this example: https://github.com/dask/distributed/pull/4869#issue-909265778

import asyncio
import threading
import dask
from dask.distributed import Client, Scheduler, Worker
from distributed.threadpoolexecutor import ThreadPoolExecutor

def get_thread_name(prefix):
    return prefix + threading.current_thread().name

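async def main():
    # Start an in-process scheduler and a single worker for the demonstration.
    async with Scheduler() as s:
        async with Worker(
            s.address,
            nthreads=5,
            # Extra named executor: one dedicated thread for GPU tasks.
            executor={
                "GPU": ThreadPoolExecutor(1, thread_name_prefix="Dask-GPU-Threads")
            },
            # Abstract resources this worker advertises to the scheduler.
            resources={"GPU": 1, "CPU": 4},
        ) as w:
            async with Client(s.address, asynchronous=True) as c:
                # Runs on the worker's default thread pool and claims one CPU resource.
                with dask.annotate(resources={"CPU": 1}, executor="default"):
                    print(await c.submit(get_thread_name, "CPU-"))
                # Runs on the dedicated "GPU" executor and claims the GPU resource.
                with dask.annotate(resources={"GPU": 1}, executor="GPU"):
                    print(await c.submit(get_thread_name, "GPU-"))

if __name__ == "__main__":
    asyncio.run(main())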

Output:

CPU-Dask-Default-Threads'-29802-2
GPU-Dask-GPU-Threads-29802-3
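
The thread names in the output are what show which executor ran each task. To illustrate that idea without a cluster, here is a minimal standard-library sketch that routes tasks to named thread pools and reports the thread each one ran on. It is not dask's implementation: the pool prefixes ("Sketch-Default-Threads", "Sketch-GPU-Threads") and the helper submit_to are hypothetical names chosen for this illustration, and no resource accounting is performed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def get_thread_name(prefix):
    # Same helper as in the example above: report which thread ran the task.
    return prefix + threading.current_thread().name


# Hypothetical named pools standing in for a worker's "default" and "GPU" executors.
EXECUTORS = {
    "default": ThreadPoolExecutor(4, thread_name_prefix="Sketch-Default-Threads"),
    "GPU": ThreadPoolExecutor(1, thread_name_prefix="Sketch-GPU-Threads"),
}


def submit_to(executor_name, func, *args):
    # Hypothetical helper: pick the pool by label and submit the task to it.
    return EXECUTORS[executor_name].submit(func, *args)


cpu_name = submit_to("default", get_thread_name, "CPU-").result()
gpu_name = submit_to("GPU", get_thread_name, "GPU-").result()
print(cpu_name)
print(gpu_name)

# Release the pool threads once the work is done.
for pool in EXECUTORS.values():
    pool.shutdown()
```

Because the "GPU" pool has a single thread, tasks sent to it run one at a time, which is the same effect the one-thread GPU executor has in the dask example.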
