首页 > 解决方案 > Dask on single OSX machine - is it parallel by default?

问题描述

I have installed Dask on OSX Mojave. Does it execute computations in parallel by default? Or do I need to change some settings?

I am using the DataFrame API. Does that make a difference to the answer?

I installed it with pip. Does that make a difference to the answer?

标签: pythondaskdask-distributed

解决方案


Yes, Dask is parallel by default.

Unless you specify otherwise, or create a distributed Client, execution will happen with the "threaded" scheduler, in a number of threads equal to your number of cores. Note, however, that because of the python GIL (only one python instruction executed at a time), you may not get as much parallelism as available, depending on how good your specific tasks are at releasing the GIL. That is why you have a choice of schedulers.

Being on OSX, installing with pip: these make no difference. Using dataframes makes a difference in that it dictates the sorts of tasks you're likely running. Pandas is good at releasing the GIL for many operations.


推荐阅读