dea_tools.dask

Tools for simplifying the creation of Dask clusters for parallelised computing.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Discord chat (https://discord.com/invite/4hhBQVas5U) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, you can file one on GitHub (GeoscienceAustralia/dea-notebooks#new).

Last modified: July 2025

Functions

create_dask_gateway_cluster([profile, workers])

Create a Dask Gateway cluster on our internal Dask infrastructure.

create_local_dask_cluster([display_client, ...])

Create a local Dask cluster for parallelised computing using dask.distributed.Client.

dea_tools.dask.create_dask_gateway_cluster(profile='r5_L', workers=2)[source]

Create a Dask Gateway cluster on our internal Dask infrastructure.

Parameters:
  • profile (str) – Name of the resource profile to use. Possible values are:

    • r5_L (2 cores, 15 GB memory)

    • r5_XL (4 cores, 31 GB memory)

    • r5_2XL (8 cores, 63 GB memory)

    • r5_4XL (16 cores, 127 GB memory)

  • workers (int) – Number of workers in the cluster.
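
To get a feel for the total resources a given call would request, the profile list above can be written as a small lookup. This is only an illustrative sketch: PROFILES and cluster_totals are hypothetical names that are not part of dea_tools, and the sketch assumes each worker receives the full cores and memory of the chosen profile, which the documentation above does not state explicitly.

```python
# Profile sizes copied from the parameter list above: (cores, memory in GB).
# PROFILES and cluster_totals are hypothetical helpers, not part of dea_tools.
PROFILES = {
    "r5_L": (2, 15),
    "r5_XL": (4, 31),
    "r5_2XL": (8, 63),
    "r5_4XL": (16, 127),
}


def cluster_totals(profile="r5_L", workers=2):
    """Return (total_cores, total_memory_gb) for a cluster request.

    Assumes every worker gets the full resources of the profile.
    """
    if profile not in PROFILES:
        raise ValueError(
            f"Unknown profile {profile!r}; expected one of {sorted(PROFILES)}"
        )
    if workers < 1:
        raise ValueError("workers must be at least 1")
    cores, memory_gb = PROFILES[profile]
    return cores * workers, memory_gb * workers
```

Under that assumption, cluster_totals("r5_XL", workers=3) gives 12 cores and 93 GB in total, while the defaults (profile='r5_L', workers=2) give 4 cores and 30 GB.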

dea_tools.dask.create_local_dask_cluster(display_client=True, return_client=False, configure_rio=True, n_workers=1, threads_per_worker=None, memory_limit='spare_mem', **kwargs)[source]

Create a local Dask cluster for parallelised computing using dask.distributed.Client.

Example use:

    from dea_tools.dask import create_local_dask_cluster
    create_local_dask_cluster()
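
A slightly fuller variant may be useful when the client object is needed later, for example to shut the cluster down. The sketch below uses only the parameters documented for this function; the wrapper name start_local_cluster and the chosen worker and thread counts are assumptions made for illustration. The import sits inside the function so the snippet can be loaded even where dea_tools is not installed.

```python
def start_local_cluster(n_workers=2, threads_per_worker=2):
    """Start a local Dask cluster and return its client.

    start_local_cluster is a hypothetical wrapper; the worker and thread
    counts are arbitrary example values, not recommendations.
    """
    # Imported lazily so this snippet loads without dea_tools installed.
    from dea_tools.dask import create_local_dask_cluster

    return create_local_dask_cluster(
        display_client=False,      # hide the client summary
        return_client=True,        # hand the dask client object back
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        memory_limit="spare_mem",  # the documented default
    )


# Example use (requires dea_tools and dask.distributed):
#     client = start_local_cluster()
#     ...  # run Dask-backed analysis here
#     client.close()  # dask.distributed.Client.close shuts the client down
```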

Parameters:
  • display_client (bool, optional) – An optional boolean indicating whether to display a summary of the dask client, including a link to monitor progress of the analysis. Set to False to hide this display.

  • return_client (bool, optional) – An optional boolean indicating whether to return the dask client object.

  • configure_rio (bool, optional) – An optional boolean indicating whether to configure rasterio with cloud defaults and unsigned AWS access. Set to False to not apply these defaults.

  • n_workers (int, optional) – Number of workers to start. Defaults to 1, which works well when loading ODC data.

  • threads_per_worker (int, optional) – Number of threads per worker. By default, this is set to the number of CPUs on the machine.

  • memory_limit (str, float, int, or None, optional) – Sets the memory limit per worker. Default is 'spare_mem', where 95% of the available system memory is split among the workers, so that the remaining memory is withheld from the cluster as spare. For other options, see: https://distributed.dask.org/en/stable/api.html#distributed.Client

  • **kwargs – Additional keyword arguments passed to dask.distributed.Client. For full options, see: https://distributed.dask.org/en/stable/api.html#distributed.Client
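
The 'spare_mem' default for memory_limit described above comes down to simple arithmetic. The sketch below shows that calculation under stated assumptions: spare_mem_per_worker is a hypothetical helper, not the code used inside dea_tools, and the available memory is passed in as a plain number of bytes rather than measured from the system.

```python
def spare_mem_per_worker(available_bytes, n_workers=1, percent=95):
    """Split a percentage of the available memory evenly among workers.

    Hypothetical illustration of the 'spare_mem' rule: `percent` of the
    available memory goes to the cluster, the rest is left as spare.
    Integer arithmetic is used so the result is a whole number of bytes.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    cluster_bytes = available_bytes * percent // 100
    return cluster_bytes // n_workers


# 16 GiB available, shared between 4 workers
limit = spare_mem_per_worker(16 * 1024**3, n_workers=4)
print(f"{limit / 1024**3:.1f} GiB per worker")
```

With 16 GiB available and four workers, each worker would be limited to about 3.8 GiB, leaving roughly 0.8 GiB outside the cluster.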