dea_tools.dask
Tools for simplifying the creation of Dask clusters for parallelised computing.
License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat (https://discord.com/invite/4hhBQVas5U) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).
If you would like to report an issue with this script, you can file one on GitHub (GeoscienceAustralia/dea-notebooks#new).
Last modified: July 2025
Functions
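- create_dask_gateway_cluster([profile, workers]) – Create a Dask cluster on our internal Dask Gateway.
- create_local_dask_cluster([display_client, ...]) – Create a local Dask cluster for parallelised computing using dask.distributed.Client.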
- dea_tools.dask.create_dask_gateway_cluster(profile='r5_L', workers=2)
Create a Dask cluster on our internal Dask Gateway.
- Parameters:
profile (str) – Possible values are:
- r5_L (2 cores, 15 GB memory)
- r5_XL (4 cores, 31 GB memory)
- r5_2XL (8 cores, 63 GB memory)
- r5_4XL (16 cores, 127 GB memory)
workers (int) – Number of workers in the cluster.
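To help choose a profile and worker count, the sketch below totals the resources a given combination would request. It is illustrative only: `PROFILES` and `cluster_totals` are hypothetical names that are not part of dea_tools, and the sketch assumes the listed cores and memory apply to each worker.

```python
# Profile sizes copied from the parameter list above, as
# (cores, memory in GB). Assumption: these figures are per worker.
PROFILES = {
    "r5_L": (2, 15),
    "r5_XL": (4, 31),
    "r5_2XL": (8, 63),
    "r5_4XL": (16, 127),
}


def cluster_totals(profile="r5_L", workers=2):
    """Hypothetical helper: total (cores, memory_gb) across all workers."""
    if profile not in PROFILES:
        raise ValueError(f"Unknown profile: {profile!r}")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    cores, memory_gb = PROFILES[profile]
    return cores * workers, memory_gb * workers


# Defaults of create_dask_gateway_cluster: profile='r5_L', workers=2
print(cluster_totals())             # (4, 30)
print(cluster_totals("r5_2XL", 4))  # (32, 252)
```
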
- dea_tools.dask.create_local_dask_cluster(display_client=True, return_client=False, configure_rio=True, n_workers=1, threads_per_worker=None, memory_limit='spare_mem', **kwargs)
Create a local Dask cluster for parallelised computing using dask.distributed.Client.
Example use:
    from dea_tools.dask import create_local_dask_cluster
    create_local_dask_cluster()
- Parameters:
display_client (bool, optional) – An optional boolean indicating whether to display a summary of the dask client, including a link to monitor progress of the analysis. Set to False to hide this display.
return_client (bool, optional) – An optional boolean indicating whether to return the dask client object.
configure_rio (bool, optional) – An optional boolean indicating whether to configure rasterio with cloud defaults and unsigned AWS access. Set to False to not apply these defaults.
n_workers (int, optional) – Number of workers to start. The default of 1 works well when loading ODC data.
threads_per_worker (int, optional) – Number of threads per worker. By default, this is set to the number of CPUs on the machine.
memory_limit (str, float, int, or None, optional) – Sets the memory limit per worker. Default is 'spare_mem', where 95% of the available system memory is split among the workers and the remainder is withheld from the cluster as spare memory. For other options, see: https://distributed.dask.org/en/stable/api.html#distributed.Client
**kwargs – Additional keyword arguments passed to dask.distributed.Client. For full options, see: https://distributed.dask.org/en/stable/api.html#distributed.Client
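The 'spare_mem' default for memory_limit can be illustrated with a short calculation. This is a minimal sketch of the arithmetic described above, not the dea_tools implementation: `spare_mem_per_worker` is a hypothetical helper, the available memory is passed in rather than measured, and the rounding behaviour is an assumption.

```python
def spare_mem_per_worker(available_bytes, n_workers=1, usable_percent=95):
    """Hypothetical helper: per-worker memory limit in bytes.

    Keeps `usable_percent` of the available memory for the cluster and
    splits it equally among the workers. Integer arithmetic is used, so
    results are rounded down (an assumption of this sketch).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    usable_bytes = available_bytes * usable_percent // 100
    return usable_bytes // n_workers


# Example: 16 GiB of available memory shared by 4 workers.
available = 16 * 1024**3
limit = spare_mem_per_worker(available, n_workers=4)
print(f"{limit / 1024**3:.2f} GiB per worker")  # 3.80 GiB per worker
```

With a single worker (the default n_workers=1), the same calculation gives that worker the full 95% share.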