Exporting cloud-optimised GeoTIFF files
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments
Products used: ga_ls8c_ard_3
Background
At the end of an analysis it can be useful to export data to a GeoTIFF file (e.g. outputname.tif), either to save results or to explore them in a GIS software platform (e.g. ArcGIS or QGIS).
A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file (i.e. it can be opened by GIS software like QGIS or ArcMap) that is aimed at being hosted on an HTTP file server, with an internal organization that enables more efficient workflows on the cloud.
Description
This notebook shows a number of ways to export a GeoTIFF file using the datacube.utils.cog function write_cog:
Exporting a single-band, single time-slice GeoTIFF from an xarray object loaded through a dc.load query
Exporting a multi-band, single time-slice GeoTIFF from an xarray object loaded through a dc.load query
Exporting multiple GeoTIFFs, one for each time-slice of an xarray object loaded through a dc.load query
In addition, the notebook demonstrates several more advanced applications of write_cog:
1. Exporting data from lazily loaded dask arrays
2. Passing in custom rasterio parameters to override the function’s defaults
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
[1]:
import rasterio
import datacube
from datacube.utils.cog import write_cog
import sys
sys.path.insert(1, '../Tools/')
from dea_tools.plotting import rgb
Connect to the datacube
[2]:
dc = datacube.Datacube(app='Exporting_GeoTIFFs')
Load Landsat 8 data from the datacube
Here we load a time series of Landsat 8 satellite images through the datacube API. This will provide us with some data to work with.
[3]:
# Create a query object
query = {
'x': (149.06, 149.17),
'y': (-35.27, -35.32),
'time': ('2020-01', '2020-03'),
'measurements': ['nbart_red', 'nbart_green', 'nbart_blue'],
'output_crs': 'EPSG:3577',
'resolution': (-30, 30),
'group_by': 'solar_day'
}
# Load available data from Landsat 8
ds = dc.load(product='ga_ls8c_ard_3', **query)
# Print output data
ds
[3]:
<xarray.Dataset> Dimensions: (time: 11, y: 229, x: 356) Coordinates: * time (time) datetime64[ns] 2020-01-07T23:56:40.592891 ... 2020-03... * y (y) float64 -3.955e+06 -3.955e+06 ... -3.962e+06 -3.962e+06 * x (x) float64 1.544e+06 1.544e+06 ... 1.554e+06 1.554e+06 spatial_ref int32 3577 Data variables: nbart_red (time, y, x) int16 1691 1729 1738 1734 1695 ... 799 799 819 753 nbart_green (time, y, x) int16 1380 1386 1405 1398 1377 ... 879 911 877 808 nbart_blue (time, y, x) int16 1170 1168 1175 1171 1156 ... 670 670 685 642 Attributes: crs: EPSG:3577 grid_mapping: spatial_ref
Plot an rgb image to confirm we have data
The white regions are cloud cover.
[4]:
rgb(ds, index=0, percentile_stretch=(0.1, 0.9))
Export GeoTIFFs
Single-band, single time-slice data
This method uses the datacube.utils.cog
function write_cog
(where COG stands for Cloud Optimised GeoTIFF) to export a simple single-band, single time-slice GeoTIFF file. A few important caveats should be noted when using this function:
It requires an xarray.DataArray; supplying an xarray.Dataset will return an error. To convert an xarray.Dataset to an xarray.DataArray, run the following:
da = ds.to_array()
This function generates a temporary in-memory GeoTIFF file without compression. This means the function will temporarily use about 1.5 to 2 times the memory of the input xarray.DataArray.
[5]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)
# Write GeoTIFF to a location
write_cog(geo_im=singleband_da,
fname='red_band.tif',
overwrite=True)
[5]:
PosixPath('red_band.tif')
Multi-band, single time-slice data
Here we select a single time and export all the bands in the dataset using write_cog
:
[6]:
# Select a single time-slice
rgb_da = ds.isel(time=0).to_array()
# Write multi-band GeoTIFF to a location
write_cog(geo_im=rgb_da,
fname='rgb.tif',
overwrite=True)
[6]:
PosixPath('rgb.tif')
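As an optional check (not part of the original workflow), the exported file can be read back in with rasterio to inspect its structure. The values printed below depend on write_cog's defaults and on the data loaded above; for a small image like this one, the list of overview levels may be empty.
# Optional check: open the exported COG with rasterio and print some metadata
with rasterio.open('rgb.tif') as geotiff:
    print('Band count:', geotiff.count)             # one band per measurement (red, green, blue)
    print('Data type:', geotiff.dtypes[0])          # should match the input dtype (int16)
    print('Block shape:', geotiff.block_shapes[0])  # internal tiling used by the COG
    print('Overviews (band 1):', geotiff.overviews(1))  # may be empty for small images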
Multi-band, multiple time-slice data
If we want to export all of the time steps in a dataset, we can wrap write_cog
in a for-loop and export each time slice as an individual GeoTIFF file:
[7]:
for i in range(len(ds.time)):

    # We will use the date of the satellite image to name the GeoTIFF
    date = ds.isel(time=i).time.dt.strftime('%Y-%m-%d').data
    print(f'Writing {date}')

    # Convert current time step into an `xarray.DataArray`
    singletimestamp_da = ds.isel(time=i).to_array()

    # Write GeoTIFF
    write_cog(geo_im=singletimestamp_da,
              fname=f'{date}.tif',
              overwrite=True)
Writing 2020-01-07
Writing 2020-01-16
Writing 2020-01-23
Writing 2020-02-01
Writing 2020-02-08
Writing 2020-02-17
Writing 2020-02-24
Writing 2020-03-04
Writing 2020-03-11
Writing 2020-03-20
Writing 2020-03-27
Exporting GeoTIFFs from a dask array
Note: For more information on using dask, refer to the Parallel processing with Dask notebook.
If you pass a lazily-loaded dask array into the function, write_cog will not immediately output a GeoTIFF, but will instead return a dask.delayed object:
[8]:
# Lazily load data using dask
ds_dask = dc.load(product='ga_ls8c_ard_3',
dask_chunks={},
**query)
# Run `write_cog`
ds_delayed = write_cog(geo_im=ds_dask.isel(time=0).to_array(),
fname='dask_geotiff.tif',
overwrite=True)
# Print dask.delayed object
print(ds_delayed)
Delayed('_write_cog-94418718-68c7-4e80-a786-c75e7d6fe50a')
To trigger the GeoTIFF to be exported to file, run .compute()
on the dask.delayed
object. The file will now appear in the file browser to the left.
[9]:
ds_delayed.compute()
[9]:
PosixPath('dask_geotiff.tif')
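Because write_cog returns a dask.delayed object for each lazily loaded input, several exports can be queued up and triggered in a single computation. The following is a minimal sketch (not part of the original notebook) using hypothetical output names dask_geotiff_0.tif and dask_geotiff_1.tif:
# Hedged sketch: queue delayed COG writes for the first two time slices, then
# trigger them all at once with dask.compute
import dask

delayed_writes = [write_cog(geo_im=ds_dask.isel(time=i).to_array(),
                            fname=f'dask_geotiff_{i}.tif',  # hypothetical file names
                            overwrite=True)
                  for i in range(2)]

# Computing returns the output paths once the files have been written
print(dask.compute(*delayed_writes))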
Advanced
Passing custom rasterio parameters
By default, write_cog will read attributes like nodata directly from the data itself. However, write_cog also accepts any parameters that can be passed to the underlying rasterio.open function (https://rasterio.readthedocs.io/en/latest/api/rasterio.io.html), so these can be used to override the default attribute values.
For example, it can be useful to provide a new nodata value if data has been transformed to a different dtype or scale and the original nodata value is no longer appropriate. By default, write_cog will use a nodata value of -999, as this is what is stored in the dataset’s attributes:
[10]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)
# Print nodata attribute value
print(singleband_da.nodata)
-999
To override this value and use a nodata
value of 0 instead:
[11]:
# Write GeoTIFF
write_cog(geo_im=singleband_da,
fname='custom_nodata.tif',
overwrite=True,
nodata=0.0)
[11]:
PosixPath('custom_nodata.tif')
We can verify the nodata value is now set to 0.0
by reading the data back in with rasterio
:
[12]:
with rasterio.open('custom_nodata.tif') as geotiff:
    print(geotiff.nodata)
0.0
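Other rasterio creation options can be passed through write_cog in the same way. The sketch below is not from the original notebook and assumes that the compress creation option is forwarded to the GeoTIFF writer; check the datacube documentation for the options supported by your version.
# Hedged sketch: override the default compression via an extra rasterio option
write_cog(geo_im=singleband_da,
          fname='custom_compression.tif',  # hypothetical output name
          overwrite=True,
          compress='lzw')  # assumed to be forwarded to the GeoTIFF writer

# Read the file back in to confirm which compression was applied
with rasterio.open('custom_compression.tif') as geotiff:
    print(geotiff.profile.get('compress'))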
Unsetting nodata for float datatypes containing NaN
A common analysis workflow is to load data from the datacube, then mask out certain values by setting them to NaN
. For example, we may use the .where()
method to set all -999 nodata
values in our data to NaN
:
Note: The mask_invalid_data function from datacube.utils.masking can also be used to automatically set nodata values from the data’s attributes to NaN. This function will also automatically drop the nodata attribute from the data, removing the need for the steps below. For more information, see the Masking data notebook.
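As a minimal sketch of that alternative (based on the description in the note above rather than on the original notebook):
# Hedged sketch: mask_invalid_data sets nodata values to NaN and drops the
# nodata attribute in one step, per the note above
from datacube.utils.masking import mask_invalid_data

masked_alt = mask_invalid_data(ds.nbart_red.isel(time=0))
print(masked_alt.attrs)  # the nodata attribute should no longer be present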
[13]:
# Select a single time-slice and a single band from the dataset.
singleband_da = ds.nbart_red.isel(time=0)
# Set -999 values to NaN
singleband_masked = singleband_da.where(singleband_da != -999)
singleband_masked
[13]:
<xarray.DataArray 'nbart_red' (y: 229, x: 356)> array([[1691., 1729., 1738., ..., 1180., 1347., 1175.], [1680., 1731., 1763., ..., 1140., 1220., 1158.], [1737., 1778., 1816., ..., 1070., 1047., 1131.], ..., [1766., 1808., 1830., ..., 1175., 1069., 1050.], [1748., 1731., 1687., ..., 1423., 1193., 1119.], [1520., 1802., 1868., ..., 1417., 1340., 1296.]]) Coordinates: time datetime64[ns] 2020-01-07T23:56:40.592891 * y (y) float64 -3.955e+06 -3.955e+06 ... -3.962e+06 -3.962e+06 * x (x) float64 1.544e+06 1.544e+06 ... 1.554e+06 1.554e+06 spatial_ref int32 3577 Attributes: units: 1 nodata: -999 crs: EPSG:3577 grid_mapping: spatial_ref
In this case, since we have replaced -999 nodata
values with NaN
, the data’s nodata
attribute is no longer valid. Before we write our data to file, we should therefore remove the nodata
attribute from our data:
[14]:
# Remove nodata attribute
singleband_masked.attrs.pop('nodata')
# Write GeoTIFF
write_cog(geo_im=singleband_masked,
fname='masked_data.tif',
overwrite=True)
[14]:
PosixPath('masked_data.tif')
If we read this GeoTIFF back in with rasterio
, we will see that it no longer has a nodata
attribute set:
[15]:
with rasterio.open('masked_data.tif') as geotiff:
    print(geotiff.nodata)
None
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
GitHub.
Last modified: December 2023
Compatible datacube version:
[16]:
print(datacube.__version__)
1.8.6
Tags
Tags: sandbox compatible, NCI compatible, GeoTIFF, Cloud Optimised GeoTIFF, dc.load, landsat 8, datacube.utils.cog, write_cog, exporting data