SpatioTemporal Asset Catalog (STAC)

SpatioTemporal Asset Catalog (STAC) is a specification that describes geospatial information in a consistent way so it can be discovered and accessed more easily. STAC makes it possible to quickly identify all available data for a given product, location or time period. This data can then be loaded efficiently into your own computing environment, or streamed directly into desktop GIS software such as QGIS or Esri's ArcGIS.

Why use STAC?

STAC may be useful for you if any of the following apply:

  • You want to easily find and load satellite data products through time or across large spatial areas

  • You want to access satellite data on your own computing environment

  • Your analysis requires more memory or processing power than provided by DEA’s managed Sandbox or NCI environments

  • You want to combine DEA data with data from other external sources (e.g. Microsoft Planetary Computer, Element 84 Earth Search)

odc-stac tutorial

This tutorial demonstrates how to use the odc-stac Python package to load data from DEA using the DEA Explorer STAC API. The odc-stac package translates STAC metadata to the Open Data Cube data model (Table 1), allowing you to load DEA data into xarray.Dataset format to be processed locally, or distribute data loading and computation with Dask.

Table 1: Comparison between STAC and Open Data Cube concepts

| STAC | ODC | Description |
|---|---|---|
| Collection | Product or DatasetType | A collection of observations across space and time (e.g. all observations from a Landsat or Sentinel-2 satellite sensor) |
| Item | Dataset | A single observation for a specific time and place, containing one or more bands (for example, a specific Landsat or Sentinel-2 scene) |
| Asset | Measurement | A component of an observation, including but not limited to bands |
| Band | Measurement | A single data layer/array within a multi-band observation (e.g. spectral data from a specific wavelength) |
| Common Name | Alias | Alternative names for the same band |

Setup

Import required packages for querying and loading data from STAC:

[1]:
import pystac_client
import odc.stac

Connect to the DEA Explorer STAC API to allow searching for data:

[2]:
catalog = pystac_client.Client.open("https://explorer.dea.ga.gov.au/stac")

To load data via STAC, we must configure appropriate access to data stored on DEA’s Amazon S3 buckets. This can be done with the odc.stac.configure_rio function. The configuration below must be used when loading any DEA data through the STAC API.

[3]:
odc.stac.configure_rio(
    cloud_defaults=True,
    aws={"aws_unsigned": True},
)

Searching for STAC data using pystac_client

First we need to define the location, time period and DEA product we want to load.

In this example, we will load December 2021 Landsat 8 data over Canberra:

[4]:
# Set a bounding box
# [xmin, ymin, xmax, ymax] in degrees of longitude (x) and latitude (y)
bbox = [149.05, -35.32, 149.17, -35.25]

# Set a start and end date
start_date = "2021-12-01"
end_date = "2021-12-31"

# Set product ID as the STAC "collection"
collections = ["ga_ls8c_ard_3"]

Now we can use the pystac_client Python package to search for STAC items that match our query:

[5]:
# Build a query with the parameters above
query = catalog.search(
    bbox=bbox,
    collections=collections,
    datetime=f"{start_date}/{end_date}",
)

# Search the STAC catalog for all items matching the query
items = list(query.items())
print(f"Found: {len(items):d} datasets")
Found: 8 datasets
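
Before loading any pixels, it can be worth a quick look at what the search returned. The following is a minimal sketch, assuming each item exposes `id`, `properties` and `assets` attributes as pystac items do. `summarise_items` is a hypothetical helper, and the demo item is a stand-in object so the example runs without network access.

```python
from types import SimpleNamespace


def summarise_items(items):
    """Return one small dict per STAC item: id, acquisition time, cloud cover, asset count."""
    return [
        {
            "id": item.id,
            "datetime": item.properties.get("datetime"),
            "cloud_cover": item.properties.get("eo:cloud_cover"),
            "n_assets": len(item.assets),
        }
        for item in items
    ]


# Hypothetical stand-in for a real STAC item (the id and asset are made up)
demo_item = SimpleNamespace(
    id="example-item",
    properties={
        "datetime": "2021-12-04T23:50:39.744022Z",
        "eo:cloud_cover": 85.11645459806458,
    },
    assets={"nbart_red": object()},
)

summary = summarise_items([demo_item])
```

With real search results, the same function could be called as `summarise_items(items)`.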

Loading data using odc-stac

Once we have found data to load, we can use the odc.stac.load() function to load it into an xarray.Dataset.

This works similarly to how the datacube.load() function is used to load data from DEA on the DEA Sandbox and on the NCI.

[6]:
ds = odc.stac.load(
    items,
    bands=["nbart_red"],
    crs="EPSG:32655",
    resolution=30,
    groupby="solar_day",
    bbox=bbox,
)

ds
[6]:
<xarray.Dataset> Size: 2MB
Dimensions:      (y: 268, x: 370, time: 4)
Coordinates:
  * y            (y) float64 2kB -3.903e+06 -3.903e+06 ... -3.911e+06 -3.911e+06
  * x            (x) float64 3kB 6.864e+05 6.864e+05 ... 6.974e+05 6.974e+05
    spatial_ref  int32 4B 32655
  * time         (time) datetime64[ns] 32B 2021-12-04T23:50:39.744022 ... 202...
Data variables:
    nbart_red    (time, y, x) float32 2MB 5.597e+03 5.571e+03 ... 5.203e+03
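
The search found 8 items, but the loaded dataset has only 4 time steps. This is consistent with groupby="solar_day" merging items acquired on the same local day, such as adjacent scenes along one satellite pass. The sketch below approximates that grouping by shifting each UTC timestamp by the local solar offset (longitude / 15 hours). It is an illustration under that assumption, not odc-stac's actual implementation; the function names are hypothetical, and all timestamps except the first are made up.

```python
from datetime import date, datetime, timedelta, timezone


def solar_day(utc_time: datetime, longitude: float) -> date:
    """Approximate local solar date: shift UTC by 24 h per 360 degrees of longitude."""
    offset = timedelta(seconds=int(longitude * 24 * 3600 / 360))
    return (utc_time + offset).date()


def group_by_solar_day(timestamps, longitude):
    """Group UTC timestamps that fall on the same approximate solar day."""
    groups = {}
    for ts in timestamps:
        groups.setdefault(solar_day(ts, longitude), []).append(ts)
    return groups


timestamps = [
    datetime(2021, 12, 4, 23, 50, 39, tzinfo=timezone.utc),
    datetime(2021, 12, 4, 23, 51, 3, tzinfo=timezone.utc),    # hypothetical
    datetime(2021, 12, 11, 23, 56, 50, tzinfo=timezone.utc),  # hypothetical
]

# 149.11 is the centre longitude of the Canberra bounding box
groups = group_by_solar_day(timestamps, longitude=149.11)
```

Here the first two timestamps fall on the same solar day (5 December local time, even though the UTC date is 4 December), so they would be merged into a single time step.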

We can now plot and analyse our data:

[7]:
ds.nbart_red.plot(col="time", robust=True);

Advanced

Filtering

The DEA STAC API supports filtering data by metadata fields before loading it, using the filter extension parameter. This can be useful for limiting data loads to the most useful data (e.g. least affected by cloud, most geometrically accurate).

To inspect the fields we can filter on, we can look at a single STAC item and expand the properties dropdown:

[8]:
items[0]

For example, we can filter by the eo:cloud_cover property to load only mostly cloud-free observations (e.g. less than 10% cloud).

For more information about using filter, refer to the STAC guide here.

[9]:
# Set up a filter query
filter_query = "eo:cloud_cover < 10"

# Query with filtering
query = catalog.search(
    bbox=bbox,
    collections=collections,
    datetime=f"{start_date}/{end_date}",
    filter=filter_query,
)

# Load our filtered data
ds_filtered = odc.stac.load(
    query.items(),
    bands=["nbart_red"],
    crs="EPSG:32655",
    resolution=30,
    groupby="solar_day",
    bbox=bbox,
)

# Plot our filtered data
ds_filtered.nbart_red.plot(col="time", robust=True);

Filtering can be performed on multiple metadata fields at once. For example, to filter by both cloud cover and geometric accuracy:

filter_query = "(eo:cloud_cover < 10) AND (gqa:abs_iterative_mean_xy < 1)"
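
When several thresholds need to be combined, building the filter string programmatically avoids mismatched brackets. This is a minimal sketch: `build_filter` is a hypothetical helper, it handles only numeric comparisons joined with AND, and string values would need quoting that is not implemented here.

```python
def build_filter(conditions):
    """Join (property, operator, value) triples into one AND-combined filter string."""
    allowed = {"<", "<=", "=", ">=", ">", "<>"}
    clauses = []
    for prop, op, value in conditions:
        if op not in allowed:
            raise ValueError(f"Unsupported operator: {op}")
        # Wrap each comparison in brackets so the clauses combine unambiguously
        clauses.append(f"({prop} {op} {value})")
    return " AND ".join(clauses)


filter_query = build_filter(
    [
        ("eo:cloud_cover", "<", 10),
        ("gqa:abs_iterative_mean_xy", "<", 1),
    ]
)
```

The resulting string matches the hand-written filter above and can be passed to catalog.search() through the filter parameter in the same way.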

Sorting

The DEA STAC API supports sorting results by metadata fields using the sortby extension parameter. For example, we can request that STAC items are returned in ascending order of cloud cover:

[10]:
# Query with sorting
query = catalog.search(
    bbox=bbox,
    collections=collections,
    datetime=f"{start_date}/{end_date}",
    sortby="eo:cloud_cover",
)

# Print out cloud cover values from low to high
[i.properties["eo:cloud_cover"] for i in query.items()]
[10]:
[0.7413425371604095,
 5.218904736222981,
 5.84303770341032,
 16.825931028459298,
 28.258366277957347,
 41.91683359769932,
 85.11645459806458,
 98.44253190598418]

Items can be sorted in descending order by prefixing the property name with -:

[11]:
# Query with sorting
query = catalog.search(
    bbox=bbox,
    collections=collections,
    datetime=f"{start_date}/{end_date}",
    sortby="-eo:cloud_cover",
)

# Print out cloud cover values from high to low
[i.properties["eo:cloud_cover"] for i in query.items()]
[11]:
[98.44253190598418,
 85.11645459806458,
 41.91683359769932,
 28.258366277957347,
 16.825931028459298,
 5.84303770341032,
 5.218904736222981,
 0.7413425371604095]
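
Items that have already been fetched can also be re-sorted locally without another request. The sketch below mirrors the "-" prefix convention on the client side. `sort_items` is a hypothetical helper, the demo items are stand-in objects with rounded cloud cover values, and treating a leading "+" as ascending is this sketch's own choice.

```python
from types import SimpleNamespace


def sort_items(items, sortby):
    """Sort items by a property name; a leading '-' means descending order."""
    descending = sortby.startswith("-")
    field = sortby.lstrip("+-")
    return sorted(items, key=lambda item: item.properties[field], reverse=descending)


# Stand-in items carrying only the property we sort on
demo_items = [
    SimpleNamespace(properties={"eo:cloud_cover": cover})
    for cover in (16.8, 0.7, 98.4)
]

ascending = [i.properties["eo:cloud_cover"] for i in sort_items(demo_items, "eo:cloud_cover")]
descending = [i.properties["eo:cloud_cover"] for i in sort_items(demo_items, "-eo:cloud_cover")]
```

Server-side sorting remains preferable for large result sets, since it lets you stop reading results early once enough low-cloud items have been returned.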

Fields

We can also restrict the metadata fields we load from the DEA STAC API using the fields extension parameter. This can be useful for more efficiently returning a small subset of fields from large volumes of metadata.

Metadata fields to load can be configured by passing in a dictionary with “include” and “exclude” keys. For example, we can choose to include only the odc:region_code and eo:cloud_cover metadata fields (note that we need to prefix these field names with properties.):

[12]:
# Query with a restricted set of metadata fields
query = catalog.search(
    bbox=bbox,
    collections=collections,
    datetime=f"{start_date}/{end_date}",
    fields={"include": ["properties.odc:region_code", "properties.eo:cloud_cover"]},
)

# Inspect returned STAC item properties
list(query.items())[0].properties
[12]:
{'datetime': '2021-12-04T23:50:39.744022Z',
 'eo:cloud_cover': 85.11645459806458,
 'odc:region_code': '090084'}
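
Because every property name needs the properties. prefix, a small helper can build the dictionary from bare property names. This is a sketch only: `property_fields` is a hypothetical function, not part of pystac_client.

```python
def property_fields(include=(), exclude=()):
    """Build a fields dictionary, prefixing each property name with 'properties.'."""
    fields = {}
    if include:
        fields["include"] = [f"properties.{name}" for name in include]
    if exclude:
        fields["exclude"] = [f"properties.{name}" for name in exclude]
    return fields


fields = property_fields(include=["odc:region_code", "eo:cloud_cover"])
```

The result is the same dictionary used in the query above and can be passed to catalog.search() through the fields parameter.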

Additional resources

Explore the following Jupyter Notebooks for more in-depth guides to querying and loading data from STAC:

For more information about pystac_client and odc-stac: