Polygon drill 9c1d446f689d4051aa1a43d9e4615328


A polygon drill can be used to grab a stack of imagery that corresponds to the location of an input polygon. It is a useful tool for generating animations, or running analyses over long time periods.

This notebook shows you how to:

  1. Use a polygon’s geometry to generate a dc.load query

  2. Mask the returned data with the polygon geometry (to remove unwanted pixels)

  3. Plot a timeseries plot summarising satellite bands over time

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

%matplotlib inline

import datacube
import geopandas as gpd
import odc.geo.xr
from odc.geo.geom import Geometry

import sys

sys.path.insert(1, "../Tools/")
from dea_tools.plotting import rgb

Connect to the datacube

Connect to the datacube so we can access DEA data. The app parameter is a unique name for the analysis which is based on the notebook file name.

dc = datacube.Datacube(app="Polygon_drill")

Analysis parameters

Update the following two parameters to set the polygon dataset and times used for the polygon drill:

  • polygon_to_drill: The path containing the vector file to use for the polygon drill. If it’s a local file, then this parameter is the local path to that polygon. If it’s located online, then this is the path to the online location of the polygon.

  • time_to_drill: e.g. ('2020-02', '2020-03'). The time over which we want to run the polygon drill, entered as a tuple.

For this example, we will use a polygon for Capital Hill in Canberra extracted from the ACT Suburb/Locality Boundaries Administrative Boundaries © Geoscape Australia vector dataset, licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).

polygon_to_drill = (
time_to_drill = ("2020-01", "2020-03")

Load the vector file we want to use for the polygon drill

# Read vector file
polygon_to_drill = gpd.read_file(polygon_to_drill)

# Check that the polygon loaded as expected. We'll just print the first 3 rows to check
0 lcpb8f6c0999acd loceae77f35464f 2022-03-11 Capital Hill Gazetted Locality ACT MULTIPOLYGON (((149.12849 -35.31083, 149.12848...

Query the datacube using the polygon we have loaded

Set up the dc.load query

We need to grab the polygon from the vector file we want to use for the polygon drill. For this example, we’ll just grab the first polygon from the file using .iloc[0] (Capital Hill in the ACT).

To use this polygon in datacube, we need to first convert it to a odc.geo.geom.Geometry object. This is the equivelent of a shapely geometry, but with added Coordinate Reference System information (for more information, refer to the odc-geo documentation).

# Select polygon
shapely_geometry = polygon_to_drill.iloc[0].geometry

# Convert to Geometry object with CRS information
geom = Geometry(geom=shapely_geometry, crs=polygon_to_drill.crs)

To set up the query, we need to set up several additional parameters:

  • geopolygon: Here we input the geometry we want to use for the drill that we prepared in the cell above

  • time: Feed in the time_to_drill parameter we set earlier

  • output_crs: We need to specify the coordinate reference system (CRS) of the output. We’ll use Albers Equal Area projection for Australia

  • resolution: You can choose the resolution of the output dataset. Since Landsat 8 is 30 m resolution, we’ll just use that

  • measurements: Here is where you specify which bands you want to extract. We will just be plotting a true colour image, so we just need red, green and blue.

query = {
    "geopolygon": geom,
    "time": time_to_drill,
    "output_crs": "EPSG:3577",
    "resolution": (-30, 30),
    "measurements": ["nbart_red", "nbart_green", "nbart_blue"],

Use the query to extract data

Here we have chosen to load data from Landsat 8 by supplying product='ga_ls8c_ard_3', but this can be changed depending on your requirements.

We can verify that the polygon drill has loaded a time series of satellite data by checking the Dimensions of the resulting xarray.Dataset. In this example, we can see that the polygon drill has loaded 11 time steps (i.e. Dimensions: time: 11):

# Load data for our polygon and time period
ds = dc.load(product="ga_ls8c_ard_3", group_by="solar_day", **query)

# Check we have some data back with multiple timesteps
Dimensions:      (time: 11, y: 33, x: 32)
  * time         (time) datetime64[ns] 2020-01-07T23:56:40.592891 ... 2020-03...
  * y            (y) float64 -3.96e+06 -3.96e+06 ... -3.961e+06 -3.961e+06
  * x            (x) float64 1.549e+06 1.549e+06 1.549e+06 ... 1.55e+06 1.55e+06
    spatial_ref  int32 3577
Data variables:
    nbart_red    (time, y, x) int16 1043 1157 1222 1061 ... 2403 2743 3157 3047
    nbart_green  (time, y, x) int16 1014 1116 1150 1023 ... 2321 2650 2972 2935
    nbart_blue   (time, y, x) int16 944 1019 1029 932 ... 2201 2491 2862 2848
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Plot original satellite data

To inspect the satellite data we have loaded using the pixel drill, we can plot an image for each timestep in the data:

rgb(ds, col="time", vmin=0, vmax=2000)

Mask data using polygon

The data returned from our polygon drill contains data for the bounding box (i.e. extents) of the input polygon, not the actual shape of the polygon. To get rid of the parts of the image located outside the polygon, we need to mask our using the original polygon.

This can be achieved easily using the built-in .odc.mask() method on our xarray data, which will set all pixels outside of the mask to nan to signify “nodata”:

# Mask out all pixels outside of our polygon:
ds_masked = ds.odc.mask(poly=geom)

Plot masked data

When we plot this new masked dataset, we can see that the areas located outside of the polygon have now been masked out (i.e. set to nan or white).

Note: Several of our images are also covered in cloud - for advice on applying a cloud mask to remove these cloudy pixels see the Masking data using the Fmask cloud mask notebook.

rgb(ds_masked, col="time", vmin=0, vmax=2000)
/env/lib/python3.10/site-packages/matplotlib/cm.py:494: RuntimeWarning: invalid value encountered in cast
  xx = (xx * 255).astype(np.uint8)

Summarise results over time into a timeseries plot

Now that we have produced a masked satellite dataset, we can summarise values across time to plot it as a polygon drill timeseries. This can be useful for summarising environmental change over time (e.g. drying or wetting of the landscape, or the growth or less of vegetation cover).

# Average of the red band across time
red_timeseries = ds_masked.nbart_red.mean(dim=["x", "y"])
[<matplotlib.lines.Line2D at 0x7fae8e2351e0>]

We can also summarise all of our bands across time as a pandas.Dataframe(), and plot them on a single graph:

# Plot all bands at once
timeseries_df = ds_masked.mean(dim=["x", "y"]).drop("spatial_ref").to_dataframe()
nbart_red nbart_green nbart_blue
2020-01-07 23:56:40.592891 1243.424683 1170.858887 1051.900391
2020-01-16 23:50:27.927367 3740.156494 3802.088867 3901.860107
2020-01-23 23:56:36.765939 1093.314331 936.836304 692.188599
2020-02-01 23:50:22.661832 8965.114258 8898.247070 8858.674805
2020-02-08 23:56:31.774346 8857.622070 8714.263672 8629.692383
2020-02-17 23:50:19.161045 1192.411621 1133.207642 1008.849365
2020-02-24 23:56:28.025939 821.346375 712.434143 492.763947
2020-03-04 23:50:13.948244 8824.697266 8708.142578 8736.099609
2020-03-11 23:56:21.709961 2946.153076 2853.225342 2732.492188
2020-03-20 23:50:06.289323 894.078308 837.536194 617.673767
2020-03-27 23:56:12.950767 3733.026123 3590.622803 3412.762695
<Axes: xlabel='time'>

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last modified: February 2024

Compatible datacube version:



Tags: NCI compatible, sandbox compatible, landsat 8, rgb, polygon drill, shapefile, GeoPandas, odc.geo.geom.geometry, query, masking