Polygon drill
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the
NCI
andDEA Sandbox
environmentsProducts used: ga_ls8c_ard_3
Special requirements: A vector file (e.g. GeoJSON, Esri Shapefile) containing the polygon you would like to use for the analysis. Here we use a polygon extracted from the ACT Suburb/Locality Boundaries Administrative Boundaries © Geoscape Australia dataset.
Description
A polygon drill can be used to grab a stack of imagery that corresponds to the location of an input polygon. It is a useful tool for generating animations, or running analyses over long time periods.
This notebook shows you how to:
Use a polygon’s geometry to generate a
dc.load
queryMask the returned data with the polygon geometry (to remove unwanted pixels)
Plot a timeseries plot summarising satellite bands over time
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
Import Python packages that are used for the analysis.
[1]:
%matplotlib inline
import datacube
import geopandas as gpd
import odc.geo.xr
from odc.geo.geom import Geometry
import sys
sys.path.insert(1, "../Tools/")
from dea_tools.plotting import rgb
Connect to the datacube
Connect to the datacube so we can access DEA data. The app
parameter is a unique name for the analysis which is based on the notebook file name.
[2]:
dc = datacube.Datacube(app="Polygon_drill")
Analysis parameters
Update the following two parameters to set the polygon dataset and times used for the polygon drill:
polygon_to_drill
: The path containing the vector file to use for the polygon drill. If it’s a local file, then this parameter is the local path to that polygon. If it’s located online, then this is the path to the online location of the polygon.time_to_drill
: e.g.('2020-02', '2020-03')
. The time over which we want to run the polygon drill, entered as a tuple.
For this example, we will use a polygon for Capital Hill in Canberra extracted from the ACT Suburb/Locality Boundaries Administrative Boundaries © Geoscape Australia vector dataset, licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
[3]:
polygon_to_drill = (
"../Supplementary_data/Polygon_drill/act_localities_capitalhill.geojson"
)
time_to_drill = ("2020-01", "2020-03")
Load the vector file we want to use for the polygon drill
[4]:
# Read vector file
polygon_to_drill = gpd.read_file(polygon_to_drill)
# Check that the polygon loaded as expected. We'll just print the first 3 rows to check
polygon_to_drill.head(3)
[4]:
LC_PLY_PID | LOC_PID | DT_CREATE | LOC_NAME | LOC_CLASS | STATE | geometry | |
---|---|---|---|---|---|---|---|
0 | lcpb8f6c0999acd | loceae77f35464f | 2022-03-11 | Capital Hill | Gazetted Locality | ACT | MULTIPOLYGON (((149.12849 -35.31083, 149.12848... |
Query the datacube using the polygon we have loaded
Set up the dc.load
query
We need to grab the polygon from the vector file we want to use for the polygon drill. For this example, we’ll just grab the first polygon from the file using .iloc[0]
(Capital Hill in the ACT).
To use this polygon in datacube
, we need to first convert it to a odc.geo.geom.Geometry
object. This is the equivelent of a shapely
geometry, but with added Coordinate Reference System information (for more information, refer to the odc-geo documentation).
[5]:
# Select polygon
shapely_geometry = polygon_to_drill.iloc[0].geometry
# Convert to Geometry object with CRS information
geom = Geometry(geom=shapely_geometry, crs=polygon_to_drill.crs)
geom
[5]:
To set up the query, we need to set up several additional parameters:
geopolygon
: Here we input the geometry we want to use for the drill that we prepared in the cell abovetime
: Feed in thetime_to_drill
parameter we set earlieroutput_crs
: We need to specify the coordinate reference system (CRS) of the output. We’ll use Albers Equal Area projection for Australiaresolution
: You can choose the resolution of the output dataset. Since Landsat 8 is 30 m resolution, we’ll just use thatmeasurements
: Here is where you specify which bands you want to extract. We will just be plotting a true colour image, so we just need red, green and blue.
[6]:
query = {
"geopolygon": geom,
"time": time_to_drill,
"output_crs": "EPSG:3577",
"resolution": (-30, 30),
"measurements": ["nbart_red", "nbart_green", "nbart_blue"],
}
Use the query to extract data
Here we have chosen to load data from Landsat 8 by supplying product='ga_ls8c_ard_3'
, but this can be changed depending on your requirements.
We can verify that the polygon drill has loaded a time series of satellite data by checking the Dimensions
of the resulting xarray.Dataset
. In this example, we can see that the polygon drill has loaded 11 time steps (i.e. Dimensions: time: 11
):
[7]:
# Load data for our polygon and time period
ds = dc.load(product="ga_ls8c_ard_3", group_by="solar_day", **query)
# Check we have some data back with multiple timesteps
ds
[7]:
<xarray.Dataset> Dimensions: (time: 11, y: 33, x: 32) Coordinates: * time (time) datetime64[ns] 2020-01-07T23:56:40.592891 ... 2020-03... * y (y) float64 -3.96e+06 -3.96e+06 ... -3.961e+06 -3.961e+06 * x (x) float64 1.549e+06 1.549e+06 1.549e+06 ... 1.55e+06 1.55e+06 spatial_ref int32 3577 Data variables: nbart_red (time, y, x) int16 1043 1157 1222 1061 ... 2403 2743 3157 3047 nbart_green (time, y, x) int16 1014 1116 1150 1023 ... 2321 2650 2972 2935 nbart_blue (time, y, x) int16 944 1019 1029 932 ... 2201 2491 2862 2848 Attributes: crs: EPSG:3577 grid_mapping: spatial_ref
Plot original satellite data
To inspect the satellite data we have loaded using the pixel drill, we can plot an image for each timestep in the data:
[8]:
rgb(ds, col="time", vmin=0, vmax=2000)
Mask data using polygon
The data returned from our polygon drill contains data for the bounding box (i.e. extents) of the input polygon, not the actual shape of the polygon. To get rid of the parts of the image located outside the polygon, we need to mask our using the original polygon.
This can be achieved easily using the built-in .odc.mask()
method on our xarray
data, which will set all pixels outside of the mask to nan
to signify “nodata”:
[9]:
# Mask out all pixels outside of our polygon:
ds_masked = ds.odc.mask(poly=geom)
Plot masked data
When we plot this new masked dataset, we can see that the areas located outside of the polygon have now been masked out (i.e. set to nan
or white).
Note: Several of our images are also covered in cloud - for advice on applying a cloud mask to remove these cloudy pixels see the Masking data using the Fmask cloud mask notebook.
[10]:
rgb(ds_masked, col="time", vmin=0, vmax=2000)
/env/lib/python3.10/site-packages/matplotlib/cm.py:494: RuntimeWarning: invalid value encountered in cast
xx = (xx * 255).astype(np.uint8)
Summarise results over time into a timeseries plot
Now that we have produced a masked satellite dataset, we can summarise values across time to plot it as a polygon drill timeseries. This can be useful for summarising environmental change over time (e.g. drying or wetting of the landscape, or the growth or less of vegetation cover).
[11]:
# Average of the red band across time
red_timeseries = ds_masked.nbart_red.mean(dim=["x", "y"])
red_timeseries.plot()
[11]:
[<matplotlib.lines.Line2D at 0x7fae8e2351e0>]
We can also summarise all of our bands across time as a pandas.Dataframe()
, and plot them on a single graph:
[12]:
# Plot all bands at once
timeseries_df = ds_masked.mean(dim=["x", "y"]).drop("spatial_ref").to_dataframe()
display(timeseries_df)
timeseries_df.plot()
nbart_red | nbart_green | nbart_blue | |
---|---|---|---|
time | |||
2020-01-07 23:56:40.592891 | 1243.424683 | 1170.858887 | 1051.900391 |
2020-01-16 23:50:27.927367 | 3740.156494 | 3802.088867 | 3901.860107 |
2020-01-23 23:56:36.765939 | 1093.314331 | 936.836304 | 692.188599 |
2020-02-01 23:50:22.661832 | 8965.114258 | 8898.247070 | 8858.674805 |
2020-02-08 23:56:31.774346 | 8857.622070 | 8714.263672 | 8629.692383 |
2020-02-17 23:50:19.161045 | 1192.411621 | 1133.207642 | 1008.849365 |
2020-02-24 23:56:28.025939 | 821.346375 | 712.434143 | 492.763947 |
2020-03-04 23:50:13.948244 | 8824.697266 | 8708.142578 | 8736.099609 |
2020-03-11 23:56:21.709961 | 2946.153076 | 2853.225342 | 2732.492188 |
2020-03-20 23:50:06.289323 | 894.078308 | 837.536194 | 617.673767 |
2020-03-27 23:56:12.950767 | 3733.026123 | 3590.622803 | 3412.762695 |
[12]:
<Axes: xlabel='time'>
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
GitHub.
Last modified: February 2024
Compatible datacube version:
[13]:
print(datacube.__version__)
1.8.17
Tags
Tags: NCI compatible, sandbox compatible, landsat 8, rgb, polygon drill, shapefile, GeoPandas, odc.geo.geom.geometry, query, masking