dea_tools.spatial
Tools for spatially manipulating Digital Earth Australia data.
License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat (https://discord.com/invite/4hhBQVas5U) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=opendatacube) using the opendatacube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/opendatacube).
If you would like to report an issue with this script, file one on GitHub: GeoscienceAustralia/deanotebooks#new
Last modified: July 2024
Functions

Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo. 

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex. 

Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries. 

Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimith. 

Perform Inverse Distance Weighting (IDW) interpolation. 

This function takes points with X, Y and Z coordinates, and interpolates Zvalues across the extent of an existing xarray dataset. 

Takes a boolean array and identifies the largest contiguous region of connected True values. 

Generates evenlyspaced point features along a specific line feature in a geopandas.GeoDataFrame. Parameters:  gdf : geopandas.GeoDataFrame A geopandas.GeoDataFrame containing line features with an index and CRS. index : string or int An value giving the index of the line to generate points along distance : integer or float, optional A number giving the interval at which to generate points along the line feature. Defaults to 30, which will generate a point at every 30 metres along the line. Returns:  points_gdf : geopandas.GeoDataFrame A geopandas.GeoDataFrame containing point features at every distance along the selected line. 

Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plaintext description of the location in the form: 

Uses skimage.measure.find_contours to extract multiple zvalue contour lines from a twodimensional array (e.g. multiple elevations from a single DEM), or one zvalue for each array along a specified dimension of a multidimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries). 

For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with 'sun_elevation' and 'sun_azimuth' variables. 

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG 

This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset. 

Rasterizes a vector 

Vectorises a raster 

Summarizing raster datasets based on vector geometries in parallel. 
 dea_tools.spatial.add_geobox(ds, crs=None)[source]
Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.
If ds is missing a Coordinate Reference System (CRS), this can be supplied using the crs param.
 Parameters:
ds (xarray.Dataset or xarray.DataArray) – Input xarray object that needs to be checked for spatial information.
crs (str, optional) – Coordinate Reference System (CRS) information for the input ds array. If ds already has a CRS, then crs is not required. Default is None.
 Returns:
The input xarray object with added .odc.x attributes to access spatial information.
 Return type:
xarray.Dataset or xarray.DataArray
 dea_tools.spatial.contours_to_arrays(gdf, col)[source]
This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex. This data can then be used as an input to interpolation procedures (e.g. using a function like interpolate_2d.
Last modified: October 2021
 Parameters:
gdf (Geopandas GeoDataFrame) – A GeoPandas GeoDataFrame of lines to convert into point coordinates.
col (str) – A string giving the name of the GeoDataFrame field to use as Zvalues.
 Returns:
A numpy array with three columns giving the X, Y and Z coordinates
of each vertex in the input GeoDataFrame.
 dea_tools.spatial.extract_vertices(gdf, explode=True, ignore_index=True)[source]
Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.
 Parameters:
gdf (geopandas.GeoDataFrame) – Input GeoDataFrame containing geometries to be converted.
explode (bool, optional) – By default, MultiPoint geometries will be exploded into individual Points. If False, geometries will be returned as MultiPoints.
ignore_index (bool, optional) – If True and explode=True, the resulting GeoDataFrame will have a new index.
 Returns:
Updated GeoDataFrame with geometries converted to Points or MultiPoints.
 Return type:
geopandas.GeoDataFrame
 dea_tools.spatial.hillshade(dem, elevation, azimuth, vert_exag=1, dx=30, dy=30)[source]
Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimith.
Parameters:
 demnumpy.array
A 2D Digital Elevation Model array.
 elevationint or float
Sun elevation (090, degrees up from horizontal).
 azimithint or float
Sun azimuth (0360, degrees clockwise from north).
 vert_exagint or float, optional
The amount to exaggerate the elevation values by when calculating illumination. This can be used either to correct for differences in units between the xy coordinate system and the elevation coordinate system (e.g. decimal degrees vs. meters) or to exaggerate or deemphasize topographic effects.
 dxint or float, optional
The xspacing (columns) of the input DEM. This is typically the spatial resolution of the DEM.
 dyint or float, optional
The yspacing (rows) of the input input DEM. This is typically the spatial resolution of the DEM.
Returns:
 hsnumpy.array
A 2D hillshade array with values between 01, where 0 is completely in shadow and 1 is completely illuminated.
 dea_tools.spatial.idw(input_z, input_x, input_y, output_x, output_y, p=1, k=10, max_dist=None, k_min=1, epsilon=1e12)[source]
Perform Inverse Distance Weighting (IDW) interpolation.
This function performs fast IDW interpolation by creating a KDTree from the input coordinates then uses it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights descreasing with increasing distance.
Code inspired by: DahnJ/REMxarray
 Parameters:
input_z (arraylike) – Array of values at the input points. This can be either a 1dimensional array, or a 2dimensional array where each column (axis=1) represents a different set of values to be interpolated.
input_x (arraylike) – Array of xcoordinates of the input points.
input_y (arraylike) – Array of ycoordinates of the input points.
output_x (arraylike) – Array of xcoordinates where the interpolation is to be computed.
output_y (arraylike) – Array of ycoordinates where the interpolation is to be computed.
p (int or float, optional) – Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of p will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.
k (int, optional) – Number of nearest neighbors to use for interpolation. k=1 is equivalent to “nearest” neighbour interpolation. Defaults to 10.
max_dist (int or float, optional) – Restrict neighbouring points to less than this distance. By default, no distance limit is applied.
k_min (int, optional) – If max_dist is provided, some points may end up with less than k nearest neighbours, potentially producing less reliable interpolations. Set k_min to set any points with less than k_min neighbours to NaN. Defaults to 1.
epsilon (float, optional) – Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e12.
 Returns:
interp_values – Interpolated values at the output coordinates. If input_z is 1dimensional, interp_values will also be 1dimensional. If input_z is 2dimensional, interp_values will have the same number of rows as input_z, with each column (axis=1) representing interpolated values for one set of input data.
 Return type:
numpy.ndarray
Examples
>>> input_z = [1, 2, 3, 4, 5] >>> input_x = [0, 1, 2, 3, 4] >>> input_y = [0, 1, 2, 3, 4] >>> output_x = [0.5, 1.5, 2.5] >>> output_y = [0.5, 1.5, 2.5] >>> idw(input_z, input_x, input_y, output_x, output_y, k=2) array([1.5, 2.5, 3.5])
 dea_tools.spatial.interpolate_2d(ds, x_coords, y_coords, z_coords, method='linear', factor=1, verbose=False, **kwargs)[source]
This function takes points with X, Y and Z coordinates, and interpolates Zvalues across the extent of an existing xarray dataset. This can be useful for producing smooth surfaces from point data that can be compared directly against satellite data derived from an OpenDataCube query.
Supported interpolation methods include ‘linear’, ‘nearest’ and ‘cubic (using scipy.interpolate.griddata), and ‘rbf’ (using scipy.interpolate.Rbf).
NOTE: This function is deprecated and will be retired in a future release. Please use xr_interpolate instead.”
Last modified: February 2020
 Parameters:
ds (xarray DataArray or Dataset) – A twodimensional or multidimensional array from which x and y dimensions will be copied and used for the area in which to interpolate point data.
x_coords (numpy array) – Arrays containing X and Y coordinates for all points (e.g. longitudes and latitudes).
y_coords (numpy array) – Arrays containing X and Y coordinates for all points (e.g. longitudes and latitudes).
z_coords (numpy array) – An array containing Z coordinates for all points (e.g. elevations). These are the values you wish to interpolate between.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for ‘linear’, ‘nearest’ and ‘cubic’ methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (‘rbf’). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, then upsample this array back to the original dimensions of the data as a final step. For example, setting factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This approach will be significantly faster than interpolating at full resolution, but will potentially produce less accurate or reliable results.
verbose (bool, optional) – Print debugging messages. Default False.
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is ‘linear’, ‘nearest’ or ‘cubic’), or scipy.interpolate.Rbf (is method is ‘rbf’).
 Returns:
interp_2d_array – An xarray DataArray containing with x and y coordinates copied from ds_array, and Zvalues interpolated from the points data.
 Return type:
xarray DataArray
 dea_tools.spatial.largest_region(bool_array, **kwargs)[source]
Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.
 Parameters:
bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells
**kwargs – Optional keyword arguments to pass to measure.label
 Returns:
largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.
 Return type:
boolean array
 dea_tools.spatial.points_on_line(gdf, index, distance=30)[source]
Generates evenlyspaced point features along a specific line feature in a geopandas.GeoDataFrame. Parameters: ———– gdf : geopandas.GeoDataFrame
A geopandas.GeoDataFrame containing line features with an index and CRS.
 indexstring or int
An value giving the index of the line to generate points along
 distanceinteger or float, optional
A number giving the interval at which to generate points along the line feature. Defaults to 30, which will generate a point at every 30 metres along the line.
Returns:
 points_gdfgeopandas.GeoDataFrame
A geopandas.GeoDataFrame containing point features at every distance along the selected line.
 dea_tools.spatial.reverse_geocode(coords, site_classes=None, state_classes=None)[source]
Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plaintext description of the location in the form:
Site, State
E.g.: reverse_geocode(coords=(35.282163, 149.128835))
‘Canberra, Australian Capital Territory’
 Parameters:
coords (tuple of floats) – A tuple of (latitude, longitude) coordinates used to perform the reverse geocode.
site_classes (list of strings, optional) –
A list of strings used to define the site part of the plain text location description. Because the contents of the geocoded address can vary greatly depending on location, these strings are tested against the address one by one until a match is made. Defaults to: `[‘city’, ‘town’, ‘village’, ‘suburb’, ‘hamlet’,
’county’, ‘municipality’]`.
state_classes (list of strings, optional) – A list of strings used to define the state part of the plain text location description. These strings are tested against the address one by one until a match is made. Defaults to: [‘state’, ‘territory’].
 Returns:
If a valid geocoded address is found, a plain text location
description will be returned – ‘Site, State’
If no valid address is found, formatted coordinates will be returned
instead – ‘XX.XX S, XX.XX E’
 dea_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', time_format='%Y%m%d', errors='ignore', verbose=True)[source]
Uses skimage.measure.find_contours to extract multiple zvalue contour lines from a twodimensional array (e.g. multiple elevations from a single DEM), or one zvalue for each array along a specified dimension of a multidimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).
Contours are returned as a geopandas.GeoDataFrame with one row per zvalue or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.
Last modified: May 2023
 Parameters:
da (xarray DataArray) – A twodimensional or multidimensional array from which contours are extracted. If a twodimensional array is provided, the analysis will run in ‘single array, multiple zvalues’ mode which allows you to specify multiple z_values to be extracted. If a multidimensional array is provided, the analysis will run in ‘single zvalue, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.
z_values (int, float or list of ints, floats) – An individual zvalue or list of multiple zvalues to extract from the array. If operating in ‘single zvalue, multiple arrays’ mode specify only a single zvalue.
crs (string or CRS object, optional) – If
da
’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).output_path (string, optional) – The path and filename for the output shapefile.
attribute_df (pandas.Dataframe, optional) – A pandas.Dataframe containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘multiple zvalue, single array’ mode), or the same number of rows as the number of arrays along the dim dimension (‘single zvalue, multiple arrays mode’).
min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.
dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single zvalue, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.
time_format (string, optional) – The format used to convert numpy.datetime64 values to strings if applied to data with a “time” dimension. Defaults to “%Y%m%d”.
errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.
verbose (bool, optional) – Print debugging messages. Default is True.
 Returns:
output_gdf – A geopandas geodataframe object with one feature per zvalue (‘single array, multiple zvalues’ mode), or one row per array along the dimension specified by the dim parameter (‘single zvalue, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.
 Return type:
geopandas geodataframe
 dea_tools.spatial.sun_angles(dc, query)[source]
For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with ‘sun_elevation’ and ‘sun_azimuth’ variables.
Parameters:
 dcdatacube.Datacube object
Datacube instance used to load data.
 querydict
A dictionary containing query parameters used to identify satellite observations and load metadata.
Returns:
 sun_angles_dsxarray.Dataset
An xarray.set containing a ‘sun_elevation’ and ‘sun_azimuth’ variables.
 dea_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)[source]
Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG
 Parameters:
geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates
EPSG (int) – numeric code for the EPSG coordinate referecnce system to transform into
 Returns:
transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS
 Return type:
dict
 dea_tools.spatial.xr_interpolate(ds, gdf, columns=None, method='linear', factor=1, k=10, crs=None, **kwargs)[source]
This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset. This can be useful for producing smooth raster surfaces from point data to compare directly against satellite data.
Supported interpolation methods include “linear”, “nearest” and “cubic” (using scipy.interpolate.griddata), “rbf” (using scipy.interpolate.Rbf), and “idw” (Inverse Distance Weighted interpolation using k nearest neighbours). Each numeric column will be returned as a variable in the output xarray.Dataset.
Last modified: March 2024
 Parameters:
ds (xarray.DataArray or xarray.Dataset) – A twodimensional or multidimensional array whose spatial extent will be used to interpolate point data into.
gdf (geopandas.GeoDataFrame) – A dataset of spatial points including at least one numeric column. By default all numeric columns in this dataset will be spatially interpolated into the extent of ds; specific columns can be selected using columns. An warning will be raised if the points in gdf do not overlap with the extent of ds.
columns (list, optional) – An optional list of specific columns in gdf` to run the interpolation on. These must all be of numeric data types.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for “linear”, “nearest” and “cubic” methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (“rbf”), or Inverse Distance Weighted interpolation (“idw”). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, before upsampling the array back to the original dimensions of the data as a final step. For example, factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This will be significantly faster than interpolating at full resolution, but will potentially produce less accurate results.
k (int, optional) – The number of nearest neighbours used to calculate weightings if method is “idw”. Defaults to 10; setting k=1 is equivalent to “nearest” interpolation.
crs (string or CRS object, optional) – If ds’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is “linear”, “nearest” or “cubic”), scipy.interpolate.Rbf (is method is “rbf”), or idw (if method is “idw”).
 Returns:
interpolated_ds – An xarray.Dataset containing interpolated data with the same X and Y coordinate pixel grid as ds, and a data variable for each numeric column in gdf.
 Return type:
xarray.Dataset
 dea_tools.spatial.xr_rasterize(gdf, da, attribute_col=None, crs=None, name=None, output_path=None, verbose=True, **rasterio_kwargs)[source]
Rasterizes a vector
geopandas.GeoDataFrame
into a rasterxarray.DataArray
. Parameters:
gdf (geopandas.GeoDataFrame) – A
geopandas.GeoDataFrame
object containing the vector data you want to rasterise.da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to define the array that
gdf
is rasterized into. It effectively provides a spatial template.attribute_col (string, optional) – Name of the attribute column in
gdf
containing values for each vector feature that will be rasterized. If None, the output will be a boolean array of 1’s and 0’s.crs (str or CRS object, optional) – If
da
’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).name (str, optional) – An optional name used for the output ``xarray.DataArray`.
output_path (string, optional) – Provide an optional string file path to export the rasterized data as a GeoTIFF file.
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to
rasterio.features.rasterize
. Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.
 Returns:
da_rasterized – The rasterized vector data.
 Return type:
xarray.DataArray
 dea_tools.spatial.xr_vectorize(da, attribute_col=None, crs=None, dtype='float32', output_path=None, verbose=True, **rasterio_kwargs)[source]
Vectorises a raster
xarray.DataArray
into a vectorgeopandas.GeoDataFrame
. Parameters:
da (xarray.DataArray) – The input
xarray.DataArray
data to vectorise.attribute_col (str, optional) – Name of the attribute column in the resulting
geopandas.GeoDataFrame
. Values fromda
converted to polygons will be assigned to this column. If None, the column name will default to ‘attribute’.crs (str or CRS object, optional) – If
da
’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).dtype (str, optional) – Data type of must be one of int16, int32, uint8, uint16, or float32
output_path (string, optional) – Provide an optional string file path to export the vectorised data to file. Supports any vector file formats supported by
geopandas.GeoDataFrame.to_file()
.verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to
rasterio.features.shapes
. Can include mask and connectivity.
 Returns:
gdf
 Return type:
geopandas.GeoDataFrame
 dea_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)[source]
Summarizing raster datasets based on vector geometries in parallel. Each cpu recieves an equal chunk of the dataset. Utilizes the perrygeo/rasterstats package.
 Parameters:
shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated
raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).
statistics (list) –
 list of statistics to calculate. e.g.
[‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’]
out_shp (str) – Path to export shapefile containing zonal statistics.
ncpus (int) – number of cores to parallelize the operations over.
kwargs – Any other keyword arguments to rasterstats.zonal_stats() See perrygeo/pythonrasterstats for all options
 Return type:
Exports a shapefile to disk containing the zonal statistics requested