dea_tools.spatial

Tools for spatially manipulating Digital Earth Australia data.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Discord chat (https://discord.com/invite/4hhBQVas5U) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, file one on GitHub: GeoscienceAustralia/dea-notebooks#new

Last modified: July 2024

Functions

`add_geobox`(ds[, crs])	Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.
`contours_to_arrays`(args, *kwargs)
`extract_vertices`(gdf[, explode, ignore_index])	Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.
`hillshade`(dem, elevation, azimuth[, ...])	Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimith.
`idw`(input_z, input_x, input_y, output_x, ...)	Perform Inverse Distance Weighting (IDW) interpolation.
`interpolate_2d`(args, *kwargs)
`largest_region`(bool_array, **kwargs)	Takes a boolean array and identifies the largest contiguous region of connected True values.
`points_on_line`(gdf, index[, distance])	Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.
`reverse_geocode`(coords[, site_classes, ...])	Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:
`subpixel_contours`(da[, z_values, crs, ...])	Extracts multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an `xarray` timeseries).
`sun_angles`(dc, query)	For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with 'sun_elevation' and 'sun_azimuth' variables.
`transform_geojson_wgs_to_epsg`(geojson, EPSG)	Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG
`xr_interpolate`(ds, gdf[, columns, method, ...])	This function takes a `geopandas.GeoDataFrame` points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset.
`xr_rasterize`(gdf, da[, attribute_col, crs, ...])	Rasterizes a vector `geopandas.GeoDataFrame` into a raster `xarray.DataArray`.
`xr_vectorize`(da[, attribute_col, crs, ...])	Vectorises a raster `xarray.DataArray` into a vector `geopandas.GeoDataFrame`.
`zonal_stats_parallel`(shp, raster, ...)	Summarizing raster datasets based on vector geometries in parallel.

dea_tools.spatial.add_geobox(ds, crs=None)[source]

Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.

If ds is missing a Coordinate Reference System (CRS), this can be supplied using the crs param.

Parameters:

ds (xarray.Dataset or xarray.DataArray) – Input xarray object that needs to be checked for spatial information.
crs (str, optional) – Coordinate Reference System (CRS) information for the input ds array. If ds already has a CRS, then crs is not required. Default is None.

Returns:

The input xarray object with added .odc.x attributes to access spatial information.

Return type:

xarray.Dataset or xarray.DataArray

dea_tools.spatial.extract_vertices(gdf, explode=True, ignore_index=True)[source]

Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.

Parameters:

gdf (geopandas.GeoDataFrame) – Input GeoDataFrame containing geometries to be converted.
explode (bool, optional) – By default, MultiPoint geometries will be exploded into individual Points. If False, geometries will be returned as MultiPoints.
ignore_index (bool, optional) – If True and explode=True, the resulting GeoDataFrame will have a new index.

Returns:

Updated GeoDataFrame with geometries converted to Points or MultiPoints.

Return type:

geopandas.GeoDataFrame

dea_tools.spatial.hillshade(dem, elevation, azimuth, vert_exag=1, dx=30, dy=30)[source]

Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimith.

Parameters:

dem (numpy.array) – A 2D Digital Elevation Model array.
elevation (int or float) – Sun elevation (0-90, degrees up from horizontal).
azimith (int or float) – Sun azimuth (0-360, degrees clockwise from north).
vert_exag (int or float, optional) – The amount to exaggerate the elevation values by when calculating illumination. This can be used either to correct for differences in units between the x-y coordinate system and the elevation coordinate system (e.g. decimal degrees vs. meters) or to exaggerate or de-emphasize topographic effects.
dx (int or float, optional) – The x-spacing (columns) of the input DEM. This is typically the spatial resolution of the DEM.
dy (int or float, optional) – The y-spacing (rows) of the input input DEM. This is typically the spatial resolution of the DEM.

Returns:

hs – A 2D hillshade array with values between 0-1, where 0 is completely in shadow and 1 is completely illuminated.

Return type:

numpy.array

dea_tools.spatial.idw(input_z, input_x, input_y, output_x, output_y, p=1, k=10, max_dist=None, k_min=1, epsilon=1e-12)[source]

Perform Inverse Distance Weighting (IDW) interpolation.

This function performs fast IDW interpolation by creating a KDTree from the input coordinates then uses it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights descreasing with increasing distance.

Code inspired by: DahnJ/REM-xarray

Parameters:

input_z (array-like) – Array of values at the input points. This can be either a 1-dimensional array, or a 2-dimensional array where each column (axis=1) represents a different set of values to be interpolated.
input_x (array-like) – Array of x-coordinates of the input points.
input_y (array-like) – Array of y-coordinates of the input points.
output_x (array-like) – Array of x-coordinates where the interpolation is to be computed.
output_y (array-like) – Array of y-coordinates where the interpolation is to be computed.
p (int or float, optional) – Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of p will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.
k (int, optional) – Number of nearest neighbors to use for interpolation. k=1 is equivalent to “nearest” neighbour interpolation. Defaults to 10.
max_dist (int or float, optional) – Restrict neighbouring points to less than this distance. By default, no distance limit is applied.
k_min (int, optional) – If max_dist is provided, some points may end up with less than k nearest neighbours, potentially producing less reliable interpolations. Set k_min to set any points with less than k_min neighbours to NaN. Defaults to 1.
epsilon (float, optional) – Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e-12.

Returns:

interp_values – Interpolated values at the output coordinates. If input_z is 1-dimensional, interp_values will also be 1-dimensional. If input_z is 2-dimensional, interp_values will have the same number of rows as input_z, with each column (axis=1) representing interpolated values for one set of input data.

Return type:

numpy.ndarray

Examples

>>> input_z = [1, 2, 3, 4, 5]
>>> input_x = [0, 1, 2, 3, 4]
>>> input_y = [0, 1, 2, 3, 4]
>>> output_x = [0.5, 1.5, 2.5]
>>> output_y = [0.5, 1.5, 2.5]
>>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
array([1.5, 2.5, 3.5])

dea_tools.spatial.largest_region(bool_array, **kwargs)[source]

Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.

Parameters:

bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells
**kwargs – Optional keyword arguments to pass to measure.label

Returns:

largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.

Return type:

boolean array

dea_tools.spatial.points_on_line(gdf, index, distance=30)[source]

Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.

Parameters:

gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame containing line features with an index and CRS.
index (string or int) – An value giving the index of the line to generate points along
distance (integer or float, optional) – A number giving the interval at which to generate points along the line feature. Defaults to 30, which will generate a point at every 30 metres along the line.

Returns:

points_gdf – A geopandas.GeoDataFrame containing point features at every distance along the selected line.

Return type:

geopandas.GeoDataFrame

dea_tools.spatial.reverse_geocode(coords, site_classes=None, state_classes=None)[source]

Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:

Site, State

E.g.: reverse_geocode(coords=(-35.282163, 149.128835))

‘Canberra, Australian Capital Territory’

Parameters:

coords (tuple of floats) – A tuple of (latitude, longitude) coordinates used to perform the reverse geocode.
site_classes (list of strings, optional) – A list of strings used to define the site part of the plain text location description. Because the contents of the geocoded address can vary greatly depending on location, these strings are tested against the address one by one until a match is made. Defaults to: ['city', 'town', 'village', 'suburb', 'hamlet', 'county', 'municipality'].
state_classes (list of strings, optional) – A list of strings used to define the state part of the plain text location description. These strings are tested against the address one by one until a match is made. Defaults to: ['state', 'territory'].

Returns:

If a valid geocoded address is found, a plain text location description will be returned (‘Site, State’)
If no valid address is found, formatted coordinates will be returned instead (‘XX.XX S, XX.XX E’)

dea_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', time_format='%Y-%m-%d', errors='ignore', verbose=True)[source]

Extracts multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).

Contours are returned as a geopandas.GeoDataFrame with one row per z-value or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.

Last modified: May 2023

Parameters:

da (xarray DataArray) – A two-dimensional or multi-dimensional array from which contours are extracted. If a two-dimensional array is provided, the analysis will run in ‘single array, multiple z-values’ mode which allows you to specify multiple z_values to be extracted. If a multi-dimensional array is provided, the analysis will run in ‘single z-value, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.
z_values (int, float or list of ints, floats) – An individual z-value or list of multiple z-values to extract from the array. If operating in ‘single z-value, multiple arrays’ mode specify only a single z-value.
crs (string or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
output_path (string, optional) – The path and filename for the output shapefile.
attribute_df (pandas.Dataframe, optional) – A pandas.Dataframe containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘multiple z-value, single array’ mode), or the same number of rows as the number of arrays along the dim dimension (‘single z-value, multiple arrays mode’).
min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.
dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single z-value, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.
time_format (string, optional) – The format used to convert numpy.datetime64 values to strings if applied to data with a “time” dimension. Defaults to “%Y-%m-%d”.
errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.
verbose (bool, optional) – Print debugging messages. Default is True.

Returns:

output_gdf – A geopandas geodataframe object with one feature per z-value (‘single array, multiple z-values’ mode), or one row per array along the dimension specified by the dim parameter (‘single z-value, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.

Return type:

geopandas geodataframe

dea_tools.spatial.sun_angles(dc, query)[source]

For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with ‘sun_elevation’ and ‘sun_azimuth’ variables.

Parameters:

dcdatacube.Datacube object: Datacube instance used to load data.
querydict: A dictionary containing query parameters used to identify satellite observations and load metadata.

Returns:

sun_angles_dsxarray.Dataset: An xarray.set containing a ‘sun_elevation’ and ‘sun_azimuth’ variables.

dea_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)[source]

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG

Parameters:

geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates
EPSG (int) – numeric code for the EPSG coordinate referecnce system to transform into

Returns:

transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS

Return type:

dict

dea_tools.spatial.xr_interpolate(ds, gdf, columns=None, method='linear', factor=1, k=10, crs=None, **kwargs)[source]

This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset. This can be useful for producing smooth raster surfaces from point data to compare directly against satellite data.

Supported interpolation methods include “linear”, “nearest” and “cubic” (using scipy.interpolate.griddata), “rbf” (using scipy.interpolate.Rbf), and “idw” (Inverse Distance Weighted interpolation using k nearest neighbours). Each numeric column will be returned as a variable in the output xarray.Dataset.

Last modified: March 2024

Parameters:

ds (xarray.DataArray or xarray.Dataset) – A two-dimensional or multi-dimensional array whose spatial extent will be used to interpolate point data into.
gdf (geopandas.GeoDataFrame) – A dataset of spatial points including at least one numeric column. By default all numeric columns in this dataset will be spatially interpolated into the extent of ds; specific columns can be selected using columns. An warning will be raised if the points in gdf do not overlap with the extent of ds.
columns (list, optional) – An optional list of specific columns in gdf to run the interpolation on. These must all be of numeric data types.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for “linear”, “nearest” and “cubic” methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (“rbf”), or Inverse Distance Weighted interpolation (“idw”). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, before up-sampling the array back to the original dimensions of the data as a final step. For example, factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This will be significantly faster than interpolating at full resolution, but will potentially produce less accurate results.
k (int, optional) – The number of nearest neighbours used to calculate weightings if method is “idw”. Defaults to 10; setting k=1 is equivalent to “nearest” interpolation.
crs (string or CRS object, optional) – If ds’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is “linear”, “nearest” or “cubic”), scipy.interpolate.Rbf (is method is “rbf”), or idw (if method is “idw”).

Returns:

interpolated_ds – An xarray.Dataset containing interpolated data with the same X and Y coordinate pixel grid as ds, and a data variable for each numeric column in gdf.

Return type:

xarray.Dataset

dea_tools.spatial.xr_rasterize(gdf, da, attribute_col=None, crs=None, name=None, output_path=None, verbose=True, **rasterio_kwargs)[source]

Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.

Parameters:

gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data you want to rasterise.
da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to define the array that gdf is rasterized into. It effectively provides a spatial template.
attribute_col (string, optional) – Name of the attribute column in gdf containing values for each vector feature that will be rasterized. If None, the output will be a boolean array of 1’s and 0’s.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
name (str, optional) – An optional name used for the output xarray.DataArray.
output_path (string, optional) – Provide an optional string file path to export the rasterized data as a GeoTIFF file.
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.rasterize. Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.

Returns:

da_rasterized – The rasterized vector data.

Return type:

xarray.DataArray

dea_tools.spatial.xr_vectorize(da, attribute_col=None, crs=None, dtype='float32', output_path=None, verbose=True, **rasterio_kwargs)[source]

Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.

Parameters:

da (xarray.DataArray) – The input xarray.DataArray data to vectorise.
attribute_col (str, optional) – Name of the attribute column in the resulting geopandas.GeoDataFrame. Values from da converted to polygons will be assigned to this column. If None, the column name will default to ‘attribute’.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
dtype (str, optional) – Data type of must be one of int16, int32, uint8, uint16, or float32
output_path (string, optional) – Provide an optional string file path to export the vectorised data to file. Supports any vector file formats supported by geopandas.GeoDataFrame.to_file().
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.shapes. For example, “mask” and “connectivity”.

Returns:

gdf

Return type:

geopandas.GeoDataFrame

dea_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)[source]

Summarizing raster datasets based on vector geometries in parallel. Each cpu recieves an equal chunk of the dataset. Utilizes the perrygeo/rasterstats package.

Parameters:

shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated
raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).
statistics (list) – list of statistics to calculate. e.g. [‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’]
out_shp (str) – Path to export shapefile containing zonal statistics.
ncpus (int) – number of cores to parallelize the operations over.
kwargs – Any other keyword arguments to rasterstats.zonal_stats() See perrygeo/python-rasterstats for all options

Return type:

Exports a shapefile to disk containing the zonal statistics requested