dea_tools.spatial

Tools for spatially manipulating Digital Earth Australia data.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel (http://slack.opendatacube.org/) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, file one on GitHub: GeoscienceAustralia/dea-notebooks#new

Last modified: July 2024

Functions

`add_geobox`(ds[, crs])	Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.
`contours_to_arrays`(gdf, col)	This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex.
`extract_vertices`(gdf[, explode, ignore_index])	Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.
`hillshade`(dem, elevation, azimuth[, ...])	Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimuth.
`idw`(input_z, input_x, input_y, output_x, ...)	Perform Inverse Distance Weighting (IDW) interpolation.
`interpolate_2d`(ds, x_coords, y_coords, z_coords)	This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset.
`largest_region`(bool_array, **kwargs)	Takes a boolean array and identifies the largest contiguous region of connected True values.
`points_on_line`(gdf, index[, distance])	Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.
`reverse_geocode`(coords[, site_classes, ...])	Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:
`subpixel_contours`(da[, z_values, crs, ...])	Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).
`sun_angles`(dc, query)	For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with 'sun_elevation' and 'sun_azimuth' variables.
`transform_geojson_wgs_to_epsg`(geojson, EPSG)	Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG
`xr_interpolate`(ds, gdf[, columns, method, ...])	This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset.
`xr_rasterize`(gdf, da[, attribute_col, crs, ...])	Rasterizes a vector `geopandas.GeoDataFrame` into a raster `xarray.DataArray`.
`xr_vectorize`(da[, attribute_col, crs, ...])	Vectorises a raster `xarray.DataArray` into a vector `geopandas.GeoDataFrame`.
`zonal_stats_parallel`(shp, raster, ...)	Summarizing raster datasets based on vector geometries in parallel.

dea_tools.spatial.add_geobox(ds, crs=None)[source]

Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.

If ds is missing a Coordinate Reference System (CRS), this can be supplied using the crs param.

Parameters:

ds (xarray.Dataset or xarray.DataArray) – Input xarray object that needs to be checked for spatial information.
crs (str, optional) – Coordinate Reference System (CRS) information for the input ds array. If ds already has a CRS, then crs is not required. Default is None.

Returns:

The input xarray object with added .odc.* attributes to access spatial information.

Return type:

xarray.Dataset or xarray.DataArray

dea_tools.spatial.contours_to_arrays(gdf, col)[source]

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex. This data can then be used as an input to interpolation procedures (e.g. using a function like interpolate_2d).

Last modified: October 2021

Parameters:

gdf (Geopandas GeoDataFrame) – A GeoPandas GeoDataFrame of lines to convert into point coordinates.
col (str) – A string giving the name of the GeoDataFrame field to use as Z-values.

Returns:

A numpy array with three columns giving the X, Y and Z coordinates of each vertex in the input GeoDataFrame.
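
The conversion described above can be illustrated without GeoPandas. The following is a minimal pure-Python sketch, not the library's implementation: it assumes each contour is given as a hypothetical (vertices, z_value) pair rather than a GeoDataFrame row, and `lines_to_xyz` is an illustrative name only.

```python
def lines_to_xyz(lines):
    """Flatten (vertices, z) pairs into [x, y, z] rows, one row per vertex."""
    rows = []
    for vertices, z in lines:
        for x, y in vertices:
            # Every vertex on a contour shares that contour's Z-value
            rows.append([x, y, z])
    return rows


# Two hypothetical elevation contours at 10 m and 20 m
contours = [
    ([(0, 0), (5, 5), (10, 0)], 10),
    ([(2, 0), (5, 2)], 20),
]
xyz = lines_to_xyz(contours)
print(len(xyz), xyz[0], xyz[-1])
```

Each row can then be split into separate X, Y and Z sequences for use by an interpolation routine.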

dea_tools.spatial.extract_vertices(gdf, explode=True, ignore_index=True)[source]

Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.

Parameters:

gdf (geopandas.GeoDataFrame) – Input GeoDataFrame containing geometries to be converted.
explode (bool, optional) – By default, MultiPoint geometries will be exploded into individual Points. If False, geometries will be returned as MultiPoints.
ignore_index (bool, optional) – If True and explode=True, the resulting GeoDataFrame will have a new index.

Returns:

Updated GeoDataFrame with geometries converted to Points or MultiPoints.

Return type:

geopandas.GeoDataFrame

dea_tools.spatial.hillshade(dem, elevation, azimuth, vert_exag=1, dx=30, dy=30)[source]

Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimuth.

Parameters:

dem (numpy.array) – A 2D Digital Elevation Model array.
elevation (int or float) – Sun elevation (0-90, degrees up from horizontal).
azimuth (int or float) – Sun azimuth (0-360, degrees clockwise from north).
vert_exag (int or float, optional) – The amount to exaggerate the elevation values by when calculating illumination. This can be used either to correct for differences in units between the x-y coordinate system and the elevation coordinate system (e.g. decimal degrees vs. meters) or to exaggerate or de-emphasize topographic effects.
dx (int or float, optional) – The x-spacing (columns) of the input DEM. This is typically the spatial resolution of the DEM.
dy (int or float, optional) – The y-spacing (rows) of the input DEM. This is typically the spatial resolution of the DEM.

Returns:

hs (numpy.array) – A 2D hillshade array with values between 0-1, where 0 is completely in shadow and 1 is completely illuminated.
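
To show how the sun elevation and azimuth parameters relate to illumination, here is a rough sketch of simple Lambertian shading for a single planar surface. It is an illustration under stated assumptions, not necessarily how hillshade computes its output: the function names are hypothetical, x is taken as east and y as north, and slopes are assumed to be already expressed as rise over run (i.e. after applying any vertical exaggeration and pixel spacing).

```python
import math


def sun_vector(elevation, azimuth):
    """Unit vector pointing towards the sun (x=east, y=north, z=up)."""
    el = math.radians(elevation)
    az = math.radians(azimuth)  # measured clockwise from north
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


def illumination(dz_dx, dz_dy, elevation, azimuth):
    """Lambertian illumination (0-1) of a plane with the given slopes."""
    # Upward-pointing unit normal of the plane z = dz_dx * x + dz_dy * y
    length = math.sqrt(dz_dx**2 + dz_dy**2 + 1)
    normal = (-dz_dx / length, -dz_dy / length, 1 / length)
    dot = sum(n * s for n, s in zip(normal, sun_vector(elevation, azimuth)))
    return max(0.0, dot)  # surfaces facing away from the sun are in shadow


flat = illumination(0, 0, elevation=30, azimuth=315)
west_facing = illumination(0.5, 0, elevation=30, azimuth=270)  # lit from the west
east_lit = illumination(0.5, 0, elevation=30, azimuth=90)  # same slope, lit from the east
print(flat, west_facing, east_lit)
```

A surface that rises towards the east faces west, so it is brighter when the sun azimuth is 270 than when it is 90.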

dea_tools.spatial.idw(input_z, input_x, input_y, output_x, output_y, p=1, k=10, max_dist=None, k_min=1, epsilon=1e-12)[source]

Perform Inverse Distance Weighting (IDW) interpolation.

This function performs fast IDW interpolation by creating a KDTree from the input coordinates, then using it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights decreasing with increasing distance.

Code inspired by: DahnJ/REM-xarray

Parameters:

input_z (array-like) – Array of values at the input points. This can be either a 1-dimensional array, or a 2-dimensional array where each column (axis=1) represents a different set of values to be interpolated.
input_x (array-like) – Array of x-coordinates of the input points.
input_y (array-like) – Array of y-coordinates of the input points.
output_x (array-like) – Array of x-coordinates where the interpolation is to be computed.
output_y (array-like) – Array of y-coordinates where the interpolation is to be computed.
p (int or float, optional) – Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of p will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.
k (int, optional) – Number of nearest neighbors to use for interpolation. k=1 is equivalent to “nearest” neighbour interpolation. Defaults to 10.
max_dist (int or float, optional) – Restrict neighbouring points to less than this distance. By default, no distance limit is applied.
k_min (int, optional) – If max_dist is provided, some points may end up with less than k nearest neighbours, potentially producing less reliable interpolations. Set k_min to set any points with less than k_min neighbours to NaN. Defaults to 1.
epsilon (float, optional) – Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e-12.

Returns:

interp_values – Interpolated values at the output coordinates. If input_z is 1-dimensional, interp_values will also be 1-dimensional. If input_z is 2-dimensional, interp_values will have the same number of rows as input_z, with each column (axis=1) representing interpolated values for one set of input data.

Return type:

numpy.ndarray

Examples

>>> input_z = [1, 2, 3, 4, 5]
>>> input_x = [0, 1, 2, 3, 4]
>>> input_y = [0, 1, 2, 3, 4]
>>> output_x = [0.5, 1.5, 2.5]
>>> output_y = [0.5, 1.5, 2.5]
>>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
array([1.5, 2.5, 3.5])
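
The weighting scheme itself can be sketched in a few lines of pure Python. This is a simplified illustration rather than the function's actual implementation: it uses a brute-force neighbour search instead of a KDTree, ignores max_dist and k_min, and `idw_point` is a hypothetical helper name.

```python
import math


def idw_point(input_z, input_x, input_y, x, y, p=1, k=10, epsilon=1e-12):
    """Interpolate one value at (x, y) from its k nearest input points."""
    # Distance from the output point to every input point, nearest first
    nearest = sorted(
        (math.hypot(x - xi, y - yi), zi)
        for xi, yi, zi in zip(input_x, input_y, input_z)
    )[:k]
    # Inverse distance weights: nearer points receive larger weights
    weights = [1 / (d + epsilon) ** p for d, _ in nearest]
    weighted_sum = sum(w * z for w, (_, z) in zip(weights, nearest))
    return weighted_sum / sum(weights)


input_z = [1, 2, 3, 4, 5]
input_x = [0, 1, 2, 3, 4]
input_y = [0, 1, 2, 3, 4]
outputs = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]
result = [idw_point(input_z, input_x, input_y, x, y, k=2) for x, y in outputs]
print(result)
```

Because each output point here lies midway between its two nearest inputs, both neighbours receive equal weight and each result is their average, consistent with the values in the example above.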

dea_tools.spatial.interpolate_2d(ds, x_coords, y_coords, z_coords, method='linear', factor=1, verbose=False, **kwargs)[source]

This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset. This can be useful for producing smooth surfaces from point data that can be compared directly against satellite data derived from an OpenDataCube query.

Supported interpolation methods include ‘linear’, ‘nearest’ and ‘cubic’ (using scipy.interpolate.griddata), and ‘rbf’ (using scipy.interpolate.Rbf).

NOTE: This function is deprecated and will be retired in a future release. Please use xr_interpolate instead.

Last modified: February 2020

Parameters:

ds (xarray DataArray or Dataset) – A two-dimensional or multi-dimensional array from which x and y dimensions will be copied and used for the area in which to interpolate point data.
x_coords (numpy array) – An array containing X coordinates for all points (e.g. longitudes).
y_coords (numpy array) – An array containing Y coordinates for all points (e.g. latitudes).
z_coords (numpy array) – An array containing Z coordinates for all points (e.g. elevations). These are the values you wish to interpolate between.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for ‘linear’, ‘nearest’ and ‘cubic’ methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (‘rbf’). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, then up-sample this array back to the original dimensions of the data as a final step. For example, setting factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This approach will be significantly faster than interpolating at full resolution, but will potentially produce less accurate or reliable results.
verbose (bool, optional) – Print debugging messages. Default False.
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is ‘linear’, ‘nearest’ or ‘cubic’), or scipy.interpolate.Rbf (if method is ‘rbf’).

Returns:

interp_2d_array – An xarray DataArray with x and y coordinates copied from ds, and Z-values interpolated from the points data.

Return type:

xarray DataArray

dea_tools.spatial.largest_region(bool_array, **kwargs)[source]

Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.

Parameters:

bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells.
**kwargs – Optional keyword arguments to pass to skimage.measure.label.

Returns:

largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.

Return type:

boolean array
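
The idea of keeping only the largest connected group of True cells can be sketched with a plain flood fill. This is an illustrative pure-Python version operating on nested lists, not the function's implementation; `largest_true_region` is a hypothetical name, and the choice of 4-connectivity (no diagonal neighbours) is an assumption of the sketch.

```python
from collections import deque


def largest_true_region(grid):
    """Return a grid marking only the largest 4-connected region of True cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited True cell
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                region.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    if inside and grid[ni][nj] and not seen[ni][nj]:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            if len(region) > len(best):
                best = region
    # Mark only the cells belonging to the largest region
    out = [[False] * cols for _ in range(rows)]
    for i, j in best:
        out[i][j] = True
    return out


mask = [
    [True, True, False, False],
    [True, False, False, True],
    [False, False, False, True],
]
largest = largest_true_region(mask)
print(largest)
```

Here the three-cell region in the top-left corner is kept and the two-cell region on the right is discarded.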

dea_tools.spatial.points_on_line(gdf, index, distance=30)[source]

Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.

Parameters:

gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame containing line features with an index and CRS.
index (string or int) – A value giving the index of the line to generate points along.
distance (integer or float, optional) – A number giving the interval at which to generate points along the line feature. Defaults to 30, which will generate a point at every 30 metres along the line.

Returns:

points_gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame containing point features at every distance along the selected line.
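
Generating points at a fixed interval along a line amounts to walking its segments and interpolating positions. The sketch below shows this in pure Python for a list of (x, y) vertices. It is an illustration only: `points_along` is a hypothetical helper, the line's end point is assumed to be excluded unless it falls before the next interval, and distances are in the units of the coordinates (metres only if the data uses a metre-based projected CRS).

```python
import math


def points_along(coords, distance=30):
    """Return points every `distance` units along a polyline, from its start."""
    points = []
    travelled = 0.0  # total length of segments already walked
    target = 0.0  # distance along the line of the next point to place
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while target < travelled + seg:
            t = (target - travelled) / seg  # fraction along this segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            target += distance
        travelled += seg
    return points


# An L-shaped line with a total length of 120 units
pts = points_along([(0, 0), (60, 0), (60, 60)], distance=30)
print(pts)
```

The spacing is measured along the line, so points continue around the corner at (60, 0) rather than restarting on each segment.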

dea_tools.spatial.reverse_geocode(coords, site_classes=None, state_classes=None)[source]

Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:

Site, State

E.g.: reverse_geocode(coords=(-35.282163, 149.128835))

‘Canberra, Australian Capital Territory’

Parameters:

coords (tuple of floats) – A tuple of (latitude, longitude) coordinates used to perform the reverse geocode.
site_classes (list of strings, optional) – A list of strings used to define the site part of the plain text location description. Because the contents of the geocoded address can vary greatly depending on location, these strings are tested against the address one by one until a match is made. Defaults to: [‘city’, ‘town’, ‘village’, ‘suburb’, ‘hamlet’, ‘county’, ‘municipality’].
state_classes (list of strings, optional) – A list of strings used to define the state part of the plain text location description. These strings are tested against the address one by one until a match is made. Defaults to: [‘state’, ‘territory’].

Returns:

If a valid geocoded address is found, a plain text location description will be returned – ‘Site, State’.
If no valid address is found, formatted coordinates will be returned instead – ‘XX.XX S, XX.XX E’.
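
The matching and fallback behaviour described above might look something like the following sketch. It does not call a geocoding service: the `address` dictionary is a hypothetical stand-in for a geocoder's response, `describe_location` is an illustrative name, and falling back to coordinates whenever either part is missing is an assumption of the sketch.

```python
def describe_location(address, coords, site_classes=None, state_classes=None):
    """Build a 'Site, State' description, or fall back to formatted coordinates."""
    site_classes = site_classes or [
        "city", "town", "village", "suburb", "hamlet", "county", "municipality"
    ]
    state_classes = state_classes or ["state", "territory"]

    def first_match(classes):
        # Test each class against the address in order; the first hit wins
        return next((address[c] for c in classes if c in address), None)

    site, state = first_match(site_classes), first_match(state_classes)
    if site and state:
        return f"{site}, {state}"

    # Fallback: absolute values with hemisphere letters, two decimal places
    lat, lon = coords
    lat_text = f"{abs(lat):.2f} {'S' if lat < 0 else 'N'}"
    lon_text = f"{abs(lon):.2f} {'W' if lon < 0 else 'E'}"
    return f"{lat_text}, {lon_text}"


coords = (-35.282163, 149.128835)
address = {"suburb": "City", "city": "Canberra", "state": "Australian Capital Territory"}
found = describe_location(address, coords)
fallback = describe_location({}, coords)
print(found)
print(fallback)
```

Because ‘city’ appears before ‘suburb’ in the default list, the city name is used even though the address also contains a suburb.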

dea_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', time_format='%Y-%m-%d', errors='ignore', verbose=True)[source]

Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).

Contours are returned as a geopandas.GeoDataFrame with one row per z-value or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.

Last modified: May 2023

Parameters:

da (xarray DataArray) – A two-dimensional or multi-dimensional array from which contours are extracted. If a two-dimensional array is provided, the analysis will run in ‘single array, multiple z-values’ mode which allows you to specify multiple z_values to be extracted. If a multi-dimensional array is provided, the analysis will run in ‘single z-value, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.
z_values (int, float or list of ints, floats) – An individual z-value or list of multiple z-values to extract from the array. If operating in ‘single z-value, multiple arrays’ mode specify only a single z-value.
crs (string or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
output_path (string, optional) – The path and filename for the output shapefile.
attribute_df (pandas.DataFrame, optional) – A pandas.DataFrame containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘single array, multiple z-values’ mode), or the same number of rows as the number of arrays along the dim dimension (in ‘single z-value, multiple arrays’ mode).
min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.
dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single z-value, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.
time_format (string, optional) – The format used to convert numpy.datetime64 values to strings if applied to data with a “time” dimension. Defaults to “%Y-%m-%d”.
errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.
verbose (bool, optional) – Print debugging messages. Default is True.

Returns:

output_gdf – A geopandas geodataframe object with one feature per z-value (‘single array, multiple z-values’ mode), or one row per array along the dimension specified by the dim parameter (‘single z-value, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.

Return type:

geopandas geodataframe
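
The "subpixel" part of contour extraction comes from interpolating between neighbouring pixel values rather than snapping to pixel centres. The one-dimensional sketch below illustrates that idea with linear interpolation. It is a simplified illustration, not the two-dimensional algorithm used by skimage.measure.find_contours, and `subpixel_crossings` is a hypothetical name.

```python
def subpixel_crossings(values, coords, z=0.0):
    """Locate where a 1D profile crosses `z`, interpolating between pixel centres."""
    pairs = list(zip(values, coords))
    crossings = []
    for (v0, x0), (v1, x1) in zip(pairs, pairs[1:]):
        # A crossing exists where neighbouring pixels fall on either side of z
        if (v0 - z) * (v1 - z) < 0:
            t = (z - v0) / (v1 - v0)  # fractional position between the pixels
            crossings.append(x0 + t * (x1 - x0))
    return crossings


# Hypothetical NDWI values sampled at 30 m pixel centres along a transect
ndwi = [-0.4, -0.2, 0.2, 0.6]
x = [0, 30, 60, 90]
waterline = subpixel_crossings(ndwi, x, z=0.0)
print(waterline)
```

The 0 NDWI crossing is placed halfway between the pixel centres at 30 and 60, because the values on either side are equally far from zero.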

dea_tools.spatial.sun_angles(dc, query)[source]

For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with ‘sun_elevation’ and ‘sun_azimuth’ variables.

Parameters:

dc (datacube.Datacube object) – Datacube instance used to load data.
query (dict) – A dictionary containing query parameters used to identify satellite observations and load metadata.

Returns:

sun_angles_ds (xarray.Dataset) – An xarray.Dataset containing ‘sun_elevation’ and ‘sun_azimuth’ variables.

dea_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)[source]

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG

Parameters:

geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates
EPSG (int) – numeric code for the EPSG coordinate reference system to transform into

Returns:

transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS

Return type:

dict

dea_tools.spatial.xr_interpolate(ds, gdf, columns=None, method='linear', factor=1, k=10, crs=None, **kwargs)[source]

This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset. This can be useful for producing smooth raster surfaces from point data to compare directly against satellite data.

Supported interpolation methods include “linear”, “nearest” and “cubic” (using scipy.interpolate.griddata), “rbf” (using scipy.interpolate.Rbf), and “idw” (Inverse Distance Weighted interpolation using k nearest neighbours). Each numeric column will be returned as a variable in the output xarray.Dataset.

Last modified: March 2024

Parameters:

ds (xarray.DataArray or xarray.Dataset) – A two-dimensional or multi-dimensional array whose spatial extent will be used to interpolate point data into.
gdf (geopandas.GeoDataFrame) – A dataset of spatial points including at least one numeric column. By default all numeric columns in this dataset will be spatially interpolated into the extent of ds; specific columns can be selected using columns. A warning will be raised if the points in gdf do not overlap with the extent of ds.
columns (list, optional) – An optional list of specific columns in gdf to run the interpolation on. These must all be of numeric data types.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for “linear”, “nearest” and “cubic” methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (“rbf”), or Inverse Distance Weighted interpolation (“idw”). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, before up-sampling the array back to the original dimensions of the data as a final step. For example, factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This will be significantly faster than interpolating at full resolution, but will potentially produce less accurate results.
k (int, optional) – The number of nearest neighbours used to calculate weightings if method is “idw”. Defaults to 10; setting k=1 is equivalent to “nearest” interpolation.
crs (string or CRS object, optional) – If ds’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is “linear”, “nearest” or “cubic”), scipy.interpolate.Rbf (if method is “rbf”), or idw (if method is “idw”).

Returns:

interpolated_ds – An xarray.Dataset containing interpolated data with the same X and Y coordinate pixel grid as ds, and a data variable for each numeric column in gdf.

Return type:

xarray.Dataset

dea_tools.spatial.xr_rasterize(gdf, da, attribute_col=None, crs=None, name=None, output_path=None, verbose=True, **rasterio_kwargs)[source]

Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.

Parameters:

gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data you want to rasterise.
da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to define the array that gdf is rasterized into. It effectively provides a spatial template.
attribute_col (string, optional) – Name of the attribute column in gdf containing values for each vector feature that will be rasterized. If None, the output will be a boolean array of 1’s and 0’s.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
name (str, optional) – An optional name used for the output xarray.DataArray.
output_path (string, optional) – Provide an optional string file path to export the rasterized data as a GeoTIFF file.
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.rasterize. Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.

Returns:

da_rasterized – The rasterized vector data.

Return type:

xarray.DataArray

dea_tools.spatial.xr_vectorize(da, attribute_col=None, crs=None, dtype='float32', output_path=None, verbose=True, **rasterio_kwargs)[source]

Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.

Parameters:

da (xarray.DataArray) – The input xarray.DataArray data to vectorise.
attribute_col (str, optional) – Name of the attribute column in the resulting geopandas.GeoDataFrame. Values from da converted to polygons will be assigned to this column. If None, the column name will default to ‘attribute’.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter. (e.g. ‘EPSG:3577’).
dtype (str, optional) – Data type; must be one of int16, int32, uint8, uint16, or float32. Defaults to ‘float32’.
output_path (string, optional) – Provide an optional string file path to export the vectorised data to file. Supports any vector file formats supported by geopandas.GeoDataFrame.to_file().
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.shapes. Can include mask and connectivity.

Returns:

gdf – The vectorised raster data.

Return type:

geopandas.GeoDataFrame

dea_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)[source]

Summarizes raster datasets based on vector geometries in parallel. Each CPU receives an equal chunk of the dataset. Utilizes the perrygeo/rasterstats package.

Parameters:

shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated
raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).
statistics (list) – List of statistics to calculate, e.g. [‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’].
out_shp (str) – Path to export shapefile containing zonal statistics.
ncpus (int) – number of cores to parallelize the operations over.
kwargs – Any other keyword arguments to rasterstats.zonal_stats(). See perrygeo/python-rasterstats for all options.

Returns:

Exports a shapefile to disk containing the zonal statistics requested.
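
Dividing the work so that each CPU receives a roughly equal share can be sketched as follows. This is an illustrative helper rather than the function's own chunking code: `split_into_chunks` is a hypothetical name, and the integers stand in for polygon features read from the shapefile.

```python
def split_into_chunks(items, n_chunks):
    """Split a list into `n_chunks` contiguous, near-equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


features = list(range(10))  # stand-ins for polygon features
chunks = split_into_chunks(features, 3)
print(chunks)
```

Each chunk could then be handed to a separate worker process (for example via multiprocessing.Pool.map), with the per-chunk results concatenated in order before being written out.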