dea_tools.spatial
Tools for spatially manipulating Digital Earth Australia data.
License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat (https://discord.com/invite/4hhBQVas5U) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).
If you would like to report an issue with this script, file one on GitHub: GeoscienceAustralia/dea-notebooks#new
Last modified: July 2024
Functions
add_geobox – Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.
extract_vertices – Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.
hillshade – Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimuth.
idw – Perform Inverse Distance Weighting (IDW) interpolation.
largest_region – Takes a boolean array and identifies the largest contiguous region of connected True values.
points_on_line – Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.
reverse_geocode – Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form: Site, State.
subpixel_contours – Extracts multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).
sun_angles – For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with 'sun_elevation' and 'sun_azimuth' variables.
transform_geojson_wgs_to_epsg – Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG.
xr_interpolate – Takes a geopandas.GeoDataFrame points dataset and interpolates these points into the spatial extent of an existing xarray dataset.
xr_rasterize – Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.
xr_vectorize – Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.
zonal_stats_parallel – Summarizing raster datasets based on vector geometries in parallel.
- dea_tools.spatial.add_geobox(ds, crs=None)[source]
Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.
If ds is missing a Coordinate Reference System (CRS), this can be supplied using the crs param.
- Parameters:
ds (xarray.Dataset or xarray.DataArray) – Input xarray object that needs to be checked for spatial information.
crs (str, optional) – Coordinate Reference System (CRS) information for the input ds array. If ds already has a CRS, then crs is not required. Default is None.
- Returns:
The input xarray object with added .odc.* attributes to access spatial information.
- Return type:
xarray.Dataset or xarray.DataArray
- dea_tools.spatial.extract_vertices(gdf, explode=True, ignore_index=True)[source]
Extract vertices from any GeoDataFrame features, returning Point or MultiPoint geometries.
- Parameters:
gdf (geopandas.GeoDataFrame) – Input GeoDataFrame containing geometries to be converted.
explode (bool, optional) – By default, MultiPoint geometries will be exploded into individual Points. If False, geometries will be returned as MultiPoints.
ignore_index (bool, optional) – If True and explode=True, the resulting GeoDataFrame will have a new index.
- Returns:
Updated GeoDataFrame with geometries converted to Points or MultiPoints.
- Return type:
geopandas.GeoDataFrame
- dea_tools.spatial.hillshade(dem, elevation, azimuth, vert_exag=1, dx=30, dy=30)[source]
Calculate hillshade from an input Digital Elevation Model (DEM) array and a sun elevation and azimuth.
- Parameters:
dem (numpy.array) – A 2D Digital Elevation Model array.
elevation (int or float) – Sun elevation (0-90, degrees up from horizontal).
azimuth (int or float) – Sun azimuth (0-360, degrees clockwise from north).
vert_exag (int or float, optional) – The amount to exaggerate the elevation values by when calculating illumination. This can be used either to correct for differences in units between the x-y coordinate system and the elevation coordinate system (e.g. decimal degrees vs. meters) or to exaggerate or de-emphasize topographic effects.
dx (int or float, optional) – The x-spacing (columns) of the input DEM. This is typically the spatial resolution of the DEM.
dy (int or float, optional) – The y-spacing (rows) of the input DEM. This is typically the spatial resolution of the DEM.
- Returns:
hs – A 2D hillshade array with values between 0-1, where 0 is completely in shadow and 1 is completely illuminated.
- Return type:
numpy.array
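The parameters above can be illustrated with a minimal NumPy sketch of Lambertian shading (surface normal dotted with the sun direction). This is not the library's implementation, and its scaling may differ from what hillshade returns. The function name hillshade_sketch and the assumption that rows increase southward and columns increase eastward are illustrative only.

```python
import numpy as np

def hillshade_sketch(dem, elevation, azimuth, vert_exag=1, dx=30, dy=30):
    """Illustrative Lambertian hillshade; not the dea_tools implementation."""
    z = np.asarray(dem, dtype=float) * vert_exag
    # Gradients along rows (axis 0, spacing dy) and columns (axis 1, spacing dx)
    dz_drow, dz_dcol = np.gradient(z, dy, dx)
    # Assumption: columns increase eastward, rows increase southward
    dz_deast = dz_dcol
    dz_dnorth = -dz_drow
    # Length of the (unnormalised) surface normal (-dz/deast, -dz/dnorth, 1)
    norm = np.sqrt(dz_deast**2 + dz_dnorth**2 + 1.0)
    # Unit vector pointing towards the sun (east, north, up components)
    el, az = np.deg2rad(elevation), np.deg2rad(azimuth)
    light = (np.sin(az) * np.cos(el), np.cos(az) * np.cos(el), np.sin(el))
    shade = (-dz_deast * light[0] - dz_dnorth * light[1] + light[2]) / norm
    # Clip so that 0 is fully shadowed and 1 is fully illuminated
    return np.clip(shade, 0.0, 1.0)
```

With this sketch, a flat DEM lit from directly overhead (elevation=90) is fully illuminated, and a slope facing east is brighter when azimuth=90 than when azimuth=270.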
- dea_tools.spatial.idw(input_z, input_x, input_y, output_x, output_y, p=1, k=10, max_dist=None, k_min=1, epsilon=1e-12)[source]
Perform Inverse Distance Weighting (IDW) interpolation.
This function performs fast IDW interpolation by creating a KDTree from the input coordinates, then using it to find the k nearest neighbors for each output point. Weights are calculated based on the inverse distance to each neighbor, with weights decreasing with increasing distance.
Code inspired by: DahnJ/REM-xarray
- Parameters:
input_z (array-like) – Array of values at the input points. This can be either a 1-dimensional array, or a 2-dimensional array where each column (axis=1) represents a different set of values to be interpolated.
input_x (array-like) – Array of x-coordinates of the input points.
input_y (array-like) – Array of y-coordinates of the input points.
output_x (array-like) – Array of x-coordinates where the interpolation is to be computed.
output_y (array-like) – Array of y-coordinates where the interpolation is to be computed.
p (int or float, optional) – Power function parameter defining how rapidly weightings should decrease as distance increases. Higher values of p will cause weights for distant points to decrease rapidly, resulting in nearby points having more influence on predictions. Defaults to 1.
k (int, optional) – Number of nearest neighbors to use for interpolation. k=1 is equivalent to “nearest” neighbour interpolation. Defaults to 10.
max_dist (int or float, optional) – Restrict neighbouring points to less than this distance. By default, no distance limit is applied.
k_min (int, optional) – If max_dist is provided, some points may end up with fewer than k nearest neighbours, potentially producing less reliable interpolations. Use k_min to set any points with fewer than k_min neighbours to NaN. Defaults to 1.
epsilon (float, optional) – Small value added to distances to prevent division by zero errors in the case that output coordinates are identical to input coordinates. Defaults to 1e-12.
- Returns:
interp_values – Interpolated values at the output coordinates. If input_z is 1-dimensional, interp_values will also be 1-dimensional. If input_z is 2-dimensional, interp_values will have the same number of rows as input_z, with each column (axis=1) representing interpolated values for one set of input data.
- Return type:
numpy.ndarray
Examples
>>> input_z = [1, 2, 3, 4, 5]
>>> input_x = [0, 1, 2, 3, 4]
>>> input_y = [0, 1, 2, 3, 4]
>>> output_x = [0.5, 1.5, 2.5]
>>> output_y = [0.5, 1.5, 2.5]
>>> idw(input_z, input_x, input_y, output_x, output_y, k=2)
array([1.5, 2.5, 3.5])
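The weighting scheme described above can be sketched with plain NumPy. This sketch swaps the KDTree for a brute-force pairwise distance search, omits max_dist and k_min, and handles 1-dimensional input_z only. The name idw_sketch is hypothetical and this is not the library's code.

```python
import numpy as np

def idw_sketch(input_z, input_x, input_y, output_x, output_y,
               p=1, k=10, epsilon=1e-12):
    """Brute-force IDW for 1-D input_z (illustrative only, no KDTree)."""
    z = np.asarray(input_z, dtype=float)
    src = np.column_stack([input_x, input_y]).astype(float)
    dst = np.column_stack([output_x, output_y]).astype(float)
    k = min(k, len(src))
    # Pairwise distances, shape (n_output, n_input)
    dist = np.linalg.norm(dst[:, None, :] - src[None, :, :], axis=2)
    # Indices of the k nearest input points for each output point
    nearest = np.argsort(dist, axis=1)[:, :k]
    # epsilon avoids division by zero when coordinates coincide
    d = np.take_along_axis(dist, nearest, axis=1) + epsilon
    # Weights fall off as distance ** -p
    weights = 1.0 / d**p
    return (weights * z[nearest]).sum(axis=1) / weights.sum(axis=1)

result = idw_sketch([1, 2, 3, 4, 5], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4],
                    [0.5, 1.5, 2.5], [0.5, 1.5, 2.5], k=2)
```

Each output point here sits midway between two input points, so both neighbours receive equal weight and result matches the values in the example above.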
- dea_tools.spatial.largest_region(bool_array, **kwargs)[source]
Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.
- Parameters:
bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells.
**kwargs – Optional keyword arguments to pass to measure.label
- Returns:
largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.
- Return type:
boolean array
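The idea of keeping only the largest connected group can be sketched without any labelling library. This sketch uses a breadth-first flood fill on a 2D array and assumes 8-connectivity (diagonal neighbours count as connected). The connectivity the library uses depends on the keyword arguments passed to measure.label, and the name largest_region_sketch is hypothetical.

```python
from collections import deque
import numpy as np

def largest_region_sketch(bool_array):
    """Return a mask of the largest 8-connected True region (illustrative)."""
    arr = np.asarray(bool_array, dtype=bool)
    rows, cols = arr.shape
    seen = np.zeros_like(arr)
    best = []
    for r0, c0 in zip(*np.nonzero(arr)):
        if seen[r0, c0]:
            continue
        # Flood fill outward from this unvisited True cell
        seen[r0, c0] = True
        queue, region = deque([(r0, c0)]), []
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and arr[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        if len(region) > len(best):
            best = region
    out = np.zeros_like(arr)
    if best:
        out[tuple(zip(*best))] = True
    return out

mask = largest_region_sketch([[1, 1, 0, 0],
                              [1, 0, 0, 1],
                              [0, 0, 0, 1]])
```

In this small input the three-cell group in the top-left corner is kept, and the two-cell group on the right is set to False.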
- dea_tools.spatial.points_on_line(gdf, index, distance=30)[source]
Generates evenly-spaced point features along a specific line feature in a geopandas.GeoDataFrame.
- Parameters:
gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame containing line features with an index and CRS.
index (string or int) – A value giving the index of the line to generate points along.
distance (integer or float, optional) – A number giving the interval at which to generate points along the line feature. Defaults to 30, which will generate a point at every 30 metres along the line.
- Returns:
points_gdf – A geopandas.GeoDataFrame containing point features at every distance along the selected line.
- Return type:
geopandas.GeoDataFrame
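The spacing logic can be sketched on raw vertex coordinates with NumPy, without geopandas. The sketch measures cumulative distance along the line and interpolates a point at every multiple of distance. The name points_along_line is hypothetical, and whether the library also emits a point exactly at the end of the line is not stated above; this sketch includes one when the line length is an exact multiple of distance.

```python
import numpy as np

def points_along_line(coords, distance=30):
    """Evenly spaced (x, y) points along a polyline (illustrative only)."""
    xy = np.asarray(coords, dtype=float)
    # Length of each segment, then cumulative distance at each vertex
    seg = np.hypot(*np.diff(xy, axis=0).T)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    # Target distances along the line: 0, distance, 2 * distance, ...
    targets = np.arange(0.0, cum[-1] + 1e-9, distance)
    # Linear interpolation of x and y against distance travelled
    x = np.interp(targets, cum, xy[:, 0])
    y = np.interp(targets, cum, xy[:, 1])
    return np.column_stack([x, y])

pts = points_along_line([(0, 0), (60, 0), (60, 30)], distance=30)
```

As with the distance parameter above, the spacing is expressed in the units of the coordinates, so a value of 30 only means 30 metres when the data are in a projected CRS with metre units.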
- dea_tools.spatial.reverse_geocode(coords, site_classes=None, state_classes=None)[source]
Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:
Site, State
E.g.:
reverse_geocode(coords=(-35.282163, 149.128835))
‘Canberra, Australian Capital Territory’
- Parameters:
coords (tuple of floats) – A tuple of (latitude, longitude) coordinates used to perform the reverse geocode.
site_classes (list of strings, optional) – A list of strings used to define the site part of the plain text location description. Because the contents of the geocoded address can vary greatly depending on location, these strings are tested against the address one by one until a match is made. Defaults to: ['city', 'town', 'village', 'suburb', 'hamlet', 'county', 'municipality'].
state_classes (list of strings, optional) – A list of strings used to define the state part of the plain text location description. These strings are tested against the address one by one until a match is made. Defaults to: ['state', 'territory'].
- Returns:
If a valid geocoded address is found, a plain text location description will be returned (‘Site, State’)
If no valid address is found, formatted coordinates will be returned instead (‘XX.XX S, XX.XX E’)
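The matching and fallback behaviour described above can be sketched on an already-geocoded address. The helper describe_location and the shape of the address dictionary are assumptions for illustration; the sketch performs no network request, and falling back to coordinates when only one of the site or state is found is also an assumption.

```python
def describe_location(address, coords, site_classes=None, state_classes=None):
    """Build 'Site, State' from an address dict, else formatted coordinates."""
    site_classes = site_classes or [
        "city", "town", "village", "suburb", "hamlet", "county", "municipality"
    ]
    state_classes = state_classes or ["state", "territory"]
    # Test each class against the address in order; the first match wins
    site = next((address[k] for k in site_classes if k in address), None)
    state = next((address[k] for k in state_classes if k in address), None)
    if site and state:
        return f"{site}, {state}"
    # Fallback: hemisphere-labelled coordinates rounded to two decimals
    lat, lon = coords
    ns = "S" if lat < 0 else "N"
    ew = "W" if lon < 0 else "E"
    return f"{abs(lat):.2f} {ns}, {abs(lon):.2f} {ew}"

coords = (-35.282163, 149.128835)
found = describe_location(
    {"suburb": "Acton", "city": "Canberra",
     "state": "Australian Capital Territory"},
    coords,
)
fallback = describe_location({}, coords)
```

Because 'city' precedes 'suburb' in the default site_classes, found uses the city name; fallback shows the coordinate form returned when no address fields match.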
- dea_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', time_format='%Y-%m-%d', errors='ignore', verbose=True)[source]
Extracts multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).
Contours are returned as a geopandas.GeoDataFrame with one row per z-value or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.
Last modified: May 2023
- Parameters:
da (xarray DataArray) – A two-dimensional or multi-dimensional array from which contours are extracted. If a two-dimensional array is provided, the analysis will run in ‘single array, multiple z-values’ mode which allows you to specify multiple z_values to be extracted. If a multi-dimensional array is provided, the analysis will run in ‘single z-value, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.
z_values (int, float or list of ints, floats) – An individual z-value or list of multiple z-values to extract from the array. If operating in ‘single z-value, multiple arrays’ mode specify only a single z-value.
crs (string or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
output_path (string, optional) – The path and filename for the output shapefile.
attribute_df (pandas.DataFrame, optional) – A pandas.DataFrame containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘single array, multiple z-values’ mode), or the same number of rows as the number of arrays along the dim dimension (in ‘single z-value, multiple arrays’ mode).
min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.
dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single z-value, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.
time_format (string, optional) – The format used to convert numpy.datetime64 values to strings if applied to data with a “time” dimension. Defaults to “%Y-%m-%d”.
errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.
verbose (bool, optional) – Print debugging messages. Default is True.
- Returns:
output_gdf – A geopandas geodataframe object with one feature per z-value (‘single array, multiple z-values’ mode), or one row per array along the dimension specified by the dim parameter (‘single z-value, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.
- Return type:
geopandas geodataframe
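The "subpixel" part of the name can be illustrated with a one-dimensional analogue: locating where a profile of pixel values crosses a z-value by linear interpolation between neighbouring pixel centres. This is only a sketch of the interpolation idea, not the 2D contouring the function performs, and the name subpixel_crossings is hypothetical.

```python
import numpy as np

def subpixel_crossings(values, coords, z=0.0):
    """Coordinates where a 1-D profile crosses z (linear interpolation)."""
    v = np.asarray(values, dtype=float) - z
    c = np.asarray(coords, dtype=float)
    # A crossing lies between neighbours whose offsets from z differ in sign
    idx = np.nonzero(v[:-1] * v[1:] < 0)[0]
    # Fraction of the way from pixel idx to pixel idx + 1
    frac = v[idx] / (v[idx] - v[idx + 1])
    return c[idx] + frac * (c[idx + 1] - c[idx])

# NDWI-like profile sampled at pixel centres spaced 30 units apart
crossings = subpixel_crossings([-0.4, -0.1, 0.3, 0.5], [0, 30, 60, 90], z=0.0)
```

Here the profile changes sign between the pixels at 30 and 60, and the crossing falls a quarter of the way between them rather than snapping to either pixel centre.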
- dea_tools.spatial.sun_angles(dc, query)[source]
For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with ‘sun_elevation’ and ‘sun_azimuth’ variables.
- Parameters:
dc (datacube.Datacube) – Datacube instance used to load data.
query (dict) – A dictionary containing query parameters used to identify satellite observations and load metadata.
- Returns:
sun_angles_ds – An xarray.Dataset containing ‘sun_elevation’ and ‘sun_azimuth’ variables.
- Return type:
xarray.Dataset
- dea_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)[source]
Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to the desired EPSG coordinate reference system.
- Parameters:
geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates
EPSG (int) – numeric code for the EPSG coordinate reference system to transform into
- Returns:
transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS
- Return type:
dict
- dea_tools.spatial.xr_interpolate(ds, gdf, columns=None, method='linear', factor=1, k=10, crs=None, **kwargs)[source]
This function takes a geopandas.GeoDataFrame points dataset containing one or more numeric columns, and interpolates these points into the spatial extent of an existing xarray dataset. This can be useful for producing smooth raster surfaces from point data to compare directly against satellite data.
Supported interpolation methods include “linear”, “nearest” and “cubic” (using scipy.interpolate.griddata), “rbf” (using scipy.interpolate.Rbf), and “idw” (Inverse Distance Weighted interpolation using k nearest neighbours). Each numeric column will be returned as a variable in the output xarray.Dataset.
Last modified: March 2024
- Parameters:
ds (xarray.DataArray or xarray.Dataset) – A two-dimensional or multi-dimensional array whose spatial extent will be used to interpolate point data into.
gdf (geopandas.GeoDataFrame) – A dataset of spatial points including at least one numeric column. By default all numeric columns in this dataset will be spatially interpolated into the extent of ds; specific columns can be selected using columns. A warning will be raised if the points in gdf do not overlap with the extent of ds.
columns (list, optional) – An optional list of specific columns in gdf to run the interpolation on. These must all be of numeric data types.
method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for “linear”, “nearest” and “cubic” methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (“rbf”), or Inverse Distance Weighted interpolation (“idw”). Defaults to ‘linear’.
factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, before up-sampling the array back to the original dimensions of the data as a final step. For example, factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This will be significantly faster than interpolating at full resolution, but will potentially produce less accurate results.
k (int, optional) – The number of nearest neighbours used to calculate weightings if method is “idw”. Defaults to 10; setting k=1 is equivalent to “nearest” interpolation.
crs (string or CRS object, optional) – If ds’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
**kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is “linear”, “nearest” or “cubic”), scipy.interpolate.Rbf (if method is “rbf”), or idw (if method is “idw”).
- Returns:
interpolated_ds – An xarray.Dataset containing interpolated data with the same X and Y coordinate pixel grid as ds, and a data variable for each numeric column in gdf.
- Return type:
xarray.Dataset
- dea_tools.spatial.xr_rasterize(gdf, da, attribute_col=None, crs=None, name=None, output_path=None, verbose=True, **rasterio_kwargs)[source]
Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.
- Parameters:
gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data you want to rasterise.
da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to define the array that gdf is rasterized into. It effectively provides a spatial template.
attribute_col (string, optional) – Name of the attribute column in gdf containing values for each vector feature that will be rasterized. If None, the output will be a boolean array of 1’s and 0’s.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
name (str, optional) – An optional name used for the output xarray.DataArray.
output_path (string, optional) – Provide an optional string file path to export the rasterized data as a GeoTIFF file.
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.rasterize. Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.
- Returns:
da_rasterized – The rasterized vector data.
- Return type:
xarray.DataArray
- dea_tools.spatial.xr_vectorize(da, attribute_col=None, crs=None, dtype='float32', output_path=None, verbose=True, **rasterio_kwargs)[source]
Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.
- Parameters:
da (xarray.DataArray) – The input xarray.DataArray data to vectorise.
attribute_col (str, optional) – Name of the attribute column in the resulting geopandas.GeoDataFrame. Values from da converted to polygons will be assigned to this column. If None, the column name will default to ‘attribute’.
crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).
dtype (str, optional) – Data type; must be one of int16, int32, uint8, uint16, or float32.
output_path (string, optional) – Provide an optional string file path to export the vectorised data to file. Supports any vector file formats supported by geopandas.GeoDataFrame.to_file().
verbose (bool, optional) – Print debugging messages. Default True.
**rasterio_kwargs – A set of keyword arguments to rasterio.features.shapes. For example, “mask” and “connectivity”.
- Returns:
gdf
- Return type:
geopandas.GeoDataFrame
- dea_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)[source]
Summarizes raster datasets based on vector geometries in parallel. Each CPU receives an equal chunk of the dataset. Utilizes the perrygeo/rasterstats package.
- Parameters:
shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated
raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).
statistics (list) – list of statistics to calculate. e.g. [‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’]
out_shp (str) – Path to export shapefile containing zonal statistics.
ncpus (int) – number of cores to parallelize the operations over.
kwargs – Any other keyword arguments to rasterstats.zonal_stats(). See perrygeo/python-rasterstats for all options.
- Return type:
Exports a shapefile to disk containing the zonal statistics requested
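The "equal chunk per CPU" step can be sketched in plain Python. The helper split_into_chunks is hypothetical and only shows one way to divide a list of features into ncpus near-equal contiguous pieces; how the library actually partitions the shapefile is not described above.

```python
def split_into_chunks(items, ncpus):
    """Split a list into ncpus contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), ncpus)
    chunks, start = [], 0
    for i in range(ncpus):
        # The first `extra` chunks take one additional item each
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

# Ten placeholder feature ids shared between three workers
chunks = split_into_chunks(list(range(10)), ncpus=3)
chunk_sizes = [len(c) for c in chunks]
```

Each chunk could then be handed to a separate worker process, with the per-chunk statistics concatenated in the original feature order before writing the output.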