dea_tools.temporal

Conducting temporal (time-domain) analyses on Digital Earth Australia.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel (http://slack.opendatacube.org/) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, file one on GitHub: GeoscienceAustralia/dea-notebooks#new

Last modified: May 2024

Functions

allNaN_arg(da, dim, stat)

Calculate da.argmax() or da.argmin() while handling all-NaN slices.

calculate_sad(vec)

Calculates the surface area duration curve for a given vector of heights.

calculate_stsad(vec[, window_size, step, ...])

Calculates the short-time surface area duration curve for a given vector of heights.

calculate_vector_stat(vec, stat[, ...])

Calculates a vector statistic over a rolling window.

lag_linregress_3D(x, y[, lagx, lagy, first_dim])

Takes two xr.DataArrays of any dimensions (input data could be a 1D time series or, for example, have three dimensions such as time, lat, lon), and returns covariance, correlation, regression slope and intercept, p-value, and standard error on regression between the two datasets along their aligned first dimension.

mad_outliers(da[, dim, threshold])

Identify outliers along an xarray dimension using Median Absolute Deviation (MAD).

temporal_statistics(da, stats)

Calculate various generic summary statistics on any timeseries.

time_buffer(input_date[, buffer, output_format])

Create a buffer of a given duration (e.g. days) around a time query.

xr_phenology(da[, stats, method_sos, ...])

Obtain land surface phenology metrics from an xarray.DataArray containing a timeseries of a vegetation index like NDVI.

xr_regression(x, y[, dim, alternative, ...])

Compare two multi-dimensional xr.DataArrays and calculate linear least-squares regression along a dimension, returning slope, intercept, p-value, standard error, covariance, correlation, and valid observation counts (n).

Classes

LinregressResult(cov, cor, slope, intercept, ...)

dea_tools.temporal.allNaN_arg(da, dim, stat)

Calculate da.argmax() or da.argmin() while handling all-NaN slices. Fills all-NaN locations with a float and then masks the offending cells.

Parameters:
  • da (xarray.DataArray) –

  • dim (str) – Dimension over which to calculate argmax or argmin, e.g. ‘time’

  • stat (str) – The statistic to calculate, either ‘min’ for argmin() or ‘max’ for argmax()

Return type:

xarray.DataArray
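
The fill-then-mask approach described above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name all_nan_safe_arg, the choice of ±infinity as the fill value, and the use of NaN to mask the result are all assumptions made for illustration.

```python
import numpy as np

def all_nan_safe_arg(values, axis=0, stat="max"):
    """Hypothetical NumPy analogue: argmax/argmin that tolerates all-NaN slices."""
    if stat not in ("max", "min"):
        raise ValueError("stat must be 'max' or 'min'")
    values = np.asarray(values, dtype=float)
    # Cells where every value along `axis` is NaN.
    all_nan = np.isnan(values).all(axis=axis)
    # Fill NaNs with a float that can never win the comparison (assumed fill value).
    fill = -np.inf if stat == "max" else np.inf
    filled = np.where(np.isnan(values), fill, values)
    idx = filled.argmax(axis=axis) if stat == "max" else filled.argmin(axis=axis)
    # Mask the all-NaN cells; using NaN as the mask makes the result float.
    return np.where(all_nan, np.nan, idx)

# Shape (time=3, y=2, x=2); pixel (y=0, x=1) is NaN at every time step.
stack = np.array([
    [[1.0, np.nan], [np.nan, 4.0]],
    [[3.0, np.nan], [2.0, 1.0]],
    [[2.0, np.nan], [5.0, np.nan]],
])
result = all_nan_safe_arg(stack, axis=0, stat="max")
print(result)
```

For comparison, np.nanargmax raises a ValueError when it meets an all-NaN slice, which is the situation this kind of helper is meant to avoid.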

dea_tools.temporal.calculate_sad(vec)

Calculates the surface area duration curve for a given vector of heights.

Parameters:

vec (d-dimensional np.ndarray) – Vector of heights over time.

Returns:

Surface area duration curve vector over the same time scale.

Return type:

d-dimensional np.ndarray
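
Duration curves are commonly built by sorting observations from largest to smallest, so that position k gives the level equalled or exceeded by k + 1 of the observations. The sketch below follows that convention; whether calculate_sad uses exactly this construction is an assumption, and duration_curve is a hypothetical name.

```python
import numpy as np

def duration_curve(heights):
    """Hypothetical helper: observations sorted from largest to smallest."""
    return np.sort(np.asarray(heights, dtype=float))[::-1]

heights = np.array([2.0, 5.0, 3.0, 1.0, 4.0])
curve = duration_curve(heights)

# Fraction of observations that reach at least each level on the curve.
exceedance = np.arange(1, curve.size + 1) / curve.size
print(curve)
print(exceedance)
```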

dea_tools.temporal.calculate_stsad(vec, window_size=365, step=10, progress=None, window='hann')

Calculates the short-time surface area duration curve for a given vector of heights.

Parameters:
  • vec (d-dimensional np.ndarray) – Vector of heights over time.

  • window_size (int) – Sliding window size (default 365).

  • step (int) – Step size (default 10).

  • progress (iterator -> iterator) – Optional progress decorator, e.g. tqdm.notebook.tqdm. Default None.

  • window (str) – What kind of window function to use. Default ‘hann’, but you might also want to use ‘boxcar’. Any scipy window function is allowed (see documentation for scipy.signal.get_window for more information).

Returns:

  • (d / step)-dimensional np.ndarray – y values (the time axis)

  • t-dimensional np.ndarray – x values (the statistic axis)

  • (d / step) x t-dimensional np.ndarray – The short-time surface area duration curve array.

dea_tools.temporal.calculate_vector_stat(vec: data dim, stat: data dim -> target dim, window_size=365, step=10, target_dim=365, progress=None, window='hann')

Calculates a vector statistic over a rolling window.

Parameters:
  • vec (d-dimensional np.ndarray) – Vector to calculate over, e.g. a time series.

  • stat (R^d -> R^t function) – Statistic function.

  • window_size (int) – Sliding window size (default 365).

  • step (int) – Step size (default 10).

  • target_dim (int) – Dimensionality of the output of stat (default 365).

  • progress (iterator -> iterator) – Optional progress decorator, e.g. tqdm.notebook.tqdm. Default None.

  • window (str) – What kind of window function to use. Default ‘hann’, but you might also want to use ‘boxcar’. Any scipy window function is allowed (see documentation for scipy.signal.get_window for more information).

Returns:

  • (d / step)-dimensional np.ndarray – y values (the time axis)

  • t-dimensional np.ndarray – x values (the statistic axis)

  • (d / step) x t-dimensional np.ndarray – The vector statistic array.
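
The rolling-window idea behind calculate_vector_stat and calculate_stsad can be sketched in NumPy as follows. This is illustrative only: rolling_vector_stat is a hypothetical name, np.hanning stands in for scipy.signal.get_window, and this version emits one row per complete window, so its output shape may differ from the (d / step) rows documented above.

```python
import numpy as np

def rolling_vector_stat(vec, stat, window_size, step, target_dim):
    """Hypothetical sketch: apply `stat` to tapered sliding windows of `vec`."""
    vec = np.asarray(vec, dtype=float)
    # Hann taper; a stand-in for scipy.signal.get_window("hann", window_size).
    taper = np.hanning(window_size)
    # Start index of every window that fits entirely inside `vec`.
    starts = list(range(0, vec.shape[0] - window_size + 1, step))
    out = np.zeros((len(starts), target_dim))
    for row, start in enumerate(starts):
        out[row, :] = stat(vec[start:start + window_size] * taper)
    time_axis = np.array(starts)       # y values (the time axis)
    stat_axis = np.arange(target_dim)  # x values (the statistic axis)
    return time_axis, stat_axis, out

# Using a descending sort as the statistic gives a short-time duration curve.
vec = np.arange(100.0)
time_axis, stat_axis, values = rolling_vector_stat(
    vec, stat=lambda w: np.sort(w)[::-1], window_size=20, step=10, target_dim=20
)
print(values.shape)
```

Here target_dim equals window_size because the example statistic returns one value per sample in the window; a statistic with a different output length would need a matching target_dim.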

dea_tools.temporal.lag_linregress_3D(x, y, lagx=0, lagy=0, first_dim='time')

Takes two xr.DataArrays of any dimensions (input data could be a 1D time series or, for example, have three dimensions such as time, lat, lon), and returns covariance, correlation, regression slope and intercept, p-value, and standard error on regression between the two datasets along their aligned first dimension.

Datasets can be provided in any order, but note that the regression slope and intercept will be calculated for y with respect to x.

NOTE: This function is deprecated and will be retired in a future release. Please use xr_regression instead.

Parameters:
  • x (xarray DataArray) – Two xarray DataArrays with any number of dimensions, both sharing the same first dimension

  • y (xarray DataArray) – Two xarray DataArrays with any number of dimensions, both sharing the same first dimension

  • lagx (int, optional) – Optional integers giving lag values to assign to either of the data, with lagx shifting x, and lagy shifting y with the specified lag amount.

  • lagy (int, optional) – Optional integers giving lag values to assign to either of the data, with lagx shifting x, and lagy shifting y with the specified lag amount.

  • first_dim (str, optional) – An optional string giving the name of the first dimension on which to align datasets. The default is ‘time’.

Returns:

cov, cor, slope, intercept, pval, stderr – Covariance, correlation, regression slope and intercept, p-value, and standard error on regression between the two datasets along their aligned first dimension.

Return type:

xarray DataArray

dea_tools.temporal.mad_outliers(da, dim='time', threshold=3.5)

Identify outliers along an xarray dimension using Median Absolute Deviation (MAD).

Parameters:
  • da (xarray.DataArray) – The input data array with dimensions time, x, y.

  • dim (str, optional) – An optional string giving the name of the dimension on which to apply the MAD calculation. The default is ‘time’.

  • threshold (float) – The number of MADs away from the median to consider an observation an outlier.

Returns:

A boolean array with the same dimensions as input data, where True indicates an outlier.

Return type:

xarray.DataArray
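
MAD-based outlier detection can be sketched with NumPy as below. This is a simplified illustration under stated assumptions: mad_outlier_mask is a hypothetical name, and the score is taken literally as the absolute deviation from the median divided by the MAD, with no consistency constant. The library function may scale the score differently.

```python
import numpy as np

def mad_outlier_mask(values, axis=0, threshold=3.5):
    """Hypothetical sketch: flag values more than `threshold` MADs from the median."""
    values = np.asarray(values, dtype=float)
    median = np.nanmedian(values, axis=axis, keepdims=True)
    abs_dev = np.abs(values - median)
    mad = np.nanmedian(abs_dev, axis=axis, keepdims=True)
    # Where MAD is zero, any non-zero deviation scores inf (flagged) and a
    # zero deviation scores NaN (not flagged); the warnings are suppressed.
    with np.errstate(divide="ignore", invalid="ignore"):
        return abs_dev / mad > threshold

# Median is 10 and MAD is 1, so only the value 50 (40 MADs away) is flagged.
series = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 50.0, 10.0])
mask = mad_outlier_mask(series, axis=0, threshold=3.5)
print(mask)
```

The same function applies unchanged to a (time, y, x) array with axis=0, giving one boolean per observation.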

dea_tools.temporal.temporal_statistics(da, stats)

Calculate various generic summary statistics on any timeseries.

This function uses the hdstats temporal library: daleroberts/hdstats

Last modified June 2020

Parameters:
  • da (xarray.DataArray) – DataArray should contain a 3D time series.

  • stats (list) –

    List of temporal statistics to calculate. Options include:

    • 'discordance': TODO

    • 'f_std': std of discrete fourier transform coefficients, returns three layers: f_std_n1, f_std_n2, f_std_n3

    • 'f_mean': mean of discrete fourier transform coefficients, returns three layers: f_mean_n1, f_mean_n2, f_mean_n3

    • 'f_median': median of discrete fourier transform coefficients, returns three layers: f_median_n1, f_median_n2, f_median_n3

    • 'mean_change': mean of discrete difference along time dimension

    • 'median_change': median of discrete difference along time dimension

    • 'abs_change': mean of absolute discrete difference along time dimension

    • 'complexity': TODO

    • 'central_diff': TODO

    • 'num_peaks': The number of peaks in the timeseries, defined with a local window of size 10. NOTE: This statistic is very slow

Returns:

Dataset containing variables for the selected temporal statistics

Return type:

xarray.Dataset
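
The three change statistics listed above can be read directly as operations on the first difference along time. The NumPy sketch below follows those one-line descriptions; it does not call hdstats, and details such as NaN handling in the library may differ.

```python
import numpy as np

# A small (time=5, y=2, x=2) cube in which every pixel has the same series.
series = (0.2, 0.5, 0.4, 0.8, 0.6)
cube = np.stack([np.full((2, 2), value) for value in series])

# Discrete difference along the time dimension: 0.3, -0.1, 0.4, -0.2.
steps = np.diff(cube, axis=0)

mean_change = steps.mean(axis=0)            # mean of the differences
median_change = np.median(steps, axis=0)    # median of the differences
abs_change = np.abs(steps).mean(axis=0)     # mean of the absolute differences

print(mean_change[0, 0], median_change[0, 0], abs_change[0, 0])
```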

dea_tools.temporal.time_buffer(input_date, buffer='30 days', output_format='%Y-%m-%d')

Create a buffer of a given duration (e.g. days) around a time query. Output is a string in the correct format for a datacube query.

Parameters:
  • input_date (str, yyyy-mm-dd) – Time to buffer

  • buffer (str, optional) – Default is ‘30 days’, can be any string supported by the pandas.Timedelta function

  • output_format (str, optional) – Optional string giving the strftime format used to convert buffered times to strings; defaults to ‘%Y-%m-%d’ (e.g. ‘2017-12-02’)

Returns:

early_buffer, late_buffer – A tuple of strings to pass to the datacube query function, e.g. (‘2017-12-02’, ‘2018-01-31’) for input_date=‘2018-01-01’ and buffer=‘30 days’

Return type:

tuple of str
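
The buffering arithmetic can be reproduced with the standard library alone. This is a simplified sketch rather than the library's code: buffer_date is a hypothetical name, and it accepts an integer number of days instead of the pandas.Timedelta-style strings that time_buffer supports.

```python
from datetime import datetime, timedelta

def buffer_date(input_date, buffer_days=30, output_format="%Y-%m-%d"):
    """Hypothetical stdlib analogue: symmetric buffer around a yyyy-mm-dd date."""
    centre = datetime.strptime(input_date, "%Y-%m-%d")
    delta = timedelta(days=buffer_days)
    early = (centre - delta).strftime(output_format)
    late = (centre + delta).strftime(output_format)
    return early, late

early, late = buffer_date("2018-01-01", buffer_days=30)
print(early, late)  # 2017-12-02 2018-01-31
```

The resulting pair matches the example given in the Returns description and can be passed as the time range of a query.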

dea_tools.temporal.xr_phenology(da, stats=['SOS', 'POS', 'EOS', 'Trough', 'vSOS', 'vPOS', 'vEOS', 'LOS', 'AOS', 'ROG', 'ROS'], method_sos='first', method_eos='last', verbose=True)

Obtain land surface phenology metrics from an xarray.DataArray containing a timeseries of a vegetation index like NDVI.

Last modified February 2023

Parameters:
  • da (xarray.DataArray) – DataArray should contain a 2D or 3D time series of a vegetation index such as NDVI or EVI

  • stats (list) –

    list of phenological statistics to return. Regardless of the metrics returned, all statistics are calculated due to inter-dependencies between metrics. Options include:

    • 'SOS': DOY of start of season

    • 'POS': DOY of peak of season

    • 'EOS': DOY of end of season

    • 'vSOS': Value at start of season

    • 'vPOS': Value at peak of season

    • 'vEOS': Value at end of season

    • 'Trough': Minimum value of season

    • 'LOS': Length of season (DOY)

    • 'AOS': Amplitude of season (in value units)

    • 'ROG': Rate of greening

    • 'ROS': Rate of senescence

  • method_sos (str) – If ‘first’ then vSOS is estimated as the first positive slope on the greening side of the curve. If ‘median’, then vSOS is estimated as the median value of the positive slopes on the greening side of the curve.

  • method_eos (str) – If ‘last’ then vEOS is estimated as the last negative slope on the senescing side of the curve. If ‘median’, then vEOS is estimated as the median value of the negative slopes on the senescing side of the curve.

Returns:

Dataset containing variables for the selected phenology statistics

Return type:

xarray.Dataset
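
The inter-dependencies between the metrics can be seen in a small single-pixel example. The sketch below is illustrative only and makes several assumptions: the start and end of season are picked by hand rather than derived with method_sos and method_eos, amplitude is taken as peak minus trough, and the greening and senescence rates are taken as slopes between the season boundaries and the peak.

```python
import numpy as np

# One year of a synthetic NDVI series for a single pixel.
doy = np.array([1, 33, 65, 97, 129, 161, 193, 225, 257, 289, 321, 353])
ndvi = np.array([0.20, 0.22, 0.30, 0.45, 0.62, 0.70, 0.66, 0.52, 0.38, 0.27, 0.22, 0.20])

# Peak of season and trough come straight from the series.
peak = int(np.argmax(ndvi))
pos, vpos = int(doy[peak]), float(ndvi[peak])
trough = float(ndvi.min())

# Hypothetical season boundaries, chosen by hand for this illustration.
sos_index, eos_index = 2, 9
sos, vsos = int(doy[sos_index]), float(ndvi[sos_index])
eos, veos = int(doy[eos_index]), float(ndvi[eos_index])

# Derived metrics (formulas are assumptions based on the metric names).
los = eos - sos                     # length of season, in days
aos = vpos - trough                 # amplitude of season, in value units
rog = (vpos - vsos) / (pos - sos)   # rate of greening, value per day
ros = (veos - vpos) / (eos - pos)   # rate of senescence, value per day

print(pos, los, round(aos, 2), rog, ros)
```

Because LOS, AOS, ROG and ROS are built from SOS, POS, EOS and their values, requesting only a derived metric still requires the others to be computed, which is consistent with the note on the stats parameter.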

dea_tools.temporal.xr_regression(x, y, dim='time', alternative='two-sided', outliers_x=None, outliers_y=None)

Compare two multi-dimensional xr.DataArrays and calculate linear least-squares regression along a dimension, returning slope, intercept, p-value, standard error, covariance, correlation, and valid observation counts (n).

Input arrays can have any number of dimensions, for example: a one-dimensional time series (dims: time), or three-dimensional data (dims: time, lat, lon). Regressions will be calculated for y with respect to x.

Results should be equivalent to one-dimensional regression performed using scipy.stats.linregress. Implementation inspired by: https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html

Parameters:
  • x (xarray DataArray) – Two xarray.DataArrays with any number of dimensions. Both arrays should have the same length along the dim dimension. Regression slope and intercept will be calculated for y with respect to x.

  • y (xarray DataArray) – Two xarray.DataArrays with any number of dimensions. Both arrays should have the same length along the dim dimension. Regression slope and intercept will be calculated for y with respect to x.

  • dim (str, optional) – An optional string giving the name of the dimension along which to compare datasets. The default is ‘time’.

  • alternative (string, optional) –

    Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:

    • ‘two-sided’: slope of the regression line is nonzero

    • ‘less’: slope of the regression line is less than zero

    • ‘greater’: slope of the regression line is greater than zero

  • outliers_x (bool or float, optional) – Whether to mask out outliers in each input array prior to regression calculation using MAD outlier detection. If True, use a default threshold of 3.5 MAD to identify outliers. Custom thresholds can be provided as a float.

  • outliers_y (bool or float, optional) – Whether to mask out outliers in each input array prior to regression calculation using MAD outlier detection. If True, use a default threshold of 3.5 MAD to identify outliers. Custom thresholds can be provided as a float.

Returns:

regression_ds – A dataset comparing the two input datasets along their aligned dimension, containing variables including covariance, correlation, coefficient of determination, regression slope, intercept, p-value and standard error, and number of valid observations (n).

Return type:

xarray.Dataset
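
The vectorised approach to per-pixel regression can be sketched in NumPy as follows. This is a minimal illustration and not the library's implementation: regress_along_axis0 is a hypothetical name, the inputs are assumed to contain no NaNs, and the p-value, standard error and outlier masking are omitted.

```python
import numpy as np

def regress_along_axis0(x, y):
    """Hypothetical sketch: least-squares regression of y on x along axis 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    # Sample covariance and correlation, computed for every pixel at once.
    cov = ((x - x_mean) * (y - y_mean)).sum(axis=0) / (n - 1)
    cor = cov / (x.std(axis=0, ddof=1) * y.std(axis=0, ddof=1))
    # Slope and intercept of y with respect to x.
    slope = cov / x.var(axis=0, ddof=1)
    intercept = y_mean - slope * x_mean
    return {"cov": cov, "cor": cor, "slope": slope, "intercept": intercept, "n": n}

# A (time=10, lat=2, lon=2) example in which y = 2x + 1 at every pixel.
x = np.arange(10.0)[:, None, None] * np.ones((1, 2, 2))
y = 2.0 * x + 1.0
fit = regress_along_axis0(x, y)
print(fit["slope"], fit["intercept"], fit["cor"])
```

Every statistic is a reduction along the first axis, so no explicit loop over pixels is needed; each output has the shape of a single time slice.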