Visualising coastal turbidity with Sentinel-2 42b44e5e1840408b81a27bedf4714ea7

Background

Turbidity refers to the degree of clarity or the presence of suspended particles in a liquid. In the context of water, it is an optical property that quantifies the amount of light scattered by substances within the water sample when illuminated. Increased turbidity is indicative of a higher intensity of scattered light, which can be caused by various materials such as clay, silt, microscopic organisms, algae, dissolved organic compounds, and other fine inorganic and organic matter. Elevated levels of particulate matter often represent increases in pollutants which can have detrimental effects upon ecosystems. Read more here.

Traditional survey methods for measuring turbidity, such as collecting water samples and using Secci disks, can be time-consuming, costly, and limited in terms of spatial coverage and frequency of measurements. These methods require manual data collection, laboratory analysis, and on-site deployments. In contrast, remote sensing through satellites offers a cost-effective and efficient alternative. Satellite remote sensing provides wide coverage and frequent data acquisition, allowing for a comprehensive assessment of turbidity over large areas. One widely used remote sensing index for turbidity estimation is the Normalised Difference Turbidity Index (NDTI). The NDTI quantifies the difference in reflectance between specific spectral bands, which correlates with suspended sediment and turbidity levels. It’s important to note that NDTI values do not directly represent defined turbidity values in units such as NTU (Nephelometric Turbidity Units). However, by calibrating the NDTI with in-situ turbidity data recorded concurrently during image capture, it becomes possible to establish a correlation between NDTI values and actual turbidity measurements. This calibration enables NTU estimation, providing valuable insights into water quality. The NDTI equation is defined below:

\begin{equation} NDTI = \frac{(Red - Green)}{(Red + Green)} \end{equation}

Description

In this example, we create a time series animation depicting a turbidity plume at the Murray Mouth and the surrounding coastline. The animations contrast Sentinel-2 imagery taken from early-2022 to mid-2023; this coincides with ‘the flood of a generation’, peaking in mid-January 2023, leading to a stark decline in water quality within the vicinity. This example demonstrates how to:

  1. Load Sentinel-2 data

  2. Compute turbidity and water indices (NDTI, NDWI)

  3. Mask pixels which contain land and cloud

  4. Create time series animations

  5. Visualise changes in turbidity throughout a floodwater discharge event


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

After finishing the analysis, return to the “Analysis parameters” cell, modify some values (e.g. choose a different location or time period to analyse) and re-run the analysis. There are additional instructions on modifying the notebook at the end.

Load key Python packages and supporting functions for the analysis.

[1]:
%matplotlib inline

from IPython.core.display import Video
from IPython.display import Image
import datacube
import matplotlib.pyplot as plt
from datacube.utils import masking

import sys
sys.path.insert(1, "../Tools/")
from dea_tools.plotting import rgb, display_map
from dea_tools.bandindices import calculate_indices
from dea_tools.dask import create_local_dask_cluster
from dea_tools.datahandling import load_ard
from dea_tools.plotting import rgb, xr_animation

# Connect to dask
client = create_local_dask_cluster(return_client=True)

Client

Client-b220affd-0e6e-11ee-8221-f20df12de3eb

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: /user/robbi.bishoptaylor@ga.gov.au/proxy/8787/status

Cluster Info

Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.

[2]:
dc = datacube.Datacube(app='Turbidity_animated_timeseries')

The selected latitude and longitude will be displayed as a red box on the map below the next cell. This map can be used to find coordinates of other places, simply scroll and click on any point on the map to display the latitude and longitude of that location.

[3]:
lat_range = (-35.51, -35.73)
lon_range = (138.506, 139.028)

display_map(x=lon_range, y=lat_range)
[3]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Load Sentinel-2 data

The first step in this analysis is to load in Sentinel-2 optical data for the lat_range, lon_range and the desired time range. Note that only the necessary bands have been added to reduce load times. The load_ard function is used here to load data that has undergone geometric correction and surface reflection correction, making it ready for analysis. More information on Digital Earth Australia’s ARD and the Open Data Cube here.

Note: This may take some time to load your selected data; click on the Dask Client “Dashboard” link above to see how it is progressing.

[4]:
# Load satellite data from datacube
query = {
    "y": lat_range,
    "x": lon_range,
    "time": ("2022-02-21", "2023-06-01"),
    "measurements": [
        "nbart_red",
        "nbart_green",
        "nbart_blue",
        "nbart_nir_1",
        "oa_s2cloudless_prob",
    ],
    "output_crs": "EPSG:3577",
    "resolution": (-60, 60),
    "resampling": "average",
}

# Load available data from both Sentinel-2 satellites
ds = load_ard(
    dc=dc,
    products=["ga_s2am_ard_3", "ga_s2bm_ard_3"],
    dask_chunks={"time": 1},
    min_gooddata=0.85,
    mask_pixel_quality=False,
    group_by="solar_day",
    **query
)

# Load selected data into memory using Dask
ds.load()
Finding datasets
    ga_s2am_ard_3
    ga_s2bm_ard_3
Counting good quality pixels for each time step using fmask
Filtering to 29 out of 185 time steps with at least 85.0% good quality pixels
Returning 29 time steps as a dask array
[4]:
<xarray.Dataset>
Dimensions:              (time: 29, y: 449, x: 809)
Coordinates:
  * time                 (time) datetime64[ns] 2022-03-12T00:47:04.634097 ......
  * y                    (y) float64 -3.894e+06 -3.894e+06 ... -3.921e+06
  * x                    (x) float64 5.878e+05 5.879e+05 ... 6.363e+05 6.363e+05
    spatial_ref          int32 3577
Data variables:
    nbart_red            (time, y, x) int16 1355 1325 982 940 ... 55 49 60 59
    nbart_green          (time, y, x) int16 996 955 725 716 ... 171 157 169 164
    nbart_blue           (time, y, x) int16 722 716 556 542 ... 225 239 214 227
    nbart_nir_1          (time, y, x) int16 2410 2264 2116 1913 ... 35 29 35 34
    oa_s2cloudless_prob  (time, y, x) float64 0.06223 0.03058 ... 0.1506 0.1395
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Examine the data by printing it in the next cell. The Dimensions argument revels the number of time steps in the data set, as well as the number of pixels in the x (longitude) and y (latitude) dimensions.

[5]:
ds
[5]:
<xarray.Dataset>
Dimensions:              (time: 29, y: 449, x: 809)
Coordinates:
  * time                 (time) datetime64[ns] 2022-03-12T00:47:04.634097 ......
  * y                    (y) float64 -3.894e+06 -3.894e+06 ... -3.921e+06
  * x                    (x) float64 5.878e+05 5.879e+05 ... 6.363e+05 6.363e+05
    spatial_ref          int32 3577
Data variables:
    nbart_red            (time, y, x) int16 1355 1325 982 940 ... 55 49 60 59
    nbart_green          (time, y, x) int16 996 955 725 716 ... 171 157 169 164
    nbart_blue           (time, y, x) int16 722 716 556 542 ... 225 239 214 227
    nbart_nir_1          (time, y, x) int16 2410 2264 2116 1913 ... 35 29 35 34
    oa_s2cloudless_prob  (time, y, x) float64 0.06223 0.03058 ... 0.1506 0.1395
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Print each time step and it’s associated image capture date to familiar yourself with the data.

[6]:
for i, time in enumerate(ds.time.values):
    print(f"Time step {i}: {time}")
Time step 0: 2022-03-12T00:47:04.634097000
Time step 1: 2022-03-17T00:46:58.626259000
Time step 2: 2022-04-01T00:47:02.723150000
Time step 3: 2022-04-06T00:46:57.416849000
Time step 4: 2022-04-11T00:47:02.434463000
Time step 5: 2022-05-01T00:47:05.678310000
Time step 6: 2022-05-11T00:47:04.242350000
Time step 7: 2022-05-21T00:47:06.941348000
Time step 8: 2022-06-25T00:47:05.431567000
Time step 9: 2022-06-30T00:47:13.488838000
Time step 10: 2022-07-10T00:47:13.500225000
Time step 11: 2022-08-04T00:47:05.121218000
Time step 12: 2022-08-09T00:47:12.280570000
Time step 13: 2022-09-13T00:47:02.636509000
Time step 14: 2022-10-08T00:47:07.032473000
Time step 15: 2022-11-07T00:47:04.627755000
Time step 16: 2022-11-17T00:47:02.549405000
Time step 17: 2022-12-02T00:46:58.683008000
Time step 18: 2022-12-07T00:47:01.111954000
Time step 19: 2022-12-12T00:46:57.822074000
Time step 20: 2022-12-17T00:46:59.889972000
Time step 21: 2023-01-01T00:46:59.408833000
Time step 22: 2023-01-06T00:46:59.476344000
Time step 23: 2023-01-11T00:46:56.637909000
Time step 24: 2023-02-10T00:46:59.060952000
Time step 25: 2023-02-20T00:46:59.702085000
Time step 26: 2023-03-17T00:46:58.255262000
Time step 27: 2023-03-22T00:47:03.200746000
Time step 28: 2023-05-11T00:47:03.027042000

To visualise the data, use the pre-loaded rgb utility function to plot a true colour image for a given time-step.

Change the value for timestep to explore the data.

[7]:
# Set the timestep to visualise
timestep = 20

# Generate RGB plot at the desired timestep
rgb(ds,
    index=timestep,
    percentile_stretch=(0.05, 0.98))
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_16_0.png

Calculate the NDTI index

First, calculate the NDTI index mentioned in the ‘Background’ section. Then visualise some imagery.

When it comes to interpreting the index, high values represent pixels with high turbidity, while low values represent pixels with low turbidity.

[8]:
# Calculate NDTI
ds = calculate_indices(ds, index="NDTI2", collection="ga_s2_3")
[9]:
# Visualise this
ts = ds["NDTI2"].isel(time=timestep)
ts.plot.imshow(robust=True, cmap="viridis", figsize=(15, 6))
plt.title("NDTI");
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_19_0.png

Compute Normalised Difference Water Index

When we applied the turbidity index to the imagery, the presence of land pixels within the imagery results in extremely high turbidity values over land thus reducing the sensitivity of the rasters contrast stretch over water. Hence we can mask out pixels associated with land to enhance the colour stretch over aquatic areas.

To do this, we can use our Sentinel-2 data to calculate a water index called the ‘Normalised Difference Water Index’, or NDWI. This index uses the ratio of green and near-infrared radiation to identify the presence of water. The formula is as follows:

\[\begin{aligned} \text{NDWI} &= \frac{(\text{Green} - \text{IR})}{(\text{Green} + \text{IR})} \end{aligned}\]

When it comes to interpreting the index, high values (greater than 0) typically represent water pixels, while low values (less than 0) represent land. You can use the cell below to calculate and plot one of the images after calculating the index.

[10]:
# Calculate the water index
ds = calculate_indices(ds, index="NDWI", collection="ga_s2_3")

Plot a representative sub-sample of time steps with a variety of environmental factors (e.g extreme turbidity, low turbidity, cloud, wave break).

[11]:
# Plot a representative subset of the dataset (ds)
ds["NDWI"].isel(time=[1, 7, 20, 26]).plot.imshow(
    vmin=-1, vmax=1, cmap="RdBu", col="time", col_wrap=2, figsize=(12, 6)
)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7f1ea18b1310>
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_23_1.png

Question How does extremely turbid water, wave break, and cloud impact upon NDWI values?

Mask out the land

Experiment with a landmask_threshold to isolate just water body pixels and visualise the results.

[12]:
# Trial land masking thresholds
landmask_threshold = -0.05
dslmask_trial = ds["NDWI"].isel(time=20).where(ds["NDWI"] > landmask_threshold)
[13]:
# Visualise this
ts = dslmask_trial.isel(time=timestep)
ts.plot.imshow(vmin=-1, vmax=1, cmap="RdBu", figsize=(15, 6))
plt.title("NDWI");
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_27_0.png

Once an appropriate threshold has been established create a new dataset titled ‘ds_lmask’.

[14]:
# Mask the dataset
ds_lmask = ds.where(ds['NDWI'] > landmask_threshold)

Cloud masking with s2cloudless

When working with Sentinel-2 data, users can apply a Sentinel-2 specific cloudmask, s2cloudless, a machine learning cloud mask developed by Sinergise’s Sentinel-Hub.

We can also easily load and inspect s2cloudless’s cloud probability layer that gives the likelihood of a pixel containing cloud. To do this first decide on a representative sample of the dataset ranging from non-cloud affected to very cloud affected imagery. Then plot the s2cloudless band on the same sample and decide on a making threshold to be applied to the dataset ds_lmask, which will then be renamed to ds_clmask (‘c’ stands for cloud, and ‘l’ stands for land).

[15]:
# Plot a representative sample of ds_lmask rgb
rgb(ds_lmask, index=[0, 9, 5, 3], percentile_stretch=(0.2, 0.85), col_wrap=2)
/env/lib/python3.8/site-packages/matplotlib/cm.py:478: RuntimeWarning: invalid value encountered in cast
  xx = (xx * 255).astype(np.uint8)
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_31_1.png
[16]:
# Plot the same sample of ds_lmask, cloud probability
ds_lmask["oa_s2cloudless_prob"].isel(time=[0, 9, 5, 3]).plot.imshow(
    cmap="inferno", col="time", col_wrap=2, figsize=(12, 6)
)
[16]:
<xarray.plot.facetgrid.FacetGrid at 0x7f1f0a1dcca0>
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_32_1.png

Experiment with the cloud mask. Change the cloudmask threshold and see how it affects the imagery, examine multiple timesteps. It is not imperative you remove every single cloud affected pixel: if you did so you would likely also be masking out highly turbid pixels.

[17]:
# Set cloudmask threshold
cloudmask_threshold = 0.98

# Set the timestep to visualise
timestep = 5

# Apply 'cloudmask_mask' to trial dataset 'ds_clmask_trial'
ds_clmask_trial = ds_lmask["oa_s2cloudless_prob"].where(
    ds["oa_s2cloudless_prob"] < cloudmask_threshold
)

# Visualise this
ts = ds_clmask_trial.isel(time=timestep)
ts.plot.imshow(cmap="inferno", figsize=(15, 6))
plt.title("Cloud_probability");
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_34_0.png

Now an appropriate threshold has been established create a new dataset titled ds_clmask.

[18]:
# Once certain cloudmask the dataset
ds_clmask = ds_lmask.where(ds["oa_s2cloudless_prob"] < cloudmask_threshold)

Visualise and interrogate the imagery

At this point we will plot all the masked data both in RGB and NDTI.

As both land and cloud have been masked, these values are now ‘null’ values. This is problematic as null values appear white, which is the same colour as clouds; this can be a source of confusion when displaying in RGB. To rectify this set ‘null’ values to ‘0’.

[19]:
# Manually set the null values equal to zero so they don't appear white like clouds
ds_clmask_zero = ds_clmask.where(~ds_clmask.isnull(), 0)
[20]:
# Generate RGB plot at the desired timestep
rgb(ds_clmask_zero,
    index=timestep,
    percentile_stretch=(0.1, 0.975))
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_39_0.png

Now plot all the imagery in RGB and NDTI.

Consider which images you want in your final animations. If an image remains too cloud affected or there is severe seamline effects (boundaries between adjacent image tiles, as observed in timesteps 0 and 1), it might be worth removing these images. Make a note of all the images you want to retain in the variable below called timesteps.

[21]:
# Plot the data so you can determine if there are any images you don't want in your animated time series

# Create a list of titles for the series of images and their timesteps
titles = list(f'Timestep:{x}' for x in range(0, len(ds_clmask_zero.time)))

# Plot the image series with titles
rgb(ds_clmask_zero, col="time", col_wrap=4, percentile_stretch=(0.1, 0.975), titles=titles, size=2.5)
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_41_0.png
[22]:
# Plot the NDTI data with the mask so you can determine if there are any images you don't want in your animated time series

img = ds_clmask["NDTI2"].plot(col="time", col_wrap=4, cmap="viridis", figsize=(12, 15))

# Add timesteps to the plot titles
for ax, title in zip(img.axs.flat, titles):
    ax.set_title(title)
../../../_images/notebooks_Real_world_examples_Turbidity_animated_timeseries_42_0.png

In the next two cells create two datasets for two separate purposes. ‘ds_final_imgs’ will be utilised to create animations of 1) the NDTI over only waterbodies, and 2) an RGB animation with the land and cloud masked to increase the contrast stretch over the waterbodies. Create another dataset ds_final_images_no_mask without the final mask to display the mosaicked imagery simply in RGB in their original state.

[23]:
# List the imagery timesteps to be included in the animations
timesteps = [2, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 28]
[24]:
# Create a dataset with only the desired time steps to be utilised in an animation when a cloud and land mask is needed.
ds_final_imgs = ds_clmask.isel(time=timesteps)
[25]:
# Create a dataset with only the desired time steps to be utilised in an animation when no mask is needed.
ds_final_imgs_no_mask = ds.isel(time=timesteps)

Print both datasets familiarise yourself with the data and ensure the timesteps are equal for both datasets.

[26]:
# Check that the timesteps are equal for both datasets
ds_final_imgs_no_mask.time.values == ds_final_imgs.time.values
[26]:
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True])
[27]:
# Print and investigate the index of each time step in ds_final_imgs
for i, time in enumerate(ds_final_imgs.time.values):
    print(f"Time step {i}: {time}")
Time step 0: 2022-04-01T00:47:02.723150000
Time step 1: 2022-05-01T00:47:05.678310000
Time step 2: 2022-05-11T00:47:04.242350000
Time step 3: 2022-05-21T00:47:06.941348000
Time step 4: 2022-06-25T00:47:05.431567000
Time step 5: 2022-06-30T00:47:13.488838000
Time step 6: 2022-09-13T00:47:02.636509000
Time step 7: 2022-10-08T00:47:07.032473000
Time step 8: 2022-11-07T00:47:04.627755000
Time step 9: 2022-11-17T00:47:02.549405000
Time step 10: 2022-12-02T00:46:58.683008000
Time step 11: 2022-12-07T00:47:01.111954000
Time step 12: 2022-12-12T00:46:57.822074000
Time step 13: 2022-12-17T00:46:59.889972000
Time step 14: 2023-01-01T00:46:59.408833000
Time step 15: 2023-01-06T00:46:59.476344000
Time step 16: 2023-01-11T00:46:56.637909000
Time step 17: 2023-02-10T00:46:59.060952000
Time step 18: 2023-02-20T00:46:59.702085000
Time step 19: 2023-05-11T00:47:03.027042000

Produce the animations

Create animations with xr_animation for:

1. RGB unmasked imagery
2. RGB land and cloud masked, and contrast stretched
3. NDTI land and cloud masked imagery

Consider changing the following parameters:

  • show_text will adapt the text of the label

  • interval will alter the amount at which images are displayed (units = milliseconds)

  • width_pixels changes the size of the display

  • annotation_kwargs to change the typography of the label

[28]:
# Produce time series animation of RGB (unmasked)
xr_animation(ds=ds_final_imgs_no_mask,
             bands=['nbart_red', 'nbart_green', 'nbart_blue'],
             output_path='RGB_Murray_Mouth_Plume.mp4',
             annotation_kwargs={'fontsize': 20, 'color':'white'},
             show_text='RGB',
             interval=2000,
             width_pixels=800)

# Plot animation
plt.close()
Video('RGB_Murray_Mouth_Plume.mp4', embed=True)
Exporting animation to RGB_Murray_Mouth_Plume.mp4
[28]: