Image segmentation


In the last two decades, as the spatial resolution of satellite images has increased, remote sensing has begun to shift from a focus on pixel-based analysis towards Geographic Object-Based Image Analysis (GEOBIA), which aims to group pixels together into meaningful image-objects. A GEOBIA workflow has two advantages: first, it reduces the ‘salt and pepper’ effect typical of classifying individual pixels; second, it increases the computational efficiency of the workflow by grouping pixels into fewer, larger, but still meaningful objects. A review of the emerging trends in GEOBIA can be found in Chen et al. (2017).


This notebook demonstrates a method for conducting image segmentation, a common image analysis technique used to transform a digital satellite image into objects. In brief, image segmentation aims to partition an image into segments, where each segment consists of a group of pixels with similar characteristics. A number of algorithms exist to perform image segmentation, one of which is shown here:

  1. Quickshift, implemented through the Python package scikit-image
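
To make the idea of a segment concrete, here is a minimal sketch on a made-up 4 x 6 array. It uses a simple threshold as a stand-in for a real segmentation algorithm (it is not quickshift), and the array values and the `threshold` name are assumptions chosen purely for illustration. The point is only that a segmentation result is an integer array with the same shape as the image, in which every pixel carries the ID of the segment it belongs to.

```python
import numpy as np

# Toy single-band "image": low values on the left, high values on the right
image = np.array([
    [0.10, 0.12, 0.11, 0.80, 0.82, 0.79],
    [0.11, 0.13, 0.10, 0.81, 0.80, 0.83],
    [0.12, 0.10, 0.12, 0.78, 0.81, 0.80],
    [0.10, 0.11, 0.13, 0.82, 0.79, 0.81],
])

# Hypothetical threshold, used only for this illustration
threshold = 0.5

# Label array: 0 for the "low" segment, 1 for the "high" segment
labels = (image > threshold).astype(np.int64)

# Summarise the segments: which IDs exist and how many pixels each one holds
segment_ids, pixel_counts = np.unique(labels, return_counts=True)
print(segment_ids, pixel_counts)
```

A real algorithm such as quickshift produces the same kind of label array, only with many more segments and with boundaries that follow the structure of the image.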

Note: Image segmentation at very large scales can be both time- and memory-intensive, and the examples shown here will become prohibitively slow at scale. The notebook Tiled, Parallel Image Segmentation builds upon the image segmentation algorithm developed by Shepherd et al. (2019) to run image segmentation across multiple CPUs.

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

%matplotlib inline

import datacube
import xarray as xr
import numpy as np
import scipy
import matplotlib.pyplot as plt
from osgeo import gdal
from datacube.utils.cog import write_cog
from skimage.segmentation import quickshift

import sys
sys.path.insert(1, '../Tools/')
from dea_tools.datahandling import load_ard
from dea_tools.bandindices import calculate_indices
from dea_tools.plotting import rgb

Connect to the datacube

dc = datacube.Datacube(app='Image_segmentation')


Load Sentinel 2 data from the datacube

Here we are loading in a timeseries of Sentinel 2 satellite images through the datacube API using the load_ard function. This will provide us with some data to work with.

# Create a query object
query = {
    'x': (153.35, 153.50),
    'y': (-28.80, -28.95),
    'time': ('2018-01', '2018-03'),
    'measurements': ['nbart_red', 'nbart_nir_1'],
    'output_crs': 'EPSG:3577',
    'resolution': (-30, 30),
    'group_by': 'solar_day'
}

# Load available data from both Sentinel-2 satellites
ds = load_ard(dc=dc,
              products=['ga_s2am_ard_3', 'ga_s2bm_ard_3'],
              **query)

# Print output data
print(ds)

Finding datasets
Applying pixel quality/cloud mask (oa_fmask)
Loading 18 time steps
<xarray.Dataset>
Dimensions:      (time: 18, y: 634, x: 570)
Coordinates:
  * time         (time) datetime64[ns] 2018-01-01T23:53:47.615000 ... 2018-03...
  * y            (y) float64 -3.3e+06 -3.3e+06 ... -3.319e+06 -3.319e+06
  * x            (x) float64 2.047e+06 2.047e+06 ... 2.064e+06 2.064e+06
    spatial_ref  int32 3577
Data variables:
    nbart_red    (time, y, x) float32 nan nan nan nan ... 291.0 295.0 301.0
    nbart_nir_1  (time, y, x) float32 nan nan nan nan ... 223.0 224.0 248.0
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Combine observations into a noise-free statistical summary image

Individual remote sensing images can be affected by noisy and incomplete data (e.g. due to clouds). To produce cleaner images that we can feed into the image segmentation algorithm, we can create summary images, or composites, that combine multiple images into one image to reveal the ‘typical’ appearance of the landscape for a certain time period. In the code below, we take the noisy, incomplete satellite images we just loaded and calculate the mean Normalised Difference Vegetation Index (NDVI). The mean NDVI will be our input into the segmentation algorithm.
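
As a rough illustration of what the composite does, the sketch below computes NDVI, defined as (NIR - Red) / (NIR + Red), for each time step of a tiny synthetic stack and then averages over time while ignoring missing values. The arrays and their values are made up, and plain NumPy is used here instead of the `calculate_indices` helper, so treat it as an approximation of the idea rather than the notebook's actual processing.

```python
import numpy as np

# Synthetic reflectance stacks shaped (time, y, x); values are invented.
# The NaN stands in for a pixel masked out by cloud at the second time step.
red = np.array([[[0.10, 0.20]], [[0.10, np.nan]], [[0.30, 0.20]]])
nir = np.array([[[0.50, 0.60]], [[0.30, np.nan]], [[0.50, 0.40]]])

# NDVI for every pixel at every time step
ndvi_stack = (nir - red) / (nir + red)

# Mean over the time axis, skipping NaNs so that a cloudy observation
# does not wipe out the whole pixel
mean_ndvi = np.nanmean(ndvi_stack, axis=0)
print(mean_ndvi)
```

Note that xarray's `.mean()` skips NaN values by default for floating-point data, which is why `np.nanmean` rather than `np.mean` is the closer NumPy analogue here.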

Calculate mean NDVI

# First we calculate NDVI on each image in the timeseries
ndvi = calculate_indices(ds, index='NDVI', collection='ga_s2_3')

# For each pixel, calculate the mean NDVI throughout the whole timeseries
ndvi = ndvi.mean(dim='time', keep_attrs=True)

# Plot the results to inspect
ndvi.NDVI.plot(vmin=0.1, vmax=1.0, cmap='gist_earth_r', figsize=(10, 10))


Quickshift Segmentation

Using the function quickshift from the Python package scikit-image, we will conduct an image segmentation on the mean NDVI array. We then calculate a zonal mean across each segment using the input dataset. Our last step is to export our results as a GeoTIFF.

See the scikit-image documentation for quickshift for the input parameters to the algorithm, and the scikit-image example comparing segmentation and superpixel algorithms for an explanation of quickshift and other segmentation algorithms.
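
The zonal mean step can be sketched in plain NumPy to show what it computes: every pixel is replaced by the average value of the segment it belongs to. The `zonal_mean` helper and the small arrays below are hypothetical and only for illustration; the sketch assumes segment labels are non-negative integers, and it does not handle NaN values specially.

```python
import numpy as np

def zonal_mean(values, labels):
    """Replace each pixel with the mean of `values` over its segment."""
    flat_labels = labels.ravel()
    # Sum of values and number of pixels for every segment ID
    sums = np.bincount(flat_labels, weights=values.ravel())
    counts = np.bincount(flat_labels)
    # Leave unused label IDs at zero rather than dividing by zero
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Map each segment's mean back onto the pixel grid
    return means[labels]

values = np.array([[0.2, 0.4, 0.9],
                   [0.2, 0.4, 0.7]])
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])

zonal = zonal_mean(values, labels)
print(zonal)
```

Passing the label array as both `labels` and `index` to `scipy.ndimage.mean` gives the same kind of per-pixel result, which is convenient because the output can be plotted directly alongside the input image.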

# Convert our mean NDVI xarray into a numpy array, we need
# to be explicit about the datatype to satisfy quickshift
input_array = ndvi.NDVI.values.astype(np.float64)

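# Calculate the segments. convert2lab=False because the input is a
# single-band array rather than RGB; the other values are the
# scikit-image defaults, written out so they are easy to tune
segments = quickshift(input_array,
                      kernel_size=5,
                      convert2lab=False,
                      max_dist=10,
                      ratio=1.0)

# Calculate the zonal mean NDVI across the segments. Passing the
# segments as both labels and index returns one value per pixel
segments_zonal_mean_qs = scipy.ndimage.mean(input=input_array,
                                            labels=segments,
                                            index=segments)
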
# Plot to see result
plt.figure(figsize=(10, 10))
plt.imshow(segments_zonal_mean_qs, cmap='gist_earth_r', vmin=0.1, vmax=1.0)
plt.colorbar()

Export result to GeoTIFF

To export our result to a GeoTIFF raster file, we first need to convert our plain np.array back to an xr.DataArray, then use the write_cog function.

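# Convert numpy array to xarray.DataArray, reusing the coordinates
# and attributes (including the CRS) of the mean NDVI dataset
segments_zonal_mean_qs = xr.DataArray(segments_zonal_mean_qs,
             coords=ndvi.coords,
             dims=['y', 'x'],
             attrs=ndvi.attrs)

# Write array to GeoTIFF (the file name here is only an example)
write_cog(geo_im=segments_zonal_mean_qs,
          fname='segments_zonal_mean_qs.tif',
          overwrite=True)
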
Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag, where you can also view previously asked questions. If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: December 2023

Compatible datacube version:



Tags: NCI compatible, sandbox compatible, sentinel 2, load_ard, rgb, calculate_indices, image segmentation, image compositing, scikit-image, GEOBIA, quickshift, NDVI, GeoTIFF, exporting data