Introduction to plotting 86c52424188346fc8463ad0bf3642b06

Background

Data visualisation is an important component of working with Earth Observation data. The xarray Python package provides a range of straightforward data plotting options that allow users to quickly generate simple plots from multi-dimensional datasets. To generate more complex and informative plots, the DEA Notebooks repository also provides a custom plotting module with additional easy-to-use functionality.

Description

This introductory notebook demonstrates how to visualise DEA satellite data returned from running a datacube query. The notebook demonstrates commonly used xarray plotting methods, as well as custom functions provided in the dea_tools.plotting module.

Topics covered in this notebook include:

  1. View an area of interest prior to querying the datacube

  2. Querying the datacube and loading data

  3. Plotting single band data (e.g. a single satellite band)

    • Selecting and plotting individual timesteps

    • Plotting multiple timesteps

    • Customising plot appearance

  4. Plotting three-band true or false colour imagery

    • Plotting single timesteps

    • Plotting multiple timesteps

    • Customising plot appearance


Getting started

To run this introduction to plotting data loaded from the datacube, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

The first step is to run %matplotlib inline, which ensures figures plot correctly in the Jupyter notebook. Next, load the datacube package to enable loading data, and a selection of custom DEA functions from the dea_tools.plotting module.

[1]:
%matplotlib inline

import datacube
import sys

sys.path.insert(1, '../Tools/')
from dea_tools.plotting import display_map, rgb

Connect to the datacube

The next step is to connect to the datacube database. The resulting dc datacube object can then be used to load data. The app parameter is a unique name used to identify the notebook that does not have any effect on the analysis.

[2]:
dc = datacube.Datacube(app="05_Plotting")

Analysis parameters

The following variables are required to establish a query for this notebook: - lat_range: The latitude range to analyse (e.g. (-27.715, -27.755)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - lon_range: The longitude range to analyse (e.g. (153.42, 153.46)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - time_range: The date range to analyse (e.g. ("2013", "2017")).

[3]:
lat_range = (-27.58, -27.666)
lon_range = (153.3, 153.4)
time_range = ("2013", "2017")

View the queried location

Before running a query and extracting and analysing data, it is useful to double-check that your location is correct. The display_map() function shows your selected area as a red rectangle on an interactive map. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

[4]:
display_map(x=lon_range, y=lat_range)
[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query and view data

The variables specified above are used to load data from the DEA datacube through the dc.load(), which was introduced in the Loading data notebook. This introduction will continue to use the GeoMAD product, as used in the Loading data notebook.

[5]:
ds = dc.load(product="ga_ls8cls9c_gm_cyear_3",
             x=lon_range,
             y=lat_range,
             time=time_range)

ds
[5]:
<xarray.Dataset> Size: 18MB
Dimensions:       (time: 5, y: 372, x: 375)
Coordinates:
  * time          (time) datetime64[ns] 40B 2013-07-02T11:59:59.999999 ... 20...
  * y             (y) float64 3kB -3.165e+06 -3.165e+06 ... -3.176e+06
  * x             (x) float64 3kB 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 1MB 236 244 235 230 224 ... 286 275 276 287
    nbart_green   (time, y, x) int16 1MB 354 369 358 350 343 ... 392 374 371 394
    nbart_red     (time, y, x) int16 1MB 284 286 278 273 255 ... 353 334 326 346
    nbart_nir     (time, y, x) int16 1MB 2484 2567 2580 2588 ... 1959 2048 2223
    nbart_swir_1  (time, y, x) int16 1MB 1069 1093 1054 1006 ... 1230 1149 1200
    nbart_swir_2  (time, y, x) int16 1MB 470 473 454 432 411 ... 657 581 530 543
    sdev          (time, y, x) float32 3MB 0.0006154 0.0001635 ... 0.001664
    edev          (time, y, x) float32 3MB 116.4 86.19 80.38 ... 162.9 156.3
    bcdev         (time, y, x) float32 3MB 0.02473 0.01709 ... 0.03776 0.03605
    count         (time, y, x) int16 1MB 8 8 8 8 8 8 8 ... 11 11 11 11 11 11 11
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Plotting single band images

The xarray package provides built-in methods for plotting individual data variables or measurements.

To do this, identify the band to plot. In this example, the nbart_swir_1 satellite band is used. To plot a single band, the data must be an xarray.DataArray (to revise the difference between xarray.Dataset and xarray.DataArray objects, refer back to the Loading data notebook):

[6]:
ds.nbart_swir_1
[6]:
<xarray.DataArray 'nbart_swir_1' (time: 5, y: 372, x: 375)> Size: 1MB
array([[[1069, 1093, 1054, ...,   38,   37,   37],
        [1085, 1074, 1021, ...,   37,   36,   36],
        [1067, 1036,  972, ...,   34,   34,   34],
        ...,
        [1249, 1294,  882, ..., 1097, 1092, 1139],
        [ 810,  801,  745, ..., 1095, 1066, 1145],
        [1123,  882,  865, ..., 1073, 1016, 1086]],

       [[1128, 1139, 1110, ...,   46,   45,   47],
        [1193, 1141, 1077, ...,   47,   45,   47],
        [1221, 1144, 1053, ...,   47,   45,   47],
        ...,
        [1355, 1344,  906, ..., 1107, 1105, 1177],
        [ 873,  895,  805, ..., 1100, 1070, 1171],
        [1184,  934,  871, ..., 1086, 1021, 1088]],

       [[1015, 1044, 1014, ...,   34,   33,   35],
        [1055, 1045,  977, ...,   33,   33,   30],
        [1078, 1014,  944, ...,   34,   33,   31],
        ...,
        [1237, 1343,  977, ..., 1066, 1038, 1095],
        [ 799,  828,  832, ..., 1072, 1045, 1114],
        [1225,  885,  843, ..., 1056,  999, 1067]],

       [[1068, 1071, 1039, ...,   38,   38,   38],
        [1114, 1081, 1012, ...,   37,   37,   37],
        [1140, 1061,  987, ...,   37,   37,   36],
        ...,
        [1287, 1343,  980, ..., 1107, 1088, 1161],
        [ 873,  914,  954, ..., 1101, 1059, 1157],
        [1227,  943,  896, ..., 1090, 1029, 1103]],

       [[1071, 1068, 1016, ...,   54,   55,   54],
        [1063, 1048,  975, ...,   55,   57,   56],
        [1094, 1032,  946, ...,   56,   58,   57],
        ...,
        [1257, 1269,  900, ..., 1250, 1193, 1228],
        [ 898,  880,  899, ..., 1244, 1176, 1238],
        [1232,  969,  939, ..., 1230, 1149, 1200]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 40B 2013-07-02T11:59:59.999999 ... 201...
  * y            (y) float64 3kB -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 3kB 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 4B 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Selecting and plotting a single timestep

The returned object header above specifies that ds.swir1 is a xarray.DataArray with five timesteps (i.e. <xarray.DataArray 'swir1' (time: 5, y: 446, x: 450)>). To make a plot for a single timestep only, select the desired timestep using one of the following options:

  1. .isel(): This stands for “index selection”, which selects individual timesteps from a dataset based on the sequence of loaded timesteps. Counting in Python begins at 0, so to select the first timestep in the xarray.DataArray we can specify .isel(time=0):

[7]:
first_timestep = ds.nbart_swir_1.isel(time=0)

first_timestep
[7]:
<xarray.DataArray 'nbart_swir_1' (y: 372, x: 375)> Size: 279kB
array([[1069, 1093, 1054, ...,   38,   37,   37],
       [1085, 1074, 1021, ...,   37,   36,   36],
       [1067, 1036,  972, ...,   34,   34,   34],
       ...,
       [1249, 1294,  882, ..., 1097, 1092, 1139],
       [ 810,  801,  745, ..., 1095, 1066, 1145],
       [1123,  882,  865, ..., 1073, 1016, 1086]], dtype=int16)
Coordinates:
    time         datetime64[ns] 8B 2013-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 3kB 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 4B 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
  1. .sel(): This selects data using real-world coordinate labels like time. For example, from the Coordinates section, the first timestep (i.e. the year 2013) is selected from the xarray.DataArray by specifying .sel(time='2013'):

[8]:
first_timestep = ds.nbart_swir_1.sel(time='2013')

first_timestep
[8]:
<xarray.DataArray 'nbart_swir_1' (time: 1, y: 372, x: 375)> Size: 279kB
array([[[1069, 1093, 1054, ...,   38,   37,   37],
        [1085, 1074, 1021, ...,   37,   36,   36],
        [1067, 1036,  972, ...,   34,   34,   34],
        ...,
        [1249, 1294,  882, ..., 1097, 1092, 1139],
        [ 810,  801,  745, ..., 1095, 1066, 1145],
        [1123,  882,  865, ..., 1073, 1016, 1086]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 8B 2013-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.165e+06 -3.165e+06 ... -3.176e+06 -3.176e+06
  * x            (x) float64 3kB 2.066e+06 2.066e+06 ... 2.077e+06 2.077e+06
    spatial_ref  int32 4B 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

After selecting a timestep, the .plot() method can be used to display the chosen nbart_swir_1 data:

[9]:
first_timestep.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7ff3d0c7faf0>
../../../_images/notebooks_Beginners_guide_05_Plotting_22_1.png

Plotting multiple timesteps

It is often useful to produce plots for a single measurement across time, for example to compare change between satellite observations or summary datasets. To plot multiple images, skip the isel() step above and plot the entire xarray.DataArray directly.

To plot multiple timesteps in one figure, it is necessary to instruct the .plot() function to put each timestep in a different column. This is done by specifying .plot(col="time"):

[10]:
ds.nbart_swir_1.plot(col="time")
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7ff3a7e1c280>
../../../_images/notebooks_Beginners_guide_05_Plotting_24_1.png

This kind of plotting is called “facetted plotting”. For more information, refer to the xarray documentation

Customising plot appearance

The plots above are dark and difficult to see clearly. To improve the appearance of xarray plots, use the robust=True argument to optimise the plot colours by clipping extreme values or outliers. This will use the 2nd and 98th percentiles of the data to compute the color limits:

[11]:
ds.nbart_swir_1.plot(col="time", robust=True)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7ff3a7a76c50>
../../../_images/notebooks_Beginners_guide_05_Plotting_27_1.png

Plots can be further customised by adding custom colour maps/styles using the cmap parameter.

When choosing a colour map for a plot, it is important to choose a set of colours that are perceived logically by the human eye. The best colour maps are “perceptually uniform”: these colour maps increase logically from dark to light colours, where equal increases in lightness/darkness correspond to equal changes in data values. Some best-practice perceptually uniform colour maps include:

"viridis", "plasma", "inferno", "magma", "cividis"

For further reading about perceptually uniform colour maps in data visualisation, refer to the matplotlib documentation

It is also important to consider colour blindness when selecting a colour map. xarray supports many colour maps from the “colorbrewer” family of colour maps which are optimised for colour blindness. You can use the interactive online tool to browse all available colour maps, or choose from one of the following commonly used options:

"Greys", "Purples", "Blues", "Greens", "Oranges", "Reds",
"YlOrBr", "YlOrRd", "OrRd", "PuRd", "RdPu", "BuPu",
"GnBu", "PuBu", "YlGnBu", "PuBuGn", "BuGn", "YlGn"

For a full list of available colour maps you can refer to this list.

The example cell below plots the data with the perceptually uniform magma colour map:

[12]:
ds.nbart_swir_1.plot(col="time", robust=True, cmap="magma")
[12]:
<xarray.plot.facetgrid.FacetGrid at 0x7ff3a7bfb2e0>
../../../_images/notebooks_Beginners_guide_05_Plotting_29_1.png

Plotting true or false colour RGB images

Although xarray makes it easy to plot single band images, plotting a three band colour photo-like image is less straightforward.

To make this easier, the dea-notebooks repository provides a custom rgb() function that is designed for plotting three band images. The rgb() function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.

Providing the nbart_red, nbart_green and nbart_blue measurements from a dataset will produce a true colour image (akin to how humans view the landscape). Providing nbart_nir, nbart_red and nbart_green measurements or any other set of three satellite bands from a dataset will produce a false colour image.

Hence, the rgb() function can be used to visualise the data returned by a query. It requires the minimum input of:

  • ds: The xarray.Dataset object

  • bands: Three bands for display (these must be measurements found in the dataset)

  • index: The timestep to view, default is 0

Plotting a single timestep

The time dimension of an xarray.Dataset describes how many timesteps exist for the loaded data. In the rgb() function, the index variable is asking for which timesteps to view (similar to the isel() example above). Remember: counting in Python begins at 0 so to view the earliest timesteps set index=0:

[13]:
# View a red, green, blue (true colour) image of the first timestep
rgb(ds, bands=["nbart_red", "nbart_green", "nbart_blue"], index=0)
../../../_images/notebooks_Beginners_guide_05_Plotting_32_0.png

It is possible to change the input bands to plot a false colour image, which can provide different insights in a landscape. The false colour band combination (nbart_swir_1, nbart_nir, nbart_green) emphasises growing vegetation in green, and water in deep blue:

[14]:
# View a swir1, nir, green (false colour) image of the first timestep
rgb(ds, bands=['nbart_swir_1', 'nbart_nir', 'nbart_green'], index=0)
../../../_images/notebooks_Beginners_guide_05_Plotting_34_0.png

Plotting multiple timesteps

As discussed in the single band example above, it can be useful to visualise multiple timesteps in a single plot (e.g. to compare change over time).

The rgb() function can also do this, as long as a list of timesteps to view is provided to the index argument, e.g. index=[X1, X2, ...]. The example cell below plots the first and fifth image in the dataset using index=[0, 4] (remembering that counting in Python starts at 0):

[15]:
# View a true colour image for the first and fifth timesteps
rgb(ds, bands=['nbart_red', 'nbart_green', 'nbart_blue'], index=[0, 4])
../../../_images/notebooks_Beginners_guide_05_Plotting_36_0.png

It is also possible to use rgb() to plot all timesteps in a dataset using the col="time" syntax demonstrated in the single band example above:

[16]:
# Plot all timesteps in the dataset
rgb(ds, bands=['nbart_red', 'nbart_green', 'nbart_blue'], col="time")
../../../_images/notebooks_Beginners_guide_05_Plotting_38_0.png

Customising plot appearance

By default, rgb() generates plots with robust=True to improve the appearance of the images by clipping out the darkest and brightest 2% of pixels, using the 2nd and 98th percentiles of the data to compute the colour limits

If this default provides poor results, the plot’s colour stretch can be customised using the percentile_stretch parameter. This clips the most extreme minimum and maximum values in the dataset, improving the contrast and appearance of the plot.

For example, specifying percentile_stretch=[0.05, 0.95] will clip out the darkest and brightest 5% of pixels, focusing the colour stretch on the remaining 90% of less extreme values:

[17]:
rgb(ds,
    bands=['nbart_red', 'nbart_green', 'nbart_blue'],
    index=0,
    percentile_stretch=[0.05, 0.95])

../../../_images/notebooks_Beginners_guide_05_Plotting_40_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: June 2024

Compatible datacube version:

[18]:
print(datacube.__version__)
1.8.6

Tags

Tags: sandbox compatible, NCI compatible, landsat 8, annual geomedian, plotting, dea_plotting, rgb, beginner, data visualisation, colour maps, isel, sel