Introduction to loading data

Background

Loading data from the Digital Earth Australia (DEA) instance of the Open Data Cube requires the construction of a data query that specifies the what, where, and when of the data request. Each query returns a multi-dimensional xarray object containing the contents of your query. It is essential to understand the xarray data structures as they are fundamental to the structure of data loaded from the datacube. Manipulations, transformations and visualisation of xarray objects provide datacube users with the ability to explore and analyse DEA datasets, as well as pose and answer scientific questions.

Description

This notebook will introduce how to load data from the DEA datacube through the construction of a query and use of the dc.load() function. Topics covered include:

  1. Loading data using dc.load()

  2. Interpreting the resulting xarray.Dataset object

    • Inspecting an individual xarray.DataArray

  3. Customising parameters passed to the dc.load() function

    • Loading specific measurements

    • Loading data for coordinates in a custom coordinate reference system (CRS)

    • Projecting data to a new CRS and spatial resolution

    • Specifying a specific spatial resampling method

  4. Loading data using a reusable dictionary query

  5. Loading matching data from multiple products using like

  6. Adding a progress bar to the data load


Getting started

To run this introduction to loading data from DEA, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

The datacube package is required to query the datacube database and load some data. The with_ui_cbk function from odc.ui enables a progress bar when loading large amounts of data.

[1]:
import datacube
from odc.ui import with_ui_cbk

Connect to the datacube

The next step is to connect to the datacube database. The resulting dc datacube object can then be used to load data. The app parameter is a unique name that identifies the notebook; it has no effect on the analysis.

[2]:
dc = datacube.Datacube(app="04_Loading_data")

Loading data using dc.load()

Loading data from the datacube uses the dc.load() function.

The function requires the following minimum arguments:

  • product: The data product to load (to revise DEA products, see the Products and measurements notebook).

  • x: The spatial region in the x dimension. By default, the x and y arguments accept coordinates in the WGS84 geographic coordinate system, identified by the EPSG code 4326.

  • y: The spatial region in the y dimension. The dimensions longitude/latitude and x/y can be used interchangeably.

  • time: The temporal extent. The time dimension can be specified using a tuple of datetime objects or strings in the “YYYY”, “YYYY-MM” or “YYYY-MM-DD” format.

For example, to load 2015 data from the Landsat 8 NBAR-T annual geomedian product for Moreton Bay in southern Queensland, use the following parameters:

  • product: ga_ls8cls9c_gm_cyear_3

  • x: (153.3, 153.4)

  • y: (-27.5, -27.6)

  • time: ("2015-01-01", "2015-12-31")

Run the following cell to load all datasets from the ga_ls8cls9c_gm_cyear_3 product that match this spatial and temporal extent:

[3]:
ds = dc.load(product="ga_ls8cls9c_gm_cyear_3",
             x=(153.3, 153.4),
             y=(-27.5, -27.6),
             time=("2015-01-01", "2015-12-31"))

ds
[3]:
<xarray.Dataset> Size: 4MB
Dimensions:       (time: 1, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
    nbart_green   (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_red     (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_nir     (time, y, x) int16 326kB 94 94 95 96 ... 2691 2437 2132 2237
    nbart_swir_1  (time, y, x) int16 326kB 56 57 56 57 ... 1432 1177 1018 1059
    nbart_swir_2  (time, y, x) int16 326kB 46 46 46 48 47 ... 716 579 490 496
    sdev          (time, y, x) float32 651kB 0.003487 0.003262 ... 0.001991
    edev          (time, y, x) float32 651kB 133.7 127.6 125.6 ... 176.0 169.4
    bcdev         (time, y, x) float32 651kB 0.09439 0.09096 ... 0.0404 0.03889
    count         (time, y, x) int16 326kB 16 16 16 16 16 15 ... 12 12 12 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Interpreting the resulting xarray.Dataset

The variable ds now holds an xarray.Dataset containing all data that matched the spatial and temporal query parameters passed to dc.load().

Dimensions

  • This header identifies the number of timesteps returned in the search (time: 1) as well as the number of pixels in the x and y directions of the data query.

Coordinates

  • time identifies the date attributed to each returned timestep.

  • x and y are the coordinates for each pixel within the spatial bounds of the query.

Data variables

  • These are the measurements available for the nominated product. For every date (time) returned by the query, the measured value at each pixel (y, x) is returned as an array for each measurement. Each data variable is itself an xarray.DataArray object (see below).

Attributes

  • crs identifies the coordinate reference system (CRS) of the loaded data.
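To make the four components above concrete without a datacube connection, the sketch below builds a tiny synthetic xarray.Dataset and reads each component back. The variable demo and all of its values are invented for illustration; data returned by dc.load() will be much larger, and its crs attribute may be a CRS object rather than a plain string.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a loaded dataset: 1 timestep, 3 x 4 pixels
time = np.array(["2015-07-02"], dtype="datetime64[ns]")
y = np.array([-3156000.0, -3156030.0, -3156060.0])
x = np.array([2067000.0, 2067030.0, 2067060.0, 2067090.0])

demo = xr.Dataset(
    data_vars={
        "nbart_red": (("time", "y", "x"), np.full((1, 3, 4), 232, dtype="int16")),
        "nbart_nir": (("time", "y", "x"), np.full((1, 3, 4), 94, dtype="int16")),
    },
    coords={"time": time, "y": y, "x": x},
    attrs={"crs": "EPSG:3577"},
)

print(dict(demo.sizes))      # Dimensions: names and lengths
print(demo.time.values[0])   # Coordinates: date of the first timestep
print(list(demo.data_vars))  # Data variables: measurement names
print(demo.attrs["crs"])     # Attributes: the CRS entry
```

The same four lookups work unchanged on the ds object loaded above.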

Inspecting an individual xarray.DataArray

The xarray.Dataset loaded above is itself a collection of individual xarray.DataArray objects that hold the actual data for each data variable/measurement. For example, all measurements listed under Data variables above (e.g. nbart_blue, nbart_green, nbart_red, nbart_nir, nbart_swir_1, nbart_swir_2) are xarray.DataArray objects.

These xarray.DataArray objects can be inspected or interacted with by using either of the following syntaxes:

ds["measurement_name"]

or

ds.measurement_name

The ability to access individual variables means that these can be directly viewed, or further manipulated to create new variables. For example, run the following cell to access data from the near infra-red satellite band (i.e. nbart_nir):

[5]:
ds.nbart_nir
[5]:
<xarray.DataArray 'nbart_nir' (time: 1, y: 424, x: 384)> Size: 326kB
array([[[  94,   94,   95, ...,   79,   87,   90],
        [  94,   92,   93, ...,   84,   85,  118],
        [  93,   90,   91, ...,   79,   82,  136],
        ...,
        [3174, 2840, 2626, ..., 2070, 2375, 2466],
        [2776, 2905, 2660, ..., 2076, 2284, 2489],
        [2516, 2828, 2621, ..., 2437, 2132, 2237]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x            (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref  int32 4B 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Note that the object header informs us that it is an xarray.DataArray containing data for the nbart_nir satellite band.

Like an xarray.Dataset, the array also includes information about the data’s dimensions (i.e. (time: 1, y: 424, x: 384)), coordinates and attributes. This particular data variable/measurement contains some additional information that is specific to the nbart_nir band, including details of the array’s nodata value (i.e. nodata: -999).
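The nodata attribute matters as soon as bands are combined into new variables, because -999 would otherwise be treated as a real measurement. The sketch below is one possible approach, not part of the original notebook: a hypothetical helper mask_nodata() replaces nodata pixels with NaN, and a normalised difference of two small synthetic bands is then computed from the masked arrays. The array values are invented.

```python
import numpy as np
import xarray as xr

def mask_nodata(da):
    """Replace the array's nodata value with NaN (the result is floating point)."""
    return da.where(da != da.attrs["nodata"])

# Tiny synthetic stand-ins for two bands, each with one nodata pixel
nir = xr.DataArray(np.array([[3000, -999], [2500, 2000]], dtype="int16"),
                   dims=("y", "x"), attrs={"nodata": -999})
red = xr.DataArray(np.array([[1000, 500], [-999, 2000]], dtype="int16"),
                   dims=("y", "x"), attrs={"nodata": -999})

nir_valid = mask_nodata(nir)
red_valid = mask_nodata(red)

# New variable derived from two bands; pixels that were nodata
# in either band come out as NaN
ndvi = (nir_valid - red_valid) / (nir_valid + red_valid)
print(ndvi.values)
```

With loaded data, the same helper could be applied as mask_nodata(ds.nbart_nir).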

For a more in-depth introduction to xarray data structures, refer to the official xarray documentation.

Customising the dc.load() function

The dc.load() function can be tailored to refine a query.

Customisation options include:

  • measurements: This argument is used to provide a list of measurement names to load, as listed in dc.list_measurements(). For satellite datasets, measurements contain data for each individual satellite band (e.g. near infrared). If not provided, all measurements for the product will be returned.

  • crs: The coordinate reference system (CRS) of the query’s x and y coordinates is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The crs parameter is required if the query’s coordinates are in any other CRS.

  • group_by: Satellite datasets based around scenes can have multiple observations per day with slightly different time stamps as the satellite collects data along its path. These observations can be combined by reducing the time dimension to the day level using group_by="solar_day".

  • output_crs and resolution: To reproject the data or change its resolution, supply the output_crs and resolution fields.

  • resampling: This argument allows you to specify a custom spatial resampling method to use when data is reprojected into a different CRS.

Example syntax on the use of these options follows in the cells below.

For help or more customisation options, run help(dc.load) in an empty cell or visit the function’s documentation page.
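The group_by option has no example cell of its own in this notebook, because the annual geomedian product used here has only one timestep per year. As a minimal sketch, the hypothetical helper load_daily() below wraps dc.load() so that same-day observations are merged. The product name and the output_crs and resolution values in the usage comment are assumptions for illustration only; substitute a scene-based product available in your datacube.

```python
def load_daily(dc, product, **query):
    """Load `product`, combining same-day observations into one timestep.

    `dc` is assumed to be the datacube.Datacube object created earlier.
    """
    return dc.load(product=product, group_by="solar_day", **query)

# Possible usage (product name and grid settings are assumptions):
# ds_daily = load_daily(dc, "ga_ls8c_ard_3",
#                       x=(153.3, 153.4),
#                       y=(-27.5, -27.6),
#                       time=("2015-01-01", "2015-01-31"),
#                       output_crs="EPSG:3577",
#                       resolution=(-30, 30))
```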

Specifying measurements

By default, dc.load() will load all measurements in a product.

To load data from the red, green and blue satellite bands only, add measurements=["nbart_red", "nbart_green", "nbart_blue"] to the query:

[7]:
# Note the optional inclusion of the measurements list
ds_rgb = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                 measurements=["nbart_red", "nbart_green", "nbart_blue"],
                 x=(153.3, 153.4),
                 y=(-27.5, -27.6),
                 time=("2015-01-01", "2015-12-31"))

ds_rgb
[7]:
<xarray.Dataset> Size: 983kB
Dimensions:      (time: 1, y: 424, x: 384)
Coordinates:
  * time         (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x            (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref  int32 4B 3577
Data variables:
    nbart_red    (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_green  (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_blue   (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Note that the Data variables component of the xarray.Dataset now includes only the measurements specified in the query (i.e. the red, green and blue satellite bands).

Loading data for coordinates in any CRS

By default, dc.load() assumes that the queried x and y coordinates are in the WGS84/EPSG:4326 CRS. If these coordinates are in a different coordinate system, specify this using the crs parameter.

The example cell below loads data for a set of x and y coordinates defined in Australian Albers (EPSG:3577), ensuring that the dc.load() function accounts for this by including crs="EPSG:3577":

[8]:
# Note the new `x` and `y` coordinates and `crs` parameter
ds_custom_crs = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                        time=("2015-01-01", "2015-12-31"),
                        x=(2069310, 2077064),
                        y=(-3155823, -3168513),
                        crs="EPSG:3577")

ds_custom_crs
[8]:
<xarray.Dataset> Size: 3MB
Dimensions:       (time: 1, y: 423, x: 259)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 2kB 2.069e+06 2.069e+06 ... 2.077e+06 2.077e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 219kB 462 462 459 456 ... 382 392 375 370
    nbart_green   (time, y, x) int16 219kB 476 475 470 469 ... 449 453 442 438
    nbart_red     (time, y, x) int16 219kB 213 211 208 208 ... 221 221 214 209
    nbart_nir     (time, y, x) int16 219kB 82 82 80 78 78 79 ... 79 77 80 75 75
    nbart_swir_1  (time, y, x) int16 219kB 49 48 48 47 45 47 ... 37 35 37 35 34
    nbart_swir_2  (time, y, x) int16 219kB 41 40 39 37 36 37 ... 28 27 28 27 28
    sdev          (time, y, x) float32 438kB 0.003445 0.00351 ... 0.005286
    edev          (time, y, x) float32 438kB 94.66 94.49 87.37 ... 105.6 111.0
    bcdev         (time, y, x) float32 438kB 0.07609 0.07581 ... 0.08184 0.08964
    count         (time, y, x) int16 219kB 17 17 17 17 17 17 ... 12 12 13 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

CRS reprojection

Certain applications may require that data is output into a specific CRS. Data can be reprojected by specifying the new output_crs and identifying the resolution required.

The example cell below reprojects data to a new CRS (UTM Zone 56S, EPSG:32756) and resolution (250 x 250 m). Note that for most CRSs, the first resolution value is negative (e.g. (-250, 250)):

[9]:
ds_reprojected = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                         x=(153.3, 153.4),
                         y=(-27.5, -27.6),
                         time=("2015-01-01", "2015-12-31"),
                         output_crs="EPSG:32756",
                         resolution=(-250, 250))

ds_reprojected
[9]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 455 445 437 429 424 ... 428 418 393 385
    nbart_green   (time, y, x) int16 4kB 466 445 425 415 400 ... 488 481 459 448
    nbart_red     (time, y, x) int16 4kB 206 193 183 176 169 ... 246 242 233 220
    nbart_nir     (time, y, x) int16 4kB 78 76 77 73 74 75 ... 85 80 86 84 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 44 44 41 41 42 ... 42 38 44 41 38 40
    nbart_swir_2  (time, y, x) int16 4kB 39 36 36 33 34 34 ... 34 31 35 32 30 32
    sdev          (time, y, x) float32 7kB 0.003344 0.003487 ... 0.006524
    edev          (time, y, x) float32 7kB 96.69 95.62 91.36 ... 120.7 134.8
    bcdev         (time, y, x) float32 7kB 0.07422 0.07294 ... 0.1174 0.119
    count         (time, y, x) int16 4kB 17 16 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref

Note that the crs attribute in the Attributes section has changed to EPSG:32756. Because of the coarser 250 m resolution, there are now fewer pixels on the x and y dimensions (e.g. x: 40, y: 45 compared to x: 384, y: 424 in earlier examples).
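The relationship between extent, resolution and pixel count can be checked with simple arithmetic. The sketch below uses the Australian Albers extent from the earlier custom-CRS example, where the coordinates are already in metres, and a hypothetical helper approx_pixels(). The 30 m native pixel size is an assumption, as it is not stated in this notebook. Because dc.load() snaps the request to the product's pixel grid, real counts may differ from these estimates by a pixel.

```python
import math

def approx_pixels(extent, pixel_size):
    """Approximate number of pixels along one axis for an extent in metres."""
    low, high = sorted(extent)
    return math.ceil((high - low) / abs(pixel_size))

# Extent used in the EPSG:3577 example earlier in this notebook
x_extent = (2069310, 2077064)
y_extent = (-3155823, -3168513)

for size in (30, 250):  # 30 m is an assumed native resolution
    print(f"{size} m pixels: y={approx_pixels(y_extent, size)}, "
          f"x={approx_pixels(x_extent, size)}")
```

At 30 m this gives y=423 and x=259, the same dimensions reported for ds_custom_crs.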

Spatial resampling methods

When a product is re-projected to a different CRS and/or resolution, the new pixel grid may differ from the original input pixels by size, number and alignment. It is therefore necessary to apply a spatial “resampling” rule that allocates input pixel values into the new pixel grid.

By default, dc.load() resamples pixel values using “nearest neighbour” resampling, which allocates each new pixel with the value of the closest input pixel. Depending on the type of data and the analysis being run, this may not be the most appropriate choice (e.g. for continuous data).

The resampling parameter in dc.load() allows you to choose a custom resampling method from the following options:

"nearest", "cubic", "bilinear", "cubic_spline", "lanczos",
"average", "mode", "gauss", "max", "min", "med", "q1", "q3"

The example cell below requests that all loaded data is resampled using “average” resampling:

[10]:
# Note the additional `resampling` parameter
ds_averesampling = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                           x=(153.3, 153.4),
                           y=(-27.5, -27.6),
                           time=("2015-01-01", "2015-12-31"),
                           resolution=(-250, 250),
                           resampling="average")

ds_averesampling

[10]:
<xarray.Dataset> Size: 63kB
Dimensions:       (time: 1, y: 51, x: 47)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 408B -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 376B 2.067e+06 2.068e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 5kB 476 469 474 478 487 ... 542 354 305 284
    nbart_green   (time, y, x) int16 5kB 519 514 524 532 542 ... 697 427 478 500
    nbart_red     (time, y, x) int16 5kB 236 232 241 253 259 ... 610 241 398 367
    nbart_nir     (time, y, x) int16 5kB 95 89 84 86 86 ... 334 166 1869 2902
    nbart_swir_1  (time, y, x) int16 5kB 56 52 47 49 50 ... 76 150 76 918 1262
    nbart_swir_2  (time, y, x) int16 5kB 46 42 37 39 40 41 ... 55 98 46 457 585
    sdev          (time, y, x) float32 10kB 0.002588 0.002426 ... 0.00074
    edev          (time, y, x) float32 10kB 123.2 112.8 111.7 ... 166.6 190.7
    bcdev         (time, y, x) float32 10kB 0.09107 0.08658 ... 0.04773 0.02964
    count         (time, y, x) int16 5kB 16 16 15 15 15 15 ... 11 10 10 10 10 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Python dictionaries can be used to request different resampling methods for different measurements. This can be particularly useful when some measurements contain categorical data, which requires resampling methods such as “nearest” or “mode” that do not modify the input pixel values.
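The effect on categorical data can be seen on a small array. The sketch below is a simplified illustration in NumPy, not the resampling code that dc.load() runs: it shrinks an invented 4 x 4 raster of class codes to 2 x 2, once by keeping one input pixel per block (standing in for “nearest”) and once by taking the block mean (standing in for “average”).

```python
import numpy as np

# Hypothetical categorical raster using class codes 1, 2 and 5
classes = np.array([[1, 1, 2, 2],
                    [1, 5, 2, 2],
                    [5, 5, 1, 1],
                    [5, 5, 1, 2]])

# Rearrange into 2 x 2 blocks with axes
# (block_row, block_col, row_in_block, col_in_block)
blocks = classes.reshape(2, 2, 2, 2).swapaxes(1, 2)

nearest_like = blocks[:, :, 0, 0]    # keep the top-left pixel of each block
averaged = blocks.mean(axis=(2, 3))  # mean of the four pixels in each block

print(nearest_like)
print(averaged)
```

Every value in nearest_like is one of the original class codes. The averaged result contains 1.25, which is not a class at all, and 2.0 for a block that contained only classes 1 and 5.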

The example cell below specifies resampling={"nbart_red": "nearest", "*": "average"}, which implements “nearest” neighbour resampling for the red satellite band only. The dictionary keys must match the measurement names listed under Data variables. "*": "average" will apply “average” resampling for all other satellite bands:

[11]:
ds_customresampling = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                              x=(153.3, 153.4),
                              y=(-27.5, -27.6),
                              time=("2015-01-01", "2015-12-31"),
                              output_crs="EPSG:32756",
                              resolution=(-250, 250),
                              resampling={"nbart_red": "nearest", "*": "average"})

ds_customresampling
[11]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 458 448 436 428 424 ... 421 420 397 385
    nbart_green   (time, y, x) int16 4kB 470 451 424 414 401 ... 484 481 462 449
    nbart_red     (time, y, x) int16 4kB 209 198 182 176 169 ... 251 243 234 221
    nbart_nir     (time, y, x) int16 4kB 79 78 76 73 74 75 ... 76 77 85 85 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 45 43 41 41 42 ... 36 37 43 42 38 39
    nbart_swir_2  (time, y, x) int16 4kB 39 37 35 33 34 34 ... 28 30 34 33 30 31
    sdev          (time, y, x) float32 7kB 0.003474 0.003705 ... 0.006441
    edev          (time, y, x) float32 7kB 98.57 97.29 94.86 ... 120.2 131.2
    bcdev         (time, y, x) float32 7kB 0.07532 0.07276 ... 0.1148 0.117
    count         (time, y, x) int16 4kB 17 17 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref

For more information about spatial resampling methods, see the following guide.

Loading data using the query dictionary syntax

It is often useful to re-use a set of query parameters to load data from multiple products. To achieve this, load data using the “query dictionary” syntax. This involves placing the query parameters inside a Python dictionary object which can be re-used for multiple data loads:

[12]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31")}

The query dictionary object can be added as an input to dc.load().

The ** syntax below is Python’s “keyword argument unpacking” operator. This operator takes the named query parameters listed in the query dictionary (e.g. "x": (153.3, 153.4)), and “unpacks” them into the dc.load() function as new arguments. For more information about unpacking operators, refer to the Python documentation.
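The unpacking behaviour can be tried without a datacube connection. In the sketch below, fake_load() is a hypothetical stand-in for dc.load() that simply returns the arguments it receives, which shows that passing **query is equivalent to typing each parameter out by name.

```python
def fake_load(product, **kwargs):
    """Hypothetical stand-in for dc.load() that echoes its arguments."""
    return {"product": product, **kwargs}

query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31")}

unpacked = fake_load(product="ga_ls8cls9c_gm_cyear_3", **query)
explicit = fake_load(product="ga_ls8cls9c_gm_cyear_3",
                     x=(153.3, 153.4),
                     y=(-27.5, -27.6),
                     time=("2015-01-01", "2015-12-31"))
print(unpacked == explicit)  # True

# The same operator can copy a query and override one entry,
# leaving the original dictionary unchanged
query_2016 = {**query, "time": ("2016-01-01", "2016-12-31")}
```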

[13]:
ds = dc.load(product="ga_ls8cls9c_gm_cyear_3",
             **query)

ds
[13]:
<xarray.Dataset> Size: 4MB
Dimensions:       (time: 1, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
    nbart_green   (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_red     (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_nir     (time, y, x) int16 326kB 94 94 95 96 ... 2691 2437 2132 2237
    nbart_swir_1  (time, y, x) int16 326kB 56 57 56 57 ... 1432 1177 1018 1059
    nbart_swir_2  (time, y, x) int16 326kB 46 46 46 48 47 ... 716 579 490 496
    sdev          (time, y, x) float32 651kB 0.003487 0.003262 ... 0.001991
    edev          (time, y, x) float32 651kB 133.7 127.6 125.6 ... 176.0 169.4
    bcdev         (time, y, x) float32 651kB 0.09439 0.09096 ... 0.0404 0.03889
    count         (time, y, x) int16 326kB 16 16 16 16 16 15 ... 12 12 12 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref

Query dictionaries can contain any set of parameters that would usually be provided to dc.load():

[14]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31"),
         "output_crs": "EPSG:32756",
         "resolution": (-250, 250)}

ds_ls8 = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                 **query)

ds_ls8

[14]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 455 445 437 429 424 ... 428 418 393 385
    nbart_green   (time, y, x) int16 4kB 466 445 425 415 400 ... 488 481 459 448
    nbart_red     (time, y, x) int16 4kB 206 193 183 176 169 ... 246 242 233 220
    nbart_nir     (time, y, x) int16 4kB 78 76 77 73 74 75 ... 85 80 86 84 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 44 44 41 41 42 ... 42 38 44 41 38 40
    nbart_swir_2  (time, y, x) int16 4kB 39 36 36 33 34 34 ... 34 31 35 32 30 32
    sdev          (time, y, x) float32 7kB 0.003344 0.003487 ... 0.006524
    edev          (time, y, x) float32 7kB 96.69 95.62 91.36 ... 120.7 134.8
    bcdev         (time, y, x) float32 7kB 0.07422 0.07294 ... 0.1174 0.119
    count         (time, y, x) int16 4kB 17 16 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref

After specifying the reusable query, it can be easily used to load data from a different product. The example cell below loads Landsat 7 data for the same extent, time, output CRS and resolution as the previously loaded Landsat 8 data:

[15]:
ds_ls7 = dc.load(product="ga_ls7e_gm_cyear_3",
                 **query)

ds_ls7
[15]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 439 411 388 386 394 ... 352 351 337 333
    nbart_green   (time, y, x) int16 4kB 481 450 414 407 407 ... 454 450 442 419
    nbart_red     (time, y, x) int16 4kB 248 225 206 204 205 ... 268 260 255 238
    nbart_nir     (time, y, x) int16 4kB 150 137 134 141 142 ... 122 124 130 135
    nbart_swir_1  (time, y, x) int16 4kB 79 76 77 79 84 80 ... 84 80 63 63 68 75
    nbart_swir_2  (time, y, x) int16 4kB 67 65 64 71 70 67 ... 73 71 51 53 53 65
    sdev          (time, y, x) float32 7kB 0.006507 0.007285 ... 0.01002
    edev          (time, y, x) float32 7kB 249.9 212.5 171.2 ... 145.0 145.2
    bcdev         (time, y, x) float32 7kB 0.2078 0.2035 ... 0.1341 0.1357
    count         (time, y, x) int16 4kB 12 12 12 12 12 12 ... 11 11 9 9 10 10
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref
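When more than two products are involved, the same query can be applied in a loop. The helper load_products() below is a hypothetical convenience function, not part of the datacube API; it assumes dc is the datacube.Datacube object created earlier and returns one dataset per product name.

```python
def load_products(dc, products, query):
    """Return {product_name: dataset}, loading every product with one query."""
    return {product: dc.load(product=product, **query) for product in products}

# Possible usage with the products and query from this notebook:
# datasets = load_products(dc,
#                          ["ga_ls8cls9c_gm_cyear_3", "ga_ls7e_gm_cyear_3"],
#                          query)
# datasets["ga_ls7e_gm_cyear_3"]
```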

Other helpful tricks

Loading data “like” another dataset

Another option for loading matching data from multiple products is to use dc.load()’s like parameter. This will copy the spatial and temporal extent and the CRS/resolution from an existing dataset, and use these parameters to load new data from a new product.

The example cell below loads another Landsat 7 dataset that exactly matches the ds_ls8 dataset loaded earlier:

[16]:
ds_ls7 = dc.load(product="ga_ls7e_gm_cyear_3",
                 like=ds_ls8)

ds_ls7
[16]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 439 411 388 386 394 ... 352 351 337 333
    nbart_green   (time, y, x) int16 4kB 481 450 414 407 407 ... 454 450 442 419
    nbart_red     (time, y, x) int16 4kB 248 225 206 204 205 ... 268 260 255 238
    nbart_nir     (time, y, x) int16 4kB 150 137 134 141 142 ... 122 124 130 135
    nbart_swir_1  (time, y, x) int16 4kB 79 76 77 79 84 80 ... 84 80 63 63 68 75
    nbart_swir_2  (time, y, x) int16 4kB 67 65 64 71 70 67 ... 73 71 51 53 53 65
    sdev          (time, y, x) float32 7kB 0.006507 0.007285 ... 0.01002
    edev          (time, y, x) float32 7kB 249.9 212.5 171.2 ... 145.0 145.2
    bcdev         (time, y, x) float32 7kB 0.2078 0.2035 ... 0.1341 0.1357
    count         (time, y, x) int16 4kB 12 12 12 12 12 12 ... 11 11 9 9 10 10
Attributes:
    crs:           PROJCS["WGS 84 / UTM zone 56S",GEOGCS["WGS 84",DATUM["WGS_...
    grid_mapping:  spatial_ref
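One way to confirm that two datasets really share a pixel grid is to compare their x and y coordinates directly. The sketch below defines a hypothetical helper same_grid() and exercises it on small synthetic datasets whose coordinate values are invented; with real data it could be called as same_grid(ds_ls7, ds_ls8).

```python
import numpy as np
import xarray as xr

def same_grid(a, b):
    """True if two datasets have identical x and y pixel coordinates."""
    return bool(np.array_equal(a["x"].values, b["x"].values)
                and np.array_equal(a["y"].values, b["y"].values))

# Synthetic 4 x 4 grids with 250 m spacing (values are illustrative only)
x = np.arange(529600.0, 530600.0, 250.0)
y = np.arange(6958000.0, 6957000.0, -250.0)

reference = xr.Dataset(coords={"x": x, "y": y})
matching = xr.Dataset(coords={"x": x.copy(), "y": y.copy()})
shifted = xr.Dataset(coords={"x": x + 125.0, "y": y})  # offset by half a pixel

print(same_grid(reference, matching))
print(same_grid(reference, shifted))
```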

Adding a progress bar

When loading large amounts of data, it can be useful to view the progress of the data load. The progress_cbk parameter in dc.load() adds a progress bar that indicates how the load is progressing:

Progress bar

The example cell below loads five years of data (2013, 2014, 2015, 2016 and 2017) from the ga_ls8cls9c_gm_cyear_3 product with a progress bar:

[17]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2013", "2017")}

ds_progress = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                      progress_cbk=with_ui_cbk(),
                      **query)

ds_progress
[17]:
<xarray.Dataset> Size: 21MB
Dimensions:       (time: 5, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 40B 2013-07-02T11:59:59.999999 ... 20...
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 2MB 473 474 473 469 470 ... 329 302 287 284
    nbart_green   (time, y, x) int16 2MB 522 525 524 519 524 ... 505 437 383 384
    nbart_red     (time, y, x) int16 2MB 241 246 245 242 247 ... 383 342 329 333
    nbart_nir     (time, y, x) int16 2MB 85 89 86 82 83 ... 2839 2514 2333 2412
    nbart_swir_1  (time, y, x) int16 2MB 45 48 48 43 46 ... 1366 1099 992 1079
    nbart_swir_2  (time, y, x) int16 2MB 36 38 37 34 35 ... 667 636 509 455 487
    sdev          (time, y, x) float32 3MB 0.003244 0.003646 ... 0.00157
    edev          (time, y, x) float32 3MB 114.4 108.7 109.3 ... 210.3 218.2
    bcdev         (time, y, x) float32 3MB 0.09557 0.08731 ... 0.04308 0.04944
    count         (time, y, x) int16 2MB 9 9 9 9 9 9 9 ... 14 14 14 13 13 13 13
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
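Outside a notebook, where the widget created by with_ui_cbk() is not available, a plain function can be passed to progress_cbk instead. The sketch below assumes the callback is called with two integers, the number of files processed so far and the total number of files, which is how the datacube documentation describes this parameter; confirm this with help(dc.load) for the installed version. The names format_progress and print_progress are hypothetical.

```python
def format_progress(processed, total):
    """Build a one-line progress message."""
    return f"Loaded {processed} of {total} files"

def print_progress(processed, total):
    """Plain-text progress callback (assumed signature: processed, total)."""
    print(format_progress(processed, total))

# Possible usage with the query defined above:
# ds_progress = dc.load(product="ga_ls8cls9c_gm_cyear_3",
#                       progress_cbk=print_progress,
#                       **query)
```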

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Last modified: June 2024

Compatible datacube version:

[18]:
print(datacube.__version__)
1.8.18

Tags

Tags: sandbox compatible, NCI compatible, dc.load, xarray.Dataset, xarray.DataArray, landsat 7, landsat 8, annual geomedian, crs, reprojecting data, resampling data, beginner