Introduction to loading data
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments
Products used: ga_ls7e_gm_cyear_3, ga_ls8cls9c_gm_cyear_3
Prerequisites: Users of this notebook should have a basic understanding of:
How to run a Jupyter notebook
The basic structure of the DEA satellite datasets
Inspecting available DEA products and measurements
Background
Loading data from the Digital Earth Australia (DEA) instance of the Open Data Cube requires the construction of a data query that specifies the what, where, and when of the data request. Each query returns a multi-dimensional xarray object containing the contents of your query. It is essential to understand the xarray
data structures as they are fundamental to the structure of data
loaded from the datacube. Manipulations, transformations and visualisation of xarray
objects provide datacube users with the ability to explore and analyse DEA datasets, as well as pose and answer scientific questions.
Description
This notebook will introduce how to load data from the DEA datacube through the construction of a query and use of the dc.load()
function. Topics covered include:
Loading data using dc.load()
Interpreting the resulting xarray.Dataset object
    Inspecting an individual xarray.DataArray
Customising parameters passed to the dc.load() function
    Loading specific measurements
    Loading data for coordinates in a custom coordinate reference system (CRS)
    Projecting data to a new CRS and spatial resolution
    Specifying a specific spatial resampling method
Loading data using a reusable dictionary query
Loading matching data from multiple products using like
Adding a progress bar to the data load
Getting started
To run this introduction to loading data from DEA, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.
Load packages
The datacube
package is required to query the datacube database and load some data. The with_ui_cbk
function from odc.ui
enables a progress bar when loading large amounts of data.
[1]:
import datacube
from odc.ui import with_ui_cbk
Connect to the datacube
The next step is to connect to the datacube database. The resulting dc datacube object can then be used to load data. The app parameter is a unique name that identifies the notebook; it has no effect on the analysis.
[2]:
dc = datacube.Datacube(app="04_Loading_data")
Loading data using dc.load()
Loading data from the datacube uses the dc.load() function.
The function requires the following minimum arguments:
product: The data product to load (to revise DEA products, see the Products and measurements notebook).
x: The spatial region in the x dimension. By default, the x and y arguments accept queries in the geographic coordinate system WGS84, identified by the EPSG code 4326.
y: The spatial region in the y dimension. The dimensions longitude/latitude and x/y can be used interchangeably.
time: The temporal extent. The time dimension can be specified using a tuple of datetime objects or strings in the “YYYY”, “YYYY-MM” or “YYYY-MM-DD” format.
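The string formats accepted for time can be read as shorthand for whole periods. The sketch below is not datacube's own implementation; it is a standard-library illustration of one plausible reading, in which a partial start date snaps to the beginning of its period and a partial end date snaps to the end. The function names expand_partial_date and expand_time_range are hypothetical.

```python
import calendar
from datetime import date


def expand_partial_date(text, end_of_period=False):
    """Turn "YYYY", "YYYY-MM" or "YYYY-MM-DD" into a concrete date.

    Illustrative only: partial dates snap to the first (or last) day
    of the year or month they name.
    """
    parts = [int(p) for p in text.split("-")]
    year = parts[0]
    if len(parts) == 1:
        # Year only: first or last day of the year
        month, day = (12, 31) if end_of_period else (1, 1)
    elif len(parts) == 2:
        # Year and month: first or last day of the month
        month = parts[1]
        day = calendar.monthrange(year, month)[1] if end_of_period else 1
    else:
        month, day = parts[1], parts[2]
    return date(year, month, day)


def expand_time_range(start, end):
    """Expand a (start, end) pair of partial date strings to full dates."""
    return expand_partial_date(start), expand_partial_date(end, end_of_period=True)


print(expand_time_range("2013", "2017"))
print(expand_time_range("2016-02", "2016-02"))
```

Under this reading, a query such as ("2013", "2017") spans five full calendar years, which is consistent with the five annual timesteps returned by the progress bar example later in this notebook.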
For example, to load 2015 data from the Landsat 8 NBAR-T annual geomedian product for Moreton Bay in southern Queensland, use the following parameters:
product: ga_ls8cls9c_gm_cyear_3
x: (153.3, 153.4)
y: (-27.5, -27.6)
time: ("2015-01-01", "2015-12-31")
Run the following cell to load all datasets from the ga_ls8cls9c_gm_cyear_3
product that match this spatial and temporal extent:
[3]:
ds = dc.load(product="ga_ls8cls9c_gm_cyear_3",
             x=(153.3, 153.4),
             y=(-27.5, -27.6),
             time=("2015-01-01", "2015-12-31"))
ds
[3]:
<xarray.Dataset> Size: 4MB
Dimensions:       (time: 1, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
    nbart_green   (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_red     (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_nir     (time, y, x) int16 326kB 94 94 95 96 ... 2691 2437 2132 2237
    nbart_swir_1  (time, y, x) int16 326kB 56 57 56 57 ... 1432 1177 1018 1059
    nbart_swir_2  (time, y, x) int16 326kB 46 46 46 48 47 ... 716 579 490 496
    sdev          (time, y, x) float32 651kB 0.003487 0.003262 ... 0.001991
    edev          (time, y, x) float32 651kB 133.7 127.6 125.6 ... 176.0 169.4
    bcdev         (time, y, x) float32 651kB 0.09439 0.09096 ... 0.0404 0.03889
    count         (time, y, x) int16 326kB 16 16 16 16 16 15 ... 12 12 12 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Interpreting the resulting xarray.Dataset
The variable ds now holds an xarray.Dataset containing all data that matched the spatial and temporal query parameters passed to dc.load().
Dimensions
This header identifies the number of timesteps returned in the search (time: 1) as well as the number of pixels in the x and y directions of the data query.
Coordinates
time identifies the date attributed to each returned timestep. x and y are the coordinates for each pixel within the spatial bounds of the query.
Data variables
These are the measurements available for the nominated product. For every date (time) returned by the query, the measured value at each pixel (y, x) is returned as an array for each measurement. Each data variable is itself an xarray.DataArray object (see below).
Attributes
crs
identifies the coordinate reference system (CRS) of the loaded data.
Inspecting an individual xarray.DataArray
The xarray.Dataset loaded above is itself a collection of individual xarray.DataArray objects that hold the actual data for each data variable/measurement. For example, all measurements listed under Data variables above (e.g. nbart_blue, nbart_green, nbart_red, nbart_nir, nbart_swir_1, nbart_swir_2) are xarray.DataArray objects.
These xarray.DataArray
objects can be inspected or interacted with by using either of the following syntaxes:
ds["measurement_name"]
or
ds.measurement_name
The ability to access individual variables means that these can be directly viewed, or further manipulated to create new variables. For example, run the following cell to access data from the near-infrared satellite band (i.e. nbart_nir):
[5]:
ds.nbart_nir
[5]:
<xarray.DataArray 'nbart_nir' (time: 1, y: 424, x: 384)> Size: 326kB
array([[[  94,   94,   95, ...,   79,   87,   90],
        [  94,   92,   93, ...,   84,   85,  118],
        [  93,   90,   91, ...,   79,   82,  136],
        ...,
        [3174, 2840, 2626, ..., 2070, 2375, 2466],
        [2776, 2905, 2660, ..., 2076, 2284, 2489],
        [2516, 2828, 2621, ..., 2437, 2132, 2237]]], dtype=int16)
Coordinates:
  * time         (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x            (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref  int32 4B 3577
Attributes:
    units:         1
    nodata:        -999
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Note that the object header informs us that it is an xarray.DataArray
containing data for the nbart_nir
satellite band.
Like an xarray.Dataset, the array also includes information about the data’s dimensions (i.e. (time: 1, y: 424, x: 384)), coordinates and attributes. This particular data variable/measurement contains some additional information that is specific to the nbart_nir band, including details of the array’s nodata value (i.e. nodata: -999).
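The nodata attribute matters as soon as statistics are computed, because the placeholder value is stored in the array like any other number. The sketch below uses plain NumPy and a small hypothetical patch of values (not data loaded from the datacube) to show how replacing -999 with NaN changes a simple mean.

```python
import numpy as np

NODATA = -999  # same value as the `nodata` attribute reported for nbart_nir

# Hypothetical 2 x 3 patch of int16 band values with two missing pixels
band = np.array([[94, 94, NODATA],
                 [3174, NODATA, 2466]], dtype=np.int16)

# Convert to float so that missing pixels can be represented as NaN
masked = np.where(band == NODATA, np.nan, band.astype(np.float32))

valid_mean = np.nanmean(masked)  # ignores the NaN pixels
naive_mean = band.mean()         # treats -999 as a real measurement

print(valid_mean, naive_mean)
```

For an xarray.DataArray named da, a comparable idiom would be da.where(da != da.attrs["nodata"]), which likewise returns a floating-point array with NaN in place of the nodata value.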
For a more in-depth introduction to xarray data structures, refer to the official xarray documentation.
Customising the dc.load()
function
The dc.load()
function can be tailored to refine a query.
Customisation options include:
measurements: This argument is used to provide a list of measurement names to load, as listed in dc.list_measurements(). For satellite datasets, measurements contain data for each individual satellite band (e.g. near infrared). If not provided, all measurements for the product will be returned.
crs: The coordinate reference system (CRS) of the query’s x and y coordinates is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The crs parameter is required if the query’s coordinates are in any other CRS.
group_by: Satellite datasets based around scenes can have multiple observations per day with slightly different time stamps as the satellite collects data along its path. These observations can be combined by reducing the time dimension to the day level using group_by="solar_day".
output_crs and resolution: To reproject the data or change its resolution, supply the output_crs and resolution fields.
resampling: This argument allows you to specify a custom spatial resampling method to use when data is reprojected into a different CRS.
Example syntax on the use of these options follows in the cells below.
For help or more customisation options, run help(dc.load) in an empty cell or visit the function’s documentation page.
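The idea behind grouping by solar day can be illustrated without the datacube. The sketch below is a simplified approximation, not datacube's implementation: it shifts each UTC timestamp by one hour per 15 degrees of longitude and groups observations by the resulting local date. The timestamps and the solar_day helper are hypothetical. Note that the annual geomedian products used in this notebook have a single timestep per year, so this grouping is mainly relevant to scene-based products.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def solar_day(utc_time, longitude):
    """Approximate local solar date: shift UTC by 1 hour per 15 degrees of longitude."""
    return (utc_time + timedelta(hours=longitude / 15)).date()


# Hypothetical acquisition times (UTC) for scenes near longitude 153.35 E
observations = [
    datetime(2015, 3, 1, 23, 50, 10),
    datetime(2015, 3, 1, 23, 50, 34),  # adjacent scene on the same pass, seconds later
    datetime(2015, 3, 17, 23, 50, 20),
]

# Group observations that fall on the same approximate solar day
groups = defaultdict(list)
for obs in observations:
    groups[solar_day(obs, longitude=153.35)].append(obs)

for day, members in sorted(groups.items()):
    print(day, len(members))
```

The first two observations share a solar day even though their time stamps differ, so they would be merged into a single timestep; the third remains on its own.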
Specifying measurements
By default, dc.load()
will load all measurements in a product.
To load data from the red, green and blue satellite bands only, add measurements=["nbart_red", "nbart_green", "nbart_blue"] to the query:
[7]:
# Note the optional inclusion of the measurements list
ds_rgb = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                 measurements=["nbart_red", "nbart_green", "nbart_blue"],
                 x=(153.3, 153.4),
                 y=(-27.5, -27.6),
                 time=("2015-01-01", "2015-12-31"))
ds_rgb
[7]:
<xarray.Dataset> Size: 983kB
Dimensions:      (time: 1, y: 424, x: 384)
Coordinates:
  * time         (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y            (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x            (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref  int32 4B 3577
Data variables:
    nbart_red    (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_green  (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_blue   (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Note that the Data variables component of the xarray.Dataset now includes only the measurements specified in the query (i.e. the red, green and blue satellite bands).
Loading data for coordinates in any CRS
By default, dc.load()
assumes that the queried x
and y
coordinates are in the WGS84
/EPSG:4326
CRS. If these coordinates are in a different coordinate system, specify this using the crs
parameter.
The example cell below loads data for a set of x
and y
coordinates defined in Australian Albers (EPSG:3577
), ensuring that the dc.load()
function accounts for this by including crs="EPSG:3577"
:
[8]:
# Note the new `x` and `y` coordinates and `crs` parameter
ds_custom_crs = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                        time=("2015-01-01", "2015-12-31"),
                        x=(2069310, 2077064),
                        y=(-3155823, -3168513),
                        crs="EPSG:3577")
ds_custom_crs
[8]:
<xarray.Dataset> Size: 3MB
Dimensions:       (time: 1, y: 423, x: 259)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 2kB 2.069e+06 2.069e+06 ... 2.077e+06 2.077e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 219kB 462 462 459 456 ... 382 392 375 370
    nbart_green   (time, y, x) int16 219kB 476 475 470 469 ... 449 453 442 438
    nbart_red     (time, y, x) int16 219kB 213 211 208 208 ... 221 221 214 209
    nbart_nir     (time, y, x) int16 219kB 82 82 80 78 78 79 ... 79 77 80 75 75
    nbart_swir_1  (time, y, x) int16 219kB 49 48 48 47 45 47 ... 37 35 37 35 34
    nbart_swir_2  (time, y, x) int16 219kB 41 40 39 37 36 37 ... 28 27 28 27 28
    sdev          (time, y, x) float32 438kB 0.003445 0.00351 ... 0.005286
    edev          (time, y, x) float32 438kB 94.66 94.49 87.37 ... 105.6 111.0
    bcdev         (time, y, x) float32 438kB 0.07609 0.07581 ... 0.08184 0.08964
    count         (time, y, x) int16 219kB 17 17 17 17 17 17 ... 12 12 13 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
CRS reprojection
Certain applications may require that data is output into a specific CRS. Data can be reprojected by specifying the new output_crs
and identifying the resolution
required.
The example cell below reprojects data to a new CRS (UTM Zone 56S, EPSG:32756
) and resolution (250 x 250 m). Note that for most CRSs, the first resolution value is negative (e.g. (-250, 250)
):
[9]:
ds_reprojected = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                         x=(153.3, 153.4),
                         y=(-27.5, -27.6),
                         time=("2015-01-01", "2015-12-31"),
                         output_crs="EPSG:32756",
                         resolution=(-250, 250))
ds_reprojected
[9]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 455 445 437 429 424 ... 428 418 393 385
    nbart_green   (time, y, x) int16 4kB 466 445 425 415 400 ... 488 481 459 448
    nbart_red     (time, y, x) int16 4kB 206 193 183 176 169 ... 246 242 233 220
    nbart_nir     (time, y, x) int16 4kB 78 76 77 73 74 75 ... 85 80 86 84 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 44 44 41 41 42 ... 42 38 44 41 38 40
    nbart_swir_2  (time, y, x) int16 4kB 39 36 36 33 34 34 ... 34 31 35 32 30 32
    sdev          (time, y, x) float32 7kB 0.003344 0.003487 ... 0.006524
    edev          (time, y, x) float32 7kB 96.69 95.62 91.36 ... 120.7 134.8
    bcdev         (time, y, x) float32 7kB 0.07422 0.07294 ... 0.1174 0.119
    count         (time, y, x) int16 4kB 17 16 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref
Note that the crs attribute in the Attributes section has changed to EPSG:32756. Due to the larger 250 m resolution, there are now fewer pixels on the x and y dimensions (e.g. x: 40, y: 45 compared to x: 384, y: 424 in earlier examples).
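The relationship between extent, resolution and pixel count can be checked with simple arithmetic. The sketch below is a rough estimate only: it divides the extent by the pixel size and rounds up, whereas dc.load() also snaps the output to a pixel grid, so real counts may differ by a pixel. It uses the Australian Albers extent from the custom CRS example, and the 30 m native pixel size is an assumption rather than something stated in this notebook. The helper name estimate_pixel_count is hypothetical.

```python
import math


def estimate_pixel_count(x_range, y_range, resolution):
    """Roughly estimate (x, y) pixel counts for an extent given in projected metres."""
    width = abs(x_range[1] - x_range[0])
    height = abs(y_range[1] - y_range[0])
    return math.ceil(width / resolution), math.ceil(height / resolution)


# Extent used in the EPSG:3577 example earlier in this notebook
x_range = (2069310, 2077064)
y_range = (-3155823, -3168513)

native = estimate_pixel_count(x_range, y_range, 30)   # 30 m pixel size is assumed
coarse = estimate_pixel_count(x_range, y_range, 250)  # 250 m pixels

print(native)  # → (259, 423)
print(coarse)  # → (32, 51)
```

The 30 m estimate agrees with the x: 259, y: 423 dimensions reported for ds_custom_crs, and the 250 m estimate shows how coarser pixels reduce the size of the returned arrays.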
Spatial resampling methods
When a product is re-projected to a different CRS and/or resolution, the new pixel grid may differ from the original input pixels by size, number and alignment. It is therefore necessary to apply a spatial “resampling” rule that allocates input pixel values into the new pixel grid.
By default, dc.load()
resamples pixel values using “nearest neighbour” resampling, which allocates each new pixel with the value of the closest input pixel. Depending on the type of data and the analysis being run, this may not be the most appropriate choice (e.g. for continuous data).
The resampling
parameter in dc.load()
allows you to choose a custom resampling method from the following options:
"nearest", "cubic", "bilinear", "cubic_spline", "lanczos",
"average", "mode", "gauss", "max", "min", "med", "q1", "q3"
The example cell below requests that all loaded data is resampled using “average” resampling:
[10]:
# Note the additional `resampling` parameter
ds_averesampling = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                           x=(153.3, 153.4),
                           y=(-27.5, -27.6),
                           time=("2015-01-01", "2015-12-31"),
                           resolution=(-250, 250),
                           resampling="average")
ds_averesampling
[10]:
<xarray.Dataset> Size: 63kB
Dimensions:       (time: 1, y: 51, x: 47)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 408B -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 376B 2.067e+06 2.068e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 5kB 476 469 474 478 487 ... 542 354 305 284
    nbart_green   (time, y, x) int16 5kB 519 514 524 532 542 ... 697 427 478 500
    nbart_red     (time, y, x) int16 5kB 236 232 241 253 259 ... 610 241 398 367
    nbart_nir     (time, y, x) int16 5kB 95 89 84 86 86 ... 334 166 1869 2902
    nbart_swir_1  (time, y, x) int16 5kB 56 52 47 49 50 ... 76 150 76 918 1262
    nbart_swir_2  (time, y, x) int16 5kB 46 42 37 39 40 41 ... 55 98 46 457 585
    sdev          (time, y, x) float32 10kB 0.002588 0.002426 ... 0.00074
    edev          (time, y, x) float32 10kB 123.2 112.8 111.7 ... 166.6 190.7
    bcdev         (time, y, x) float32 10kB 0.09107 0.08658 ... 0.04773 0.02964
    count         (time, y, x) int16 5kB 16 16 15 15 15 15 ... 11 10 10 10 10 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Python dictionaries can be used to request different resampling methods for different measurements. This can be particularly useful when some measurements contain categorical data, which requires resampling methods such as “nearest” or “mode” that do not modify the input pixel values.
The example cell below specifies resampling={"nbart_red": "nearest", "*": "average"}, which implements “nearest” neighbour resampling for the nbart_red satellite band only. "*": "average" will apply “average” resampling for all other satellite bands:
[11]:
ds_customresampling = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                              x=(153.3, 153.4),
                              y=(-27.5, -27.6),
                              time=("2015-01-01", "2015-12-31"),
                              output_crs="EPSG:32756",
                              resolution=(-250, 250),
                              resampling={"nbart_red": "nearest", "*": "average"})
ds_customresampling
[11]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 458 448 436 428 424 ... 421 420 397 385
    nbart_green   (time, y, x) int16 4kB 470 451 424 414 401 ... 484 481 462 449
    nbart_red     (time, y, x) int16 4kB 209 198 182 176 169 ... 251 243 234 221
    nbart_nir     (time, y, x) int16 4kB 79 78 76 73 74 75 ... 76 77 85 85 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 45 43 41 41 42 ... 36 37 43 42 38 39
    nbart_swir_2  (time, y, x) int16 4kB 39 37 35 33 34 34 ... 28 30 34 33 30 31
    sdev          (time, y, x) float32 7kB 0.003474 0.003705 ... 0.006441
    edev          (time, y, x) float32 7kB 98.57 97.29 94.86 ... 120.2 131.2
    bcdev         (time, y, x) float32 7kB 0.07532 0.07276 ... 0.1148 0.117
    count         (time, y, x) int16 4kB 17 17 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref
For more information about spatial resampling methods, see the following guide.
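To see why the choice of resampling method matters for categorical data, the sketch below downsamples a small hypothetical grid of class codes by a factor of three using only the standard library. It is a toy illustration of the general idea and not the algorithm used by dc.load(): the centre helper stands in for “nearest” (for an odd factor, the centre pixel is the one closest to the new pixel's centre), and the grid dimensions are assumed to be exact multiples of the factor.

```python
from statistics import mean, mode


def downsample(grid, factor, reducer):
    """Collapse each factor x factor block of `grid` to one value using `reducer`."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Flatten the block in row-major order before reducing it
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(reducer(block))
        out.append(out_row)
    return out


def centre(block):
    """Value of the block's centre pixel (stands in for "nearest" when factor is odd)."""
    return block[len(block) // 2]


# Hypothetical categorical grid containing two class codes, 1 and 5
classes = [
    [1, 1, 1, 5, 5, 5],
    [1, 1, 1, 5, 5, 5],
    [1, 1, 5, 1, 5, 5],
]

nearest = downsample(classes, 3, centre)    # keeps original class codes
most_common = downsample(classes, 3, mode)  # keeps original class codes
average = downsample(classes, 3, mean)      # produces values that are not valid classes

print(nearest, most_common, average)
```

Here “nearest” and “mode” both return [[1, 5]], while “average” returns values of about 1.44 and 4.56 that correspond to no class at all. For continuous measurements such as reflectance, averaging is often the more sensible choice.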
Loading data using the query dictionary syntax
It is often useful to re-use a set of query parameters to load data from multiple products. To achieve this, load data using the “query dictionary” syntax. This involves placing the query parameters inside a Python dictionary object which can be re-used for multiple data loads:
[12]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31")}
The query dictionary object can be added as an input to dc.load()
.
The ** syntax below is Python’s “keyword argument unpacking” operator. This operator takes the named query parameters listed in the query dictionary (e.g. "x": (153.3, 153.4)), and “unpacks” them into the dc.load() function as new arguments. For more information about unpacking operators, refer to the Python documentation.
[13]:
ds = dc.load(product="ga_ls8cls9c_gm_cyear_3",
             **query)
ds
[13]:
<xarray.Dataset> Size: 4MB
Dimensions:       (time: 1, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 326kB 469 471 475 480 ... 313 277 257 269
    nbart_green   (time, y, x) int16 326kB 510 513 518 524 ... 489 431 363 366
    nbart_red     (time, y, x) int16 326kB 232 235 238 241 ... 376 332 311 322
    nbart_nir     (time, y, x) int16 326kB 94 94 95 96 ... 2691 2437 2132 2237
    nbart_swir_1  (time, y, x) int16 326kB 56 57 56 57 ... 1432 1177 1018 1059
    nbart_swir_2  (time, y, x) int16 326kB 46 46 46 48 47 ... 716 579 490 496
    sdev          (time, y, x) float32 651kB 0.003487 0.003262 ... 0.001991
    edev          (time, y, x) float32 651kB 133.7 127.6 125.6 ... 176.0 169.4
    bcdev         (time, y, x) float32 651kB 0.09439 0.09096 ... 0.0404 0.03889
    count         (time, y, x) int16 326kB 16 16 16 16 16 15 ... 12 12 12 12 12
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
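The effect of the ** operator can be demonstrated without a datacube connection. In the sketch below, fake_load is a hypothetical stand-in for dc.load() that simply records the arguments it receives; it shows that unpacking the query dictionary is equivalent to typing each keyword argument by hand, and that the same dictionary can be reused for several products.

```python
def fake_load(product, **kwargs):
    """Hypothetical stand-in for dc.load() that records the arguments it was given."""
    return {"product": product, **kwargs}


query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31")}

# Unpacking the dictionary...
unpacked = fake_load(product="ga_ls8cls9c_gm_cyear_3", **query)

# ...is equivalent to writing every keyword argument explicitly
explicit = fake_load(product="ga_ls8cls9c_gm_cyear_3",
                     x=(153.3, 153.4),
                     y=(-27.5, -27.6),
                     time=("2015-01-01", "2015-12-31"))

print(unpacked == explicit)  # → True

# The same query can be reused across several products
products = ["ga_ls8cls9c_gm_cyear_3", "ga_ls7e_gm_cyear_3"]
requests = {name: fake_load(product=name, **query) for name in products}
print(sorted(requests))
```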
Query dictionaries can contain any set of parameters that would usually be provided to dc.load()
:
[14]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2015-01-01", "2015-12-31"),
         "output_crs": "EPSG:32756",
         "resolution": (-250, 250)}
ds_ls8 = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                 **query)
ds_ls8
[14]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 455 445 437 429 424 ... 428 418 393 385
    nbart_green   (time, y, x) int16 4kB 466 445 425 415 400 ... 488 481 459 448
    nbart_red     (time, y, x) int16 4kB 206 193 183 176 169 ... 246 242 233 220
    nbart_nir     (time, y, x) int16 4kB 78 76 77 73 74 75 ... 85 80 86 84 81 81
    nbart_swir_1  (time, y, x) int16 4kB 47 44 44 41 41 42 ... 42 38 44 41 38 40
    nbart_swir_2  (time, y, x) int16 4kB 39 36 36 33 34 34 ... 34 31 35 32 30 32
    sdev          (time, y, x) float32 7kB 0.003344 0.003487 ... 0.006524
    edev          (time, y, x) float32 7kB 96.69 95.62 91.36 ... 120.7 134.8
    bcdev         (time, y, x) float32 7kB 0.07422 0.07294 ... 0.1174 0.119
    count         (time, y, x) int16 4kB 17 16 16 16 16 16 ... 13 14 14 13 13 13
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref
After specifying the reusable query, it can be easily used to load data from a different product. The example cell below loads Landsat 7 data for the same extent, time, output CRS and resolution as the previously loaded Landsat 8 data:
[15]:
ds_ls7 = dc.load(product="ga_ls7e_gm_cyear_3",
                 **query)
ds_ls7
[15]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 439 411 388 386 394 ... 352 351 337 333
    nbart_green   (time, y, x) int16 4kB 481 450 414 407 407 ... 454 450 442 419
    nbart_red     (time, y, x) int16 4kB 248 225 206 204 205 ... 268 260 255 238
    nbart_nir     (time, y, x) int16 4kB 150 137 134 141 142 ... 122 124 130 135
    nbart_swir_1  (time, y, x) int16 4kB 79 76 77 79 84 80 ... 84 80 63 63 68 75
    nbart_swir_2  (time, y, x) int16 4kB 67 65 64 71 70 67 ... 73 71 51 53 53 65
    sdev          (time, y, x) float32 7kB 0.006507 0.007285 ... 0.01002
    edev          (time, y, x) float32 7kB 249.9 212.5 171.2 ... 145.0 145.2
    bcdev         (time, y, x) float32 7kB 0.2078 0.2035 ... 0.1341 0.1357
    count         (time, y, x) int16 4kB 12 12 12 12 12 12 ... 11 11 9 9 10 10
Attributes:
    crs:           EPSG:32756
    grid_mapping:  spatial_ref
Other helpful tricks
Loading data “like” another dataset
Another option for loading matching data from multiple products is to use dc.load()
’s like
parameter. This will copy the spatial and temporal extent and the CRS/resolution from an existing dataset, and use these parameters to load new data from a new product.
The example cell below loads another Landsat 7 dataset that exactly matches the ds_ls8
dataset loaded earlier:
[16]:
ds_ls7 = dc.load(product="ga_ls7e_gm_cyear_3",
                 like=ds_ls8)
ds_ls7
[16]:
<xarray.Dataset> Size: 47kB
Dimensions:       (time: 1, y: 45, x: 40)
Coordinates:
  * time          (time) datetime64[ns] 8B 2015-07-02T11:59:59.999999
  * y             (y) float64 360B 6.958e+06 6.958e+06 ... 6.947e+06 6.947e+06
  * x             (x) float64 320B 5.296e+05 5.299e+05 ... 5.391e+05 5.394e+05
    spatial_ref   int32 4B 32756
Data variables:
    nbart_blue    (time, y, x) int16 4kB 439 411 388 386 394 ... 352 351 337 333
    nbart_green   (time, y, x) int16 4kB 481 450 414 407 407 ... 454 450 442 419
    nbart_red     (time, y, x) int16 4kB 248 225 206 204 205 ... 268 260 255 238
    nbart_nir     (time, y, x) int16 4kB 150 137 134 141 142 ... 122 124 130 135
    nbart_swir_1  (time, y, x) int16 4kB 79 76 77 79 84 80 ... 84 80 63 63 68 75
    nbart_swir_2  (time, y, x) int16 4kB 67 65 64 71 70 67 ... 73 71 51 53 53 65
    sdev          (time, y, x) float32 7kB 0.006507 0.007285 ... 0.01002
    edev          (time, y, x) float32 7kB 249.9 212.5 171.2 ... 145.0 145.2
    bcdev         (time, y, x) float32 7kB 0.2078 0.2035 ... 0.1341 0.1357
    count         (time, y, x) int16 4kB 12 12 12 12 12 12 ... 11 11 9 9 10 10
Attributes:
    crs:           PROJCS["WGS 84 / UTM zone 56S",GEOGCS["WGS 84",DATUM["WGS_...
    grid_mapping:  spatial_ref
Adding a progress bar
When loading large amounts of data, it can be useful to view the progress of the data load. The progress_cbk
parameter in dc.load()
adds a progress bar that indicates how the load is progressing.
The example cell below loads 5 years of data (2013, 2014, 2015, 2016 and 2017) from the ga_ls8cls9c_gm_cyear_3 product with a progress bar:
[17]:
query = {"x": (153.3, 153.4),
         "y": (-27.5, -27.6),
         "time": ("2013", "2017")}
ds_progress = dc.load(product="ga_ls8cls9c_gm_cyear_3",
                      progress_cbk=with_ui_cbk(),
                      **query)
ds_progress
[17]:
<xarray.Dataset> Size: 21MB
Dimensions:       (time: 5, y: 424, x: 384)
Coordinates:
  * time          (time) datetime64[ns] 40B 2013-07-02T11:59:59.999999 ... 20...
  * y             (y) float64 3kB -3.156e+06 -3.156e+06 ... -3.168e+06
  * x             (x) float64 3kB 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 2MB 473 474 473 469 470 ... 329 302 287 284
    nbart_green   (time, y, x) int16 2MB 522 525 524 519 524 ... 505 437 383 384
    nbart_red     (time, y, x) int16 2MB 241 246 245 242 247 ... 383 342 329 333
    nbart_nir     (time, y, x) int16 2MB 85 89 86 82 83 ... 2839 2514 2333 2412
    nbart_swir_1  (time, y, x) int16 2MB 45 48 48 43 46 ... 1366 1099 992 1079
    nbart_swir_2  (time, y, x) int16 2MB 36 38 37 34 35 ... 667 636 509 455 487
    sdev          (time, y, x) float32 3MB 0.003244 0.003646 ... 0.00157
    edev          (time, y, x) float32 3MB 114.4 108.7 109.3 ... 210.3 218.2
    bcdev         (time, y, x) float32 3MB 0.09557 0.08731 ... 0.04308 0.04944
    count         (time, y, x) int16 2MB 9 9 9 9 9 9 9 ... 14 14 14 13 13 13 13
Attributes:
    crs:           EPSG:3577
    grid_mapping:  spatial_ref
Recommended next steps
To continue working through the notebooks in this beginner’s guide, the following notebooks are designed to be worked through in the following order:
Loading data (this notebook)
Once you have worked through the beginner’s guide, you can join advanced users by exploring:
A demonstration of how to load cloud-free observations in the Using load_ard notebook.
The “DEA products” directory in the repository, where you can explore DEA products in depth.
The “How_to_guides” directory, which contains a recipe book of common techniques and methods for analysing DEA data.
The “Real_world_examples” directory, which provides more complex workflows and analysis case studies.
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
GitHub.
Last modified: June 2024
Compatible datacube version:
[18]:
print(datacube.__version__)
1.8.18
Tags
Tags: sandbox compatible, NCI compatible, dc.load, xarray.Dataset, xarray.DataArray, landsat 7, landsat 8, annual geomedian, crs, reprojecting data, resampling data, beginner