Exporting data to NetCDF files
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments
Products used: ga_ls8cls9c_gm_cyear_3
Background
NetCDF is a file format for storing multidimensional scientific data. This file format supports datasets containing multiple observation dates, as well as multiple bands. It is a native format for storing the xarray datasets that are produced by Open Data Cube, i.e. by dc.load commands.
NetCDF files should follow Climate and Forecast (CF) metadata conventions for the description of Earth sciences data. By providing metadata such as geospatial coordinates and sensor information in the same file as the data, CF conventions allow NetCDF files to be “self-describing”. This makes CF-compliant NetCDFs a useful way to save multidimensional data loaded from Digital Earth Australia, as the data can later be loaded with all the information required for further analysis.
The xarray library, which underlies the Open Data Cube (and hence Digital Earth Australia), was specifically designed for representing NetCDF files in Python. However, some geospatial metadata is represented quite differently in the NetCDF-CF conventions than in the GDAL (or proj4) model that is common to most geospatial software (including ODC, e.g. for reprojecting raster data when necessary). The main difference between to_netcdf (native to xarray) and write_dataset_to_netcdf (provided by datacube) is that the latter can appropriately serialise the coordinate reference system object associated with the dataset.
Description
In this notebook we will load some data from Digital Earth Australia and then write it to a (CF-compliant) NetCDF file using the write_dataset_to_netcdf function provided by datacube. We will then verify the file was saved correctly, and (optionally) clean up.
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
[1]:
%matplotlib inline
import datacube
import xarray as xr
from datacube.drivers.netcdf import write_dataset_to_netcdf
Connect to the datacube
[2]:
dc = datacube.Datacube(app='Exporting_NetCDFs')
Load data from the datacube
Here we load a sample dataset from the DEA Landsat 8 and 9 Annual Geomedian product (ga_ls8cls9c_gm_cyear_3). The loaded data is multidimensional, and contains two time-steps (2015, 2016) and six satellite bands (nbart_blue, nbart_green, nbart_red, nbart_nir, nbart_swir_1, nbart_swir_2), along with the sdev, edev, bcdev and count variables included in the product.
[3]:
lat, lon = -35.282052, 149.128667  # City Hill, Canberra
buffer = 0.01  # Approx. 1km

# Load data from the datacube
ds = dc.load(product='ga_ls8cls9c_gm_cyear_3',
             lat=(lat - buffer, lat + buffer),
             lon=(lon - buffer, lon + buffer),
             time=('2015', '2016'))

# Print output data
ds
[3]:
<xarray.Dataset> Size: 304kB
Dimensions:       (time: 2, y: 82, x: 71)
Coordinates:
  * time          (time) datetime64[ns] 16B 2015-07-02T11:59:59.999999 2016-0...
  * y             (y) float64 656B -3.956e+06 -3.956e+06 ... -3.959e+06
  * x             (x) float64 568B 1.549e+06 1.549e+06 ... 1.551e+06 1.551e+06
    spatial_ref   int32 4B 3577
Data variables:
    nbart_blue    (time, y, x) int16 23kB 646 652 693 686 ... 519 558 534 425
    nbart_green   (time, y, x) int16 23kB 838 843 962 945 ... 770 794 779 641
    nbart_red     (time, y, x) int16 23kB 910 999 1183 1157 ... 828 856 812 636
    nbart_nir     (time, y, x) int16 23kB 1939 2016 2134 2349 ... 2607 2676 2619
    nbart_swir_1  (time, y, x) int16 23kB 1825 2052 2318 2395 ... 1998 2049 1795
    nbart_swir_2  (time, y, x) int16 23kB 1450 1632 1933 1924 ... 1276 1266 1050
    sdev          (time, y, x) float32 47kB 0.001365 0.001093 ... 0.007874
    edev          (time, y, x) float32 47kB 446.6 572.6 674.6 ... 529.2 521.8
    bcdev         (time, y, x) float32 47kB 0.06709 0.06056 ... 0.07159 0.07628
    count         (time, y, x) int16 23kB 23 23 23 23 23 23 ... 15 16 16 17 17
Attributes:
    crs:           epsg:3577
    grid_mapping:  spatial_ref
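The "Approx. 1km" comment on the buffer can be checked with a little arithmetic. The sketch below uses a simple spherical approximation (about 111.32 km per degree of latitude, an assumed constant chosen for illustration) to estimate how far 0.01 degrees reaches in each direction at this latitude. It is a rough check only, not how datacube computes the extent.

```python
import math

lat, lon = -35.282052, 149.128667  # City Hill, Canberra (values from the notebook)
buffer = 0.01  # degrees

# Assumed constant: approximate length of one degree of latitude on a sphere
KM_PER_DEGREE = 111.32

# A degree of longitude shrinks with the cosine of the latitude
north_south_km = buffer * KM_PER_DEGREE
east_west_km = buffer * KM_PER_DEGREE * math.cos(math.radians(lat))

# dc.load receives (min, max) tuples, so the full extent is twice the buffer
lat_range = (lat - buffer, lat + buffer)
lon_range = (lon - buffer, lon + buffer)

print(f"Buffer: {north_south_km:.2f} km north-south, {east_west_km:.2f} km east-west")
print(f"lat range: {lat_range}")
print(f"lon range: {lon_range}")
```

Because the query box spans twice the buffer in each direction, an extent of roughly 2 km per side is consistent with the 82 by 71 pixel grid in the output above, if one assumes a pixel size of about 30 m.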
Export to a NetCDF file
To export a CF-compliant NetCDF file, we use the write_dataset_to_netcdf
function:
[4]:
write_dataset_to_netcdf(ds, 'output_netcdf.nc')
That’s all. The file has now been produced, and stored in the current working directory.
Reading back from saved NetCDF
Let’s start by confirming that the file now exists. We can use the special ! command to run command line tools directly within a Jupyter notebook. In the example below, ! ls *.nc runs the ls shell command, which will give us a list of any files in the NetCDF file format (i.e. with file names ending with .nc).
For an introduction to using shell commands in Jupyter, see the guide here.
[5]:
! ls *.nc
output_netcdf.nc
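If shell commands are not available (for example on Windows, or when running the code as a plain script), the same check can be done in Python. The sketch below uses only the standard library; the helper name list_netcdf_files is hypothetical and not part of datacube or xarray.

```python
from pathlib import Path


def list_netcdf_files(directory="."):
    """Return the sorted names of files ending in .nc in the given directory."""
    return sorted(p.name for p in Path(directory).glob("*.nc") if p.is_file())


# List NetCDF files in the current working directory
print(list_netcdf_files())
```

Run straight after the export cell, the printed list should include output_netcdf.nc.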
We could inspect this file using external utilities such as gdalinfo or ncdump, or open it for visualisation, e.g. in QGIS.
We can also load the file back into Python using xarray:
[6]:
# Load the NetCDF from file
reloaded_ds = xr.open_dataset('output_netcdf.nc')
# Print loaded data
reloaded_ds
[6]:
<xarray.Dataset> Size: 467kB
Dimensions:       (time: 2, y: 82, x: 71)
Coordinates:
  * time          (time) datetime64[ns] 16B 2015-07-02T11:59:59 2016-07-01T23...
  * y             (y) float64 656B -3.956e+06 -3.956e+06 ... -3.959e+06
  * x             (x) float64 568B 1.549e+06 1.549e+06 ... 1.551e+06 1.551e+06
    spatial_ref   int32 4B ...
Data variables:
    nbart_blue    (time, y, x) float32 47kB ...
    nbart_green   (time, y, x) float32 47kB ...
    nbart_red     (time, y, x) float32 47kB ...
    nbart_nir     (time, y, x) float32 47kB ...
    nbart_swir_1  (time, y, x) float32 47kB ...
    nbart_swir_2  (time, y, x) float32 47kB ...
    sdev          (time, y, x) float32 47kB ...
    edev          (time, y, x) float32 47kB ...
    bcdev         (time, y, x) float32 47kB ...
    count         (time, y, x) float32 47kB ...
Attributes:
    date_created:           2024-06-17T06:29:45.296627
    Conventions:            CF-1.6, ACDD-1.3
    history:                NetCDF-CF file created by datacube version '1.8.1...
    geospatial_bounds:      POLYGON ((149.11516275211676 -35.272298024386345,...
    geospatial_bounds_crs:  EPSG:4326
    geospatial_lat_min:     -35.29422745554414
    geospatial_lat_max:     -35.269737305748244
    geospatial_lat_units:   degrees_north
    geospatial_lon_min:     149.11516275211676
    geospatial_lon_max:     149.14203545880687
    geospatial_lon_units:   degrees_east
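The reloaded output above shows the data variables as float32, whereas the original bands were int16. A likely explanation (an assumption, not something this notebook states) is that the file records a fill value for missing data, and xarray converts such values to NaN when reading, which requires a floating-point type. The sketch below reproduces that effect with a tiny synthetic dataset and xarray's own to_netcdf rather than write_dataset_to_netcdf. The variable name and the -999 fill value are illustrative only, and a NetCDF backend such as netCDF4 or scipy is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Small synthetic stand-in for the loaded data (not real DEA data)
original = xr.Dataset(
    {"nbart_red": (("y", "x"), np.array([[910, 999], [-999, 636]], dtype="int16"))}
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.nc")

    # Record -999 as the fill value for this variable (illustrative choice)
    original.to_netcdf(path, encoding={"nbart_red": {"_FillValue": -999}})

    # Default behaviour: fill values are masked to NaN, so the dtype becomes float
    with xr.open_dataset(path) as masked:
        masked_dtype = masked.nbart_red.dtype
        n_missing = int(masked.nbart_red.isnull().sum())

    # mask_and_scale=False returns the stored integers without masking
    with xr.open_dataset(path, mask_and_scale=False) as raw:
        raw_dtype = raw.nbart_red.dtype

print("masked dtype:", masked_dtype)
print("missing values:", n_missing)
print("raw dtype:", raw_dtype)
```

If the stored integer dtype matters for later analysis, passing mask_and_scale=False to xr.open_dataset is one option, at the cost of having to handle the fill value yourself.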
We can now use this reloaded dataset just like the original dataset, for example by plotting one of its colour bands:
[9]:
reloaded_ds.nbart_red.plot(col='time')
[9]:
<xarray.plot.facetgrid.FacetGrid at 0x7f51a71a8520>
Clean-up
To remove the saved NetCDF file that we created, run the cell below. This is optional.
[10]:
! rm output_netcdf.nc
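As a portable alternative to rm, the file can also be removed from Python. The sketch below uses only the standard library; remove_if_exists is a hypothetical helper name. Unlike rm, it does not report an error when the file is already gone, so the cell can safely be run more than once.

```python
from pathlib import Path


def remove_if_exists(filename):
    """Delete the file if it is present; return True if something was removed."""
    path = Path(filename)
    if path.is_file():
        path.unlink()
        return True
    return False


removed = remove_if_exists("output_netcdf.nc")
print("Removed output_netcdf.nc" if removed else "Nothing to remove")
```

The same helper could be called before the export cell when re-running the notebook, which may matter if the export function refuses to overwrite an existing file (an assumption worth checking for your datacube version).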
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.
Last modified: June 2024
Compatible datacube version:
[11]:
print(datacube.__version__)
1.8.18
Tags
Tags: sandbox compatible, NCI compatible, annual geomedian, NetCDF, write_dataset_to_netcdf, exporting data, metadata, shell commands