Polygonising pixel edges
Sign up to the DEA Sandbox to run this notebook interactively from a browser
Compatibility: Notebook currently compatible with both the NCI and DEA Sandbox environments
Products used: ga_s2am_ard_3, ga_s2bm_ard_3, ga_ls5t_ard_3, ga_ls7e_ard_3, ga_ls8c_ard_3
Background
The ability to extract pixel boundaries as a shapefile or other vector file type is useful for remote sensing applications where the exact footprint of each pixel is needed, such as matching drone imagery with remotely sensed imagery.
Description
This notebook uses Digital Earth Australia to retrieve satellite data, creates a vector file from the pixel boundaries of the data, and exports it to file.

1. First we load some data for a chosen time frame using the dea-notebooks load_ard function
2. Then we convert our raster data into one polygon per pixel
3. Then we export our pixel edge polygons as a vector file (e.g. ESRI Shapefile or GeoJSON)

This workflow is repeated twice: once for Sentinel-2 data and once for Landsat data.
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
Import Python packages that are used for the analysis.
[1]:
%matplotlib inline
import datacube
import rasterio.features
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import shape
from datacube.utils.cog import write_cog
import sys
sys.path.insert(1, '../Tools/')
from dea_tools.datahandling import load_ard
from dea_tools.plotting import rgb
Define useful functions
The function below is a helper that builds a descriptive filename for the output files.
[2]:
def create_filename(ds, image_index, query, prefix=""):
    """
    Create_filename is a handy function to turn your xarray.Dataset
    and image number into a nice output filename for your shapefile.

    Parameters
    ----------
    ds : xarray.Dataset
        Input xarray.Dataset data
    image_index : int
        An integer value between 0 and however many images you have
    query : dict
        The datacube query used to load `ds`; its x/y bounds are
        included in the filename
    prefix : string, optional
        An optional name prefix to identify your area of interest
    """
    # Make sure the dataset contains data before building the name
    assert ds

    # Extract the date of the selected image and the query bounds
    image_date = str(ds.nbart_red[image_index].time.data)[:10]
    image_coords = f"{query['x'][0]}_{query['y'][0]}_{query['x'][1]}_{query['y'][1]}"

    # Combine the optional prefix, date and coordinates into one name
    if prefix:
        output_name = f"{str(prefix)}_{image_date}_{image_coords}"
    else:
        output_name = f"{image_date}_{image_coords}"

    print(output_name)
    return output_name
Connect to the datacube
Connect to the datacube so we can access DEA data. The app parameter is a unique name for the analysis, which is based on the notebook file name.
[3]:
dc = datacube.Datacube(app="Polygonise_pixel_edges")
Analysis parameters
x: a tuple of longitude/Easting boundaries defining the area of interest in the format (smaller, larger), e.g. (153.40, 153.50)
y: a tuple of latitude/Northing boundaries defining the area of interest in the format (smaller, larger), e.g. (-27.40, -27.50)
time: a tuple of time values in a pandas datetime format defining the time period of interest in the format (smaller, larger), e.g. ("2018-08-01", "2018-08-20")
products: a list of product names to load from the datacube, e.g. ['ga_ls7e_ard_3', 'ga_ls8c_ard_3']
measurements: a list of band names to load from the satellite product, e.g. ['nbart_red', 'nbart_green']
resolution: the spatial resolution of the loaded satellite data, e.g. for Landsat this is (-30, 30)
output_crs: the coordinate reference system/map projection to load data into, e.g. 'EPSG:3577' to load data in the Albers Equal Area projection
group_by: how to group the data, e.g. "solar_day"
Sentinel-2 example
Set up a datacube query for area of interest
Set up a query to load Sentinel-2 satellite data:
[4]:
# Create a query object
s2_query = {
    "x": (153.44, 153.45),
    "y": (-27.42, -27.43),
    "time": ("2018-08-01", "2018-08-20"),
    "products": ("ga_s2am_ard_3", "ga_s2bm_ard_3"),
    "measurements": ["nbart_red", "nbart_green", "nbart_blue", "fmask"],
    "output_crs": "EPSG:3577",
    "resolution": (-10, 10),
    "group_by": "solar_day",
}
Load available data from Sentinel-2 satellites:
[5]:
# Load Sentinel-2 data using `load_ard`. We don't apply the fmask pixel
# quality mask, as we want the pixel edges themselves rather than large
# masked-out areas.
s2_ds = load_ard(
    dc=dc,
    min_gooddata=0.95,  # only keep scenes with at least 95% good quality pixels
    mask_pixel_quality=False,
    **s2_query
)
Finding datasets
ga_s2am_ard_3
ga_s2bm_ard_3
Counting good quality pixels for each time step
Filtering to 8 out of 8 time steps with at least 95.0% good quality pixels
Loading 8 time steps
Plot up some data to take a look at it
Use the rgb function from the dea_tools.plotting module to look at our area of interest in true colour:
[6]:
rgb(s2_ds, bands=["nbart_red", "nbart_green", "nbart_blue"], col="time")
Choose an image index to use to polygonise the pixels
Change the number in the cell below to select a different image from the images in the previous cell; 0 is the first image.
[7]:
image_index = 6
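If you would like to check which acquisition date each index corresponds to before choosing, a quick optional check (a small sketch using the s2_ds dataset loaded above) is to print the time coordinate:

# List each image index alongside its acquisition date
for i, timestep in enumerate(s2_ds.time.values):
    print(i, str(timestep)[:10])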
Segment our image into per-pixel polygons
Use the red band to create a polygon for each pixel.
[8]:
# Input array to segment and vectorise
input_array = s2_ds.nbart_red[image_index]
input_transform = s2_ds.affine
input_crs = s2_ds.crs

# Create array with a unique value per cell
unique_pixels = np.arange(input_array.size).reshape(input_array.shape)

# Vectorise each unique feature in array
vectors = rasterio.features.shapes(
    source=unique_pixels.astype(np.int16), transform=input_transform
)

# Extract polygons and values from generator
vectors = list(vectors)
values = [value for polygon, value in vectors]
polygons = [shape(polygon) for polygon, value in vectors]

# Create a geopandas dataframe populated with the polygon shapes
s2_poly_gdf = gpd.GeoDataFrame(data={"id": values}, geometry=polygons, crs=input_crs)
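As an optional sanity check (a small sketch using the input_array and s2_poly_gdf variables defined above), you can confirm that one polygon was created per pixel. Note that the cell above casts the unique values to int16, which only holds values up to 32,767; for larger images, consider a wider type such as int32 (also supported by rasterio.features.shapes) so every pixel keeps a unique value.

# One polygon per pixel is expected, since every cell was given a unique value
print(f"{len(s2_poly_gdf)} polygons created from {input_array.size} pixels")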
Investigate the results
Print and plot the first 5 entries in the polygon GeoDataFrame:
[9]:
# Print the first 5 rows in the GeoDataFrame
s2_poly_gdf.head()
[9]:
  | id  | geometry
--|-----|-----------------------------------------------------
0 | 0.0 | POLYGON ((2084090.000 -3149240.000, 2084090.00...
1 | 1.0 | POLYGON ((2084100.000 -3149240.000, 2084100.00...
2 | 2.0 | POLYGON ((2084110.000 -3149240.000, 2084110.00...
3 | 3.0 | POLYGON ((2084120.000 -3149240.000, 2084120.00...
4 | 4.0 | POLYGON ((2084130.000 -3149240.000, 2084130.00...
Plot shapefile over raster
[10]:
# Plot raster data
fig, ax1 = plt.subplots(figsize=[8, 8])
rgb(
    s2_ds.isel(time=image_index),
    bands=["nbart_red", "nbart_green", "nbart_blue"],
    ax=ax1,
)

# Plot our shapefile over top of our raster
s2_poly_gdf.boundary.plot(color=None, edgecolor="k", linewidth=0.2, ax=ax1)
plt.show()
Write polygonised pixel edges to file
Change the prefix in the cell below for a customised filename, and write our new segmented shapefile out to a file:
[11]:
output_filename = create_filename(s2_ds, image_index, s2_query, prefix="NS_s2")
s2_poly_gdf.to_file(f"{output_filename}.shp")
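The Description above also mentions GeoJSON as a possible output format. If you prefer GeoJSON over an ESRI Shapefile, geopandas can write it by passing a different OGR driver; a small optional sketch reusing the output_filename from above:

# Write the same pixel-edge polygons out as GeoJSON instead of a shapefile
s2_poly_gdf.to_file(f"{output_filename}.geojson", driver="GeoJSON")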
Write corresponding RGB GeoTIFF image data to file
Write the corresponding GeoTIFF out to a file, for checking purposes:
[12]:
s2_rgb = s2_ds.isel(time=image_index).to_array()
write_cog(s2_rgb, f"{output_filename}.tif", overwrite=True)
[12]:
PosixPath('NS_s2_2018-08-18_153.44_-27.42_153.45_-27.43.tif')
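If you want to confirm the exported GeoTIFF is sensible, one optional check (a sketch, assuming the file was written to the current working directory as above) is to open it with rasterio and inspect its basic properties:

import rasterio

# Open the exported Cloud Optimised GeoTIFF and print its shape and CRS
with rasterio.open(f"{output_filename}.tif") as src:
    print(src.count, "bands,", src.width, "x", src.height, "pixels, CRS:", src.crs)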
Landsat example
Set up a datacube query for area of interest
Set up a query to reproduce our workflow using Landsat satellite data:
[13]:
# Create a landsat query
ls_query = {
    "x": (153.44, 153.45),
    "y": (-27.42, -27.43),
    "time": ("2018-08-01", "2018-08-20"),
    "products": ["ga_ls5t_ard_3", "ga_ls7e_ard_3", "ga_ls8c_ard_3"],
    "measurements": [
        "nbart_green",
        "nbart_red",
        "nbart_blue",
    ],  # only get the bands we need for plotting and segmenting
    "output_crs": "EPSG:3577",
    "resolution": (-30, 30),
    "group_by": "solar_day",
}
Load available data from Landsat satellites
[14]:
# Load landsat data using `load_ard`
ls_ds = load_ard(dc=dc, ls7_slc_off=False, **ls_query)
Finding datasets
ga_ls5t_ard_3
ga_ls7e_ard_3 (ignoring SLC-off observations)
ga_ls8c_ard_3
Applying pixel quality/cloud mask
Loading 1 time steps
Plot data
Use the rgb function from the dea_tools.plotting module to look at our area of interest in true colour:
[15]:
rgb(ls_ds, bands=["nbart_red", "nbart_green", "nbart_blue"], col="time")
[16]:
image_index = 0
Segment image into per-pixel polygons
Use the red band to create a polygon for each pixel.
[17]:
# Input array to segment and vectorise
input_array = ls_ds.nbart_red[image_index]
input_transform = ls_ds.affine
input_crs = ls_ds.crs

# Create array with a unique value per cell
unique_pixels = np.arange(input_array.size).reshape(input_array.shape)

# Vectorise each unique feature in array
vectors = rasterio.features.shapes(
    source=unique_pixels.astype(np.int16), transform=input_transform
)

# Extract polygons and values from generator
vectors = list(vectors)
values = [value for polygon, value in vectors]
polygons = [shape(polygon) for polygon, value in vectors]

# Create a geopandas dataframe populated with the polygon shapes
ls_poly_gdf = gpd.GeoDataFrame(data={"id": values}, geometry=polygons, crs=input_crs)
Investigate the results
Print and plot the first 5 entries in the polygon GeoDataFrame:
[18]:
# Print the first 5 rows in the GeoDataFrame
ls_poly_gdf.head()
[18]:
  | id  | geometry
--|-----|-----------------------------------------------------
0 | 0.0 | POLYGON ((2084070.000 -3149220.000, 2084070.00...
1 | 1.0 | POLYGON ((2084100.000 -3149220.000, 2084100.00...
2 | 2.0 | POLYGON ((2084130.000 -3149220.000, 2084130.00...
3 | 3.0 | POLYGON ((2084160.000 -3149220.000, 2084160.00...
4 | 4.0 | POLYGON ((2084190.000 -3149220.000, 2084190.00...
Plot shapefile over raster
[19]:
# Plot raster data
fig, ax1 = plt.subplots(figsize=[8, 8])
rgb(
    ls_ds.isel(time=image_index),
    bands=["nbart_red", "nbart_green", "nbart_blue"],
    ax=ax1,
)

# Plot our shapefile over top of our raster
ls_poly_gdf.boundary.plot(color=None, edgecolor="k", linewidth=0.2, ax=ax1)
plt.show()
Write polygonised pixel edges out to file
Write our new segmented Landsat pixel edge shapefile out to a file (change the prefix in the cell below for a customised filename):
[20]:
output_filename = create_filename(ls_ds, image_index, ls_query, prefix="NS_ls")
ls_poly_gdf.to_file(f"{output_filename}.shp")
Write corresponding RGB GeoTIFF image data to file
Write the corresponding GeoTIFF out to a file, for checking purposes:
[21]:
ls_rgb = ls_ds.isel(time=image_index).to_array()
write_cog(ls_rgb, f"{output_filename}.tif", overwrite=True)
[21]:
PosixPath('NS_ls_2018-08-15_153.44_-27.42_153.45_-27.43.tif')
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Discord chat or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.
Last modified: December 2023
Compatible datacube version:
[22]:
print(datacube.__version__)
1.8.6
Tags
Tags: NCI compatible, sandbox compatible, landsat 5, landsat 7, landsat 8, sentinel 2, rgb, load_ard, vectorise, shapefile