Method
A 6 km × 6 km region of interest (ROI) was selected within each floodplain
area to keep data volumes manageable. In summary, remote sensing
imagery, including airborne LiDAR, Sentinel-1 and Sentinel-2, was
obtained to develop a method to derive FTCC. LiDAR data were used to
provide a three-dimensional representation of the ROIs and to derive
high-resolution FTCC data. Serving as a surrogate for in-situ data, the
LiDAR-derived FTCC was used to calibrate a Random forest model based on
Sentinel-1 and -2. As the Sentinel datasets are open-access and
available globally, the technique can be implemented at regional or
continental scales wherever training data (direct in-situ FTCC
measurements or, as here, a LiDAR surrogate) are available.
LiDAR data
LiDAR (Light Detection And Ranging) remote sensing uses pulsed laser
light emitted from an aircraft to measure the distance to objects on the
Earth's surface from the travel time of the reflected pulses (Dubayah
and Drake, 2000). The return times and intensities together allow
three-dimensional representations of the reflecting surface to be
constructed. When LiDAR is collected over natural environments, 3-D
reconstruction of canopy structure provides a fine-resolution
representation of vegetation at the study location. Airborne LiDAR data
for each ROI were obtained from ELVIS
(https://elevation.fsdf.org.au/), a spatial data portal, in tiles
covering 2 km × 2 km. Acquisition dates were September 2009 for Yanga
and September 2015 for Barmah, using two different LiDAR sensors (Leica
ALS50-II for Yanga; Trimble AX60 for Barmah).
The Yanga dataset was collected from 0.50 km above the Earth's surface
with a swath width of 1.6 km and swath overlap of 20%. The Barmah
dataset was collected from a height of 0.85 km with a swath width of
1 km and swath overlap of 30%. The sensors recorded average point
densities of around 4.0 and 4.4 points per m² for Yanga and Barmah,
respectively.
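As a brief aside for readers unfamiliar with LiDAR ranging, the
underlying relationship is a simple time-of-flight calculation: the
range \(R\) to a reflecting object follows from the two-way travel time
\(t\) of the pulse and the speed of light \(c\),

\(R=\frac{c\,t}{2}\)

so, for example, the ~0.50 km flying height of the Yanga survey
corresponds to a pulse return delay of roughly 3.3 μs.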
Retrieving vegetation height and fractional tree canopy cover from LiDAR data
Tree structural information for both ROIs was retrieved from the LiDAR
tiles. Each tile contains a dense cloud of 'points', each carrying
georeferencing information such as x and y coordinates, point height
and return 'type' (related to the time each pulse takes to return to
the sensor and the height of the reflecting object). FUSION software
(http://forsys.cfr.washington.edu/fusion.html) was used to derive a
digital surface model and a digital terrain model from the raw LiDAR
data (Boehm et al., 2013). The digital surface model approximates the
elevation of the uppermost surface in each grid cell, while the digital
terrain model estimates the elevation of the ground surface. A canopy
height model (Koukoulas and Blackburn, 2005) was created by subtracting
the digital terrain model from the digital surface model, rasterising
the point cloud to pixels at 1 m spatial resolution. An FTCC product
was derived from the canopy height model using all LiDAR points
reflected from more than 2 m above the ground surface (referred to as
LiDAR FTCC). As the objective of the study was to map tree canopy
cover, smaller shrubs and bushes were excluded (Equation 1). The R
package ForestTools was used to identify dominant treetops and tree
crown radii from the canopy height model. A moving window was passed
over the canopy height model, tagging the highest point within each
window as a treetop. The 'watershed' method was then used to outline
tree crowns (Beucher and Meyer, 1993). Finally, tree number, tree
height and crown radius were extracted from the canopy height model.
\(FTCC=\frac{\text{number of pixels with height}>2\,\text{m}}{\text{total number of pixels in a given area}}\) (Equation 1.)
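As a minimal sketch of this step, Equation 1 can be applied to a canopy
height model raster as follows. The terra package and all file names
are assumptions for illustration; the authors derived the DSM and DTM
with FUSION.

```r
# Minimal sketch of Equation 1; file names are hypothetical
library(terra)

dsm <- rast("roi_dsm_1m.tif")  # digital surface model, 1 m
dtm <- rast("roi_dtm_1m.tif")  # digital terrain model, 1 m

# Canopy height model: top-of-canopy elevation minus ground elevation
chm <- dsm - dtm

# Equation 1: fraction of 1 m pixels taller than 2 m within each
# 20 m cell; `chm > 2` is a 0/1 raster, so the block mean is FTCC
ftcc <- aggregate(chm > 2, fact = 20, fun = mean, na.rm = TRUE)
```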
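The treetop and crown delineation step might look like the following
sketch, using the vwf() (variable-window filter) and mcws()
(marker-controlled watershed) functions from ForestTools. The window
function is an illustrative assumption, not the study's calibration.

```r
# Illustrative treetop detection and crown delineation with ForestTools
library(ForestTools)
library(terra)

chm <- rast("roi_chm_1m.tif")  # hypothetical canopy height model

# Window radius grows with canopy height (taller trees, wider crowns);
# this linear function is an assumption for illustration only
win_fun <- function(x) x * 0.05 + 0.6

# Tag the highest point within each moving window as a treetop;
# minHeight = 2 mirrors the 2 m shrub-exclusion threshold
ttops <- vwf(chm, winFun = win_fun, minHeight = 2)

# Outline tree crowns around the treetops with the watershed method
crowns <- mcws(treetops = ttops, CHM = chm, minHeight = 2)

nrow(ttops)            # tree count
summary(ttops$height)  # heights at detected treetops
```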
Sentinel-1 data
The Sentinel-1A and 1B satellites carry C-band Synthetic Aperture Radar
(SAR) sensors. They are part of the European Space Agency's Copernicus
mission and were launched in 2014 (Sentinel-1A) and 2016 (Sentinel-1B).
They are the first SAR sensors to acquire systematically at global
scale, providing dual-polarised (VV and VH) C-band SAR images with a
12-day repeat cycle per satellite. Over land, Interferometric Wide
swath is the default imaging mode, with a nominal sensing resolution of
20 m (azimuth) by 5 m (range).
A Sentinel-1 Ground Range Detected image acquired in May 2016 was
obtained from Sentinel Australasia Regional Access (SARA;
https://copernicus.nci.org.au/). Processing was performed with the
Sentinel Application Platform (SNAP) and included updating the orbital
metadata, thermal noise removal, border noise removal, calibration,
Range Doppler terrain correction and conversion to decibels (Filipponi,
2019). The VV and VH bands were converted to Sigma Nought
backscattering coefficients, which include compensation for
line-of-sight variations in range.
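The decibel conversion mentioned above is the standard logarithmic
scaling of the linear Sigma Nought backscattering coefficient,

\(\sigma_{\text{dB}}^{0}=10\log_{10}\left(\sigma^{0}\right)\)

which compresses the large dynamic range of SAR backscatter into a
convenient scale.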
Sentinel-2 data
The Sentinel-2 satellites consist of two satellites, launched in 2015
(Sentinel-2A) and 2017 (Sentinel-2B), respectively. Each carry
multispectral sensors with 13 spectral bands recording visible,
near-infrared and short-wave infrared regions of the electro-magnetic
wave spectrum. The revisit time of Sentinel-2 is 10 days.
Sentinel-2 Level 1C (L1C) top-of-atmosphere data with less than 10%
cloud cover, collected in May 2016, were downloaded from SARA. The
original tile (100 km × 100 km) was cropped to the ROIs. Sen2cor was
applied to convert the data from L1C to atmospherically corrected Level
2A bottom-of-atmosphere reflectance (Main-Knorn et al., 2015). Ten
bands representing vegetation functional and structural information
were selected (Verrelst et al., 2012): B2-B8, B8a and B11-B12. Where
required, bands were resampled to 20 m × 20 m (Table 1).
Random forest regression analysis
Random forest regression, proposed by Breiman (2001), is an ensemble
machine learning algorithm well suited to the analysis of
high-dimensional spatial datasets. Random forest starts by drawing
random subsets of the training data, then builds a decision tree for
each subset. Each decision tree then makes a prediction; for
classification the most-voted prediction among all individual trees is
taken as the final result, while for regression the tree predictions
are averaged (Gislason et al., 2006).
Random forest regression was employed to determine the relationship
between LiDAR FTCC and the Sentinel-1 and Sentinel-2 bands. Before
applying the Random forest regression, the Sentinel-1 and -2 bands and
the canopy height model were brought to a common spatial resolution of
20 m. The VV and VH bands from Sentinel-1 were resampled to 20 m using
bilinear interpolation, matching the Sentinel-2 image resolution. To
retrieve FTCC from the canopy height model at Sentinel-2 spatial
resolution, a 20 m fishnet grid was created and FTCC was calculated
from the canopy height model within each grid cell using Equation 1.
After resampling of Sentinel-1, Sentinel-2 and LiDAR FTCC, 733,800
pixels across the Yanga and Barmah ROIs were available for Random
forest training and validation.
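A sketch of this resampling and stacking step might look as follows;
the terra package and all file and variable names are assumptions for
illustration, not the authors' actual workflow.

```r
# Sketch only: assumes all inputs are already on (or near) a 20 m grid
library(terra)

s2   <- rast("s2_20m_bands.tif")    # hypothetical 10-band Sentinel-2 stack
s1   <- rast("s1_vv_vh.tif")        # hypothetical VV/VH Sigma Nought stack
ftcc <- rast("lidar_ftcc_20m.tif")  # LiDAR FTCC on the 20 m fishnet grid

# Bilinear resampling of Sentinel-1 onto the Sentinel-2 20 m grid
s1_20 <- resample(s1, s2, method = "bilinear")

# One row per 20 m pixel: response (FTCC) plus 12 predictor bands
# (assumes ftcc shares the Sentinel-2 grid; resample it first if not)
dat <- as.data.frame(c(ftcc, s1_20, s2), na.rm = TRUE)
names(dat)[1] <- "FTCC"
```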
Three models were created using the 'randomForest' R package: single
models trained and predicting within the Yanga and Barmah ROIs
(\(RF_{Yanga}\) and \(RF_{Barmah}\), where RF is Random forest) and a
model that combined data from both ROIs (\(RF_{all}\)). For each model,
the dataset was split into 70% training and 30% validation by random
sampling. Ten-fold cross-validation was implemented to retain the
best-performing version of each Random forest model.
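A minimal sketch of the model fitting follows; the 'randomForest'
package is named in the text, but ntree = 500 is the package default
rather than a value reported by the authors, and the ten-fold
cross-validation step is omitted for brevity.

```r
# Minimal sketch of an RF_all-style model using 'dat' from the
# previous sketch (FTCC response plus Sentinel-1/-2 predictors)
library(randomForest)

set.seed(42)  # illustrative seed for a reproducible split

# 70/30 random split into training and validation sets
idx   <- sample(nrow(dat), size = 0.7 * nrow(dat))
train <- dat[idx, ]
valid <- dat[-idx, ]

# Random forest regression of FTCC on all predictor bands
rf <- randomForest(FTCC ~ ., data = train, ntree = 500)

# Predictions on the held-out 30% validation set
pred <- predict(rf, newdata = valid)
```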
Statistical analysis
Root Mean Square Error (RMSE) was used to assess the performance of the
Random forest prediction models. The RMSE is defined as:
\(RMSE=\sqrt{\frac{\sum_{i=1}^{N}\left(x_{ret,i}-x_{pre,i}\right)^{2}}{N}}\) (Equation 2.)
Here, \(x_{ret,i}\) and \(x_{pre,i}\) are the LiDAR FTCC and the FTCC
predicted by the Random forest model, respectively, and N is the number
of pixels used for prediction. The coefficient of determination
(\(R^{2}\)) was used to assess the relationship between LiDAR FTCC and
predicted FTCC for the ROIs. Hence, a higher \(R^{2}\) indicates that
the regression model better fits the LiDAR FTCC, and a lower RMSE
indicates better predictions from the Random forest models. Data
processing, statistical analysis and visualisation were conducted in
the R scientific computing environment (R Core Team, version 3.6) with
packages obtained from the Comprehensive R Archive Network
(http://cran.r-project.org).
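For completeness, Equation 2 and \(R^{2}\) can be computed from the
validation predictions along these lines, with variable names carried
over from the sketches above; \(R^{2}\) is taken here as the squared
Pearson correlation between observed and predicted FTCC.

```r
# RMSE (Equation 2) between LiDAR FTCC and Random forest predictions
rmse <- sqrt(mean((valid$FTCC - pred)^2))

# Coefficient of determination as the squared Pearson correlation
# (an alternative definition is 1 - SS_res / SS_tot)
r2 <- cor(valid$FTCC, pred)^2
```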