2.4 Spatial prediction using LANDMAP
To determine the best model, we applied the Super Learner ensemble
algorithm as implemented in the LANDMAP package v0.0.14
(https://github.com/Envirometrix/landmap) for R v4.1.0, which automates
mapping by performing spatial prediction with raster data as predictors
(Hengl et al., 2018, 2021; Polley & Laan, 2010; RStudio Team, 2021).
The Super Learner ensemble ML
algorithm, developed by Polley & Laan (2010), estimates the performance
of multiple ML models using cross-validation. It then builds an ensemble
as the optimal weighted average of the models' predictions, with weights
derived from their performance on the held-out test data
(van der Laan et al., 2007). The LANDMAP package has 41
different predictive algorithms available. The model ensemble combined
decision tree-based methods (random forest), kernel-based methods
(support vector machines), neural network-based methods, and generalized
linear models. We assumed that each method captures the relationships in
our data in a different manner.
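
The weighting step described above can be illustrated with a minimal sketch. It is not the LANDMAP implementation: it assumes synthetic data, only two toy base learners (a mean predictor and a one-covariate linear fit), ordinary rather than spatial folds, and a simple grid search over convex weights in place of the meta-learner used in practice. All function names are hypothetical.

```python
import random

def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Toy base learner 2: least-squares line on a single covariate."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def out_of_fold_predictions(xs, ys, learners, k=5):
    """Predict each observation with models fitted without that observation."""
    n = len(xs)
    oof = [[0.0] * n for _ in learners]
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        for j, fit in enumerate(learners):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                oof[j][i] = model(xs[i])
    return oof

def mse(pred, ys):
    """Mean squared error between predictions and observations."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

def ensemble_weights(oof, ys, steps=100):
    """Grid-search the convex weights (two learners) minimising the CV error."""
    best_w, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        blend = [w * a + (1 - w) * b for a, b in zip(oof[0], oof[1])]
        err = mse(blend, ys)
        if err < best_err:
            best_w, best_err = w, err
    return (best_w, 1 - best_w), best_err

random.seed(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]  # synthetic response, assumed
oof = out_of_fold_predictions(xs, ys, [fit_mean, fit_linear])
weights, cv_mse = ensemble_weights(oof, ys)
print("weights (mean, linear):", weights, "CV MSE:", cv_mse)
```

Because the grid includes the two pure-learner cases, the cross-validated error of the weighted ensemble in this sketch can never exceed that of either learner alone, which is the motivation for combining methods that describe the data differently.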
We took advantage of the geographical distances in our training data by
applying the oblique geographic coordinates technique, under which we
assumed no collinearity between covariates, as in previous studies
(Møller et al., 2020). We expressed the uncertainty of our estimates as
a percentage: the width of the 68% prediction interval divided by the
mean prediction for each pixel, as performed by Viscarra Rossel et al.
(2014). We used a 5-fold spatial cross-validation (spCV) approach to
assess the predictive accuracy of our modeling framework (Brenning,
2012; James et al., 2013b; Wadoux et al., 2021). The spCV yields the
model-independent residuals required to compute map quality indicators
such as the coefficient of determination (r²) and the root mean square
error (RMSE). To compare model accuracy among forest types, we used
Taylor diagrams (Wadoux et al., 2022).
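
The quantities in this paragraph can be written out in a short sketch. It rests on several assumptions not stated in the text: the coefficient of determination is computed as 1 - SSE/SST (the squared correlation is another common convention), the spatial folds are formed as contiguous strips along one coordinate as a crude stand-in for the blocking used by spCV, and all function names and example values are hypothetical.

```python
import math

def relative_uncertainty(lower, upper, mean):
    """Width of the 68% prediction interval as a percentage of the mean prediction."""
    return 100.0 * (upper - lower) / mean

def spatial_folds(coords, k=5):
    """Assign points to k folds as contiguous strips along x (assumed blocking)."""
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    folds = [0] * len(coords)
    for rank, i in enumerate(order):
        folds[i] = rank * k // len(coords)
    return folds

def rmse(observed, predicted):
    """Root mean square error of held-out predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, here taken as 1 - SSE/SST (assumption)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Illustrative values only, not results from the study.
pixel_uncertainty = relative_uncertainty(lower=100.0, upper=140.0, mean=120.0)
folds = spatial_folds([(x, 0.0) for x in [5, 1, 9, 3, 7, 0, 8, 2, 6, 4]], k=5)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]  # would come from the held-out folds
print(pixel_uncertainty, folds, rmse(observed, predicted), r_squared(observed, predicted))
```

In an actual spCV run, each observation would be predicted by a model trained on the other four folds, and the pooled held-out predictions would then be passed to the two accuracy functions.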