Prediction of soil organic carbon content in arid and semi-arid regions
of China using machine learning-based modelling and SHAP interpretation
Abstract
Understanding soil organic carbon (SOC) content is essential for
environmental sustainability and carbon neutrality. Traditional methods
of predicting SOC content are often difficult and imprecise. Advances in
machine learning techniques have greatly improved the accuracy of SOC
prediction. This study evaluates six machine learning models for
predicting SOC content: Random Forest (RF), Support Vector Machine (SVM),
Partial Least Squares Regression (PLSR), Convolutional Neural Network
(CNN), Artificial Neural Network (ANN), and Extreme Gradient Boosting
(XGBoost).
The research was conducted in north-central and north-western China,
covering diverse land uses and climatic conditions. A comprehensive
dataset was utilized, including soil samples, DEM data, rainfall and
temperature data, soil moisture, erosion modulus, and NDVI. Each model
was evaluated with ten-fold cross-validation using the coefficient of
determination (R2), mean absolute error (MAE), mean square error (MSE),
root mean square error (RMSE), and the ratio of performance to
interquartile distance (RPIQ). The XGBoost model outperformed the other
models, achieving R2=0.715, MAE=0.424, MSE=0.707, RMSE=0.781, and
RPIQ=2.565. For forest, grassland, and farmland, air temperature and soil
pH were identified as the key factors influencing SOC content, both
showing a negative correlation with it. For unutilized land, the key
factors affecting SOC content were
NDVI and soil pH. Additionally, SHapley Additive exPlanations (SHAP) was
used to explain the model’s predictions, demystifying the machine
learning “black box” and improving the credibility of the predictions.
This work demonstrates the potential of machine learning
models to accurately predict SOC and identify key factors influencing
SOC levels, providing new insights into soil management and climate
change mitigation.
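
The five evaluation metrics named in the abstract can be computed from
paired observed and predicted SOC values. The sketch below is
illustrative only and is not the authors' code: the function name
regression_metrics and the toy values are hypothetical, and the
quartiles use Python's "inclusive" interpolation, which is an
assumption, since the abstract does not say how the interquartile
distance was calculated.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, MAE, MSE, RMSE and RPIQ for paired observations/predictions.

    Hypothetical helper for illustration; not taken from the study.
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean square error
    rmse = math.sqrt(mse)                      # root mean square error

    # R2 = 1 - (residual sum of squares / total sum of squares)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - (mse * n) / ss_tot

    # RPIQ = interquartile distance of the observed values / RMSE.
    # Assumption: quartiles via the "inclusive" method.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "RPIQ": rpiq}


# Toy values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

On a single set of predictions, RMSE is exactly the square root of MSE.
When metrics are averaged over cross-validation folds, the averaged RMSE
need not equal the square root of the averaged MSE.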