Yi Luo

and 3 more

Soil organic carbon (SOC), as an indicator of soil quality, plays a dual role in stabilizing oasis ecosystems and regulating carbon sequestration in arid, lakeside environments. However, accurate estimation of SOC from visible-near-infrared (VNIR) spectral data is limited by spectral redundancy and high dimensionality. This research enhances SOC estimation accuracy by combining wavelet analysis and machine learning in the lakeside oasis of Bosten Lake in Xinjiang. SOC content and the corresponding VNIR spectral data were obtained for 82 soil samples collected from the 0–20 cm depth. The hyperspectral reflectance data were processed using continuous wavelet transform (CWT) and discrete wavelet transform (DWT). The successive projections algorithm (SPA), Boruta, and competitive adaptive reweighted sampling (CARS) were used to identify relevant spectral bands, from which SOC estimation models were developed based on partial least squares regression (PLSR), backpropagation neural network (BPNN), and random forest (RF) algorithms. The results revealed that CWT offered superior noise reduction performance, particularly at low decomposition scales (1–5), achieving a 19.21% improvement in noise suppression compared to DWT. The optimal CWT-based model showed a 23.20% improvement in residual prediction deviation (RPD) compared to its DWT-based counterpart. Feature selection algorithms significantly improved estimation accuracy, with enhancements of up to 49.04% in the determination coefficient (R²) and 58.23% in RPD. Among the algorithms, CARS provided the highest improvement, followed by SPA and Boruta. Overall, the combination of CWT-1-CARS and the RF algorithm showed the strongest nonlinear modeling performance. This configuration achieved calibration metrics of R² = 0.79, root mean square error (RMSE) = 2.57, and RPD = 2.23, outperforming the original spectral models, with improvements of 63.3% over PLSR (RPD = 1.84) and BPNN (RPD = 1.91).
The spatial interpolation analysis showed 91.3% consistency with field-measured SOC values, validating the model’s practical reliability. The spectral bands most sensitive to SOC were primarily located in the visible range (401–504 nm) and the near-infrared range (1,638–2,369 nm). This study establishes a robust technical foundation for accurate SOC estimation, precise ecological monitoring, and sustainable management of arid, lakeside oases.
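
The accuracy metrics reported above (R², RMSE, and RPD) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and RPD is assumed here to be the sample standard deviation of the observed values (`ddof=1`) divided by the RMSE, which is a common convention but is not confirmed by the abstract.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and RPD for observed vs. predicted values.

    Assumption: RPD = sample standard deviation of the observations
    (ddof=1) divided by RMSE. Some studies use ddof=0 instead.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))                   # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r2 = 1.0 - ss_res / ss_tot
    rpd = float(np.std(y_true, ddof=1)) / rmse
    return {"r2": r2, "rmse": rmse, "rpd": rpd}


# Hypothetical SOC values, for illustration only (not data from the study).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the measured values, which is why RPD is reported alongside R² and RMSE when comparing the models.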