Fig. 3. Correlations (a) and standardized RMSE (b) between observations and historical and AMIP simulations from CMIP6 (1901-2014, solid) and those simulations from CMIP5 that outperform the CMIP6 historical simulations (1901-2003, dotted-dashed, legend entries include “5”). Dots and stars denote the statistic computed between the MMM and observations, while the curves denote the bootstrapping pdfs. The dotted grey curves display the bootstrapping pdfs for the same statistics applied to an MMM over the CMIP6 piC simulations, and the grey dashed lines mark the one-sided p = 0.05 significance level given by the piC distribution. Colored dotted curves and dashed lines show the piC distributions for those subsets of forcing agents whose piC distribution differs noticeably from that of the other subsets.
CMIP5’s AA (r = 0.24, sRMSE = 0.97) and ALL (r = 0.37, sRMSE = 0.95) MMMs achieve significance in both metrics, a fact that, in isolation, is consistent with the suggestion that AA may explain observed variability while underestimating its magnitude. In contrast, in CMIP6, neither AA (r = 0.04, sRMSE = 1.01) nor ALL (r = 0.04, sRMSE = 1.02) performs statistically better than noise, and GHG performs significantly worse (r = -0.17, sRMSE = 1.03). The additional years included in the CMIP6 simulations (2004-2014) cannot explain the entire deterioration in performance between CMIP5 and CMIP6: even when restricted to CMIP5’s time period, the CMIP6 ALL and AA simulations perform worse than CMIP5 in both metrics (r = 0.07 and sRMSE = 1.00 for AA, r = 0.13 and sRMSE = 0.99 for ALL). Most of the remaining deterioration in performance for AA is due to reduced drying in the 1970s in CMIP6. In CMIP6, NAT (r = 0.19, sRMSE = 0.98) is the only forcing that performs significantly better than noise. We conclude that, aside from episodic responses to volcanic eruptions, the ensemble of coupled CMIP6 simulations has no significant skill in simulating historical Sahel rainfall in response to external forcing.
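To make the two metrics and the piC-based significance test concrete, the sketch below computes a Pearson correlation and a standardized RMSE (here taken as RMSE normalized by the observed standard deviation, one plausible convention) between an MMM and observations, and builds a null distribution by pairing the observations with multi-model means of piControl segments. The function names, the normalization, and the resampling scheme are illustrative assumptions, not a description of the actual analysis pipeline behind Fig. 3.

```python
import numpy as np

def correlation(obs, mmm):
    """Pearson correlation between observed and multi-model-mean anomalies."""
    return np.corrcoef(obs, mmm)[0, 1]

def standardized_rmse(obs, mmm):
    """RMSE normalized by the observed standard deviation (an assumed
    convention for the sRMSE of Fig. 3b)."""
    return np.sqrt(np.mean((obs - mmm) ** 2)) / np.std(obs)

def pic_null_distribution(obs, pic_segments, stat, n_boot=10_000, seed=None):
    """Null pdf for `stat`: pair the observations with multi-model means of
    randomly resampled piControl segments (hypothetical resampling scheme)."""
    rng = np.random.default_rng(seed)
    n_seg = len(pic_segments)
    null = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n_seg, size=n_seg)   # resample runs with replacement
        pic_mmm = np.mean([pic_segments[j] for j in idx], axis=0)
        null[i] = stat(obs, pic_mmm)
    return null

# usage (arrays are placeholders):
# obs, mmm = ...                       # observed and forced (MMM) anomalies, same years
# pic_segments = [...]                 # one obs-length segment per piControl run
# r = correlation(obs, mmm)
# null_r = pic_null_distribution(obs, pic_segments, correlation)
# p_one_sided = np.mean(null_r >= r)   # compare to the p = 0.05 level in Fig. 3
```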
As in CMIP5, the simulated forced component of precipitation changes in CMIP6—given by the MMM—has a much smaller variance than observations (note the amplification of the right ordinates in Figure 2). However, the poor performance of the CMIP6 simulations makes it clear that amplifying the simulated forced component will not help explain observed precipitation.
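The reason amplification cannot help is that rescaling the forced component leaves its correlation with the observations unchanged: the fraction of observed variance that a scaled MMM can account for is bounded by r². The toy calculation below, a sketch assuming both series are annual anomalies on common years, illustrates that bound; with r of roughly 0.04 for the CMIP6 ALL MMM, essentially no observed variance can be recovered at any amplification factor.

```python
import numpy as np

def explained_fraction(obs, mmm, scale):
    """Fraction of observed variance removed by subtracting a rescaled
    forced component (scale * MMM). Illustrative only."""
    resid = obs - scale * mmm
    return 1.0 - np.var(resid) / np.var(obs)

def best_possible_fraction(obs, mmm):
    """Maximum of explained_fraction over all scales; it equals the squared
    Pearson correlation, so no amplification factor can beat r**2
    (e.g. r = 0.04 caps the explainable variance at 0.16%)."""
    r = np.corrcoef(obs, mmm)[0, 1]
    return r ** 2
```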
For simulated atmospheric and oceanic IV (\(\overrightarrow{a}\) and \(\overrightarrow{o}\)) to explain observed precipitation variability, it is not enough that observed yearly Sahelian precipitation anomalies fall within the range of individual simulations (not shown)—the latter must also match the distinctive low-frequency power of the observations. In Figure 4 we compare the power spectra (PS) of piC simulations (colored brown to turquoise by model climatological rainfall) to the observed PS (solid black) and the PS of the ALL-residual (observations minus the ALL MMM, dotted-dashed black). In the observations and the residual, variance at periods longer than about 20 years (low-frequency) is roughly 5 times as large as the high-frequency variance. Low-frequency variability in the piC simulations is smaller than, and inconsistent with, either the observed or the residual variability. Moreover, it is similar in magnitude to the simulated high-frequency variability, suggesting that IV in simulated Sahel rainfall derives mostly from atmospheric (\(\overrightarrow{a}\)), rather than oceanic (\(\overrightarrow{o}\)), IV, or that simulated oceanic IV is too white (Eade et al. 2021). Because the shape of the spectrum is wrong, even a bias correction that inflates simulated internal variability would not bring simulations and observations into alignment.
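The low-frequency to high-frequency contrast can be quantified with a simple spectral diagnostic like the one sketched below, which partitions the periodogram of an annual series at a 20-year period. The estimator (a raw periodogram without tapering or smoothing) and the function name are assumptions for illustration; the spectra in Figure 4 may be computed differently.

```python
import numpy as np
from scipy.signal import periodogram

def lowfreq_to_highfreq_ratio(x, cutoff_years=20.0):
    """Ratio of spectral variance at periods longer than `cutoff_years`
    to variance at shorter periods, for an annual time series.
    Illustrative diagnostic only."""
    x = np.asarray(x, dtype=float)
    f, pxx = periodogram(x - x.mean(), fs=1.0)  # fs = 1 sample per year
    keep = f > 0                                # drop the mean (f = 0)
    low = f[keep] < 1.0 / cutoff_years          # periods longer than the cutoff
    return pxx[keep][low].sum() / pxx[keep][~low].sum()

# Per the text above, this ratio is roughly 5 for the observations and the
# ALL-residual, but close to 1 for the piC simulations, whose low-frequency
# variability is comparable to their high-frequency variability.
# ratio_obs = lowfreq_to_highfreq_ratio(obs_anomalies)
```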