Medium-term electricity demand forecasts (days to weeks ahead) are essential for scheduling, maintenance, and hedging, yet they remain challenging because of multi-scale seasonality and weather-driven nonstationarity. We propose a Multi-Scale Transformer (MSTr) that fuses diurnal, weekly, and seasonal context through scale-specific pooling and learned softmax gating. The model is trained under a leak-safe, direct-H protocol for horizons H ∈ {24, 168, 336, 672} hours, using only information available at decision time. On hourly data, MSTr consistently outperforms strong baselines (LightGBM, LSTM, and seasonal-naive and naive forecasts) in RMSE, WAPE, and sMAPE, with the largest gains at 168–336 h, where weekly and seasonal signals dominate. Explainability analyses (SHAP, partial dependence, and temperature–load sensitivity) show that MSTr captures meteorological and calendar effects more faithfully, while ablations confirm the contribution of positional encoding, multi-scale pooling, and learned gates. Block-bootstrap evaluation and Diebold–Mariano tests indicate that MSTr's improvements are statistically significant at key horizons. The approach is efficient, integrates into existing forecasting pipelines, and supports transparent reporting through gate weights and seasonal slices.
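To make the idea of fusing scale-specific context with learned softmax gates concrete, the following is a minimal NumPy sketch. It is not MSTr's actual implementation: the function name `multiscale_gated_pool`, the use of mean pooling over trailing windows, the single linear gate scorer (`gate_w`, `gate_b`), and the window lengths of 24, 168, and 672 hours are all illustrative assumptions. It shows only the fusion step: each scale yields a pooled context vector, a softmax turns per-scale scores into weights that sum to one, and the fused context is the weighted sum.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array of scores.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def multiscale_gated_pool(h, windows, gate_w, gate_b):
    """Hypothetical softmax-gated fusion of multi-scale pooled context.

    h       : (T, d) hidden states over the lookback, oldest first
    windows : trailing window lengths in hours, e.g. (24, 168, 672) (assumed)
    gate_w  : (d,) hypothetical gate projection that scores each pooled vector
    gate_b  : (S,) hypothetical per-scale gate bias, S = len(windows)
    Returns the fused (d,) context vector and the (S,) gate weights.
    """
    T, _ = h.shape
    pooled = []
    for w in windows:
        w = min(w, T)  # clip the window to the available history
        pooled.append(h[-w:].mean(axis=0))  # scale-specific mean pooling
    P = np.stack(pooled)  # (S, d): one context vector per scale
    logits = P @ gate_w + gate_b  # (S,) one score per scale
    g = softmax(logits)  # gate weights are positive and sum to 1
    fused = g @ P  # (d,) gate-weighted combination of scale contexts
    return fused, g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((1000, 16))  # synthetic stand-in for encoder outputs
    fused, g = multiscale_gated_pool(h, (24, 168, 672), rng.standard_normal(16), np.zeros(3))
    print("gate weights (diurnal, weekly, seasonal):", g)
```

Because the gate weights are an explicit, normalized vector per forecast, they can be logged alongside predictions, which is one way the gate-weight reporting mentioned above could be realized.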