This paper presents the Autonomous Local Air Quality Monitoring System (ALAMS), an IoT-enabled, power-efficient, and fully autonomous platform for real-time and predictive air-quality assessment in smart cities. ALAMS operates through solar-powered microcontroller units equipped with a comprehensive suite of gas and particulate sensors, capable of monitoring key pollutants such as PM2.5, PM10, CO₂, SO₂, NO, CH₄, NH₃, and O₃, along with temperature, humidity, pressure, and light levels. A major advancement in this study lies in the development of a Physics-Informed and Explainable Convolutional Neural Network–Transformer (PI-CNN-T) model optimised by Multi-Objective Bayesian Optimisation (MOBO). Unlike conventional CNN-BO architectures, the proposed model integrates physical dispersion constraints derived from Gaussian plume equations directly into the training process, ensuring physically consistent and realistic pollutant forecasts. The hybrid CNN–Transformer structure captures both spatial and temporal pollutant dynamics, while SHAP-based feature attribution and attention visualisation enhance interpretability and model transparency. Furthermore, the MOBO framework jointly optimises prediction accuracy and energy efficiency, enabling deployment on low-power IoT nodes. When evaluated on real-world data collected in London, the PI-CNN-T achieved an R² of 0.972 and an RMSE of 18.6 µg/m³ for PM2.5 prediction, outperforming baseline CNN-BO models by 14 percent. The proposed architecture therefore represents a significant methodological and practical advance, combining domain-informed learning, explainable AI, and computational optimisation into a unified framework for sustainable environmental monitoring and healthcare applications in smart cities.
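
To make the idea of a dispersion constraint concrete, the following minimal Python sketch shows one common way such a constraint could enter training: a soft penalty that pulls predictions towards a steady-state Gaussian plume estimate. The abstract does not specify the exact form of the constraint, so everything here is an illustrative assumption rather than the PI-CNN-T formulation: the function names `gaussian_plume` and `physics_informed_loss`, the ground-reflection form of the plume equation, the squared-error penalty, and the weight `lam` are all hypothetical.

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Steady-state Gaussian plume concentration with ground reflection.

    q: emission rate (g/s); u: wind speed (m/s); y: crosswind offset (m);
    z: receptor height (m); h: effective source height (m);
    sigma_y, sigma_z: dispersion coefficients (m) evaluated at the
    receptor's downwind distance. Returns a concentration in g/m^3.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Second exponential is the image source that models ground reflection.
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


def physics_informed_loss(pred, obs, plume, lam=0.1):
    """Data-fit MSE plus a weighted penalty on deviation from the plume.

    pred, obs, plume: equal-length sequences of concentrations.
    lam: assumed weight of the physics term (hypothetical value).
    """
    n = len(pred)
    data_term = sum((p - o) ** 2 for p, o in zip(pred, obs)) / n
    physics_term = sum((p - c) ** 2 for p, c in zip(pred, plume)) / n
    return data_term + lam * physics_term
```

In a sketch of this kind, `lam` controls the trade-off between fitting the sensor observations and remaining close to the dispersion model; setting `lam=0` recovers a purely data-driven loss.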