Abstract
Concentrations of fine particulate matter with a diameter of less than
2.5 µm (PM2.5) are increasing in some states in the United States due to
economic growth, air pollution, and forest fires. Although previous
studies have
attempted to retrieve the spatial and temporal behavior of PM2.5 using
aerosol remote sensing and geostatistical estimation methods, these
methods are limited by their coarse resolution and limited accuracy. In
this paper, the
performance of nine machine learning models in predicting PM2.5 is
assessed using PM2.5 station data from 2017 to 2021: linear regression
(LR), decision tree (DT), gradient boosting regression (GBR), AdaBoost
regression (ABR), XGBoost (XGB), k-nearest neighbors (KNN), long
short-term memory (LSTM), random forest (RF), and support vector machine
(SVM). To compare the accuracy of the nine models, the
coefficient of determination (R2), root mean square error (RMSE),
Nash-Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and
percent bias (PBIAS) were evaluated. Among the nine models, RF and SVM
performed best in predicting PM2.5 concentrations. A comparison of the
performance metrics showed that the models predicted PM2.5 more
accurately in the western United States than in the eastern United
States.
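
The five evaluation metrics named above can be illustrated with a
minimal sketch. The abstract does not give the formulas, so the
definitions below are assumptions based on common usage: R2 is taken as
the squared Pearson correlation, RSR as RMSE divided by the standard
deviation of the observations, and PBIAS with the sign convention
100 * sum(obs - pred) / sum(obs). The function name `evaluate` and the
dictionary keys are hypothetical.

```python
import numpy as np


def evaluate(obs, pred):
    """Return R2, RMSE, NSE, RSR, and PBIAS for observed vs. predicted values.

    Assumed definitions (not taken from the paper):
      R2    = squared Pearson correlation between obs and pred
      RMSE  = sqrt(mean((obs - pred)^2))
      NSE   = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
      RSR   = sqrt(sum((obs - pred)^2)) / sqrt(sum((obs - mean(obs))^2))
      PBIAS = 100 * sum(obs - pred) / sum(obs)
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)

    residual = obs - pred
    sse = np.sum(residual ** 2)               # sum of squared errors
    sst = np.sum((obs - obs.mean()) ** 2)     # total sum of squares of obs

    r = np.corrcoef(obs, pred)[0, 1]          # Pearson correlation
    return {
        "R2": r ** 2,
        "RMSE": np.sqrt(sse / obs.size),
        "NSE": 1.0 - sse / sst,
        "RSR": np.sqrt(sse) / np.sqrt(sst),
        "PBIAS": 100.0 * residual.sum() / obs.sum(),
    }


if __name__ == "__main__":
    # Made-up PM2.5 values (µg/m3), for illustration only.
    observed = [10.0, 12.0, 14.0, 16.0]
    predicted = [11.0, 12.0, 13.0, 18.0]
    for name, value in evaluate(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Under these assumed definitions, RSR squared equals 1 - NSE, so the two
metrics rank models identically; PBIAS adds information about systematic
over- or under-prediction that the others do not capture.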