Predicting PM2.5 Concentrations Across USA Using Machine Learning
  • P. Preetham Vignesh,
  • Jonathan H Jiang,
  • Pangaluru Kishore
P. Preetham Vignesh
University High School
Jonathan H Jiang
Jet Propulsion Laboratory, California Institute of Technology

Corresponding Author: jonathan.h.jiang@jpl.nasa.gov

Pangaluru Kishore
University of California, Irvine

Abstract

Concentrations of fine particulate matter smaller than 2.5 µm (PM2.5) are increasing in some states in the United States due to economic growth, air pollution, and forest fires. Although previous studies have attempted to retrieve the spatial and temporal behavior of PM2.5 using aerosol remote sensing and geostatistical estimation methods, these methods are limited by their coarse resolution and accuracy. In this paper, the performance of nine machine learning models in predicting PM2.5 is assessed using PM2.5 station data from 2017 to 2021: Linear Regression (LR), Decision Tree (DT), Gradient Boosting Regression (GBR), AdaBoost Regression (ABR), XGBoost (XGB), k-Nearest Neighbors (KNN), Long Short-Term Memory (LSTM), Random Forest (RF), and Support Vector Machine (SVM). To compare the accuracy of the nine models, the coefficient of determination (R²), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), root mean square error ratio (RSR), and percent bias (PBIAS) were evaluated. Among the nine models, RF and SVM performed best at predicting PM2.5 concentrations. Comparison of the performance metrics showed that the models had better predictive behavior in the western United States than in the eastern United States.
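
The five evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are assumptions based on commonly used conventions: R² is taken as the squared Pearson correlation, RSR as RMSE divided by the standard deviation of the observations, and PBIAS as 100 × Σ(observed − predicted) / Σ(observed), so that a positive value indicates underprediction. The function name `evaluate_predictions` and the sample values are hypothetical and are not data from the paper.

```python
import math


def evaluate_predictions(obs, sim):
    """Return R2, RMSE, NSE, RSR and PBIAS for observed vs. predicted values.

    Assumed conventions (not confirmed by the abstract):
      R2    = squared Pearson correlation between obs and sim
      RMSE  = sqrt(mean((obs - sim)^2))
      NSE   = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
      RSR   = sqrt(sum((obs - sim)^2)) / sqrt(sum((obs - mean(obs))^2))
      PBIAS = 100 * sum(obs - sim) / sum(obs)
    """
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")

    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n

    # Sum of squared errors and the spread of each series around its mean.
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))

    if ss_obs == 0 or ss_sim == 0:
        raise ValueError("metrics are undefined for a constant series")

    return {
        "R2": cov ** 2 / (ss_obs * ss_sim),
        "RMSE": math.sqrt(sse / n),
        "NSE": 1.0 - sse / ss_obs,
        "RSR": math.sqrt(sse) / math.sqrt(ss_obs),
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
    }


if __name__ == "__main__":
    # Hypothetical PM2.5 values in µg/m³, for illustration only.
    observed = [10.0, 12.0, 8.0, 15.0, 20.0]
    predicted = [11.0, 11.5, 9.0, 14.0, 18.5]
    for name, value in evaluate_predictions(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions RSR² = 1 − NSE, so the two metrics rank models identically. A perfect prediction gives R² = 1, RMSE = 0, NSE = 1, RSR = 0, and PBIAS = 0.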