Prediction of air pollution events in São Paulo based on surface
meteorological variables
Abstract
Large urban centers like the Metropolitan Area of São Paulo (MASP) are
impacted by air pollution, especially by inhalable particulate matter
(PM10). Persistent exceedance events (PEE) are defined as exceedance
events that last for many consecutive days and occur simultaneously at
many air quality monitoring stations across the MASP. This study aims to
develop a predictive model for the occurrence of PEE in the MASP based
on surface meteorological variables. Hourly PM10 concentrations from 12
air quality monitoring stations in the MASP between 2005 and 2021 were
provided by the São Paulo State Environmental Agency (CETESB). Daily
data on surface meteorological variables were provided by the IAG/USP
meteorological station. PEE were identified using the following
criteria: exceedance events that occurred simultaneously in at least
50% of the monitoring stations and persisted for at least 5 consecutive
days. PEE occurrence was represented as a binary daily time series. The
resulting daily dataset had 6204 rows and 13 attributes, with no
missing values. The dataset was divided into a
training set (80%) and a test set (20%). A logistic regression model
was applied, having the PEE occurrence (positive = 1) as the target
value. The Variance Inflation Factor and the Stepwise Feature Selection
method were applied to obtain an optimized subset of predictors. Model
performance was assessed using the ROC curve and a confusion matrix.
Results indicate that PEE can be satisfactorily predicted by surface
meteorological variables using logistic regression. As a next step, we
intend to extract easy-to-communicate classification rules to support
the development of warning systems for poor air quality conditions in
the MASP.
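
The PEE identification criteria described above (at least 50% of stations exceeding simultaneously, for at least 5 consecutive days) can be illustrated with a minimal sketch. This is not the authors' code: the function name `label_pee`, its parameters, and the input layout (one list of per-station exceedance flags per day) are assumptions made for illustration. The PM10 limit that defines a station-level exceedance is not stated in the abstract, so the sketch starts from flags that are already computed.

```python
def label_pee(exceed, min_fraction=0.5, min_days=5):
    """Return a 0/1 list marking days that belong to a persistent exceedance event.

    exceed: list of days; each day is a list of booleans (one per station),
            True when that station exceeded the PM10 limit on that day.
            (Hypothetical input layout, assumed for this sketch.)
    """
    # Step 1: a day is "widespread" when at least min_fraction of stations exceed.
    widespread = [sum(day) >= min_fraction * len(day) for day in exceed]

    # Step 2: keep only runs of widespread days lasting at least min_days.
    pee = [0] * len(widespread)
    run_start = None
    for i, flag in enumerate(widespread + [False]):  # sentinel closes a trailing run
        if flag and run_start is None:
            run_start = i  # a new run of widespread days begins
        elif not flag and run_start is not None:
            if i - run_start >= min_days:
                pee[run_start:i] = [1] * (i - run_start)
            run_start = None
    return pee


# Toy example with 4 stations: a 6-day run (2 of 4 stations exceeding),
# one clean day, then a 4-day run that is too short to count as a PEE.
days = (
    [[True, True, False, False]] * 6
    + [[False] * 4]
    + [[True] * 4] * 4
    + [[False] * 4]
)
pee_series = label_pee(days)
```

In this toy input only the first six days are labelled 1; the later four-day run falls short of the 5-day persistence requirement. The resulting binary series plays the role of the target variable for the logistic regression.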