Machine Learning Approaches to Develop Weather Normalized Models for Urban Air Quality

Abstract: According to the World Health Organization, almost the entire human population (99%) lives where air pollutant concentrations exceed recommended thresholds, including in more than 6000 cities across 117 countries. The most common air pollutants affecting human lives, the so-called criteria pollutants, are particulate matter (PM) and gas-phase pollutants (SO2, CO, NO2, O3 and others). Therefore, many countries and regions worldwide have imposed regulations or interventions to reduce their effects. When an intervention occurs, air quality also changes because of ambient factors such as weather characteristics and human activities. One approach for assessing the effects of interventions or events on air quality is the Weather Normalized Model (WNM). However, current deterministic models struggle to accurately capture the complex, non-linear relationship between pollutant concentrations and their emission sources. Hence, the primary objective of this thesis is to examine the power of machine learning (ML) and deep learning (DL) techniques to develop and improve WNMs. These enhanced WNMs are then employed to assess the impact of events on air quality. Furthermore, the ML/DL-based WNMs can serve as valuable tools for exploratory data analysis (EDA), uncovering the correlations between the independent variables (meteorological and temporal features) and air pollutant concentrations. DL techniques have demonstrated high efficiency and performance in fields such as natural language processing, image processing, biology, and environmental science. Therefore, several DL architectures (Long Short-Term Memory - LSTM, Recurrent Neural Network - RNN, Bidirectional Recurrent Neural Network - BiRNN, Convolutional Neural Network - CNN, and Gated Recurrent Unit - GRU) were tested to develop the WNMs presented in Paper I.
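To make the idea of weather normalization concrete, the following is a minimal sketch of one common scheme, not necessarily the exact procedure used in the thesis. It assumes an already-trained model (any of the ML/DL regressors above) that maps a dictionary of temporal and meteorological features to a concentration. For each observation, the meteorological inputs are replaced by values drawn from a random observation in the record while the temporal inputs are kept, and the model's predictions are averaged. The names `weather_normalize`, `met_keys`, `n_draws` and the toy model are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]


def weather_normalize(
    model: Callable[[Features], float],
    rows: Sequence[Features],
    met_keys: Sequence[str],
    n_draws: int = 200,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: average the model over resampled weather.

    Temporal features of each row are kept; the meteorological features
    (met_keys) are copied jointly from a randomly chosen "donor" row, so the
    correlations among weather variables are preserved in each draw.
    """
    rng = random.Random(seed)
    normalized: List[float] = []
    for row in rows:
        preds = []
        for _ in range(n_draws):
            donor = rng.choice(rows)      # observation supplying the weather
            x = dict(row)                 # copy, so the input rows are not mutated
            for key in met_keys:
                x[key] = donor[key]
            preds.append(model(x))
        normalized.append(mean(preds))    # concentration under "average" weather
    return normalized


if __name__ == "__main__":
    rows = [
        {"hour": 0.0, "wind": 1.0},
        {"hour": 8.0, "wind": 4.0},
        {"hour": 17.0, "wind": 2.0},
    ]
    # Toy stand-in for a trained WNM: rises with hour, falls with wind speed.
    toy_model = lambda x: 10.0 + x["hour"] - 0.5 * x["wind"]
    print(weather_normalize(toy_model, rows, met_keys=["wind"]))
```

Because the weather is averaged out, differences that remain between normalized values before and after an intervention are attributed to the temporal or emission-related inputs rather than to the meteorology of a particular day.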
When the DL architectures in Paper I were compared with a Gradient Boosting Machine (GBM), the LSTM-based methods (LSTM, BiRNN) achieved superior results in developing WNMs. The study also showed that our DL-based WNMs could capture the correlations between the input variables (meteorological and temporal variables) and five criteria pollutants (SO2, CO, NO2, O3 and PM2.5). This was made possible by the SHapley Additive exPlanations (SHAP) library, which allowed us to identify the significant factors in the DL-based WNMs. Additionally, these WNMs were used to assess air quality changes during the COVID-19 lockdown periods in Ecuador. Existing normalized models operate on the original units of pollutants and are designed to assess pollutant concentrations under “average” or consistent weather conditions. Predicting pollution peaks is an even greater challenge because peaks often lack discernible patterns. To address this, we enhanced the WNMs to improve their performance specifically under daily concentration peak conditions. In Paper II, we accomplished this by developing supervised learning techniques, including ensemble deep learning methods, to distinguish between daily peak and non-peak pollutant concentrations. This approach offers flexibility in categorizing pollutant concentrations as either daily concentration peaks or non-peaks. However, this method may introduce bias when selecting non-peak values. In Paper III, WNMs are applied directly to daily concentration peaks to predict them and to analyse the correlations between meteorological and temporal features and the daily concentration peaks of air pollutants.
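Separating daily peak from non-peak concentrations first requires a label for each observation. The sketch below assumes that a "daily concentration peak" is the highest hourly value of each calendar day. The thesis may define peaks differently (for example with a threshold), and the function name `label_daily_peaks` is hypothetical. Ties are all labeled as peaks.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Sequence, Tuple


def label_daily_peaks(records: Sequence[Tuple[datetime, float]]) -> List[int]:
    """Hypothetical helper: 1 if a record equals its day's maximum, else 0."""
    daily_max: Dict[date, float] = defaultdict(lambda: float("-inf"))
    # First pass: find the maximum concentration for each calendar day.
    for ts, conc in records:
        day = ts.date()
        daily_max[day] = max(daily_max[day], conc)
    # Second pass: mark every record that reaches its day's maximum.
    return [1 if conc == daily_max[ts.date()] else 0 for ts, conc in records]


if __name__ == "__main__":
    hourly = [
        (datetime(2020, 3, 1, 0), 5.0),
        (datetime(2020, 3, 1, 1), 9.0),
        (datetime(2020, 3, 1, 2), 7.0),
        (datetime(2020, 3, 2, 0), 3.0),
    ]
    print(label_daily_peaks(hourly))
```

Under this definition a full day of hourly data yields one peak and 23 non-peak hours, so the classes are heavily imbalanced, and how the non-peak hours are selected for training is one place where the selection bias noted above can enter.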
