
Prognosis of air quality index and air pollution using machine learning techniques – Nature

Written by ZJbTFBGJ2T


Report on Machine Learning for Air Quality Index Prediction and its Contribution to Sustainable Development Goals

Executive Summary

Air pollution is a critical obstacle to global public health and environmental sustainability targets, and it bears directly on several Sustainable Development Goals (SDGs). This report summarizes a study that uses machine learning (ML) to improve prediction of the Air Quality Index (AQI) and thereby support informed policy-making for sustainable urban environments. Five ML models were developed to predict AQI from a reduced set of pollutants: Gaussian Process Regression (GPR), Ensemble Regression (ER), Support Vector Machine (SVM), Regression Tree (RT), and Kernel Approximation Regression (KAR). Based on feature importance analysis, three of the six pollutants in the dataset (PM2.5, PM10, and CO) were selected as inputs, which makes monitoring more cost-effective and scalable. The GPR, ER, SVM, and RT models each achieved a coefficient of determination (R2) above 0.96 on the test set. The GPR model was judged superior, with the lowest Root Mean Square Error (RMSE). This research provides a robust framework for efficient air quality monitoring, contributing directly to SDG 3 (Good Health and Well-being), SDG 11 (Sustainable Cities and Communities), and SDG 9 (Industry, Innovation, and Infrastructure).

Introduction: Aligning Air Quality Monitoring with Sustainable Development Goals

The Global Health and Urban Challenge of Air Pollution

Air pollution is a leading environmental threat to human health; the article attributes more than seven million premature deaths each year to it. By increasing the incidence of respiratory and cardiovascular diseases, it undermines progress towards SDG 3 (Good Health and Well-being). The challenge is particularly severe in the rapidly urbanizing cities of developing nations, where industrial growth and traffic congestion raise pollutant concentrations. These conditions run counter to SDG 11 (Sustainable Cities and Communities), which aims to make cities inclusive, safe, resilient, and sustainable. Target 11.6 specifically calls for reducing the adverse per capita environmental impact of cities, with particular attention to air quality. Effective and accessible air quality monitoring, communicated through metrics such as the Air Quality Index (AQI), is therefore essential for safeguarding public health and fostering sustainable urban development.

Leveraging Innovation for Sustainable Solutions

Traditional air quality monitoring systems are often resource-intensive, which poses a significant barrier for developing countries. The study addresses this gap by applying machine learning, the kind of technological innovation that SDG 9 (Industry, Innovation, and Infrastructure) promotes, to create a more efficient and cost-effective AQI prediction model. By identifying the most influential pollutants (PM2.5, PM10, and CO) in a real-world dataset from Gazipur, Bangladesh, the research proposes a simplified framework that reduces the need for extensive data collection. The objective is a scalable and accurate predictive tool that helps authorities in resource-constrained environments implement timely and effective air pollution control strategies, thereby advancing multiple SDGs at once.

Methodology for a Cost-Effective AQI Prediction Framework

Data Acquisition and Preparation

The study utilized a publicly available dataset of hourly air pollution data collected in Gazipur, Bangladesh, from January 1 to December 31, 2022. The original dataset contained concentration levels for six pollutants:

  • Particulate Matter (PM2.5)
  • Particulate Matter (PM10)
  • Carbon Monoxide (CO)
  • Nitrogen Dioxide (NO2)
  • Sulfur Dioxide (SO2)
  • Ozone (O3)

The data underwent a rigorous preparation process, including the removal of outliers and normalization using the min-max scaling technique to ensure all features were on a comparable scale for ML model training. The dataset was then partitioned, with 80% used for training and 20% for testing the models.
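The two preparation steps named above, min-max scaling and an 80/20 split, can be sketched in a few lines. This is a minimal illustration rather than the study's own code: the function names, the random seed, and the PM2.5 readings are hypothetical, and outlier removal is omitted because the text does not say which rule was used.

```python
import random

def min_max_scale(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical hourly PM2.5 readings (illustrative values only).
pm25 = [12.0, 35.5, 80.0, 150.0, 55.0, 20.0, 97.5, 43.0, 61.0, 110.0]
scaled = min_max_scale(pm25)
train, test = train_test_split(scaled)
print(len(train), len(test))
```

After scaling, the smallest reading maps to 0 and the largest to 1, so pollutants measured in different units contribute on a comparable scale.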

Feature Selection for Optimized Monitoring

To create a model that is both accurate and practical for real-world application, a Random Forest algorithm was used to evaluate the importance of each of the six pollutants in predicting the AQI. The analysis identified the three most influential features, which were selected for the final models:

  1. PM2.5 (Highest Importance)
  2. PM10
  3. CO (Lowest of the three, but still significant)

By focusing on these three key pollutants, the proposed framework significantly reduces the operational costs and complexity associated with air quality monitoring, directly supporting the implementation of SDG 11.6 in cities with limited resources.
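A Random Forest importance ranking of this kind can be sketched with scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute. The sketch assumes scikit-learn and NumPy are available, and it uses synthetic data constructed so that the first three columns dominate; it does not reproduce the Gazipur measurements, and the study's actual tooling and hyperparameters are not stated in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

pollutants = ["PM2.5", "PM10", "CO", "NO2", "SO2", "O3"]

# Synthetic stand-in data (NOT the Gazipur measurements): the target is built
# so that the first three columns matter most, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, len(pollutants)))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(0.0, 0.1, 500)

# Hypothetical hyperparameters; the source does not report the ones it used.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranking = sorted(zip(pollutants, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
top_three = [name for name, _ in ranking[:3]]
for name, score in ranking:
    print(f"{name:6s} {score:.3f}")
```

Keeping only the top-ranked features, as the study does, trades a small amount of information for a simpler sensor setup.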

Machine Learning Model Development and Evaluation

Five distinct regression models were developed and compared for their ability to predict AQI from the three selected pollutants:

  • Gaussian Process Regression (GPR)
  • Ensemble Regression (ER)
  • Support Vector Machine (SVM)
  • Regression Tree (RT)
  • Kernel Approximation Regression (KAR)

Model performance was evaluated using standard statistical metrics: the Coefficient of Determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). To reduce the risk of overfitting and obtain a more robust performance estimate, tenfold cross-validation was applied during the training phase.
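For readers who want the definitions made concrete, the three metrics and a tenfold split can be written out in plain Python. This is a sketch using the standard textbook formulas; the helper names and the four AQI values are hypothetical and are not taken from the study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical AQI values and predictions (illustrative only).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 205.0, 245.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

In tenfold cross-validation the model is trained ten times, each time holding out a different tenth of the training data for validation, and the scores are averaged.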

Results and Analysis: Performance of Predictive Models

Seasonal Pollutant Analysis and AQI Trends

The analysis of air quality parameters in Gazipur revealed distinct seasonal patterns. The AQI was highest during the winter months (December-February), corresponding with elevated concentrations of PM2.5, PM10, and CO. This is attributed to meteorological conditions like temperature inversions that trap pollutants. Conversely, the AQI improved during the monsoon season (June-September) due to rainfall washing pollutants from the atmosphere. These findings underscore the dynamic nature of air pollution and highlight the periods of greatest risk to public health, providing critical information for health advisories and aligning with the protective aims of SDG 3 and SDG 11.
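A seasonal comparison like this amounts to grouping readings by month and averaging within each season. The sketch below uses the two seasons named in the text (winter as December to February, monsoon as June to September); the "other" label for the remaining months and all AQI values are assumptions made for illustration.

```python
from collections import defaultdict

def season_of(month):
    """Map a month number (1-12) to the seasons named in the text."""
    if month in (12, 1, 2):
        return "winter"
    if 6 <= month <= 9:
        return "monsoon"
    return "other"  # assumption: the text does not name the remaining months

def mean_aqi_by_season(records):
    """Average AQI per season from (month, aqi) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for month, aqi in records:
        bucket = totals[season_of(month)]
        bucket[0] += aqi
        bucket[1] += 1
    return {season: total / count for season, (total, count) in totals.items()}

# Hypothetical (month, AQI) pairs, not values from the study.
records = [(1, 280.0), (2, 240.0), (4, 150.0), (7, 70.0), (8, 60.0), (12, 260.0)]
seasonal_mean = mean_aqi_by_season(records)
print(seasonal_mean)
```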

Comparative Performance of Machine Learning Models

The comparative analysis confirmed that a reduced feature set is effective for AQI prediction. Four of the five models performed very well on the testing dataset, with R2 above 0.96, while KAR lagged behind. The test-set R2 values for all five models were:

  • SVM: R2 = 0.9937
  • GPR: R2 = 0.9913
  • ER: R2 = 0.9908
  • RT: R2 = 0.9643
  • KAR: R2 = 0.8236

While SVM achieved the highest R2 value in testing, the GPR model demonstrated the best overall balance of high accuracy and low error, with an RMSE of 1.219 during testing. The strong performance of GPR, ER, and SVM supports the feasibility of accurate, low-cost AQI prediction. The comparatively weak performance of the KAR model (R2 = 0.8236) indicates its limitations for this specific application.
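One way to read these figures is through the share of variance each model leaves unexplained, which is 1 minus R2. The short calculation below uses only the test-set R2 values reported above; the 0.96 cutoff mirrors the threshold quoted in the summary, and the variable names are illustrative.

```python
# Test-set R2 values as reported in the text above.
reported_r2 = {"SVM": 0.9937, "GPR": 0.9913, "ER": 0.9908, "RT": 0.9643, "KAR": 0.8236}

# Rank models from highest to lowest R2.
ranked = sorted(reported_r2.items(), key=lambda kv: kv[1], reverse=True)

# Share of AQI variance each model leaves unexplained (1 - R2).
unexplained = {model: round(1.0 - r2, 4) for model, r2 in reported_r2.items()}

# Models clearing the 0.96 threshold mentioned in the summary.
above_096 = [model for model, r2 in ranked if r2 > 0.96]

for model, r2 in ranked:
    print(f"{model:4s} R2={r2:.4f} unexplained={unexplained[model]:.4f}")
```

Seen this way, the gap between SVM, GPR, and ER is well under a tenth of a percentage point of variance, which is why a second metric such as RMSE is useful for separating them, whereas KAR leaves roughly 18% of the variance unexplained.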

Conclusion and Implications for Sustainable Urban Development

Key Findings and Model Efficacy

This study demonstrates that an accurate and cost-effective AQI prediction model can be developed using only three key pollutants: PM2.5, PM10, and CO. The research confirmed that machine learning models, particularly Gaussian Process Regression (GPR), can predict AQI with a test-set R2 above 0.99, providing a powerful tool for environmental management. The findings support a shift towards more streamlined and economically viable monitoring systems, which are crucial for deployment in developing urban areas where resources are often limited.

Contribution to Sustainable Development Goals

The outcomes of this research provide tangible support for several Sustainable Development Goals:

  • SDG 3 (Good Health and Well-being): By enabling accurate and timely AQI forecasting, the model allows public health officials to issue warnings and implement protective measures, reducing the health burden of air pollution.
  • SDG 9 (Industry, Innovation, and Infrastructure): The study showcases the application of innovative technologies like machine learning and IoT to solve pressing environmental challenges, fostering resilient and sustainable infrastructure.
  • SDG 11 (Sustainable Cities and Communities): The research directly addresses Target 11.6 by providing a practical and affordable solution for cities to monitor and manage air quality, making urban environments safer and more sustainable.
  • SDG 13 (Climate Action): Monitoring pollutants like CO, which are often co-emitted with greenhouse gases from sources such as traffic and industry, contributes to broader efforts to mitigate climate change.

Limitations and Future Directions

The study’s primary limitations include its reliance on a one-year dataset from a single city and the exclusion of meteorological variables. Future work should aim to validate the models across different geographical regions and longer timeframes, incorporate meteorological data to enhance predictive accuracy, and apply formal statistical tests to further assess the significance of performance differences between models.

Analysis of Sustainable Development Goals in the Article

1. Which SDGs are addressed or connected to the issues highlighted in the article?

The article on air pollution and its prediction using machine learning addresses several Sustainable Development Goals (SDGs). The primary connections are with goals related to health, sustainable urban environments, and technological innovation.

  • SDG 3: Good Health and Well-being: The article directly links air pollution to significant public health challenges. It states, “Air pollution is one of the most significant threats to public health globally, contributing to over seven million premature deaths each year.” It also mentions specific pollutants like PM, O3, NO2, SO2, and CO causing “serious health problems.” This establishes a clear connection to ensuring healthy lives and promoting well-being.
  • SDG 11: Sustainable Cities and Communities: The research focuses on air quality in an urban context, specifically mentioning challenges in “developing countries such as Egypt and Bangladesh” driven by “rapid urbanization, industrial growth, [and] traffic congestion.” The study uses a dataset from Gazipur, Bangladesh, and aims to provide tools for urban planners to create strategies to address air pollution, which is central to making cities inclusive, safe, resilient, and sustainable.
  • SDG 9: Industry, Innovation, and Infrastructure: The core of the study is the development and application of innovative technologies—specifically, five machine learning models—to solve a critical environmental problem. The article highlights the use of an “IoT-based monitoring system” and aims to develop a “cost-effective AQI prediction” model. This aligns with SDG 9’s emphasis on building resilient infrastructure, promoting sustainable industrialization, and fostering innovation.

2. What specific targets under those SDGs can be identified based on the article’s content?

Based on the issues discussed, the following specific SDG targets are relevant:

  1. Target 3.9: By 2030, substantially reduce the number of deaths and illnesses from hazardous chemicals and air, water and soil pollution and contamination.

    • Explanation: The article’s introduction explicitly states that air pollution contributes to “over seven million premature deaths each year” and that in Cairo, it was “linked to approximately 19,200 premature deaths in 2017 alone.” The entire study is premised on monitoring and predicting air pollution to devise control strategies, which directly contributes to reducing the health burden mentioned in this target.
  2. Target 11.6: By 2030, reduce the adverse per capita environmental impact of cities, including by paying special attention to air quality and municipal and other waste management.

    • Explanation: The study is centered on urban air quality. It analyzes data from Gazipur, Bangladesh, a city facing challenges from urbanization and industrial growth. The development of a model to predict the Air Quality Index (AQI) is a direct effort to “pay special attention to air quality” and provide “policymakers and urban planners” with tools to manage the environmental impact of cities.
  3. Target 9.5: Enhance scientific research, upgrade the technological capabilities of industrial sectors in all countries, in particular developing countries…

    • Explanation: The article’s main objective is to develop a “machine learning-based framework for accurate and cost-effective AQI prediction.” By comparing five different ML models (GPR, ER, SVM, RT, KAR) and proposing a scalable solution for “resource-constrained environments,” the study directly contributes to enhancing scientific research and upgrading technological capabilities for environmental monitoring in developing nations.

3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?

The article mentions and implies several indicators that align with the official SDG indicators for the identified targets.

  • For Target 3.9:

    • Indicator 3.9.1: Mortality rate attributed to household and ambient air pollution. The article directly references this by citing statistics on “premature deaths” caused by air pollution, such as “4.2 million due to outdoor air pollutants.” This metric is a core justification for the research.
  • For Target 11.6:

    • Indicator 11.6.2: Annual mean levels of fine particulate matter (e.g. PM2.5 and PM10) in cities (population weighted). The study is fundamentally based on measuring and predicting the concentrations of air pollutants. It explicitly uses data for “PM2.5, PM10, CO, NO2, SO2, and O3” to calculate the Air Quality Index (AQI). The analysis identifies “PM2.5, PM10, and CO” as the most influential pollutants, directly aligning with this indicator. The AQI itself serves as a composite indicator derived from these pollutant levels.
  • For Target 9.5:

    • Implied Indicator: Development and adoption of new technologies. While not a formal SDG indicator, the article’s entire methodology serves as a proxy for progress. The development, testing, and comparison of five machine learning models for a “scalable and efficient model for real-world air quality monitoring” is a tangible measure of enhancing technological capabilities for sustainable development. The study’s conclusion that the GPR model is highly effective (R2 of 0.9913 in testing) demonstrates successful innovation.
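Indicator 11.6.2 in the list above is defined as a population-weighted mean, which differs from a simple average whenever pollution and population are unevenly distributed. The sketch below shows the arithmetic under stated assumptions: the three districts, their PM2.5 levels, and their populations are hypothetical and do not come from the article.

```python
def population_weighted_mean(concentrations, populations):
    """Weight each area's annual mean concentration by its population."""
    total_population = sum(populations)
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population

# Hypothetical annual mean PM2.5 levels and populations for three districts.
annual_pm25 = [80.0, 60.0, 40.0]
population = [500_000, 300_000, 200_000]

weighted = population_weighted_mean(annual_pm25, population)
simple = sum(annual_pm25) / len(annual_pm25)
print(weighted, simple)
```

Here the weighted mean is higher than the simple mean because the most polluted district is also the most populous, so it better reflects what a typical resident is exposed to.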

4. Summary Table of SDGs, Targets, and Indicators

SDGs | Targets | Indicators
SDG 3: Good Health and Well-being | 3.9: Substantially reduce deaths and illnesses from hazardous chemicals and air pollution. | 3.9.1: Mortality rate attributed to household and ambient air pollution (mentioned as “premature deaths” in the article).
SDG 11: Sustainable Cities and Communities | 11.6: Reduce the adverse per capita environmental impact of cities, paying special attention to air quality. | 11.6.2: Annual mean levels of fine particulate matter (PM2.5 and PM10) in cities (the article’s core data includes PM2.5, PM10, and CO concentrations used to calculate the Air Quality Index).
SDG 9: Industry, Innovation, and Infrastructure | 9.5: Enhance scientific research and upgrade technological capabilities, particularly in developing countries. | Implied: Development and application of advanced technologies (Machine Learning, IoT) for cost-effective and scalable environmental monitoring.

Source: nature.com

 
