Air Quality Forecasting in London Using Random Forest Regression
DOI: https://doi.org/10.54097/w7yyaw27
Keywords: Air quality forecasting; PM2.5; Random Forest
Abstract
This paper explores short-term forecasting of daily PM2.5 and AQI levels in London during 2024, using Random Forest (RF) regression as the primary modeling approach. Air quality records were pre-processed into daily averages, and the remaining recorded pollutants (PM10, NO₂, SO₂, CO, and O₃) were used as predictor variables. Temporal variables such as day-of-year and a weekend indicator were incorporated to account for seasonal patterns and the effects of human activity. The RF model, trained on these multivariate inputs, shows strong predictive accuracy, with predicted values closely tracking observed values. Feature importance analysis identifies PM10 as the dominant predictor of PM2.5, consistent with known emission and dispersion dynamics. Future work will benchmark RF against other forecasting approaches, such as AutoRegressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM) networks, and XGBoost, comparing prediction accuracy, model explainability, and computation speed. The findings contribute to data-driven urban pollution monitoring and offer practical insights for short-term forecasting and public health early warning systems.
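The modeling pipeline summarized above can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` and uses synthetic daily values in place of the London 2024 records; the pollutant distributions, the built-in PM2.5 relationship, the 80/20 chronological split, and `n_estimators=200` are all illustrative assumptions, not the paper's data or settings. The sketch shows how day-of-year and weekend features can be derived from calendar dates, how the model is fit on multivariate inputs, and how impurity-based feature importances are read out.

```python
from datetime import date, timedelta

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# ASSUMPTION: synthetic data standing in for one year of daily-averaged records.
# The distributions and coefficients below are illustrative, not from the paper.
rng = np.random.default_rng(0)
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(366)]  # 2024 is a leap year
n = len(days)

pm10 = rng.gamma(shape=4.0, scale=4.0, size=n)
no2 = rng.gamma(shape=5.0, scale=6.0, size=n)
so2 = rng.gamma(shape=2.0, scale=1.0, size=n)
co = rng.gamma(shape=3.0, scale=0.1, size=n)
o3 = rng.gamma(shape=6.0, scale=8.0, size=n)

# Temporal features: day-of-year (1..366) and a weekend flag (Saturday/Sunday -> 1).
day_of_year = np.array([d.timetuple().tm_yday for d in days])
is_weekend = np.array([int(d.weekday() >= 5) for d in days])

# ASSUMPTION: synthetic target in which PM2.5 tracks PM10 by construction.
pm25 = 0.6 * pm10 + 0.05 * no2 + rng.normal(0.0, 1.0, size=n)

feature_names = ["PM10", "NO2", "SO2", "CO", "O3", "day_of_year", "is_weekend"]
X = np.column_stack([pm10, no2, so2, co, o3, day_of_year, is_weekend])

# Chronological split (assumed 80/20): train on earlier days, evaluate on later
# days, so no future observations leak into training.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = pm25[:split], pm25[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # coefficient of determination on held-out days
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)

print(f"test R^2: {r2:.3f}")
for name, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12s}: {imp:.3f}")
```

Because the PM10 relationship is built into the synthetic target, PM10 ranking first here only demonstrates how to read `feature_importances_`; it does not reproduce the paper's finding. Two caveats apply to real data: impurity-based importances can be inflated for high-cardinality features such as `day_of_year`, and a tree ensemble cannot extrapolate `day_of_year` beyond the range seen in training, so with a single year of data the late-year test days fall outside that range.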
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

