Comparison of Regression and Deep Learning Approaches in Modeling Time Series to Predict Air Pollutant Concentration in City of Tehran

Document Type: Research Paper

Authors

Imam Khomeini International University, College of Engineering, Qazvin, Iran

Abstract

The rapid growth of urbanization and of the global population has resulted in climate change, air contamination, and various human health problems. Estimating air pollution indices has therefore become important in environmental science. With relevant data increasingly available, machine learning methods have been proposed as particularly useful tools for predicting air pollution. Using four years of air pollution data from a Tehran neighborhood, this paper compares three approaches for forecasting the daily concentrations of NO2 and CO at the Punak air quality monitoring station from 2017 to 2020: Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM) networks, and Multiple Linear Regression (MLR). Across four performance measures (RMSE, MAE, R², and RRSE), the ARIMA model performs worst of the three models on all datasets, with RMSE values of 47.39 and 1.29 and R² values of 0.012 and 0.01 for NO2 and CO, respectively. The LSTM and MLR models achieve the best forecasting results, with RMSE = 17.6 and 6.41, MAE = 10.59 and 4.33, R² = 0.458 and 0.46, and RRSE = 1.06 and 1.10 for NO2 forecasting, and RMSE = 0.42 and 0.32, MAE = 0.24 and 0.25, R² = 0.96 and 0.98, and RRSE = 0.43 and 0.44 for CO forecasting.
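The four performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions (R² as the coefficient of determination, RRSE as the square root of the relative squared error); the abstract does not state which definitions the paper uses. The function name `forecast_metrics` and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, R2 and RRSE for two equal-length sequences.

    Assumed definitions (not confirmed by the paper):
      RMSE = sqrt(mean((y - yhat)^2))
      MAE  = mean(|y - yhat|)
      R2   = 1 - SS_res / SS_tot
      RRSE = sqrt(SS_res / SS_tot)
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    mean_actual = sum(actual) / n

    # Residual sum of squares and total sum of squares around the mean.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R2 and RRSE are undefined")

    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "RRSE": math.sqrt(ss_res / ss_tot),
    }


# Made-up observations and forecasts, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 6.0]
metrics = forecast_metrics(observed, forecast)
print(metrics)
```

Under these assumed definitions R² equals 1 − RRSE², so an RRSE above 1 would imply a negative R². The values reported in the abstract may therefore rest on a different R² definition, such as the squared correlation between observed and predicted values.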

Keywords

