Using Multiple Regression Model and RNN for Imputing the Missing Values of PM10 Datasets

Moamin Amer Hasan Alsaeegh, Osamah Basheer Shukur


Missing values in time series data are a common problem that must be handled by imputing them with statistical techniques. The problem becomes more complex when the missing values occur in the dependent (response) variable. Particulate matter (PM10) is a time series used as a measure of air pollution and serves here as the dependent variable, while several other pollutants are used as independent variables. Malaysian datasets of PM10 and several climate pollutants are examined in this study. The study aims to impute missing values in the dependent variable, at different missing rates, with minimum error. The independent variables were assumed to be complete, while missing values were introduced into the dependent variable at different rates and with different distributions. Multiple linear regression (MLR) was used as a traditional method to impute the missing PM10 values. A recurrent neural network (RNN) was then combined with MLR and used to impute the same missing values. The results showed that the hybrid method outperformed MLR in imputing the missing values of PM10. In conclusion, the hybrid MLR-RNN method can impute missing PM10 values more accurately than traditional methods such as MLR.


multiple linear regression; MLR; missing values; recurrent neural network; RNN.
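As an illustration of the MLR baseline described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes complete independent variables, removes a fixed share of the response completely at random, fits ordinary least squares on the observed rows, and fills the gaps with the fitted predictions. The data are synthetic stand-ins, not the Malaysian PM10 dataset, and the function name `mlr_impute`, the 20% missing rate, and the coefficients are hypothetical choices made for the example. The RNN stage of the hybrid method is not shown, because the abstract does not state how the two models are combined.

```python
import numpy as np

def mlr_impute(X, y):
    """Fill NaN entries of y with predictions from an OLS fit on the observed rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    # Fit coefficients only on rows where the response is observed.
    beta, *_ = np.linalg.lstsq(A[~missing], y[~missing], rcond=None)
    y_filled = y.copy()
    y_filled[missing] = A[missing] @ beta
    return y_filled, beta

# Synthetic stand-in data (an assumption; not the PM10 dataset).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y_true = 50.0 + X @ np.array([4.0, -2.0, 1.5]) + rng.normal(scale=1.0, size=n)

# Remove roughly 20% of the response completely at random.
mask = rng.random(n) < 0.20
y_obs = y_true.copy()
y_obs[mask] = np.nan

y_imputed, beta = mlr_impute(X, y_obs)

# Error is measured only on the entries that were artificially removed.
rmse = float(np.sqrt(np.mean((y_imputed[mask] - y_true[mask]) ** 2)))
print(f"missing: {mask.sum()}, RMSE on imputed values: {rmse:.3f}")
```

Repeating the masking step with other rates or other missingness patterns gives a simple way to compare imputation error across the scenarios the abstract mentions.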






Published by INSIGHT - Indonesian Society for Knowledge and Human Development