Radial Basis Function (RBF) Neural Network: Effect of Hidden Neuron Number, Training Data Size, and Input Variables on Rainfall Intensity Forecasting

Abstract: A mean daily rainfall of more than 30 mm can pose a flood hazard. Accurate prediction of rainfall intensity can support flash flood forecasting and help to save lives and property. One of the common machine learning techniques for rainfall prediction is the Radial Basis Function (RBF) neural network. In this study, rainfall intensity is classified into four categories, i.e. light (<10 mm), medium (11-30 mm), heavy (31-50 mm) and very heavy (>50 mm). The rainfall intensity category is forecast using an RBF network model with daily meteorological data for Kuching, Sarawak, Malaysia. The input vectors considered for the RBF network model are minimum, maximum and mean temperature (°C), mean relative humidity (%), mean wind speed (m/s), mean sea level pressure (hPa) and mean precipitation (mm) for the years 2009 to 2013. The focus of this paper is to analyse how the training data size, the number of hidden neurons and the choice of input variables (i.e. the combination of meteorological data) influence the performance of the RBF network model. The study concludes that, provided the network model has a sufficient number of hidden neurons and is trained with an adequate amount of training data, the only factor that influences the performance of the RBF model is the set of input variables used. Another observation is that, for a given number of hidden neurons, the RBF network model produced consistent results when it was repeatedly retrained and tested.


I. INTRODUCTION
Rainfall intensity is a measure of the amount of rain that falls over time, and it has been identified as the most critical flood hazard parameter [1][2][3][4]. Floods are among the Earth's most common and destructive natural disasters and account for the most significant loss of life and financial loss [5]. Therefore, accurate prediction of rainfall intensity could help to save lives and property, as well as to secure national economic activities [6].
Rainfall forecasting can be done using Numerical Weather Prediction (NWP) models, statistical methods and machine learning techniques [6]. NWP models, also known as physical models, use numerical solutions of the atmospheric hydro-thermodynamic equations and can achieve high accuracy as long as the complex and meticulous simulation of the physical equations in the atmosphere model is solved appropriately [7,8]. However, they can sometimes produce unsatisfactory results due to the instability of these differential equations [9,10]. Statistical models, which are based on observational relationships, are more straightforward and easier to operate [11]. Yet their reliance on stationary relationships between the predictor and predicted variables [12] is the main issue in applying them under a changing climate. Rainfall is difficult to model because the atmospheric processes are very complicated and nonlinear [13]. For this reason, machine learning techniques are more suitable for rainfall forecasting, as they have shown commendable results on problems that are complex, nonlinear and involve highly correlated predictor variables [14]. Popular machine learning techniques for rainfall prediction include the Radial Basis Function Neural Network, Genetic Programming, Support Vector Regression, M5 Rules, M5 Model Trees and k-Nearest Neighbor [15]. An evaluation of these six machine learning methods revealed that the Radial Basis Function Neural Network outperformed all the others [15].
In our previous work [16], the accuracy of the Backpropagation and Radial Basis Function (RBF) models was compared on the rainfall intensity categorization problem. The RBF model, in contrast to the Backpropagation neural network model, maintained consistent results, whereas the Backpropagation neural network model was superior in terms of accuracy. In this research work, the impact of the number of hidden neurons, the amount of training data and the combination of input variables on the RBF model is analyzed for the categorization of rainfall intensity using meteorological data from Kuching, Sarawak, Malaysia.
In 1988, Broomhead and Lowe introduced the Radial Basis Function (RBF) network model [17]. The overall architecture of an RBF model, shown in Fig. 1, consists of three layers, namely the input, hidden and output layers. An RBF neural network has only a single hidden layer, unlike other neural networks that may possess multiple intermediate layers [18]. External information flows into the network via the input layer and is then processed in the hidden layer using a nonlinear transformation. The output layer forms a linear combination of the nonlinear radial basis functions [19]. Each hidden neuron has an activation function, and the Gaussian function is commonly used for this purpose. The RBF model therefore has a nonlinear structure from the input to the hidden layer and a linear structure from the hidden layer to the output layer. The number of hidden neurons in the hidden layer determines the complexity and generalization capability of the RBF model [18]. An RBF model is a preferred choice because it can be trained rapidly. Its general applicability is another important characteristic that makes it a common choice of artificial intelligence model [20].
An RBF model was applied to forecast yearly rainfall and the rainfall of the two highest monsoon rainfall months (January and December) for Alexandria City, Egypt [21]. In that work, an RBF model with a single input and a single output was constructed. Root Mean Square Error (RMSE) and the coefficient of determination (R²) were calculated as accuracy measures. The RBF model achieved an RMSE of 27.13 with R² = 0.94 for yearly rainfall prediction. For January and December, the RMSE values achieved were 10.61 and 10.85 respectively, and the R² achieved was 0.89 for January and 0.98 for December.
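The three-layer structure described in this section (a nonlinear Gaussian hidden layer followed by a linear output layer) can be sketched as a forward pass. This is a minimal illustration rather than the implementation used in this work: the function name `rbf_forward`, the per-neuron `widths` parameter and the exact form of the Gaussian are assumptions made for the example.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights, bias):
    """Forward pass of a Gaussian RBF network (illustrative sketch).

    x:       (n_samples, n_inputs)  input vectors
    centers: (n_hidden, n_inputs)   one center per hidden neuron (assumed given)
    widths:  (n_hidden,)            Gaussian spread of each hidden neuron (assumed)
    weights: (n_hidden, n_outputs)  linear output weights
    bias:    (n_outputs,)           output bias
    """
    # Squared Euclidean distance between every sample and every center.
    diff = x[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Nonlinear part: Gaussian activation in the hidden layer.
    hidden = np.exp(-sq_dist / (2.0 * widths ** 2))
    # Linear part: weighted sum of the hidden activations.
    return hidden @ weights + bias
```

An input that coincides with a center fully activates that hidden neuron (activation 1), and the activation decays smoothly as the input moves away from the center.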
Using a single-input, single-output RBF neural network model, annual rainfall prediction for three areas in India showed that the model produced different accuracies across the areas [22]. The RMSE values achieved were 25.6 mm, 63.0 mm and 66.4 mm for the three areas, with a coefficient of determination (R²) of more than 0.9.
As rainfall depends on many weather parameters, e.g. pressure, temperature and wind speed, feature selection should be considered in rainfall prediction algorithms. In [23], a feature selection algorithm for rainfall prediction was proposed, and the accuracy of the multi-layer feed-forward neural network, RBF neural network, focused time-delay neural network and nonlinear autoregressive exogenous input neural network was evaluated. Feature selection was performed through a sensitivity analysis of the input attributes, removing the non-effective weather attributes. All the evaluated neural network models performed better after the feature selection process.

II. MATERIALS AND METHOD
Sarawak, a state in East Malaysia, has an equatorial climate, which is characterized by hot temperatures and high humidity [24]. The study area selected is Kuching, the capital city of Sarawak. Kuching is situated at latitude 1.5447° N and longitude 110.3652° E and covers an area of 431 km². The daily historical meteorological parameters, i.e. minimum, maximum and mean temperature (°C), mean relative humidity (%), mean wind speed (m/s), mean sea level pressure (hPa) and mean precipitation (mm), for the years 2009 to 2013 were collected from the Malaysian Meteorological Department.

A. Data Pre-processing
Weather data is prone to missing values, and missing values affect the performance of the underlying neural network model [25]. Therefore, before the network model was developed, missing values were removed from the data set.

B. Data Normalization
After cleaning, the data were normalized. During normalization, the data were scaled to the range -1 to +1. This improves the capability of the network model to handle the data: the calculations run faster and the network model is able to obtain a good result [23]. In this research work, the input data were normalized using the formula below:

$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min} \qquad (1)$$

where:
$y$: output of the normalization
$y_{\max}$: maximum normalized value required
$y_{\min}$: minimum normalized value required
$x$: data to be normalized
$x_{\max}$: maximum of input data
$x_{\min}$: minimum of input data
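As an illustration of this step, the sketch below scales each input column linearly to the range -1 to +1, assuming the standard min-max mapping y = (y_max - y_min)(x - x_min)/(x_max - x_min) + y_min. The function name `minmax_scale` and the handling of constant columns are assumptions for the example, not details taken from this work.

```python
import numpy as np

def minmax_scale(x, y_min=-1.0, y_max=1.0):
    """Linearly map each column of x to the range [y_min, y_max]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant column has zero span; use 1.0 to avoid
    # division by zero, which maps every value in that column to y_min.
    span = np.where(span == 0, 1.0, span)
    return (y_max - y_min) * (x - x_min) / span + y_min
```

For example, temperatures of 20, 25 and 30 °C map to -1, 0 and +1 respectively. In practice the minimum and maximum would be taken from the training data and reused for the testing data.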

C. Different Categories of Rainfall Intensity
Four categories were used to classify the rainfall intensity data collected from 2009 to 2013, i.e. light, moderate, heavy and very heavy rainfall, using the intensity ranges shown in Table I [16].

[26]. Research has also shown that the RBF model is optimal when there are many training vectors. The setup properties of the RBF model are shown in Table II. The RBF model was trained starting with 10 hidden neurons, as this number is larger than the total number of input parameters available. The other two network models tested in this section used 50 and 100 hidden neurons.

E. Input and Output Data of RBF Neural Network
The meteorological data, i.e. minimum, maximum and mean temperature (°C), mean relative humidity (%), mean wind speed (m/s), mean sea level pressure (hPa) and mean precipitation (mm), are used as the input vector for the RBF model. Each of these meteorological variables is supplied as an array to an input node of the RBF model. The rainfall classification, which is either light, moderate, heavy or very heavy, serves as the output of the model. The overall architecture of the RBF model used in this research work is illustrated in Fig. 2.

Fig. 2. The overall RBF architecture used in this research work.
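The mapping from a daily rainfall amount to one of the four output categories can be sketched as follows. The thresholds are those quoted in the abstract (light <10 mm, 11-30 mm, 31-50 mm, very heavy >50 mm); how values that fall exactly on or between the stated boundaries are assigned is an assumption here (upper bounds are treated as inclusive), and the names `categorize_rainfall`, `CATEGORIES` and `UPPER_BOUNDS_MM` are hypothetical.

```python
import numpy as np

CATEGORIES = ("light", "moderate", "heavy", "very heavy")
# Assumed inclusive upper bounds (mm) of the first three categories.
UPPER_BOUNDS_MM = (10.0, 30.0, 50.0)

def categorize_rainfall(rain_mm):
    """Return the intensity category for each daily rainfall value (mm)."""
    rain = np.atleast_1d(np.asarray(rain_mm, dtype=float))
    # right=True assigns index i when bounds[i-1] < value <= bounds[i].
    idx = np.digitize(rain, UPPER_BOUNDS_MM, right=True)
    return [CATEGORIES[i] for i in idx]
```

With these assumptions, a day with 45 mm of rain is labelled "heavy" and a day with 80 mm is labelled "very heavy".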

F. Training and Testing Data
The RBF model learns from the examples it is given; this process constitutes the training of the model. The trained network model is then tested using a set of unseen data. The 60 months of data obtained from 1st January 2009 to 31st December 2013 were divided into five groups, as shown in Table III. The testing data used was the one month of data from 1st December to 31st December 2013.

G. Experiment Setup
To investigate the impact of the number of hidden neurons, the training data size and the input variables on the accuracy of rainfall intensity forecasting using the RBF model, the following experiments were conducted.

1) The number of Hidden Neurons:
The number of hidden layers, together with the number of neurons in each layer, influences the accuracy of a neural network model [27]. As the RBF model is a single-hidden-layer network, the only influencing factor in this case is the number of hidden neurons, so testing of the number of hidden layers is excluded. For this part of the experiment, 59 months of data (1st January 2009 to 30th November 2013) were used for training and the one month of data from 1st December 2013 to 31st December 2013 was used for testing (Group 5 in Table III). Three network models with 10, 50 and 100 hidden neurons were developed, and each was trained and tested 15 times. This set of experiments was run before the other two parts, and the optimal number of hidden neurons obtained here was used in the two parts that follow.
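The structure of this experiment (three hidden-layer sizes, each trained and tested 15 times) can be sketched as below. The training procedure is not specified in this section, so the sketch substitutes a common simple scheme as an assumption: centers are drawn at random from the training samples and the output weights are fitted by linear least squares. The function names, the shared Gaussian `width` and the use of MSE as the only score are likewise illustrative choices.

```python
import numpy as np

def design_matrix(x, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def fit_rbf(x_train, y_train, n_hidden, width, rng):
    """Assumed training scheme: random centers, least-squares output weights."""
    idx = rng.choice(x_train.shape[0], size=n_hidden, replace=False)
    centers = x_train[idx]
    phi = design_matrix(x_train, centers, width)
    weights, *_ = np.linalg.lstsq(phi, y_train, rcond=None)
    return centers, weights

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def run_experiment(x_train, y_train, x_test, y_test,
                   hidden_sizes=(10, 50, 100), n_runs=15, width=1.0, seed=0):
    """Train and test each network size n_runs times; return test MSEs."""
    rng = np.random.default_rng(seed)
    results = {}
    for n_hidden in hidden_sizes:
        scores = []
        for _ in range(n_runs):
            centers, weights = fit_rbf(x_train, y_train, n_hidden, width, rng)
            y_pred = design_matrix(x_test, centers, width) @ weights
            scores.append(mse(y_test, y_pred))
        results[n_hidden] = scores
    return results
```

In this sketch any run-to-run variation comes from the random choice of centers; a deterministic training procedure would give identical results on every run, so the spread seen here need not match the behaviour reported for the model in this work.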
2) Training Data Size:
It is well accepted that a small training data set cannot train the network model appropriately [28]. With an increased amount of training data, the underlying problem can be better generalized by the neural network [16]. However, the performance of the model converges at some point even when more training data is provided [29]. Beyond that point, additional training data does not improve the network model's performance, so using it wastes resources and adds complexity to the network model. It is therefore beneficial to investigate the minimum amount of data needed to train the RBF model to forecast the rainfall intensity accurately. To investigate this, the RBF model with the number of hidden neurons determined in the experiment above is used. The division of the training and testing data is shown in Table III.
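The training windows for this experiment can be sketched as date ranges. The training sizes used are 12, 24, 36, 48 and 59 months, with December 2013 held out for testing. The sketch assumes that each window consists of the months immediately preceding the test month; this is consistent with the 59-month window (1st January 2009 to 30th November 2013), but the exact months in the shorter windows are defined by Table III and may differ. The name `training_window` is hypothetical.

```python
from datetime import date, timedelta

TEST_START = date(2013, 12, 1)   # first day of the held-out test month
TEST_END = date(2013, 12, 31)

def training_window(n_months, test_start=TEST_START):
    """Return (start, end) of the n_months immediately before the test month.

    Assumption: training windows end on the day before the test month.
    """
    # Count months from year 0 so that subtraction handles year boundaries.
    month_index = test_start.year * 12 + (test_start.month - 1) - n_months
    start = date(month_index // 12, month_index % 12 + 1, 1)
    end = test_start - timedelta(days=1)
    return start, end

# Training sizes examined in the experiment.
windows = {n: training_window(n) for n in (12, 24, 36, 48, 59)}
```

Each window can then be used to slice the daily records before normalization and training.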

3) Different Combination of Input Variables:
The objective of selecting the correct combination of meteorological data as the input variables is to identify the right predictor variables [30]. Using the correct predictor variables reduces the time needed to train the neural network model. Moreover, it increases the accuracy of the network and reduces the complexity of the network model. In this part of the experiment, a network model of m:x:1 is used, where m is the number of meteorological variables in the combination and x is the optimum number of hidden neurons obtained from the previous experiment. The output is one of the four rainfall intensity categories (refer to Table I). A set of 15 different combinations of meteorological data, shown in Table IV, is used to identify the optimum input for the RBF model.

III. RESULTS AND DISCUSSION

A. Number of Hidden Neurons
The performance of the RBF model obtained using the three different numbers of hidden neurons (10, 50 and 100) is shown in Table V. Fig. 3 and Fig. 4 show the MSE and R² values obtained over the 15 test runs using 10, 50 and 100 hidden neurons. These graphs show that the RBF model produced consistent results throughout the 15 test runs for each number of hidden neurons. The best MSE and R² values obtained were 0.2206 and 0.8191, with 10 hidden neurons. Therefore, 10 hidden neurons were used in the following experiments.

B. Training Data Size
From Table VI, the MSE values obtained with an increasing amount of training data do not show a consistent trend. The network performance in terms of MSE deteriorated as the training data size increased from 12 months to 24 months. However, the performance of the model improved when the training data increased to 36 and 48 months. The MSE increased slightly as the training data size rose from 48 months to 59 months. Given the inconsistent trend in MSE, and as the MSE increased by only around 0.32% between 48 and 59 months of training data, the optimum training data size to be used is 59 months.

C. Using Different Combination of Meteorology Data
With 10 hidden neurons and 59 months of training data, the contribution of each combination of meteorological data (Table IV) as the input variables of the RBF network model is given in Table VII.
From Table VII, the best combination of meteorological data was Group C, i.e. Daily Minimum Temperature, Daily Maximum Temperature, Daily Mean Temperature, Daily Mean Relative Humidity and Daily Mean Sea Level Pressure (without Daily Mean Wind Speed). This is better illustrated by the line graph in Fig. 6. With this combination, the RBF neural network produced an MSE of 0.1885 with R² = 0.8467. This MSE is about 14.5% lower than that obtained when all six meteorological variables were used (MSE = 0.2206, R² = 0.8191). In addition, using only the Daily Mean Wind Speed produced the worst forecasting result (MSE = 0.6453, R² = 0.1997). This is followed using only

IV. CONCLUSION
This study aims to analyze the properties of the RBF model that influence the accuracy of rainfall intensity forecasting. The experimental results show that the size of the training data and the combination of meteorological data affect the performance of the RBF model. In this study, the number of hidden neurons did not influence the RBF model. This could be explained by 10 hidden neurons being sufficient to model the rainfall intensity problem. Increasing the number of hidden neurons in the RBF model did not seem to affect the accuracy of the rainfall intensity forecast, although using more hidden neurons increases the complexity of the network model. For rainfall intensity forecasting, the least contributing meteorological data were found to be Daily Sea Level Pressure and Daily Mean Wind Speed. Another important observation from this study is that, although the RBF neural network model did not produce consistent MSE and R² results when different training data sizes were used, the variance between the values obtained was small when the network model was trained with enough training data. In other words, more training data does not seem to enhance the RBF model's performance once the network has learned from sufficient data. Therefore, it could be concluded that the main factor affecting the performance of the RBF model is the combination of input variables used. The use of an incorrect combination of contributing variables could deteriorate the performance of the network model.