Prediction of Hourly Cooling Energy Consumption of Educational Buildings Using Artificial Neural Network

Predicting the energy a building will require while it is still at the design stage, before construction, is a crucial step for those in charge of the project. Hence, the main aim of this research is to accurately forecast the hourly cooling energy required by educational buildings at the University of Technology in Iraq. For this purpose, the feed-forward artificial neural network (ANN) has been selected as an efficient technique for developing such a prediction system. Firstly, the main building parameters were investigated, and only the most important ones were chosen as inputs to the ANN model. However, because collecting actual building energy consumption data for ANN training requires a long time period, the Hourly Analysis Program (HAP), a building simulation software, has been utilized to produce a database covering the summer months in Iraq. Different training algorithms and a range of learning rate values were investigated, and the Bayesian regularization backpropagation training algorithm with a learning rate of 0.05 was found to be very suitable for precise cooling energy prediction. To evaluate the performance of the optimized ANN model, the mean square error (MSE) and correlation coefficient (R) were adopted. The MSE and R values of the prediction results, 5.99×10⁻⁶ and 0.9994, respectively, show that the optimized ANN model has high prediction accuracy.


I. INTRODUCTION
Rapid growth in population and industrial production has increased energy demand in recent decades. The continued high energy consumption worldwide has led to a large number of environmental problems, including air and water pollution. The building sector is a leading energy consumer, using one-fifth of the world's total energy and accounting for one-third of global greenhouse gas (GHG) emissions [1]. Of all the energy required in typical buildings, HVAC (heating, ventilating, and air conditioning) systems consume the most significant amount of electricity in residential buildings [2]. Buildings have therefore become the focus of decision makers when they design and implement policies to reduce energy consumption effectively. It is thus important to develop energy consumption forecasting methods in order to reduce energy consumption and thereby reduce cost and environmental pollution.
In this regard, many researchers have developed algorithms based on artificial intelligence techniques for energy prediction since the 1990s. For example, Javeed Nizami and Al-Garni [3] designed an artificial neural network model to relate electrical energy consumption in Saudi Arabia to population and weather parameters such as temperature, humidity, and solar radiation. Six years of data from August 1987 to July 1992 were used for training the model, and one year of data from August 1992 to July 1993 was used for validation. The mean square error (MSE) and determination coefficient (R²) were 0.001 and 0.002, respectively. For performance evaluation, the neural network model was compared with a regression model using data that were not used in the training process. The comparison showed that the neural network model had better prediction performance than the regression model, with an MSE of 0.001 for the former and 0.011 for the latter. In another study, a number of heating load cases acquired for various buildings, ranging from small to large, were used to train a network for heating load forecasting [4]. The aim was to produce a network able to handle unusual cases with a minimum number of inputs, which were the type of windows and walls and the areas of windows, walls, partitions and floors. Chaves [5] designed ANN models that were able to predict energy consumption and its evolution with an accuracy of up to 99%. The city of Campinas in Brazil was taken as a case study, and the input data were temperature, wind speed, time, day and month. The number of hidden layers and the learning rate were varied in order to evaluate their influence on the training result. The specific goal was to compare the performance of different neural networks as an alternative to traditional forecasting methods.
A model for forecasting the hourly electric load of a large commercial office building in China based on a radial basis function neural network (RBFNN) was developed by Mai et al. [6]. This study used 4776 data sets (24 per day) for training; the input variables include weather condition (hot, cool and cold), day type, time and historical load. Four evaluation criteria, namely the mean absolute percentage error (MAPE), root mean square error (RMSE), mean bias error (MBE) and R², were considered for selecting the best model. Ahmad et al. [7] compared the performance of the widely used feed-forward backpropagation ANN with that of random forest (RF) in predicting the hourly HVAC electricity consumption of a hotel in Madrid, Spain. The performance metrics were calculated considering some or all of the ten input variables, including outdoor air temperature, dew point temperature, relative humidity, wind speed, hour of the day, day of the week, month of the year, number of guests for the day and number of rooms booked. They concluded that the ANN performed marginally better than RF. A recent study aimed to develop prediction models for HVAC-related energy saving in office buildings [8]. The data-driven modeling makes use of data gathered from several energy audit reports, which contain building and energy consumption data for 56 office buildings in Singapore. Two models were developed using multiple linear regression (MLR) and ANN. The results show that the ANN model is more accurate, with a mean MAPE of 14.8%. The best combination of variables to achieve this comprises gross floor area, air conditioning energy consumption, operational hours and chiller plant efficiency.
In this research, the main objective is to develop an ANN model for accurate estimation of the hourly cooling energy of educational buildings. The building analysis program HAP is utilized to generate the training and testing database based on climate records for Iraq. The remainder of the paper has six sections. Section II presents the main characteristics of the building investigated in this research. Section III discusses the ANN structure and the mathematical equations on which it is based. The input variables to the ANN model are discussed in Section IV, while the ANN training and optimization step is described in Section V. The results of the designed ANN model are then presented in Section VI, and finally the conclusions of the paper are provided in Section VII.

A. Data Generation and Building Characteristics
To obtain cooling energy data for the buildings considered in this research, for use in the ANN model design, a building energy simulation software is used. This software is HAP (Hourly Analysis Program), a powerful building simulation package developed by Carrier Corporation [9,10]. It applies the ASHRAE-endorsed transfer function approach for hourly estimation of the building cooling load. It can perform a detailed analysis for 24 hours a day and 12 months a year, considering all aspects that could increase the required cooling load, such as windows, walls, doors, floors, roofs, people and equipment. For accurate estimation results, building specifications and climate conditions, including diffuse solar radiation, external dry-bulb temperature, prevailing wind speed and relative humidity, have to be supplied to the program.
The building investigated in this research is located in Baghdad at 44.23° E longitude and 33.23° N latitude, at an elevation of 34.1 meters above mean sea level; specifically, it is the building of the Mechanical Engineering Department at the University of Technology, Baghdad. It has four floors, including the ground floor (Fig. 1); the main specifications of the building are summarized in Table I. The building structure is mainly composed of large steel beams, bricks and cement mortar. The physical properties of these materials, together with the glazing characteristics, are also used as part of the input data to the HAP software. The outside temperature (dry-bulb temperature) used is 48 °C, taken from the Iraqi Meteorological and Seismology Organization, and the comfortable condition inside the building is taken to be 24 °C at a relative humidity (RH) of 50%.

B. Artificial Neural Network
An artificial neural network is a modelling algorithm that mimics the problem-solving that happens in the human brain [11,12]. It contains an input layer, one or more hidden layers, and an output layer. Each layer contains simple computing neurons, comparable to the neurons in biological nervous systems, which are joined to the neurons in the following layer via weighted connections; neurons in the same layer are not connected to each other. Pattern recognition, classification and regression analysis are the main types of problem that the ANN can be used to solve. For weight calculation and error minimization, the backpropagation algorithm, a computational method that employs gradient descent optimization to tune the weights of the neurons until the gradients are reduced, is commonly utilized. Fig. 2 illustrates the structure of a backpropagation ANN with a single hidden layer.
Fig. 2 Architecture of a feed-forward ANN [13]
The inputs to the ANN in Fig. 2 are the selected independent variables that effectively influence the building cooling load, and the output is the predicted cooling load. Each training data set also provides the actual value of the cooling load. A weight is associated with each node in the hidden layer and with each node in the output layer, and each node has a threshold. The output layer has a single node. Training of the ANN starts with forward propagation of the input data, which gives the outputs of the hidden and output layers [12,14].
The transfer (activation) function used in the hidden layers is usually the Sigmoid or Tansigmoid (Equation 3) function [15]. Then, the difference between the predicted and expected output vectors is evaluated using the square error function, which is calculated as in Equation 4.
The error of the output node can then be obtained as in Equation 5.
Hence, the weights and thresholds in the output and hidden layers are adjusted using Equations 6 to 9, in which the weight corrections are computed at each iteration and α is the learning rate of the neural network.
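For reference, a standard formulation of these relations for a network with one hidden layer and a single output node can be written as follows. The notation is assumed here for illustration and may differ from that of [12,14]: x_i are the n inputs, h_j the outputs of the m hidden nodes, y the predicted and d the actual cooling load, w_ij and v_j the hidden-layer and output-layer weights, θ_j and θ_o the thresholds, f the activation function, k the iteration index and α the learning rate. The numbering follows the equation references in the text, and Equation 3 is the tan-sigmoid form.

```latex
\begin{align}
h_j &= f\Big(\sum_{i=1}^{n} w_{ij}\,x_i - \theta_j\Big), \qquad j = 1,\dots,m \tag{1}\\
y &= f\Big(\sum_{j=1}^{m} v_j\,h_j - \theta_o\Big) \tag{2}\\
f(s) &= \frac{2}{1 + e^{-2s}} - 1 \tag{3}\\
E &= \tfrac{1}{2}\,(d - y)^2 \tag{4}\\
\delta_o &= (d - y)\, f'\Big(\sum_{j=1}^{m} v_j\,h_j - \theta_o\Big) \tag{5}\\
\Delta v_j(k) &= \alpha\,\delta_o\,h_j \tag{6}\\
\Delta\theta_o(k) &= -\alpha\,\delta_o \tag{7}\\
\Delta w_{ij}(k) &= \alpha\,\delta_j\,x_i, \qquad
  \delta_j = \delta_o\,v_j\, f'\Big(\sum_{i=1}^{n} w_{ij}\,x_i - \theta_j\Big) \tag{8}\\
\Delta\theta_j(k) &= -\alpha\,\delta_j \tag{9}
\end{align}
```

Each parameter p is then updated as p(k+1) = p(k) + Δp(k). The signs follow from gradient descent on E: for example, ∂E/∂v_j = −δ_o h_j, so a step of size α against the gradient gives Equation 6, while the thresholds enter with a minus sign in Equations 1 and 2 and therefore receive corrections of opposite sign.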

C. Input Variables to the ANN Model
Many different variables influence the cooling energy demand of buildings [16]. These include, for instance, the time of year, the building's location (which in turn affects the local temperature, humidity and wind speed), building material and size, occupants' activities, etc. However, using too many variables in designing an ANN model does not always lead to accurate prediction results and, of course, complicates the model [17]. Thus, in order to reduce the complexity of the ANN model and the required computational effort and time, only the six most influential variables are utilized for the ANN design; they were chosen based on the reviewed papers [18][19][20] and a preliminary analysis using the HAP software. These variables are discussed in the following paragraphs.

1) Time
The cooling loads are calculated for each hour within the working (busy) period, which is from 8:00 to 15:00 (the working hours in Iraq). These busy hours are the period that most strongly influences the energy consumption of the considered building. Thus, time has been used as one of the inputs to the ANN (Table II). However, large buildings, especially university buildings, are not always fully occupied even during their busy period, which means that lighting and other appliances are not continuously turned on throughout this period. Consequently, in order to create accurate training data sets using the HAP software, which can conduct an hour-by-hour energy analysis for all 8760 hours of a year [21], it has been assumed, based on long-term visual observation, that the building's occupancy ratio is 100% from 8:00 to 12:00, 60% from 12:00 to 14:00 and 50% from 14:00 to 15:00. These percentages have been used in HAP, which varies the lighting and appliance load percentages accordingly.
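The assumed occupancy schedule can be expressed as a small lookup, sketched below in Python. The function name `occupancy_ratio` is hypothetical, and the handling of the boundary hours (12:00 counted as 60%, 14:00 and 15:00 as 50%) is an assumption, since the text gives only the three intervals.

```python
def occupancy_ratio(hour: float) -> float:
    """Return the assumed occupancy fraction for a clock hour in 8.0..15.0.

    Boundary handling is an assumption: each interval is treated as
    closed on the left, and 15:00 belongs to the last interval.
    """
    if not 8 <= hour <= 15:
        raise ValueError("hour is outside the 8:00-15:00 working period")
    if hour < 12:
        return 1.0   # 100% from 8:00 to 12:00
    if hour < 14:
        return 0.6   # 60% from 12:00 to 14:00
    return 0.5       # 50% from 14:00 to 15:00


# Hourly schedule for the working period (8:00, 9:00, ..., 15:00).
schedule = {hour: occupancy_ratio(hour) for hour in range(8, 16)}
```

In HAP these fractions scale the lighting and appliance loads; here they are only tabulated.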

2) Outdoor Dry-Bulb Temperature
Weather conditions that affect the cooling load include dry-bulb and wet-bulb temperatures, humidity, wind speed, etc. After reviewing many papers [10,19,22] and analyzing different buildings using the HAP software, it was concluded that the dry-bulb temperature has a significant impact on the cooling load demand because of its direct effect on multiple energy sources. The hourly dry-bulb temperatures of the hottest days in June, July, August and September are used in generating the training data, since this research deals with estimating the required cooling load in summer. The temperature values are taken from the HAP database after defining the building's location parameters, as shown in Table II. It can be seen in Table II that July and August have similar temperature values. The obtained temperature values were also compared with their counterparts from the climate database of Baghdad city and were found to be very close, except for July, for which the climate database values are considerably higher than those obtained from HAP. Thus, the temperature values of the hottest day in July from the climate database are also considered in training the ANN.

3) Orientation
A building's orientation refers to the direction in which it faces, and it affects the building's thermal performance; a suitable orientation minimizes the direct solar radiation on the building envelope [16]. It has been reported in the literature that the building orientation has to be selected carefully, considering the sun's movement according to latitude, time of day and month [16,23,24]. Thus, orientation has been considered here as an important factor that has to be properly selected by designers. Eight orientations are studied using HAP, and their results are utilized for training the ANN model. The orientations considered range from North-East (0°), the building's default orientation, to North (315°), with the original building rotated clockwise in 45° steps (Table II).

4) Overall Heat Transfer Coefficient
Thermal energy transferred from the hot outside ambient air to the conditioned space of the building is a significant external heat gain [10,25]. In this case, the heat is transferred by conduction through the external walls. Thus, an increase in the external temperature directly affects the internal environment of buildings. The overall heat transfer coefficient (U-value) of the external walls can be one of the leading factors determining the cooling load in buildings. Reducing the heat transfer coefficient reduces the heat transferred by conduction and consequently decreases the energy demand [26]. This can be achieved by using construction materials with low heat transfer in the building's envelope. Four different values of the overall heat transfer coefficient for the external walls have been considered in training the ANN, as presented in Table II. These values have been calculated based on the construction materials commonly used in Iraq.
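As an illustration of how a wall U-value follows from its construction, the sketch below applies the standard series-resistance relation U = 1 / (R_si + Σ d/k + R_so) in Python. The layer thicknesses, conductivities and surface resistances are hypothetical placeholder values, not the Table II values or the materials data used in this study.

```python
def u_value(layers, r_si=0.13, r_so=0.04):
    """Overall heat transfer coefficient U in W/(m2.K) of a layered wall.

    layers: list of (thickness in m, conductivity in W/(m.K)) tuples.
    r_si, r_so: inside and outside surface resistances in m2.K/W
                (typical textbook values, assumed here).
    """
    r_total = r_si + r_so + sum(d / k for d, k in layers)
    return 1.0 / r_total


# Hypothetical wall: mortar / brick / mortar (illustrative properties only).
plain_wall = [(0.02, 0.72), (0.24, 0.70), (0.02, 0.72)]
# The same wall with a hypothetical 5 cm insulation layer added.
insulated_wall = plain_wall + [(0.05, 0.04)]

print(f"plain wall U     = {u_value(plain_wall):.2f} W/(m2.K)")
print(f"insulated wall U = {u_value(insulated_wall):.2f} W/(m2.K)")
```

Adding a low-conductivity layer raises the total resistance and therefore lowers U, which is the mechanism by which low heat transfer materials reduce the conduction gain.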

5) Space Volume
As mentioned earlier, the building investigated in this study has four floors, and its air-conditioned space volume is 10137 m³, which is estimated from the percentages of lecture theaters, laboratories, and staff and administration offices, since not all spaces in buildings are conditioned to a full comfort level. However, there are buildings at the University of Technology with up to six floors and quite similar percentages of lecture theaters, laboratories and offices. Thus, in order to generalize the proposed ANN model to the other buildings in the university, five-floor and six-floor buildings are also considered. This means that three values of space volume (Table II) are examined in HAP and then used as inputs to the ANN.

6) Window to Wall Ratio (WWR)
Another crucial component that needs to be considered for energy efficiency purposes is the window system, due to the important role that it plays in solar gain management and heat exchange processes [20]. Thus, the window to wall ratio (WWR), which is the ratio of the glazed surface area to the gross facade area, has been considered as one of the important input variables to the ANN, since glass is the weakest thermal component in the building due to its high U-value. If the WWR is too high, the building becomes too cold in winter and too hot in summer because of heat loss and because of the sunlight and heat entering through the windows [16,20]. Here, three WWR values, shown in Table II, are investigated using HAP, and the results obtained are utilized in the training process of the ANN model.

D. ANN Model Design (Training, Testing, and Validation)
The ANN model relies on machine learning to identify the relationship between the input and output parameters accurately after being trained with adequate input and output data. The Neural Network Toolbox in the Matlab R2017a environment was used to determine the ANN structure. Several models were tested by changing the number of hidden layers and their neurons, in addition to using different types of activation functions and training algorithms. The data used for developing the ANN model, which were generated with the HAP software for different building configurations, are composed of 2608 patterns; each pattern contains six input parameters (discussed previously in Section IV) and one output parameter. The data are divided into three groups: the first group is used for training the ANN model, and the second and third groups are used for testing and validating it; they represent 70%, 15% and 15% of the total patterns, respectively.
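A 70/15/15 division of the 2608 patterns can be sketched in Python as follows. The random shuffling, the seed and the rounding rule are assumptions made for illustration; the text does not state how the patterns were assigned to each group, and the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n_patterns, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle pattern indices and split them into three disjoint groups."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)      # assumed: random assignment
    n_train = round(train_frac * n_patterns)
    n_val = round(val_frac * n_patterns)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder goes to the last group
    return train, val, test


train_idx, val_idx, test_idx = split_indices(2608)
print(len(train_idx), len(val_idx), len(test_idx))  # 1826 391 391
```

Because 70% and 15% of 2608 are not whole numbers, the group sizes are rounded and the last group takes whatever remains, so every pattern is used exactly once.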
The indices adopted throughout this research to evaluate the performance of the designed ANN models, and then to select the most appropriate one, are the mean square error (MSE) and the correlation coefficient (R), which are calculated using Equations 10 and 11, respectively [17,27]. The MSE indicates the average squared deviation between the predicted and target data, so a very low value corresponds to high prediction accuracy, while R measures the linear association between the two sets of data, and its magnitude ranges from 0 (no correlation) to 1 (perfect correlation).
The learning rate (α), which first appeared in Equation 6, is utilized for tuning the weights and biases of an ANN model. A small learning rate corresponds to a long convergence time, while increasing its value leads to instability and oscillation in the learning process [27]; thus, it is an important parameter that needs to be carefully selected. Different learning rate values, ranging from 0.005 to 0.2, were tested to study their effect on the prediction performance of the ANN model and then to select the most appropriate value. As illustrated in Fig. 3 and Fig. 4, a learning rate of 0.05 gives the lowest MSE and the highest R; therefore, it is used in training the ANN model.
Various backpropagation training algorithms (functions) have been utilized in the literature. However, each algorithm requires a different computational effort and amount of storage, and no single algorithm suits all applications [28]. The main features that distinguish these algorithms are how the weights are updated to reduce the error and how the learning rate is modified to reduce the convergence time. Generally, an algorithm is selected through a trial process, as with the learning rate [27,28].
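The two performance indices adopted in this section can be computed as in the following minimal Python sketch. It assumes that Equations 10 and 11 take the standard forms, that is, the mean of the squared differences for MSE and the Pearson correlation coefficient for R; the function names and the sample values are hypothetical.

```python
import math


def mean_square_error(targets, outputs):
    """Mean of the squared differences between targets and network outputs."""
    n = len(targets)
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / n


def correlation_coefficient(targets, outputs):
    """Pearson correlation coefficient R between targets and outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_y = sum(outputs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    return cov / math.sqrt(var_t * var_y)


# Hypothetical target and output values, for illustration only.
targets = [1.0, 2.0, 3.0, 4.0]
outputs = [1.1, 1.9, 3.2, 3.8]
mse = mean_square_error(targets, outputs)
r = correlation_coefficient(targets, outputs)
print(f"MSE = {mse:.4f}, R = {r:.4f}")
```

The two indices are complementary: MSE is sensitive to the absolute size of the errors, whereas R only reflects how closely the outputs follow a straight-line relationship with the targets.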
Hence, nine popular training algorithms available in Matlab were tested to decide which one gives the best result for this study. These algorithms include, for instance, batch gradient descent with variable learning rate (traingdx), resilient backpropagation (trainrp) and Levenberg-Marquardt (trainlm) [28]. Figures 5 and 6 illustrate the prediction performance of the ANN model when the different training algorithms were tested. It can clearly be seen from Fig. 6 that the 'trainbr' algorithm produces the lowest MSE of 0.0006, while 'traincgf' produces the highest MSE of 0.043. Accordingly, the highest R is achieved with the lowest MSE of the ANN model, as indicated in Fig. 7, where R is very close to 1 for the 'trainbr' training algorithm. Thus, the 'trainbr' algorithm, which is based on Bayesian regularization backpropagation [29,30], is employed in the training process of the ANN model.
The optimum ANN structure established in this way, which gives the minimum error value, is composed of five layers, as shown in Fig. 7. The first layer is the input layer with six independent inputs, and the fifth layer is the output layer with one dependent output, which is the predicted hourly cooling energy (MW-hr). Between the input and output layers there are three hidden layers; the first hidden layer contains 60 neurons, and the second and third hidden layers each have only one neuron. The activation functions of the hidden layers are Tansigmoid, Linear and Tansigmoid, respectively; the activation function of the output layer is Linear. The validation curve of the designed ANN model is presented in Fig. 8, where the MSE of the training and testing data decreases as the number of epochs increases and the best performance is recorded at epoch 2487. To validate the performance of the designed ANN model, a linear regression analysis between the cooling energy predicted by the network and the corresponding HAP targets is carried out.
Figure 9 shows the linear regression plot, in which the prediction results are plotted against the HAP results. The network outputs would exactly equal the targets if the ANN training were perfect, but in practice this is rarely the case. The correlation between the calculated and predicted values is indicated by R in Fig. 9; a value of one indicates an exact linear relationship between the ANN and HAP results. Conversely, there is no linear relationship between them if the R value is close to zero. The R values in this work for the training and testing data, as indicated in the figure, are 0.9995 and 0.99994, which is evidence of an excellent fit between the target data and the ANN output.
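The straight line drawn in a regression plot of this kind is an ordinary least-squares fit of the network outputs against the targets. A minimal Python sketch is given below; the function name `fit_line` and the sample values are hypothetical and are not taken from Fig. 9.

```python
def fit_line(targets, outputs):
    """Least-squares fit of outputs ~ slope * targets + intercept."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_y = sum(outputs) / n
    sxx = sum((t - mean_t) ** 2 for t in targets)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(targets, outputs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


# Hypothetical HAP targets and ANN outputs, for illustration only.
hap_targets = [0.10, 0.20, 0.30, 0.40]
ann_outputs = [0.11, 0.19, 0.31, 0.39]
slope, intercept = fit_line(hap_targets, ann_outputs)
print(f"output = {slope:.3f} * target + {intercept:.3f}")
```

For a perfectly trained network the fitted slope would be 1 and the intercept 0, so the closeness of these two values to 1 and 0 complements the R value when judging the fit.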
For further verification, the testing R values obtained when different data are input to the ANN model are compared with the R values in [19]. Figure 10 indicates that the R values obtained in this research at different hours of the day are noticeably higher than those of [19], which gives a good indication of the accuracy of the designed ANN model.
For the purpose of ANN model validation, the established optimum ANN model has been used to predict the required cooling energy when different data sets, which were not used or seen previously in the testing and validation process, are fed to it. Figure 11 displays the hourly cooling energy predicted by the optimized ANN (red square markers) compared with the corresponding values from the HAP software (blue triangle markers). The x-axis is the data sequence; 176 data sets are utilized for the validation step. The results of the two approaches show very similar behaviour, which confirms the accuracy of the designed ANN model. The variation in the results across the data sequence is due to the changing values of the input variables.
Moreover, Table III contains eight input patterns in which the U-value, WWR, orientation and space volume of the building are unchanged and only the dry-bulb temperature varies with time. As can be seen, the temperature increases as time passes and, correspondingly, the energy computed by HAP and predicted by the ANN model increases too. However, the energy decreases at 15:00 in spite of the temperature reaching its highest value; this can be attributed to the decrease in the occupancy ratio (50% at 15:00, as mentioned previously in Section IV). Thus, it can be stated that the developed ANN model can accurately forecast the required hourly cooling load and can therefore be used with confidence to help building designers select the best design conditions to minimise the required cooling energy.
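The five-layer structure described in this section (six inputs, hidden layers of 60, 1 and 1 neurons with Tansigmoid, Linear and Tansigmoid activations, and one linear output) can be sketched as a plain forward pass in Python. This is only an illustration of the layer shapes: the weights below are random placeholders rather than the trained values, the input pattern is hypothetical, and the tan-sigmoid function is evaluated with `math.tanh`, to which it is mathematically equal.

```python
import math
import random

LAYER_SIZES = [6, 60, 1, 1, 1]                  # input, three hidden, output


def linear(v):
    return v


# Tansigmoid, Linear, Tansigmoid for the hidden layers; Linear for the output.
ACTIVATIONS = [math.tanh, linear, math.tanh, linear]


def init_network(seed=0):
    """Create random placeholder weights and biases (not trained values)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one six-element input pattern through all layers."""
    signal = list(inputs)
    for (weights, biases), activation in zip(layers, ACTIVATIONS):
        signal = [activation(sum(w * s for w, s in zip(row, signal)) + b)
                  for row, b in zip(weights, biases)]
    return signal[0]


network = init_network()
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)
# Hypothetical normalised pattern: time, temperature, orientation,
# U-value, space volume, WWR.
pattern = [0.5, 0.8, 0.0, 0.3, 0.2, 0.4]
print(n_params, forward(network, pattern))
```

With these layer sizes the network has 6×60 + 60 weights and biases in the first hidden layer, 60 + 1 in the second, and 1 + 1 in each of the last two layers, 485 adjustable parameters in total.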

IV. CONCLUSION
The software commonly used to estimate the cooling (or heating) energy of buildings, such as HAP, ESP-r and EnergyPlus, requires a large amount of information to accurately predict the energy consumption profile of a specific building. Following such an approach can be time consuming and requires considerable effort from a person with quite good experience. However, using artificial intelligence techniques, such as the ANN, for this type of prediction is a somewhat easier way, since most of the effort has to be spent at the model design stage, and afterwards the designed model can be used with little effort and less input information. Therefore, the objective of this work was to develop a robust ANN model for hourly cooling energy prediction for educational buildings with different construction configurations; buildings of the University of Technology in Iraq were taken as a case study. After a preliminary analysis of consumed energy and a review of previous work in this field, only the most influential building parameters were selected as inputs to the ANN model, while the output of the ANN model was the required hourly cooling energy.
These parameters include time, outdoor dry-bulb temperature, orientation of the building, overall heat transfer coefficient, space volume and window to wall ratio (WWR). To generate training data for designing the ANN model, the HAP software was employed; with it, wide ranges of the input parameters were simulated to imitate different university buildings. The generated data were then used to design the ANN model with the Matlab Neural Network Toolbox. To optimize the structure of the ANN model, various ANN arrangements with different numbers of hidden layers, each having different numbers of neurons, were tested. Two widely applied statistical indicators, the mean square error (MSE) and the correlation coefficient (R), were used to assess the proposed ANN models. Also, two important factors, the training algorithm and the learning rate, were carefully selected based on the calculated MSE and R values. The resulting optimum ANN structure is composed of five layers: an input layer, three hidden layers and an output layer. The best training algorithm was found to be Bayesian regularization backpropagation with a learning rate of 0.05. After establishing the optimum ANN structure, different data sets that had not been seen by the designed ANN were utilized to test it. The results showed that the hourly cooling energy predicted by the ANN model was very accurate, with an MSE of 5.99×10⁻⁶ and an R of 0.9994.