GRU and XGBoost Performance with Hyperparameter Tuning Using GridSearchCV and Bayesian Optimization on an IoT-Based Weather Prediction System

Abstract—Weather is essential to human life, but it is difficult to forecast due to its diverse nature. We evaluated and compared the accuracy of two machine learning algorithms, GRU and XGBoost, in predicting weather patterns. We used GridSearchCV to tune the hyperparameters of the GRU algorithm and Bayesian optimization for the XGBoost algorithm. We used regression to predict weather sensor data and classification to predict rainfall in the following four days. We then deployed the best-performing model to a cloud server and connected it to a local IoT device with weather sensors in Sedati, Sidoarjo Regency, Indonesia. We conducted tests using data from BMKG Juanda Sidoarjo and data from the local IoT device. The findings indicated that the XGBoost regression model outperformed the GRU model in the first stage, with an average RMSE of 1.2728125, compared to 1.551666667 for GRU regression. In the second stage, however, GRU regression performed better, with an average RMSE of 2.23, while XGBoost regression had 2.28. In the classification tests, the GRU model had a higher F1 score of 0.88 in the first stage, while the XGBoost classification scored 0.86. Both models had the same accuracy of 0.75 when tested with IoT data. However, the GRU classification model was preferable since it considered the context of the prediction, resulting in a lower predicted likelihood of rain when it was not raining.


I. INTRODUCTION
Accurate weather prediction is essential for many industries and activities, including agriculture, tourism, aviation, and transportation. Weather data is collected using observations, sensors, radar, and satellites from meteorological stations. The Meteorology, Climatology, and Geophysics Agency's current method for developing weather forecasts is called Numerical Weather Prediction (NWP) [1]. This method requires several factors, mathematical assumptions, and complex equations, and it demands a thorough understanding of atmospheric dynamics and calculations with many variables and data sets. The development of modern computer hardware has enabled advances in numerical weather forecasting. The potential for improving weather modeling techniques exists thanks to the availability of large amounts of data and technological advances. We can gain new insights and potentially enhance existing weather modeling methods by applying a deep learning approach to decades of weather observation records. However, the large volume of weather and climate data complicates its analysis.
Researchers have developed a new modeling approach using machine learning that has many benefits over traditional methods. Unlike physical models or numerical weather prediction, machine learning models can provide results in seconds and offer accurate forecasts at a lower cost [2]. The purpose of the machine learning approach can be divided into two categories: description and prediction. A descriptive function examines the dynamics of a data collection to extract meaningful characteristics. The predictive function, on the other hand, looks for patterns in the data that can be used to estimate future outcomes from the variables in the data. These patterns are then used to forecast variables that have not yet been observed. Training and testing data are required for modeling with a machine learning algorithm.
This study used 20.5 years of data from the BMKG Meteorology Station (Class I, Juanda) in Indonesia. The data had problems, including missing values and outliers, which may have resulted from processing faults, data entry mistakes, climatic anomalies, or sensor errors. Therefore, before training the model, it is essential to perform various data preparation steps, such as filling in missing values (interpolation), data cleansing, transformation, and standardization or normalization.
Munandar [3] used multivariate time series input data with the ARIMA and MLP methods for weather forecasting with solar irradiance targets. The MLP regression model predicted a single day's output using a multilayer perceptron window method with data from 3 and 7 days before. To predict future data, the ARIMA model considers parameters such as the moving average, autoregressive terms, and data set features. The researchers tested both models and found that the deep-learning-based MLP model was more effective than the ARIMA model.
Chen et al. [4] studied the prediction of average wind speed, average atmospheric pressure, daily minimum and maximum temperature, relative humidity, and temperature in Shenzhen, China. The researchers employed a fusion model based on LSTM. Filtering the correlation coefficients of the components of each variable decomposed by EMD and then recombining the data into an LSTM network maximizes the benefits of EMD in decomposing non-stationary data with seasonal trends, and it minimizes the impact of data noise and seasonal fluctuations. The researchers employed a grid search approach for tuning the hyperparameters.
Our proposed research builds on previous work in which we developed a model using Internet of Things technology. This study used a server-side API endpoint to integrate the models deployed on the cloud server with an ESP32 microcontroller. The ESP32 microcontroller was equipped with several sensors to measure meteorological conditions. The prediction model used the data collected from these sensors as input. We also created a web-based surveillance system for this project to allow users to monitor the weather in near real-time and view the weather forecast.

II. MATERIALS AND METHODS
The model development technique used for weather prediction is discussed in this section. Observations of weather at the surface were used to generate forecasts. We proposed two methods: the first used the GRU algorithm, and the second used the XGBoost algorithm. Each algorithm consists of a regression model and a classification model. In this study, we proposed four models consisting of two regression models and two classification models. The regression models were used to predict meteorological element data, including Maximum Temperature (MAX), Minimum Temperature (MIN), Maximum Wind Speed (MXWS), Daily Average Temperature (TEMP), Wind Current Speed (WS), Humidity (RH), Sea Level Atmospheric Pressure (SLP), and Dew Point Temperature (DP).
The classification models were used to classify rainfall prediction (PRCP) for the following four days. We evaluated these models to determine the most reliable models for regression and classification. A locally deployed Internet of Things (IoT) device was used to collect data as inputs for the most reliable models to provide a comprehensive weather forecasting system. An ESP32 microcontroller was connected to a BME280, an anemometer, and a rain gauge sensor to collect weather data. The server received this data, which was used as input for the prediction model. Since we were not using multilabel classification in this research, the regression model was used to predict the values input into the classification model for the next few days. The models were trained using historical weather data to provide future forecasts. The hyperparameters of the GRU algorithm were tuned using the GridSearchCV method, while the XGBoost algorithm was tuned using Bayesian optimization.

A. Research Methodology
The proposed study comprises three main stages, as shown in Fig. 1. The first stage is data preparation, which involves collecting data sets and pre-processing the data. Data pre-processing includes interpolation, treating outlier values, filtering data using the Fast Fourier Transform (FFT), transforming data, and normalizing data. The second stage is modeling with two algorithms, GRU and XGBoost. The hyperparameters of these models need to be tuned using GridSearchCV and Bayesian optimization to achieve the best performance and accuracy. The results are then analyzed and evaluated. The third stage is implementing and integrating the best model with a local IoT device.

B. Collecting Data
The models were trained using data from the BMKG Meteorology Station (Class I, Juanda) in Sidoarjo Regency, Indonesia. The National Climatic Data Center (NCDC) is an institution that gathers and maintains weather data from airport weather stations worldwide and makes it available for download. We used relevant variables from the raw data as input variables for the model, considering the availability of sensors. The data collected between January 2000 and June 2021 was divided into sets for testing and training.
Fig. 2 shows the descriptive statistics of the raw data. With the data arranged in ascending order, the first quartile (Q1) is the value below which 25% of the data points fall, the second quartile (Q2) the value below which 50% fall, and the third quartile (Q3) the value below which 75% fall. Fig. 2 shows that the minimum and maximum values differ significantly from Q1, Q2, and Q3. A significant difference between the quartiles and the minimum-maximum values may indicate skewed data, and there may also be outliers. Histograms and boxplots were used to further check for suspected outliers. Another issue with the dataset is missing values, which were addressed in the data interpolation stage by adding new data points.

C. Pre-Processing Data
The quality of the output of a system, such as a machine learning model, is directly related to the quality of the input data. The "garbage in, garbage out" principle states that if the input data is of poor quality, the resulting output will also be of poor quality. It is therefore crucial to ensure that the input data is of high quality to build a reliable and accurate model. The raw data provided by the NCDC had some anomalies and missing values, so we performed pre-processing to clean the data, as shown in Fig. 1. By performing pre-processing, we can improve the quality of the input data and increase the accuracy of the resulting model.
1) Interpolating Data: Data interpolation is a method for filling in missing data points by estimating values based on existing samples. This research employed means imputation, a statistical interpolation method [5]. Missing data points are replaced with the average value from the same day in other years [6]. For example, if the sea level pressure record for October 1, 2021, is missing, it would be replaced with the average sea level pressure of October 1 from prior years.
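A minimal pandas sketch of this day-of-year means imputation is shown below. It assumes the readings sit in a date-indexed DataFrame; the column name "SLP" is only illustrative.

```python
import pandas as pd

def impute_day_of_year_mean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Fill missing values with the average of the same calendar day
    in other years (means imputation), assuming a DatetimeIndex."""
    df = df.copy()
    day_key = df.index.strftime("%m-%d")            # e.g. "10-01" for October 1
    day_means = df[column].groupby(day_key).transform("mean")
    df[column] = df[column].fillna(day_means)
    return df

# Example: fill a missing sea level pressure reading for 2021-10-01
# with the mean SLP of every other October 1 in the record.
# weather = impute_day_of_year_mean(weather, "SLP")
```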
2) Treating Outliers: A value much smaller or larger than the rest of the data is called an extreme point or outlier. A boxplot is a helpful plot for visualizing data distribution based on five essential statistics: the lowest value, Q1, Q2, Q3, and the highest value. We used boxplots for each variable to identify outliers. The upper and lower bounds of the data set, as shown by the boxplot, were used to define the cutoff points.
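One common way to operationalize these boxplot cutoffs is shown in the sketch below; the paper does not state whether outliers were removed or capped, so the clipping used here is an assumption.

```python
import pandas as pd

def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside the boxplot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series.clip(lower=lower, upper=upper)
```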
3) Filtering Data: FFT is a technique that converts data from the spatial or time domain into the frequency domain [7], [8]. It uses a complex exponential function to break the data into component frequencies. In contrast, the Inverse Fast Fourier Transform (IFFT) transforms data from the frequency domain back into the spatial or time domain. The FFT equation for $X(f)$ of a continuous-time signal $x(t)$ is shown in (1):

$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$  (1)

The IFFT equation is written in (2):

$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df$  (2)

The FFT filter technique converts the data into the frequency domain, reduces or amplifies high frequencies (acting as a low-pass filter), and then inverts the filtered result using the IFFT method [9]. This filtering step helps to reduce high-point fluctuations and improve the model's performance. By removing noise from the data, the model can more easily identify underlying patterns and trends, leading to better results [10]. This research used a window size (N) of eight to filter the variables, and a two-point sample (M) of the FFT values was retained. Fig. 3 shows the distribution plots after the pre-processing phase.
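The sketch below illustrates this FFT low-pass filter with NumPy. Applying it per sliding window of length eight with the two lowest FFT samples retained mirrors the N = 8, M = 2 setting described above, though the per-window application is an assumption.

```python
import numpy as np

def fft_low_pass(signal: np.ndarray, keep: int = 2) -> np.ndarray:
    """Low-pass filter: transform to the frequency domain, zero out all but
    the `keep` lowest-frequency coefficients, and invert with the IFFT."""
    spectrum = np.fft.rfft(signal)
    spectrum[keep:] = 0                     # suppress high-frequency components
    return np.fft.irfft(spectrum, n=len(signal))

# Example on one window of eight daily readings (M = 2 retained samples):
# smoothed = fft_low_pass(window_of_8_values, keep=2)
```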

4) Transforming Data:
In this step, continuous rainfall data was converted into two categories: no rain (class 0) and rain (class 1). If a rainfall data point was greater than 0.5 mm, it was labeled as part of the rain class [11]. However, the data used in this research was imbalanced, meaning that some classes occurred less often than others. This can cause the model to be biased and perform better for frequent classes than for rare ones [12]. Several solutions to the imbalanced data issue exist, including SMOTE [13]. In this research, we did not use SMOTE to handle imbalanced classes because it produces unrealistic sequences for time series data, which does not improve model performance. Instead, we used a weight penalty to address the imbalanced data. It assigns a lower weight to the class with more labels and a higher weight to the class with fewer labels. The classification model naturally gives more weight to the class with more labels, so we needed to weight the loss function to counteract this bias. The formula for estimating the penalty weight is shown in (3):

$w_j = \frac{N}{K \cdot N_j}$  (3)

where $N$ is the total number of samples, $K$ is the number of classes, and $N_j$ is the number of samples in class $j$. Fig. 4 shows the calculated weights for each class, where the weight is inversely related to the frequency of the data [14]. These weight penalties were only applied to the GRU classification model because the XGBoost model can handle imbalanced data without customization. However, using a sampling technique may improve the performance of the XGBoost algorithm [15].
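As a concrete illustration, the snippet below computes inverse-frequency class weights in the spirit of (3); the balanced-weight formula shown is a common heuristic and an assumption here, and the Keras class_weight hook is one way such weights can be applied to the loss.

```python
import numpy as np

def balanced_class_weights(labels: np.ndarray) -> dict:
    """Weight each class inversely to its frequency: w_j = N / (K * N_j)."""
    classes, counts = np.unique(labels, return_counts=True)
    n, k = len(labels), len(classes)
    return {int(c): n / (k * cnt) for c, cnt in zip(classes, counts)}

# weights = balanced_class_weights(y_train)      # e.g. {0: 0.7, 1: 1.9}
# model.fit(X_train, y_train, class_weight=weights, ...)  # weights the loss per class
```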

5) Normalizing Data:
Normalizing the data ensured that all the features were on the same scale, which helped the model learn more effectively. Data normalization helps the model converge faster and produce better results [16]. In this study, a minimum-maximum scaler was used for data rescaling. The normalization procedure yields data ranging from 0 to +1 by scaling each feature [17]. The minimum-maximum equation is given by (4):

$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$  (4)
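In practice, this rescaling can be done with scikit-learn's MinMaxScaler (an assumption; the paper does not name a library), fitting the scaler on the training split only so that test statistics do not leak into training.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))          # implements (4) per feature
train_scaled = scaler.fit_transform(train_values)    # learn min/max on training data
test_scaled = scaler.transform(test_values)          # reuse the training min/max
# train_values / test_values: arrays of the weather features (assumed defined).
```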

D. Modeling
This study used the GRU and XGBoost models for weather prediction. The GRU model is based on recurrent neural networks, while XGBoost is a boosting algorithm that uses decision trees as its base learners. We evaluated the performance of both models by tuning their hyperparameters and comparing their results.

1) Gated Recurrent Unit (GRU) Algorithm:
GRU is a type of recurrent neural network that improves on the vanilla Recurrent Neural Network (RNN) in predictive modeling tasks. Unlike a vanilla RNN, which often suffers from vanishing gradients, GRU uses update and reset gates to prevent them. These gates give GRU a more stable architecture with many hidden layers, improving model performance. The equations are shown in (5)-(8).
$u_t = \sigma(W_u \cdot [h_{t-1}, x_t] + B_u)$  (5)

$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + B_r)$  (6)

$c_t = \tanh(W_c \cdot [r_t \odot h_{t-1}, x_t] + B_c)$  (7)

$h_t = u_t \odot h_{t-1} + (1 - u_t) \odot c_t$  (8)

where σ is the sigmoid activation function; $W_u$, $B_u$, $W_r$, and $B_r$ are the weight matrices and bias vectors for the update and reset gates; $x_t$ is the input at time step $t$; and $h_{t-1}$ is the hidden state at the previous time step. The update gate ($u_t$) and reset gate ($r_t$) control the flow of information in the GRU, allowing it to retain or forget information from previous time steps as needed. The hidden state candidate ($c_t$) is used to generate the hidden state ($h_t$), with the result of the update gate calculation controlling the effect of the previous hidden state on the hidden state candidate. This mechanism helps the GRU model learn and make accurate predictions [18]. To prepare the data for training a GRU model, we first rearranged it into the three-dimensional form expected by the GRU input layer. The three dimensions consist of the number of data samples, the number of time steps, and the number of features. Each sample covers one historical window, each time step is one measurement point within that window, and each feature is one measured variable at that time step. This allowed us to train the GRU model on the pre-processed data [6].
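A minimal sketch of this windowing is shown below; it assumes the pre-processed data is a 2-D array of shape (days, features), and the function name and flattened multi-day target layout are illustrative choices.

```python
import numpy as np

def make_windows(data: np.ndarray, time_steps: int, horizon: int):
    """Slice a 2-D array (days x features) into GRU-ready tensors:
    X has shape (samples, time_steps, features); y holds the next
    `horizon` days of targets, flattened per sample."""
    X, y = [], []
    for i in range(len(data) - time_steps - horizon + 1):
        X.append(data[i : i + time_steps])
        y.append(data[i + time_steps : i + time_steps + horizon].ravel())
    return np.array(X), np.array(y)

# Fourteen days of history predicting the next four days:
# X, y = make_windows(scaled_values, time_steps=14, horizon=4)
```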
Hyperparameters in machine learning algorithms affect model performance, and tuning them can improve prediction accuracy for a specific dataset. This study examined the effects of different time-step periods: two, seven, fourteen, and twenty-one days. GridSearchCV was used to find the optimal hyperparameters for the regression model. The GridSearchCV method generates and assesses a model for each combination of the provided hyperparameters [19]. Table 1 and Table 2 present the hyperparameter values of the GRU regression and classification models, along with the results showing which hyperparameters were the most reliable. To optimize the hyperparameters for the rain category prediction, we manually varied the time step, the GRU and Dense units in the hidden layers, and the batch size, and evaluated the impact on model performance. The adjustments were made one at a time, and the results from the most successful iteration were used to tune the hyperparameters in the next iteration. This process was repeated until the optimal hyperparameters were found.
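As an illustration of this search, the sketch below wires a small GRU builder into GridSearchCV via the scikeras wrapper. The wrapper choice, the parameter grid values, and the input shape (14 time steps, 8 features, 32 outputs for 8 variables over 4 days) are assumptions for the example, not the paper's exact setup.

```python
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import GridSearchCV
from tensorflow import keras

def build_gru(gru_units: int, dense_units: int) -> keras.Model:
    # 14 historical time steps x 8 weather features in; 8 variables x 4 days out.
    model = keras.Sequential([
        keras.layers.Input(shape=(14, 8)),
        keras.layers.Bidirectional(keras.layers.GRU(gru_units)),
        keras.layers.Dense(dense_units, activation="relu"),
        keras.layers.Dense(32),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

search = GridSearchCV(
    estimator=KerasRegressor(model=build_gru, model__gru_units=16,
                             model__dense_units=512, epochs=50,
                             batch_size=128, verbose=0),
    param_grid={"model__gru_units": [16, 32, 64],
                "model__dense_units": [128, 256, 512],
                "batch_size": [64, 128]},
    scoring="neg_mean_squared_error", cv=3,
)
# search.fit(X, y); search.best_params_   # X, y from the windowing step above
```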
We first defined the model architecture to build the GRU models and then chose the optimization algorithm and loss functions. The regression model was compiled with mean squared loss, while the sequence classification model used categorical cross-entropy loss. Loss functions measure how much the actual results differ from the predicted results. Backpropagation through time adjusts the weights and biases in the GRU model to reduce the loss incurred during training.
The optimization algorithm repeatedly adjusts the network weights based on the training data. In this study, we used the adaptive optimizer Adam because its default settings are usually practical. We also reduced the learning rate on plateaus, with a starting learning rate of 0.001 [20]. A large learning rate is desirable at the beginning of training because it can lead to a better generalization effect. If the metrics are not improving during training, lowering the learning rate can help the algorithm find an optimal solution and avoid oscillations around it [21]. If the metrics show no improvement after a predetermined number of epochs, the ReduceLROnPlateau callback reduces the learning rate [22]. We also used early stopping to prevent overfitting by halting the training process if the validation loss increases significantly. Early stopping also determines the optimal number of epochs, because training stops automatically at a certain epoch [23]. Finally, we saved the best weights during training using the ModelCheckpoint callback, which were then used for deployment.
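A minimal sketch of this training configuration with the Keras callbacks named above is shown below; the factor, patience, and file-path values are assumptions, while the 0.001 starting learning rate follows the text.

```python
from tensorflow import keras

callbacks = [
    # Lower the learning rate when validation loss stops improving.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                      patience=5, min_lr=1e-5),
    # Halt training when validation loss no longer improves; keep best weights.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=15,
                                  restore_best_weights=True),
    # Persist the best weights seen during training for later deployment.
    keras.callbacks.ModelCheckpoint("best_gru.h5", monitor="val_loss",
                                    save_best_only=True),
]

# `model` is the compiled GRU network from the previous sketch:
# model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
# history = model.fit(X, y, validation_split=0.2, epochs=200,
#                     batch_size=128, callbacks=callbacks)
```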

2) Extreme Gradient Boosting (XGBoost) Algorithm:
XGBoost is a popular machine learning algorithm often used for regression and classification tasks [24]. It is an ensemble learning method that combines the predictions of multiple weak models to create a more accurate final model. XGBoost uses decision trees as its base learners and trains them in an iterative process to improve the model's overall performance. Combining multiple weak models into a more robust model is known as gradient boosting. This method allows XGBoost to produce highly accurate predictions, making it a popular choice for many machine learning applications. During each iteration, the error residuals from the preceding model are used to fit the subsequent model. The final prediction is derived by a weighted summation of all the individual tree predictions. The XGBoost algorithm may be thought of as an additive model made up of K CART trees, where $f_t(x_i)$ is the predicted value produced by feeding the i-th sample $x_i$ into the t-th tree, $\hat{y}_i$ is the prediction outcome of $x_i$, and $F$ is the set space containing all the regression trees [15]. The final prediction result is given by (9):

$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$  (9)

The objective function (loss function plus regularization) to be minimized at iteration $t$ is given by (10):

$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$  (10)

The XGBoost model's hyperparameters were optimized using the Bayesian optimization method. This approach is more efficient than traditional search methods, such as random and grid searches, because it uses a probabilistic model to guide the optimization [25]. Bayesian optimization uses Bayes' theorem to search global optimization problems efficiently [26]. It iteratively searches for the hyperparameters that minimize the objective loss function, using a surrogate function to represent the objective and an acquisition function to guide the search. This method can also reduce computational costs compared to grid search. The optimal hyperparameter values for multilabel regression using XGBoost are shown in Table 3.
Determining the range of hyperparameter samples for the classification model in the Bayesian optimization process follows the same pattern as for the regression model. Table 4 presents the optimal hyperparameter values for the XGBoost classifier, as determined by Bayesian optimization. The search involved one hundred assessments of different models for each historical window value (the number of past observations used as features). The Tree-structured Parzen Estimator search technique was used to minimize the objective function for the regression and classification models. For the regression model, the objective function was the mean squared loss, while for the classification model, it was the negative accuracy. By minimizing these objective functions, we can improve the performance of the models and achieve better results.

The hyperparameters obtained from Bayesian optimization show that the XGBoost model for classification is more complex than the regression model. The large depth of each tree, referred to as max_depth, makes the XGBoost model more complex. To prevent overfitting due to this increased complexity, we can adjust two hyperparameters: min_child_weight and gamma. min_child_weight is the minimum sum of instance weight in each leaf node, while gamma is the minimum loss reduction required to produce a split. Increasing these values lowers the complexity of the model, prevents it from overfitting to the training data, and helps it generalize better to unseen data. Setting the ratio of features used (colsample_bytree) and the ratio of training instances (subsample) to a modest number can also reduce model complexity. Tables 3 and 4 also show that the regression model has more boosted trees (n_estimators) than the classification model. The learning rate of the XGBoost model was set to a constant value; when this value decreases, the computation becomes slower but sometimes yields a better optimum. The model was trained using the optimal hyperparameter values after the tuning process. Eighty percent of the tabular weather history data was used for training the model and twenty percent for testing. The gradient boosting tree technique is used, where XGBoost adds predictors sequentially, fitting each to its predecessors' errors and assigning greater weight to better-performing predictors. The XGBoost model is trained in three stages: on the raw data, on the residuals from the previous model, and on the sum of the previous models.
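The sketch below shows one way to run such a Tree-structured Parzen Estimator search with the hyperopt library; the search-space bounds and the pre-split arrays (X_train, y_train, X_valid, y_valid) are assumptions, while the one hundred evaluations match the text.

```python
import numpy as np
import xgboost as xgb
from hyperopt import fmin, tpe, hp, Trials
from sklearn.metrics import mean_squared_error

# Hypothetical search space; the paper's exact ranges are not stated.
space = {
    "max_depth":        hp.quniform("max_depth", 3, 12, 1),
    "min_child_weight": hp.uniform("min_child_weight", 0.5, 10),
    "gamma":            hp.uniform("gamma", 0.0, 1.0),
    "subsample":        hp.uniform("subsample", 0.5, 1.0),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.3, 1.0),
    "learning_rate":    hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "n_estimators":     hp.quniform("n_estimators", 100, 600, 1),
}

def objective(params):
    # hp.quniform yields floats; integer parameters must be cast back.
    params["max_depth"] = int(params["max_depth"])
    params["n_estimators"] = int(params["n_estimators"])
    model = xgb.XGBRegressor(**params).fit(X_train, y_train)
    # The mean squared loss on held-out data is the quantity being minimized.
    return mean_squared_error(y_valid, model.predict(X_valid))

# Assumes X_train, y_train, X_valid, y_valid are already defined.
best = fmin(objective, space, algo=tpe.suggest, max_evals=100, trials=Trials())
```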

E. Evaluation Metrics
To evaluate the performance of the models, we used several metrics. One of these is the Root Mean Square Error (RMSE), a commonly used measure for regression models [27]. A smaller RMSE value indicates that the model's predictions are closer to the actual values and therefore reflect better performance. It is the square root of the average squared difference between the predicted and true values, as shown in (11):

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$  (11)

We compare the predicted values with the original labels to evaluate the performance of the classification model, which allows us to determine the extent to which the model can accurately predict the correct class for each data point [14]. We used a variety of metrics to assess the model's performance, including the confusion matrix, accuracy, recall, precision, and F1. A confusion matrix is a tabular representation of possible pairs of predicted and observed values [28]. The matrix consists of four possible outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The formulae for the classification evaluations are given in (12)-(15) [29]:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$  (12)

$\mathrm{Recall} = \frac{TP}{TP + FN}$  (13)

$\mathrm{Precision} = \frac{TP}{TP + FP}$  (14)

$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (15)
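These metrics map directly onto scikit-learn helpers, as sketched below (the library choice is an assumption, and the variable names are illustrative placeholders for predicted and true values).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_squared_error, precision_score, recall_score)

# Regression: RMSE as in (11). y_true, y_pred are assumed defined.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Classification: confusion matrix and the metrics in (12)-(15).
tn, fp, fn, tp = confusion_matrix(y_true_cls, y_pred_cls).ravel()
acc = accuracy_score(y_true_cls, y_pred_cls)
rec = recall_score(y_true_cls, y_pred_cls)
prec = precision_score(y_true_cls, y_pred_cls)
f1 = f1_score(y_true_cls, y_pred_cls)
```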

F. Microcontroller and Model Deployment
After selecting the most accurate model, the next step was integrating it with the local IoT device and other components into a complete system. The integrated system allowed weather predictions to be processed automatically and provided to users through the application shown in Fig. 5. The sensors connected to the ESP32 microcontroller included a rain gauge, an anemometer, and a BME280. These sensors provide data on rain, wind, sea level atmospheric pressure, temperature, dew point, and relative humidity. The schematic diagram of the microcontroller and sensors used in this research can be seen in Fig. 6. The ESP32 chip includes various communication interfaces such as Wi-Fi, Bluetooth, SPI, and I2C/UART. The availability of two separate processing cores is a significant advantage of the ESP32 chip [30]. In this project, the ESP32 was used in dual-core mode, with the first core running the rain gauge sensor function and the second core running the anemometer and BME280 sensor functions. This separation was implemented to eliminate delays in reading the rain gauge sensor and improve overall performance. The weather data was posted as JSON to the API endpoint via HTTP every minute using Wi-Fi [31]. The API, developed using Python Flask, connected to the database and performed the necessary prediction processes. Upon receipt, the data was stored in MySQL (a relational database) and Google's Firebase Realtime Database [32]. The Firebase Realtime Database (a NoSQL database) was used to store the information shown by the website-based application, as it allows the application to update immediately when the data changes [33]. Meanwhile, MySQL maintained a record of all sensor readings necessary for weather forecasting, and we used a query program to retrieve the data. The server fetched the data from MySQL for a certain number of time steps, which were used as predictors for the prediction model. The average value of a day's sensor readings was calculated at 23:59 WIB and used as input for the model. These values were then used to generate the output predictions.
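A minimal sketch of such a Flask endpoint is shown below; the route path and the two persistence helpers are hypothetical stand-ins for the paper's actual server code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def store_in_mysql(reading: dict) -> None:
    """Hypothetical helper: insert the reading into the MySQL history table."""

def push_to_firebase(reading: dict) -> None:
    """Hypothetical helper: mirror the reading to the Firebase Realtime Database."""

@app.route("/api/sensor", methods=["POST"])   # endpoint path is illustrative
def receive_reading():
    reading = request.get_json()              # JSON posted by the ESP32 each minute
    store_in_mysql(reading)                   # history used as model predictors
    push_to_firebase(reading)                 # live view for the web application
    return jsonify({"status": "ok"}), 200
```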

III. RESULTS AND DISCUSSION
The model's accuracy was evaluated by comparing its predicted outcomes to the labeled data. The RMSE value was calculated to determine the regression model's effectiveness. A lower RMSE value indicates that the model's predictions are closer to the actual data, with a value of zero representing perfect accuracy. RMSE is commonly used in weather modeling, air quality studies, and climate studies to measure the accuracy of regression models [34]. The classification model is evaluated separately due to the categorical nature of its predictions. We used a variety of metrics, including the confusion matrix, accuracy, recall, precision, and F1, to assess the model's performance. We can determine the model's ability to accurately predict the different classes by analyzing these metrics. Overall, the evaluation of both the regression and classification models allows us to determine the effectiveness of the proposed models in predicting weather data.

A. Multi-Step Regression Model Testing With a 4-Day Lead Time
This section presents the results of evaluating the tuned models using the Root Mean Squared Error (RMSE) test. The GRU and XGBoost models forecast the values of various weather parameters, including dew point, maximum temperature, minimum temperature, maximum wind flow speed, sea level atmospheric pressure, temperature, wind flow speed, and relative humidity, four days in advance. The evaluation was conducted using the weather sensor data collected by the BMKG Meteorology Station (Class I, Juanda) from March 25, 2017, to June 30, 2021.

1) Gated Recurrent Unit (GRU):
We tested the regression model using the hyperparameters found by GridSearchCV while varying the time steps. The optimal hyperparameters for the GRU regression model were as follows: a Bidirectional GRU with 16 units in the first layer, a Dense layer with 512 units in the second layer, a Dense layer with 32 units in the output layer, a time step (history window) of fourteen days, and a batch size of 128. A time step of fourteen means that four days of predictions were made using fourteen days of historical data. The neurons in the recurrent neural network receive their input (the predictors) from the previous data according to the number of time steps. Fig. 7 shows the RMSE values for the best-fitting GRU model. The chart shows that the RMSE value increases with the prediction lead time, likely due to external factors that were not accounted for during training.
The location of an area plays a crucial role in the accuracy of weather predictions. In middle latitudes, weather forecasts can be made up to two weeks in advance [35], but in tropical areas, they can only be made up to four days in advance [36]. Long-term forecasting is not reliable, as it tends to generate large errors, and a high RMSE value indicates a significant prediction error. The weather variables of temperature, maximum temperature, minimum temperature, sea level atmospheric pressure, and average daily dew point all had low RMSE values, indicating that they were accurately predicted. This may be because these variables do not fluctuate significantly, particularly over four days. On the other hand, the RMSE values for wind flow speed, relative humidity, and maximum wind speed were higher due to their greater fluctuation and volatility.

2) Extreme Gradient Boosting (XGBoost):
The optimal hyperparameters for the XGBoost regression model, found using Bayesian optimization, were a seven-day history window, a subsample of 0.73, a colsample_bytree of 0.42, a max_depth of seven, a min_child_weight of 5.76, a learning rate of 0.038, 438 n_estimators, and a gamma of 0.0028. Four days of predictions were made using seven days of historical data. In contrast to the GRU method, which can analyze sequential input, the XGBoost algorithm predicts using the past data as all of the model features at once. Large input dimensions are susceptible to the curse of dimensionality, i.e., the explosive growth of data dimensions and the exponential increase in computational work required for processing. The RMSE values for the optimal XGBoost model on the test data are shown in Fig. 8.
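Because XGBoost consumes the whole history window as flat features, the windowing step differs from the GRU's 3-D tensors; a minimal sketch is shown below, assuming the same (days, features) array as before, with 8 features per day used for illustration.

```python
import numpy as np

def make_flat_windows(data: np.ndarray, history: int = 7, horizon: int = 4):
    """Flatten each history window (days x features) into one feature vector,
    since XGBoost takes all past observations as features at once."""
    X, y = [], []
    for i in range(len(data) - history - horizon + 1):
        X.append(data[i : i + history].ravel())   # 7 days x 8 features -> 56 features
        y.append(data[i + history : i + history + horizon].ravel())
    return np.array(X), np.array(y)

# X, y = make_flat_windows(scaled_values)   # seven-day input, four-day output
```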
The comparison of the RMSE values between the XGBoost regressor and the GRU regressor shows that the XGBoost model has a smaller average RMSE of 1.2728125 across all variables, compared to 1.551666667 for the GRU regression model, indicating that it performs better on the test data. Fig. 8 also reveals that the RMSE value increases as the forecast day increases, in a pattern similar to the GRU regression model's; this could be due to external factors not considered during training. Overall, the XGBoost regressor is more effective for predicting weather data because it only uses seven days of historical data.

B. Classification Model Testing
This section presents the results of evaluating the classification models using the GRU sequence classifier and the XGBoost classifier. We compared the performance of the models by evaluating their scores from May 16, 2019, to June 30, 2021.

1) Gated Recurrent Unit (GRU):
In this case, the sequence classification model was trained on weather data and used to make predictions about the likelihood of rain. It was optimized by adjusting various parameters, including the time step, GRU units, number of Dense units, and batch size. After testing different combinations of these parameters, the optimal model was found to have a batch size of 128, a time step of two days, 128 GRU units in the first layer, and 64 Dense units in the second layer. Using categorical cross-entropy as the loss function, the model could make binary predictions about the likelihood of rain. By considering the past two days of data, the model was able to accurately predict the weather for the following day. The results of testing this model are shown in Table 5, where it can be seen that the model achieved an F1 score and an accuracy of 0.88.
Additionally, the model was more successful at predicting the likelihood of rain (class 1) than the likelihood of no rain (class 0), as indicated by the higher recall value for class 1. The false positive and false negative rates were 7.97% and 5.14%, respectively, indicating that the model was more likely to predict rain when it was not going to rain. Overall, the optimized sequence classification model performed well in predicting the likelihood of rain.

2) Extreme Gradient Boosting (XGBoost):
The XGBoost classifier achieved its optimal hyperparameters using Bayesian optimization. These hyperparameters include a maximum depth of 10, a minimum child weight of 2.26, a learning rate of 0.097, 102 estimators, a gamma of 0.85, a subsample of 0.77, a colsample_bytree of 0.50, and a fourteen-day history window. Table 5 displays the results of the performance metrics test, which demonstrate that the proposed GRU model outperforms XGBoost on the test data. The XGBoost model's accuracy was 0.86, with a weighted average F1 score of 0.86. Furthermore, the recall in class 0 (no rain) is lower than in class 1 (rain). The confusion matrix in Fig. 9b shows the same pattern as that of the GRU model, with a higher false positive rate (9.54%) than false negative rate (4.84%).

C. Evaluation of Model Performance Using Microcontroller Observation Data
Before testing the best model with local IoT data, the sensor values from our device were compared to the data from the sensors at the BMKG Meteorology Station (Class I, Juanda). This comparison aims to reduce the error margin by standardizing the sensors' characteristics. The following differential values were obtained for dew point, sea level pressure, temperature, and wind flow speed: 0.94; -1.70; -1.12; -1.48; 1.01. For the next four days, sensor values were predicted using the regression models with two weeks of data (GRU) and one week of data (XGBoost). The classification algorithm was then applied to the predicted sensor data to provide four-day rain category predictions.
1) Gated Recurrent Unit (GRU): Fig. 10 shows the results of sensor prediction based on the most reliable regressor model. The actual data for comparison was collected from the BMKG Meteorology Station (Class I, Juanda); the actual data is shown in orange and the forecasted data in yellow. The data in Fig. 10 can be used to calculate the RMSE and determine how well the model performs in the implementation phase. The average RMSE for the predictions of dew point, maximum temperature, minimum temperature, maximum wind speed, sea level pressure, temperature, wind flow speed, and humidity for the next four days is 0.55, 1.78, 3.04, 2.47, 0.57, 1.56, 2.3, and 5.6, respectively. Since the sea level pressure and dew point variables varied very little over the four-day test, the RMSE values for these variables were quite acceptable.
The RMSE for the humidity variable was the highest among all variables. The forecasted values for humidity fell between 77.4 and 83.6, while the actual values were between 82 and 88, so the model's forecast had a smaller range than the observed data. The RMSE values were also relatively high for temperature, minimum temperature, and maximum temperature, indicating that the forecasted values span a wider range than the observed data for these variables. Despite this, the model's forecasts accurately represent the natural environment's features: in their natural states, temperature and humidity are inversely related, so when temperatures are high, humidity is often low. The differences in ambient sensor conditions between the BMKG Meteorology Station (Class I, Juanda) sensors and the sensors used in this study may explain the large RMSE results for humidity and other variables. The meteorological station's temperature and relative humidity sensors are protected by radiation shields, which shield them from radiant heat and other environmental influences.
In this research, the sensors were not protected by radiation shields, so they were susceptible to interference from wind, sunshine, and other external factors. Next, we discuss the results of the classification model. As shown in Table 6, the test using local IoT data yielded three accurate predictions and one incorrect one. The incorrect forecast occurred on June 2, when the model predicted rain but it did not rain. However, the predicted likelihood of no rain on June 2 increased by 32.81% compared to the previous day.
Additionally, the likelihood of rain on June 3 increased by 9.14% compared to the previous day. The model predicted rain, but there were only 0.55 millimeters of precipitation. On June 4, the predicted probability of precipitation decreased by 8.37%, but the model accurately predicted rainfall of 4 millimeters. Overall, the accuracy of the prediction system using microcontroller input data was 0.75 for the next four days. However, to guarantee the performance of the sequence model, it is necessary to collect more observational data, particularly when it is evaluated with local IoT data as model predictors. The threshold for defining a rain forecast category can also be calibrated using the ROC curve metric by examining more data.
2) Extreme Gradient Boosting (XGBoost): This section discusses the results of testing the XGBoost model using local IoT data. We begin by examining the performance of the multilabel regression model on the original sensor observation data. Fig. 10 shows a chart of the regression model with seven-day input leading to four-day output. The orange line in the graph represents the observed data, while the green line represents the data predicted by the model. Based on the evaluation, the RMSE for four-day forecasts of dew point, maximum temperature, minimum temperature, maximum wind speed, sea level pressure, temperature, wind flow speed, and relative humidity was 0.31, 1.81, 1.08, 3.95, 0.37, 1.01, 3.62, and 6.16, respectively. These values were obtained by using local IoT data as model predictors.
Sea level atmospheric pressure and dew point had the lowest RMSE values, both below 1. The RMSE was smaller than that obtained by the GRU model for all variables except humidity, wind flow speed, maximum wind flow speed, and maximum temperature. However, when the average RMSE was calculated across all variables, the XGBoost regressor produced 2.28, while the GRU regressor produced 2.23. Although the difference in performance was insignificant, the XGBoost regression model was more feasible to implement because it only needed a seven-day history window, whereas the GRU model required fourteen days of history (time steps). A longer history window makes the computation slower because the model must process more predictors as inputs.
Like the GRU sequence classification model, the XGBoost classifier predicted three outcomes correctly and one incorrectly, with the same incorrect forecast on June 2. Even though there was no actual precipitation on June 2, the predicted likelihood of rain increased by 20.57% compared to the previous day. This is worse than the GRU sequence classification model, which predicted a lower probability of rain than on the previous day. Despite this, the accuracy of the XGBoost classification model using local IoT data was still 0.75. However, further observation data is desirable to confirm the accuracy when applied to locally collected IoT data. We also proposed a rainfall classification model, where the GRU model had a weighted F1 score of 0.88 and an accuracy of 0.88, and the XGBoost model had a weighted F1 score of 0.86 and an accuracy of 0.86. When tested with microcontroller data from local IoT devices, the GRU model performed better because it could use context to make more accurate predictions about the likelihood of rain. In future research, we plan to use more local IoT data and data from other weather sensors, such as UV and wind direction sensors.

Fig. 2 Descriptive statistics of raw data

Fig. 3 Distribution plot after pre-processing data

Fig. 4 Weighting value estimation for each class

Fig. 5 Display of the website-based application

Fig. 7 RMSE of GRU regressor over four days

TABLE V: THE PERFORMANCE OF THE GRU AND XGBOOST CLASSIFICATION MODELS

TABLE VI: RAIN PROBABILITY AND PRESENT WEATHER USING GRU AND XGBOOST