Performance Evaluation of the NASNet Convolutional Network in the Automatic Identification of COVID-19

Abstract: This paper evaluates the performance of the Neural Architecture Search Network (NASNet) in the automatic detection of COVID-19 (Coronavirus Disease 2019) from chest x-ray images. COVID-19 is a disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) whose symptoms include fever, cough, shortness of breath, muscle pain, sputum production, diarrhea, and sore throat. The virus spreads through the air and, at the time of writing, is expanding as a global pandemic. There is no vaccine, and the disease is fatal for approximately 2-7% of the infected population. Among the clinical and paraclinical characteristics of infected patients, nodules have been found that can be identified visually in chest x-ray images, which makes chest radiography a simple, rapid, and widely available method of identification. However, the rapid spread of the disease has created a shortage of specialized medical personnel capable of identifying it, which is why automated schemes are being developed. We propose tuning a NASNet-type convolutional model to automatically determine the initial state of a patient during triage or in the intervention protocols of health care centers. The neural network is trained with public images of patients confirmed to be infected with the virus and of healthy patients without infection. Performance is also evaluated on real images unknown to the neural model. As performance metrics, we use the categorical cross-entropy loss, the accuracy (or success rate), and the MSE (Mean Squared Error). The tuned model correctly classified the test images with an accuracy of 97%.


I. INTRODUCTION
Coronaviruses are RNA viruses that cause respiratory diseases of varying severity, from the common cold to deadly pneumonia [1], [2]. Chest x-rays have traditionally been a fast, inexpensive, widely available, and highly reliable tool for identifying pneumonia in patients. In these radiological images, it is possible to identify the pulmonary nodules characteristic of pneumonia, which has allowed the effective diagnosis of diseases caused by other coronaviruses, such as Severe Acute Respiratory Syndrome (SARS) [3] and Middle East Respiratory Syndrome (MERS) [4], [5], as well as of Acute Respiratory Distress Syndrome (ARDS) [6]. Published literature on COVID-19 indicates that this disease damages the lung parenchyma in a similar way to other coronavirus infections [7].
The World Health Organization (WHO) declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC) on 30 January 2020 and a pandemic on 11 March 2020 [8], [9]. Because the virus spreads through the air, is new and still poorly understood, and people can travel from one country to another, the disease has now been reported in much of the world (more than 60 countries as of March 1, 2020) [10].
The diagnosis of COVID-19 is made by Real-Time Polymerase Chain Reaction (RT-PCR), which detects the nucleotides of the virus [11]. In cases of low viral load, this test can produce false negatives, which is why it must be repeated according to protocols spanning several days [12]-[14]. It also requires taking samples and handling, processing, and transporting them. Owing to economic conditions in many countries, access to this test is very limited, and each test can take 24 hours or more to return a result [15]. In contrast, the equipment needed to take a chest x-ray is available in most medical centers worldwide, and the x-ray is often considered a routine test [16]. Besides the equipment itself, obtaining an x-ray image requires only access to electrical power and a radiology specialist to interpret it [17].
The clinical and paraclinical features of COVID-19 infection have been documented in research publications since early 2020 [11]. Among the indicators of infection, patients present abnormalities in chest CT (Computed Tomography) and x-ray images, most of them with bilateral involvement [18]. Patients admitted to intensive care typically show multiple bilateral lobular involvement in these images, whereas patients with less severe cases, who were not treated in intensive care, show bilateral involvement of lower intensity but with similar characteristics.
COVID-19 patients with pneumonia show specific patterns in chest x-ray images that serve to identify the presence of the virus. Unfortunately, these patterns are not easy to identify with the naked eye [14], [19], [20]. Specialized radiologists can distinguish COVID-19 in these images with high specificity but only moderate sensitivity [14]. How fast the new virus spreads depends strongly on how quickly and reliably infected patients can be identified (with low false-negative and false-positive rates). Authorities in each country are currently facing this problem in order to reduce the spread of the virus and, with it, the saturation of their medical facilities and the number of virus-related deaths [21].
Immediate solutions aim at adequate infection control, both for sick patients and through isolation measures for the general population. However, these measures must go hand in hand with tools that allow the timely detection of the disease, both to stop the spread of the virus and to ensure the care required by patients affected by COVID-19 [22], [23].
We propose the use of a neural model based on convolutional networks, trained specifically as an AI tool for the rapid and low-cost detection of individuals with COVID-19 [24]-[29]. For the model, we selected the NASNet deep network developed by Google Brain, owing to its high performance compared with architectures such as Inception-v2, Inception-v3, Xception, ResNet, and Inception-ResNet-v2 [30]-[32]. The model was optimized on a dataset of x-ray images taken from patients who tested positive for COVID-19 and from healthy people. Performance metrics computed on a set of images unknown to the model yielded results higher than those reported in the literature [33], [34].
The rest of the paper is organized as follows. Section II presents the problem formulation, the materials, and the method. Section III presents and discusses the results. Finally, Section IV presents our conclusions.

II. MATERIALS AND METHODS
Our research group has developed several classification models based on convolutional networks for use in assistive robots. Most of these applications require visually determining a user's condition that triggers a certain behavior in the robot. Previous research has shown the ability of our models to identify specific patterns in x-ray images. Given the great social and economic impact of COVID-19, particularly in societies with limited resources, we set out to develop a similar tool for the initial screening of patients with suspected infection. In principle, our image classification models could be used to develop such a tool.
In the specific case of COVID-19, diagnosis relies on RT-PCR, applied in protocols of two or three tests, to identify the nucleotides of the virus. However, this test has limited availability in developing and low-income countries and requires long intervals (one or more days) to deliver conclusive results. These logistical problems can escalate rapidly in places with limited isolation, where infected individuals without symptoms can push the infection rate to levels that medical facilities cannot manage.
In these cases, it would be desirable to have a system that works with the equipment already available in medical centers, since this would reduce investment and implementation time. It would also be desirable for this system to quickly and reliably distinguish patients infected with the virus from those who are not. This would allow medical staff not only to focus attention on infected patients but also to isolate them more quickly and reduce the rate at which the virus spreads.
The vast majority of medical centers in the world have x-ray equipment. X-rays have been used for medical diagnosis for over 100 years and, in one technological form or another, can be found in virtually every medical center. A chest x-ray can be performed in minutes, so a large number of patients can be examined quickly. The bottleneck, however, is the need for specialized medical personnel to interpret the images correctly. The number of such personnel is usually much lower than the number of x-ray images taken under normal conditions, and many more would be needed to diagnose COVID-19.
A high-performance automated system (in terms of classification capacity, response speed, and resource consumption) is therefore required to identify sick patients directly from the images. This system would take as input the chest x-ray images of the patients and determine which of two categories each image corresponds to: Normal (not sick with COVID-19) or Sick. The system would be built around a classification model trained on images of sick patients as well as of healthy people (Fig. 1). This model should identify the characteristics of the infection in the images and learn this information to classify unknown cases. The damage caused by the virus to patients' lungs has particular characteristics that are common to all patients regardless of age, size, or sex. For the model to learn these common characteristics, the dataset must include patients of all ages, sizes, and sexes: the greater the variability in the dataset images, the greater the ability of the model to identify the relevant features. The same applies to the position of the individual in the image. Locating the Region of Interest (ROI) with traditional image processing techniques could render most of the images captured in real life unusable; for this reason, the model must be robust to variations in position, which implies the use of convolutional networks.
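The two-category decision described above can be illustrated with a minimal pure-Python sketch. It assumes, as is usual for convolutional classifiers with one output node per class, that the two output nodes produce raw scores which a softmax converts into probabilities, and that the first node corresponds to Normal; the node ordering and the helper names are hypothetical and are not taken from the authors' code.

```python
import math

# Hypothetical ordering of the two output nodes (not stated in the paper).
CATEGORIES = ("Normal", "Sick")


def softmax(logits):
    """Convert raw output-node scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the predicted category and its probability for one image."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)  # arg-max node
    return CATEGORIES[idx], probs[idx]


label, p = classify([0.3, 2.1])  # made-up scores for a single image
print(label, round(p, 3))  # Sick 0.858
```

In a triage setting, the returned probability could also be compared against a stricter threshold than 0.5 before labeling an image Normal; the paper does not state which decision threshold was used.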
We propose the use of the Neural Architecture Search Network (NASNet) as the structure for the convolutional model. This deep network was introduced in early 2018 by the Google Brain team. In its design, the authors searched for a building block with high performance in the classification of a small image dataset (CIFAR-10) and then generalized this block to a larger dataset (ImageNet). As a result, the architecture achieves a high classification capacity with a reduced number of parameters (Fig. 2).

We compiled our dataset from public images published as a result of the rapid spread of the COVID-19 outbreak (Fig. 1). The dataset contains 240 chest x-rays corresponding to the same number of patients, half in the Normal category and half in the Sick category. The images come from two public repositories. The first contains x-ray images of patients who tested positive for COVID-19; it is under construction by Dr. Joseph Cohen, a postdoctoral fellow at the University of Montreal [35], and we excluded its MERS, SARS, and ARDS cases. This initial database was complemented with images taken from public articles, and we performed data augmentation to increase its size [36], [37]. The second repository, used for the Normal category, contains images of healthy children (and of children with pneumonia, which were used only in validation processes) [38]. These images were also filtered, and a subset of the same size as the resulting Sick category was selected.
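The class-balancing step (selecting a Normal subset of the same size as the Sick category) could look like the sketch below. It assumes uniform random subsampling with a fixed seed; the paper does not describe how the Normal images were filtered, the file names are placeholders, and the data augmentation applied to the Sick category is not shown.

```python
import random


def balance_by_subsampling(normal_paths, sick_paths, seed=0):
    """Randomly subsample both classes down to the size of the smaller one.

    Returns a list of (path, label) pairs labeled "Normal" or "Sick".
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    n = min(len(normal_paths), len(sick_paths))
    normal_sel = rng.sample(normal_paths, n)
    sick_sel = rng.sample(sick_paths, n)
    return [(p, "Normal") for p in normal_sel] + [(p, "Sick") for p in sick_sel]


# Hypothetical file lists: many healthy images, far fewer COVID-19 images.
normal = [f"normal_{i:04d}.png" for i in range(1000)]
sick = [f"sick_{i:03d}.png" for i in range(120)]
dataset = balance_by_subsampling(normal, sick)
print(len(dataset))  # 240
```

Subsampling the majority class keeps the two categories equally represented, so a trivial "always Normal" classifier cannot reach a high accuracy.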
No specific information on the age or sex of the patients is available for the images in the Sick category, but this absence of demographic filtering is favorable, as it increases the variability of the dataset and facilitates the identification of essential characteristics [39], [40].
We built a 771-layer NASNet network with two output nodes (one for each category) and 256x256x3 input nodes matching the size of the input images: three RGB (Red, Green, Blue) arrays of 256x256 pixels. All input images were scaled to 256x256 pixels to guarantee uniformity in the training dataset (Fig. 2). We did not preserve the aspect ratio of the images, because the features the network must identify are not altered when the aspect ratio changes; moreover, the real images the model will process come in different sizes. This network design has a total of 4,236,149 trainable parameters and 36,738 fixed (non-trainable) parameters. We randomly shuffled the images during training to improve the performance of the network. We normalized each RGB matrix from the range 0-255 to the range 0-1, which is the input range expected by the network. For training, we randomly split the dataset into two groups: the first, with 70% of the data, was used exclusively for training, and the second, with the remaining 30%, was used for model validation. As optimizer we used stochastic gradient descent, and as cost and performance metrics, categorical cross-entropy, accuracy, and MSE.
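A minimal NumPy sketch of the preprocessing described above follows: resizing to 256x256 without preserving the aspect ratio, scaling pixel values to 0-1, shuffling, and a 70/30 split. Nearest-neighbour resizing, the function names, and the synthetic images are assumptions of this sketch; the paper does not say which interpolation or library it used for resizing.

```python
import numpy as np


def resize_nearest(img, size=256):
    """Resize an H x W x 3 image to size x size by nearest-neighbour sampling.

    The aspect ratio is deliberately not preserved.
    """
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess(img):
    """Resize to 256x256x3 and scale pixel values from 0-255 to 0-1."""
    return resize_nearest(img).astype(np.float32) / 255.0


def shuffle_split(x, y, train_frac=0.7, seed=0):
    """Randomly shuffle the samples, then split them into training/validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    cut = int(round(train_frac * len(x)))
    train_idx, val_idx = order[:cut], order[cut:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]


# Ten synthetic 8-bit "x-rays" of two different sizes (placeholders only).
rng = np.random.default_rng(1)
sizes = [(300, 400), (512, 512)] * 5
raw = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8) for h, w in sizes]
x = np.stack([preprocess(im) for im in raw])
y = np.array([0, 1] * 5)  # hypothetical encoding: 0 = Normal, 1 = Sick
x_tr, y_tr, x_va, y_va = shuffle_split(x, y)
print(x.shape, x_tr.shape[0], x_va.shape[0])  # (10, 256, 256, 3) 7 3
```

Here the shuffle is done once, before splitting; in Keras, `Model.fit` additionally reshuffles the training data at every epoch by default (`shuffle=True`).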
The training code was developed in Python 3.7.3 with the Keras framework (using the TensorFlow backend), together with the scikit-learn 0.20 library.

III. RESULTS AND DISCUSSION
The final model was trained for ten epochs. To evaluate the training progress, we calculated at each epoch the categorical cross-entropy loss, the accuracy (or success rate), and the MSE (Mean Squared Error) (Fig. 4). The behavior of these metrics indicates not only the classification capacity and accuracy of the model but also its ability to recognize the patterns in new images, that is, its degree of overfitting.
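The three per-epoch metrics can be written out explicitly for one-hot targets and softmax outputs. The pure-Python sketch below follows the standard definitions and is not the Keras code that produced Fig. 4; the small clipping constant `EPS` is an assumption of this sketch that keeps the logarithm finite.

```python
import math

EPS = 1e-7  # clip probabilities away from 0 and 1 so log() stays finite


def categorical_cross_entropy(y_true, y_pred):
    """Mean over samples of -sum_k t_k * log(p_k), with one-hot targets t."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total -= sum(tk * math.log(min(max(pk, EPS), 1 - EPS)) for tk, pk in zip(t, p))
    return total / len(y_true)


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean over samples and output nodes of (t_k - p_k)^2."""
    sq = [(tk - pk) ** 2 for t, p in zip(y_true, y_pred) for tk, pk in zip(t, p)]
    return sum(sq) / len(sq)


# Three hypothetical validation images; columns are [Normal, Sick].
y_true = [[1, 0], [0, 1], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(round(categorical_cross_entropy(y_true, y_pred), 4))  # 0.4149
print(accuracy(y_true, y_pred))  # 0.6666666666666666
print(round(mean_squared_error(y_true, y_pred), 4))  # 0.1367
```

The third image is misclassified (0.6 for Normal), which both lowers the accuracy and contributes most of the cross-entropy and MSE; computing the same functions on the training and validation sets at each epoch gives the pairs of curves compared in Figs. 5 and 6.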
The behavior of the metrics during training can be seen in Figs. 5 and 6. Fig. 5 shows the categorical cross-entropy loss (or softmax loss), which measures how close the model outputs are to the true class: the higher the probability the model assigns to the true class, the lower the loss. Fig. 6 shows the accuracy of the model with the parameters obtained at each epoch, computed both on the training images and on images unknown to the model. In Fig. 5, the error decreases strongly and steadily until the fourth epoch, then continues to decrease very slowly until it saturates at a minimum. The error on the validation data decreases in a similar way; the two curves remain parallel, which indicates that there is no overfitting. Fig. 6 shows a similar behavior: the accuracy on the training data and on the validation data evolves similarly and saturates at around 0.97, which shows not only a high classification capacity but also that the model maintains this accuracy on new images.

The model performance was evaluated on the validation data through its confusion matrix, the precision, recall, and F1-score metrics, and the ROC (Receiver Operating Characteristic) curve. The confusion matrix (or error matrix, Fig. 7) is a table that shows how the model confuses the categories of the classified elements. The rows (left) give the category to which each element belongs, and the columns (top) give the category in which it was classified. Ideally, every element is classified in the category to which it belongs, that is, all elements lie on the diagonal of the matrix. To the right of the matrix, a color scale assigns light colors to the highest concentrations, which in our model lie on the diagonal. Only two images from the Sick category were wrongly classified as Normal, and only one image from the Normal category was classified as Sick.
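A confusion matrix laid out as in Fig. 7 (rows: true category, columns: predicted category) can be built as in the sketch below. The label lists are hypothetical: they reproduce the error pattern reported above (two Sick images predicted as Normal, one Normal image predicted as Sick) but not the actual size or composition of the validation set.

```python
def confusion_matrix(true_labels, pred_labels, categories=("Normal", "Sick")):
    """Count elements per (true category, predicted category) pair.

    Row i is the true category, column j the predicted category.
    """
    index = {c: i for i, c in enumerate(categories)}
    m = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, pred_labels):
        m[index[t]][index[p]] += 1
    return m


# Hypothetical labels for 20 images with the error pattern described above.
true = ["Sick"] * 10 + ["Normal"] * 10
pred = ["Normal"] * 2 + ["Sick"] * 8 + ["Sick"] * 1 + ["Normal"] * 9
m = confusion_matrix(true, pred)
for name, row in zip(("Normal", "Sick"), m):
    print(f"{name:>6}: {row}")
```

The correctly classified images lie on the diagonal (`m[0][0]` and `m[1][1]`). The paper lists scikit-learn among its libraries; its `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...)` uses the same row and column convention.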
Precision corresponds to the percentage of correct hits among all elements classified in a category, and recall (or sensitivity) corresponds to the percentage of correct hits among all elements that belong to the category. The F1-score (also F-score or F-measure) is the harmonic mean of precision and recall. All these metrics have a value of 97% for our model, which indicates an excellent classification capacity. The behavior of the loss and accuracy curves shows that the model is well tuned and shows no signs of overfitting. However, the number of images in the Sick category is still too low to consider the model final. Even so, the metrics allow us to rate its performance as high for the automatic identification of infected persons. According to these results, it is feasible to develop an autonomous system capable of processing images in real time in the triage stages of medical centers. The model must be retrained and adjusted as new images of people infected with this virus become available.
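Precision, recall, and F1-score follow directly from the confusion matrix. The sketch below computes them per category; the matrix `[[9, 1], [2, 8]]` in the example is hypothetical and is not the paper's validation result, so the resulting values differ from the reported 97%.

```python
def per_class_scores(matrix, categories=("Normal", "Sick")):
    """Precision, recall and F1-score per category.

    `matrix[i][j]` counts elements of true category i classified as category j
    (rows: true category, columns: predicted category).
    """
    k = len(categories)
    scores = {}
    for i, name in enumerate(categories):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(k))  # column total
        actual = sum(matrix[i])  # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[name] = (precision, recall, f1)
    return scores


matrix = [[9, 1], [2, 8]]  # hypothetical counts, not the paper's results
for name, (p, r, f1) in per_class_scores(matrix).items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

For the Sick category, recall is the fraction of infected patients the model detects, which is the quantity that matters most for triage. scikit-learn's `sklearn.metrics.classification_report` produces the same per-class table from label arrays.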

IV. CONCLUSION
In this paper, we evaluated the performance of a NASNet (Neural Architecture Search Network) in classifying chest x-ray images to identify the presence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in patients. We assume that the virus damages the lungs of infected patients and that this damage can be identified visually on chest x-rays. The evaluation of the neural model arises from the need for a low-cost automated tool capable of speeding up care in medical centers and reducing the spread of the virus. We selected the NASNet architecture because of the high performance documented in the literature for similar applications and its previous use by the research group in other tasks. The model is a deep network of 771 layers with 4,236,149 adjustable parameters. These parameters were adjusted using the categorical cross-entropy loss and the MSE (Mean Squared Error) as error functions. The model was tuned to an F1-score of 97%. The tuned model can detect patients with COVID-19 and differentiate them from healthy patients in real time with an overall accuracy of 97%, higher than the values reported in the published literature [18], [25]-[27]. The results of the evaluation indicate that the model can support the diagnosis of the disease and could be used to develop an automated detection system. Such a system should use a model with a similar structure but trained on a larger number of images of infected people, with its parameters adjusted to achieve equivalent or superior performance.