Development of Mobile Face Verification Based on Locally Normalized Gabor Wavelets

Abstract: In this paper, we present a mobile face verification framework for automated attendance monitoring as a solution for more efficient, portable and cost-effective attendance monitoring systems. We use a Raspberry Pi as a mobile embedded input module connecting the webcam and radio frequency identification (RFID) reader to the personal computer (PC), which provides mobility due to its light weight and wireless connectivity. In order to increase the reliability of the system, we incorporate a face verification method which employs locally normalized Gabor Wavelets as the features for the dual verification stage. We evaluate the accuracy and processing time of the proposed face verification method. We found that it produces good accuracy under a limited reference sample constraint and a fast response for a small number of gallery images. The proposed method delivers 97%, 99.8% and 95.3% accuracy for the AR, YALE B and FERET datasets respectively. In terms of processing speed, the proposed method classifies a single image against 500 gallery images in 1.909 seconds. The system delivers fast verification with high accuracy under the constraint of just a single reference sample, which increases the reliability of the proposed system.


I. INTRODUCTION
At Universiti Teknologi MARA (UiTM), the process of taking students' attendance still relies on a sheet of paper that students sign for every class, lab session, workshop and even outdoor program. This method is not reliable since there is a high risk of losing the data. There is also a high possibility of falsification of attendance by the students. Thus it is important to improve the functionality of the attendance management system. Furthermore, a significant increase in the number of the university's students in recent years makes it necessary to improve on the traditional way of monitoring attendance in lectures and students' programs. Additionally, the integrity of students' attendance determined solely by matric cards in existing systems such as the Easy Access Attendance Management System (EAMS) [1] could be improved further. The main concern regarding EAMS is that a student can falsely register attendance on someone else's behalf by simply flashing that person's matric card in front of the scanner. Another issue tackled in this work is that most existing attendance systems in the market are not mobile enough to be readily deployed when needed, e.g. for outdoor activities.
Thus, we believe that we can address these problems with an embedded mobile attendance system equipped with a dual verification strategy which utilizes both a smart card and a biometric feature such as the face during the registration process for better mobility, accuracy, and reliability. In light of this proposed solution, we are aware that fingerprint, face, and iris are among the biometric properties that are commonly used for person recognition. However, face recognition attracted our interest due to its accuracy and non-obtrusiveness when used for active person recognition [2], [3]. Additionally, face recognition is the most natural form of biological feature recognition according to the cognitive rule of human beings. Face recognition also offers the following advantages over fingerprint recognition: fast identification, high security, and contactless, hygienic operation. However, the real challenge lies in the limited number of reference faces available for a good estimation of the subject identity. In this case, each student may have only a single image in the university database as a reference. This problem is known as the Single Sample per Person (SSPP) problem [4]-[6], and one of the solutions for this problem is the local face recognition approach [7], [8], which is adopted in this work.
We are confident that our proposed solution improves on existing methods with regard to effectiveness and accuracy through dual verification, required resources and cost, as well as ease of use and deployment. Ultimately, the system can be used not only in UiTM but also in other universities or training centers. As a local product, it would also benefit the growth of the local economy. This solution can also help ensure that education in the form of lectures and training is well delivered, and that the target audience fully utilizes the opportunity presented to them.
Previously, local strategies have been adopted to overcome the SSPP problem. These strategies involve partitioning faces into blocks, after which the local patches (LP) are classified using ensembles of classifiers by computing non-metric similarities between LPs of training samples. Martinez has produced three significant works on face recognition under SSPP constraints using probabilistic matching and motion estimation [9]. The work in [8] extends the local probabilistic approach of Martinez using Self-Organizing Maps (SOM), proposing both to train a single SOM for all the samples and to train a separate SOM for each class. In addition, [10] adapts a generic discriminant model to discriminate between persons under SSPP constraints using the Adaptive Generic Learning (AGL) method. Recently, Sparsity Preserving Discriminant Analysis (SPDA) was proposed to deal with multi-sample [11] and SSPP face recognition [12]. Another recent work dealing with the SSPP problem is a method called Discriminative Multimanifold Analysis (DMMA) [13]. More recently, [14] proposed another SSPP-based face recognition method called Double Linear Regression (DLR).
On the other hand, a local approach alone cannot produce good classification results if the features used preserve spatial locality poorly and have inadequate discriminative ability. In previous years, Gabor Wavelets (GW) have been identified as one of the best face descriptors for face recognition, and this is largely attributed to GW's biologically relevant kernel that effectively represents facial features [15]-[21]. GW preserve the inherent spatial locality by employing a kernel that closely resembles the receptive fields of human cortical cells, and this preservation of spatial locality is a vital characteristic of an excellent face descriptor. GW features can help preserve optimal intra-class and inter-class separation since the computed features are optimally localized in both the space and frequency domains.
Among previous well-known methods of face recognition based on local GW approach are Local Gabor Binary Pattern Histogram Sequence (LGBPHS) [22] and Hierarchical Ensemble Classifier (HEC) [23] where HEC is implemented using a weighted fusion of Local Gabor Feature Vector (LGFV) and global Fourier transform. Additionally, Gabor features are independently classified using ensembles of Borda count in Local Matching Gabor (LMG) method [24].
Recently, LMG has been improved using an entropy-like weighting strategy and a Local Normalization (LN) approach. Gabor wavelets have also previously been applied successfully to handwritten numeral recognition [25].

A. Locally Normalized Gabor Wavelets Features
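In this paper, we adopt the Local Gabor Feature Vector (LGFV) method similar to the implementation of HEC, but with the fusion step dropped and an LN process added to improve the classification accuracy by reducing the effect of illumination variations in the image. We use GW to extract features from local square patches of the face image and form a group of feature vectors by combining the features sharing similar spatial information (lateral patches). The images are locally normalized before computing the GW. Thus the proposed method is referred to onwards as the LGFV-LN method. GW are computed using the Gabor kernel $\psi_{\mu,\nu}(z)$. Given that $z = (x, y)$ is the pixel, $\mu$ is the orientation, $\nu$ is the scale, $f$ is the step in frequency and $k_{max}$ is the maximum frequency, GW can be computed using Equations 1 and 2:

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp\left(i\, k_{\mu,\nu} \cdot z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right] \quad (1)$$

$$k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}}, \quad k_{\nu} = \frac{k_{max}}{f^{\nu}}, \quad \phi_{\mu} = \frac{\pi\mu}{8} \quad (2)$$

where $\sigma$ sets the width of the Gaussian envelope relative to the wavelength. We use 8 orientations and 5 scales, forming 40 GW of different scales and orientations. In order to obtain the Gabor feature image $G_{\mu,\nu}(z)$, the image $I(z)$ and the GW kernel are convolved such that $G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z)$. Since small displacements can linearly affect Gabor phases, we use only the Gabor magnitudes. Thus using Equation 3, we can calculate the magnitude $M_{\mu,\nu}(z)$:

$$M_{\mu,\nu}(z) = \sqrt{\operatorname{Re}\left(G_{\mu,\nu}(z)\right)^2 + \operatorname{Im}\left(G_{\mu,\nu}(z)\right)^2} \quad (3)$$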
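As an illustration of the kernel bank and magnitude computation described above, the following is a minimal NumPy sketch. It assumes the commonly used form of the Gabor kernel; the kernel size and the default values of `k_max`, `f` and `sigma` are assumptions chosen for illustration and are not taken from this paper. The helper names (`gabor_kernel`, `convolve_same`, `gabor_magnitudes`) are hypothetical.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Complex Gabor kernel for orientation index mu and scale index nu.
    size, k_max, f and sigma are assumed defaults, not values from the paper."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = k_max / f ** nu                    # radial frequency of this scale
    phi = np.pi * mu / 8                   # orientation angle (8 orientations)
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    # Complex carrier minus a constant term that compensates the DC response.
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def convolve_same(image, kernel):
    """Linear 2-D convolution via zero-padded FFT, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, s=shape) * np.fft.fft2(kernel, s=shape))
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]

def gabor_magnitudes(image, n_orient=8, n_scale=5, size=31):
    """Stack of magnitude responses, one per (scale, orientation) pair."""
    image = np.asarray(image, dtype=np.float64)
    responses = [
        np.abs(convolve_same(image, gabor_kernel(mu, nu, size)))
        for nu in range(n_scale)
        for mu in range(n_orient)
    ]
    return np.stack(responses)
```

With 8 orientations and 5 scales, `gabor_magnitudes` returns 40 magnitude images of the same size as the input. Only the magnitude is kept, in line with the remark above that Gabor phases are sensitive to small displacements.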
Inspired by previous work, we propose the use of the LN image $I_{LN}$ as the input for the convolution with the Gabor kernel instead of the original image $I$. $I_{LN}$ can be obtained from

$$I_{LN}(x, y) = \frac{I(x, y) - m_I(x, y)}{\sigma_I(x, y)} \quad (4)$$

where $m_I(x, y)$ denotes the mean of a neighborhood around the pixel $(x, y)$ and $\sigma_I(x, y)$ is the standard deviation of the neighborhood [26]. Subsequently, we partition the acquired Gabor Image (GI) into square LPs. The overall process of LGFV-LN feature acquisition is illustrated in Fig. 1.
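The local normalization and patch partitioning steps can be sketched as follows. This is a minimal NumPy illustration, assuming a square neighborhood with reflected borders and non-overlapping patches; the neighborhood radius, the patch size and the small `eps` guard against division by zero are hypothetical choices, not parameters reported in this paper.

```python
import numpy as np

def local_normalize(image, radius=3, eps=1e-6):
    """Subtract the neighborhood mean and divide by the neighborhood
    standard deviation at every pixel (radius and eps are assumed values)."""
    image = np.asarray(image, dtype=np.float64)
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="reflect")
    # One (size x size) window per pixel of the original image.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    return (image - mean) / (std + eps)

def split_into_patches(feature_image, patch=8):
    """Partition a 2-D array into non-overlapping square patches and
    flatten each patch into a vector (rows that do not fit are cropped)."""
    h, w = feature_image.shape
    h, w = h - h % patch, w - w % patch
    blocks = feature_image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
```

Because the mean is subtracted and the result is divided by the standard deviation, a local gain or offset change in brightness largely cancels out, which is the motivation for applying LN before the Gabor convolution.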

B. Face Classification
For face classification, we adopt the local ensemble strategy of the k nearest neighbor method called soft NN, where a confidence vector is calculated for each local feature and all these confidences are then combined by sum aggregation [27]. However, in this work, we use the Cosine similarity metric as opposed to Euclidean distance, since some recent works reported good results using Cosine similarity, especially when used with GW features. Given two vectors of attributes $A$ and $B$, where $A_i$ and $B_i$ are the components of vectors $A$ and $B$ respectively, the cosine similarity can be represented using a dot product and magnitudes as given in Equation 5:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2}\,\sqrt{\sum_{i} B_i^2}} \quad (5)$$
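A minimal sketch of patch-wise cosine matching with sum aggregation is given below. It assumes that the raw cosine similarity of each local patch is used directly as that patch's confidence; the exact confidence computation of soft NN in [27] may differ. The function names and array layouts are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two feature vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def rank_gallery(probe_patches, gallery_patches, eps=1e-12):
    """Rank gallery images by the sum of per-patch cosine similarities.

    probe_patches:   array of shape (P, D), one feature vector per patch
    gallery_patches: array of shape (G, P, D), same patch layout per image
    Returns (order, scores), where order lists gallery indices best-first.
    """
    p = probe_patches / (np.linalg.norm(probe_patches, axis=1, keepdims=True) + eps)
    g = gallery_patches / (np.linalg.norm(gallery_patches, axis=2, keepdims=True) + eps)
    per_patch = np.einsum("pd,gpd->gp", p, g)  # similarity per gallery image and patch
    scores = per_patch.sum(axis=1)             # sum aggregation over patches
    return np.argsort(-scores), scores
```

Because each patch is compared only with the patch at the same position in the gallery image, an occluded or badly lit region lowers only its own contribution to the aggregated score.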

C. Mobile Face Verification System
We use a Raspberry Pi as the mobile embedded input module in the main system framework, where we connect the webcam and RFID reader to the PC wirelessly. This provides the much needed mobility and portability due to its light weight and wireless connectivity. Fig. 2 and 3 show the overall framework of the mobile face verification system and the actual implementation of the mobile input module respectively. The Raspberry Pi is used to acquire the facial image and ID from the person before transmitting them for further processing at a server PC. In this case, the Raspberry Pi and the server are connected to the same Access Point (AP). The process of verifying a person is given in Fig. 4 and can be elaborated as follows:
• The subject's ID is acquired as the smart card is flashed on the RFID reader. Subsequently, the subject's face is acquired from a webcam connected to the Raspberry Pi using the Viola-Jones face detection algorithm [28].
• The acquired face image and ID are then transmitted to a server PC.
• At the server, the ID is checked against the database entries to verify its existence. If the ID exists, the following procedures continue. Otherwise, the verification returns a negative result.
• The image sent to the server is processed further to obtain the LGFV-LN features. The LGFV-LN features for all candidate images in the database are computed beforehand to speed up the classification process.
• The soft NN classifier is used to find the top-n nearest candidates, after which the IDs of the top-n candidates are compared against the subject's ID.
• If one of the top-n candidates' IDs matches the subject's ID, the verification returns a positive result. Otherwise, the verification returns a negative result.
• The verification result is sent wirelessly to the Raspberry Pi.
Fig. 4 The process of verifying a person using the proposed mobile face verification system
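The server-side decision in the steps above can be sketched in a few lines of Python. The function name, argument layout and the default `top_n=3` are illustrative assumptions; `ranked_ids` stands for the candidate IDs returned by the classifier, ordered best-first.

```python
def verify(subject_id, ranked_ids, enrolled_ids, top_n=3):
    """Dual verification: the card ID must exist in the database AND
    appear among the top-n face matches returned by the classifier."""
    if subject_id not in enrolled_ids:
        return False  # unknown ID: reject before any face matching
    return subject_id in ranked_ids[:top_n]
```

For example, `verify("A", ["B", "A", "C", "D"], {"A", "B", "C", "D"})` accepts the subject because "A" is the second-ranked candidate, whereas a claimed ID of "D" would be rejected with `top_n=3`.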

III. RESULTS AND DISCUSSION
For the experiments, three datasets, namely AR, YALE B, and FERET, are used. The AR dataset [29] contains several types of variations such as different illumination conditions, expressions, and partial occlusions. Out of 26 images for each subject, 1 image contains a neutral expression (used as the gallery image), and the remaining 25 are used as probes: 7 images contain expressions, 6 images contain illumination variation, 6 images depict the person wearing glasses, and the other 6 images depict the person wearing a scarf. The Yale B dataset [30] contains images with different illumination conditions covering a wide range of light directions with respect to the camera axis, defined by the azimuth and elevation angles. Fig. 7 shows images from the FERET dataset.

A. Results for All Tested Datasets
Several methods, including the proposed method, are tested with the datasets, and the results are given in Table 1. The PI method denotes local face recognition using only image pixel intensities, without application of GW, while the LGFV method uses the local GW approach but without local normalization applied. The LN method uses only local normalization without GW applied. Best results in the table are shown in bold. The Top-1 face recognition accuracy in Table 1 suggests that LGFV-LN delivers the best result for all tested datasets. For a fair comparison, all methods use Cosine similarity, and it can be observed that the classifier fails for fb, fc, dupI and dupII when used with PI and LN. This is due to a problem with the computation of cosine values since the vectors' values were too small. In the other datasets, GW features clearly show their descriptive superiority over non-GW methods such as PI and LN. For example, in the scarf dataset, LGFV-LN delivers 96.5% accuracy while PI and LN deliver 58% and 52.5% accuracy respectively. Moreover, with the use of local normalization, LGFV-LN produces better results than the LGFV method.
Table 2 shows the performance of LGFV-LN with 3 different types of classifiers, namely soft NN with Cosine similarity, soft NN with Euclidean distance, and soft SVM, where ensembles of SVM classifiers are used. Based on the results in the table, the best performance of LGFV-LN is obtained when Cosine similarity is used rather than Euclidean distance or soft SVM. For the FERET datasets, soft SVM failed completely since the memory required for computation was too high.

C. Comparison between LGFV-LN and Existing Methods
As a benchmark, several existing methods such as Local Binary Patterns (LBP), Uniform Pursuit (UP), Local SOM and DMMA are compared against LGFV-LN. According to Table 3, LGFV-LN delivers the highest Top-1 accuracy for all tested datasets except the fb dataset, where DMMA produces 98.1% accuracy, which is slightly higher than LGFV-LN's 97.2% accuracy. Other than that, LGFV-LN outperforms all benchmarked methods.

D. Top-n Accuracy for LGFV-LN
As a verification system, it is important to consider not only the Top-1 candidate but also several other candidates ranked just after the top match. In this case, we show the Top-n accuracy for our proposed LGFV-LN method where n ranges from 1 to 10. By taking the first n matches, we can ensure a higher rate of true positives while keeping false positives as low as possible by relying on the descriptive ability of LGFV-LN features. Based on the result in Fig. 8, we can deduce that n = 3 is suitable for our implementation since it produces optimal performance. For n = 3, the accuracy for AR, YALE B, and FERET is 97%, 99.8%, and 95.3% respectively. Increasing the value of n further will increase the accuracy slightly, but it will ultimately increase processing time as well as the rate of false positives.
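For reference, Top-n accuracy can be computed from ranked classifier output as in the following sketch. The array layout (one row of best-first gallery IDs per probe) and the function name are assumptions made for illustration.

```python
import numpy as np

def top_n_accuracy(rankings, true_ids, n):
    """Fraction of probes whose true ID is among the first n ranked IDs.

    rankings: (num_probes, num_gallery) gallery IDs sorted best-first
    true_ids: (num_probes,) ground-truth ID of each probe
    """
    rankings = np.asarray(rankings)
    true_ids = np.asarray(true_ids)
    hits = (rankings[:, :n] == true_ids[:, None]).any(axis=1)
    return float(hits.mean())
```

By construction the value cannot decrease as n grows, which matches the observation that a larger n raises accuracy slightly while also admitting more candidates.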
The face verification performance of LGFV-LN can be illustrated with a Receiver Operating Characteristics (ROC) curve for ease of understanding. The performance is measured by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) for PI, LGFV-LN (Cosine) and LGFV-LN (Euclidean). Each dataset used earlier is divided into two parts, namely the non-impostors and the impostors. The non-impostors are the probes whose reference images are in the gallery, while the impostors are the probes whose reference images were removed from the gallery and to which random IDs were assigned. The ROC curves were obtained by recording the TPR and FPR from the classification of faces against the top-n candidates, where n is incrementally changed from 1 to 50 (for YALE B, the maximum n used is 19). The ratio of impostors to non-impostors is expressed as a 1-to-m ratio, where several values of m are used in this paper. The average TPR vs. FPR obtained for the AR, FERET and YALE B datasets from repeating the experiments with different values of n and m is shown in Fig. 9. The performance measure in terms of TPR and FPR is very important in a face verification system since we want to limit the access that any impostor has. In this case, we wish to reduce the chance of falsification in students' attendance records.
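One point of such a ROC curve can be computed as in the sketch below, which assumes that a probe is accepted when its claimed ID appears among the top-n ranked candidates. The function name and input layout are hypothetical, and both the impostor and non-impostor groups are assumed to be non-empty.

```python
import numpy as np

def tpr_fpr(rankings, claimed_ids, is_impostor, n):
    """TPR and FPR of top-n verification.

    rankings:    (num_probes, num_gallery) gallery IDs sorted best-first
    claimed_ids: (num_probes,) ID presented with each probe
    is_impostor: (num_probes,) True where the claimed ID is not the probe's own
    """
    rankings = np.asarray(rankings)
    claimed = np.asarray(claimed_ids)
    impostor = np.asarray(is_impostor, dtype=bool)
    accepted = (rankings[:, :n] == claimed[:, None]).any(axis=1)
    tpr = float(accepted[~impostor].mean())  # genuine probes accepted
    fpr = float(accepted[impostor].mean())   # impostor probes accepted
    return tpr, fpr
```

Calling `tpr_fpr` for each n in the tested range yields the sequence of (FPR, TPR) points that trace the curve.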

F. Performance of LGFV-LN in Terms of Processing Speed
There is a trade-off between accuracy and speed, even though the LGFV-LN approach produces higher accuracy than other approaches. To investigate the feasibility of the PI, LGFV-LN (Cosine) and LGFV-LN (Euclidean) approaches, the processing time required by these methods is examined. This experiment is carried out on a computer with a 3.80 GHz quad-core processor and 4 GB of RAM. For comparison, the processing time only includes the time required to perform classification. We use one-to-many classification for face recognition, where a single face is classified against a gallery containing multiple images. The result is shown in Fig. 10 for classification of 1 face against up to 1000 gallery images.
Based on the result presented in Fig. 10, the PI approach is faster than both LGFV-LN approaches: PI requires only 0.045 seconds to classify a face against 500 images, which is only a fraction of the time required by LGFV-LN (Euclidean) and LGFV-LN (Cosine) at 0.888 and 1.909 seconds respectively. Based on this finding, for real-time applications, we recommend the LGFV-LN approach when classifying faces against a small number of gallery images. For a larger gallery, another approach such as PI would be more suitable. Another factor that influences the choice of approach is the variation in the faces to be classified. If local variations are involved, it is better to use the LGFV approach. Additionally, if processing speed is critical, we recommend the LGFV-LN (Euclidean) approach since it is faster than LGFV-LN (Cosine), at some cost in accuracy.

IV. CONCLUSION
A framework for a mobile face verification system to be used as an attendance monitoring system in UiTM has been proposed in this paper. The face verification method is based on locally normalized Gabor Wavelets features. We proposed the use of a Raspberry Pi as a mobile embedded input module connecting the webcam and RFID reader to the PC wirelessly. We also explained the process of person verification adopted in this system. For the face recognition stage, the proposed Local Gabor Feature Vector with Local Normalization (LGFV-LN) method outperforms almost all benchmarked methods in all tested datasets under Single Sample Per Person (SSPP) constraints. The Cosine similarity measure, when used with the soft NN classifier, also delivered the best performance for classification of LGFV-LN features. We also proposed that the verification system should consider up to 3 top matches for optimal performance. The Top-3 accuracy of LGFV-LN (Cosine) for the AR, YALE B, and FERET datasets is found to be 97%, 99.8% and 95.3% respectively. We also found that LGFV-LN (Cosine) produces the best ROC curve, but the LGFV-LN approach falls short in processing speed. As for future work, we can improve the proposed method by systematically reducing the number of LGFV-LN features for better processing speed and accuracy. Additionally, we are going to test whether the proposed method can be implemented to perform face recognition on a mobile platform, especially on mobile smart phones, rather than on a remote PC.