Accelerator-Based Human Activity Recognition Using Voting Technique with NBTree and MLP Classifiers

Abstract: In evolving ubiquitous computing systems, accelerometer-based human activity recognition has huge potential in a large number of application domains. Accelerometer-based human activity recognition aims to identify the physical activities a person performs using an accelerometer. This is a sensor device attached to the body that returns a real-valued estimate of acceleration along the x-, y- and z-axes, from which the sensor location can be estimated. In this study, we propose an accelerometer-based activity recognition model that uses a voting technique. Two machine learning classifiers, Naïve Bayes Tree (NBTree) and Multilayer Perceptron (MLP), serve as the ensemble classifiers in the voting technique. To evaluate the proposed technique, we first examined the performance of the selected individual classifiers and of an existing voting technique, and then measured the performance of the proposed model. All experiments were performed on a standard dataset, Wireless Sensor Data Mining (WISDM), covering six physical human activities: jogging, walking, walking upstairs, walking downstairs, sitting and standing still. The results show that the proposed voting technique with NBTree and MLP ensemble classifiers outperformed the individual classifiers and a previously proposed voting technique for accelerometer-based human activity recognition.


I. INTRODUCTION
Human activity recognition (HAR) is an important yet challenging area of machine learning research, with many applications in healthcare, smart environments, and homeland security. HAR is valuable for human-centred studies whose objective is to improve people's quality of life. HAR systems are now used in many applications. Examples include the continuous monitoring of patients with motor problems for health diagnosis and medication tailoring [1], and the automated surveillance of public places for crime prevention [2].
Data play an important role in human activity recognition because they provide the patterns used to predict human activity. Two approaches to collecting data for activity recognition have been researched extensively. The first is based on environmental sensors, such as closed-circuit television cameras, which track location, interaction with objects, and motion. The second uses sensors attached to the human body (wearable sensors) to track the acceleration of specific body parts as well as of the body as a whole. Both approaches have shown remarkable achievements in constrained laboratory settings [3]. Environmental and wearable sensors have recently been used actively for HAR [4]. Digital video cameras, microphones, global positioning system (GPS) receivers, and sensors for measuring similarity or dissimilarity, body motion and vital signs are just a few examples.
Environmental sensors are generally larger and more expensive. User privacy is also a critical issue, so cameras can be used only in selected locations. Furthermore, such sensors must be physically wired and have their batteries maintained. Setting up and maintaining the system is therefore costly. Wearable sensors, on the other hand, require two or more devices and need user effort to wear and maintain them; otherwise, the sensors cannot collect the required data. Fortunately, recent developments in wearable sensing technologies, such as inertial and vital-sign sensors, offer minimally invasive alternatives for HAR [5].
The current generation of smart phones makes an ideal wearable sensor because a phone includes a variety of sensors. These include image-capturing sensors (camera), GPS sensors, proximity sensors, light sensors, inertial sensors (accelerometers and gyroscopes), and direction sensors (compass). An accelerometer is a sensor that measures acceleration forces. Such forces may be static, like the continuous force of gravity. As is the case with many mobile devices, they may also be dynamic, in which case the sensor detects movement or vibration, as shown in Fig. 1.
These accelerometers can detect the orientation of the device (helped by the fact that they can detect the direction of Earth's gravity), which provides useful information for activity recognition. Accelerometers were initially included in these devices to support advanced game play and to enable automatic screen rotation, but they clearly have many other applications. In fact, many useful applications can be built if accelerometers can be used to recognize a user's activity. For example, we can automatically monitor a user's activity level and generate daily, weekly, and monthly activity reports, which could be emailed to the user automatically. These reports would indicate an overall activity level. That level could be used to gauge whether the user is getting an adequate amount of exercise and to estimate the number of calories expended daily. Such reports could encourage healthy practices and might alert some users to how sedentary they or their children actually are. The activity information can also be used to customize the behaviour of the mobile phone automatically. For example, music could be selected to match the activity (e.g., "upbeat" music when the user is running), or calls could be sent directly to voicemail when the user is exercising. There are undoubtedly numerous other instances where it would be helpful to modify the behaviour of the phone based on the user's activity, and we expect many such applications to become available over the next decade.
In this research, we explored the use of smart phone accelerometers for recognizing six identified activities. We did so by developing an accelerometer-based human activity recognition model that uses a voting technique with two ensemble classifiers: NBTree and MLP. Many activity recognition techniques have been proposed in previous research. Still, no technique can yet be claimed to be the best, because researchers normally use their own experimental datasets instead of publicly available ones. Our research objective is to compare the accuracy of our model with the accuracy reported in the previous study [6]. The comparison is meaningful because we applied our model to the same standard public dataset used by [6], and we repeated their experiment to reproduce their claimed accuracy.
Our aim was to produce a model that performs better than the model suggested in [6], so we ran experiments and compared our results with theirs. For evaluation, we used a publicly available dataset, Wireless Sensor Data Mining (WISDM) [7]. The results show that our model achieved higher accuracy than the previous voting technique. Many human activity recognition approaches have been proposed in the literature. Many machine learning techniques have been widely used in human activity recognition studies [3][8][9][10][11]. Examples are Support Vector Machines, Decision Trees, Hidden Markov Models, K-Nearest Neighbour and Conditional Random Fields.
Some activity recognition work has focused on using more than one accelerometer, possibly combined with other sensors. For example, [12] developed an automatic physical activity recognition system for a controlled environment using accelerometers and microphones. As another example, [13] proposed a wavelet-based method for classifying activities with one or more accelerometers. In that method, the dynamic motion component was separated from the gravity component, and the method achieved 98.4% accuracy. Meanwhile, [14] used five accelerometers to collect data from 31 subjects and built a hierarchical classification model to identify different body postures and movements.
Human daily activities can also be monitored by wearable sensors, which have proven to be another effective source of data for HAR. An experiment in [15] used five biaxial accelerometers to recognize twenty different activities, ranging from walking to folding laundry to strength training. The accelerometers were placed on the left bicep, right wrist, left quadriceps, right ankle and right hip. Their results showed that accurate body-motion information depends on placing the accelerometers in the correct locations. The study in [16] used a wearable sensor to collect acceleration data for human activity recognition and reported 94% accuracy.
Studies that use smart phones to collect data for activity recognition have also been reported. As mentioned earlier, the current generation of smart phones makes an ideal wearable sensor because a phone includes a variety of sensors. The study in [17] used an Android-based smart phone to recognize basic activities such as walking, jogging, climbing up and down stairs, sitting and standing still. Meanwhile, [18] proposed a subject-dependent, real-time activity recognition system using the Nokia N95 smart phone. Human activities can be divided into simple and complex activities based on the complexity of the recognition task [19]. Using data collected from smart phones, [19] reported that an MLP classifier achieved 93% accuracy for recognizing simple human activities, but only 50% for complex activities.
An activity that involves one person and lasts only a few seconds can be categorized as a simple activity. Examples of simple activities are running, walking and jogging. These contain little outlier noise and few variations from combined activities. Complex activities, by contrast, are less repetitive than simple activities and may involve several movements, such as talking while cooking, typing while sitting, smoking and giving a talk. Complex human activity recognition can provide the data needed to build automated recognition systems for preventing, curing, and improving the wellness and health conditions of older adults. Findings from [3] show that a smart phone can recognize simple as well as complex activities. However, extra sensors should be considered for better prediction of complex activities. Research shows that combining two or more machine learning models can achieve high accuracy in recognizing human activities. In [19], several experiments were conducted with four users performing six complex activities: slow walk, fast walk, run, walking upstairs, walking downstairs and dancing. Their recognition model achieved 91.15% accuracy. In that experiment, three classifiers (MLP, SVM and LogitBoost) were combined for the in-pocket phone position. The results showed that the combination of MLP, Random Forest (RF) and Simple Logistic classifiers performed best for the in-pocket phone position. They reported that the suitable combination rule was not majority voting but the average of probabilities. Meanwhile, [6] proposed an ensemble model that combines three classification algorithms with the average of probabilities combination rule. The three algorithms are Decision Tree (C4.5), Multi-Layer Perceptron (MLP) and Logistic Regression.
The results showed that the proposed combination performed better than the previous MLP-based recognition approach. Their dataset included information from thirty-six users, and 43 features were used in the experiments. We therefore aimed to compare our results with the performance reported in [6].
A voting (vote) classifier combines more than one classifier into an ensemble. The approach is based on plurality or majority voting, in which each individual classifier contributes a single vote [20]. The votes are totalled, and the class with the most votes becomes the final prediction. The advantage of voting is that the classifiers are unlikely to all make the same mistake. As long as each error is made by only a minority of the classifiers, the ensemble still produces the correct classification. The rest of this paper is organized as follows. Section II describes the methods used in this study. Section III presents and discusses the results, and Section IV concludes the paper.
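The plurality rule can be sketched in a few lines of Python. The function name `plurality_vote` and its tie-breaking policy are illustrative assumptions, not the behaviour of Weka's voting classifier.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label per classifier. Among tied labels, the
    one that appears first in `predictions` wins (an arbitrary choice).
    """
    if not predictions:
        raise ValueError("at least one vote is required")
    counts = Counter(predictions)
    top = max(counts.values())
    return next(label for label in predictions if counts[label] == top)


# Hypothetical votes from three classifiers for one accelerometer window:
# one classifier errs, but the majority still yields the right label.
print(plurality_vote(["Walking", "Upstairs", "Walking"]))   # Walking
```

With only two base classifiers, a hard vote can end in a tie, so the tie-breaking rule matters more than it does with three or more voters.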

II. MATERIAL AND METHOD
In this section, we first describe our proposed voting technique, which uses NBTree and MLP classifiers. We then describe the design of the experiments used to evaluate how well it performs activity recognition with a smart phone. We arrived at the proposed model through a number of experiments in Weka. Weka is a collection of machine learning algorithms for data mining tasks; the algorithms can either be applied directly to a dataset or called from Java code. A detailed comparison of machine learning tools suitable for prediction was presented in [21]. Weka was found to be the best from a computational perspective and in its wider range of algorithms, better data preparation tools, and support for very large datasets. It also supports several standard data mining tasks, specifically data pre-processing, clustering, classification, regression, visualization, and feature selection. Therefore, all experiments and tests in this research were performed using Weka.
Collecting data for this supervised learning task required a large number of users to carry an Android-based smart phone while performing certain everyday activities. As mentioned earlier, the data for our experiments were obtained from a publicly available dataset, Wireless Sensor Data Mining (WISDM). The activities of 29 users were recorded in this dataset using smart phone accelerometers, which makes the dataset suitable for benchmarking studies. The dataset comprises 1,098,207 examples of raw data and 5,424 examples of transformed data. These examples are distributed over the six activities identified earlier: walking, jogging, walking upstairs, walking downstairs, sitting, and standing. These activities were chosen because they are commonly performed in daily life. For each activity, acceleration was plotted along three axes. The x-axis represents horizontal movement of the participant's leg, the y-axis represents upward and downward motion, and the z-axis represents forward movement. The statistics of the dataset are shown in Table 1.
According to WISDM, the team obtained approval from the Fordham University IRB (Institutional Review Board) before collecting the data. Approval was needed because the study involved "experimenting" on human subjects and carried some risk of harm (e.g., a subject could trip while jogging or climbing stairs). Twenty-nine volunteer subjects carried a smart phone while performing a specific set of activities. The subjects carried the Android phone in the front leg pocket of their pants. They were asked to walk, jog, ascend stairs, descend stairs, sit, and stand for specific periods of time.
Data collection was controlled by an application running on the phone. Through a simple graphical user interface, the application let the team record the user's name, start and stop data collection, and label the activity being performed. The application also let them control which sensor data (e.g., GPS, accelerometer) were collected and how frequently. In all cases, accelerometer data were collected every 50 ms, giving 20 samples per second. Data collection was supervised by a member of the WISDM team to ensure data quality.
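Because a reading arrives every 50 ms, a fixed-length time window corresponds to a fixed number of samples. The sketch below splits a regularly sampled stream into non-overlapping windows. It rests on several assumptions. The 10-second window (200 readings) is our choice, made to match how the WISDM transformed examples are usually described. The function name `segment` is hypothetical. Real recordings can contain timing gaps, which this sketch ignores.

```python
SAMPLE_INTERVAL_MS = 50                          # one accelerometer reading every 50 ms
SAMPLES_PER_SECOND = 1000 // SAMPLE_INTERVAL_MS  # 20 readings per second


def segment(readings, window_seconds=10):
    """Split a regularly sampled stream of (x, y, z) readings into windows.

    Windows do not overlap. Leftover readings at the end that do not fill
    a whole window are discarded. window_seconds=10 is an assumption.
    """
    size = window_seconds * SAMPLES_PER_SECOND   # 200 readings for 10 s
    return [readings[i:i + size]
            for i in range(0, len(readings) - size + 1, size)]


# Usage: 450 readings (22.5 s) yield two full 10-second windows.
stream = [(0.0, 9.81, 0.0)] * 450
windows = segment(stream)
print(len(windows), [len(w) for w in windows])   # 2 [200, 200]
```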
Feature creation is a critical step in developing any classifier. An activity recognition system does not solve the classification task directly on raw acceleration data. Instead, classification is generally performed after an informative representation of the data has been created in the form of feature vectors. To accomplish this, the raw data series was divided into 10-second segments, and features were generated from the readings in each segment. The "time between peaks" feature requires further explanation. Repetitive activities such as walking tend to generate repeating waves on each axis, and this feature measures the time between successive peaks. To estimate it, all peaks in the wave of each example are first identified using a heuristic method, and then the highest peak on each axis is identified. A threshold is set as a percentage of this value, and the other peaks that meet or exceed the threshold are found. If no peaks meet this criterion, the threshold is lowered until at least three peaks are found. The time between successive peaks is then measured and averaged. For samples where at least three peaks cannot be found, the time between peaks is marked as unknown. This method accurately found the time between peaks for activities with a clear repetitive pattern, such as walking and jogging. More sophisticated schemes could be tried in the future. In total, the dataset contains 43 features, which are variations of six feature types. A classifier can be trained on these 43 features to build a hypothesis, which is then used to predict the activity of a test example.
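The peak heuristic can be sketched for a single axis as follows. Several details are our own assumptions rather than WISDM's method. Peaks are taken to be strict local maxima. The threshold is lowered in fixed steps, and it is lowered whenever fewer than three peaks qualify. Acceleration on some axes can be negative, so the threshold is measured between the lowest and highest peak rather than as a percentage of the highest peak alone. The function names are hypothetical.

```python
def local_peaks(signal):
    """Indices of strict local maxima (higher than both neighbours)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def time_between_peaks(signal, interval_ms=50,
                       fractions=(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0)):
    """Average time in ms between successive qualifying peaks on one axis.

    Returns None ("unknown") when fewer than three peaks can be found. The
    threshold starts near the highest peak and is lowered step by step
    until at least three peaks meet or exceed it.
    """
    peaks = local_peaks(signal)
    if len(peaks) < 3:
        return None                                   # marked as unknown
    heights = [signal[i] for i in peaks]
    low, high = min(heights), max(heights)
    chosen = []
    for fraction in fractions:
        # Threshold placed between the lowest and highest peak, so it also
        # works for axes whose readings are negative (our adaptation).
        threshold = low + fraction * (high - low)
        chosen = [i for i in peaks if signal[i] >= threshold]
        if len(chosen) >= 3:
            break
    if len(chosen) < 3:
        return None
    gaps = [b - a for a, b in zip(chosen, chosen[1:])]  # gaps in samples
    return interval_ms * sum(gaps) / len(gaps)


# Usage: tall peaks every 4 samples with small bumps in between.
wave = [0, 5, 0, 1, 0, 5, 0, 1, 0, 5, 0]
print(time_between_peaks(wave))   # 200.0
```

The small bumps are ignored because the first threshold already admits three tall peaks, so the result reflects the dominant stride period (4 samples × 50 ms).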
Classification is a data mining (machine learning) technique that predicts the class of data instances from their attributes. It is a method for assigning future data to known classes. In general, this approach uses a training dataset to build a model and a test dataset to validate it. Ensemble methods are one family of classification techniques: they create multiple models and then combine them to produce improved results. Ensemble methods usually produce more accurate solutions than a single model would. Voting and averaging are two of the simplest ensemble methods, and both are easy to understand and implement. Voting is used for classification and averaging for regression. Fig. 2 illustrates the process. A set of classification or prediction models, M1, M2, ..., Mk, is generated from the data. A voting strategy then combines their predictions for a given unknown tuple. The main objective of this research was to determine the performance of a voting technique that combines NBTree and MLP classifiers. NBTree was chosen because it is a hybrid of Naive Bayes and decision tree learning. It is suited to learning scenarios in which many attributes are likely to be relevant to a classification task. In practice, it induces highly accurate classifiers, in many cases significantly improving on both of its constituents. MLP, on the other hand, is a well-known machine learning classifier that maps inputs onto outputs. An MLP has three kinds of layers (input, hidden and output) as its basic building blocks, and it is trained with the back-propagation algorithm.
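The three-layer structure and back-propagation training can be illustrated with a deliberately tiny pure-Python network. This is a sketch built on our own assumptions: a single sigmoid hidden layer, a softmax output trained with cross-entropy, a hypothetical class name `TinyMLP`, and arbitrary layer sizes and learning rate. It is not the implementation of Weka's MultilayerPerceptron.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)                                   # subtract max for stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """Input layer -> one sigmoid hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and output class probabilities."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hj for w, hj in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, target, lr=0.5):
        """One back-propagation step; returns the loss before the update."""
        h, p = self.forward(x)
        # Output error for softmax + cross-entropy: p - one_hot(target).
        d_out = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Hidden error: propagate through w2, times sigmoid derivative h(1-h).
        d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for k, dk in enumerate(d_out):            # update output layer
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):            # update hidden layer
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return -math.log(p[target])


# Usage: repeatedly fitting one hypothetical feature vector lowers the loss.
net = TinyMLP(n_in=3, n_hidden=4, n_out=2)
x = [0.5, -1.0, 2.0]
losses = [net.train_step(x, target=1) for _ in range(50)]
print(losses[0] > losses[-1])   # True
```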

A. Individual Classifiers
To begin with, and for later comparison, we evaluated the performance of the following individual classifiers, both available in the Weka toolkit: NBTree and MLP. These classifiers were trained and tested on the set of extracted features using 10-fold cross-validation. Cross-validation is a technique for evaluating predictive models. It partitions the original sample into a training set, used to train the model, and a test set, used to evaluate it.
In 10-fold cross-validation, the original sample is randomly partitioned into 10 equal-sized subsamples. A single subsample is retained as validation data for testing the model, and the remaining 9 subsamples are used as training data. The process is repeated 10 times (the folds), with each of the 10 subsamples used exactly once as the validation data. The 10 results are then averaged to produce a single estimate, as shown in Fig. 3. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once.
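The fold bookkeeping described above can be sketched as follows. The helper names `k_fold_indices`, `cross_validate` and `majority_baseline` are hypothetical. The split is plain (unstratified), whereas Weka stratifies cross-validation folds by class for nominal class attributes, so fold contents will differ from Weka's.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds whose sizes differ by
    at most one; every index lands in exactly one test fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate(n, evaluate, k=10, seed=1):
    """Average the per-fold scores returned by evaluate(train, test)."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


# Usage: 10-fold accuracy of a trivial "predict the most common training
# label" baseline on hypothetical labels (a stand-in for NBTree or MLP).
labels = ["Walking"] * 20 + ["Sitting"] * 5


def majority_baseline(train, test):
    guess = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == guess for i in test) / len(test)


print(round(cross_validate(len(labels), majority_baseline), 3))
```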

B. Combination of Classifiers
Next, the previous voting technique in [6], which combines J48, LR and MLP, was applied to the same dataset for comparison. Finally, the experiment was repeated with our proposed voting technique, which combines NBTree and MLP as an ensemble classifier. For the decision step, we used the average of probabilities combination rule instead of majority voting. We again used 10-fold cross-validation. Average of probabilities was chosen because [19] reported that it, rather than majority voting, was the suitable combination rule for combining classifiers. After each experiment, the confusion matrix of each technique was computed. The detailed performance results of all techniques were then collected in a comparison table.

III. RESULTS AND DISCUSSION
In this section, we discuss the results obtained from the experiments. Tables 2 to 4 present the confusion matrices for NBTree, MLP and the voting technique in [6], respectively. Table 5 presents the confusion matrix for our model. As the tables show, combining NBTree and MLP in a voting technique gave the best overall accuracy among the classifiers and voting techniques evaluated. Fig. 4 likewise shows that this simple combination of NBTree and MLP achieved the best accuracy on the chosen dataset and outperformed the previously proposed voting technique. The most important activities to analyse are climbing up and climbing down stairs, since these were the only activities that were difficult to recognize.
Moreover, the results show that the correct classifier combination, used with the right combination rule, produces high performance. We used a voting technique that combines the NBTree and MLP classifiers, with the average of probabilities as the combination rule. After a large number of experiments, we believe that high accuracy comes from a suitable combination of classifiers together with the right combination rule. As in the previous work [6], our proposed model achieved its highest accuracy on the walking and jogging activities, owing to the large number of samples available for these activities.
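The average of probabilities rule can be sketched as follows. In Weka, this rule is selected as the "Average of Probabilities" combination rule of the Vote meta-classifier. The Python below is an independent illustration, not Weka's code. The function name and the probability values are hypothetical.

```python
def average_of_probabilities(distributions):
    """Combine class-probability distributions from several classifiers.

    Each distribution is a dict {label: probability} produced by one base
    classifier (for example NBTree or MLP); all dicts are assumed to cover
    the same labels. The label with the highest mean probability wins.
    """
    labels = list(distributions[0])
    n = len(distributions)
    avg = {c: sum(d[c] for d in distributions) / n for c in labels}
    winner = max(labels, key=lambda c: avg[c])
    return winner, avg


# Hypothetical outputs for one window: the two classifiers disagree, so a
# hard two-way vote would tie, but averaging still gives a decision.
nbtree = {"Upstairs": 0.55, "Downstairs": 0.45}
mlp = {"Upstairs": 0.10, "Downstairs": 0.90}
label, avg = average_of_probabilities([nbtree, mlp])
print(label)   # Downstairs
```

With exactly two base classifiers, this is one practical reason to prefer averaging. It uses each classifier's confidence, so a confident MLP can outweigh a hesitant NBTree instead of producing a tie.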
Our proposed model therefore overcomes the weakness of [6] and of the other classifiers, which performed poorly in predicting the walking upstairs and walking downstairs activities. Our voting technique achieved 93.35% accuracy for walking upstairs and 90.15% for walking downstairs.
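Per-activity figures like these can be read off a confusion matrix. The sketch below assumes that per-activity accuracy means the fraction of each activity's windows that were classified correctly. This is the row-wise true-positive rate, or recall, which Weka reports as TP Rate; the paper does not define the measure explicitly. The matrix is hypothetical and is not taken from Tables 2 to 5.

```python
def per_class_accuracy(matrix, labels):
    """Fraction of each actual class that was classified correctly.

    matrix[i][j] counts instances of actual class labels[i] predicted as
    labels[j], so the diagonal holds the correct predictions (the
    per-class true-positive rate, also called recall).
    """
    result = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        result[label] = matrix[i][i] / total if total else float("nan")
    return result


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Usage with a hypothetical 2-class matrix (not results from this paper).
labels = ["Upstairs", "Downstairs"]
matrix = [[45, 5],    # 50 actual Upstairs windows, 45 recognized
          [8, 42]]    # 50 actual Downstairs windows, 42 recognized
print(per_class_accuracy(matrix, labels))   # {'Upstairs': 0.9, 'Downstairs': 0.84}
print(overall_accuracy(matrix))             # 0.87
```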

IV. CONCLUSIONS
In this paper, we proposed an accelerometer-based human activity recognition model that uses a voting technique combining NBTree and MLP as ensemble classifiers. The classifiers were combined using the average of probabilities combination rule. A number of experiments were performed on the publicly available WISDM dataset of selected human activities to evaluate the proposed model and compare it with other approaches. The results showed that the proposed model achieved higher accuracy than single-classifier models and also outperformed a previous voting technique. Combining the two best classifiers with the average of probabilities rule produced the best classifier for activity recognition, outperforming all individual classifiers. In future work, we plan to develop a more comprehensive voting technique to handle complex activity recognition.