Analysis of Attribute Selection and Classification Algorithm Applied to Hepatitis Patients

Sherylaidah Samsuddin, Zuraini Ali Shah, RD Rohmat Saedudin, Shahreen Kasim, Choon Sen Seah

Abstract


Data mining techniques are widely used for classification, attribute selection and prediction in bioinformatics because they help to discover meaningful new correlations, patterns and trends by sifting through large volumes of data, using pattern recognition technologies as well as statistical and mathematical techniques. Hepatitis is one of the most important health problems in the world. Many studies have addressed the diagnosis of hepatitis, but medical diagnosis is a difficult and largely visual task that is mostly performed by doctors. This research therefore analyses the attribute selection and classification algorithms applied to hepatitis patient data. The WEKA tool is used to run the experiments with different attribute selectors and classification algorithms. The hepatitis dataset is taken from the UC Irvine repository. The attribute selectors examined are CfsSubsetEval, WrapperSubsetEval, GainRatioAttributeEval and CorrelationAttributeEval. The classification algorithms used are NaiveBayesUpdatable, SMO, KStar, RandomTree and SimpleLogistic. The classification models are evaluated on time and accuracy. The study concludes that the best attribute selector is CfsSubsetEval and the best classifier is SMO, which performs better than the other classification techniques on the hepatitis data.
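The workflow the abstract describes (select attributes, train a classifier, then record time and accuracy) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and is not the authors' WEKA setup: the toy data stand in for the hepatitis dataset, the Pearson-correlation ranking is only in the spirit of CorrelationAttributeEval, and the nearest-centroid classifier is a simple stand-in that is not one of the five classifiers studied.

```python
import math
import time

# Toy data (hypothetical, NOT the UCI hepatitis dataset):
# attribute 0 separates the classes, attribute 1 is mostly noise.
ROWS = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [1.1, 4.0],
        [3.0, 4.0], [3.2, 2.0], [2.9, 5.0], [3.1, 1.0]]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_attributes(rows, labels):
    """Attribute indices sorted by |correlation with the class|, best first."""
    scores = [abs(pearson([r[j] for r in rows], labels))
              for j in range(len(rows[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def predict(train, row):
    """Nearest-centroid stand-in classifier: pick the closest class mean."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    best_label, best_dist = None, math.inf
    for label, members in by_class.items():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def evaluate(rows, labels, attrs):
    """Leave-one-out accuracy and elapsed seconds using only `attrs`."""
    start = time.perf_counter()
    data = [([r[j] for j in attrs], y) for r, y in zip(rows, labels)]
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out instance i
        correct += predict(train, features) == label
    return correct / len(data), time.perf_counter() - start

ranking = rank_attributes(ROWS, LABELS)
accuracy, seconds = evaluate(ROWS, LABELS, ranking[:1])  # keep top attribute
print(f"ranking={ranking} accuracy={accuracy:.2f} time={seconds:.6f}s")
```

Repeating `evaluate` for each selector and classifier pair yields the kind of time and accuracy table on which such a comparison rests. Leave-one-out is used here only because the toy set is tiny.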

Keywords


data mining; attribute selection; classification; hepatitis; WEKA



DOI: http://dx.doi.org/10.18517/ijaseit.8.5.5041




Published by INSIGHT - Indonesian Society for Knowledge and Human Development