Feature Selection Method using Genetic Algorithm for Medical Dataset

Neesha Jothi, Wahidah Husain, Nur’Aini Abdul Rashid, Sharifah Mashita Syed-Mohamad


High-dimensional data are pervasive in the healthcare domain. Interpreting these data remains a challenging problem and an active research area because the data are high-dimensional while the sample sizes are small. These characteristics make it difficult for existing classification methods to achieve high accuracy. An effective feature selection method is therefore important for classifying different diseases correctly and, in turn, for supporting medical practitioners. The methodology of this paper is adapted from the Knowledge Discovery in Databases (KDD) process. In this work, a wrapper-based feature selection method using the Genetic Algorithm (GA) is proposed, with a Support Vector Machine (SVM) as the classifier. The proposed method was tested on five medical datasets, namely Breast Cancer, Parkinson’s, Heart Disease, Statlog (Heart), and Hepatitis. Applying GA for feature selection yielded competitive results on most of the datasets. The accuracies obtained are as follows: Breast Cancer, 72.71%; Parkinson’s, 88.36%; Heart Disease, 86.73%; Statlog (Heart), 85.48%; and Hepatitis, 76.95%. This prediction method, with GA as the feature selection step, can help medical practitioners diagnose patients’ diseases more accurately.


Keywords: data mining; data mining in healthcare; medical dataset; feature selection; genetic algorithm.
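
The wrapper approach named in the abstract can be illustrated with a minimal sketch: each GA individual is a bit string marking which features are kept, and its fitness is the cross-validated accuracy of an SVM trained on those features only. Everything specific below is an assumption made for illustration and is not taken from the paper: the synthetic dataset, the RBF kernel, 3-fold cross-validation, tournament selection, one-point crossover, bit-flip mutation, and the hypothetical names `fitness` and `ga_select`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fitness(mask, X, y):
    """Mean 3-fold cross-validated SVM accuracy on the selected columns."""
    if not mask.any():  # an empty feature subset cannot be scored
        return 0.0
    # Kernel and scaling are illustrative choices, not the paper's settings.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return float(cross_val_score(model, X[:, mask], y, cv=3).mean())


def ga_select(X, y, pop_size=12, generations=8, p_mut=0.1, seed=0):
    """Return (best_mask, best_score) found by a simple generational GA."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random bit strings
    best_mask, best_score = None, -1.0
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        if scores.max() > best_score:
            best_score = float(scores.max())
            best_mask = pop[scores.argmax()].copy()
        children = [best_mask.copy()]  # elitism: carry the best subset over
        while len(children) < pop_size:
            # Binary tournament selection, run once per parent.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            # One-point crossover.
            cut = rng.integers(1, n_features)
            child = np.concatenate([p1[:cut], p2[cut:]])
            # Bit-flip mutation.
            flip = rng.random(n_features) < p_mut
            children.append(np.logical_xor(child, flip))
        pop = np.array(children)
    return best_mask, best_score


# Synthetic stand-in data; the paper's five medical datasets are not used here.
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=2, random_state=0)
mask, score = ga_select(X, y)
print("selected features:", np.flatnonzero(mask))
print("cross-validated accuracy: %.3f" % score)
```

Because the same cross-validation score drives the search and is then reported, it tends to be optimistic; a separate held-out test set would give a fairer accuracy estimate for the selected subset.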





DOI: http://dx.doi.org/10.18517/ijaseit.9.6.10226


Published by INSIGHT - Indonesian Society for Knowledge and Human Development