A Dataset-Driven Parameter Tuning Approach for Enhanced K-Nearest Neighbour Algorithm Performance

Udoinyang G. Inyang, Funebi F. Ijebu, Francis B. Osang, Aderenle A. Afoluronsho, Samuel S. Udoh, Imo J. Eyoh

Abstract


The number of neighbours (k) and the distance measure (DM) are widely tuned to improve k-nearest neighbour (kNN) performance. This work investigates the joint effect of these parameters, in conjunction with dataset characteristics (DC), on kNN performance. Five distance measures (Euclidean, Chebyshev, Manhattan, Minkowski, and Filtered), eleven k values, and four DC were systematically selected for the parameter tuning experiments. Each experiment used 20 iterations and 10-fold cross-validation on thirty-three datasets randomly selected from the UCI repository. The results show that the average root mean squared error (RMSE) of kNN is significantly affected by the type of task (p<0.05, 14.53% variability effect). DC collectively caused a 74.54% change in mean RMSE values, while k and DM together accounted for the smallest effect, 25.4%. The interaction of k, DC, and DM yielded DM = 'Minkowski', 3≤k≤20, 7≤target dimension≤9, and sample size (SS) >9000 as the optimal performance pattern for classification tasks. For regression problems, the recommended configuration is 7000≤SS≤9000, 4≤number of attributes≤6, and DM = 'Filtered'. The type of task performed is the most influential determinant of kNN performance, followed by DM. Variation in kNN accuracy caused by changes in k alone appears to occur by chance, since it follows no consistent pattern, and the joint effect of k with the other parameters yielded a statistically insignificant change in mean accuracy (p>0.5). In further work, the discovered patterns will serve as a standard reference for comparing kNN performance with other classification and regression algorithms.
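The tuning loop described in the abstract (several distance measures crossed with several k values, scored by RMSE under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, the choice of p = 3 for the Minkowski distance, and the five k values are all assumptions made for illustration, and the 'Filtered' distance is omitted because the abstract does not define it.

```python
import math
import random

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical registry of distance measures; 'Filtered' is left out (assumption).
DISTANCES = {
    "euclidean": lambda a, b: minkowski(a, b, 2),
    "manhattan": lambda a, b: minkowski(a, b, 1),
    "chebyshev": chebyshev,
    "minkowski_p3": lambda a, b: minkowski(a, b, 3),  # p=3 chosen arbitrarily
}

def knn_predict(train_X, train_y, query, k, dist):
    """kNN regression: average the targets of the k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cv_rmse(X, y, k, dist, folds=10, seed=0):
    """RMSE of kNN regression pooled over a shuffled k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    squared_error = 0.0
    for f in range(folds):
        test = idx[f::folds]                      # every folds-th shuffled index
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            pred = knn_predict(train_X, train_y, X[i], k, dist)
            squared_error += (pred - y[i]) ** 2
    return math.sqrt(squared_error / len(X))

def tune(X, y, k_values, distances=DISTANCES):
    """Return {(distance_name, k): rmse} for every combination in the grid."""
    return {(name, k): cv_rmse(X, y, k, dist)
            for name, dist in distances.items()
            for k in k_values}

# Synthetic regression data (an assumption; the study used UCI datasets).
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in X]

results = tune(X, y, k_values=[1, 3, 5, 10, 20])
best = min(results, key=results.get)
print("best (DM, k):", best, "RMSE:", round(results[best], 4))
```

Repeating such a grid over datasets grouped by sample size, number of attributes, and target dimension would give the kind of table on which the interaction effects reported above could be tested.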

Keywords


kNN; kNN performance; k-Neighbours; parallel analysis; principal component analysis; kNN parameter tuning.

DOI: http://dx.doi.org/10.18517/ijaseit.13.1.16706


Published by INSIGHT - Indonesian Society for Knowledge and Human Development