Indonesian Text Classification using Back Propagation and Sastrawi Stemming Analysis with Information Gain for Selection Feature

Mahendra Dwifebri Purbolaksono, Feddy Dea Reskyadita, Adiwijaya, Arie Ardiyanti Suryani, Arief Fatchul Huda

Abstract


The Hadith is the second fundamental source of law for Muslims, and it can be used to explain Quranic texts. However, the Hadith still needs to be translated into each national language so that its meaning can be easily understood [1]. In Indonesia, Hadith more usually refers to a special class of texts relevant to particular religious concerns [1]. Based on this, this research classifies translated Hadith texts into three classes: Obligation, Prohibition, and Information. Previous research showed that a Back Propagation Neural Network (BPNN) performs well in classifying Hadith text, so BPNN was used for Hadith text classification in this study. However, the dataset yields a very large and varied bag of words, whose terms are the features used in the classification process. Hence, Information Gain (IG) was applied before classification to select the most influential features. System performance was measured with the Macro F1-score. The F1-score combines exactness, measured by precision, with completeness, measured by recall, and its macro average, the unweighted mean of the per-class F1-scores, is needed to evaluate performance over more than two classes. In the experiments, classifying Hadith text with BPNN and IG without stemming yielded the highest F1-score, 84.63%, whereas including the stemming process yielded an F1-score of 80.92%. This shows that stemming can decrease classification performance, because some influential words are merged with non-influential words.
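The abstract relies on two computations: ranking terms by Information Gain before classification, and scoring the classifier with the Macro F1-score. The Python sketch below illustrates both under stated assumptions. Each document is reduced to the set of words it contains (binary presence features), and a term t is scored as IG(t) = H(C) - [P(t)·H(C|t) + P(not t)·H(C|not t)]. The toy documents, the tokens in them, the predictions, the cutoff k, and all function names are hypothetical and are not taken from the paper. The sketch does not reproduce the paper's BPNN classifier or the Sastrawi stemmer.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], t as presence/absence."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Score every vocabulary term by IG and keep the k best (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores over every class that appears."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy corpus: each document is the set of tokens it contains.
docs = [
    {"wajib", "shalat"},
    {"wajib", "zakat"},
    {"larang", "minum"},
    {"larang", "judi"},
    {"kabar", "nabi"},
    {"kabar", "shalat"},
]
labels = ["Obligation", "Obligation", "Prohibition",
          "Prohibition", "Information", "Information"]

print(select_top_k(docs, labels, 3))  # ['kabar', 'larang', 'wajib']

# Hypothetical classifier output for the same six documents.
preds = ["Obligation", "Obligation", "Prohibition",
         "Information", "Information", "Information"]
print(round(macro_f1(labels, preds), 4))  # 0.8222
```

In this toy corpus, the terms that occur only within one class ("wajib", "larang", "kabar") receive the highest IG. "shalat", which is shared by an Obligation and an Information document, scores lower than any single-occurrence term. This is one way to see how merging words across classes, as stemming can do, may dilute an influential feature. The alphabetical tie-break only makes the ranking deterministic, and k is a free parameter here.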


Keywords


feature selection; information gain; text mining; neural network; classification.


References


K. A. Aldhlan, A. M. Zeki, A. M. Zeki, and H. A. Alreshidi, “Novel mechanism to improve hadith classifier performance,” in Proceedings - 2012 International Conference on Advanced Computer Science Applications and Technologies, ACSAT 2012, 2013.

H. Aydadenta and Adiwijaya, “On the classification techniques in data mining for microarray data classification,” in Journal of Physics: Conference Series, 2018.

S. Nurcahyo, F. Nhita, and Adiwijaya, “Rainfall prediction in Kemayoran Jakarta using hybrid genetic algorithm (GA) and partially connected feedforward neural network (PCFNN),” in 2014 2nd International Conference on Information and Communication Technology, ICoICT 2014, 2014.

F. Harrag, E. El-Qawasmah, and A. M. S. Al-Salman, “Stemming as a feature reduction technique for Arabic text categorization,” in Proceedings of the 10th International Symposium on Programming and Systems, ISPS’ 2011, 2011.

M. N. A.-K., G. K., R. A.-S., S. I. A.-S., and R. S. A.-M., “Al-Hadith Text Classifier,” J. Appl. Sci., 2009.

M. F. Afianto, Adiwijaya, and S. Al-Faraby, “Text Categorization on Hadith Sahih Al-Bukhari using Random Forest,” in Journal of Physics: Conference Series, 2018.

M. Y. Abu Bakar, Adiwijaya, and S. Al Faraby, “Multi-Label Topic Classification of Hadith of Bukhari (Indonesian Language Translation) Using Information Gain and Backpropagation Neural Network,” in Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018, 2019.

F. Harrag and E. El-Qawasmah, “Neural network for Arabic text classification,” in 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, 2009.

A. Jović, K. Brkić, and N. Bogunović, “A review of feature selection methods with applications,” in 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings, 2015.

G. Forman, “An extensive empirical study of feature selection metrics for text classification,” J. Mach. Learn. Res., 2003.

T. Liu, S. Liu, Z. Chen, and W. Ma, “An evaluation on feature selection for text clustering,” in Proceedings of the Twentieth International Conference on Machine Learning, 2003.

M. D. Purbolaksono, K. C. Widiastuti, M. S. Mubarok, Adiwijaya, and F. A. Ma’ruf, “Implementation of mutual information and bayes theorem for classification microarray data,” in Journal of Physics: Conference Series, 2018.




DOI: http://dx.doi.org/10.18517/ijaseit.10.1.8858




Published by INSIGHT - Indonesian Society for Knowledge and Human Development