Effects of Oversampling SMOTE and Spectral Transformations in the Classification of Mango Cultivars Using Near-Infrared Spectroscopy

Ali Khumaidi, Ridwan Raafi'udin


Near-infrared (NIR) spectroscopy is a non-destructive analytical technique that provides chemical and structural information on samples quickly and accurately. The NIR region spans wavelengths of 750-2500 nm. However, the absorbance bands of the NIR spectrum are often broad, non-specific, and overlapping. NIR spectrum analysis requires multivariate methods, which are highly susceptible to noise arising from instrumentation. There is no standard protocol for building classification and prediction models from NIR spectra, and several models have been developed both with and without pre-processing techniques. The SMOTE oversampling technique can improve a model's ability to predict all classes accurately. This research contributes a multiclass classification model for grouping mango cultivars by finding the best pre-processing technique and applying SMOTE oversampling. Across the four test scenarios, the best Support Vector Machine (SVM) model was obtained using spectral transformations with the LSNV and CLIP operations, reaching 100% accuracy, precision, and recall. The Decision Tree (DT) reached 100% performance using spectral transformations with the LSNV, CLIP, and SAVGOL operations with parameters {'deriv_order': 0, 1, 2, 'filter_win': 11, 13, 'poly_order': 3}. Using SMOTE gives better accuracy than no pre-processing, with 92% on SVM and 94% on DT. The combination of SMOTE and spectral transformation gives the same accuracy of 96% for both SVM and DT, better than using SMOTE alone.


Classification; cultivar mango; near-infrared; spectral transformation; oversampling SMOTE.
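As a rough illustration of two ideas named in the abstract, the sketch below applies a plain standard normal variate (SNV) transform to each spectrum and then generates SMOTE-style synthetic samples by interpolating between a minority-class spectrum and one of its nearest minority-class neighbours. It is not the authors' implementation: the function names `snv` and `smote`, the parameter `k`, and the toy spectra are assumptions made for this example, and the paper's LSNV operation is a localized variant that this sketch does not reproduce.

```python
import math
import random


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its
    sample standard deviation."""
    mean = sum(spectrum) / len(spectrum)
    var = sum((v - mean) ** 2 for v in spectrum) / (len(spectrum) - 1)
    std = math.sqrt(var)
    return [(v - mean) / std for v in spectrum]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority-class
    neighbours (hypothetical helper, simplified from the usual description
    of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): three "spectra" of a minority cultivar, four bands each.
raw = [
    [0.10, 0.30, 0.50, 0.20],
    [0.12, 0.33, 0.48, 0.22],
    [0.09, 0.28, 0.55, 0.19],
]
minority = [snv(s) for s in raw]
new_samples = smote(minority, n_new=4)
print(len(new_samples), len(new_samples[0]))
```

Because every synthetic sample is a convex combination of two real minority samples, it stays inside the range already covered by that class in each band. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred over a hand-written one.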

DOI: http://dx.doi.org/10.18517/ijaseit.12.3.16001



Published by INSIGHT - Indonesian Society for Knowledge and Human Development