Machine Learning Model for Sentiment Analysis of COVID-19 Tweets

Malak Aljabri; Sumayh S. Aljameel; Irfan Ullah Khan; Nida Aslam; Sara Mhd. Bachar Charouf; Norah Alzahrani

doi:10.18517/ijaseit.12.3.14724

Abstract

The COVID-19 pandemic has presented unprecedented challenges and profoundly affected many aspects of people's lives worldwide. Prevention measures, economic and social disruption, and a significant rise in the mortality rate have all strongly influenced people's emotions. Sentiment analysis, an important branch of artificial intelligence, uses machine learning techniques to understand public perspectives and gain insight into how people think and feel. During the pandemic, sentiment analysis has increasingly contributed to appropriate decision-making. This research aims to analyze public sentiment related to COVID-19 by exploring the perceptions shared on Twitter, one of the most ubiquitous social networks. To that end, a machine learning model was built using a dataset of COVID-19-related English tweets. Different combinations of machine learning classification algorithms (Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB)) and feature extraction techniques (Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram) were built and applied to the dataset for binary (positive, negative) and ternary (positive, negative, and neutral) classification. A comparative study of the performance of the different models was then conducted, and the results showed that the XGB classification algorithm with unigram and bigram features achieved the highest accuracy, 90%, for binary classification. This sentiment analysis model can assist countries and governments in measuring the impact of the pandemic and of the applied prevention measures on people's emotional and mental health, and in taking early action to reduce that impact or prevent it from becoming severe.

Keywords

Sentiment analysis; Twitter; COVID-19; machine learning.
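
The feature extraction step named in the abstract (TF-IDF over unigrams and bigrams) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the helper names `ngrams` and `tfidf`, the whitespace tokenizer, the smoothed IDF variant, and the toy tweets are all assumptions made for illustration. In a setup like the one described, the resulting weighted vectors would then be passed to a classifier such as SVM, RF, or XGB.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams (space-joined) for n in [n_min, n_max]."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=2):
    """Compute TF-IDF weights per document over unigram and bigram terms.

    Assumed variant: tf = count / number of terms in the document,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed form).
    """
    # Naive lowercase + whitespace tokenization (an assumption).
    term_lists = [ngrams(d.lower().split(), n_min, n_max) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


# Hypothetical toy tweets, not taken from the paper's dataset.
tweets = [
    "lockdown is hard",
    "vaccine news is good",
    "lockdown news is bad",
]
features = tfidf(tweets)
```

Each entry of `features` is a sparse mapping from a unigram or bigram to its weight. Terms that occur in every tweet (here `is`) receive the lowest IDF, so rarer and more discriminative terms such as `hard` carry more weight in the vector.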

Published by INSIGHT - Indonesian Society for Knowledge and Human Development

International Journal on Advanced Science, Engineering and Information Technology
