Machine Learning Model for Sentiment Analysis of COVID-19 Tweets

Malak Aljabri, Sumayh S. Aljameel, Irfan Ullah Khan, Nida Aslam, Sara Mhd. Bachar Charouf, Norah Alzahrani

Abstract


The COVID-19 pandemic has presented unprecedented challenges and has profoundly affected many aspects of people's lives worldwide. The implementation of prevention measures, the economic and social disruption, and the significant rise in the mortality rate have greatly affected people's emotions. Sentiment analysis, an important branch of artificial intelligence, uses machine learning techniques to understand public perspectives and gain insight into how people think and feel. During the pandemic, sentiment analysis has increasingly contributed to informed decision-making. This research aims to analyze public sentiment related to COVID-19 by exploring social perceptions shared on Twitter, one of the most ubiquitous social networks. To this end, machine learning models were built using a dataset of COVID-19-related English tweets. Different combinations of machine learning classification algorithms (Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB)) and feature extraction techniques (Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram) were applied to the dataset for binary (positive, negative) and ternary (positive, negative, and neutral) classification. A comparative study of the models' performance was then conducted, and the results showed that the XGB classification algorithm with unigram and bigram features for binary classification achieved the highest accuracy, 90%. This sentiment analysis model can assist countries and governments in measuring the impact of the pandemic and of the applied prevention measures on people's emotional and mental health, and in taking early action to reduce that impact or prevent it from becoming severe.
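To make the feature extraction step concrete, the following is a minimal sketch of combining unigram and bigram features with TF-IDF weighting, written with the Python standard library only. The function names (`ngrams`, `tfidf`), the whitespace tokenizer, the sample tweets, and the particular TF-IDF variant are illustrative assumptions; the abstract does not specify the authors' preprocessing or weighting formula.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (space-joined strings) for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(documents, n_min=1, n_max=2):
    """Compute per-document TF-IDF weights over unigram and bigram features.

    Assumed variant: tf = count / number of n-grams in the document,
    idf = ln(N / df) + 1. Other variants exist; this one is illustrative.
    """
    # Naive lowercase + whitespace tokenization (an assumption, not the paper's).
    doc_grams = [ngrams(doc.lower().split(), n_min, n_max) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in doc_grams:
        df.update(set(grams))

    weights = []
    for grams in doc_grams:
        counts = Counter(grams)
        total = len(grams)
        weights.append({
            g: (c / total) * (math.log(n_docs / df[g]) + 1.0)
            for g, c in counts.items()
        })
    return weights


# Hypothetical sample tweets, for illustration only.
tweets = [
    "stay home stay safe",
    "lockdown is hard",
    "stay safe everyone",
]
features = tfidf(tweets)
```

Each dictionary in `features` maps an n-gram to its weight for one tweet. Vectors of this kind would then be passed to a classifier such as SVM, RF, or XGB; n-grams that occur in fewer tweets receive a higher weight than equally frequent n-grams that occur in many.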

Keywords


Sentiment analysis; Twitter; Covid-19; machine learning.






DOI: http://dx.doi.org/10.18517/ijaseit.12.3.14724




Published by INSIGHT - Indonesian Society for Knowledge and Human Development