Research Article

Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy

by Mirsad Hadžić, Zerina Mašetić, Fatima Mašić
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 58
Published: November 2025
Authors: Mirsad Hadžić, Zerina Mašetić, Fatima Mašić
10.5120/ijca2025925988

Mirsad Hadžić, Zerina Mašetić, Fatima Mašić. Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy. International Journal of Computer Applications. 187, 58 (November 2025), 73-79. DOI=10.5120/ijca2025925988

@article{ 10.5120/ijca2025925988,
author  = { Mirsad Hadžić and Zerina Mašetić and Fatima Mašić },
title   = { Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy },
journal = { International Journal of Computer Applications },
year    = { 2025 },
volume  = { 187 },
number  = { 58 },
pages   = { 73-79 },
doi     = { 10.5120/ijca2025925988 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2025
%A Mirsad Hadžić
%A Zerina Mašetić
%A Fatima Mašić
%T Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy
%J International Journal of Computer Applications
%V 187
%N 58
%P 73-79
%R 10.5120/ijca2025925988
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This study performs sentiment analysis on Twitter data using Natural Language Processing (NLP) techniques, primarily the NLTK library in Python within a Jupyter notebook environment. It evaluates the emotional tone of tweets and classifies each one as neutral, positive, or negative using NLTK's SentimentIntensityAnalyzer (VADER). The sample consists of Twitter data loaded from a CSV file with columns such as 'Tweet' and 'Sentiment'. The methodology involves tokenizing and processing the text, scoring sentiment, counting occurrences of the hashtag #fifa, and analyzing word frequencies [1]. In addition to the lexicon-based VADER approach, the study incorporates a transformer-based deep learning model, BERT (Bidirectional Encoder Representations from Transformers), to improve sentiment classification accuracy. BERT is pre-trained on large corpora and can model context and nuanced language, which makes it a state-of-the-art alternative to traditional models. Including it allows a comparison between rule-based and deep learning approaches and highlights BERT’s effectiveness in handling complex tweet structures. The study also investigates the impact of removing stopwords and examines the list of stopwords that were eliminated. The expected results include insight into the prevalent sentiments on Twitter about the chosen topic, the frequency of the hashtag #fifa, and a detailed picture of word usage, visualized through word clouds. Possible limitations include the inherent subjectivity of sentiment analysis, variation in language use, reliance on hashtag frequency as an indicator of topic prevalence, and the context-dependent effectiveness of stopword removal. Word cloud analysis adds a visual representation of the most frequent words, giving a broader view of the dataset.
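
The preprocessing steps named in the abstract (tokenizing, counting the hashtag #fifa, removing stopwords, tallying word frequencies, and turning a sentiment score into a three-way label) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' notebook code. The sample tweets, the tiny stopword set, and all function names are hypothetical. In the pipeline described above, the compound score would presumably come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, and the ±0.05 cut-offs are the thresholds conventionally suggested for VADER, assumed here rather than confirmed by the abstract.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the 'Tweet' column of the CSV.
TWEETS = [
    "What a match! #FIFA #WorldCup",
    "The referee was terrible tonight #fifa",
    "Kickoff is at 8 pm.",
]

# Tiny illustrative stopword set (assumption); the study relies on NLTK's list.
STOPWORDS = {"a", "the", "is", "at", "was", "what"}

# A token is a run of word characters, optionally prefixed by '#'.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def count_hashtag(tweets, tag="#fifa"):
    """Count case-insensitive occurrences of one hashtag across all tweets."""
    return sum(tokenize(tweet).count(tag) for tweet in tweets)


def word_frequencies(tweets, stopwords=frozenset()):
    """Return (counts of kept tokens, list of stopword tokens removed)."""
    kept, removed = Counter(), []
    for tweet in tweets:
        for token in tokenize(tweet):
            if token in stopwords:
                removed.append(token)
            else:
                kept[token] += 1
    return kept, removed


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Calling `word_frequencies(TWEETS, STOPWORDS)` returns both the frequency counter, which could feed a word cloud, and the list of eliminated stopwords, mirroring the abstract's interest in inspecting what stopword removal discards.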

References
  • Saif M. Mohammad (2017). Challenges in Sentiment Analysis. arXiv preprint. https://ufal.mff.cuni.cz/~hana/teaching/Mohammad2017_Chapter_ChallengesInSentimentAnalysis.pdf
  • VADER. (2024). https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/
  • Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf
  • Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
  • Pang, B., & Lee, L. (2008). Thumbs up? Sentiment Classification using Machine Learning Techniques. https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
  • Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
  • Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. https://arxiv.org/abs/1801.07883
  • Saif M. Mohammad, & S.K. (2018). https://svkir.com/papers/Mohammad-Kiritchenko-Tweets-VAD-EI-LREC-2018.pdf
  • Caliskan, A., et al. (2017). Semantics derived automatically from language corpora contain human-like biases. Science. https://www.science.org/doi/10.1126/science.aal4230
  • "Natural Language Processing in Python: Exploring Word Frequencies with NLTK" - Medium. (2021) https://medium.com/@siglimumuni/natural-language-processing-in-python-exploring-word-frequencies-with-nltk-918f33c1e4c3
  • Dataset. (2022). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets
  • NLTK. (2025). https://www.nltk.org/
  • "Simple WordCloud using NLTK Library in Python" - NLPfy. (2021) https://nlpfy.com/simple-wordcloud-using-nltk-library-in-python/
  • Mueller, A. (2012). WordCloud Documentation. https://github.com/amueller/word_cloud
  • Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216-225.
  • MNB. (2024). https://www.geeksforgeeks.org/multinomial-naive-bayes/
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
  • Kaggle (2023). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets/data.
  • Snscrape (2007). https://github.com/JustAnotherArchivist/snscrape
Index Terms
Computer Science
Information Sciences
Keywords

Sentiment analysis, Twitter data mining, VADER, BERT, NLP
