Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy

Mirsad Hadžić; Zerina Mašetić; Fatima Mašić

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Issue 58

Published: November 2025

10.5120/ijca2025925988

Mirsad Hadžić, Zerina Mašetić, Fatima Mašić. Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy. International Journal of Computer Applications. 187, 58 (November 2025), 73-79. DOI=10.5120/ijca2025925988

                        @article{ 10.5120/ijca2025925988,
                        author  = { Mirsad Hadžić and Zerina Mašetić and Fatima Mašić },
                        title   = { Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy },
                        journal = { International Journal of Computer Applications },
                        year    = { 2025 },
                        volume  = { 187 },
                        number  = { 58 },
                        pages   = { 73-79 },
                        doi     = { 10.5120/ijca2025925988 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2025
                        %A Mirsad Hadžić
                        %A Zerina Mašetić
                        %A Fatima Mašić
                        %T Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy
                        %J International Journal of Computer Applications
                        %V 187
                        %N 58
                        %P 73-79
                        %R 10.5120/ijca2025925988
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

This study performs sentiment analysis on Twitter data using Natural Language Processing (NLP) techniques, primarily the NLTK library in Python within a Jupyter notebook environment. It evaluates the emotional tone of tweets and classifies each as positive, negative, or neutral using NLTK's SentimentIntensityAnalyzer (VADER). The sample consists of tweets loaded from a CSV file with columns such as 'Tweet' and 'Sentiment'. The methodology involves tokenizing and processing the text, scoring sentiment, counting occurrences of the hashtag #fifa, and analyzing word frequencies [1]. In addition to the lexicon-based VADER approach, the study incorporates a transformer-based deep learning model, BERT (Bidirectional Encoder Representations from Transformers), to improve sentiment classification accuracy. BERT is pre-trained on large corpora and models context and nuanced language, offering a state-of-the-art alternative to traditional models. Including it allows a comparative analysis of rule-based and deep learning approaches and highlights BERT’s effectiveness in handling complex tweet structures. The study also investigates the impact of removing stopwords and examines the list of removed stopwords. The expected results include insights into prevalent sentiments on Twitter regarding the chosen topic, the frequency of the hashtag #fifa, and an overview of word usage, visualized through word clouds. Possible limitations include the inherent subjectivity of sentiment analysis, variations in language use, reliance on hashtag frequency as an indicator of topic prevalence, and the context-dependent effectiveness of stopword removal. Word cloud analysis complements the frequency counts by showing the most frequent words visually, giving a broader view of the dataset.
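
The preprocessing steps named in the abstract (tokenizing, stopword removal, counting #fifa, word frequencies, and mapping a sentiment score to a label) can be illustrated with a minimal standard-library sketch. This is not the authors' notebook code. The stopword set is a small illustrative subset rather than NLTK's list, the sample tweets are invented, all function names are hypothetical, and the ±0.05 cut-offs are the thresholds conventionally used with VADER's compound score, which the abstract does not confirm. In the study itself the compound score would presumably come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword subset, not NLTK's full English list.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "this", "was", "it"}

# Assumption: invented sample tweets standing in for the 'Tweet' column of the CSV.
SAMPLE_TWEETS = [
    "What a match! #FIFA",
    "The referee was terrible in this game #fifa #worldcup",
    "Kickoff is at 8 pm",
]


def tokenize(text):
    """Lowercase a tweet and split it into word tokens and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())


def count_hashtag(tweets, tag="#fifa"):
    """Count exact occurrences of `tag` across all tweets, ignoring case."""
    return sum(tokenize(tweet).count(tag) for tweet in tweets)


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Count tokens after dropping stopwords and hashtags."""
    counts = Counter()
    for tweet in tweets:
        counts.update(
            token
            for token in tokenize(tweet)
            if token not in stopwords and not token.startswith("#")
        )
    return counts


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    Assumption: the symmetric 0.05 threshold is the conventional VADER
    cut-off, not a value taken from the paper.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print("#fifa count:", count_hashtag(SAMPLE_TWEETS))
    print("top words:", word_frequencies(SAMPLE_TWEETS).most_common(3))
    print("label for 0.6:", label_from_compound(0.6))
```

The resulting `Counter` is also the kind of frequency table a word cloud library can consume, which is how the frequency analysis and the visualization described above would connect.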

References

Mohammad, S. M. (2017). Challenges in Sentiment Analysis. arXiv preprint. https://ufal.mff.cuni.cz/~hana/teaching/Mohammad2017_Chapter_ChallengesInSentimentAnalysis.pdf
VADER. (2024). https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. https://arxiv.org/abs/1801.07883
Mohammad, S. M., & Kiritchenko, S. (2018). https://svkir.com/papers/Mohammad-Kiritchenko-Tweets-VAD-EI-LREC-2018.pdf
Caliskan, A., et al. (2017). Semantics derived automatically from language corpora contain human-like biases. Science. https://www.science.org/doi/10.1126/science.aal4230
"Natural Language Processing in Python: Exploring Word Frequencies with NLTK" - Medium. (2021). https://medium.com/@siglimumuni/natural-language-processing-in-python-exploring-word-frequencies-with-nltk-918f33c1e4c3
Dataset. (2022). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets
NLTK. (2025). https://www.nltk.org/
"Simple WordCloud using NLTK Library in Python" - NLPfy. (2021). https://nlpfy.com/simple-wordcloud-using-nltk-library-in-python/
Mueller, A. (2012). WordCloud Documentation. https://github.com/amueller/word_cloud
Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216-225.
MNB. (2024). https://www.geeksforgeeks.org/multinomial-naive-bayes/
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
Kaggle. (2023). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets/data
Snscrape (2007). https://github.com/JustAnotherArchivist/snscrape

Index Terms

Computer Science

Information Sciences

Keywords

Sentiment analysis, Twitter data mining, VADER, BERT, NLP