Research Article

A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis

by  Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Issue 7
Published: February 2024
DOI: 10.5120/ijca2024923409

Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat. A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis. International Journal of Computer Applications. 186, 7 (February 2024), 35-42. DOI=10.5120/ijca2024923409

@article{10.5120/ijca2024923409,
  author    = { Korakot Matarat and Chaidan Mingmuang and Weerasak Charoenrat },
  title     = { A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis },
  journal   = { International Journal of Computer Applications },
  year      = { 2024 },
  volume    = { 186 },
  number    = { 7 },
  pages     = { 35-42 },
  doi       = { 10.5120/ijca2024923409 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2024
%A Korakot Matarat
%A Chaidan Mingmuang
%A Weerasak Charoenrat
%T A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis
%J International Journal of Computer Applications
%V 186
%N 7
%P 35-42
%R 10.5120/ijca2024923409
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Sentiment analysis plays a crucial role in deciphering opinions and emotions expressed in textual data, with wide-ranging business applications such as customer feedback analysis and social media monitoring. This paper conducts a performance analysis of supervised machine learning algorithms for sentiment analysis using the Wongnai reviews dataset, which comprises 40,000 reviews. The preprocessing pipeline removes special characters (e.g., < > % / # + - ; * & @ $), stop words that carry no meaning for text processing (for example, มี, เฉยๆ, เช่นใด, เพียงแต่, น้อยๆ, ข้างเคียง), and hashtags, and then applies POS tagging, sentiment score computation, and TF-IDF analysis. The research also compares feature extraction methods and introduces a novel approach to dominant feature extraction that surpasses traditional bag-of-words (BoW) methods. Six algorithms are applied, namely Logistic Regression (LR), Multinomial Naïve Bayes (NB), Decision Tree Classifier (DT), Neural Network (NN), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVC), and the study compares their accuracy, precision, and recall on the Wongnai reviews. SVC is the top-performing algorithm with an accuracy of 0.73, narrowly ahead of LR, NB, NN, and SGD, which all reach 0.72, while DT performs worst. Further analysis combining TF-IDF with BoW improves SGD and SVC to an accuracy of 0.74, reinforcing the superior performance of SVC in this experiment.
The paper contributes to the understanding of sentiment analysis performance and serves as a resource for optimising models in other domains, giving practitioners and researchers a foundation for informed decision-making and for future exploration with more advanced machine learning algorithms.
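
The steps named in the abstract (removing special characters, hashtags, and stop words, building combined BoW and TF-IDF features, and scoring by accuracy, precision, and recall) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the whitespace tokenisation (real Thai text would first need a word segmenter), the unsmoothed tf x log(N/df) weighting, and the macro-averaged precision and recall are all assumptions, and the six classifiers themselves are not reproduced here.

```python
import math
import re
from collections import Counter

# Assumption: stop-word list built only from the examples quoted in the abstract.
STOP_WORDS = {"มี", "เฉยๆ", "เช่นใด", "เพียงแต่", "น้อยๆ", "ข้างเคียง"}
HASHTAG = re.compile(r"#\S+")
# Assumption: special characters taken from the abstract's example list.
SPECIAL = re.compile(r"[<>%/#+\-;*&@$]")


def preprocess(text, stop_words=STOP_WORDS):
    """Drop hashtags, special characters, and stop words; return tokens.

    Assumes tokens are already separated by whitespace.
    """
    text = HASHTAG.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    return [tok for tok in text.split() if tok not in stop_words]


def bow_tfidf_features(docs):
    """Return (vocab, rows); each row is BoW counts followed by TF-IDF weights."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        bow = [counts[t] for t in vocab]
        tfidf = [counts[t] * math.log(n_docs / df[t]) for t in vocab]
        rows.append(bow + tfidf)
    return vocab, rows


def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)
```

A typical use would call `preprocess` on each review, pass the token lists to `bow_tfidf_features`, train any classifier on the resulting rows, and compare predictions against the gold labels with `evaluate`.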

Index Terms
Computer Science
Information Sciences
Keywords

Performance analysis, Supervised learning, Bag-of-words, TF-IDF analysis, Thai language data analysis, Sentiment analysis
