Research Article

Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms

by B. Ravikiran, Srinivasu Badugu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Issue 42
Published: September 2024
DOI: 10.5120/ijca2024924040

B. Ravikiran, Srinivasu Badugu. Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms. International Journal of Computer Applications. 186, 42 (September 2024), 28-35. DOI=10.5120/ijca2024924040

@article{10.5120/ijca2024924040,
  author    = {B. Ravikiran and Srinivasu Badugu},
  title     = {Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms},
  journal   = {International Journal of Computer Applications},
  year      = {2024},
  volume    = {186},
  number    = {42},
  pages     = {28--35},
  doi       = {10.5120/ijca2024924040},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2024
%A B. Ravikiran
%A Srinivasu Badugu
%T Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms
%J International Journal of Computer Applications
%V 186
%N 42
%P 28-35
%R 10.5120/ijca2024924040
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Sarcasm detection is a growing field in Natural Language Processing (NLP). Sarcasm typically uses positive or intensified positive words with a negative connotation to insult or mock others, which makes detecting it critical for sentiment analysis. Numerous relevant research articles were reviewed, but because resources for the Telugu language are limited, detecting sarcasm in Telugu text remains challenging. Consequently, sentiment detection models struggle to identify the true sentiment of a sarcastic statement, which motivates the development of an automated sarcasm detection system. Many researchers have trained and tested machine learning classification algorithms to identify sarcasm, but these algorithms take a dataset as input, and such datasets often contain noise; the dataset is therefore passed through several preprocessing techniques to remove it. A Telugu conversational dataset was gathered from the Kaggle repository, and a new dataset, the Telugu News Headline dataset, was built; annotators labeled each statement as sarcastic or non-sarcastic before it was fed to the proposed model. The proposed model uses SVM (Support Vector Machine), NB (Naive Bayes), and LR (Logistic Regression), with One Hot Encoding (OHE) transforming the text into vectors that are fed to the Sarcasm Detection Model to measure its accuracy. The model was trained and tested on positive or intensified positive sentences using 60:40, 70:30, 80:20, and 90:10 train/test split ratios to compare performance across splits. At the base 70:30 split, Logistic Regression performed best of the three algorithms, with accuracies of 65.89% on the imbalanced Telugu conversational dataset and 67.01% on the balanced Telugu conversational dataset.
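The abstract names three steps before classification: noise removal, One Hot Encoding, and a train/test split at several ratios. The Python sketch below shows one way these steps could fit together. It is a minimal sketch under stated assumptions, not the authors' code: the cleaning rules, the reading of OHE as a binary word-presence vector over a vocabulary, and every function name (`clean`, `build_vocab`, `one_hot`, `split_rows`) are hypothetical, and the demo sentences are toy examples rather than rows from either dataset.

```python
import random
import re
import string

# Hypothetical noise-removal rules (the paper does not list its exact steps):
# drop URLs, ASCII punctuation and digits, then split on whitespace.
# str.isdigit() is also true for Telugu digits (U+0C66 to U+0C6F).
URL_PATTERN = re.compile(r"https?://\S+")
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def clean(text):
    """Return the noise-free tokens of one sentence."""
    text = URL_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TO_SPACE)
    text = "".join(" " if ch.isdigit() else ch for ch in text)
    return text.split()


def build_vocab(token_lists):
    """Map every distinct token to a column index, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(tokens, vocab):
    """Binary vector with a 1 for each vocabulary word present in the sentence.
    Words not in the vocabulary (unseen during training) are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec


def split_rows(rows, train_ratio, seed=0):
    """Shuffle rows and cut them at train_ratio, e.g. 0.7 for a 70:30 split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


if __name__ == "__main__":
    # Toy (text, label) rows for illustration only; the labels are arbitrary.
    data = [("ఎంత అద్భుతమైన రోజు!", 1), ("ఈ రోజు వర్షం పడింది.", 0)] * 5
    for ratio in (0.6, 0.7, 0.8, 0.9):
        train, test = split_rows(data, ratio)
        # Vocabulary comes from the training split only.
        vocab = build_vocab(clean(text) for text, _ in train)
        x_test = [one_hot(clean(text), vocab) for text, _ in test]
        print(f"{ratio:.0%} train: {len(train)} rows, "
              f"{len(vocab)} features, {len(x_test)} test vectors")
```

Building the vocabulary from the training split only keeps the test rows unseen during feature construction; the abstract does not say whether the paper did this. Punctuation is removed with a translation table rather than a `[^\w\s]` regex because Python's `\w` does not match Telugu combining vowel signs, so such a regex would cut words apart.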
On the imbalanced Telugu news headline dataset, Logistic Regression reached 90.07% accuracy, and on the balanced Telugu news headline dataset, SVM reached 98.35%. Logistic Regression therefore achieved the best accuracy on the imbalanced and balanced Telugu conversational datasets and on the imbalanced Telugu news headline dataset, whereas SVM performed best on the balanced Telugu news headline dataset. Future work could apply deep learning algorithms to detect sarcasm with higher accuracy.
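The abstract reports accuracy on both imbalanced and balanced versions of each dataset but does not say how the balanced versions were produced or how the classifiers were configured. As a hedged illustration, the Python sketch below assumes random undersampling of the majority class and a plain, unregularised logistic regression fitted by stochastic gradient descent on one-hot feature vectors. It uses only the standard library; `undersample`, `train_logistic_regression` and `accuracy` are hypothetical names, and the authors may well have used a library implementation with different settings.

```python
import math
import random


def undersample(rows, seed=0):
    """Balance (vector, label) rows by randomly dropping majority-class rows.
    Assumes binary labels: 1 = sarcastic, 0 = non-sarcastic."""
    by_label = {0: [], 1: []}
    for row in rows:
        by_label[row[1]].append(row)
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced


def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(model, x):
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_logistic_regression(rows, epochs=200, lr=0.1):
    """Plain SGD on the log-loss, one row at a time, no regularisation."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in rows:
            # Gradient of the log-loss with respect to the logit is p - y.
            error = sigmoid(logit((weights, bias), x)) - y
            for i, xi in enumerate(x):
                weights[i] -= lr * error * xi
            bias -= lr * error
    return weights, bias


def accuracy(model, rows):
    """Fraction of rows whose prediction (logit >= 0, i.e. p >= 0.5) is correct."""
    hits = sum((logit(model, x) >= 0) == bool(y) for x, y in rows)
    return hits / len(rows)


if __name__ == "__main__":
    # Toy one-hot vectors: feature 0 marks sarcasm; 3 sarcastic vs 9 plain rows.
    imbalanced = [([1, 0], 1)] * 3 + [([0, 1], 0)] * 9
    balanced = undersample(imbalanced)
    model = train_logistic_regression(balanced)
    print(len(balanced), accuracy(model, imbalanced))
```

On this 3:9 toy set, a classifier that always answers "non-sarcastic" already scores 9/12 = 75% on the imbalanced rows, which is one reason to report accuracy on a balanced version as well. Undersampling discards data; oversampling the minority class is a common alternative, and the abstract does not say which approach was used.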

Index Terms
Computer Science
Information Sciences
NLP
Machine Learning Classification Algorithms
Telugu Language Text
SVM
NB
LR
Keywords

Natural Language Processing; Sarcasm Detection; Machine Learning; Low-resource Language
