Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms

B. Ravikiran; Srinivasu Badugu

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Issue 42

Published: September 2024

DOI: 10.5120/ijca2024924040

B. Ravikiran, Srinivasu Badugu. Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms. International Journal of Computer Applications. 186, 42 (September 2024), 28-35. DOI=10.5120/ijca2024924040

@article{10.5120/ijca2024924040,
  author    = {B. Ravikiran and Srinivasu Badugu},
  title     = {Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms},
  journal   = {International Journal of Computer Applications},
  year      = {2024},
  volume    = {186},
  number    = {42},
  pages     = {28--35},
  doi       = {10.5120/ijca2024924040},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2024
%A B. Ravikiran
%A Srinivasu Badugu
%T Sarcasm Detection in Telugu Language Text Using Distinct Machine Learning Classification Algorithms
%J International Journal of Computer Applications
%V 186
%N 42
%P 28-35
%R 10.5120/ijca2024924040
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Sarcasm detection is a growing field in Natural Language Processing (NLP). Sarcasm typically uses positive, or exaggeratedly positive, words with a negative connotation to insult or mock others. Detecting sarcasm in text has become critical for sentiment analysis, because a sentiment detection model struggles to identify the true sentiment of a sarcastic statement, which motivates the development of an automated sarcasm detection system. The authors reviewed numerous relevant research articles and found that, because of the limited resources available for the Telugu language, detecting sarcasm in Telugu text remains challenging. Many researchers have trained and tested machine learning classification algorithms to identify sarcasm, but these algorithms require a dataset as input, and that dataset often contains noise, so it undergoes several preprocessing steps to remove the noise. The authors gathered a Telugu conversational dataset from the Kaggle repository and developed their own dataset, the Telugu News Headline dataset, whose statements annotators labeled as sarcastic or non-sarcastic. The proposed model was built using SVM (Support Vector Machine), NB (Naive Bayes), and LR (Logistic Regression); One Hot Encoding (OHE) transformed the datasets into vectors, which were then fed to the sarcasm detection model to determine its accuracy. The model was trained and tested on positive or even more positive sentences with 60:40, 70:30, 80:20, and 90:10 splitting ratios to enhance model performance. Taking the 70:30 split as the baseline, Logistic Regression was the best of the three algorithms on the Telugu conversational dataset, with accuracy rates of 65.89% on the imbalanced version and 67.01% on the balanced version. On the Telugu news headline dataset, Logistic Regression reached an accuracy of 90.07% on the imbalanced version, and SVM reached 98.35% on the balanced version. Logistic Regression therefore gave the best accuracy on three of the four dataset variants, with SVM ahead only on the balanced Telugu news headline dataset. In future work, deep learning algorithms can be applied to detect sarcasm with better accuracy.
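The encoding and splitting steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that One Hot Encoding means a binary word-presence vector per sentence over a whitespace-tokenized vocabulary, and the function names (build_vocabulary, one_hot_encode, split_dataset) and the English toy rows are hypothetical placeholders rather than material from the paper's Telugu datasets.

```python
import random

# Hypothetical toy rows (text, label); 1 = sarcastic, 0 = non-sarcastic.
# These English placeholders are NOT taken from the paper's Telugu datasets.
ROWS = [
    ("oh great another monday", 1),
    ("the train arrived on time", 0),
    ("wonderful the server crashed again", 1),
    ("the meeting starts at noon", 0),
    ("i just love waiting in traffic", 1),
    ("rain is expected tomorrow", 0),
    ("what a brilliant idea to skip testing", 1),
    ("the library opens at nine", 0),
    ("fantastic my phone died", 1),
    ("the match ended in a draw", 0),
]


def build_vocabulary(sentences):
    """Map each distinct whitespace token to a column index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot_encode(sentence, vocab):
    """Return a binary vector with 1 where a vocabulary token occurs in the sentence."""
    vector = [0] * len(vocab)
    for token in sentence.split():
        index = vocab.get(token)
        if index is not None:  # tokens missing from the vocabulary are ignored
            vector[index] = 1
    return vector


def split_dataset(rows, train_fraction, seed=0):
    """Shuffle a copy of rows and cut it at the requested training fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# The four splitting ratios listed in the abstract, as training fractions.
for train_fraction in (0.6, 0.7, 0.8, 0.9):
    train, test = split_dataset(ROWS, train_fraction)
    # The vocabulary is built from training text only, so test-only words are dropped.
    vocab = build_vocabulary(text for text, _ in train)
    x_train = [one_hot_encode(text, vocab) for text, _ in train]
    x_test = [one_hot_encode(text, vocab) for text, _ in test]
    print(train_fraction, len(x_train), len(x_test), len(vocab))
```

The resulting vectors and labels would then be passed to a classifier such as SVM, Naive Bayes, or Logistic Regression. Building the vocabulary from the training portion alone is a design choice of this sketch, made so that no information from the test sentences reaches the encoder.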

Index Terms

Computer Science

Information Sciences

NLP

Machine Learning Classification Algorithms

Telugu Language Text

SVM

Keywords

Natural Language Processing; Sarcasm Detection; Machine Learning; Low-resource language