International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Issue 11
Published: March 2024
Authors: Datla Tarun Anjaneya Varma, Nukala Sai Dhanuj, Nookala Gopala Krishna Murthy
Datla Tarun Anjaneya Varma, Nukala Sai Dhanuj, Nookala Gopala Krishna Murthy. Hostile Content Detection from Tweets in Hindi using Machine Learning and Deep Learning. International Journal of Computer Applications. 186, 11 (March 2024), 30-34. DOI=10.5120/ijca2024923466
@article{ 10.5120/ijca2024923466, author = { Datla Tarun Anjaneya Varma,Nukala Sai Dhanuj,Nookala Gopala Krishna Murthy }, title = { Hostile Content Detection from Tweets in Hindi using Machine Learning and Deep Learning }, journal = { International Journal of Computer Applications }, year = { 2024 }, volume = { 186 }, number = { 11 }, pages = { 30-34 }, doi = { 10.5120/ijca2024923466 }, publisher = { Foundation of Computer Science (FCS), NY, USA } }
%0 Journal Article %D 2024 %A Datla Tarun Anjaneya Varma %A Nukala Sai Dhanuj %A Nookala Gopala Krishna Murthy %T Hostile Content Detection from Tweets in Hindi using Machine Learning and Deep Learning %J International Journal of Computer Applications %V 186 %N 11 %P 30-34 %R 10.5120/ijca2024923466 %I Foundation of Computer Science (FCS), NY, USA
This paper addresses the detection of cyberbullying in Hindi social media posts, an area that has received little scholarly attention. It uses the dataset from the CONSTRAINT-2021[1][6] shared task, which contains approximately 8,200 posts annotated with categories such as fake, hate, offensive, and defamation. Two approaches are compared: one translates the sentences into English and applies the mBERT transformer model, and the other uses iNLTK embeddings directly on the Hindi posts. The mBERT approach outperforms the iNLTK approach. Using the XGBoost, LightGBM, and CatBoost classifiers, the study attains good F1 scores across the categories of hostile content. The work adds to the literature on cyberbullying detection in regional languages and offers insights for mitigating the problem.
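Since the abstract reports F1 scores per hostile category, the following is a minimal sketch of how a per-category F1 score could be computed when a post may carry several labels at once. It is not the authors' evaluation code: the function name `per_label_f1`, the `LABELS` list, and the toy gold and predicted labels are assumptions made for illustration, and none of the numbers come from the paper.

```python
from typing import Dict, List, Set

# Category names taken from the abstract; treating them as the full label set
# is an assumption of this sketch.
LABELS = ["fake", "hate", "offensive", "defamation"]


def per_label_f1(
    gold: List[Set[str]],
    pred: List[Set[str]],
    labels: List[str] = LABELS,
) -> Dict[str, float]:
    """Return the F1 score of each label, treating each label as a binary task."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of posts")

    scores: Dict[str, float] = {}
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)

        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp + fp + fn
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


# Toy data (hypothetical, not from the CONSTRAINT-2021 dataset).
gold_labels = [{"fake"}, {"hate", "offensive"}, set(), {"fake"}]
pred_labels = [{"fake"}, {"hate"}, {"fake"}, set()]

scores = per_label_f1(gold_labels, pred_labels)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Scoring each category separately, rather than averaging over all of them, makes it visible when a classifier does well on one kind of hostile content and poorly on another.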