Telugu Toxic Comment Analysis Using Deep Learning Transformer Models
1st Narsimhaswamy Bhukya
Computer Science and Engineering
Rajiv Gandhi University of Knowledge Technologies
Basar, India
b200423@rgukt.ac.in
2nd Nagaraju Badavath
Computer Science and Engineering
Rajiv Gandhi University of Knowledge Technologies
Basar, India
b201136@rgukt.ac.in
3rd Mahesh Pendem
Computer Science and Engineering
Rajiv Gandhi University of Knowledge Technologies
Basar, India
4th Sujoy Sarkar
Computer Science and Engineering
Rajiv Gandhi University of Knowledge Technologies
Basar, India
b200737@rgukt.ac.in
Abstract— The rapid growth of social media platforms has increased the spread of toxic and abusive content, particularly in low-resource languages such as Telugu [1]. Automatic detection of such content is essential for safe and respectful online communication. However, the scarcity of annotated datasets and the linguistic diversity of Telugu text pose significant challenges for Telugu toxic comment classification.
In this work, we perform a comparative analysis of machine learning and transformer-based models for Telugu toxic comment detection using the Telugu portion of the MACD dataset [2]. In the first phase, we implement two traditional machine learning algorithms, Support Vector Machine (SVM) and Logistic Regression, along with two encoder-based transformer models, mBERT and IndicBERT. All models are evaluated using Accuracy, Precision, Recall, and F1-score.
In the second phase, we extend the study to encoder–decoder transformer architectures, namely mT5 and IndicT5, to examine how effectively they handle the contextual and semantic complexities of Telugu text. This work thus provides a systematic comparison among traditional machine learning approaches, encoder-based transformers, and encoder–decoder transformers for Telugu toxic comment classification. The study aims to identify the architecture best suited to improving detection performance in low-resource Indic languages.
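As an illustration of the evaluation metrics named above, the following minimal sketch computes Accuracy, Precision, Recall, and F1-score from predicted and reference labels. It assumes a binary labeling (1 = toxic, 0 = non-toxic); the function name `binary_metrics` and the example labels are hypothetical and are not taken from the study's code or results.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (toxic) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight comments (1 = toxic, 0 = non-toxic).
# These are illustrative values, not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In this toy example there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics equal 0.75. On imbalanced data the metrics generally diverge, which is one reason to report Precision, Recall, and F1-score alongside Accuracy.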