A Context-Aware Hybrid XLM-RoBERTa Framework for Multilingual Cyberbullying Detection in Noisy Social-Media Text
Himanshi Rathore
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
himanshir603@gmail.com
Kajal Singh
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
kajalss1807@gmail.com
Aparna Pandey
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
aparnapandey29@gmail.com
Abstract—Cyberbullying detection is challenging because harmful online messages are often ambiguous and unpredictable. On social media platforms, abusive intent frequently appears as sarcasm, mixed-language writing, slang, altered spellings, or short coded phrases [2], [3]. The problem is harder still for Hinglish, which mixes Hindi and English within a single message. In this work, we propose a context-aware hybrid cyberbullying detection framework based on XLM-RoBERTa [4]. The system combines transformer-based contextual representations with lexical indicators, word-level TF-IDF, character-level TF-IDF, and subword-aware cues. A two-stage classifier first decides whether a message is harmful and then identifies the specific type of harm. Focal loss is used to improve learning on minority harmful classes [5]. We built the dataset by merging public toxic-language datasets with custom Hinglish samples, and analyzed it using entropy, Gini impurity, imbalance ratio, coefficient of variation, and lexical diversity. The final model achieved a test accuracy of 0.8099, a weighted F1 score of 0.8113, and a macro F1 score of 0.8192. These results suggest that a hybrid multilingual approach is an effective way to detect cyberbullying in noisy, real-world online text.
Index Terms—Cyberbullying detection, XLM-RoBERTa, multilingual NLP, Hinglish, hybrid features, TF-IDF, focal loss, context-aware classification.
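As a minimal illustration of two ideas named in the abstract, the sketch below computes the class-distribution statistics (entropy, Gini impurity, imbalance ratio, coefficient of variation) for a label list, and evaluates the focal loss of [5] for a single example. It is not the implementation used in this work: the label names, the toy counts, the function names, and the choice of gamma = 2.0 are all assumptions made for illustration, and the coefficient of variation is assumed to be taken over per-class counts.

```python
import math
from collections import Counter


def class_distribution_stats(labels):
    """Summarize how unevenly class labels are distributed.

    Assumes every class of interest appears at least once in `labels`.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]

    # Shannon entropy in bits: highest when classes are equally frequent.
    entropy = -sum(p * math.log2(p) for p in probs)
    # Gini impurity: chance that two random draws have different labels.
    gini = 1.0 - sum(p * p for p in probs)
    # Imbalance ratio: largest class count divided by smallest class count.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    # Coefficient of variation of per-class counts (population std / mean).
    mean = total / len(counts)
    variance = sum((c - mean) ** 2 for c in counts.values()) / len(counts)
    cv = math.sqrt(variance) / mean

    return {
        "entropy": entropy,
        "gini": gini,
        "imbalance_ratio": imbalance_ratio,
        "cv": cv,
    }


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example, without the optional class weight.

    `p_true` is the predicted probability of the correct class and must
    lie in (0, 1]. With gamma = 0 this reduces to ordinary cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# Hypothetical toy labels; these are not the classes or counts of our dataset.
toy_labels = ["safe"] * 6 + ["insult"] * 3 + ["threat"] * 1
stats = class_distribution_stats(toy_labels)
print(stats)

# A confidently correct example contributes far less loss than a hard one.
print(focal_loss(0.9), focal_loss(0.3))
```

The factor (1 - p_true)^gamma shrinks the loss of examples the model already classifies confidently, so training gradients are dominated by hard examples, which in an imbalanced dataset tend to come from the minority harmful classes.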