A Context-Aware Hybrid XLM-RoBERTa Framework for Multilingual Cyberbullying Detection in Noisy Social-Media Text





Create Date 22/04/2026
Last Updated 22/04/2026


Himanshi Rathore
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
himanshir603@gmail.com

Kajal Singh
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
kajalss1807@gmail.com

Aparna Pandey
Department of Artificial Intelligence, Noida Institute of Engineering and Technology
Greater Noida, India
aparnapandey29@gmail.com

Abstract—Cyberbullying detection is difficult because harmful online messages are often ambiguous and unpredictable. On social media platforms, abusive intent frequently appears as sarcasm, mixed-language writing, slang, altered spellings, or short coded phrases [2], [3]. The problem is harder still in Hinglish, which mixes Hindi and English within a single message. In this work, we propose a context-aware hybrid cyberbullying detection framework based on XLM-RoBERTa [4]. The system combines transformer-based contextual representations with lexical indicators, word-level TF-IDF, character-level TF-IDF, and subword-aware cues. A two-stage classification process first determines whether a message is harmful and then identifies the specific type of harm. Focal loss is used to improve learning on minority harmful classes [5]. We built the dataset by merging public toxic-language datasets with custom Hinglish samples, and characterized it using entropy, Gini impurity, imbalance ratio, coefficient of variation, and lexical diversity. The final model achieved a test accuracy of 0.8099, a weighted F1 score of 0.8113, and a macro F1 score of 0.8192. These results indicate that combining multilingual transformer representations with lexical and character-level features can be an effective approach to detecting cyberbullying in noisy, code-mixed social-media text.
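To make the role of focal loss concrete, the following is a minimal sketch of the per-example multi-class focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is an illustration only and not the training code used in this work: the function name `focal_loss`, the default `gamma=2.0`, and the optional per-class weights `alpha` are assumptions, since the abstract does not state the hyperparameters that were used.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one example (illustrative sketch, not the paper's code).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : optional per-class weights (hypothetical values, e.g. larger
             for minority harmful classes)
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss([0.9, 0.1], target=0)  # confident, correct prediction
    hard = focal_loss([0.3, 0.7], target=0)  # true class gets low probability
    print(f"easy example: {easy:.6f}")
    print(f"hard example: {hard:.6f}")
```

Because the modulating factor down-weights examples the model already classifies confidently, the gradient signal concentrates on hard examples, which in an imbalanced dataset tend to come from the minority classes.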

Index Terms—Cyberbullying detection, XLM-RoBERTa, multilingual NLP, Hinglish, hybrid features, TF-IDF, focal loss, context-aware classification.
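The dataset statistics named in the abstract (entropy, Gini impurity, imbalance ratio, coefficient of variation, and lexical diversity) can be computed from class labels and raw text as sketched below. The definitions here are common textbook ones and are assumptions, not necessarily those used in this study: entropy is in bits, imbalance ratio is the largest class count divided by the smallest, coefficient of variation uses the population standard deviation of class counts, and lexical diversity is approximated by a whitespace-token type-token ratio. The function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def class_distribution_stats(labels):
    """Summarize class balance for a list of labels (illustrative sketch)."""
    counts = Counter(labels)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]

    # Shannon entropy in bits; higher means a more even class distribution.
    entropy = -sum(p * math.log2(p) for p in probs)
    # Gini impurity; 0 when every sample belongs to one class.
    gini = 1.0 - sum(p * p for p in probs)
    # Ratio of the largest class to the smallest class.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    # Coefficient of variation of class counts (population std / mean).
    mean = n / len(counts)
    variance = sum((c - mean) ** 2 for c in counts.values()) / len(counts)
    cv = math.sqrt(variance) / mean

    return {
        "entropy": entropy,
        "gini": gini,
        "imbalance_ratio": imbalance_ratio,
        "coefficient_of_variation": cv,
    }


def type_token_ratio(texts):
    """Lexical diversity as unique tokens / total tokens (assumed definition)."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Hypothetical labels, not drawn from the paper's dataset.
    labels = ["safe"] * 6 + ["insult"] * 2
    for name, value in class_distribution_stats(labels).items():
        print(f"{name}: {value:.4f}")
    print(f"type_token_ratio: {type_token_ratio(['tu pagal hai', 'tu hai']):.4f}")
```

A high imbalance ratio or coefficient of variation alongside low entropy would signal skewed classes, the situation in which a loss such as focal loss is typically considered.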
