Cyberbullying Detection: A Machine Learning Approach using Social Media Data
Mohammad Zakariya
School of Computer Application
Babu Banarasi Das University, Lucknow, India

Dr. Nupur Soni
Associate Professor
School of Computer Application
Babu Banarasi Das University, Lucknow, India
Abstract- In recent years, cyberbullying on social media platforms has become prevalent enough that automated techniques for identifying and stopping it are necessary. This study develops a machine learning-based model that uses text classification to identify instances of cyberbullying in social media content. The study takes a supervised learning approach, using a dataset obtained from Kaggle that contains social media posts labeled by type of cyberbullying. The dataset is preprocessed to remove noise and ensure uniformity, and features are then extracted using Term Frequency-Inverse Document Frequency (TF-IDF). Several machine learning algorithms, namely Support Vector Machines (SVM), Decision Trees, Random Forests, and Naive Bayes, are trained and evaluated using standard classification metrics: accuracy, precision, recall, and F1-score. The experimental results show that the SVM model achieves the highest accuracy, 83%, outperforming the other algorithms in classifying both cyberbullying and non-cyberbullying content. These findings demonstrate the potential of machine learning to combat cyberbullying on social media by automatically identifying harmful content, thereby contributing to safer online spaces. The research also highlights the challenges of handling imbalanced datasets and the need for further improvement in model performance.
Keywords- Cyberbullying Detection, Machine Learning, Social Media, Text Classification, Supervised Learning, Natural Language Processing (NLP), Term Frequency-Inverse Document Frequency (TF-IDF), Support Vector Machines (SVM), Decision Trees, Random Forests, Naive Bayes, Data Preprocessing, Feature Extraction, Model Evaluation, Imbalanced Dataset.
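To make two of the terms above concrete, the following is a minimal, standard-library Python sketch of TF-IDF weighting and of the evaluation metrics named in the abstract. It is illustrative only and is not the study's implementation: the function names `tfidf` and `binary_metrics`, the whitespace tokenization, the example sentences, and the particular smoothed IDF variant are all assumptions made for this example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not from the paper): whitespace tokenization, relative
    term frequency, and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy inputs, not drawn from the Kaggle dataset.
weights = tfidf(["you are stupid", "you are kind"])
scores = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In the toy corpus, a term that appears in only one document ("stupid") receives a higher weight than a term shared by both ("you"), which is the property that makes TF-IDF useful as a text feature. Reporting precision, recall, and F1 alongside accuracy matters for imbalanced data, where a classifier can score high accuracy while missing most of the minority class.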