Spammer Detection and Fake User Identification on Social Media
T. Meghana, J. V. B. S. Prem Sai, K. Deekshitha, A. Gnanesh Kumar
Department of CSC, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, 531162, India
{21981a4654, 21981a4620, 21981a4623, 21981a4601}@raghuenggcollege.in
Abstract— This paper presents the detection of spammers and fake user accounts using a logistic regression model for binary classification, served through a Flask-based web application. The dataset used to train the machine learning models consists of 576 user profiles, each characterized by 11 attributes: presence of a profile picture, username length, full-name length, profile description length, presence of an external URL, number of words in the full name, whether the username equals the full name, whether the account is public or private, number of followers, number of accounts followed, and number of posts. These 11 attributes are used to determine whether a user is real or fake. The models are trained on these attributes using scikit-learn's Pipeline and ColumnTransformer utilities. Three models were evaluated for this problem: Logistic Regression, Random Forest, and Decision Tree. After training and testing, they achieved accuracies of 91%, 88%, and 83%, respectively. The web application, developed with Python's Flask framework, provides a user-friendly interface for real-time prediction and classifies the submitted user data as a genuine ID or a fake ID. This work demonstrates a practical solution for identifying spammers and fake user profiles, contributing to the integrity and security of online social media platforms.
Index Terms— Fake user identification, Spammer detection, Machine learning, Logistic Regression, Flask web application, Social media security, Dataset preprocessing, Scikit-learn, Classification, Online trust, Feature engineering, User profile analysis, Social engagement metrics, Platform integrity
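The training setup summarized in the abstract can be illustrated with a minimal sketch. It assumes a scikit-learn Pipeline that scales the count-like attributes through a ColumnTransformer and feeds them to Logistic Regression. The column names, the choice of StandardScaler, the 80/20 split, and the synthetic data generator are all hypothetical and stand in for the 576-profile dataset, so the accuracy it prints is not comparable to the reported results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the 11 attributes listed in the abstract.
BINARY_COLS = ["has_profile_pic", "has_external_url",
               "name_equals_username", "is_private"]
NUMERIC_COLS = ["username_length", "fullname_length", "fullname_words",
                "description_length", "followers", "following", "posts"]


def make_synthetic_profiles(n=576, seed=0):
    """Generate placeholder profiles (NOT the paper's data); label 1 = fake."""
    rng = np.random.default_rng(seed)
    is_fake = rng.integers(0, 2, size=n)
    fake = is_fake == 1
    df = pd.DataFrame({
        "has_profile_pic": rng.binomial(1, np.where(fake, 0.3, 0.9)),
        "username_length": rng.integers(4, 20, size=n),
        "fullname_length": rng.integers(0, 25, size=n),
        "fullname_words": rng.integers(0, 4, size=n),
        "description_length": rng.poisson(np.where(fake, 5, 40)),
        "has_external_url": rng.binomial(1, 0.1, size=n),
        "name_equals_username": rng.binomial(1, 0.05, size=n),
        "is_private": rng.binomial(1, 0.4, size=n),
        "followers": rng.poisson(np.where(fake, 50, 800)),
        "following": rng.poisson(np.where(fake, 900, 400)),
        "posts": rng.poisson(np.where(fake, 3, 80)),
    })
    return df, is_fake


def build_model():
    """Scale numeric columns, pass binary flags through, then classify."""
    preprocess = ColumnTransformer(transformers=[
        ("scale", StandardScaler(), NUMERIC_COLS),
        ("keep", "passthrough", BINARY_COLS),
    ])
    return Pipeline(steps=[
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


if __name__ == "__main__":
    X, y = make_synthetic_profiles()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    model = build_model()
    model.fit(X_train, y_train)
    # Accuracy on synthetic data only; it says nothing about the real dataset.
    print("held-out accuracy:", model.score(X_test, y_test))
```

Keeping preprocessing and the classifier in one Pipeline means a web endpoint can call `model.predict` on raw form values and get the same transformations that were applied during training. Under the same assumptions, swapping the final step for `RandomForestClassifier` or `DecisionTreeClassifier` would give the other two models named in the abstract.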