Identifying Fraudulent Apps on Google Play Store: A Decision Tree and LSTM Approach
Patnam Rochan, Prathi Rahul Sandeep, Dr. R. Shalini, M.E., Ph.D.
patnamrochan@gmail.com, rahulsandeep959@gmail.com, shalini.r.cse@sathyabama.ac.in
Abstract
Fraudulent applications on app stores such as the Google Play Store pose severe challenges to user privacy, financial security, and the reputation of legitimate developers. These apps often mimic genuine applications and rely on deceptive practices such as fake reviews, excessive permission requests, and sudden rating spikes to appear trustworthy. Traditional static detection methods struggle to adapt to these evolving fraud strategies. This research introduces a novel hybrid approach that combines Decision Trees for feature importance ranking with Long Short-Term Memory (LSTM) networks for capturing temporal patterns in app behavior. The Decision Tree model identifies the attributes most indicative of fraud, such as permissions, app size, and user review sentiment. The LSTM model processes temporal data, such as app downloads or ratings over time, to identify sequential patterns, for example sudden spikes, that are characteristic of fraudulent activity. The proposed system was evaluated on a dataset containing app metadata, user reviews, and behavioral trends, and demonstrated significant improvements in detection accuracy, precision, recall, and F1-score over traditional machine learning techniques such as logistic regression and random forest. Combining feature importance analysis with sequential modeling also provides interpretability, helping developers and platform administrators understand fraudulent patterns. This hybrid approach offers a scalable and adaptive solution for safeguarding app stores and protecting users from malicious apps.
Keywords
Fraudulent Applications, Google Play Store, Fraud Detection, Decision Trees, Long Short-Term Memory (LSTM), Hybrid Model, Sequential Modeling, App Metadata, Temporal Patterns, User Reviews Analysis, Machine Learning, Deep Learning, Cybersecurity
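
As a rough illustration of the feature importance ranking that the abstract attributes to the Decision Tree stage, the following minimal sketch scores each attribute by the largest Gini impurity decrease a single threshold split can achieve. This is a simplification of full decision tree importance, not the authors' implementation. The feature names, the helper functions, and the six toy records are all hypothetical and are not drawn from the paper's dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = legitimate, 1 = fraudulent)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from one threshold split on a single feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    uniq = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(uniq, uniq[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(rows, labels, names):
    """Return (feature name, impurity decrease) pairs, most informative first."""
    gains = {
        name: best_split_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records: (permission count, app size in MB, mean review sentiment).
FEATURES = ["num_permissions", "app_size_mb", "review_sentiment"]
ROWS = [
    (25, 12.0, -0.6),
    (30, 48.0, -0.4),
    (28, 20.0, 0.1),
    (8, 15.0, 0.5),
    (5, 50.0, 0.7),
    (10, 22.0, -0.2),
]
LABELS = [1, 1, 1, 0, 0, 0]  # made-up labels for illustration only

ranking = rank_features(ROWS, LABELS, FEATURES)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the permission count separates the two classes perfectly, so it ranks first. A full pipeline would presumably pass the top-ranked attributes, together with per-app time series of ratings or downloads, to the LSTM stage; that step is not sketched here.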