URL-Based Phishing Detection Using Machine Learning and Deep Learning
Prof. Alfred Thomas, Sreelekshmi K U
Assistant Professor: Rajiv Gandhi Institute of Technology, Kottayam
Student: Rajiv Gandhi Institute of Technology, Kottayam
Abstract: Website phishing is one of the main threats in cyber security today. It is a form of cyber fraud in which an impostor imitates a legitimate website, such as that of a bank or another organization. The fake site reproduces the features of the original, including its color theme, logo, text, and overall appearance, so distinguishing it from the legitimate site is challenging. Phishing can be detected using many techniques, and URL-based phishing website detection using Machine Learning (ML) and Deep Learning (DL) is among the most accurate of them. This project uses ML algorithms such as Random Forest to classify websites as phishing or legitimate and compares their performance with DL models such as Deep Neural Networks (DNN), Long Short-Term Memory (LSTM), and bidirectional LSTM. Instead of relying on existing datasets, both legitimate and phishing URLs will be collected by web scraping from the internet and from websites such as www.phishtank.com. Several groups of features, including HTML-based, domain-related, and address bar-related features, will be extracted from the raw URLs collected. ML algorithms have been found to perform accurately, which matters in domains such as cyber security where high accuracy is demanded. Accordingly, ML algorithms such as Decision Tree, Random Forest, and K-Nearest Neighbors (KNN) and DL models such as Artificial Neural Networks (ANN) and DNN are used as the models. To train the LSTM model, the URL data will be processed using Natural Language Processing (NLP) techniques. The performance of these models is analyzed using evaluation metrics such as accuracy and precision, and the results will be tabulated. The whole system will be converted into a desktop application using the Python Tkinter GUI framework.
Keywords: Uniform Resource Locator, Artificial Neural Network, Natural Language Processing, Long Short-Term Memory, Deep Neural Network, Deep Learning, Machine Learning, Support Vector Machine, K-Nearest Neighbors
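As an illustration of the address bar-related features mentioned in the abstract, the following is a minimal sketch using only the Python standard library. The function name `address_bar_features`, the particular features chosen, and the `SHORTENERS` set are assumptions made for this example; the abstract does not specify the exact feature list used in the project.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of URL-shortening hosts (an assumption for illustration;
# the abstract does not say which services, if any, are checked).
SHORTENERS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co"}

# Matches a dotted-quad host such as 192.168.0.1 (no range validation).
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def address_bar_features(url: str) -> dict:
    """Return a few simple lexical features computed from a raw URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(bool(IPV4_PATTERN.fullmatch(host))),
        "num_dots_in_host": host.count("."),
        "has_hyphen_in_host": int("-" in host),
        "uses_https": int(parsed.scheme == "https"),
        "is_shortened": int(host in SHORTENERS),
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login@bank"):
        print(example, address_bar_features(example))
```

Each URL becomes a fixed-length dictionary of numeric values, which could then be assembled into a feature table and passed to a classifier such as Random Forest. HTML-based and domain-related features would require fetching page content or registration data and are not covered by this sketch.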