CREDIT CARD FRAUD DETECTION USING ISOLATION FOREST AND LOCAL OUTLIER FACTOR METHOD
Alisha Gaikwad
Research Scholar
Dr. Rohit Miri
Head of Department
Department of Computer Science, Dr. C.V. Raman University, Kota, Bilaspur, Chhattisgarh
Abstract:
The rapid rise of the e-commerce industry has led to an exponential increase in the use of credit cards for online purchases and, with it, an increase in fraud. In recent years, identifying fraud in the credit card system has become extremely challenging for banks, and machine learning has become essential for detecting fraudulent transactions. Banks use a variety of machine learning approaches to predict such transactions, drawing on historical data and newly derived variables to improve predictive capability. The sampling strategy applied to the data set, the selection of variables, and the detection algorithms used all have a significant impact on fraud detection performance. This research investigates the efficacy of logistic regression, decision trees, and random forests for detecting credit card fraud. The data set, obtained from Kaggle, contains 284,808 credit card transactions from a European bank. Each transaction is labelled as belonging to either the "positive class" (fraudulent) or the "negative class" (legitimate). The data set is highly skewed: about 0.172 percent of transactions are fraudulent and the remainder are legitimate. We used oversampling to balance the data set, which resulted in 60% fraudulent and 40% legitimate transactions. The three approaches are applied to the data set, and the work is carried out in R. The effectiveness of the techniques is assessed on several measures: sensitivity, specificity, accuracy, and error rate. Isolation Forest and Local Outlier Factor achieve accuracy values of 99.7% and 99.6%, respectively. According to the results, random forest outperforms logistic regression and decision tree.
Keywords: Fraud detection, Credit card, Logistic regression, Decision tree, Random forest.
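
The four evaluation measures named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch in Python (not the R code used in this work), in which the class name ConfusionMatrix and all transaction counts are hypothetical and chosen only for illustration. It assumes the fraudulent class is treated as positive, and it shows why accuracy alone can mislead on a data set in which only about 0.172 percent of transactions are fraudulent.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier; fraud is assumed to be the positive class."""
    tp: int  # fraudulent transactions correctly flagged
    fn: int  # fraudulent transactions missed
    tn: int  # legitimate transactions correctly passed
    fp: int  # legitimate transactions wrongly flagged

    def sensitivity(self) -> float:
        # Share of fraudulent transactions that were detected
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of legitimate transactions that were passed
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all transactions that were classified correctly
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    def error_rate(self) -> float:
        # Share of all transactions that were misclassified
        return 1.0 - self.accuracy()


# Hypothetical counts, NOT results from this work: 100,000 transactions,
# 172 of them fraudulent (0.172 percent), and a trivial classifier that
# labels every transaction as legitimate.
always_legit = ConfusionMatrix(tp=0, fn=172, tn=99_828, fp=0)

print(f"sensitivity: {always_legit.sensitivity():.5f}")
print(f"specificity: {always_legit.specificity():.5f}")
print(f"accuracy:    {always_legit.accuracy():.5f}")
print(f"error rate:  {always_legit.error_rate():.5f}")
```

In this hypothetical case the trivial classifier reaches an accuracy above 99.8 percent while detecting no fraud at all, which is why sensitivity and specificity are reported alongside accuracy and error rate.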