Automated Identification of Unusual Medicare Claim Activities Using Machine Learning
K. Manjunath, Department of Computer Science and Engineering, GNITC, 22-5F6, 22wj1a05f6@gniindia.org
K. Anand Kumar, Department of Computer Science and Engineering, GNITC, 22-5G1, 22wj1a05g1@gniindia.org
L. Dhanush, Department of Computer Science and Engineering, GNITC, 23-519, 23wj5a0519@gniindia.org
Ms. Rajashree Sutrawe, Associate Professor, Department of Computer Science and Engineering, GNITC
Abstract - Healthcare fraud detection is a major challenge in modern healthcare systems because of the large volume of medical claims and the highly imbalanced nature of fraud datasets. Fraudulent claims make up only a small fraction of all claims, so conventional machine learning models tend to favor the majority class and struggle to identify fraudulent activity accurately. Random Oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE), and Random Undersampling (RUS) are commonly used to address class imbalance, but each has a drawback: ROS duplicates minority samples and can cause overfitting, SMOTE can generate noisy synthetic samples, and RUS discards potentially important majority-class information. This study proposes a machine learning-based approach to improve Medicare fraud detection using the Medicare Part B dataset. The proposed framework applies a hybrid resampling technique, SMOTE-ENN, which combines SMOTE for generating synthetic minority samples with Edited Nearest Neighbors (ENN) for removing noisy and irrelevant data instances. Logistic Regression is used as the classification algorithm to detect fraudulent healthcare claims. Model performance is evaluated using accuracy, precision, recall, F1-score, the Area Under the ROC Curve (AUC-ROC), and the Area Under the Precision-Recall Curve (AUPRC). Experimental results show that the proposed approach achieves an accuracy of 98%, indicating its effectiveness in handling imbalanced datasets and improving fraud detection in healthcare systems.
Key Words: Healthcare Fraud Detection, Machine Learning, SMOTE-ENN, Logistic Regression, Imbalanced Data, Medicare Claims.
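To make the SMOTE-ENN resampling idea from the abstract concrete, the following is a minimal pure-Python sketch, not the implementation used in this study. The function names (`k_nearest`, `smote`, `enn`, `smote_enn`), the neighbor count `k=3`, and the two-feature toy data are all hypothetical and chosen only for illustration; the ENN step shown is the majority-vote variant. The Logistic Regression classifier and the Medicare Part B data are deliberately left out.

```python
import math
import random

def k_nearest(point, candidates, k):
    """Indices of the (up to) k candidates closest to point, by Euclidean distance."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbor = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

def enn(X, y, k):
    """Edited Nearest Neighbors (majority-vote variant): drop every sample
    whose label disagrees with the majority label of its k nearest neighbors."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = X[:i] + X[i + 1:]
        other_labels = y[:i] + y[i + 1:]
        votes = [other_labels[j] for j in k_nearest(x, others, k)]
        if max(set(votes), key=votes.count) == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y

def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean with ENN."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return enn(X_bal, y_bal, k)

# Hypothetical toy data: label 0 = legitimate, label 1 = fraudulent.
legit = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.9), (1.0, 1.0),
         (0.8, 0.3), (0.2, 0.6), (0.9, 0.7), (0.4, 0.4)]
noisy_legit = [(5.1, 5.0)]  # majority-labeled point sitting inside the fraud cluster
fraud = [(5.0, 5.0), (5.5, 5.2), (4.8, 5.6)]

X = legit + noisy_legit + fraud
y = [0] * 9 + [1] * 3

X_res, y_res = smote_enn(X, y)
print("before:", y.count(0), "legitimate,", y.count(1), "fraudulent")
print("after: ", y_res.count(0), "legitimate,", y_res.count(1), "fraudulent")
```

In this toy setting SMOTE adds six synthetic fraud points inside the fraud cluster, and ENN then removes the single legitimate-labeled point that lies among them, since all of its nearest neighbors carry the opposite label. A classifier such as Logistic Regression would then be trained on the resampled set only, with the test split left untouched.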