Employing Machine Learning Methods to Enhance Medicare Fraud Detection: Resolving Class Imbalance with the Synthetic Minority Over-sampling Technique
Mr. Siddesh K T², Ganesh Maruti Damodar¹
²Assistant Professor, Department of MCA, BIET, Davanagere
¹Student, 4th Semester MCA, Department of MCA, BIET, Davanagere
ABSTRACT
Detecting healthcare fraud is a complex and continually evolving challenge, particularly due to the difficulties posed by imbalanced datasets. Traditional machine learning (ML) approaches have been widely explored in past research but often struggle with data imbalance. Techniques such as Random Oversampling (ROS) can lead to overfitting, SMOTE (Synthetic Minority Oversampling Technique) may introduce noise, and Random Undersampling (RUS) can result in the loss of critical information. To address these limitations, it is essential to enhance model accuracy through advanced resampling methods and improved evaluation metrics. This study introduces an innovative strategy for addressing data imbalance in healthcare fraud detection, focusing on the Medicare Part B dataset. Initially, the categorical feature "Provider Type" is extracted and used to increase minority-class variety by replicating existing entries. Following this, a hybrid technique known as SMOTE-ENN—combining SMOTE with Edited Nearest Neighbors (ENN)—is applied. This approach not only generates synthetic samples but also filters out noisy data, producing a more balanced and cleaner dataset. We evaluate six different ML models using standard metrics such as accuracy, precision, recall, F1-score, and AUC-ROC, with additional emphasis on the Area Under the Precision-Recall Curve (AUPRC) due to its effectiveness in imbalanced settings. Experimental results demonstrate that the Decision Tree classifier outperforms all others, achieving a 0.99 score across all evaluation metrics.
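The two halves of the SMOTE-ENN hybrid described above can be sketched in plain NumPy. This is an illustrative toy implementation, not the paper's pipeline: the function names `smote` and `enn_filter`, the neighbour counts, and the toy two-cluster data are all assumptions for demonstration (in practice one would typically use `imblearn.combine.SMOTEENN`).

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """SMOTE: synthesize minority samples by interpolating between
    each seed point and one of its k nearest minority neighbours."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(X_min)
    dist = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per point
    base = rng.integers(0, n, n_synthetic)    # random seed points
    neigh = nn[base, rng.integers(0, k, n_synthetic)]
    gap = rng.random((n_synthetic, 1))        # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def enn_filter(X, y, k=3):
    """ENN: drop any sample whose k nearest neighbours mostly
    disagree with its own label (removes noisy/boundary points)."""
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]
    agree = (y[nn] == y[:, None]).sum(axis=1)
    keep = agree * 2 > k                      # keep only if a strict majority agrees
    return X[keep], y[keep]

# Toy imbalanced data: 200 majority points vs 20 minority points.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, (200, 2))
X_min = rng.normal(3.0, 1.0, (20, 2))
X_syn = smote(X_min, 180, rng=rng)            # oversample minority up to parity
X = np.vstack([X_maj, X_min, X_syn])
y = np.array([0] * 200 + [1] * 200)
X_clean, y_clean = enn_filter(X, y)           # then clean away noisy samples
```

Because each synthetic point lies on a segment between two minority samples, SMOTE stays inside the minority region, while the ENN pass afterwards removes samples (original or synthetic) that land in the wrong class's neighbourhood.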
Keywords: Healthcare Fraud Detection, Imbalanced Data, Medicare Part B, SMOTE-ENN, Machine Learning, Data Resampling, Decision Tree, AUPRC, Classification Models, Synthetic Oversampling, Noise Reduction, Evaluation Metrics
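The AUPRC metric emphasized above can be computed with the standard step-wise average-precision formula. A minimal sketch, assuming binary labels and real-valued scores (the function name `average_precision` is illustrative):

```python
import numpy as np

def average_precision(y_true, scores):
    """AUPRC via average precision: the mean of the precision values
    observed at each rank where a true positive appears, with
    candidates ranked by decreasing score."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                          # true positives seen at each rank
    precision = tp / np.arange(1, len(y) + 1)  # precision at each rank
    return precision[y == 1].sum() / y.sum()   # average over positive ranks
```

For distinct scores this agrees with scikit-learn's `average_precision_score`. Unlike accuracy or AUC-ROC, the value is unaffected by the large pool of true negatives, which is why AUPRC is preferred for heavily imbalanced fraud data.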