Health Insurance Premium Prediction System Using Machine Learning







Create Date 23/04/2026
Last Updated 23/04/2026



Soham Khadse

Yash Wahane

Sahil Gangane

Dept. of Computer Science and Engineering

Jhulelal Institute of Technology

Nagpur, India

sohankhadse532@gmail.com

yashwahane101@gmail.com

sahilgangane@gmail.com

Krish Durugkar

Dept. of Computer Science and Engineering

Jhulelal Institute of Technology

Nagpur, India

durugkarkrish@gmail.com

Samir Sheikh

Dept. of Computer Science and Engineering

Jhulelal Institute of Technology

Nagpur, India

samirsheikh@gmail.com

Prof. Rahul Bambodkar

Dept. of Computer Science and Engineering

Jhulelal Institute of Technology

Nagpur, India

r.bambodkar@jitnagpur.edu.in

Abstract—Health insurance premium pricing remains one of the most complex and consequential challenges in the global healthcare and financial services sectors. Premiums directly determine the affordability and accessibility of health coverage for individuals, families, and enterprises, while simultaneously dictating the financial viability and risk exposure of insurance providers. Despite its critical importance, the conventional process of premium determination relies heavily on rule-based actuarial tables and manual underwriting protocols that are rigid, opaque, and often inadequate in capturing the multidimensional nature of individual health risk. This paper presents a comprehensive machine learning-based Health Insurance Premium Prediction System that integrates demographic attributes, lifestyle indicators, geographic factors, and medical history variables to estimate insurance premiums in an accurate, transparent, and personalized manner. The proposed system trains and rigorously compares four supervised regression algorithms—Linear Regression, Decision Tree Regression, Random Forest Regression, and XGBoost Regression—on a real-world structured healthcare dataset of 1,338 records sourced from the Kaggle Medical Cost Personal Dataset.
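Combining demographic, lifestyle, and geographic attributes into a single model input requires turning every record into numbers first. The sketch below shows one way a record from the Kaggle Medical Cost Personal Dataset could be encoded. It assumes the dataset's standard columns (age, sex, bmi, children, smoker, region), 0/1 encoding for the binary fields, and one-hot encoding for region. The paper's exact encoding scheme is not given here, and the names `Policyholder` and `encode` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Region categories used in the Kaggle Medical Cost Personal Dataset.
REGIONS = ("northeast", "northwest", "southeast", "southwest")


@dataclass
class Policyholder:
    """Hypothetical container for one raw record."""
    age: int        # demographic
    sex: str        # "male" or "female"
    bmi: float      # lifestyle / health indicator
    children: int   # number of dependents
    smoker: str     # "yes" or "no"
    region: str     # geographic factor


def encode(p: Policyholder) -> list[float]:
    """Convert one record into a numeric feature vector.

    Binary fields become 0.0/1.0 and region is one-hot encoded
    (assumed scheme), so every regressor receives only numbers.
    """
    if p.region not in REGIONS:
        raise ValueError(f"unknown region: {p.region!r}")
    region_one_hot = [1.0 if p.region == r else 0.0 for r in REGIONS]
    return [
        float(p.age),
        1.0 if p.sex == "male" else 0.0,
        p.bmi,
        float(p.children),
        1.0 if p.smoker == "yes" else 0.0,
        *region_one_hot,
    ]
```

For example, `encode(Policyholder(19, "female", 27.9, 0, "yes", "southwest"))` yields a nine-element vector whose last four entries are the region one-hot `[0.0, 0.0, 0.0, 1.0]`. Tree-based models such as Random Forest and XGBoost can also consume ordinal codes, but one-hot encoding keeps Linear Regression from reading a false ordering into region labels.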

Comprehensive preprocessing, including missing-value treatment, feature encoding, normalization, and feature […] Mean Absolute Error (MAE) of 1,978 USD and a Root Mean Square Error (RMSE) of 3,312 USD on the held-out test set. SHAP (SHapley Additive exPlanations) value analysis is used to interpret model predictions and quantify individual feature contributions, confirming that smoking status, age, BMI, and number of dependents are the dominant risk factors. Beyond prediction, the system incorporates a three-tier risk classification engine (Low, Moderate, and High Risk) and is deployed as an interactive web application for policyholders, insurance agents, and healthcare organizations. Future directions include integrating real-time wearable health data, federated learning for privacy-preserving distributed training, and deep learning architectures for longitudinal risk modelling.
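The two error metrics and the three-tier risk output can be illustrated with a short standard-library sketch. MAE and RMSE below follow their standard definitions. The tiering step is an assumption: it supposes that tiers are assigned by thresholding the predicted premium, and the cut-offs `LOW_MAX` and `MODERATE_MAX` are hypothetical placeholders, because the paper's actual boundaries and tiering rule are not stated here.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical tier boundaries in USD (not taken from the paper).
LOW_MAX = 10_000.0
MODERATE_MAX = 25_000.0


def risk_tier(predicted_premium: float) -> str:
    """Map a predicted premium to Low, Moderate, or High Risk (assumed rule)."""
    if predicted_premium < LOW_MAX:
        return "Low Risk"
    if predicted_premium < MODERATE_MAX:
        return "Moderate Risk"
    return "High Risk"
```

With actual values `[1000, 2000, 3000]` and predictions `[900, 2200, 2700]`, the absolute errors are 100, 200, and 300, so MAE is 200 while RMSE is about 216. RMSE is never smaller than MAE because squaring weights large errors more heavily. A gap like the one between the reported 1,978 and 3,312 USD therefore suggests that a minority of test predictions carry comparatively large errors.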

Keywords—health insurance premium prediction, machine learning, supervised regression, Random Forest, XGBoost, SHAP explainability, risk categorization, actuarial pricing, healthcare analytics, feature engineering


