XAI-Driven EDA: An Entropy-Driven Explainable AI Framework for Intelligent Exploratory Data Analysis
Dr. S. Gnanapriya1, Vishnu S. Nair2
1Associate Professor, Department of Computer Applications, Nehru College of Management, Coimbatore, Tamil Nadu, India.
ncmdrsgnanapriya@nehrucolleges.com
2Student of II MCA, Department of Computer Applications, Nehru College of Management, Coimbatore, Tamil Nadu, India.
Vishnusudhakaran713@gmail.com
Abstract
Exploratory Data Analysis (EDA) is a fundamental step in data-driven research, enabling analysts to understand data structure, identify patterns, and detect anomalies. However, conventional EDA techniques are largely manual, time-intensive, and heavily dependent on domain expertise, often resulting in high cognitive load, subjective bias, and limited scalability when dealing with complex or high-dimensional datasets. To address these limitations, this paper presents Explainable AI-EDA, an intelligent and automated exploratory data analysis framework that integrates statistical analysis, machine learning, and explainable artificial intelligence into a unified system.
The proposed framework performs automated data profiling, missing value analysis, skewness and kurtosis evaluation, and entropy-based dataset complexity assessment. Machine learning techniques such as K-Means clustering, Isolation Forest-based anomaly detection, and linear regression are employed to uncover hidden patterns, detect outliers, and analyze variable relationships.
The system is implemented as an interactive web-based application that supports real-time visualization, natural language interaction through an AI research assistant, and automated generation of research-ready analytical reports. Experimental evaluation demonstrates that Explainable AI-EDA significantly reduces analyst cognitive load, improves analytical efficiency, and provides scalable and reproducible exploratory analysis.
Keywords: Explainable AI (XAI), Automated EDA, Large Language Models, Information Entropy, Machine Learning, Data Visualization, Isolation Forest, K-Means Clustering, Cognitive Load, Statistical Profiling.
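As an illustration of the profiling steps named in the abstract (missing value analysis, skewness and kurtosis evaluation, and entropy-based complexity assessment), the following is a minimal sketch using only the Python standard library. It is not the framework's actual implementation: the function names `shannon_entropy`, `skewness_and_kurtosis`, and `profile_column` are hypothetical, missing values are assumed to be encoded as `None`, and entropy is computed over distinct observed values, so a continuous column would need to be binned first.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the distinct observed values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def skewness_and_kurtosis(xs):
    """Population skewness and excess kurtosis from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        # Constant column: both statistics are undefined; this sketch
        # reports zeros as a convention (an assumption, not a standard).
        return 0.0, 0.0
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def profile_column(values):
    """Profile one column; missing entries are assumed to be None."""
    if not values:
        raise ValueError("cannot profile an empty column")
    present = [v for v in values if v is not None]
    report = {
        "missing_ratio": 1 - len(present) / len(values),
        "entropy_bits": shannon_entropy(present),
    }
    # Shape statistics only make sense for numeric columns.
    if present and all(isinstance(v, (int, float)) for v in present):
        skew, kurt = skewness_and_kurtosis(present)
        report["skewness"] = skew
        report["excess_kurtosis"] = kurt
    return report


if __name__ == "__main__":
    print(profile_column([1, 2, None, 4, 4, 10]))
    print(profile_column(["red", "blue", None, "red"]))
```

Under this convention, a column whose values are spread evenly over many categories yields higher entropy than one dominated by a single value, which is one way a per-column complexity score could be derived and then aggregated across the dataset.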