Detecting Software Faults Using Cross-Project Comparison: A Review of Model Extension under Class Imbalance
1Miss. Jyothika K R, 2Vaishnavi R S
1Assistant Professor, Department of MCA, BIET, Davanagere
2Student, 4th Semester MCA, Department of MCA, BIET, Davanagere
ABSTRACT
Software fault prediction plays a vital role in enhancing the quality and reliability of software systems. Despite its importance, it encounters key challenges such as class imbalance in fault datasets and the difficulty of building models that generalize effectively across various projects. This study explores these issues through a cross-project analysis, focusing on three central research questions. First, we investigate the impact of class imbalance on prediction accuracy, revealing how disparities in data distribution hinder classifier performance. By applying multiple classification algorithms to a range of datasets from distinct software projects, we demonstrate the need to address imbalance for dependable fault prediction. Second, we assess the effectiveness of cross-project predictions, examining how models trained on one project perform when applied to others. Our results underline the importance of selecting training datasets that share similar traits with the target project to achieve better generalization. Third, we explore how expanding the training dataset with samples from different projects affects prediction outcomes, highlighting the advantages of cross-project learning. Additionally, we provide a detailed comparison of performance metrics such as accuracy, precision, recall, and F1-score across different classifiers. Overall, this research not only underscores the challenges inherent in fault prediction but also offers practical insights into overcoming them, thereby contributing to the development of more robust and generalizable predictive models in software engineering.
Keywords: Software Fault Prediction, Class Imbalance, Cross-Project Analysis, Model Generalization, Predictive Modeling, Software Quality, Classifier Performance, Accuracy, Precision, Recall, F1-Score, Machine Learning, Fault Data, Software Reliability.
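To make the evaluation setup summarized in the abstract concrete, the sketch below shows one plausible cross-project workflow: a classifier is trained on fault data from a source project and evaluated on a different target project, with class weighting used to mitigate imbalance and the four reported metrics computed with scikit-learn. The synthetic datasets, feature set, and classifier choice are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch (assumed setup, not the paper's exact method): cross-project
# fault prediction with an imbalance-aware classifier and the metrics
# discussed in the abstract. Features and fault labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(42)

def make_project(n_samples, fault_rate):
    """Generate a synthetic 'project': static-code metrics plus imbalanced fault labels."""
    X = rng.normal(size=(n_samples, 5))                     # e.g. LOC, complexity, coupling, ...
    y = (rng.random(n_samples) < fault_rate).astype(int)    # 1 = faulty module, heavily outnumbered
    return X, y

# Train on one (source) project, test on a different (target) project
X_train, y_train = make_project(1000, fault_rate=0.15)
X_test, y_test = make_project(400, fault_rate=0.15)

# class_weight="balanced" is one common way to counter class imbalance
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Report the performance metrics compared across classifiers in the study
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, zero_division=0))
print("Recall   :", recall_score(y_test, y_pred, zero_division=0))
print("F1-score :", f1_score(y_test, y_pred, zero_division=0))
```

In practice the source and target projects would be real fault datasets (and the training set could be extended with samples from additional projects, as examined in the third research question); the structure of the train/predict/score loop stays the same.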