A Comparative Evaluation of RandomForest and XGBoost within the ELDB Multi-Instance Learning Framework
Mutyala Ratna Kumar1, Namburi Chirandan2, Muvva Lakshmi Narayana3, Medikonduru Maithili Saisree4
1,2,3,4 Department of Computer Science and Engineering, R.V.R & J.C College of Engineering, Guntur, India
ABSTRACT
Multi-Instance Learning (MIL) addresses classification problems where labels are assigned to bags of instances rather than individual instances. The Multi-Instance Ensemble Learning with Discriminative Bags (ELDB) algorithm is a notable mapping-based MIL approach that transforms bags into a new feature space before classification. The performance of ELDB depends significantly on the classifier employed on this mapped representation. This study investigates the integration and performance of modern ensemble classifiers, specifically RandomForest (RF) and XGBoost (XGB), within the ELDB framework, using their default parameters. These were compared against the baseline classifiers originally considered (k-Nearest Neighbors (kNN), Support Vector Machines (SVM), and Decision Trees (J48)) using an implementation based on the work by Yang et al. (2022). Experiments were conducted using 10-fold cross-validation on standard MIL benchmark datasets including Musk1+, Fox+, and Tiger+, with F1-score as the primary evaluation metric. Results indicated that performance is highly dataset-dependent; while XGBoost showed strong performance on Fox+, kNN remained the top performer on Musk1+. Significant variability in performance across folds was also observed, particularly on the Tiger+ dataset. This study demonstrates the feasibility of integrating RF and XGBoost into ELDB and highlights that while these models are competitive, the optimal classifier choice is contingent on the dataset characteristics, warranting careful selection or further tuning within the ELDB framework.
Keywords: Multi-Instance Learning, Ensemble Learning, ELDB, RandomForest, XGBoost, Machine Learning, Classification, Benchmark Datasets
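The evaluation protocol summarized in the abstract (bag-level F1-score averaged over 10 folds, with spread across folds reported) can be illustrated with a minimal standard-library sketch. It is not the implementation used in this study: the helper names `f1_score`, `k_fold_indices`, and `cross_validated_f1`, the unstratified shuffling, and the `fit_predict` callback (standing in for "map bags with ELDB, train a classifier, predict the held-out bags") are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive bag label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_bags, k=10, seed=0):
    """Shuffle bag indices and split them into k disjoint folds (not stratified)."""
    idx = list(range(n_bags))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_f1(labels, fit_predict, k=10, seed=0):
    """Return (mean, standard deviation) of the per-fold F1 scores.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains on
    the training bags and returns one predicted label per test bag, in order.
    """
    scores = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        preds = fit_predict(train_idx, test_idx)
        scores.append(f1_score([labels[i] for i in test_idx], preds))
    return mean(scores), stdev(scores)
```

Reporting the standard deviation alongside the mean is what makes fold-to-fold variability, such as that noted for Tiger+, visible when classifiers are compared.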