A Hybrid Architecture for Multi-Category Text Classification Using BERT and Graph Embeddings
1,*Vipin Kataria, 2,*Nitin Kumar
1Picarro Inc, Santa Clara, California, USA
2Marriott International, Bethesda, Maryland, USA
*These authors contributed equally to the paper
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Text classification, the task of assigning predefined categories to text, has become increasingly important as organizations work to manage and extract value from large volumes of unstructured information. This study proposes BEGNN (BERT-Enhanced Graph Neural Network), a hybrid architecture for multi-category text classification that combines the contextual semantics of BERT embeddings with the structural modeling of graph neural networks. The approach is evaluated on a dataset of 2,225 text samples spanning five categories: politics, sport, technology, entertainment, and business. BEGNN processes each text along two parallel pathways, one extracting contextual semantic features with BERT and the other modeling document structure through a graph representation, and then integrates these complementary features for classification. In the experiments, BEGNN achieves 99% accuracy, precision, recall, and F1-score across all categories, outperforming both traditional machine learning baselines (SVM, Logistic Regression) and other deep learning models (BERT, BiLSTM). Confusion matrix analysis shows very few misclassifications, with especially few errors in the sport and business categories. By combining semantic and structural text representations, this work offers a meaningful improvement for applications that require high precision in document categorization.
Key Words: Text Classification, BERT, Graph Neural Networks, Natural Language Processing, Multi-category Classification, Hybrid Architecture, Semantic Feature Extraction, Structural Feature Extraction, Co-Attention Mechanism, Deep Learning, News Categorization, Document Classification, BEGNN
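
The two-pathway idea summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the BEGNN implementation: the word co-occurrence graph, the single round of mean neighbour aggregation, the character-based `toy_word_vector`, the `semantic_features_stub` that stands in for BERT, and the plain concatenation used in place of the co-attention mechanism are all assumptions made only to show how a semantic vector and a structural vector might be fused before a classifier.

```python
import math
from collections import defaultdict

# Category names taken from the abstract; everything else here is hypothetical.
CATEGORIES = ["politics", "sport", "technology", "entertainment", "business"]


def toy_word_vector(word, dim=4):
    """Deterministic placeholder features; a real system would use learned embeddings."""
    total = sum(ord(c) for c in word)
    return [(total * (k + 1)) % 7 / 7.0 for k in range(dim)]


def build_cooccurrence_graph(tokens, window=2):
    """Undirected word graph linking words that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touching the key creates a node even if it has no edges
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] != word:
                graph[word].add(tokens[j])
                graph[tokens[j]].add(word)
    return graph


def structural_features(tokens, dim=4):
    """Structural pathway: one mean-aggregation step over neighbours, then mean pooling."""
    graph = build_cooccurrence_graph(tokens)
    if not graph:
        return [0.0] * dim
    feats = {w: toy_word_vector(w, dim) for w in graph}
    updated = []
    for word, neighbours in graph.items():
        group = [feats[word]] + [feats[n] for n in neighbours]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return [sum(col) / len(updated) for col in zip(*updated)]


def semantic_features_stub(tokens, dim=4):
    """Semantic pathway stand-in: averages toy vectors where BERT would be used."""
    if not tokens:
        return [0.0] * dim
    vectors = [toy_word_vector(w, dim) for w in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, weights, dim=4):
    """Fuse both pathways by concatenation, then apply a linear layer and softmax."""
    fused = semantic_features_stub(tokens, dim) + structural_features(tokens, dim)
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    return softmax(scores)


if __name__ == "__main__":
    tokens = "the bank raised interest rates".split()
    untrained = [[0.0] * 8 for _ in CATEGORIES]  # one weight row per category
    for name, p in zip(CATEGORIES, classify(tokens, untrained)):
        print(f"{name}: {p:.2f}")
```

With untrained all-zero weights the sketch assigns equal probability to every category; in practice the weights, the embeddings, and the fusion step would be learned from the labeled samples.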