Predicting Tags for Research Articles: An Extensive Approach Using Hybrid Topic Modeling Techniques
Nagashree Jayaram¹, Ravikumar M L²
¹,² PG Scholar, M. Tech in Artificial Intelligence,
¹,² REVA Academy for Corporate Excellence (RACE)
¹,² REVA University, Bangalore, KA, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract
The exponential growth of scientific literature has made it increasingly difficult for researchers to locate relevant articles in large online archives. Topic modeling and tagging address this problem by categorizing research articles, which improves both recommendation and search. This paper builds on previous work in topic prediction by predicting specific tags for research articles from their abstracts. The study uses the topic modeling techniques Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), and Non-Negative Matrix Factorization (NMF) with Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) features, together with two recurrent neural network architectures: Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). The research is conducted in the context of a live hackathon, where the challenge is to predict relevant tags from a predefined set for research articles in four disciplines: Computer Science, Mathematics, Physics, and Statistics. The approach aims to improve the accuracy and granularity of tag prediction and thereby support more effective information retrieval in academic databases.
Key Words: Tag Prediction, Research Articles, Topic Modeling, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), Text Classification, Natural Language Processing (NLP), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Machine Learning, Text Mining, Abstract Analysis, Multi-Label Classification, Scientific Literature