Categorization of News Articles
Bhavani¹, Seema Nagaraj²
¹ Student, Department of MCA, Bangalore Institute of Technology, Karnataka, India, bbjanganni@gmail.com
² Assistant Professor, Department of MCA, Bangalore Institute of Technology, Karnataka, India, seemanagaraj@bit-bangalore.edu.in
ABSTRACT
In today’s digital world, online platforms publish an enormous volume of news content every day, covering a wide range of subjects. Organizing this content efficiently makes it easier to find and improves the user experience. Manual classification is slow and does not scale to this volume, which motivates systems that perform the task automatically.
This project presents a machine learning approach that categorizes news articles into predefined groups such as Politics, Sports, Technology, Business, and Entertainment. Using Natural Language Processing (NLP) techniques, the system preprocesses raw text through tokenization, stop word removal, and stemming, and then applies TF-IDF (term frequency-inverse document frequency) weighting to convert it into numerical feature vectors suitable for classification.
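The preprocessing and TF-IDF steps described above can be sketched in plain Python as follows. This is a minimal, self-contained illustration under stated assumptions, not the project's implementation: the helper names (`STOP_WORDS`, `tokenize`, `crude_stem`, `preprocess`, `tf_idf`) are hypothetical, the stop-word list is a small sample, `crude_stem` is a simplified suffix stripper standing in for a real stemmer such as Porter's, and the IDF uses the unsmoothed form log(N / df). Libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF by default, so their weights will differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real system would use a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "was", "were", "with", "at", "by", "its"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Simplified suffix stripping; an assumption, NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem each remaining token."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf = term count / document length; idf = log(N / df), where N is the
    number of documents and df the number of documents containing the term.
    """
    token_lists = [preprocess(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({term: (c / total) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors
```

For example, given the three short documents "The team won the cricket match", "The minister won the election", and "New smartphone chips are faster", the term `cricket` (which occurs in one document) receives a higher weight in the first document than `won` (which occurs in two). This is the behaviour TF-IDF is meant to capture: terms concentrated in few documents are more useful for telling categories apart.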
Classification is performed with a Multinomial Naive Bayes model, chosen for its simplicity, fast training and prediction, and strong track record on text classification tasks. The model is trained on a labelled dataset and evaluated using accuracy, precision, recall, and F1-score to assess the reliability of its predictions.
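A minimal sketch of Multinomial Naive Bayes training, prediction, and evaluation follows, written from scratch so that it runs without third-party packages. Several details are assumptions not stated in the original: features are raw term counts from a bare tokenizer (TF-IDF weights could be substituted; scikit-learn's `MultinomialNB` documentation notes that fractional counts such as TF-IDF may also work in practice), smoothing is add-α with α = 1 (Laplace), and precision, recall, and F1 are macro-averaged over classes. The names `MultinomialNaiveBayes`, `tokenize`, and `evaluate` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical bare tokenizer: lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over term counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1.0 is Laplace smoothing (an assumption)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Log prior: fraction of training documents in each class.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels))
                          for c in self.classes}
        # Total occurrences of every term within each class.
        term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in term_counts.values() for t in counts}
        v = len(self.vocab)
        # Smoothed log P(term | class); terms absent from a class get a small
        # non-zero probability alpha / (total + alpha * v) instead of zero.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict(self, text):
        # Tokens never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        # Score = log P(c) + sum of log P(t | c); summing logs avoids underflow.
        scores = {c: self.log_prior[c] + sum(self.log_likelihood[c][t] for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}
```

Trained on six hypothetical one-line documents (two each for Politics, Sports, and Business), this sketch assigns "the team scored a late goal" to Sports, because `team` and `goal` appear only in the Sports training text. Macro averaging weights every category equally, which matters when some categories have far fewer articles than others; a weighted or micro average would give different figures.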
Users can submit any passage of text from a news article, and the system predicts the category it most likely belongs to. Beyond automating classification, the project demonstrates how machine learning can be applied to text analytics in the media domain. The approach is designed to scale, can be adapted to other domains, and could be extended to additional categories or to other languages (for example, by substituting language-specific stop word lists and stemmers) with relatively few modifications.
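Under the standard multinomial Naive Bayes formulation (an assumption, since the exact variant and smoothing are not specified here), predicting the most likely category for an input text d amounts to the rule below. C is the set of categories, V the vocabulary, f_{t,d} the count (or TF-IDF weight) of term t in d, N_{t,c} the total count of t across training documents of class c, and α a smoothing constant:

```latex
\hat{c} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{t \in V} f_{t,d} \, \log P(t \mid c) \right],
\qquad
P(t \mid c) = \frac{N_{t,c} + \alpha}{\sum_{t' \in V} N_{t',c} + \alpha \lvert V \rvert}
```

Taking logarithms turns the product of many small probabilities into a sum, which avoids floating-point underflow on long articles; the prior P(c) is typically estimated as the fraction of training documents labelled c.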
Keywords: News categorization, Natural Language Processing (NLP), Text preprocessing, Naive Bayes classifier, Machine learning, Classification accuracy, Scalable system.