Newspaper Summarizer using Natural Language Processing and Machine Learning.
Nupur Sanjay Jagtap, Kishori Manoj Jadhav, Ojasvi Sanjay More.
Under the guidance of: Neha R. Hiray.
Sandip Foundation's Sandip Institute of Engineering and Management.
ABSTRACT:
In the era of digital information, users are inundated with news articles from numerous sources, which leads to information overload and an overwhelming user experience. This research presents a real-time Newspaper Aggregator that uses Natural Language Processing (NLP) and Machine Learning (ML) techniques to collect, process, and personalize news articles from diverse sources. The aggregator’s architecture integrates several NLP models: topic modeling with Latent Dirichlet Allocation (LDA) categorizes articles into predefined topics such as Politics, Sports, and Technology, while BERT-based sentiment analysis classifies public sentiment as Positive, Negative, or Neutral, capturing nuanced perspectives. The summarization module uses PEGASUS and TextRank to deliver coherent, concise summaries, improving information accessibility and reducing reading time. The recommendation engine employs a hybrid filtering approach, combining collaborative and content-based filtering, to provide personalized news recommendations based on user history and article characteristics. Our methodology comprises systematic data collection, text pre-processing, topic categorization, sentiment classification, summarization, and real-time recommendation, followed by rigorous evaluation. BERT-driven sentiment analysis achieves 92% accuracy, the LDA models yield coherent topic clusters, and the summarization module attains a ROUGE-L score of 0.75, which together underscore the system's reliability in managing dynamic news content. Performance testing indicates that the Newspaper Aggregator offers a significant improvement in user relevance and engagement compared with traditional keyword-based systems. Overall, this study establishes a foundation for intelligent, real-time news aggregation, providing users with a streamlined, personalized news experience.
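To make the reported summarization metric concrete, the following is a minimal sketch of how a ROUGE-L F-score can be computed from the longest common subsequence (LCS) of a reference summary and a generated summary. It is an illustration only and not the evaluation code used in this study: the function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, and the default `beta = 1.0` are all assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-score for one reference/candidate pair.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)       # share of the reference that is covered
    precision = lcs / len(cand)   # share of the candidate that matches
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(round(rouge_l(reference, candidate), 4))
```

In this toy pair the LCS is five tokens out of six on each side, so precision and recall are both 5/6. A corpus-level score such as the 0.75 reported above would typically be an average of per-article scores; the exact tokenization and averaging used here are not stated, so figures from this sketch are not directly comparable.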
KEYWORDS:
Real-time news aggregation, Natural Language Processing (NLP), Machine Learning (ML), topic modeling, sentiment analysis, BERT, Latent Dirichlet Allocation (LDA), text summarization, PEGASUS, TextRank, recommendation systems, collaborative filtering, content-based filtering, personalized news, information overload, news categorization, user relevance, article classification, hybrid recommendation model.