Image Caption Generation Using CNN & RNN Architectures for Visually Impaired Assistance
M. Praneeth1, T. Vikas Singh2, V. Madhu3, M. Narendra4, N. Vishnu Vardhan5, G. Shruthi6,
Dr. B. Venkataramana7
1Student, B.Tech CSE (DS) 4th Year, Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
praneeethkumarmk@gmail.com
2Student, B.Tech CSE (DS) 4th Year, Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
Vikassinghthakur221@gmail.com
3Student, B.Tech CSE (DS) 4th Year, Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
vurlugondamadhugoud@gmail.com
4Student, B.Tech CSE (DS) 4th Year, Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
narendrakumar720721@gmail.com
5Student, B.Tech CSE (DS) 4th Year, Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
naikotivishnuvardhan@gmail.com
6Assistant Professor, CSE (DS), Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
geejulasruthi@gmail.com
7Associate Professor, CSE (DS), Holy Mary Institute of Technology and Science, Hyderabad, TG, India,
venkataramana.b@hmgi.ac.in
ABSTRACT
This project develops an advanced Image Caption Generator using deep learning and computer vision techniques. In an era of ever-increasing visual content on the internet, the ability to automatically generate descriptive captions for images has become crucial for enhancing accessibility and user experience. The project leverages state-of-the-art deep neural networks: Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs) for generating coherent, contextually relevant captions. The system takes an image as input and employs a pre-trained CNN to extract high-level features, creating a rich representation of the visual content. An RNN-based sequence-to-sequence model then processes these features to generate natural-language captions. To improve caption quality and fluency, the model incorporates an attention mechanism, allowing it to focus on different parts of the image while generating each word. The outcome of this project has broad applications in fields such as image indexing, content retrieval, and accessibility, making digital visual content more understandable and engaging for a wide range of users. Additionally, the project contributes to the advancement of deep learning techniques in computer vision and natural language processing, pushing the boundaries of AI capabilities in understanding and describing visual information.
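The attention step described above can be sketched in a few lines. The snippet below is a minimal illustrative example, not the paper's exact formulation: it assumes simple dot-product scoring (rather than, say, additive attention), and the function names, dimensions, and random inputs are hypothetical. At each decoding step, per-region CNN features are scored against the current decoder hidden state, normalized with a softmax, and averaged into a context vector that conditions the next word.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(features, hidden):
    """One attention step (illustrative dot-product variant).

    features: (num_regions, d) array of per-region CNN feature vectors
    hidden:   (d,) current RNN decoder hidden state
    returns:  (attention weights over regions, context vector of shape (d,))
    """
    scores = features @ hidden      # dot-product alignment score per region
    weights = softmax(scores)       # normalize into a distribution over regions
    context = weights @ features    # weighted average of region features
    return weights, context

# Toy example: 4 image regions, 8-dimensional features
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8))
hidden = rng.standard_normal(8)
weights, context = attend(features, hidden)
```

Because the weights form a probability distribution over regions, they can also be visualized as a heatmap showing which part of the image the model "looked at" while emitting each word.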
Keywords: Image Captioning, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Encoder-Decoder Architecture, Feature Extraction, Image Features, Sequence Generation, Natural Language Processing (NLP)