AI-Based Multilingual Hand Sign Recognition System Using Computer Vision
KRISHNA S, MGA S, BHAVANI S
Department of Artificial Intelligence and Data Science, Nehru Institute of Engineering and Technology, Coimbatore, Tamil Nadu 641005, India
MENTOR: MS. SIKA
Department of Artificial Intelligence and Data Science, Nehru Institute of Engineering and Technology
ABSTRACT
This paper presents an AI-driven multilingual hand sign recognition system that integrates computer vision, deep learning, and natural language processing to enable real-time gesture interpretation and cross-lingual communication. The proposed system addresses the limitations of conventional sign language interpretation methods by providing an automated, scalable, and language-independent framework for translating hand gestures into both textual and auditory outputs.
The system utilizes a vision-based processing pipeline in which input video streams are analyzed using MediaPipe for efficient hand detection, localization, and three-dimensional landmark extraction. The extracted spatial features are fed into a Convolutional Neural Network (CNN) optimized for multi-class gesture classification. To enhance model generalization and reduce overfitting, the network is trained on a structured dataset of annotated hand gestures using data augmentation techniques such as rotation, scaling, and normalization. Model training is performed using categorical cross-entropy loss and adaptive optimization algorithms to ensure stable and efficient convergence.
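The landmark-based feature extraction described above can be sketched as follows. MediaPipe Hands emits 21 three-dimensional landmarks per detected hand, with landmark 0 at the wrist; a common preprocessing step (the paper's exact normalization scheme is not specified, so this particular choice is an assumption) translates the landmarks to a wrist-centered origin and rescales them to unit extent, yielding a 63-dimensional, position- and scale-invariant feature vector for the CNN classifier:

```python
import math

def normalize_landmarks(landmarks):
    """Normalize 21 (x, y, z) hand landmarks, as produced by MediaPipe Hands.

    Translates all points so the wrist (landmark 0) sits at the origin,
    then scales so the farthest landmark lies at unit distance. Returns a
    flat 63-element feature vector (21 landmarks x 3 coordinates) suitable
    as input to a gesture classifier.
    """
    wx, wy, wz = landmarks[0]
    # Translation invariance: recenter on the wrist.
    centered = [(x - wx, y - wy, z - wz) for (x, y, z) in landmarks]
    # Scale invariance: divide by the largest wrist-to-landmark distance.
    scale = max(math.sqrt(x * x + y * y + z * z) for (x, y, z) in centered) or 1.0
    flat = []
    for (x, y, z) in centered:
        flat.extend((x / scale, y / scale, z / scale))
    return flat
```

In the full pipeline, vectors of this form (collected per video frame) would be augmented and fed to the CNN trained with categorical cross-entropy, as described above.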
Following classification, the predicted gesture labels are mapped to semantic text representations, which are then processed through a neural machine translation module to generate multilingual outputs. A text-to-speech (TTS) synthesis component further converts the translated text into speech, enabling multimodal interaction. The system is designed to operate under real-time constraints with low computational latency, making it suitable for practical deployment in assistive environments.
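The post-classification stage described above can be illustrated with a minimal sketch. Here a static lookup table stands in for the neural machine translation module, and the gesture labels, phrases, and language codes are purely illustrative, not the paper's actual vocabulary; in the real system the translated string would then be passed to a TTS engine:

```python
# Illustrative mapping from predicted class labels to semantic text.
GESTURE_TEXT = {0: "hello", 1: "thank you", 2: "help"}

# Placeholder for the neural machine translation module: a fixed table
# of target-language renderings (language codes and phrases are examples).
TRANSLATIONS = {
    "hello":     {"hi": "namaste",     "ta": "vanakkam"},
    "thank you": {"hi": "dhanyavaad",  "ta": "nandri"},
    "help":      {"hi": "madad",       "ta": "udhavi"},
}

def gesture_to_speech_text(label, target_lang="en"):
    """Map a predicted gesture label to text in the requested language.

    In the full pipeline this string would be handed to a text-to-speech
    component; English text falls through untranslated.
    """
    text = GESTURE_TEXT.get(label)
    if text is None:
        raise ValueError(f"unknown gesture label: {label}")
    if target_lang == "en":
        return text
    return TRANSLATIONS[text].get(target_lang, text)
```

Keeping the label-to-text mapping separate from the translation step is what makes the framework language-independent: new output languages can be added without retraining the gesture classifier.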
Experimental results demonstrate that the proposed framework achieves high recognition accuracy and robustness across varying environmental conditions. The combination of lightweight landmark-based feature extraction and deep learning-based classification significantly improves computational efficiency compared to traditional image-based methods.
Overall, the proposed system contributes to the advancement of assistive technologies by providing a comprehensive, real-time, and scalable solution for inclusive communication.