Enhanced Sign Language Translation Using Vision Transformers and Adaptive Representation
Abishek J
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, India
Abishekjegan@gmail.com
Mrs. Saranya K, M.E.
Assistant Professor
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, Tamil Nadu, India
kansarcge@gmail.com
Adhitya Kiran.K
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, India
adhityakiran10@gmail.com
AakashRaaj.P
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, India
aakasharaajponnurangame@gmail.com
Abstract— Sign language is an essential mode of communication for individuals with hearing impairments, yet real-time translation remains challenging due to complex hand gestures, facial expressions, and language variations. Traditional deep learning approaches such as CNNs and RNNs struggle with sequential dependencies and spatial feature extraction, limiting recognition accuracy. Recent advances in Vision Transformers (ViTs) have significantly improved image-based learning by using self-attention to capture both spatial and temporal dependencies, making them highly effective for gesture recognition. This paper presents a bidirectional Sign Language Recognition and Translation System that employs a ViT for sign recognition and a Pre-Recorded Gesture Database for text-to-sign conversion. The system captures real-time video input, extracts gesture features using the ViT's attention-based encoding, and converts recognized gestures into text. Conversely, it maps typed text to corresponding pre-recorded sign animations, ensuring smooth and natural communication. By eliminating gloss-based intermediaries, the proposed system improves accuracy, computational efficiency, and real-time performance, offering a scalable solution for bridging communication gaps in the hearing-impaired community.
Keywords— sign language recognition, Vision Transformer, gesture-to-text, text-to-gesture, deep learning, transformer models, real-time translation
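The bidirectional pipeline summarized in the abstract can be illustrated with a minimal sketch. The example below assumes a torchvision vit_b_16 backbone fine-tuned as a frame-level sign classifier and a simple dictionary mapping words to pre-recorded gesture clips; NUM_SIGNS, SIGN_LABELS, and GESTURE_DB are hypothetical placeholders, and the temporal aggregation across video frames described in the paper is omitted for brevity.

```python
# Minimal sketch of the bidirectional pipeline (not the authors' implementation):
# a ViT classifier for gesture-to-text and a lookup table for text-to-gesture.
import torch
from torchvision.models import vit_b_16
from torchvision import transforms

NUM_SIGNS = 100                                          # hypothetical vocabulary size
SIGN_LABELS = [f"sign_{i}" for i in range(NUM_SIGNS)]    # placeholder label names
GESTURE_DB = {"hello": "clips/hello.mp4",                # placeholder clip paths
              "thanks": "clips/thanks.mp4"}

# ViT backbone with its classification head replaced for the sign vocabulary.
model = vit_b_16(weights=None)
model.heads.head = torch.nn.Linear(model.heads.head.in_features, NUM_SIGNS)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def sign_to_text(frame_pil):
    """Classify a single video frame into a sign label (gesture-to-text)."""
    logits = model(preprocess(frame_pil).unsqueeze(0))
    return SIGN_LABELS[logits.argmax(dim=1).item()]

def text_to_sign(sentence):
    """Map each known word to its pre-recorded gesture clip (text-to-gesture)."""
    return [GESTURE_DB[w] for w in sentence.lower().split() if w in GESTURE_DB]
```

In the full system, per-frame ViT features would be aggregated over time before classification; here each frame is classified independently purely for illustration.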