Sign Language Detection Using AI and Machine Learning: An LSTM-Based Real-Time Action Recognition System
Nitish Kumar1, Rishabh Jain2, Suraj Kumar Chaubey3, Manmeet Kumar4
Under the Supervision of Mr. Badal Bhushan
1,2,3,4 B.Tech (CSE) – Final Year Students, Dept. of Computer Science & Engineering,
IIMT College of Engineering, Greater Noida — Affiliated to Dr. APJ Abdul Kalam Technical University, Lucknow, UP, India
Email: nitishkumar78@gmail.com | rishabhjain89@gmail.com | surajchaubey107@gmail.com | manmeet66@gmail.com
Abstract
Sign language is the primary and natural medium of communication for the Deaf and Hard of Hearing (DHH) community worldwide. Despite its linguistic richness and social significance, automated sign language recognition remains a difficult research problem. This paper presents a real-time sign language detection system based on action recognition principles and powered by Long Short-Term Memory (LSTM) deep learning networks. The system uses Mediapipe Holistic to extract landmarks across the hands, face, and body, and a deep stacked LSTM architecture to model the temporal dynamics and sequential dependencies of sign language gestures in continuous live video. A training pipeline covering video acquisition, morphological preprocessing, keypoint feature extraction, sequence formation, and hyperparameter-optimized LSTM training is proposed and evaluated on a multi-class gesture dataset under varied conditions. Experiments show a training accuracy of 96.2%, a validation accuracy of 91.5%, and a test accuracy of 87.3%, surpassing static frame-based CNN methods by approximately 13 percentage points. Performance is further assessed in terms of precision, recall, F1-score, and real-time inference latency (~44 ms/frame), confirming robustness and practical usability. The result is a scalable, cost-effective, and deployable system that helps bridge the communication gap for DHH individuals in educational, healthcare, and everyday social contexts.
Keywords — Sign Language Recognition (SLR), LSTM, Deep Learning, Action Recognition, Mediapipe Holistic, Gesture Recognition, Computer Vision, Temporal Modeling, DHH Communication, OpenCV, TensorFlow
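To make the sequence-formation step of the abstract's pipeline concrete, the sketch below groups per-frame keypoint vectors into fixed-length sequences suitable as LSTM input. The landmark counts (33 pose, 468 face, 21 per hand) are Mediapipe Holistic's standard outputs, and the 30-frame window is a hypothetical choice; neither figure is stated in the paper.

```python
import numpy as np

# Assumed Mediapipe Holistic landmark counts (not stated in the paper):
#   33 pose landmarks  x 4 values (x, y, z, visibility) = 132
#   468 face landmarks x 3 values (x, y, z)             = 1404
#   21 hand landmarks  x 3 values x 2 hands             = 126
FEATURES_PER_FRAME = 33 * 4 + 468 * 3 + 2 * 21 * 3  # 1662
SEQUENCE_LENGTH = 30  # hypothetical frames-per-gesture window

def frames_to_sequences(frames: np.ndarray,
                        seq_len: int = SEQUENCE_LENGTH) -> np.ndarray:
    """Group consecutive keypoint frames into fixed-length sequences.

    frames: array of shape (n_frames, FEATURES_PER_FRAME).
    Returns: array of shape (n_sequences, seq_len, FEATURES_PER_FRAME),
    dropping any trailing frames that do not fill a whole sequence.
    """
    n_seq = len(frames) // seq_len
    return frames[: n_seq * seq_len].reshape(n_seq, seq_len, frames.shape[1])

# Example: 90 frames of extracted keypoints form 3 training sequences.
frames = np.zeros((90, FEATURES_PER_FRAME), dtype=np.float32)
sequences = frames_to_sequences(frames)
print(sequences.shape)  # (3, 30, 1662)
```

Each resulting sequence would then be paired with a gesture label and fed to the stacked LSTM described in the paper.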