A Vision-Based Computational Recognition Using MediaPipe Hand Landmarks and CNN
Ms. Saraswathi 1, Naveen M 2, Dhanush Kumar M 3, Jeyaseelapandi S 4
1 Assistant Professor, CST, SNS College of Engineering, Coimbatore – 641107. Email: saraswathi.r.cst@snsce.ac.in
2 Final Year, CST, SNS College of Engineering, Coimbatore – 641107. Email: naveenm3109@gmail.com
3 Final Year, CST, SNS College of Engineering, Coimbatore – 641107. Email: mathavandhanush341@gmail.com
4 Final Year, CST, SNS College of Engineering, Coimbatore – 641107. Email: jeyaseelapandi9626@gmail.com
ABSTRACT:
The “Vision-Based Computational Recognition Using MediaPipe Hand Landmarks and CNN” system is designed to bridge the communication gap between hearing-impaired and non-signing individuals through real-time sign language recognition. The proposed framework employs MediaPipe to extract hand landmarks consistently across varying backgrounds and lighting conditions, and a Convolutional Neural Network (CNN) to classify the gestures into their respective outputs. The system translates American Sign Language (ASL) gestures into text and speech, providing a seamless and accessible mode of interaction. By combining real-time hand tracking, feature extraction, and deep learning-based recognition, the framework achieves high accuracy and low latency even under varying lighting and background conditions. Users benefit from real-time recognition, reduced dependency on interpreters, and improved accessibility, which supports inclusive human–computer interaction and enables practical applications in education, healthcare, and assistive technologies. The application was developed using the Web Engineering method, with stages of communication, planning, modeling, and deployment, implemented in the Python programming language with the MediaPipe and TensorFlow frameworks, and tested using Black Box Testing.
Keywords – MediaPipe, Convolutional Neural Network, Sign Language Recognition, Deep Learning, Computer Vision, Accessibility, Human–Computer Interaction.
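To make the described pipeline concrete, the following is a minimal sketch of the landmark-extraction and classification loop in Python with MediaPipe and TensorFlow, the stack named in the abstract. The Conv1D architecture, the single-hand setup, and the 26-class A–Z output mapping are illustrative assumptions, not the paper's reported configuration; trained weights and the text-to-speech stage are omitted.

```python
# Sketch: webcam frames -> MediaPipe hand landmarks -> CNN gesture class.
# The model below is a hypothetical stand-in; the paper's exact
# architecture and trained weights are not reproduced here.
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

# Assumed classifier: Conv1D over the 21 MediaPipe landmarks (x, y, z each).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(21, 3)),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(26, activation="softmax"),  # assumed A-Z classes
])

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures frames in BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        # 21 landmarks x 3 coordinates form the CNN's input feature array.
        x = np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32)
        probs = model.predict(x[None, ...], verbose=0)  # trained weights assumed
        print(chr(ord("A") + int(probs.argmax())))
    cv2.imshow("Sign Recognition", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```

In a full system, the printed letter would instead be buffered into words and passed to a speech-synthesis stage to produce the text and speech output described above.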