Removing Noisy and Irrelevant Information in LipNet for Silent Communication Using Residual Attention Blocks
CH Jayaprakash Reddy
Department of Information Technology,
Vardhaman College of Engineering
Hyderabad-Telangana, 501286, India jayaprakashreddy0125@gmail.com
CH Meghanadh
Department of Information Technology,
Vardhaman College of Engineering
Hyderabad-Telangana, 501286, India Charpameghanadh97@gmail.com
M Sona Reddy
Department of Information Technology,
Vardhaman College of Engineering
Hyderabad-Telangana, 501286, India sonaareddy27@gmail.com
Dr B K Madhavi
Department of Information Technology,
Vardhaman College of Engineering
Hyderabad-Telangana, 501286, India Madhavi1593@vardhaman.org
Y Charan Raj
Department of Information Technology,
Vardhaman College of Engineering
Hyderabad-Telangana, 501286, India yeluricharannetha143@gmail.com
Abstract— Silent visual speech recognition decodes spoken content from lip movements alone, enabling communication in settings where the audio signal is unreliable or absent. LipNet, a pioneering end-to-end architecture for sentence-level lip reading, performs well but remains sensitive to irrelevant facial motion, background noise, and changes in lighting. In this study, we propose an improved LipNet model that incorporates Residual Attention Blocks (RABs) into the convolutional feature extraction stage to suppress noisy, redundant features while preserving significant spatiotemporal patterns. The attention mechanism combines channel and spatial weighting to selectively emphasize discriminative lip-motion cues, while the residual connections maintain stable gradient flow during training. In experimental evaluation against the baseline LipNet, the proposed model achieves higher word-level and character-level accuracy and a notably lower Word Error Rate (WER). These findings indicate that residual attention strengthens lip-motion representations and point to a promising direction for noise-robust silent communication systems.
Keywords—LipNet, Visual Speech Recognition, Silent Communication, Lip Reading, Residual Attention Block, Spatiotemporal Feature Extraction, Deep Learning, 3D Convolutional Neural Networks, Bidirectional GRU, Connectionist Temporal Classification, Attention Mechanism.