Knowledge Distillation-Based Training of Speech Enhancement for Noise-Robust Automatic Speech Recognition
Jinnuri Charishma
Department of Electronics and Communication Engineering, Institute of Aeronautical Engineering (IARE)
Guide 1: Dr. S. China Venkateswarlu, Professor, Department of ECE, IARE
Guide 2: Dr. V. Siva Nagaraju, Professor, Department of ECE, IARE
Abstract: Knowledge distillation (KD) is a widely used model compression technique that enables smaller, computationally efficient models to inherit the performance benefits of larger, high-capacity models. In this study, we investigate the application of KD in training noise-robust speech enhancement models to improve automatic speech recognition (ASR) in adverse acoustic environments. Traditional speech enhancement models often struggle to balance noise suppression and speech intelligibility, leading to degradation in ASR performance. To address this, we propose a KD-based training framework where a powerful teacher model, trained on high-quality speech enhancement tasks, guides the learning process of a lightweight student model.
The proposed approach employs both frame-level and sequence-level distillation so that the student model learns critical speech representations while preserving noise suppression effectiveness. The frame-level loss helps retain fine-grained speech features, whereas the sequence-level loss improves the overall intelligibility of the reconstructed speech. We evaluate our framework on multiple noisy datasets, covering both real-world and synthetic noise conditions, using standard ASR benchmarks. Our results demonstrate that KD-based speech enhancement significantly improves ASR performance compared to conventional noise reduction techniques. Additionally, the student model achieves performance comparable to that of the teacher while maintaining a reduced computational footprint, making it suitable for real-time applications.
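To make the two-term objective concrete, the following minimal sketch (assuming a PyTorch setup; the tensor shapes, the cosine-based sequence-level term, and the weight alpha are illustrative assumptions rather than the exact losses used in this work) combines a frame-level and a sequence-level distillation term into one training loss:

import torch
import torch.nn.functional as F

def distillation_loss(student_frames, teacher_frames,
                      student_seq, teacher_seq, alpha=0.5):
    """Illustrative combined KD loss for speech enhancement.

    student_frames / teacher_frames: (batch, time, feat) enhanced feature frames.
    student_seq / teacher_seq: (batch, dim) utterance-level embeddings.
    alpha: hypothetical weight balancing the frame- and sequence-level terms.
    """
    # Frame-level term: match the teacher's enhanced features frame by frame.
    frame_loss = F.mse_loss(student_frames, teacher_frames)
    # Sequence-level term: match utterance-level representations
    # (cosine distance used here as one plausible choice).
    seq_loss = 1.0 - F.cosine_similarity(student_seq, teacher_seq, dim=-1).mean()
    return alpha * frame_loss + (1.0 - alpha) * seq_loss

In practice the teacher's outputs would be detached from the computation graph so that gradients update only the student.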
By leveraging knowledge distillation, our approach enhances the generalization ability of speech enhancement models, enabling robust ASR performance across various noise types and intensities. Furthermore, the lightweight student model reduces latency and energy consumption, making it ideal for deployment in resource-constrained environments such as edge devices and mobile applications. The findings of this study contribute to advancing noise-robust ASR and demonstrate the effectiveness of KD in optimizing speech enhancement models for practical use cases.
Keywords: Knowledge Distillation, Speech Enhancement, Noise-Robust ASR, Deep Learning, Automatic Speech Recognition, Model Compression, Neural Networks, Noise Suppression, Lightweight Models, Real-Time Speech Processing.