Knowledge Distillation-Based Training of Speech Enhancement for Noise-Robust Automatic Speech Recognition
Jinnuri Charishma
Department of Electronics and Communication Engineering, Institute of Aeronautical Engineering (IARE)
Guide 1: Dr. S. China Venkateswarlu, Professor, Department of ECE, IARE
Guide 2: Dr. V. Siva Nagaraju, Professor, Department of ECE, IARE
Abstract: Knowledge distillation (KD) is a widely used model compression technique that enables smaller, computationally efficient models to inherit the performance benefits of larger, high-capacity models. In this study, we investigate the application of KD in training noise-robust speech enhancement models to improve automatic speech recognition (ASR) in adverse acoustic environments. Traditional speech enhancement models often struggle to balance noise suppression and speech intelligibility, leading to degradation in ASR performance. To address this, we propose a KD-based training framework where a powerful teacher model, trained on high-quality speech enhancement tasks, guides the learning process of a lightweight student model.
The proposed approach employs both frame-level and sequence-level distillation so that the student model learns critical speech representations while preserving noise suppression effectiveness. The frame-level loss helps retain fine-grained speech features, whereas the sequence-level loss improves the overall intelligibility of the reconstructed speech. We evaluate our framework on multiple noisy datasets, covering both real-world and synthetic noise conditions, using standard ASR benchmarks. Our results demonstrate that KD-based speech enhancement significantly improves ASR performance compared to conventional noise reduction techniques. Additionally, the student model achieves performance comparable to that of the teacher while maintaining a reduced computational footprint, making it suitable for real-time applications.
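To make the two-term objective concrete, the following minimal sketch (assuming a PyTorch setup; the tensor shapes, the cosine-based sequence-level term, and the weight alpha are illustrative assumptions rather than the exact losses used in this work) combines a frame-level and a sequence-level distillation term into one training loss:

import torch
import torch.nn.functional as F

def distillation_loss(student_frames, teacher_frames,
                      student_seq, teacher_seq, alpha=0.5):
    """Illustrative combined KD loss for speech enhancement.

    student_frames / teacher_frames: (batch, time, feat) enhanced feature frames.
    student_seq / teacher_seq: (batch, dim) utterance-level embeddings.
    alpha: hypothetical weight balancing the frame- and sequence-level terms.
    """
    # Frame-level term: match the teacher's enhanced features frame by frame.
    frame_loss = F.mse_loss(student_frames, teacher_frames)
    # Sequence-level term: match utterance-level representations
    # (cosine distance used here as one plausible choice).
    seq_loss = 1.0 - F.cosine_similarity(student_seq, teacher_seq, dim=-1).mean()
    return alpha * frame_loss + (1.0 - alpha) * seq_loss

In practice the teacher's outputs would be detached from the computation graph so that gradients update only the student.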
By leveraging knowledge distillation, our approach enhances the generalization ability of speech enhancement models, enabling robust ASR performance across various noise types and intensities. Furthermore, the lightweight student model reduces latency and energy consumption, making it ideal for deployment in resource-constrained environments such as edge devices and mobile applications. The findings of this study contribute to advancing noise-robust ASR and demonstrate the effectiveness of KD in optimizing speech enhancement models for practical use cases.
Keywords: Knowledge Distillation, Speech Enhancement, Noise-Robust ASR, Deep Learning, Automatic Speech Recognition, Model Compression, Neural Networks, Noise Suppression, Lightweight Models, Real-Time Speech Processing.