Speech Enhancement Using Spectrogram Denoising with Deep U-Net Architectures






Create Date 27/05/2025
Last Updated 27/05/2025


Description


Guide: Dr. S China Venkateswarlu, Professor, ECE & IARE

Dr. V Siva Nagaraju, Professor, ECE & IARE

Aennam Ashritha Patel¹

¹ Electronics and Communication Engineering, Institute of Aeronautical Engineering

Abstract -- Acoustic noise degrades speech quality and intelligibility in applications ranging from telecommunications to voice assistants. In this paper, we address this problem with an efficient speech enhancement system based on deep learning. Our approach relies on spectrogram denoising: audio signals are represented as 2D magnitude spectrograms, which preserve the time-frequency structure of the signal and allow Convolutional Neural Networks (CNNs) to be applied directly.
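To make the spectrogram representation concrete, the following is a minimal sketch of how a 2D magnitude spectrogram can be computed from a waveform with a short-time Fourier transform. The frame length (256), hop size (64), sample rate (8000 Hz), Hann window, and the test tone are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Return a complex STFT of shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are assumed values chosen for illustration.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one per column).
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Real-input FFT along the frame axis gives the frequency bins.
    return np.fft.rfft(frames, axis=0)

sr = 8000                                # assumed sample rate in Hz
t = np.arange(sr) / sr                   # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)      # hypothetical 1 kHz test tone

spec = stft(tone)
magnitude = np.abs(spec)    # the 2D "image" a CNN would consume
phase = np.angle(spec)      # kept aside for later reconstruction

# The strongest frequency bin should sit at the tone frequency.
peak_bin = int(magnitude.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 256
```

The magnitude array has frequency along one axis and time along the other, which is what lets image-style convolutional models operate on it.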

The backbone of our system is a U-Net, a deep convolutional autoencoder that learns to estimate the noise component of a noisy speech spectrogram. We built a heterogeneous dataset by mixing clean English speech from SiSec and LibriSpeech with 10 environmental noise classes drawn from ESC-50 and other sources, applying data augmentation and randomized noise levels to encourage generalization. We trained the U-Net with the Adam optimizer and the Huber loss, reaching a training loss of 0.002129 and a validation loss of 0.002406.
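The data mixing and the loss function can be illustrated with a short sketch. It assumes that "randomized noise levels" means scaling the noise to a randomly drawn signal-to-noise ratio (here 0 to 20 dB) and that the Huber threshold is delta = 1.0; neither value is stated in the abstract, and the helper names `mix_at_snr` and `huber_loss` are hypothetical.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale noise so the mixture has the requested SNR (in dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise, gain

def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)  # stand-in for speech
noise = rng.normal(size=8000)                             # stand-in for an ESC-50 clip

snr_db = rng.uniform(0, 20)             # assumed SNR range for augmentation
noisy, gain = mix_at_snr(clean, noise, snr_db)

# Check that the mixture really has the SNR that was drawn.
achieved_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((gain * noise) ** 2))

# Errors of 0.5 and 3.0 fall on the quadratic and linear branches respectively.
example_loss = huber_loss(np.array([0.0, 3.0]), np.array([0.5, 0.0]))
```

Because the Huber loss grows only linearly for large errors, it is less sensitive to outlier spectrogram bins than mean squared error, which is one plausible reason for choosing it here.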

At prediction time, the trained U-Net estimates the noise magnitude spectrogram, which is then subtracted from the noisy spectrogram. The denoised magnitude spectrogram is combined with the original phase of the noisy input, and the enhanced audio is reconstructed with an inverse Short-Time Fourier Transform (ISTFT). Qualitative evaluations, including visual comparisons of time series and spectrograms as well as audio demonstrations, indicate that the system suppresses a variety of noises while preserving speech fidelity, even at high noise levels. This project demonstrates a practical and scalable deep learning approach to improving speech quality in noisy environments.

Key Words: speech enhancement, deep learning, spectrogram denoising, U-Net, convolutional neural networks, noise reduction, audio processing.


