Deepfake Voice Detection Using Machine Learning
Amruthesh S G
UG Student, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Maninder Kaur
UG Student, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Tejaswini G H
UG Student, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Mokshaprada P
UG Student, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Ms. Sowjanya Lakshmi A
Assistant Professor, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Dr G C Bhanu Prakash
Professor and Head of Department, Department of Information Science and Engineering,
Sir M Visvesvaraya Institute of Technology, Bengaluru, Karnataka, India
Abstract
With the advent of deep learning, audio synthesis techniques have advanced so quickly that it has become difficult to distinguish real speech from fake. In this work, we design and implement a robust fake-speech detection method targeting the logical access (LA) threats posed by voice-conversion and speech-synthesis attacks. To increase the effective size of the dataset and improve generalisation, the system combines a deep learning model with data augmentation techniques including time stretching, pitch shifting, and volume scaling. Pre-processing steps include normalisation, noise removal, and segmentation of the audio into 4-second chunks, which are converted into spectrogram representations. Mel spectrograms, computed with the Fast Fourier Transform (FFT) and Z-normalised, are used as the feature representation. The proposed architecture consists of multiple convolutional layers (both 2D and 1x1 convolutions), batch normalisation, max-pooling, ReLU activations, and fully connected layers. Dropout and other regularisation techniques are used to increase the model's resistance to overfitting. The ASVspoof 2019 corpus was used for training and testing, with the augmentation techniques applied to simulate real-world conditions. Classification performance was analysed using the confusion matrix together with accuracy, precision, recall, F1-score, and ROC-AUC. Results indicate that the system discriminates effectively between genuine and spoofed speech, achieving high detection accuracy.

Our goal is to create a system that recognises artificial speech, using deep learning to differentiate between real and fake samples. Our method makes use of the ASVspoof 2019 benchmark, which comprises an extensive set of spoofed and genuine speech samples. We use Mel-Frequency Cepstral Coefficients (MFCCs) and spectrograms to extract key features that capture significant speech patterns. The learnt features are then analysed by a Convolutional Neural Network (CNN), a powerful deep learning architecture capable of identifying patterns in both audio and visual data. In the training pipeline, raw audio input is converted into Mel spectrograms, pre-processed to reduce noise and improve clarity, and then fed into the CNN model for classification. By providing a scalable, practical, and widely applicable defence against emerging audio spoofing attacks, this work significantly strengthens voice-based security systems.
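To make the pre-processing, augmentation, and feature-extraction pipeline concrete, the following is a minimal sketch in Python assuming librosa and NumPy. The sampling rate, FFT size, Mel-band count, MFCC count, and augmentation ranges are illustrative assumptions, not values reported in the paper, and the noise-removal step is omitted for brevity.

```python
# Sketch: 4-second segments -> Z-normalised log-Mel spectrograms (plus MFCCs),
# with the three augmentations named in the abstract.
import numpy as np
import librosa

SR = 16000          # assumed sampling rate
SEGMENT_SEC = 4     # segment length from the abstract

def augment(y, sr=SR):
    """Return augmented copies: time stretch, pitch shift, volume scaling."""
    stretched = librosa.effects.time_stretch(y, rate=np.random.uniform(0.9, 1.1))
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=np.random.randint(-2, 3))
    scaled = y * np.random.uniform(0.7, 1.3)
    return [stretched, shifted, scaled]

def to_mel(y, sr=SR, n_fft=2048, hop_length=512, n_mels=128):
    """FFT-based Mel spectrogram, log-scaled and Z-normalised."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)

def to_mfcc(y, sr=SR, n_mfcc=13):
    """MFCC features, as mentioned in the abstract (count is assumed)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def segments(path, sr=SR, seg_sec=SEGMENT_SEC):
    """Load audio, peak-normalise, and yield Mel features per 4-second chunk."""
    y, _ = librosa.load(path, sr=sr)
    y = y / (np.abs(y).max() + 1e-8)
    n = sr * seg_sec
    for start in range(0, max(len(y) - n + 1, 1), n):
        chunk = y[start:start + n]
        if len(chunk) < n:                      # zero-pad the final chunk
            chunk = np.pad(chunk, (0, n - len(chunk)))
        yield to_mel(chunk)
```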
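The CNN described in the abstract might be structured as below. This is a sketch in PyTorch (the paper does not specify a framework); the layer counts, channel widths, and dropout rate are assumptions, while the building blocks (2D and 1x1 convolutions, batch normalisation, max-pooling, ReLU, dropout, fully connected layers) follow the abstract.

```python
# Architecture sketch of the spoof-detection CNN; hyperparameters are assumed.
import torch
import torch.nn as nn

class SpoofCNN(nn.Module):
    def __init__(self, n_classes=2, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 2D convolution
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=1),             # 1x1 convolution
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),                 # fixed-size feature map
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),                          # regularisation
            nn.Linear(64 * 4 * 4, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, n_classes),                    # genuine vs. spoofed
        )

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.classifier(self.features(x))
```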
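The metrics named in the abstract can be computed with scikit-learn as in the sketch below; `y_true`, `y_pred`, and `y_score` are hypothetical arrays standing for ground-truth labels, hard predictions, and spoof-class probabilities.

```python
# Evaluation sketch for the metrics listed in the abstract.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def report(y_true, y_pred, y_score):
    """Print the confusion matrix and the five scalar metrics."""
    print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1-score :", f1_score(y_true, y_pred))
    print("ROC-AUC  :", roc_auc_score(y_true, y_score))
```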