Advancing Deepfake Detection: A Comprehensive Review of Multimodal Fusion Methods
Yogesh Kadam
Dept. of Computer Engineering
Bharati Vidyapeeth College of Engineering, Lavale, Pune
Pune, India
yogesh.kadam@bharatividyapeeth.edu
Pradnya Vavale
Dept. of Computer Engineering
Bharati Vidyapeeth College of Engineering, Lavale, Pune
Pune, India
vavalepradnya8@gmail.com
Prachi Nikalje
Dept. of Computer Engineering
Bharati Vidyapeeth College of Engineering, Lavale, Pune
Pune, India
prachinikalje850@gmail.com
Akshay Redekar
Dept. of Computer Engineering
Bharati Vidyapeeth College of Engineering, Lavale, Pune
Pune, India
akshayredekar4545@gmail.com
Mansi Alhat
Dept. of Computer Engineering
Bharati Vidyapeeth College of Engineering, Lavale, Pune
Pune, India
mansialhat29@gmail.com
Abstract— The emergence of deepfake content, driven by generative models such as GANs and autoencoders, severely undermines digital trust, security, and information integrity. Traditional unimodal detection, which focuses exclusively on audio, video, or text, has quickly lost effectiveness against advanced deepfakes that exploit more than one modality. This study provides a comprehensive review of fusion-based multimodal deepfake detection techniques and categorizes them into early, late, hybrid, and attention-based fusion approaches. It discusses the benefits and limitations of these methodologies in depth, describes their feature extraction pipelines, and compares their performance on several datasets, including FakeAVCeleb, DFDC, and PolyGlotFake. In addition, the paper discusses cross-modal challenges, ethical considerations, and real-world implementation limitations. Overall, the paper integrates recent literature findings to outline future trends, technical barriers, and new research directions, and it highlights the need for generalizable, robust, and interpretable fusion models to counter increasingly sophisticated threats from synthetic media.
Keywords— Deepfake detection, multimodal fusion, generative adversarial networks (GANs), autoencoders, early fusion, late fusion, hybrid fusion, attention mechanisms, feature extraction, cross-modal analysis, synthetic media, dataset evaluation.