DeepGuard: A Comprehensive Multimodal Deepfake Detection Framework with Attention-Based Fusion, Explainability, and Scalable Deployment
Nikhil Yadav, Mayur Raval, Om Yadav, Atharav Chougule, Ritesh Upadhye
Department of Computer Science and Engineering (AIML)
Shivaji University, Kolhapur, India
Abstract—
The rapid advancement of generative artificial intelligence has led to the widespread creation of highly realistic deepfake content across images, videos, audio, and text. While such technologies offer innovative applications, they also pose significant risks to digital trust, cybersecurity, and information integrity. Existing deepfake detection methods often rely on unimodal analysis, which limits their ability to detect sophisticated multimodal manipulations. To address this limitation, this paper proposes DeepGuard, a comprehensive multimodal deepfake detection framework that integrates image, video, audio, and textual analysis using an attention-based fusion strategy.
The proposed system employs pretrained MobileNetV2 models for feature extraction from images, video frames, and audio spectrograms, ensuring computational efficiency and robust representation learning. Textual features are extracted using TF–IDF vectorization and classified through a Multinomial Naïve Bayes model. The modality-specific embeddings are projected into a shared latent space and adaptively fused using a learnable attention mechanism that dynamically assigns importance weights based on contextual relevance.
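The fusion step described above — projecting each modality's embedding into a shared latent space, scoring it with a learnable attention vector, and taking a softmax-weighted sum — can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the function and parameter names (`attention_fuse`, `proj_weights`, `attn_vector`) and the dimensions are assumptions for demonstration.

```python
import numpy as np

def attention_fuse(embeddings, proj_weights, attn_vector):
    """Fuse modality-specific embeddings via learnable attention.

    embeddings   : list of 1-D arrays, one per modality (image, video,
                   audio, text), possibly of different sizes
    proj_weights : list of (d, k_i) projection matrices mapping each
                   modality into a shared d-dimensional latent space
    attn_vector  : (d,) learned vector used to score each projection
    """
    # Project each modality embedding into the shared latent space.
    projected = [W @ e for W, e in zip(proj_weights, embeddings)]
    # Score each modality by its alignment with the attention vector.
    scores = np.array([attn_vector @ z for z in projected])
    # Softmax converts scores into importance weights summing to 1,
    # so the weighting adapts to the content of each input.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The fused representation is the attention-weighted sum.
    fused = sum(w * z for w, z in zip(weights, projected))
    return fused, weights

# Toy usage with assumed embedding sizes for the four modalities.
rng = np.random.default_rng(0)
dims = [1280, 1280, 1280, 300]  # e.g. MobileNetV2 features + TF-IDF
d = 256                         # assumed shared latent dimension
embs = [rng.standard_normal(k) for k in dims]
Ws = [rng.standard_normal((d, k)) * 0.01 for k in dims]
a = rng.standard_normal(d)
fused, w = attention_fuse(embs, Ws, a)
```

In the full system the projection matrices and attention vector would be trained end-to-end with the classifier, so the weights `w` shift toward whichever modality carries the strongest manipulation cues for a given input.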
Experimental results demonstrate that the proposed multimodal approach outperforms unimodal baselines and static fusion methods across standard evaluation metrics. The lightweight architecture further supports scalable deployment in cloud and edge environments. The DeepGuard framework provides an efficient and practical solution for detecting evolving deepfake threats in real-world multimedia systems.
Keywords—
Deepfake Detection, Multimodal Fusion, CNN, LSTM, Transformer, Attention, Explainable AI