Deepfake Spotter: Vision Transformer-Based Deepfake Detection System
Dr. M. Hemalatha¹; M. Ramesh Krishna²
¹Assistant Professor, Department of Computer Science, Sri Ramakrishna College of Arts & Science
²PG Student, Department of Computer Science, Sri Ramakrishna College of Arts & Science
ABSTRACT
The rapid growth of artificial intelligence and deep learning has transformed digital media creation, enabling the generation of highly realistic synthetic images and videos known as deepfakes. Deepfakes are typically produced with generative models such as Generative Adversarial Networks (GANs) and diffusion models, which can make manipulated media visually indistinguishable from authentic content. Although this technology has beneficial applications in entertainment, education, and virtual reality, it also poses serious threats, including the spread of misinformation, identity fraud, political manipulation, and the erosion of public trust in digital media.
This project, Deepfake Spotter, presents a robust and explainable deepfake detection system based on Vision Transformer (ViT) models. Unlike convolutional networks, whose layers aggregate information from local receptive fields, Vision Transformers divide an image into fixed-size patches and apply self-attention across all of them, capturing global contextual relationships that can help reveal subtle manipulation artifacts. The system is implemented in TensorFlow/Keras and deployed through a Streamlit-based web interface that lets users upload both images and videos for analysis.
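To make the patch-and-attention idea concrete, the following is a minimal NumPy sketch, not the authors' TensorFlow/Keras model. It splits an image into non-overlapping patches, projects each patch linearly into an embedding, and applies one head of scaled dot-product self-attention, so that every patch's output mixes information from all other patches. The 224 × 224 input, the patch size of 16, the embedding width of 64, and the random projection matrices are illustrative assumptions; a trained ViT learns these weights and adds position embeddings, a class token, multiple heads, and stacked encoder layers.

```python
import numpy as np


def extract_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C).
    Assumes H and W are divisible by `patch`.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C), row-major patch order
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, patch * patch * c)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (N, N): every patch scored against every patch
    weights = softmax(scores, axis=-1)       # each row is a distribution over all patches
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # stand-in for a preprocessed face crop
patches = extract_patches(image, patch=16)   # 14 x 14 = 196 patches of 16*16*3 = 768 values

d_model = 64                                 # illustrative embedding width (assumption)
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed                   # (196, 64) patch embeddings

w_q, w_k, w_v = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, out.shape, attn.shape)  # (196, 768) (196, 64) (196, 196)
```

The (196, 196) attention matrix is what gives the model its global view: the weight in row i, column j says how much patch j contributes to the updated representation of patch i, regardless of how far apart the two patches are in the image.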
For video inputs, the proposed system analyzes individual frames, aggregates the frame-level predictions, and reports an authenticity probability score. In addition, it generates Grad-CAM heatmaps that highlight the image regions that most strongly influence the model’s decision, improving interpretability and user trust. This combination of accuracy, usability, and explainability makes Deepfake Spotter suitable for academic research, digital forensics, and real-world media verification.
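The video and explanation steps can be sketched as follows. This is an illustrative Python/NumPy sketch under stated assumptions, not the system's actual code. The function names are hypothetical. `sample_frame_indices` picks evenly spaced frames; `aggregate_frame_scores` averages per-frame fake probabilities (mean pooling is an assumed choice, since the abstract does not state the aggregation rule, and max or median pooling are common alternatives), takes authenticity as 1 - P(fake), and applies an assumed 0.5 threshold. `grad_cam` combines a layer's activations and gradients in the standard Grad-CAM way: channel weights are the spatially averaged gradients, and the map is the ReLU of the weighted sum of activation channels, rescaled to [0, 1]. The activations and gradients are assumed to be computed beforehand from the model (for example with `tf.GradientTape`); for a ViT, patch-token outputs would first have to be reshaped into a 2-D grid such as 14 × 14, which is an adaptation because Grad-CAM was originally defined for convolutional feature maps.

```python
import numpy as np


def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick up to `num_samples` evenly spaced frame indices (hypothetical helper)."""
    if total_frames <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    return np.linspace(0, total_frames - 1, num_samples).round().astype(int).tolist()


def aggregate_frame_scores(frame_probs, threshold: float = 0.5) -> dict:
    """Combine per-frame P(fake) values into one video-level result.

    Mean pooling and the 0.5 threshold are assumptions, not the paper's rule.
    """
    probs = np.asarray(frame_probs, dtype=float)
    p_fake = float(probs.mean())
    return {
        "p_fake": p_fake,
        "authenticity": 1.0 - p_fake,  # assumed convention: probability the video is real
        "label": "FAKE" if p_fake >= threshold else "REAL",
    }


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from (H, W, K) activations and d(score)/d(activations).

    alpha_k = mean over (H, W) of the gradient for channel k
    cam     = ReLU(sum_k alpha_k * A_k), rescaled so the peak is 1
    """
    alphas = gradients.mean(axis=(0, 1))                         # (K,) channel weights
    cam = np.maximum((activations * alphas).sum(axis=-1), 0.0)   # (H, W), negatives clipped
    peak = cam.max()
    return cam / peak if peak > 0 else cam


print(sample_frame_indices(101, 5))                           # [0, 25, 50, 75, 100]
print(aggregate_frame_scores([0.9, 0.8, 0.7, 0.2])["label"])  # FAKE

# Toy Grad-CAM input: only channel 1 affects the score, so the map follows channel 1.
acts = np.ones((2, 2, 2))
acts[..., 1] = [[0.0, 1.0], [2.0, 3.0]]
grads = np.zeros((2, 2, 2))
grads[..., 1] = 1.0
heatmap = grad_cam(acts, grads)                               # brightest at bottom-right
```

In a deployed pipeline the low-resolution heatmap would then be upsampled to the input image size and overlaid on the frame; that rendering step is omitted here.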
Keywords—Deepfake Detection, Vision Transformer, Streamlit, Computer Vision, Explainable AI, Media Forensics