Anomaly-Based Malware Detection Using Autoencoders
Mrs. A. Navya (Guide), Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Sai Sarath Jeedigunta, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Raju Katila, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Anusha Tangudu, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Charan Sai Tadapaneni, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Akhil Chand Kothapalli, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
Hrishi Raghavi Peela, Computer Science & Engineering Department, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India.
ABSTRACT
Malware detection is a critical aspect of cybersecurity. Malicious software evolves rapidly, with new variants emerging daily, and traditional signature-based detection alone cannot keep pace. This research presents a hybrid malware detection system that integrates machine learning models (Random Forest and Autoencoder), YARA rule-based detection, and static file analysis. The system passes each executable file through multiple detection engines in parallel, extracting features such as entropy, file headers, imported functions, and section characteristics. A Random Forest classifier trained on the EMBER dataset [1] provides behavioral classification, while a 4-layer Autoencoder [13] implemented in PyTorch [8] identifies anomalies and previously unseen threats. YARA rules [4] enable signature-based identification of malware families, with real-time rule updates from the YARA community repository [12]. The hybrid approach achieves an overall accuracy of 96.3%, with a precision of 95.8% and a recall of 96.9%, outperforming each individual component. The system is containerized using Docker [10] for straightforward deployment and includes a Django REST [7] backend with a React frontend for user interaction. The results indicate that combining static analysis, machine learning, and signature matching provides defense-in-depth protection against both known malware families and zero-day threats.
KEYWORDS: Malware Detection, Hybrid Analysis, Deep Learning, Autoencoder, Random Forest, YARA Rules, Static Analysis, Cybersecurity
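As an illustration of the entropy feature mentioned in the abstract, the following is a minimal sketch of a Shannon entropy calculation over raw bytes, as commonly used in static analysis to flag packed or encrypted content. The function name shannon_entropy is hypothetical, and the abstract does not state how the system computes entropy (for example, per file or per section), so this should be read as an assumption rather than the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte.

    The result ranges from 0.0 (all bytes identical) to 8.0 (all 256
    byte values equally frequent). Hypothetical helper for illustration;
    not taken from the paper.
    """
    if not data:
        # Define the entropy of empty input as 0.0 to avoid division by zero.
        return 0.0
    counts = Counter(data)  # frequency of each byte value
    total = len(data)
    # H = -sum(p_i * log2(p_i)) over the byte values that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Values close to 8 bits per byte are typical of compressed or encrypted data, which is why entropy is often included among static features; any specific decision threshold would need to be tuned on labeled data.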