Fraud Email Analysis and Risk Scoring for Anomaly-Based Fraud Detection
 
 
Mr. Viraj Kothari1
1Student, Department of MSc.IT,
Nagindas Khandwala College, Mumbai,
Maharashtra, India
viraj.kothari2003@gmail.com
 
Dr. Pallavi Devendra Tawde2
2Assistant Professor, Department of IT & CS,
Nagindas Khandwala College, Mumbai,
Maharashtra, India
pallavi@nkc.ac.in
 
 
Abstract: Email scams such as phishing and business email compromise threaten individuals and organizations alike, and they often evade conventional filters by exploiting trust and deviating only slightly from legitimate mail. This paper proposes an anomaly-based fraud detection framework that couples robust email parsing with risk scoring to identify suspicious messages. First, we build an email parsing and preprocessing pipeline that uses pypff (for Outlook .pst archives), Beautiful Soup (for cleaning HTML content), and regular expressions to extract and sanitize email bodies, headers, and metadata. Second, we build a hybrid risk scoring model that combines machine learning, specifically Logistic Regression and Decision Trees, with rule-based heuristics. The model assigns every email a continuous risk score between 0 (benign) and 100 (highly suspicious). We implemented a proof-of-concept application to evaluate the method on a live email dataset. The system successfully processed hundreds of emails and flagged high-risk anomalies, and the supervised model identified fraudulent emails with high accuracy (over 90% on test data). This paper describes the methodology, implementation, and results, and demonstrates the efficacy of a hybrid parsing and scoring approach for improving email fraud detection. The findings highlight the value of combining data-driven models with expert rules to recognize malicious emails early, and we discuss how this approach paves the way for more advanced email forensics dashboards in future work.
Keywords: Email Fraud Detection, Anomaly Detection, Email Parsing, Risk Scoring, Machine Learning
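
To make the hybrid scoring idea summarized in the abstract concrete, the following minimal Python sketch blends a classifier probability with rule-based heuristics into a score between 0 and 100. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the rule patterns, the points per rule, the 0.6/0.4 blend, and the names RULES, rule_score, and hybrid_risk_score are all hypothetical. The model probability is passed in as a plain number and stands in for the output of a trained classifier such as Logistic Regression.

```python
import re

# Hypothetical heuristic rules as (pattern, points) pairs.
# Patterns and point values are illustrative assumptions only.
RULES = [
    (re.compile(r"\burgent\b|\bimmediately\b", re.I), 30),         # pressure wording
    (re.compile(r"\bwire transfer\b|\bgift cards?\b", re.I), 40),  # payment request
    (re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}", re.I), 30),   # link to a raw IP address
]


def rule_score(text: str) -> float:
    """Sum the points of every rule that matches, capped at 100."""
    total = sum(points for pattern, points in RULES if pattern.search(text))
    return float(min(total, 100))


def hybrid_risk_score(text: str, model_probability: float,
                      model_weight: float = 0.6) -> float:
    """Blend a classifier probability (0..1) with the rule score into 0..100.

    model_weight is an assumed blending factor, not a value from the paper.
    """
    if not 0.0 <= model_probability <= 1.0:
        raise ValueError("model_probability must be between 0 and 1")
    ml_part = 100.0 * model_probability
    blended = model_weight * ml_part + (1.0 - model_weight) * rule_score(text)
    # Clamp defensively and round to one decimal place for reporting.
    return round(max(0.0, min(100.0, blended)), 1)


if __name__ == "__main__":
    suspicious = ("URGENT: please send the wire transfer immediately "
                  "via http://192.168.1.10/pay")
    benign = "Lunch at noon tomorrow?"
    print(hybrid_risk_score(suspicious, model_probability=0.9))
    print(hybrid_risk_score(benign, model_probability=0.05))
```

A weighted average is only one possible way to combine the two signals; it is used here because it keeps the contribution of the model and of the rules easy to inspect.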