Multimodal Smart Stress Analyzer Using Facial Expression and Speech Emotion Recognition
Prof. Ms. Vaishali D. Parihar
Information Technology
Anuradha College of Engineering and Technology
Chikhli, India
vaishaliparihar08@gmail.com
Jivan Kishor Tapkire
Information Technology
Anuradha College of Engineering and Technology
Chikhli, India
jivantapkire@gmail.com
Irfankha Yasinkha Pathan
Information Technology
Anuradha College of Engineering and Technology
Chikhli, India
ip7958558@gmail.com
Karan Suresh Ambhore
Information Technology
Anuradha College of Engineering and Technology
Chikhli, India
karanambhore07@gmail.com
Govinda Suresh Pawar
Information Technology
Anuradha College of Engineering and Technology
Chikhli, India
govindapawar8999@gmail.com
Abstract—Mental stress has emerged as a pervasive health challenge in modern society, affecting cognitive performance, physiological well-being, and overall quality of life. This paper presents the Smart Stress Analyzer, a web-based multimodal system that detects and quantifies human stress by integrating facial expression recognition with speech-based emotion analysis. The facial component employs an EfficientNet-B2 convolutional neural network augmented with a Convolutional Block Attention Module (CBAM) to classify seven discrete emotional states from real-time webcam frames, achieving a test accuracy of 65.85%. The speech component adapts a Wav2Vec2-base transformer with frozen pretrained layers to a combined CREMA-D and RAVDESS dataset to predict five stress severity levels, attaining an overall test accuracy of 57%. Both models are integrated into a lightweight HTML/CSS/JavaScript web application that runs inference in real time on the client, without server-side processing. A weighted fusion strategy combines the two modality scores into a unified stress index, which is displayed on an interactive dashboard. Experimental evaluation indicates that the system is feasible for continuous, non-intrusive stress monitoring in everyday environments.
Keywords—Stress Detection, Facial Expression Recognition, Speech Emotion Recognition, EfficientNet-B2, Wav2Vec2, CBAM, Multimodal Fusion, Web Application
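As an illustration of the weighted fusion step summarized in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each modality is first reduced to a stress score in [0, 1] and that the two scores are combined by a convex weighted sum. The emotion labels, the per-emotion stress weights in `EMOTION_STRESS`, the function names, and the default weight `w_face = 0.5` are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
from typing import Mapping, Sequence

# Hypothetical label set and stress weights in [0, 1]; the abstract only
# states that seven emotional states are classified, not which ones or
# how they map to stress.
EMOTION_STRESS = {
    "angry": 0.9, "disgust": 0.7, "fear": 1.0, "happy": 0.0,
    "neutral": 0.2, "sad": 0.8, "surprise": 0.4,
}


def face_stress(probs: Mapping[str, float]) -> float:
    """Expected stress weight under the predicted emotion distribution."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("emotion probabilities must sum to a positive value")
    return sum(EMOTION_STRESS[e] * p for e, p in probs.items()) / total


def speech_stress(level_probs: Sequence[float]) -> float:
    """Expected severity level rescaled to [0, 1] (index 0 = lowest level)."""
    n = len(level_probs)
    total = sum(level_probs)
    if n < 2 or total <= 0:
        raise ValueError("need at least two levels with positive total mass")
    expected_level = sum(i * p for i, p in enumerate(level_probs)) / total
    return expected_level / (n - 1)


def fuse(face: float, speech: float, w_face: float = 0.5) -> float:
    """Convex combination of the two modality scores (assumed fusion rule)."""
    if not 0.0 <= w_face <= 1.0:
        raise ValueError("w_face must lie in [0, 1]")
    return w_face * face + (1.0 - w_face) * speech


if __name__ == "__main__":
    f = face_stress({"sad": 0.6, "neutral": 0.4})
    s = speech_stress([0.1, 0.2, 0.4, 0.2, 0.1])
    print(f"unified stress index: {fuse(f, s, w_face=0.6):.3f}")
```

Under this assumed rule, `w_face` could be set in proportion to each model's validation accuracy so that the more reliable modality contributes more to the unified index.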