Audio and Video Based Emotion Detection
R. Mohamed Yousuf, Dr. T. C. Subbulakshmi
Student (IT), Professor (IT)
Francis Xavier Engineering College, Tirunelveli, India
ABSTRACT
The massive and growing burden that depression imposes on modern society has motivated investigations into early detection through automated, scalable and non-invasive methods, including those based on speech, facial expression and text. In response to the pressing need for accurate depression detection, researchers in affective computing have turned to behavioral cues such as facial expressions and prosodic features of speech to predict mental disorders such as depression and post-traumatic stress disorder (PTSD). This project introduces a novel framework that applies attention mechanisms across multiple layers to identify and extract the most informative features from each modality, enabling the prediction of depression levels. Leveraging low-level and mid-level features from text, audio and video data, the proposed network uses attention at different levels to quantify the significance of each feature and each modality, improving prediction performance. Experiments on individual features from the three modalities, as well as on their combinations, demonstrate the efficacy of the approach. The project also addresses the challenge of identifying effective depression-related features under adverse recording conditions and across different smartphone devices. Two new sets of features rooted in speech landmarks are introduced and achieve promising results on disparate datasets. By combining machine learning and deep learning algorithms, this research advances accurate depression detection through the fusion of behavioral cues from multiple sources.
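To make the idea of attention-based multimodal fusion concrete, the following PyTorch fragment is a minimal sketch of one such layer: it scores fixed-size text, audio and video embeddings and fuses them by their attention weights before a regression head. The module name, embedding dimension and regression head are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of attention-based fusion over text, audio and video
# embeddings. Names and dimensions are hypothetical, not the paper's
# actual implementation.
import torch
import torch.nn as nn


class ModalityAttentionFusion(nn.Module):
    """Scores each modality embedding and fuses them by attention weights."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-modality relevance score
        self.head = nn.Linear(dim, 1)    # depression-severity regressor

    def forward(self, text: torch.Tensor, audio: torch.Tensor,
                video: torch.Tensor) -> torch.Tensor:
        # Stack the three modality embeddings: (batch, 3, dim).
        stacked = torch.stack([text, audio, video], dim=1)
        # Softmax over the modality axis yields attention weights: (batch, 3, 1).
        weights = torch.softmax(self.score(stacked), dim=1)
        # Weighted sum collapses the modalities into one fused vector.
        fused = (weights * stacked).sum(dim=1)   # (batch, dim)
        return self.head(fused).squeeze(-1)      # predicted severity per sample


if __name__ == "__main__":
    batch, dim = 4, 128
    model = ModalityAttentionFusion(dim)
    t, a, v = (torch.randn(batch, dim) for _ in range(3))
    print(model(t, a, v).shape)  # torch.Size([4])
```

In the full framework this weighting would be applied at several levels (over features within a modality as well as over modalities), which is what lets the network quantify the contribution of each cue.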
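The abstract does not specify how the speech-landmark features are computed, so as a stand-in the sketch below extracts standard prosodic descriptors (pitch and energy statistics) with librosa; the file path and the choice of summary statistics are hypothetical, and this is not the paper's landmark feature set.

```python
# Standard prosodic features (pitch and energy statistics) as a hedged
# stand-in for the paper's speech-landmark features; "utterance.wav" is
# a placeholder path.
import librosa
import numpy as np

# Load an utterance; librosa resamples to 22.05 kHz by default.
y, sr = librosa.load("utterance.wav")

# Fundamental-frequency track via pYIN; unvoiced frames come back as NaN.
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)

# Frame-level RMS energy: shape (1, n_frames) -> (n_frames,).
rms = librosa.feature.rms(y=y)[0]

# Summary statistics as a compact utterance-level prosodic feature vector;
# nan-aware statistics skip unvoiced frames in the pitch track.
features = np.array([np.nanmean(f0), np.nanstd(f0), rms.mean(), rms.std()])
print(features)
```

Descriptors of this kind are a common baseline for speech-based depression prediction and are relatively robust to the varied recording conditions and smartphone devices the study targets.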