Speech Emotion Recognition: An LSTM Approach
Jhansi Kothuri (Email: 21951a0468@iare.ac.in)
Mohd. Adhnan (Email: 21951a0499@iare.ac.in)
B. Naresh (Email: 21951a04B0@iare.ac.in)
Dr. S. China Venkateswarlu, Professor (Email: c.venkateswarlu@iare.ac.in)
Department of Electronics and Communication Engineering
Institute of Aeronautical Engineering (Autonomous)
Abstract – This paper presents an approach to Speech Emotion Recognition (SER) that uses a Long Short-Term Memory (LSTM) network to classify emotions from audio input in real time. The primary goal of this research is to accurately identify emotions such as happiness, sadness, anger, fear, and surprise, improving the user experience in applications including human-computer interaction, virtual assistants, and mental health monitoring. The methodology begins with preprocessing of the audio signals to ensure clarity and consistency, followed by feature extraction using Mel-Frequency Cepstral Coefficients (MFCCs), which capture the essential spectral characteristics of speech. An LSTM network then models the temporal dependencies in the extracted feature sequences, enabling accurate emotion classification. System performance is assessed using key evaluation metrics, including classification accuracy and processing latency, demonstrating the system's suitability for real-time applications. In addition, user feedback is collected to evaluate the practical applicability and usability of the system in real-world scenarios. The results underscore the effectiveness of LSTM networks in recognizing emotions from speech and highlight their potential for deployment in automated emotional intelligence systems. This work advances the field of SER and lays the groundwork for future research aimed at refining detection capabilities and expanding the range of identifiable emotions.
Keywords: real-time classification, Mel-Frequency Cepstral Coefficients (MFCCs), human-computer interaction, emotional intelligence, audio signal processing, emotion classification, temporal dependencies, feature extraction, mental health monitoring, automated systems
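To illustrate how an LSTM models the temporal dependencies across MFCC frames described in the abstract, the following is a minimal NumPy sketch of a single LSTM cell stepped over a toy sequence of MFCC vectors. All dimensions (13 MFCCs per frame, hidden size 8, 5 frames), the gate ordering, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step over a single MFCC frame x_t.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order assumed here: input, forget, candidate cell, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[0:H]))          # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))      # forget gate
    g = np.tanh(z[2 * H:3 * H])                # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:4 * H]))  # output gate
    c_t = f * c_prev + i * g                   # memory carries past frames forward
    h_t = o * np.tanh(c_t)                     # hidden state summarizes the sequence so far
    return h_t, c_t

# Toy sequence: 5 frames of 13 MFCCs each (illustrative sizes only).
rng = np.random.default_rng(0)
D, H, T = 13, 8, 5
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(T):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
# The final hidden state h would feed a softmax layer over the emotion classes.
```

In practice a framework such as TensorFlow/Keras or PyTorch would supply the LSTM layer, with MFCCs computed by a library like librosa; this sketch only makes the per-frame recurrence explicit.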