Speech Emotion Recognizer Using Machine Learning
Madhura Mohod¹, Nisarga Mhaisane¹,
Aditi Thakare¹, Sakshi Nichat¹
47, Samarpan Colony, Near Pathyapustak Mandal, V. M. Area, Amravati, Maharashtra 444604.
Prof. Vaishali R. Thakare²
Sai Nagar, Amravati
¹Student, Computer Science and Engineering, P. R. Pote Patil College of Engineering and Management, Amravati, India.
²Assistant Professor, Computer Science and Engineering, P. R. Pote Patil College of Engineering and Management, Amravati, India.
Abstract
This project applies a deep learning approach to emotion classification using speech data from different modalities. A convolutional neural network (CNN) that captures discriminative information from audio features is trained to classify the emotion labels. Further experiments vary the network architecture of each individual model and apply regularization strategies such as dropout. To check that the trained model performs well on unseen audio, samples are collected from a completely separate dataset, distinct from the data used for training and testing, and evaluated with the trained model. So that the reported results can be verified, the Kaggle kernel¹ used to train the models is made public.
Emotion recognition from speech signals is an important but challenging component of Human-Computer Interaction (HCI). In the speech emotion recognition (SER) literature, many techniques have been used to extract emotions from signals, including well-established speech analysis and classification techniques. Deep learning techniques have recently been proposed as an alternative to traditional techniques in SER. This paper presents an overview of deep learning techniques and discusses recent literature in which these methods are applied to speech-based emotion recognition. The review covers the databases used, the emotions extracted, the contributions made toward speech emotion recognition, and the related limitations.
The expression of emotion plays a very important role in human communication, because it shapes the information conveyed to the other party. Human emotions are expressed in many forms, including body language, facial expressions, eye contact, laughter, and tone of voice. Although the world's languages differ, people can often grasp part of the message a speaker wants to convey from these emotional expressions, even without understanding the language. Among the forms of human emotional expression, expression through the voice is perhaps the most studied.
Keywords: Speech Emotion Recognizer, CNN, Machine Learning, Python, Support Vector Machine (SVM).
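The abstract names dropout as one of the regularization strategies explored. As an illustration only, and not the authors' implementation, the following Python sketch shows the commonly used "inverted" form of dropout: during training each activation is zeroed with probability `rate` and the survivors are rescaled, while at inference the activations pass through unchanged. The function name `dropout`, the rate of 0.3, and the dummy activation values are assumptions made for this example.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout (illustrative sketch, not the paper's code).

    During training, each activation is zeroed with probability `rate`
    and the surviving values are scaled by 1 / (1 - rate) so that the
    expected activation is unchanged. At inference the input is returned
    as-is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical activations standing in for a CNN layer's output.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
features = [1.0] * 10000

train_out = dropout(features, rate=0.3, training=True, rng=rng)
eval_out = dropout(features, rate=0.3, training=False, rng=rng)
```

Because the survivors are rescaled during training, the mean of `train_out` stays close to the mean of `features`, which is why no extra scaling is needed when the model is later applied to unseen audio samples.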