Abstract:
Speech recognition plays an essential role in the analysis of psychological disorders,
behavioral decision making, and human-machine interaction applications. A speech emotion
recognition system detects emotions from live audio. People all over the world use words
to express their emotions, regardless of their origin. In this project, we focus on machine
learning (ML), which uses a dataset and algorithms to predict or detect future
outcomes. The datasets consist of audio files in WAV format covering 8 emotional states: anger, disgust,
fear, happiness, pleasant, surprise, sadness, and neutral. Using the librosa library, features were
extracted from the audio files in the datasets. The features were applied to multiple machine
learning models and the results were compared. Speech emotion recognition is a popular research
topic with numerous applications. It has also become a challenge in the field of speech
recognition. Overall, a CNN model is a good method for human speech emotion recognition,
reaching an accuracy of 85%, because of its capacity to extract complex
patterns and characteristics from the input data. The other two models, SVM and MLP, reached
accuracies of 82% and 83%, respectively. However, a model's success depends on the quality of the
preprocessed data, the model architecture used, and the efficacy of the data augmentation
strategies employed.
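
The model comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and uses synthetic placeholder feature vectors in place of the librosa features, so the scores it produces say nothing about the reported accuracies. The function name compare_models, the 40-dimensional feature size, the hyperparameters, and the train/test split are all hypothetical choices made for illustration. The CNN is left out to keep the sketch short.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The eight emotional states listed in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "pleasant", "surprise", "sadness", "neutral"]


def compare_models(X, y, seed=0):
    """Fit each candidate model on the same split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters are illustrative assumptions, not the authors' settings.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(128,), max_iter=500,
                          random_state=seed),
        ),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


# Synthetic stand-in for per-file feature vectors (NOT real speech data):
# 30 samples per emotion, 40 features each, with a class-dependent shift
# so that the classifiers have something to learn.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(len(EMOTIONS)), 30)
X = rng.normal(size=(len(y), 40)) + 0.5 * y[:, None]

scores = compare_models(X, y)
for name, accuracy in scores.items():
    print(f"{name}: test accuracy = {accuracy:.2f}")
```

Evaluating every model on the same stratified split keeps the comparison fair: each classifier sees the same training files and is scored on the same held-out files, with all emotions represented in both parts.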