DSpace Repository

Performance Improvement of Speech Emotion Recognition in Bengali Language Using Deep Learning and BLSTM Networks.

dc.contributor.author Mia, Md Shumon
dc.date.accessioned 2025-09-14T06:08:35Z
dc.date.available 2025-09-14T06:08:35Z
dc.date.issued 2024-07-13
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14464
dc.description Project report en_US
dc.description.abstract Speech Emotion Recognition (SER) is an emerging field in artificial intelligence (AI) and Bengali signal processing that can improve targeted user interactions and enable more positive engagement with smart devices. The goal of this work is to improve SER for Bengali, a low-resource language in this domain. A novel deep learning model (DCNN-BLSTM) is proposed that aims to improve emotion recognition accuracy by combining 1D-CNN, TDF, and BLSTM networks. The model is trained on Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio data, producing a system that identifies audio signals much as the human auditory system does. The MFCCs are first extracted from the audio signal and then passed through local feature learning blocks (LFLBs), which compute feature values using one-dimensional convolutional neural networks (CNNs). Because audio signals have temporal properties, these feature values are then fed to a Bi-LSTM layer, which improves temporal learning. The TDF layers preserve temporal dynamics throughout the processing stages, while a Dropout layer improves model generalization. Finally, fully connected layers perform classification and prediction. Experimental evaluation on the SUBESCO database shows that the Bi-LSTM model captures the features extracted by the 1D CNN well, owing to the time-series properties of speech signals. In addition, this study employs five distinct data augmentation strategies, each of which helps to increase recognition accuracy. On the SUBESCO dataset, the proposed model achieved a promising accuracy of 88%. The results indicate that the proposed approach obtains higher recognition rates than comparable studies in speech emotion recognition. en_US
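The abstract's front end — MFCC extraction from the raw waveform — can be sketched in plain NumPy. This is a minimal illustration of the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT), not the authors' actual preprocessing; the frame size, hop length, and coefficient counts below are hypothetical defaults, since the record does not state the settings used for SUBESCO.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Compute MFCC features from a mono waveform.

    Parameters are illustrative assumptions, not values from the study.
    Returns an array of shape (n_frames, n_mfcc).
    """
    # Pre-emphasis boosts high frequencies before spectral analysis
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emph) - n_fft) // hop
    frames = np.stack([emph[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)

    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank spanning 0 .. sr/2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # Log mel energies, then DCT-II to decorrelate; keep the first n_mfcc
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T
```

In the pipeline described above, the resulting (frames × coefficients) matrix would be the input that the 1D-CNN feature learning blocks consume before the Bi-LSTM stage.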
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Natural language processing (NLP) en_US
dc.subject Emotion detection en_US
dc.subject Computational linguistics en_US
dc.subject Bengali language en_US
dc.title Performance Improvement of Speech Emotion Recognition in Bengali Language Using Deep Learning and BLSTM Networks. en_US
dc.type Other en_US

