DSpace Repository

Performance Improvement of Speech Emotion Recognition in Bengali Language Using Deep Learning and BLSTM Networks.

dc.contributor.author Mia, Md Shumon
dc.date.accessioned 2025-09-14T06:08:35Z
dc.date.available 2025-09-14T06:08:35Z
dc.date.issued 2024-07-13
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/14464
dc.description Project report en_US
dc.description.abstract Speech Emotion Recognition (SER) is an emerging field in artificial intelligence (AI) and Bengali signal processing that can improve targeted user interactions and enable more positive engagement with smart devices. The goal of this work is to improve SER for Bengali, a low-resource language in this domain. A novel deep learning model (DCNN-BLSTM) is proposed that aims to improve emotion recognition accuracy by combining 1D-CNN, TDF, and BLSTM networks. The model is trained on Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio data, producing a system that identifies audio signals much as the human auditory system does. The MFCCs are first extracted from the audio signal and then passed through local feature learning blocks (LFLBs), which compute feature values using one-dimensional convolutional neural networks (CNNs). Because audio signals have temporal properties, these feature values are then fed to a Bi-LSTM layer, which improves temporal learning. The TDF layers preserve temporal dynamics throughout the processing stages, while a Dropout layer improves model generalization. Finally, fully connected layers perform classification and prediction. Experimental evaluation on the SUBESCO database shows that the Bi-LSTM model captures the features extracted by the 1D CNN well, owing to the time-series properties of speech signals. In addition, this study employs five distinct data augmentation strategies, each of which helps to increase recognition accuracy. On the SUBESCO dataset, the proposed model achieved a promising accuracy of 88%. The results indicate that the proposed approach obtains higher recognition rates than comparable studies in speech emotion recognition. en_US
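The abstract's front end — MFCC extraction from the raw waveform — can be sketched in plain NumPy. This is a minimal illustration of the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT), not the authors' actual preprocessing; the frame size, hop length, and coefficient counts below are hypothetical defaults, since the record does not state the settings used for SUBESCO.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Compute MFCC features from a mono waveform.

    Parameters are illustrative assumptions, not values from the study.
    Returns an array of shape (n_frames, n_mfcc).
    """
    # Pre-emphasis boosts high frequencies before spectral analysis
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emph) - n_fft) // hop
    frames = np.stack([emph[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)

    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank spanning 0 .. sr/2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # Log mel energies, then DCT-II to decorrelate; keep the first n_mfcc
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T
```

In the pipeline described above, the resulting (frames × coefficients) matrix would be the input that the 1D-CNN feature learning blocks consume before the Bi-LSTM stage.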
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Natural language processing (NLP) en_US
dc.subject Emotion detection en_US
dc.subject Computational linguistics en_US
dc.subject Bengali language en_US
dc.title Performance Improvement of Speech Emotion Recognition in Bengali Language Using Deep Learning and BLSTM Networks. en_US
dc.type Other en_US

