DSpace Repository

AI-Based Real-time Voice to Text and Emotion system to enhance communication for Deaf individuals

dc.contributor.author Athoi, Farah Ulfat
dc.date.accessioned 2026-04-25T09:22:11Z
dc.date.available 2026-04-25T09:22:11Z
dc.date.issued 2025-12-27
dc.identifier.citation SWT en_US
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17022
dc.description Thesis Report en_US
dc.description.abstract This research presents an integrated Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) system for the Bangla language, intended to make communication easier for hearing-impaired users. Although ASR and SER technologies are advancing rapidly worldwide, Bangla still lacks reliable datasets, emotion recognition models, and real-time subtitle systems. To address this gap, I collected 1,400 audio samples: 600 Normal, 400 Angry, and 400 Sad. The audio was cleaned with Voice Activity Detection (VAD), noise reduction, and trimming. The ASR component uses the Whisper model. Initially, the Word Error Rate (WER) was 58.8% and the Character Error Rate (CER) was 28.2%. After cleaning and preprocessing the data, both WER and CER decreased significantly, improving transcription quality. For the SER component, three models were tested: LSTM, Random Forest, and SVC. The SVC model achieved the highest accuracy, 96.09%, indicating that among the three it provides the most stable and effective emotion recognition for Bangla. The system is designed to convert the speaker's voice into written text in real time and to classify the speaker's emotion into one of three categories: Normal, Angry, or Sad. In practical use, noisy environments and multilingual support may pose challenges; however, the initial results and the data preprocessing techniques applied suggest significant potential to further improve the system's performance and accuracy. en_US
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Speech-to-Text Recognition en_US
dc.subject Emotion Detection en_US
dc.subject Assistive Technology en_US
dc.subject Real-Time Processing en_US
dc.subject Deaf Communication en_US
dc.title AI-Based Real-time Voice to Text and Emotion system to enhance communication for Deaf individuals en_US
dc.type Thesis en_US
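
The abstract reports transcription quality as Word Error Rate (WER) and Character Error Rate (CER). As a minimal sketch of how these two metrics are commonly defined (edit distance divided by reference length), and not the thesis's own evaluation code, the following may help. The function names and the English example sentences are illustrative assumptions; whitespace is counted as a character in the CER here, which is one convention among several.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length.

    Assumption: spaces count as characters, and units are Unicode code points.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output, for illustration only.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because Python strings iterate over Unicode code points, a Bangla conjunct or a consonant with a vowel sign counts as several characters in this sketch; an evaluation that treats such clusters as single units would need to segment the text before computing the CER.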

