Optimizing Human-device Interaction through Real-Time Automated Speech Recognition and NLP

Oishy, Tahiya Rahman

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17401

Date: 2024-12-28

Abstract:

Automatic speech recognition (ASR) is a technique that enables machines to interpret spoken language and convert it into text. To produce text from speech, an ASR system receives input from the speaker and decodes it using patterns, algorithms, or models. This project examines how speech recognition systems can be used to automate tasks, focusing on the performance of online and offline engines, namely Google API, PocketSphinx, and Vosk, under various conditions. The ASR model was analyzed in detail, with the combination of the Hidden Markov Model and the Gaussian Mixture Model (HMM-GMM) serving as the basis of the experiment. The project was implemented in Python and targeted three platforms: Google API, PocketSphinx, and Vosk. The three platforms were compared for robustness and overall performance; notably, Vosk achieved considerably better accuracy than Google API and PocketSphinx. An assessment platform was prepared using voices from different age groups, and voice, frequency noise, and word error rate (WER) were considered to highlight the robustness of these systems. The findings showed that Vosk outperformed Google API and PocketSphinx in a variety of contexts. A predefined command list was set up as a methodical foundation for assessing every system in an automated application. Despite its limitations, the research supports closer interaction between humans and computers, especially for people with disabilities who face challenges using devices. Finally, this work opens a new window in ASR techniques owing to its effectiveness with real-time data and the evidence of higher accuracy.
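
The abstract uses word error rate (WER) as a comparison metric but does not say how it was computed. The sketch below assumes the conventional definition, in which WER is the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The function name `word_error_rate` and the sample commands are illustrative and are not taken from the report.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumes simple normalization: lowercasing and whitespace tokenization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical command and recognizer output, for illustration only.
    print(word_error_rate("turn on the light", "turn the light"))
```

With a predefined command list, a score like this could be averaged over all commands for each engine, which gives a single figure per engine for comparison. Note that WER can exceed 1.0 when the recognizer inserts many extra words.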