Abstract:
Automatic speech recognition (ASR) is a technique that enables machines to
interpret spoken language and convert it into text. To produce text from
spoken language, an ASR system receives input from the speaker and subsequently
decodes the input using patterns, algorithms, or models. In this project, the research
examined how speech recognition systems can be used to automate tasks,
focusing on the performance of both online and offline engines, namely Google API,
PocketSphinx, and Vosk, under various conditions. In the current study, an ASR
model was analyzed in detail, with the combination of the Hidden Markov Model and the Gaussian Mixture
Model (HMM and GMM) serving as the basis of the experiment. The project was
implemented in Python to run three platforms as the preliminary targets:
Google API, PocketSphinx, and Vosk. All three platforms were
compared for robustness and overall performance; notably, Vosk
achieved considerably better accuracy than Google API and PocketSphinx. An
assessment platform was prepared using voices from different age groups, and it considered
voice, frequency noise, and word error rate (WER) to highlight the robustness of these
systems. The findings showed that Vosk outperformed Google API and PocketSphinx in a
variety of contexts. To assess every system methodically in an automation application, the current study set up a predefined
command list as the foundation for the evaluation. Despite its limitations, the research fostered
interaction between humans and computers, especially for people with disabilities who
face challenges using devices. Finally, this work opened new possibilities for the ASR
technique, owing to its effectiveness with real-time data and the evidence of
higher accuracy.
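
Since the comparison rests on word error rate, a minimal Python sketch of how WER could be computed against one entry of a predefined command list is given here. It is an illustration only, not the project's actual evaluation code: the function name word_error_rate, the sample command, and the three placeholder engine transcripts are all assumptions, and lowercasing plus whitespace splitting stands in for whatever text normalization the study used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command and transcripts; the engine names are placeholders,
# not measurements from Google API, PocketSphinx, or Vosk.
reference = "open the web browser"
transcripts = {
    "engine_a": "open the web browser",
    "engine_b": "open a web browser",
    "engine_c": "open web",
}
scores = {name: word_error_rate(reference, text) for name, text in transcripts.items()}

for name, score in scores.items():
    print(f"{name}: WER = {score:.2f}")
```

A lower WER means a transcript closer to the reference command. Averaging the score over every command in the list, and over each speaker group and noise condition, would give one comparable figure per engine.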