dc.description.abstract |
Speech is one of the most natural forms of human communication. Speech-based applications have played a
vital role in modern technology over the last few decades, because the human voice carries many distinctive
features for measuring its performance and behavior. Speech-based applications are not only a
trend in modern, efficient technology but also a shift in the information technology
paradigm. Much research has been carried out on voice-based applications because speech has
more practical applications than any other form of communication. In this work, we tried to
recognize voice features in order to identify speakers from Bengali speech. We consider
the speaker’s age, division, height, weight, gender, and occupation as the parameters for identifying a
speaker. Here, however, we present the recognition of Bangladeshi speakers’ age and
division from Bengali speech. We used our own dataset containing 16,730 samples. Each sample
is a WAV audio file 8-10 seconds long. We consider MFCC, Delta, Delta-Delta, LSF,
Spectral Bandwidth, and mel spectrogram features to train our models. We first tried some traditional
Machine Learning algorithms, but found that a dataset of this size is handled better
by Deep Learning algorithms. We tried different Deep Learning algorithms such as Artificial
Neural Network, Convolutional Neural Network, Region-Based Convolutional Neural Network, and
Long Short-Term Memory with different types of features, and settled on an Artificial Neural
Network with 85% accuracy for division recognition and a Convolutional Neural Network with
78% accuracy for age recognition. |
en_US |