Abstract:
Speech recognition has long been a popular topic among students of computer science and engineering. With advances in technology, Bengali speech can now be recognized as well. However, not everyone in Bangladesh speaks the standard language, so locally accented speech can be difficult to recognize. To address this issue, we have developed a local-accented Bengali speech recognition system. In our system, we recognize audio with a CNN (convolutional neural network) after converting each audio clip into a spectrogram. Aside from audio recognition, this approach also provides sample data that is simple to handle and use. We then apply gradient boosting to MFCC (Mel Frequency Cepstral Coefficients) features of the audio data, which reduces the prediction error by combining each new model with the previous ones. It is important to note that MFCC is a widely used feature extraction technique for speech that characterizes the vocal tract. Using these features, we classified the local-accented speech samples, and the classification was further confirmed with a Random Forest classifier. Finally, we obtained the desired output from an ANN (artificial neural network) model. To evaluate the system, we used the local accent of the people of Chittagong as a reference. We obtained a maximum of 91% training accuracy and 81% test accuracy with 3161 audio files across 25 classes. We hope to expand our research on this topic, which would help people who speak with a local accent to be on par with others.
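To illustrate how gradient boosting reduces the prediction error by combining each new model with the previous ones, the following is a minimal sketch in plain Python. It is not the system described above: it assumes a toy one-dimensional regression problem with squared error and decision stumps as weak learners, whereas the actual task is multi-class classification on MFCC features. All function names and parameters (`fit_stump`, `fit_boosted`, `rounds`, `lr`) are hypothetical and chosen only for illustration.

```python
# Toy gradient boosting for squared error with one-feature decision stumps.
# Assumption: this is an illustrative simplification, not the paper's model.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) that best fits the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # skip splits that leave one side empty
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:
        # All inputs identical: fall back to a constant prediction.
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_boosted(xs, ys, rounds=20, lr=0.5):
    """Fit an additive model: each stump is trained on the current residuals."""
    base = sum(ys) / len(ys)          # initial model: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Combine the new stump with the previous model, scaled by the learning rate.
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, lr


def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [x * x for x in xs]
    for rounds in (0, 1, 5, 20):
        model = fit_boosted(xs, ys, rounds=rounds)
        print(rounds, mse(model, xs, ys))
```

Running the script prints the training error after 0, 1, 5, and 20 boosting rounds. The error shrinks as stumps are added because each stump is fitted to what the current ensemble still gets wrong.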