dc.description.abstract |
Language, the core of human civilization, exists to enable communication. This research
explores the intricacies of Automatic Speech Recognition (ASR) systems, particularly
Speech-to-Language Identification (SLID), in multilingual environments. It focuses on
data collection, feature extraction, and language classification, using techniques such
as Mel spectrograms, MFCCs, and deep learning architectures such as CNNs and RNNs.
Our dataset combines a corpus from Kaggle with real-time recordings. Deep learning
models, particularly CNNs and hybrid architectures such as the CRNN, show promising
results in language identification. An audio speech recognition model is developed
using deep learning techniques, with the data stored in Google Drive and accessed via
Colab for training; the model is trained on Mel spectrograms of the audio data.
Pre-trained models such as VGG16, EfficientNet, GoogLeNet, and DenseNet offer viable
alternatives for audio recognition tasks, and we apply them to our pre-processed
dataset. The performance of all models is compared using standard metrics: accuracy,
precision, recall, and F1 score. This research offers an in-depth analysis of SLID
systems, demonstrating their importance for communication across languages and
proposing techniques that may improve their accuracy and broaden their adoption. |
en_US |