Abstract:
Spoken language identification (LID) is the task of detecting which language an unknown
speaker is speaking. In this project we examine several machine learning techniques for
LID. Our main task is to identify parameters and features of speech that can be used to
distinguish one language from another. To extract features from audio files we use
Mel-frequency cepstral coefficients (MFCCs). Many methods have been applied to LID so far;
of these, machine learning techniques give the best accuracy, which is why we use machine
learning for LID in this project. Our system is trained on 30,000 data samples. The project
classifies three languages: Spanish, German, and English. Its main goal is to find the
best algorithm for identifying the spoken language. Among the algorithms we evaluated,
random forest gave the best accuracy.
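
To illustrate the feature-extraction step, the following is a minimal sketch of how MFCCs can be computed from a raw signal using only the Python standard library. It is not the project's implementation: the function names, the frame length (256), hop size (128), filter count (20), coefficient count (13), and the synthetic 440 Hz test tone are all assumptions chosen for illustration. In practice a signal-processing library would normally be used instead of a naive DFT.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive DFT, illustration only)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all parameters are assumed values)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log of the energy captured by each mel filter (floored to avoid log(0)).
        log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                        for row in bank]
        # DCT-II of the log energies; the first n_coeffs terms are the MFCCs.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_energies))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features


# Synthetic stand-in for a speech clip: a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
frames = mfcc(tone, sample_rate)

# One simple way to get a fixed-length vector per clip: average over frames.
clip_vector = [sum(col) / len(frames) for col in zip(*frames)]
```

A fixed-length vector such as `clip_vector`, computed for each labelled recording, is the kind of input a classifier like random forest could be trained on; averaging over frames is only one possible way to summarise a clip.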