Abstract:
Sound or audio classification is a challenging task, since collecting a suitable dataset is not always easy, and even after collection there is no guarantee that any particular machine learning or deep learning model will perform well. In this project, we design a novel approach to classifying musical instruments of different categories using a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Acoustic scene sounds, music, and speech are all commonly treated within the audio domain, since they share the same digital signal processing (DSP) techniques. Image classification has recently advanced very rapidly through the application of different machine learning models, so it is high time to study how such extensible models perform in audio classification. However, collecting audio data is not always feasible, and training a network on a small dataset while achieving high accuracy is a challenging task for deep learning approaches. We took on that challenge and successfully built two models, one convolutional and one recurrent. To classify instrumental sounds, we first extract features from the audio samples using Mel Frequency Cepstral Coefficients (MFCCs), one of the most widely used feature extraction techniques. Because MFCCs model the human auditory system, they provide highly characteristic features from audio and music samples. Our models achieved accuracies of around 95.76% and 87.62% for the convolutional and recurrent networks, respectively, on 10 different musical instrument classes. Applying the same techniques, we also attempted to classify human emotion from speech. Humans can easily recognize others' emotions by hearing their voices, but for machines this remains a very difficult task, so we analyzed different speaker discrimination and speech analysis techniques to find efficient algorithms for it.
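As a minimal illustration of the MFCC feature extraction step mentioned above, the following Python sketch uses the librosa library (an assumption; the paper does not name its tooling). The file name, sample rate, coefficient count, and mean-pooling step are all hypothetical choices for illustration, not values taken from this work.

    # Minimal sketch of MFCC extraction with librosa (assumed available).
    # "sample.wav", sr=22050, and n_mfcc=13 are illustrative, not from the paper.
    import librosa
    import numpy as np

    # Load a hypothetical audio sample at a fixed sampling rate.
    signal, sr = librosa.load("sample.wav", sr=22050)

    # Compute 13 MFCCs per frame; result has shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

    # Average over time to get one fixed-length feature vector per clip,
    # a common simple pooling strategy before feeding a classifier.
    features = np.mean(mfcc.T, axis=0)
    print(features.shape)  # (13,)

A fixed-length vector like this can be fed to a dense classifier directly, while the full (n_mfcc, n_frames) matrix is better suited as 2-D input to a CNN or as a frame sequence for an RNN.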