DSpace Repository

Multilabel Movie Genre Classification from Movie Subtitle Using Supervised and Unsupervised Machine Learning Approach


dc.contributor.author Hasan, Md. Mehedi
dc.contributor.author Debnath, Susanta Chandra
dc.contributor.author Hasan, Md. Mozahid
dc.date.accessioned 2022-02-13T03:54:01Z
dc.date.available 2022-02-13T03:54:01Z
dc.date.issued 2021-06-02
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/7110
dc.description.abstract Technological breakthroughs and the interest of business entities have made the categorization of media products increasingly common in this digital environment. This is often a multilabel scenario in which an object may be labeled with several categories. Most of the literature treats movie genre classification as a mono-label task, generally based on audio-visual features. This study addresses multilabel movie genre classification, using both supervised and unsupervised machine learning techniques to classify movies into their corresponding genres. We created a dataset of English subtitle files taken from The Movie Database (IMDB); it contains 1200 movies, each labeled according to a set of eleven genre labels. We experimented with two feature extraction methods combined with the classifiers, and with a feature selection technique to reduce dimensionality. We compared the performance of the unsupervised and supervised techniques on this classification task using several standard performance measures under both feature representation methods. Among the unsupervised techniques, K-means and Bisecting K-means performed best in terms of cluster quality. Among the supervised techniques, we evaluated KNN, SVM and DT and found that SVM outperformed the other classifiers. Finally, we compared the unsupervised and supervised techniques in terms of cluster quality and observed that the unsupervised K-means and Bisecting K-means produced higher-quality clusters than the supervised SVM, DT and KNN. We discuss the reason for the outliers in the training set and recommend using unsupervised techniques to improve the task of predefining the categories and labeling the textual documents in the training set. en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Technological breakthroughs en_US
dc.subject Digital environment en_US
dc.subject Machine learning en_US
dc.title Multilabel Movie Genre Classification from Movie Subtitle Using Supervised and Unsupervised Machine Learning Approach en_US
dc.type Article en_US
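
The abstract names KNN as one of the supervised classifiers applied to subtitle text in a multilabel setting. The record does not say which feature extraction methods, distance measure, or voting rule the authors used, so the following is only a minimal sketch under stated assumptions: plain word counts as features, cosine similarity between them, and a rule that keeps every genre carried by at least half of the k nearest neighbours. The function names, the toy subtitle snippets, and the genre names are all invented for illustration and are not taken from the study.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_genres(train, query_text, k=3):
    """Return every genre carried by at least half of the k most similar training items.

    The half-of-k voting rule is an assumption made for this sketch.
    """
    query = term_counts(query_text)
    ranked = sorted(train, key=lambda item: cosine(term_counts(item[0]), query), reverse=True)
    votes = Counter(genre for _, genres in ranked[:k] for genre in genres)
    return {genre for genre, n in votes.items() if n * 2 >= k}


# Toy, invented data: (subtitle snippet, set of genre labels).
TRAIN = [
    ("the detective found the gun near the body", {"Crime", "Thriller"}),
    ("police chase the killer through the city", {"Crime", "Action"}),
    ("she kissed him and they danced all night", {"Romance"}),
    ("i love you marry me my darling", {"Romance", "Drama"}),
    ("the killer left a gun and a note for the police", {"Crime", "Thriller"}),
]

predicted = knn_genres(TRAIN, "the police found the killer and his gun", k=3)
print(sorted(predicted))
```

Because each training item carries a set of labels rather than a single one, the prediction is also a set, which is what distinguishes the multilabel case from the mono-label one described in the abstract. A real pipeline would replace the raw counts with whichever feature representation is being evaluated.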

