Multilabel Movie Genre Classification from Movie Subtitle Using Supervised and Unsupervised Machine Learning Approach

Hasan, Md. Mehedi; Debnath, Susanta Chandra; Hasan, Md. Mozahid

dc.contributor.author	Hasan, Md. Mehedi
dc.contributor.author	Debnath, Susanta Chandra
dc.contributor.author	Hasan, Md. Mozahid
dc.date.accessioned	2022-02-13T03:54:01Z
dc.date.available	2022-02-13T03:54:01Z
dc.date.issued	2021-06-02
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/7110
dc.description.abstract	Technological breakthroughs and commercial interest have made the categorization of media products increasingly common in the digital environment. This is usually a multilabel scenario in which an object may be labeled with several categories. Most of the literature treats movie genre classification as a single-label task, generally based on audio-visual features. This study presents a multilabel movie genre classification model that uses both supervised and unsupervised machine learning techniques to classify movies into their corresponding genres. We created a dataset of English subtitle files for 1200 movies taken from The Movie Database (IMDB), with each movie labeled according to a set of eleven genre labels. We experimented with two feature extraction methods combined with the classifiers, and with a feature selection technique to reduce dimensionality. We compared the performance of the unsupervised and supervised techniques using several standard performance measures under both feature representation methods. Among the unsupervised techniques, K-means and Bisecting K-means performed best in terms of cluster quality. Among the supervised techniques, we evaluated KNN, SVM and DT and found that SVM outperformed the other classifiers. Finally, we compared the unsupervised and supervised techniques in terms of cluster quality and observed that K-means and Bisecting K-means produced clusters of higher quality than SVM, DT and KNN. We addressed the reasons for outliers in the training set and recommend using unsupervised techniques to improve the predefinition of categories and the labeling of textual documents in the training set.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Technological breakthroughs	en_US
dc.subject	Digital environment	en_US
dc.subject	Machine learning	en_US
dc.title	Multilabel Movie Genre Classification from Movie Subtitle Using Supervised and Unsupervised Machine Learning Approach	en_US
dc.type	Article	en_US