DSpace Repository

MCNN-LSTM: Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data

Show simple item record

dc.contributor.author Hasib, Khan Md.
dc.contributor.author Azam, Sami
dc.contributor.author Karim, Asif
dc.contributor.author Marouf, Ahmed Al
dc.contributor.author Shamrat, F M Javed Mehedi
dc.contributor.author Montaha, Sidratul
dc.contributor.author Yeo, Kheng Cher
dc.contributor.author Jonkman, Mirjam
dc.date.accessioned 2024-06-29T09:46:29Z
dc.date.available 2024-06-29T09:46:29Z
dc.date.issued 2023-08-29
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12810
dc.description.abstract "Searching, retrieving, and arranging text in ever-larger document collections necessitate more efficient information processing algorithms. Document categorization is a crucial component of various information processing systems for supervised learning. As the quantity of documents grows, the performance of classic supervised classifiers has deteriorated because of the number of document categories. Assigning documents to a predetermined set of classes is called text classification. It is utilized extensively in a wide range of data-intensive applications. However, the fact that real-world implementations of these models are plagued with shortcomings begs for more investigation. Imbalanced datasets hinder the most prevalent high-performance algorithms. In this paper, we propose an approach name multi-class Convolutional Neural Network (MCNN)-Long Short-Time Memory (LSTM), which combines two deep learning techniques, Convolutional Neural Network (CNN) and Long Short-Time Memory, for text classification in news data. CNN’s are used as feature extractors for the LSTMs on text input data and have the spatial structure of words in a sentence, paragraph, or document. The dataset is also imbalanced, and we use the Tomek-Link algorithm to balance the dataset and then apply our model, which shows better performance in terms of F1- score (98%) and Accuracy (99.71%) than the existing works. The combination of deep learning techniques used in our approach is ideal for the classification of imbalanced datasets with underrepresented categories. Hence, our method outperformed other machine learning algorithms in text classification by a large margin. We also compare our results with traditional machine learning algorithms in terms of imbalanced and balanced datasets." en_US
dc.language.iso en_US en_US
dc.publisher IEEE en_US
dc.subject Classification en_US
dc.subject Big data en_US
dc.subject Machine learning en_US
dc.title MCNN-LSTM: Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Browse

My Account

Statistics