DSpace Repository

Bengali news clustering using k-means clustering based on LSA

dc.contributor.author Zilani, Rashedul Alam
dc.date.accessioned 2024-08-29T06:37:38Z
dc.date.available 2024-08-29T06:37:38Z
dc.date.issued 2024-01-25
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/13264
dc.description.abstract Effective information retrieval and organization have become increasingly important as digital content continues to grow, especially in contexts involving diverse cultural backgrounds. This paper addresses the clustering of Bengali news using the K-means algorithm combined with latent semantic analysis (LSA). Clustering news based on LSA is uncommon, which makes it a challenging problem. Document clustering, also known as textual document clustering, is one form of cluster analysis. Recent research has focused on applying text clustering techniques in diverse domains, including text extraction, which pulls large quantities of valuable content from the Internet, and automated document organization [15] and [16]. This paper introduces a more effective K-means-based news clustering framework for clustering text or news documents. A self-taught (unsupervised) learning model clusters a given set of data into distinct groups, obviating the need for external labels or identifiers. We analyzed a dataset of approximately 0.5 million (504,266) portal news texts retrieved from several Bengali newspapers, covering seven distinct kinds of news content. To categorize the dataset using clustering and semantic analysis, we first prepare the dataset. Next, the punctuation and keywords are converted into codes so that deep learning techniques can be applied to them during training. Once we have the learned groups, we cluster them using K-means. However, some aspects still need work, such as data processing and the separation of sentences and punctuation. We recommend a neural network-based deep learning strategy that can address such issues. Since no groundbreaking work has yet been done on news text or document clustering, this is an effective method.
Additionally, we conducted a few experiments to show how the approach is implemented, confirming the proposed method's efficacy. en_US
dc.publisher Daffodil International University en_US
dc.subject Bengali News en_US
dc.subject Latent Semantic Analysis (LSA) en_US
dc.subject Clustering en_US
dc.subject K-Means Clustering en_US
dc.subject Unsupervised Learning en_US
dc.subject Natural Language Processing (NLP) en_US
dc.title Bengali news clustering using k-means clustering based on LSA en_US
dc.type Other en_US
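
The abstract describes clustering news text with K-means on top of LSA. The following is a minimal sketch of that general idea, not the author's implementation. It assumes scikit-learn, TF-IDF weighting, whitespace tokenization, and a tiny made-up corpus of six Bengali headlines; none of these choices come from the record, and the function name cluster_news is hypothetical. The deep learning step mentioned in the abstract is not reproduced here.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus (not the 504,266-document dataset):
# three sports headlines followed by three politics headlines.
docs = [
    "ক্রিকেট দল খেলা জয়",
    "ফুটবল দল খেলা গোল",
    "ক্রিকেট খেলা দল মাঠ",
    "নির্বাচন সরকার ভোট মন্ত্রী",
    "সরকার মন্ত্রী সংসদ ভোট",
    "নির্বাচন ভোট সরকার সংসদ",
]


def cluster_news(texts, n_clusters, n_components, seed=0):
    """Cluster texts with TF-IDF -> LSA (truncated SVD) -> K-means."""
    # Split on whitespace: the default token pattern relies on \w, which
    # does not match Bengali vowel signs and would break words apart.
    vectorizer = TfidfVectorizer(token_pattern=r"\S+")
    tfidf = vectorizer.fit_transform(texts)

    # LSA: project the sparse TF-IDF matrix onto a few latent dimensions,
    # then L2-normalize so Euclidean K-means behaves like cosine similarity.
    lsa = make_pipeline(
        TruncatedSVD(n_components=n_components, random_state=seed),
        Normalizer(copy=False),
    )
    reduced = lsa.fit_transform(tfidf)

    # K-means on the reduced vectors; no labels are needed (unsupervised).
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced)


labels = cluster_news(docs, n_clusters=2, n_components=2)
for label, text in zip(labels, docs):
    print(label, text)
```

On this toy corpus the sports and politics headlines use disjoint vocabularies, so they should fall into two separate clusters; the cluster ids themselves are arbitrary. For a corpus with seven news categories, as in the abstract, n_clusters=7 and a larger n_components would be the natural starting point, though both values are assumptions to be tuned.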

