Bengali news clustering using k-means clustering based on LSA

Zilani, Rashedul Alam

dc.contributor.author	Zilani, Rashedul Alam
dc.date.accessioned	2024-06-12T03:49:59Z
dc.date.available	2024-06-12T03:49:59Z
dc.date.issued	2024-01-24
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12692
dc.description.abstract	As digital content continues to grow, effective information retrieval and organization have become increasingly important, especially in contexts involving diverse cultural backgrounds. This paper addresses the clustering of Bengali news using the K-means algorithm combined with latent semantic analysis (LSA). Because it is uncommon, clustering news with latent semantic analysis poses a difficult problem. Document clustering, also known as text document clustering, is a form of cluster analysis. Recent research has applied text clustering techniques in diverse domains, including text extraction, for pulling large quantities of valuable content from the Internet, and automated document organization [15], [16]. This article introduces an improved K-means-based framework for clustering text and news documents. It uses a self-taught (unsupervised) learning model that partitions a given dataset into distinct groups without requiring external labels or identifiers. We analyzed a dataset of approximately 0.5 million (504,266) news texts collected from the online portals of several Bengali newspapers, covering seven distinct categories of news content. To cluster the dataset with semantic analysis, we first prepared the data. Next, punctuation and keywords are encoded so that deep learning techniques can be applied during training. Once the groups have been learned, they are clustered using K-means. Some aspects still need work, such as data preprocessing and the separation of sentences and punctuation; we recommend a neural-network-based deep learning strategy to address these issues. Because no groundbreaking work has yet been done on Bengali news text or document clustering, we regard this as an effective approach.
We also conducted a few experiments that show how the approach is implemented in practice and confirm the efficacy of the proposed method.	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Bengali Linguistics	en_US
dc.subject	Deep Learning	en_US
dc.subject	Machine Learning	en_US
dc.subject	LSA (Latent Semantic Analysis)	en_US
dc.subject	News	en_US
dc.title	Bengali news clustering using k-means clustering based on LSA	en_US
dc.type	Other	en_US
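The pipeline summarized in the abstract (prepare the text, build a latent semantic representation, then group documents with K-means) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the report's implementation: the tokenizer simply strips ASCII punctuation and the Bengali danda, documents are weighted with TF-IDF (standing in for the report's unspecified encoding step, not its deep learning component), LSA is a truncated SVD, and clustering is plain Lloyd's K-means with random restarts. The function names (`tokenize`, `tfidf_matrix`, `lsa`, `kmeans`), the four toy headlines, and all parameter values are hypothetical.

```python
import string
from collections import Counter

import numpy as np

# Assumed punctuation set: ASCII punctuation, Bengali danda/double danda, curly quotes.
# The report does not specify its exact preprocessing rules.
_PUNCT = str.maketrans("", "", string.punctuation + "\u0964\u0965\u2018\u2019\u201c\u201d")


def tokenize(text: str) -> list[str]:
    """Remove punctuation and split on whitespace (a hypothetical, deliberately simple tokenizer)."""
    return text.translate(_PUNCT).split()


def tfidf_matrix(docs: list[str]) -> tuple[np.ndarray, list[str]]:
    """Build a dense documents x terms TF-IDF matrix with smoothed IDF."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(token_lists):
        for term, count in Counter(toks).items():
            tf[i, index[term]] = count
    df = (tf > 0).sum(axis=0)  # number of documents containing each term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf, vocab


def lsa(x: np.ndarray, k: int) -> np.ndarray:
    """Project documents onto the top-k latent semantic dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    z = u[:, :k] * s[:k]  # document coordinates U_k * Sigma_k
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.where(norms == 0, 1.0, norms)  # unit length, so Euclidean tracks cosine


def _assign(z: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (squared Euclidean distance) for each row of z."""
    return ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(z: np.ndarray, n_clusters: int, n_init: int = 10,
           max_iter: int = 100, seed: int = 0) -> tuple[np.ndarray, float]:
    """Lloyd's K-means with random restarts; returns the labels with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # Random initialization: pick n_clusters distinct documents as starting centers.
        centers = z[rng.choice(len(z), size=n_clusters, replace=False)]
        for _ in range(max_iter):
            labels = _assign(z, centers)
            # Move each center to the mean of its members; keep it if the cluster is empty.
            new_centers = np.array([
                z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(n_clusters)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = _assign(z, centers)
        inertia = float(((z - centers[labels]) ** 2).sum())
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


# Hypothetical toy headlines: two about cricket, two about the market.
docs = [
    "ক্রিকেট ম্যাচ খেলা দল রান।",
    "ক্রিকেট ম্যাচ খেলা দল উইকেট।",
    "বাজার সূচক লেনদেন ব্যাংক।",
    "বাজার সূচক লেনদেন টাকা।",
]
x, vocab = tfidf_matrix(docs)
labels, inertia = kmeans(lsa(x, k=2), n_clusters=2)
print(labels)  # the two cricket items share one label, the two market items the other
```

Normalizing the LSA vectors to unit length makes squared Euclidean distance equal to 2 minus twice the cosine similarity, the usual choice for text. For a corpus the size of the one described (about 504,266 documents), a sparse term matrix and a randomized or iterative truncated SVD would be needed instead of the dense `np.linalg.svd` used here.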