Abstract:
Clustering news articles with K-means on top of Latent Semantic Analysis (LSA) is a challenging problem that has received relatively little attention. Document clustering, or text clustering, is the application of cluster analysis to textual documents. Text clustering is now an active research topic across many domains, with applications that include automatic document organization and text extraction for mining valuable information from the Internet. In this article, we propose a framework for news clustering that combines K-means with Latent Semantic Analysis and is well suited to clustering news documents or text. Clustering groups a set of data with the help of a self-taught learning model, which does not need any external labels or tags. We implemented the proposed model on the BBC news classification dataset. First, we prepare the dataset for clustering and apply semantic analysis to categorize it. Next, keywords and punctuation marks are converted into codes so that Deep Learning methods can be applied for training. After obtaining the learned representations, we cluster them with K-means. Some issues remain, such as separating sentences from punctuation marks and processing the data. To overcome them, we propose a Deep Learning framework based on neural networks. We consider this an effective approach for news text or document clustering, an area in which, to our knowledge, no major work has yet been done. We also report experiments that demonstrate a specific implementation of the approach and confirm the effectiveness of the proposed method.
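As a rough illustration of the LSA and K-means stages named in the abstract, the following Python sketch clusters a handful of toy sentences. It is not the implementation used in this work. It assumes NumPy is available, substitutes a plain TF-IDF matrix for the learned neural representations, and uses a six-sentence toy corpus instead of the BBC dataset. The function names (`tfidf_matrix`, `lsa`, `kmeans`) are hypothetical, and the farthest-point initialisation is an assumption chosen only to keep the example deterministic.

```python
import math
import re
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix (documents x terms) from raw strings."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for t, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1.0  # smoothed IDF
            X[i, index[t]] = count * idf
    return X


def lsa(X, n_components):
    """LSA step: keep the top singular directions of the TF-IDF matrix."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]
    # Normalise rows so that Euclidean distance tracks cosine similarity.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)


def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy corpus (hypothetical): three sport sentences, then three business ones.
docs = [
    "football match goal team",
    "football match goal striker",
    "football match team striker",
    "market shares bank profit",
    "market shares bank economy",
    "market shares profit economy",
]

labels = kmeans(lsa(tfidf_matrix(docs), n_components=2), k=2)
print(labels)  # one cluster id per document
```

On this toy corpus the first three sentences fall into one cluster and the last three into the other. On real news text the number of LSA components and the number of clusters would both need tuning.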