NoCS2

Zobaed, S.M.; Haque, Enamul; Kaiser, Shahidullah; Hussain, Razin Farhan

dc.contributor.author	Zobaed, S.M.
dc.contributor.author	Haque, Enamul
dc.contributor.author	Kaiser, Shahidullah
dc.contributor.author	Hussain, Razin Farhan
dc.date.accessioned	2021-08-23T07:31:48Z
dc.date.available	2021-08-23T07:31:48Z
dc.date.issued	2019-02-14
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/6046
dc.description.abstract	Cloud services are widely deployed to store and process big data. Organizations that deal with big data, especially large document sets, prefer cloud services for their storage and computational efficiency. However, inefficient processing of a large text corpus is computationally expensive for real-time systems. In addition, efficient memory utilization is important when clustering big data, including large text corpora. Clustering of large text corpora is an important component of various document retrieval systems such as PubMed. To address these challenges, in this paper we present NoCS2 (Number of Cluster and Seed Selection) for efficient topic-based clustering of unstructured big data in the cloud. NoCS2 relies on the computing and storage services of the cloud server. Traditional clustering solutions for text datasets use a fixed number of clusters irrespective of the dataset's size and characteristics (e.g., its subject domain, such as science and technology). In contrast, our solution dynamically determines the appropriate number of clusters, k, based on the characteristics of the dataset. In particular, we use the trace of a precomputed matrix, which represents the total number of keywords in the dataset's vector representation, as the number of clusters. Then, we build k clusters using topic-based similarity among keywords. Finally, we compare our method with two state-of-the-art clustering methods. Empirical results demonstrate that the average closeness score of NoCS2 is better than that of the other methods for large and sparse datasets.	en_US
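The abstract describes the pipeline only at a high level: take the trace of a precomputed matrix as k, then build k clusters from topic-based similarity among keywords. The Python sketch below illustrates that flow under assumptions that are not taken from the paper. The matrix is a hypothetical normalized keyword co-occurrence matrix (M[i][j] = fraction of documents containing keyword i that also contain keyword j), so its diagonal is 1 for each keyword that occurs and its trace counts those keywords. Each occurring keyword's matrix row serves as a seed ("topic profile"), and cosine similarity stands in for the paper's topic-based similarity. The names keyword_matrix, trace, cosine, and nocs2_sketch are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]


def keyword_matrix(docs: Sequence[Set[str]], keywords: Sequence[str]) -> Matrix:
    """Hypothetical matrix (assumption): M[i][j] = share of docs with keyword i that also contain keyword j.

    The diagonal is 1.0 for every keyword that occurs at least once and 0.0 otherwise.
    """
    n = len(keywords)
    df = [sum(1 for d in docs if kw in d) for kw in keywords]  # document frequency per keyword
    m = [[0.0] * n for _ in range(n)]
    for i, ki in enumerate(keywords):
        if df[i] == 0:
            continue  # unseen keyword: its row, including the diagonal, stays 0.0
        for j, kj in enumerate(keywords):
            both = sum(1 for d in docs if ki in d and kj in d)
            m[i][j] = both / df[i]
    return m


def trace(m: Matrix) -> float:
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumed stand-in for topic-based similarity); 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def nocs2_sketch(docs: Sequence[Set[str]], keywords: Sequence[str]) -> Tuple[int, Dict[str, List[int]]]:
    """Hypothetical sketch: k = trace of the keyword matrix, one keyword-seeded cluster per occurring keyword."""
    m = keyword_matrix(docs, keywords)
    # Number of clusters: trace of the precomputed matrix = number of occurring keywords here
    k = round(trace(m))
    # Seed selection (assumed): one seed per occurring keyword, using its matrix row as a topic profile
    seeds = [i for i in range(len(keywords)) if m[i][i] > 0]
    clusters: Dict[str, List[int]] = {keywords[i]: [] for i in seeds}
    for idx, d in enumerate(docs):
        vec = [1.0 if kw in d else 0.0 for kw in keywords]  # binary keyword vector of the document
        if not any(vec):
            continue  # documents containing none of the keywords stay unclustered in this sketch
        # Assign the document to its most similar topic profile; ties go to the earlier keyword
        best = max(seeds, key=lambda i: cosine(vec, m[i]))
        clusters[keywords[best]].append(idx)
    return k, clusters


if __name__ == "__main__":
    docs = [{"cloud", "storage"}, {"cloud", "compute"}, {"gene", "protein"}, {"gene", "cell"}]
    k, clusters = nocs2_sketch(docs, ["cloud", "gene", "protein"])
    print(k)         # 3
    print(clusters)  # {'cloud': [0, 1], 'gene': [3], 'protein': [2]}
```

Because every occurring keyword seeds its own cluster, k grows with the keyword vocabulary in this sketch; how NoCS2 actually constructs its matrix and selects seeds is not specified in this record.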
dc.language.iso	en_US	en_US
dc.publisher	2018 21st International Conference of Computer and Information Technology, ICCIT 2018, IEEE	en_US
dc.subject	Big data	en_US
dc.subject	Cloud computing	en_US
dc.subject	Matrix algebra	en_US
dc.subject	Text analysis	en_US
dc.subject	Mathematical model	en_US
dc.subject	Clustering algorithms	en_US
dc.title	NoCS2	en_US
dc.title.alternative	Topic-based Clustering of Big Data Text Corpus in the Cloud	en_US
dc.type	Article	en_US