dc.contributor.author Zobaed, S.M.
dc.contributor.author Haque, Enamul
dc.contributor.author Kaiser, Shahidullah
dc.contributor.author Hussain, Razin Farhan
dc.date.accessioned 2021-08-23T07:31:48Z
dc.date.available 2021-08-23T07:31:48Z
dc.date.issued 2019-02-14
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/6046
dc.description.abstract Cloud services are widely deployed to store and process big data. Organizations that deal with big data, especially large document sets, prefer cloud services for their storage and computational efficiency. However, inefficient processing of a large text corpus is computationally expensive for real-time systems. In addition, efficient memory utilization is important when clustering big data, including large text corpora. Clustering a large text corpus is an important component of various document retrieval systems such as PubMed. To address these challenges, in this paper we present NoCS2 (Number of Cluster and Seed Selection) for efficient topic-based clustering of unstructured big data in the cloud. NoCS2 relies on the computing and storage services of a cloud server. Traditional clustering solutions for text datasets assume a fixed number of clusters irrespective of the dataset's size and characteristics (such as science and technology). In contrast, our solution dynamically determines the appropriate number of clusters, k, based on the characteristics of the dataset. Specifically, we use a precomputed matrix trace as the number of clusters for a dataset that represents the total number of keywords using a vector representation. Then, we build k clusters using topic-based similarity among keywords. Finally, we compare our proposed method with two state-of-the-art clustering methods. Empirical results demonstrate that NoCS2 achieves a better average closeness score than the other methods on large and sparse datasets. en_US
dc.language.iso en_US en_US
dc.publisher 2018 21st International Conference of Computer and Information Technology, ICCIT 2018, IEEE en_US
dc.subject Big data en_US
dc.subject Cloud computing en_US
dc.subject Matrix algebra en_US
dc.subject Text analysis en_US
dc.subject Mathematical model en_US
dc.subject Clustering algorithms en_US
dc.title NoCS2 en_US
dc.title.alternative Topic-based Clustering of Big Data Text Corpus in the Cloud en_US
dc.type Article en_US

