DSpace Repository

Efficiency Demonstration of Embedding Models and Libraries for Bengali Word Vector Representation


dc.contributor.author Bulbul, Aminul Islam
dc.contributor.author Das, Saurav
dc.contributor.author Tasnim, Tamanna
dc.date.accessioned 2023-05-03T04:46:31Z
dc.date.available 2023-05-03T04:46:31Z
dc.date.issued 23-02-12
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/10295
dc.description.abstract Word embeddings have proven remarkably effective in the field of NLP. Our goal is to identify the best embedding model, which is difficult for any specific task because embeddings produce different results depending on the size and source of the data set. The purpose of this study is to measure how the models perform on different types of embedding tasks. Following the success of word embeddings in NLP, researchers have developed several embedding models. In this paper we discuss the performance of the CBOW, skip-gram, and GloVe models, which represent words as vectors. We collected 2.5 lakh (250,000) Bengali news articles from a renowned newspaper in Bangladesh, using a web scraper built with Scrapy. We trained the CBOW and skip-gram architectures, which underlie the word2vec and FastText models, on a data set containing 20 million Bengali words, and used the same data set to train the GloVe model. Gensim, FastText, and a Python library were used to train these three models, respectively. To evaluate the models, we performed several word embedding tasks, namely word analogy and semantic and syntactic prediction of words. Surprisingly, FastText outperformed the other models on the semantic and syntactic tasks. On the analogy task, by contrast, performance was almost the same for all models except GloVe. en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Bengali words en_US
dc.subject Bengali language en_US
dc.subject Bengali newspaper en_US
dc.title Efficiency Demonstration of Embedding Models and Libraries for Bengali Word Vector Representation en_US
dc.type Other en_US
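
The word analogy task mentioned in the abstract is commonly scored with vector arithmetic and cosine similarity. The sketch below illustrates that general idea only; it is not the authors' evaluation code. The function names (`cosine`, `solve_analogy`) and the hand-made three-dimensional `TOY_VECTORS` are assumptions introduced for illustration, not values from the study, whose vectors would come from trained word2vec, FastText, or GloVe models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word whose vector
    is closest (by cosine) to vec(b) - vec(a) + vec(c).

    The three query words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy embeddings, hand-made for illustration only.
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.1, -1.0],
}

if __name__ == "__main__":
    print(solve_analogy(TOY_VECTORS, "man", "woman", "king"))
```

With real embeddings, the same scoring would be run over a list of analogy questions, and the fraction answered correctly would be reported for each model.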

