DSpace Repository

Lemmatization Algorithm Development for Bangla Natural Language Processing

dc.contributor.author Kowsher, Md.
dc.contributor.author Tahabilder, Anik
dc.contributor.author Sarker, Md Murad Hossain
dc.contributor.author Sanjid, Md. Zahidul Islam
dc.contributor.author Prottasha, Nusrat Jahan
dc.date.accessioned 2021-11-23T10:04:07Z
dc.date.available 2021-11-23T10:04:07Z
dc.date.issued 2020-01-07
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/6441
dc.description.abstract Natural language processing (NLP) has many applications in autonomous communication, and lemmatization, which reduces a word to its root word, is an essential preprocessing step in NLP. However, effective algorithms for Bangla NLP are scarce, which motivated us to develop a useful Bangla lemmatization tool. Because Bangla lemmatization tools are lacking, rule-based stemming processes usually stand in for lemmatization in Bangla language processing. In this paper, we propose a Bangla lemmatization framework using three effective lemmatization techniques based on data structures and dynamic programming. We use a Trie algorithm and develop a mapping algorithm named “Dictionary Based Search by Removing Affix (DBSRA)”, which is also based on a data structure. We apply both Trie and DBSRA lemmatization and select the better result by considering the Levenshtein distance between the lemma and the original word. Finally, we evaluate all three techniques and the combined framework on Bangla lemmatization. Among the three proposed techniques, DBSRA performed best, with an accuracy of 93.1 percent. The framework, developed by fusing the three algorithms, achieved the highest efficiency, 95.89 percent. Contribution: This paper presents the development of three lemmatization algorithms and their fusion into a framework for Bangla Natural Language Processing. en_US
dc.language.iso en_US en_US
dc.publisher 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR), IEEE en_US
dc.subject Bangla NLP en_US
dc.subject lemmatization en_US
dc.subject Trie en_US
dc.subject DBSRA en_US
dc.subject Corpus en_US
dc.title Lemmatization Algorithm Development for Bangla Natural Language Processing en_US
dc.type Article en_US
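
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the Trie step returns the longest lexicon entry that is a prefix of the input word, that DBSRA strips a known suffix and looks the remainder up in a dictionary, and that the candidate with the smaller Levenshtein distance to the original word is kept. The names `Trie`, `strip_affix_lookup`, `lemmatize`, `LEXICON`, and `SUFFIXES`, as well as the tiny lexicon and suffix list, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


class Trie:
    """Character trie over a lexicon of root words (assumed design)."""

    _END = "$end"  # multi-character key, cannot collide with a single character

    def __init__(self, words: Iterable[str]) -> None:
        self.root: dict = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[self._END] = True

    def longest_prefix(self, word: str) -> Optional[str]:
        """Return the longest lexicon entry that is a prefix of `word`."""
        node, best = self.root, None
        for i, ch in enumerate(word):
            if ch not in node:
                break
            node = node[ch]
            if self._END in node:
                best = word[: i + 1]
        return best


def strip_affix_lookup(word: str, lexicon: set, suffixes: Iterable[str]) -> Optional[str]:
    """DBSRA-style lookup (assumed): strip a suffix, then search the dictionary."""
    if word in lexicon:
        return word
    for suffix in sorted(suffixes, key=len, reverse=True):  # longest suffix first
        if word.endswith(suffix) and word[: -len(suffix)] in lexicon:
            return word[: -len(suffix)]
    return None


def lemmatize(word: str, trie: Trie, lexicon: set, suffixes: Iterable[str]) -> str:
    """Run both lookups and keep the candidate closest to the original word."""
    candidates = [c for c in (trie.longest_prefix(word),
                              strip_affix_lookup(word, lexicon, suffixes)) if c]
    if not candidates:
        return word  # unknown word: return it unchanged
    return min(candidates, key=lambda c: levenshtein(c, word))


# Hypothetical toy data: two root words and two plural suffixes.
LEXICON = {"ছেলে", "বই"}
SUFFIXES = ["রা", "গুলো"]
TRIE = Trie(LEXICON)

print(lemmatize("ছেলেরা", TRIE, LEXICON, SUFFIXES))
print(lemmatize("বইগুলো", TRIE, LEXICON, SUFFIXES))
```

With this toy data both lookups agree, so the distance-based selection only matters when the Trie and the affix-stripping step propose different lemmas. A real system would also need a full root-word dictionary and a complete affix inventory, neither of which is given in this record.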

