dc.description.abstract |
Named Entity Recognition (NER) is considered a fundamental task for information extraction in Natural Language Processing (NLP); it aims to classify each word of a text document into one of a set of predefined named entity classes. Numerous high-accuracy architectures have been built over time for high-resource languages such as English and Chinese. In recent years, the NER challenge for low-resource languages like Bangla has piqued researchers' interest. To perform the NER task in the low-resource language Bangla, this work proposes a novel neural network that reduces the need for most feature engineering and aims to achieve optimal performance from minimal information. In this research, we have used a new dataset to observe the performance of various deep learning models with respect to non-contextual word embeddings such as word2vec, GloVe, and fastText. Consequently, a hybrid architecture composed of a bidirectional Gated Recurrent Unit (BGRU), a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF) emerged triumphant, with a macro F1 score of 91.90% and a micro F1 score of 98.21%. Since precision, recall, and F1 have been measured differently across studies, these values may vary. All of the experimental models have also been evaluated with a previously introduced method for measuring precision, recall, and F1, under which the proposed model scores 86.83% on F1. The proposed BGRU-CNN-CRF architecture provides peak performance for all of the non-contextual word embeddings specified and achieves the highest accuracy with the word2vec embedding. In addition, this study demonstrates the impact of a well-annotated dataset on accuracy by creating a unique dataset. |
en_US |