dc.description.abstract |
Named Entity Recognition (NER) is considered a fundamental task for information extraction in Natural Language Processing (NLP); it aims to classify each word of a text document into one of a set of predefined named entity classes. Numerous high-accuracy architectures have been built over time for high-resource languages such as English and Chinese. In recent years, the NER challenge for low-resource languages like Bangla has piqued researchers' interest. To perform the NER task in the low-resource language Bangla, this work proposes a novel neural network that reduces the need for most feature engineering and aims to achieve optimal performance from minimal information. In this research, we have used a new dataset to observe the performance of various deep learning models with respect to non-contextual word embeddings such as word2vec, GloVe, and fastText. Consequently, a hybrid architecture composed of a bidirectional Gated Recurrent Unit (BGRU), a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF) emerged triumphant, with a macro F1 score of 91.90% and a micro F1 score of 98.21%. Since precision, recall, and F1 have been measured differently across studies, these values may vary. All of the experimental models have also been evaluated with a previously introduced method for measuring precision, recall, and F1, under which the proposed model scores 86.83% on F1. The proposed BGRU-CNN-CRF architecture provides peak performance for all of the non-contextual word embeddings specified and achieves the highest accuracy with the word2vec embedding. In addition, this study demonstrates the impact of a well-annotated dataset on accuracy by creating a unique dataset. |
en_US |