dc.description.abstract |
As a task in Natural Language Processing, the Question Answering (Q/A) system is becoming increasingly popular, particularly in the research community, and several notable works have been published in recent years. The majority of Q/A systems have been designed with the English language in mind; in addition, linguistic resources for English are widely available. However, despite Bengali being a popular language, particularly in the South Asian region, very few works on Q/A systems have been found for it. As a result, the following research was carried out in the domain of Bangla Question Answering by fine-tuning BERT pre-trained models for the Bangla language. The fine-tuned models were trained on reference texts paired with questions whose answers lie inside the text, and were tested under several scenarios. The expected output of the work is to find answers from the corresponding context. To improve the models' efficiency, a new dataset based on the widely used SQuAD dataset has been proposed. Our proposed dataset attempts to overcome the constraints of the SQuAD dataset, since the contexts were collected manually from various sources such as Wikipedia and Banglapedia and processed to eliminate grammatical errors while retaining the true sense of each sentence. Preprocessing was carried out with the "csebuetnlp/banglabert" tokenizer from the Hugging Face library, which was built exclusively for Bangla sentences and words. Despite the fact that our dataset had limitations, our fine-tuned models managed to produce satisfactory results; among the 5 models we chose to work on, bert-base-cased and distilbert-base-cased produced promising results, with F1 scores of 0.6542 and 0.60901 and accuracies of 0.69 and 0.83, respectively. |
en_US |