A Transformer Based Question Answering System for Answering Open Domain Questions from Bengali Reference Text


dc.contributor.author Shakil, S M Khasrul Alam
dc.contributor.author Ahmed, Md. Foysal
dc.contributor.author Sholi, Rubaiya Tasnim
dc.date.accessioned 2023-04-01T03:20:20Z
dc.date.available 2023-04-01T03:20:20Z
dc.date.issued 2023-01-29
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/10073
dc.description.abstract As a Natural Language Processing task, question answering (Q/A) systems are becoming increasingly popular, particularly in the research community, and several notable works have been published in recent years. The majority of Q/A systems have been designed with the English language in mind, and linguistic resources for English are widely available. However, despite Bengali being a widely spoken language, particularly in the South Asian region, very few works on Q/A systems exist for it. As a result, the following research was carried out in the domain of Bangla question answering by fine-tuning BERT pre-trained models for the Bangla language. The fine-tuned models were trained on reference texts paired with questions whose answers lie inside the text, and tested under several scenarios. The expected output of the work is to find answers in the corresponding context. To improve the models' efficiency, a new dataset based on the widely used SQuAD dataset has been proposed. Our proposed dataset attempts to overcome the constraints of the SQuAD dataset, since the contexts were collected manually from sources such as Wikipedia and Banglapedia and processed to eliminate grammatical errors while retaining the true sense of each sentence. Preprocessing was carried out with the "csebuetnlp/banglabert" tokenizer from the Hugging Face library, which was built exclusively for Bangla sentences and words. Although our dataset had limitations, our fine-tuned models managed to produce satisfactory results; among the five models we chose to work on, bert-base-cased and distilbert-base-cased produced the most promising results, with F1 scores of 0.6542 and 0.60901 and accuracies of 0.69 and 0.83, respectively. en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Natural language en_US
dc.subject Linguistics en_US
dc.subject English language en_US
dc.subject Popular language en_US
dc.title A Transformer Based Question Answering System for Answering Open Domain Questions from Bengali Reference Text en_US
dc.type Other en_US
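
For readers of the abstract, the following is a minimal Python sketch of the extractive question answering setup described there, using the Hugging Face transformers library. The checkpoint name "csebuetnlp/banglabert" is taken from the abstract; the question and context strings are illustrative placeholders, and the span-extraction head is only meaningful after fine-tuning on a SQuAD-style dataset such as the one the thesis proposes.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Tokenizer and encoder named in the abstract. The QA head loaded on
# top of the encoder is randomly initialized here; it must first be
# fine-tuned on a SQuAD-style Bangla dataset before its answers mean
# anything.
model_name = "csebuetnlp/banglabert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "..."  # a Bangla question (placeholder)
context = "..."   # Bangla reference text containing the answer (placeholder)

# Encode the (question, context) pair as a single sequence, as in
# SQuAD-style extractive QA.
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# For every token the model scores how likely the answer span starts
# or ends there; take the argmax of each and decode the span between.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
answer = tokenizer.decode(inputs["input_ids"][0][start:end])
print(answer)

The same pattern applies to the other checkpoints evaluated in the thesis, such as bert-base-cased and distilbert-base-cased, by swapping model_name.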

