Suffix Based Automated Parts of Speech Tagging for Bangla Language

Roy, Monjoy Kumar; Paul, Pinto Kumar

dc.contributor.author	Roy, Monjoy Kumar
dc.contributor.author	Paul, Pinto Kumar
dc.date.accessioned	2019-07-06T04:42:58Z
dc.date.available	2019-07-06T04:42:58Z
dc.date.issued	2018-12-11
dc.identifier.uri	http://hdl.handle.net/123456789/2711
dc.description.abstract	Natural language processing (NLP) is the set of techniques by which computers process human language. Parts-of-speech (POS) tagging is a fundamental requirement for many NLP applications. It is considered a solved problem for some languages, such as English and Chinese, where taggers reach high accuracy (about 97%), but it remains unsolved for Bangla because of the language's ambiguity. Although building a POS tagger for Bangla is not new work, each of the available taggers has its own limitations. We chose to develop an unsupervised system rather than a supervised one, because a supervised system needs a large training resource and the resources available for Bangla are scarce. We develop a POS tagger based mainly on Bangla grammar, especially suffixes, because Bangla is a highly inflectional language in which a single word has many variants depending on its suffix. The tagger assigns 8 base POS tags: rules based on Bangla grammar and suffixes identify the tag of each word with the help of a verb root dataset. To handle words without suffixes, a dataset of almost 14,500 Bangla words with their default POS tags is added to the system, which improves the tagger's performance. A modified version of a previously used suffix-analysis algorithm is applied, which results in a satisfactory accuracy of about 94.2%.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.relation.ispartofseries	;P12220
dc.subject	Computer Science	en_US
dc.subject	Language Automation	en_US
dc.subject	Language Processing	en_US
dc.title	Suffix Based Automated Parts of Speech Tagging for Bangla Language	en_US
dc.type	Other	en_US
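
The approach described in the abstract (a default-tag word list, a verb root dataset, and suffix rules) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the word lists, the suffix lists, the tag names, the order in which the lookups run, and the function names `tag_word` and `tag_sentence` are hypothetical and are not taken from the report, whose actual rules and modified suffix-analysis algorithm are not given in this record.

```python
# Illustrative sketch only: the data and rules below are small hypothetical
# samples, not the datasets or the algorithm used in the report.

# Stand-in for the default-tag dataset (the report uses almost 14,500 words).
DEFAULT_TAGS = {"আমি": "PRONOUN", "এবং": "CONJUNCTION", "ভালো": "ADJECTIVE"}

# Stand-in for the verb root dataset.
VERB_ROOTS = {"কর", "বল", "লিখ"}

# Sample suffixes, sorted longest first so the longest match is tried first.
VERB_SUFFIXES = sorted(["ছি", "ছে", "বে", "লাম"], key=len, reverse=True)
NOUN_SUFFIXES = sorted(["গুলো", "দের", "টি", "রা", "কে"], key=len, reverse=True)


def tag_word(word):
    """Return a POS tag for one word, or "UNKNOWN" if no rule applies."""
    # Step 1 (assumed order): look the word up in the default-tag list.
    if word in DEFAULT_TAGS:
        return DEFAULT_TAGS[word]
    # Step 2: strip a verb suffix and check the remainder against the verb roots.
    for suffix in VERB_SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in VERB_ROOTS:
            return "VERB"
    # Step 3: a noun marker attached to a non-empty stem suggests a noun.
    for suffix in NOUN_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return "NOUN"
    return "UNKNOWN"


def tag_sentence(sentence):
    """Tag each whitespace-separated word of a sentence."""
    return [(word, tag_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    for word, tag in tag_sentence("আমি বইগুলো লিখছি"):
        print(word, tag)
```

In this sketch the verb root check keeps a word from being tagged as a verb merely because it happens to end in a verb-like suffix, which is one plausible reason for pairing suffix rules with a root dataset. A real system would also need to normalize Unicode input before comparing strings.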