Abstract:
Determining tense in Bangla text is difficult due to the language's morphological complexity, syntactic variation, and the scarcity of annotated corpora. This study uses the BanglaTense dataset, comprising 17,819 sentences manually annotated with three tense labels (Past, Present, and Future), to support temporal feature learning in a low-resource language. We compare several model families: recurrent architectures (LSTM, GRU, and their bidirectional variants), a multichannel CNN-LSTM hybrid, and the transformer-based IndicBERT model. A range of pre-processing and class-balancing techniques was applied to reduce noise and avoid class-specific bias. Experiments show that the GRU is the strongest recurrent model, reaching 96% accuracy, while the CNN-LSTM hybrid generalizes well and notably improves Future-tense recognition. Fine-tuned IndicBERT, with embeddings pretrained on a diverse range of Indic languages, achieves the best overall performance, with 97% accuracy and improved F1-scores across all tenses. These results underscore the importance of class-balancing mechanisms, architecture design, and contextualized representations for tense classification, and demonstrate the promise of hybrid and transformer models for temporal reasoning in morphologically rich, resource-scarce languages such as Bangla.