DSpace Repository

Beyond Words: Unraveling Text Complexity with Novel Dataset and a Classifier Application

dc.contributor.author Islam, Mohammad Shariful
dc.contributor.author Rony, Mohammad Abu Tareq
dc.contributor.author Saha, Pritom
dc.contributor.author Ahammad, Mejbah
dc.contributor.author Alam, Shah Md Nazmul
dc.contributor.author Rahman, Md Saifur
dc.date.accessioned 2024-05-04T06:21:20Z
dc.date.available 2024-05-04T06:21:20Z
dc.date.issued 2023-02-27
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/12216
dc.description.abstract Text classification is a fundamental task in Natural Language Processing (NLP). This research presents a novel human-annotated dataset of 22,331 English sentences categorized into four classes (simple, complex, compound, and complex-compound), together with a sentence classifier tool that analyzes and classifies sentences within English text, with particular relevance to literature writing. The study evaluates classification performance using three feature representation methods: Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and word embedding features. Four machine learning models and two deep learning models are evaluated. BoW combined with Support Vector Classifier (SVC) and Logistic Regression (LR) achieved high accuracy in distinguishing sentence complexity. The deep learning models built on word embedding features, LSTM and RNN, offer a richer semantic representation. LSTM achieves the highest accuracy, 98.03%, with balanced precision and recall and an average F1-score of 97%. RNN is slightly less accurate at 97.75% but still captures sentence structure dependencies well. The study offers insights for practical applications and contributes to the broader understanding of sentence structures and semantics. en_US
dc.language.iso en_US en_US
dc.publisher IEEE en_US
dc.subject Classification en_US
dc.subject Datasets en_US
dc.subject Natural language en_US
dc.title Beyond Words: Unraveling Text Complexity with Novel Dataset and a Classifier Application en_US
dc.type Article en_US
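
The abstract names Bag-of-Words and TF-IDF as two of the feature representations but does not describe tokenization or weighting details. The following is a minimal sketch of how such features might be computed, not the authors' implementation. The toy sentences, the helper names (`tokenize`, `bag_of_words`, `tf_idf`), and the smoothed IDF formula are all assumptions chosen for illustration; none of them come from the dataset or the paper.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase, drop commas/periods, split on whitespace.
    return sentence.lower().replace(",", " ").replace(".", " ").split()


def bag_of_words(sentences):
    # Build a sorted vocabulary and one raw count vector per sentence.
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def tf_idf(vectors):
    # Reweight raw counts by a smoothed inverse document frequency:
    # idf = ln((1 + N) / (1 + df)) + 1. This is one common variant,
    # assumed here; the paper's exact weighting is not stated.
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[v[j] * idf[j] for j in range(n_terms)] for v in vectors]


# Illustrative sentences only (simple, compound, complex); not from the dataset.
sentences = [
    "The dog barked.",
    "The dog barked, and the cat ran.",
    "The cat ran because the dog barked.",
]
vocab, bow = bag_of_words(sentences)
weighted = tf_idf(bow)
```

In this sketch, words that appear in every sentence (such as "the") keep their raw count under TF-IDF, while connectives that appear in only one sentence (such as "and" or "because") are weighted up, which hints at why such features could help separate sentence types. Either vector form could then be passed to a classifier such as SVC or LR.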

