Author Identification using Deep Learning Approach

Sutradhar, Shimul

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16685

Date: 2025-05-23

Abstract:

In an age dominated by digital communication and the rapid growth of textual information, the task of authorship attribution, identifying the true author of a given piece of writing, has gained considerable importance across domains such as digital forensics, cybersecurity, academic integrity, and literary analysis. Traditional approaches relying on surface-level stylometric features and classical machine learning algorithms have shown limited scalability and adaptability, especially in multilingual or stylistically complex scenarios. This study addresses these limitations by proposing a robust, deep learning-based framework for author identification using both English and Bangla texts. The study presents a comparative evaluation of several deep learning models, including Long Short-Term Memory (LSTM), Bidirectional LSTM with Attention (BiLSTM + Attention), and a hybrid Convolutional Neural Network combined with LSTM (CNN+LSTM). Most significantly, it includes modern transformer-based architectures such as BERT (bert-base-multilingual-cased) and XLM-RoBERTa, leveraging their powerful contextual embedding capabilities to improve authorship classification accuracy. A manually curated multilingual dataset comprising text samples from six diverse authors was used, with extensive preprocessing and stratified train-test splitting to ensure balanced and clean input. Experimental results demonstrate that the proposed bert-base-multilingual-cased model outperformed all other approaches, achieving an accuracy of 95.5% and perfect AUC scores (1.00) for all authors.
This work not only highlights the efficacy of transformer-based models for author identification but also offers a scalable and language-agnostic method applicable to real-world multilingual text classification tasks. The results strongly support the adoption of deep contextualized models in authorship attribution and open the door to future research in cross-genre, low-resource, and adversarial text environments.
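
The stratified train-test splitting mentioned in the abstract can be illustrated with a minimal sketch. The function name `stratified_split`, the 20% test fraction, the seed, and the placeholder data are all assumptions made for illustration; the thesis does not state its split ratio or tooling, so this should not be read as the author's implementation.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_fraction=0.2, seed=42):
    """Split samples so every author keeps roughly the same share in both sets.

    test_fraction and seed are illustrative defaults, not values from the thesis.
    """
    # Group the text samples by their author label.
    by_author = defaultdict(list)
    for text, label in zip(texts, labels):
        by_author[label].append(text)

    rng = random.Random(seed)
    train, test = [], []
    for author in sorted(by_author):
        samples = by_author[author][:]
        rng.shuffle(samples)
        # Hold out the same proportion per author, with at least one sample.
        n_test = max(1, round(len(samples) * test_fraction))
        test.extend((t, author) for t in samples[:n_test])
        train.extend((t, author) for t in samples[n_test:])

    # Shuffle again so authors are interleaved within each set.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder data: six hypothetical authors with ten samples each.
authors = [f"author_{i}" for i in range(6)]
texts = [f"sample {j} by {a}" for a in authors for j in range(10)]
labels = [a for a in authors for _ in range(10)]

train_set, test_set = stratified_split(texts, labels)
```

Splitting per author rather than over the pooled corpus keeps the class proportions identical in the training and test sets, which matters when only a handful of authors are being distinguished.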