Abstract:
This study explores the application of deep learning to automate the identification of
research fields within scientific paper abstracts. The goal is to create a robust model that
accurately categorizes the primary subject matter of each abstract, improving the precision
and efficiency of classification. The dataset undergoes preprocessing, tokenization, and
transformation into sequences suitable for input into various models, including Artificial
Neural Networks (ANN), Convolutional Neural Networks (CNN), and Bidirectional Long
Short-Term Memory (BLSTM) networks. The neural models incorporate techniques such as
word embedding and dropout, and performance is evaluated using metrics such as accuracy
and the AUC-ROC score. The research addresses challenges in identifying research fields
within English language abstracts, employing language-specific preprocessing and data
augmentation. The results highlight the efficacy of deep learning in accurately categorizing
diverse research fields within English abstracts and suggest that the approach may also apply
to other languages. The findings contribute to advancing automated techniques for
recognizing research themes, streamlining the comprehension and classification of
scientific papers. Various algorithms were employed, including ANN, CNN, BLSTM, Decision
Tree (DT), Gradient Boosting (GB), AdaBoost Classifier (ABC), Random Forest (RF), Support
Vector Classification (SVC), XGBoost (XGB), Multinomial Naive Bayes (MNB), Passive
Aggressive (PA), Ridge Classifier (RC), and Logistic Regression (LR). Notably, the GB model
achieved an accuracy of 83.82%, and the SVC model achieved an accuracy of 83.50%. These
outcomes were achieved through careful hyperparameter tuning, which improved the overall
robustness of the models.
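
As an illustration of the tokenization and sequence-transformation step mentioned above, the following is a minimal sketch in plain Python. The function names, the reserved ids 0 and 1, the vocabulary size, the sequence length, and the two sample sentences are all assumptions chosen for illustration; they are not the settings or data used in this study.

```python
import re
from collections import Counter

# Assumed convention: id 0 is padding, id 1 marks out-of-vocabulary tokens.
PAD, UNK = 0, 1

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts, max_words=10000):
    """Map the most frequent tokens to integer ids, starting at 2."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}

def to_padded_sequence(text, vocab, max_len=8):
    """Convert text to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))

# Hypothetical sample abstracts, used only to exercise the functions.
abstracts = [
    "Deep learning improves image classification.",
    "Graph theory bounds for sparse networks.",
]
vocab = build_vocab(abstracts)
sequences = [to_padded_sequence(a, vocab) for a in abstracts]
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer, which is why padding and truncation are applied before training sequence models such as a CNN or BLSTM.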