| dc.description.abstract |
Amyloid fibrils formed by misfolded proteins are central to the pathology of several neurodegenerative disorders, including Alzheimer’s and Parkinson’s disease. Reliable in silico prediction of amyloidogenic proteins and peptides can greatly reduce experimental burden and guide mechanistic studies. Existing computational tools are dominated by hand-crafted sequence descriptors coupled with shallow machine-learning classifiers or ensemble models. While these approaches have achieved high accuracy, they often struggle to capture long-range residue dependencies and contextual patterns that underlie aggregation propensity. This study proposes iAmyloid_PepCG, a sequence-based predictor that integrates multiple engineered features with a hybrid Convolutional Neural Network–Gated Recurrent Unit (CNN–GRU) architecture. Protein/peptide sequences were collected from publicly available benchmark datasets and encoded into a diverse feature space including amino-acid composition, composition–transition–distribution (CTD/CTDC/CTDD), dipeptide composition, pseudo amino-acid composition, physicochemical property (PCP) vectors, and contextual embeddings from transformer models (ESM, ProtBERT, ProtALBERT).
A two-stage evaluation was performed: (i) 10-fold cross-validation on the training set and (ii) assessment on an independent hold-out test set. The proposed hybrid CNN–GRU model (iAmyloid_PepCG) achieved an independent-test accuracy of 95.45%, sensitivity of 100%, F1-score of 0.9333, Matthews correlation coefficient (MCC) of 0.9037, Cohen’s kappa of 0.8991, and area under the ROC curve (AUC) of 0.9714, outperforming classical ML baselines and several state-of-the-art amyloid predictors on the same benchmarks. Cross-validation accuracy reached 78.18% with an AUC of 0.8861, indicating stable generalisation. These findings demonstrate that combining local pattern extraction by CNN with long-range dependency modelling by GRU, applied to a rich multi-view feature representation, yields a powerful framework for amyloid protein prediction. |
en_US |