| dc.description.abstract |
Background Anti-diabetic peptides (ADPs) are a promising therapeutic modality, but their experimental discovery progresses slowly. We compiled a balanced dataset of 4,061 peptides (1,261 active; 2,800 inactive) and tested four families of sequence-derived features: amino acid composition (AAC), dipeptide composition (DPC), composition of k-spaced amino acid pairs (CKSAAP), and pseudo amino acid composition (PseAAC). Objective To design an accurate, lightweight, and interpretable ADP predictor (ADPpred) and to identify the best-generalizing feature-model combination. MCC and F1 served as the key measures, with Accuracy, Sensitivity, Specificity, and kappa also reported. Five-fold stratified cross-validation (CV) and an independent 20% hold-out test were used.
Results Across all tested feature sets, ResidualMLP showed the best CV performance (mean MCC = 0.919, F1 = 0.960, Accuracy = 0.959). Among the features, CKSAAP was the strongest: CKSAAP + ResidualMLP reached MCC = 0.986, F1 = 0.996, Accuracy = 0.993, Sensitivity = 0.992, Specificity = 0.998 in CV. On the independent hold-out test, the same configuration achieved Accuracy = 0.970, F1 = 0.961, MCC = 0.941, Sensitivity = 0.981, Specificity = 0.961; ROC AUCs on CKSAAP were ~0.987 across all model families, indicating good discrimination. ADPpred thus outperforms prior ADP-specific RF baselines and performs comparably to PLM-based methods while remaining computationally efficient.
Conclusion ADPpred, built on CKSAAP features and a ResidualMLP classifier, yields high, balanced performance and generalizes to unseen peptides. Its ease of use, speed, and feature interpretability make it a convenient tool for screening and designing anti-diabetic peptides. |
en_US |
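
The CKSAAP features named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of CKSAAP: for each gap k, count the residue pairs separated by k positions and normalize by the number of valid pairs. The function name `cksaap`, the default `k_max=3`, and the choice to skip non-standard residues are illustrative assumptions, not details taken from ADPpred.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=3):
    """Return a CKSAAP vector of length 400 * (k_max + 1).

    Hypothetical helper: k_max=3 is an assumed default, not the
    setting reported for ADPpred.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        valid = 0
        # Pair residue i with the residue k positions further along.
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
                valid += 1
        # Normalize counts to frequencies; all zeros if no valid pair exists.
        features.extend(counts[p] / valid if valid else 0.0 for p in PAIRS)
    return features
```

Each block of 400 values corresponds to one gap size, so a single feature maps back to a concrete residue pair and spacing, which is what makes this representation straightforward to interpret.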