| dc.description.abstract |
Interleukin-6 (IL-6) is a versatile cytokine that plays a key role in regulating the immune system, modulating inflammation, and contributing to the development of diseases such as COVID-19. Identifying peptides that induce IL-6 is essential for advancing immunotherapy and drug development, but traditional laboratory screening of such peptides is expensive and time-consuming. This study introduces a machine learning approach for predicting IL-6-inducing peptides from biologically relevant features extracted with the ProPy3 Python library. For each peptide, we computed amino acid composition (AAC), dipeptide composition (DPC), and various physicochemical properties, yielding a total of 435 descriptors. Our dataset included over 113,000 peptides, of which only 369 were identified as IL-6 inducers, a severe class imbalance that we addressed with the Synthetic Minority Over-sampling Technique (SMOTE). We trained and evaluated three models: Random Forest, Support Vector Machine, and XGBoost. XGBoost performed best, achieving an AUC of 0.95. To interpret the predictions, we used SHAP (SHapley Additive exPlanations) analysis, which identified the key features driving IL-6 induction. Finally, we applied our trained models to peptides from the SARS-CoV-2 spike protein to identify potential new IL-6 inducers, demonstrating the practical application of our work. The proposed pipeline is accurate, interpretable, and scalable for predicting IL-6-inducing peptides, and it can be adapted to other immunological targets. |
en_US |