SARS-CoV-2 Spike Analysis for IL-6 Inducing Peptide Prediction Using Machine Learning

Akter, Mahmuda

dc.contributor.author	Akter, Mahmuda
dc.date.accessioned	2026-05-07T05:52:05Z
dc.date.available	2026-05-07T05:52:05Z
dc.date.issued	2025-09-20
dc.identifier.citation	SWT	en_US
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17151
dc.description	Thesis Report	en_US
dc.description.abstract	Interleukin-6 (IL-6) is a versatile cytokine that plays a key role in regulating the immune system, managing inflammation, and contributing to the development of diseases such as COVID-19. Identifying peptides that can induce IL-6 is essential for advancing immunotherapy and drug development, but traditional laboratory screening of such peptides is expensive and time-consuming. This study introduces a machine learning approach for predicting IL-6 inducing peptides from biologically relevant features extracted with the ProPy3 Python library. For each peptide we computed amino acid composition (AAC), dipeptide composition (DPC), and various physicochemical properties, for a total of 435 descriptors. Our dataset included over 113,000 peptides, of which only 369 were identified as IL-6 inducers, a severe class imbalance that we addressed with the Synthetic Minority Over-sampling Technique (SMOTE). We trained and evaluated three models: Random Forest, Support Vector Machine, and XGBoost. XGBoost performed best, achieving an AUC of 0.95. To interpret the predictions, we used SHAP (SHapley Additive exPlanations) analysis, which identified the key features that drive IL-6 induction. Finally, we applied our trained models to peptides from the SARS-CoV-2 spike protein to identify potential new IL-6 inducers, demonstrating the practical application of our work. The proposed pipeline is accurate, interpretable, and scalable for predicting IL-6 inducing peptides, and it can be adapted to other immunological targets.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	SARS-CoV-2 Spike Protein Analysis	en_US
dc.subject	IL-6 Inducing Peptide Prediction	en_US
dc.subject	Bioinformatics Machine Learning	en_US
dc.subject	Immunoinformatics Modeling	en_US
dc.title	SARS-CoV-2 Spike Analysis for IL-6 Inducing Peptide Prediction Using Machine Learning	en_US
dc.type	Thesis	en_US
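
The composition features named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis code and does not call ProPy3. The function names, the use of fractions rather than percentages, and the window length are assumptions made for illustration only. AAC contributes 20 values and DPC contributes 400, so the remaining descriptors of the 435 reported would come from the physicochemical properties, which are not reproduced here.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids ("AA", "AC", ..., "YY").
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(peptide):
    """Amino acid composition: fraction of each of the 20 residues."""
    if len(peptide) < 2:
        raise ValueError("peptide must contain at least two residues")
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def dpc(peptide):
    """Dipeptide composition: fraction of each of the 400 adjacent pairs."""
    if len(peptide) < 2:
        raise ValueError("peptide must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(peptide) - 1  # number of adjacent residue pairs
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def featurize(peptide):
    """Concatenate AAC (20 values) and DPC (400 values)."""
    return aac(peptide) + dpc(peptide)


def sliding_windows(sequence, length):
    """Split a protein sequence into overlapping peptides of fixed length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    # Toy string for illustration; not a real spike protein fragment.
    toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
    peptides = sliding_windows(toy_sequence, 15)  # window length is assumed
    features = [featurize(p) for p in peptides]
    print(len(peptides), len(features[0]))
```

In a full pipeline, feature vectors like these would be passed to the trained classifier to score each candidate peptide. The oversampling step mentioned in the abstract would normally be applied to the training data only, so that the evaluation reflects the original class balance.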