Machine Learning–Based Drug Response Prediction Using Gene Expression Profiles in Cancer Cell Lines

Jeem, Morsaline Ahamed

Faculty of Science and Information Technology → Department of Software Engineering → Project Report

dc.contributor.author	Jeem, Morsaline Ahamed
dc.date.accessioned	2026-04-21T04:53:48Z
dc.date.available	2026-04-21T04:53:48Z
dc.date.issued	2025-12-30
dc.identifier.citation	SWT	en_US
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16964
dc.description	Project Report	en_US
dc.description.abstract	Understanding how cancer cells respond to different drugs is one of the central problems of contemporary precision medicine. With the large volume of gene expression data now available, there is growing potential to apply machine learning to learn more accurate patterns of drug sensitivity. In this thesis, I examine how well basal gene expression profiles predict drug response, measured primarily as IC50 values, using a set of supervised machine learning models. The work uses data from the Genomics of Drug Sensitivity in Cancer (GDSC) database, which provides extensive gene expression profiles for tumor cell lines together with their sensitivity to a large number of drugs. My process involves preprocessing the high-dimensional gene expression data, matching each cell line's profile to its drug response labels, and then training several models (Random Forest, XGBoost, and MLP) to determine which performs best. Along the way, I also assess the effects of feature scaling, data sampling, and hyperparameter tuning to understand how each step influences the final result. The findings indicate that drug response prediction remains highly difficult because of the noise and complexity of the data, but some models consistently outperform others. Specifically, XGBoost and MLP gain modestly in accuracy as the pipeline is refined step by step; however, their overall performance shows how challenging it is to model the relationship between gene expression and drug response directly from such high-dimensional biological data. Despite these limitations, the work is useful in that it provides a reproducible pipeline for gene expression-based drug response prediction, along with insights into which preprocessing and modeling choices have the greatest impact.
This thesis also discusses where existing models fall short and how these strategies could be refined in future work, for example through feature selection, more elaborate neural architectures, or the inclusion of additional omics data. Overall, the project offers a practical, hands-on demonstration of machine learning for drug response prediction and illustrates the real difficulties researchers face when working with biological data.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Machine learning in bioinformatics	en_US
dc.subject	Drug response prediction	en_US
dc.subject	Gene expression analysis	en_US
dc.subject	Cancer cell line modeling	en_US
dc.title	Machine Learning–Based Drug Response Prediction Using Gene Expression Profiles in Cancer Cell Lines	en_US
dc.type	Working Paper	en_US
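The abstract mentions matching gene expression profiles with drug response labels. As a minimal sketch of that step, assuming a hypothetical in-memory layout (expression profiles keyed by cell-line ID, plus a list of (cell line, drug, IC50) records; the real GDSC files use their own formats and column names), the join for a single drug reduces to an inner join on cell-line ID. The log transform of IC50 is an illustrative assumption, not a description of the thesis's exact preprocessing, and all names below are hypothetical.

```python
import math


def align_expression_with_response(expression, responses, drug):
    """Return (cell_ids, X, y) for one drug (hypothetical helper).

    expression: dict mapping cell-line ID -> list of gene expression values
    responses:  list of (cell_id, drug_name, ic50) tuples
    Only cell lines present in both tables are kept (an inner join).
    IC50 is log-transformed here as an assumed preprocessing choice,
    because raw IC50 values can span several orders of magnitude.
    """
    labels = {}
    for cell_id, drug_name, ic50 in responses:
        # Skip other drugs and non-positive IC50 values (log undefined).
        # If a (cell line, drug) pair repeats, the last record wins.
        if drug_name == drug and ic50 > 0:
            labels[cell_id] = math.log(ic50)
    cell_ids = sorted(set(expression) & set(labels))
    X = [expression[c] for c in cell_ids]
    y = [labels[c] for c in cell_ids]
    return cell_ids, X, y


# Toy data with hypothetical cell-line and drug names
expression = {
    "CL_A": [5.1, 0.3, 7.8],
    "CL_B": [4.9, 1.2, 6.5],
    "CL_C": [6.0, 0.8, 7.1],
}
responses = [
    ("CL_A", "DrugX", 2.0),
    ("CL_B", "DrugX", 0.5),
    ("CL_D", "DrugX", 1.0),  # no expression profile, so it is dropped
    ("CL_C", "DrugY", 3.0),  # different drug, so it is ignored
]
cell_ids, X, y = align_expression_with_response(expression, responses, "DrugX")
print(cell_ids)  # ['CL_A', 'CL_B']
```

Sorting the shared IDs keeps the row order of `X` and `y` deterministic, which makes later train/test splits reproducible.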
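The model comparison described in the abstract can be sketched with scikit-learn, assuming synthetic data in place of GDSC (so nothing printed here reflects the thesis's findings). The sketch compares a Random Forest with an MLP under 5-fold cross-validation and places `StandardScaler` inside a pipeline so that feature scaling is learned from the training folds only. XGBoost is omitted to keep the example dependency-light; `xgboost.XGBRegressor` follows the scikit-learn estimator interface and could be added to the `models` dictionary if that package is installed. All hyperparameter values are placeholders, not the ones used in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_cells, n_genes = 120, 200
# Synthetic stand-in for an expression matrix (rows = cell lines, cols = genes)
X = rng.normal(size=(n_cells, n_genes))
# Synthetic response that depends on two "genes" plus noise
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_cells)

models = {
    # Tree ensembles are insensitive to feature scaling, so no scaler here
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    # The scaler sits inside the pipeline, so it is refit on each training fold
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), alpha=1e-3,
                     max_iter=2000, random_state=0),
    ),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is returned negated
    rmse = -cross_val_score(model, X, y, cv=cv,
                            scoring="neg_root_mean_squared_error")
    scores[name] = rmse.mean()
    print(f"{name}: RMSE = {rmse.mean():.3f} +/- {rmse.std():.3f}")
```

Using the same `KFold` object for every model means each one is scored on identical splits, so differences in RMSE come from the models rather than from the data partition.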
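Feature selection is listed among the possible future refinements. One simple option, shown here only as an assumption since the thesis does not specify a method, is to keep the k genes with the highest variance across the training cell lines and apply the same column indices to the test set, so that no information from test samples leaks into the selection. The helper names are hypothetical.

```python
from statistics import pvariance


def top_variance_genes(X_train, k):
    """Return sorted column indices of the k highest-variance genes.

    Variance is computed on training rows only (hypothetical helper).
    """
    n_genes = len(X_train[0])
    variances = [pvariance([row[j] for row in X_train]) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(X, idx):
    """Keep only the chosen gene columns in every row."""
    return [[row[j] for j in idx] for row in X]


# Toy training matrix: 3 cell lines x 4 genes
X_train = [
    [1.0, 5.0, 2.0, 0.0],
    [1.0, 9.0, 2.1, 3.0],
    [1.0, 1.0, 1.9, 9.0],
]
X_test = [[1.0, 4.0, 2.0, 5.0]]

idx = top_variance_genes(X_train, k=2)
print(idx)  # [1, 3]
X_test_reduced = select_columns(X_test, idx)  # same columns as training
```

Variance filtering ignores the response entirely, which makes it cheap and leakage-safe but may discard genes that are weakly variable yet predictive; supervised selectors are a common alternative.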