Bias Reduction in ICU Mortality Prediction Through Targeted Synthetic Data Generation

Soyad, Tahedi

dc.contributor.author	Soyad, Tahedi
dc.date.accessioned	2026-05-03T09:25:09Z
dc.date.available	2026-05-03T09:25:09Z
dc.date.issued	2025-09-14
dc.identifier.citation	SWT	en_US
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17124
dc.description	Thesis Report	en_US
dc.description.abstract	Accurate prediction of patient mortality in the Intensive Care Unit (ICU) is critical for guiding clinical decisions and allocating scarce resources, yet machine learning models trained on electronic health records (EHRs) often inherit demographic and outcome biases that lead to unfair predictions for vulnerable subgroups. This thesis investigates whether targeted synthetic data augmentation using the Synthetic Minority Over-sampling Technique (SMOTE) can mitigate these biases without compromising overall model performance. We assemble a multicenter cohort of 4,177 ICU stays from the eICU Collaborative Research Database, incorporating patient demographics (age, gender, ethnicity), aggregated vital signs (mean, minimum, and maximum during the first 24 hours), and severity scores. Four classifiers (logistic regression, random forest, gradient boosting, and support vector machine) are trained and tested on an 80/20 stratified split of the raw data to establish baseline performance. The baselines show high overall accuracy (random forest: 97.6% accuracy, F1 0.836, ROC-AUC 0.942) alongside severe inequities in small subgroups (e.g., an F1 of 0.00 for Hispanic patients under logistic regression). We then apply focused SMOTE augmentation to small demographic subgroups and to the minority mortality class, injecting synthetic data only for small groups in which both outcomes are present. Fine-tuning on the augmented training set preserves overall accuracy (random forest: 97.2%) while improving fairness: subgroup F1 scores reach 1.00 for a previously disadvantaged ethnic group, with an F1 delta of 0.84 for elderly patients; logistic regression and SVM also benefit from a comparable improvement in fairness. We demonstrate that the proposed synthetic augmentation procedure can substantially improve fairness in ICU mortality prediction, particularly for tree-based models, without degrading overall performance. We describe a SMOTE integration pipeline and offer practical advice for clinical machine learning pipelines that emphasize data-level interventions accompanied by rigorous fairness evaluation, to encourage equitable AI in the ICU.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	Clinical Decision Support Systems	en_US
dc.subject	ICU Mortality	en_US
dc.subject	Bias Mitigation in Machine Learning	en_US
dc.subject	Synthetic Data Generation	en_US
dc.title	Bias Reduction in ICU Mortality Prediction Through Targeted Synthetic Data Generation	en_US
dc.type	Technical Report	en_US
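
Illustrative sketch (not part of the record, and not the thesis code). The abstract describes injecting synthetic data only into small subgroups in which both outcomes are present. The Python below is a minimal, simplified SMOTE-style interpolation under stated assumptions: the function names `smote_like` and `targeted_augment`, the `max_group_size` threshold, and the `(features, group, label)` row layout are all hypothetical and chosen for illustration only. The thesis itself may use a library implementation of SMOTE with different settings.

```python
import math
import random
from collections import defaultdict


def smote_like(samples, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a real sample
    and one of its k nearest neighbours (simplified SMOTE-style step)."""
    rng = rng or random.Random(0)
    if len(samples) < 2:
        return []  # interpolation needs at least two real points
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(samples))
        base = samples[base_idx]
        # Rank the remaining samples by Euclidean distance to the base point.
        others = [s for i, s in enumerate(samples) if i != base_idx]
        others.sort(key=lambda s: math.dist(s, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def targeted_augment(rows, max_group_size=50, k=3, seed=0):
    """rows: list of (features, group, label) tuples.

    Returns the original rows plus synthetic rows, added only for subgroups
    that are small (assumed threshold: max_group_size) and contain both
    outcomes. Within such a subgroup, the rarer outcome is oversampled up to
    the size of the more common one.
    """
    rng = random.Random(seed)
    by_group = defaultdict(lambda: defaultdict(list))
    for x, g, y in rows:
        by_group[g][y].append(tuple(x))

    augmented = list(rows)
    for g, by_label in by_group.items():
        size = sum(len(xs) for xs in by_label.values())
        if size > max_group_size or len(by_label) < 2:
            continue  # skip large groups and groups with a single outcome
        target = max(len(xs) for xs in by_label.values())
        for y, xs in by_label.items():
            for x in smote_like(xs, target - len(xs), k, rng):
                augmented.append((x, g, y))
    return augmented
```

In a setup like the one the abstract describes, augmentation of this kind would be applied to the training portion of the 80/20 split only, so that subgroup F1 scores are still measured on real, unseen ICU stays.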