Bias Reduction in ICU Mortality Prediction Through Targeted Synthetic Data Generation

dc.contributor.author Soyad, Tahedi
dc.date.accessioned 2026-05-03T09:25:09Z
dc.date.available 2026-05-03T09:25:09Z
dc.date.issued 2025-09-14
dc.identifier.citation SWT en_US
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17124
dc.description Thesis Report en_US
dc.description.abstract Accurate prediction of patient mortality in the Intensive Care Unit (ICU) is critical for guiding clinical decisions and allocating scarce resources, yet machine learning models trained on electronic health records (EHRs) often inherit demographic and outcome biases that lead to unfair predictions for vulnerable subgroups. This thesis investigates whether targeted synthetic data augmentation using the Synthetic Minority Over-sampling Technique (SMOTE) can mitigate these biases without compromising overall model performance. We assemble a multicenter cohort of 4,177 ICU stays from the eICU Collaborative Research Database, incorporating patient demographics (age, gender, ethnicity), aggregated vital signs (mean, minimum, and maximum over the first 24 hours), and severity scores. Four classifiers (logistic regression, random forest, gradient boosting, and support vector machine) are trained and tested on an 80/20 stratified split of the raw data to establish baseline performance. The baselines achieve high overall accuracy (random forest: 97.6% accuracy, F1 0.836, ROC-AUC 0.942) but show severe inequities in small subgroups (e.g., an F1 of 0.00 for Hispanic patients under logistic regression). We then apply focused SMOTE augmentation to small demographic subgroups and to the minority (mortality) class, generating synthetic samples only for subgroups that are small but contain both outcomes. Retraining on the augmented training set preserves overall accuracy (random forest: 97.2%) while improving fairness: subgroup F1 reaches 1.00 for a previously disadvantaged ethnic group, and F1 for elderly patients improves by 0.84. Logistic regression and SVM also show similar fairness improvements. We demonstrate that the proposed synthetic augmentation procedure can substantially improve fairness in ICU mortality prediction, particularly for tree-based models, without degrading overall performance.
We describe the SMOTE integration pipeline and offer practical guidance for clinical machine learning pipelines that emphasize data-level interventions accompanied by rigorous fairness evaluation, with the aim of encouraging equitable AI in the ICU. en_US
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Clinical Decision Support Systems en_US
dc.subject ICU Mortality en_US
dc.subject Bias Mitigation in Machine Learning en_US
dc.subject Synthetic Data Generation en_US
dc.title Bias Reduction in ICU Mortality Prediction Through Targeted Synthetic Data Generation en_US
dc.type Technical Report en_US
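The abstract describes the targeted augmentation only in outline. The Python sketch below illustrates one way such a procedure could work; it is not the thesis's implementation. Assumptions (all hypothetical): each ICU stay carries a single subgroup label, a subgroup counts as "small" when it has at most `max_group_size` stays, SMOTE interpolates only numeric feature vectors within a subgroup, and augmentation balances the two outcomes inside each qualifying subgroup. The function names `smote`, `targeted_smote`, `f1` and `subgroup_f1` are illustrative, and the code uses only the standard library; a production pipeline would more likely use the `SMOTE` class from imbalanced-learn. Augmentation should be applied to the training split only, never to the held-out test set.

```python
import math
import random
from collections import defaultdict


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style interpolation: each synthetic vector lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2 or n_new <= 0:
        return []  # interpolation needs at least two minority samples
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean)
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(x, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def targeted_smote(X, y, groups, max_group_size=50, k=5, seed=0):
    """Hypothetical targeted augmentation: oversample the minority outcome only inside
    subgroups that are small (<= max_group_size) AND contain both outcomes."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for xi, yi, gi in zip(X, y, groups):
        by_group[gi].append((xi, yi))
    X_aug, y_aug, g_aug = list(X), list(y), list(groups)
    for g, rows in by_group.items():
        n_pos = sum(yi for _, yi in rows)
        n_neg = len(rows) - n_pos
        if len(rows) > max_group_size or n_pos == 0 or n_neg == 0:
            continue  # skip large subgroups and subgroups with a single outcome
        minority_label = 1 if n_pos < n_neg else 0
        minority = [xi for xi, yi in rows if yi == minority_label]
        # generate enough synthetic stays to balance the outcomes within the subgroup
        new = smote(minority, abs(n_neg - n_pos), k=k, rng=rng)
        X_aug += new
        y_aug += [minority_label] * len(new)
        g_aug += [g] * len(new)  # synthetic stays inherit the subgroup label
    return X_aug, y_aug, g_aug


def f1(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def subgroup_f1(y_true, y_pred, groups):
    """Per-subgroup F1, the kind of fairness breakdown reported in the abstract."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: f1(t, p) for g, (t, p) in by_group.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: a large subgroup "A" (100 stays) and a small subgroup "B" (3 deaths in 20 stays)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
    groups = ["A"] * 100 + ["B"] * 20
    y = [1 if i % 10 == 0 else 0 for i in range(100)] + [1, 1, 1] + [0] * 17
    X_aug, y_aug, g_aug = targeted_smote(X, y, groups, max_group_size=50)
    print(len(X_aug))  # 134: subgroup B gains 14 synthetic deaths; A is left untouched
```

Restricting oversampling to small subgroups that already contain both outcomes mirrors the condition stated in the abstract: SMOTE needs real minority examples to interpolate between, and large subgroups are left alone so the overall class distribution shifts as little as possible.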
