Guided Learning with Reinforcement Learning for Bias Mitigation in LLMs

dc.contributor.author Shovon, Shazid Nawas
dc.date.accessioned 2026-04-02T06:42:09Z
dc.date.available 2026-04-02T06:42:09Z
dc.date.issued 2025-10-04
dc.identifier.citation CSE en_US
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16545
dc.description Master's thesis en_US
dc.description.abstract The introduction of large language models (LLMs) has drawn global attention to several high-stakes ethical issues, not least that these models amplify and spread the social prejudices embedded in their training data. This work addresses the pressing need for practical methods that reduce biased output and make AI-based generative systems fairer and safer. We present a fine-tuning approach based on Reinforcement Learning from Human Feedback (RLHF) for debiasing a pre-trained causal language model. We first train the base model with a supervised fine-tuning objective on custom data and then apply a multi-step reinforcement learning stage, guiding the final model with a reward signal that penalizes bias so that it generates bias-free language. Both the supervised base model and the final RLHF-tuned model were evaluated extensively with a classification method on a held-out test set. Overall, the final model generates neutral text far more reliably than the base model. Its classification report shows a substantial increase in precision and recall for the "Unbiased" class and a corresponding decrease for the "Biased" class. These results confirm that our RLHF-based fine-tuning can effectively mitigate harmful bias in practice, and they indicate a scalable, robust method for building fair and responsible AI. This work contributes to the growing literature on producing robust and ethical generative models for large-scale public use. en_US
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject LLMs en_US
dc.subject Bias Mitigation en_US
dc.subject Reinforcement Learning en_US
dc.title Guided Learning with Reinforcement Learning for Bias Mitigation in LLMs en_US
dc.type Thesis en_US
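
The abstract describes shaping the RL stage with a reward signal that penalizes biased generations. The following is a minimal sketch of that idea only; the bias classifier, the term lexicon, and the reward values are illustrative assumptions, not the components actually used in the thesis (where the classifier and RLHF loop operate on a trained model).

```python
# Sketch of a bias-penalizing reward signal for RLHF-style fine-tuning.
# Everything below is a toy stand-in: a real pipeline would use a learned
# bias classifier and feed the scalar reward into a policy-optimization
# step (e.g., PPO) that updates the language model.

BIASED_TERMS = {"always", "never", "all"}  # stand-in lexicon, not a real classifier


def bias_classifier(text: str) -> bool:
    """Toy bias detector: flags absolute generalizations by keyword."""
    return any(term in text.lower().split() for term in BIASED_TERMS)


def reward(text: str, penalty: float = -1.0, bonus: float = 1.0) -> float:
    """Scalar reward: penalize text flagged as biased, reward neutral text."""
    return penalty if bias_classifier(text) else bonus


# In an RLHF loop this reward would weight the policy update, pushing the
# model toward completions the classifier labels "Unbiased".
samples = [
    "Group X members are always worse at this task.",
    "Performance on this task varies by individual.",
]
print([reward(s) for s in samples])  # the flagged sample receives the penalty
```

The design point is simply that the reward is computed from the generated text alone, so any bias detector, lexicon-based or learned, can be swapped in without changing the RL machinery.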

