Guided Learning with Reinforcement Learning for Bias Mitigation in LLMs

dc.contributor.author Shovon, Shazid Nawas
dc.date.accessioned 2026-04-02T06:42:09Z
dc.date.available 2026-04-02T06:42:09Z
dc.date.issued 2025-10-04
dc.identifier.citation CSE en_US
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16545
dc.description Master's thesis en_US
dc.description.abstract The introduction of large language models (LLMs) has drawn global attention to several high-stakes ethical issues, not least that these models amplify and spread the social prejudices embedded in their training data. This work addresses the pressing need for practical methods that reduce biased output and make AI-based generative systems fairer and safer. We present a fine-tuning approach based on Reinforcement Learning from Human Feedback (RLHF) for debiasing a pre-trained causal language model. We first train the base model with a supervised fine-tuning objective on custom data and then apply a multi-step reinforcement learning stage, guiding the final model with a reward signal that penalizes bias so that it generates bias-free language. Both the supervised base model and the final RLHF-tuned model were evaluated extensively with a classification method on a held-out test set. Overall, the final model generates neutral text far more reliably than the base model. Its classification report shows a substantial increase in precision and recall for the "Unbiased" class and a corresponding decrease for the "Biased" class. These results confirm that our RLHF-based fine-tuning can effectively mitigate harmful bias in practice, and they indicate a scalable, robust method for building fair and responsible AI. This work contributes to the growing literature on producing robust and ethical generative models for large-scale public use. en_US
dc.description.sponsorship DIU en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject LLMs en_US
dc.subject Bias Mitigation en_US
dc.subject Reinforcement Learning en_US
dc.title Guided Learning with Reinforcement Learning for Bias Mitigation in LLMs en_US
dc.type Thesis en_US
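
The abstract describes shaping the RL stage with a reward signal that penalizes biased generations. The following is a minimal sketch of that idea only; the bias classifier, the term lexicon, and the reward values are illustrative assumptions, not the components actually used in the thesis (where the classifier and RLHF loop operate on a trained model).

```python
# Sketch of a bias-penalizing reward signal for RLHF-style fine-tuning.
# Everything below is a toy stand-in: a real pipeline would use a learned
# bias classifier and feed the scalar reward into a policy-optimization
# step (e.g., PPO) that updates the language model.

BIASED_TERMS = {"always", "never", "all"}  # stand-in lexicon, not a real classifier


def bias_classifier(text: str) -> bool:
    """Toy bias detector: flags absolute generalizations by keyword."""
    return any(term in text.lower().split() for term in BIASED_TERMS)


def reward(text: str, penalty: float = -1.0, bonus: float = 1.0) -> float:
    """Scalar reward: penalize text flagged as biased, reward neutral text."""
    return penalty if bias_classifier(text) else bonus


# In an RLHF loop this reward would weight the policy update, pushing the
# model toward completions the classifier labels "Unbiased".
samples = [
    "Group X members are always worse at this task.",
    "Performance on this task varies by individual.",
]
print([reward(s) for s in samples])  # the flagged sample receives the penalty
```

The design point is simply that the reward is computed from the generated text alone, so any bias detector, lexicon-based or learned, can be swapped in without changing the RL machinery.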

