DSpace Repository

Explainable Multimodal Hybrid Framework for Cervical Cancer Detection via Vision Transformers and LLM-Based Clinical Feature Fusion


dc.contributor.author Alo, Alaya Parven
dc.date.accessioned 2026-03-31T02:35:18Z
dc.date.available 2026-03-31T02:35:18Z
dc.date.issued 2025-09-17
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16525
dc.description Project Report en_US
dc.description.abstract Cervical cancer remains among the leading causes of death in women globally, particularly in regions with limited access to healthcare. Although deep learning has shown promise for computer-aided diagnosis, its lack of interpretability and limited use of heterogeneous data types have hindered clinical acceptance. In this paper, we present an interpretable multimodal hybrid deep learning architecture that combines Vision Transformers (ViT) with clinical metadata via Large Language Model (LLM)-assisted fusion. Visual features are learned from Pap smear images with a ViT, and predictive accuracy is enhanced by incorporating structured clinical inputs such as patient history, HPV status, and cytology scores, regulated through an LLM-based attention mechanism. For feature-imbalance handling and enhanced interpretability, we apply a dual-branch framework in which the image and text streams are joined via a semantic fusion layer that facilitates cross-modal alignment. Transparency is achieved through LIME, saliency maps, and Grad-CAM visualizations, which allow clinicians to trace predictions back to the relevant image regions and metadata attributes. Comprehensive testing on benchmark datasets, including SIPaKMeD and Herlev, shows that our approach achieves higher classification accuracy (99.93%) than single CNNs, ViTs, and classic machine learning models. Furthermore, hybrid models such as ViT-MobileNet and ViT-BERT generalize better and outperform state-of-the-art methods on both cell-level and patient-level classification tasks. Overall, the system enhances cervical cancer screening by providing a transparent, stable, and clinically viable AI solution for early detection. Future directions include incorporating histopathology data and real-time deployment through web-based diagnostic tools. en_US
dc.description.sponsorship Daffodil International University en_US
dc.language.iso en_US en_US
dc.publisher Daffodil International University en_US
dc.subject Cervical Cancer Detection en_US
dc.subject Vision Transformer (ViT) en_US
dc.subject Multimodal Deep Learning en_US
dc.subject Large Language Model (LLM) en_US
dc.title Explainable Multimodal Hybrid Framework for Cervical Cancer Detection via Vision Transformers and LLM-Based Clinical Feature Fusion en_US
dc.type Other en_US

