| dc.description.abstract |
Early and accurate identification of Non-Small Cell Lung Cancer (NSCLC) subtypes is important because distinguishing Adenocarcinoma (ADC) from Squamous Cell Carcinoma (SCC) supports personalized, targeted treatment. Conventional biopsy-based diagnosis is invasive and often time-consuming, which motivates reliable, non-invasive computational approaches based on Computed Tomography (CT) imaging. Although deep learning models have shown promise in this domain, their clinical adoption remains limited by scarce data, severe class imbalance, and poor interpretability. This thesis addresses these limitations by proposing an interpretable multimodal feature fusion pipeline. The framework begins with a three-layer Region of Interest (ROI) imputation strategy designed to compensate for the absence of explicit tumor boundary annotations, yielding a unified nodule-level dataset covering 134 unique patients. From this dataset, three complementary feature streams are extracted and fused: ROI-imputed deep CNN embeddings, handcrafted radiomics features, and preprocessed clinical metadata. The fused representations are classified using a Stacking Ensemble Meta-Model with a Level-1 Logistic Regression classifier. The experimental results support the effectiveness of the proposed approach. The initial ROI imputation stage raised the safety-critical SCC recall of the image-based model from 0.17 to 0.50. The final multimodal ensemble achieved a Macro F1-score of 0.7363 and an SCC recall of 0.667, indicating balanced diagnostic performance. 
Furthermore, explainability analysis using SHapley Additive exPlanations (SHAP) values supported the central hypothesis of this work: the model’s balanced predictive performance arises from the integration of complementary, multi-domain features. Radiomic Maximum Density emerged as the most influential objective feature, complementing the abstract deep learning signals represented by CNN probability scores. Finally, a multi-perspective Explainable AI (XAI) protocol showed that the model’s decision-making aligns closely with established pathological knowledge: intra-tumoral texture was the most influential feature for ADC classification, while peripheral invasion and pleural margin characteristics dominated for SCC. Overall, this study presents a coherent, interpretable, and clinically aligned decision-support system designed with translational readiness for real-world clinical application. |
en_US |