Abstract:
Lung cancer is a leading cause of cancer death, and diagnosis at an early stage is essential. Although deep learning has brought advances in CT-based diagnosis, existing models are often trained on limited datasets, capture global features poorly, or are too costly to compute: CNNs are too local in their focus, and ViTs require large datasets and resources. To address these gaps, this paper proposes a lightweight, computationally efficient hybrid deep learning network that combines custom residual CNN blocks with a Vision Transformer block. The residual CNN component improves gradient flow and learns detailed spatial representations of lung CT slices, while the ViT component learns global contextual relationships through multi-head self-attention. The model was trained on the IQ-OTH/NCCD dataset, expanded to 15,000 images, to improve robustness and limit overfitting. The proposed architecture reached an accuracy of 0.98, a macro-average precision of 0.98, a recall of 0.98, and an F1-score of 0.98, with class-wise AUC scores of 1.00. With only 2.46 million parameters, the model performs well at a substantially lower computational cost than traditional transformer-based approaches. Interpretability is supported by explainable AI techniques such as Grad-CAM and LIME, which highlight clinically relevant features such as nodule boundaries. Overall, the findings suggest that the proposed hybrid CNN-ViT is a reliable, interpretable, and computationally efficient model for automatic multi-class lung cancer detection.
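
For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy and macro-averaged precision, recall, and F1-score are conventionally derived from a multi-class confusion matrix. The function name `macro_metrics` and the toy three-class matrix are hypothetical and purely illustrative; they are not the paper's evaluation code or its results.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro averaging: unweighted mean over classes, so every class
    # counts equally regardless of how many samples it has.
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy confusion matrix (made-up numbers, NOT the paper's data).
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = macro_metrics(toy_confusion)
print(metrics)
```

Because macro averaging weights each class equally, a high macro score indicates that performance is not driven by the majority class alone.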