Abstract:
Recent rapid advances in deep learning have shown promise in many applications, with
medical image analysis among the prime focus areas. This study compares the effectiveness
of traditional convolutional neural networks (CNNs), namely VGG19, ResNet50, and a
custom-designed CNN architecture, with the more recent Vision Transformers (ViTs) for
classifying chest radiological images of infectious respiratory conditions. Through
extensive exploratory data analysis (EDA), we
found that medical imaging datasets pose inherent challenges and complexities.
To address these challenges, we used tailored image pre-processing methods,
emphasizing the importance of zoom and noise reduction in enhancing model efficacy.
Our findings demonstrate the robustness and adaptability of CNNs, with
VGG19, ResNet50, and the custom CNN outperforming the Vision Transformer on various
performance metrics. Beyond accuracy, we also stress the importance of model
interpretability. By applying gradient-based visualization and attention-map
methods, we sought to shed light on the "black box" nature of deep learning models
and potentially open new perspectives for enhancing cooperation between AI systems and
healthcare professionals. This research underlines both the potential and challenges of
AI in medical imaging and forms a foundation for further studies that combine
technological innovation with clinical expertise.