Abstract:
This study presents a comprehensive deep learning-based framework for multiclass plant disease classification using high-resolution leaf images, with a
particular focus on evaluating the performance of convolutional neural networks
(CNNs) and a hybrid ResNet + Vision Transformer (ViT) architecture. A curated
dataset comprising 15,200 training and 3,800 validation images spanning 38
classes across multiple crops, including tomato, apple, grape, corn, potato,
strawberry, peach, pepper, orange, blueberry, raspberry, soybean, and squash, was
subjected to preprocessing steps such as resizing, normalization, and data
augmentation to enhance model robustness. Multiple CNN architectures—
including ResNet-50, MobileNetV2, and EfficientNet-B0—were trained and
compared against the hybrid ResNet + ViT model. All models were fine-tuned using
the AdamW optimizer and cross-entropy loss, with early stopping applied to
prevent overfitting and improve generalization. Additionally, interpretability
techniques including Grad-CAM and saliency maps were employed to visualize
disease-relevant regions, while segmentation-based analysis was performed to
localize affected areas on leaves. Among all architectures evaluated, ResNet-50
achieved the highest validation accuracy, 98.74%, while the hybrid ResNet + ViT
model recorded a competitive 98.58%, indicating that hybrid architectures can
effectively capture both local and global features. The
experimental results highlight the potential of hybrid transformer-based and
lightweight CNN models to deliver accurate, interpretable, and computationally
efficient solutions for automated multiclass plant disease detection, offering
valuable support for precision agriculture and crop management practices.
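The early-stopping criterion mentioned in the abstract can be sketched as a simple patience-based check on validation loss. The sketch below is illustrative only, assuming a plain patience counter (the `patience` and `min_delta` values are arbitrary, not taken from the paper):

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience        # epochs to wait without improvement
        self.min_delta = min_delta      # minimum change that counts as improvement
        self.best = float("inf")        # best validation loss seen so far
        self.bad_epochs = 0             # consecutive epochs without improvement
        self.should_stop = False

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss        # improvement: record it and reset counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
            if self.bad_epochs >= self.patience:
                self.should_stop = True
        return self.should_stop


# Hypothetical validation-loss curve: plateaus after epoch 2,
# so training halts once the patience budget is exhausted.
stopper = EarlyStopping(patience=3)
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In a full training loop (e.g. with AdamW and cross-entropy loss as in the study), `step` would be called once per epoch after computing validation loss, and the best model checkpoint saved whenever `bad_epochs` resets.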