Abstract:
Agriculture faces significant challenges from plant diseases, and potato production is particularly vulnerable to early and late blight, which cause substantial yield losses and contribute to food insecurity. Conventional diagnostic methods are often slow, costly, and inaccessible to resource-constrained farmers, underscoring the need for affordable, scalable solutions. This study proposes a lightweight Vision Transformer (ViT)-based framework integrated with a mobile application to enable real-time, on-field detection of potato leaf diseases. A curated dataset of 6,954 potato leaf images across three classes (Early Blight, Late Blight, and Healthy) was preprocessed using resizing, augmentation, and normalization to improve robustness. The proposed MobileViT architecture, optimized for efficiency, was trained and benchmarked against state-of-the-art CNN models (EfficientNetV2S/M/L, ConvNeXtBase) under identical experimental settings. The lightweight ViT achieved the best performance, attaining 99.91% validation accuracy and 100% test accuracy with only 1.3M parameters and a 4.99 MB model size, outperforming the larger models in both accuracy and computational efficiency. Confusion matrix analyses showed no misclassifications in any class, and ROC-AUC scores of 1.00 indicated that the model reliably distinguishes visually similar disease symptoms. The trained model was deployed in a mobile application via TensorFlow Lite, allowing farmers to capture or upload leaf images and receive an instant diagnosis offline, which keeps the tool usable in low-connectivity regions. Beyond technical performance, the system offers economic and environmental benefits by reducing crop losses, minimizing pesticide misuse, and supporting sustainable agricultural practices.
This research bridges the gap between advanced deep learning architectures and their practical field deployment, presenting a scalable, cost-effective, and farmer-friendly tool for precision agriculture and contributing to global food security.
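
The parameter count and model size reported above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the weights are stored as 32-bit floats and that the file size is dominated by the weights, neither of which is stated in the abstract. The function name `estimated_size_mib` and the int8 variant are hypothetical and are not part of the reported system.

```python
# Rough consistency check between a parameter count and a model file size.
# Assumption (not stated in the abstract): weights are stored as float32,
# i.e. 4 bytes per parameter, and graph metadata is negligible.

def estimated_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Return the weight storage footprint in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_params * bytes_per_param / (1024 * 1024)


reported_params = 1_300_000  # "1.3M parameters" (a rounded figure)
reported_size_mb = 4.99      # "4.99 MB model size"

# Estimate under the float32 assumption.
float32_estimate = estimated_size_mib(reported_params)

# Hypothetical 8-bit quantized variant, shown only for comparison.
int8_estimate = estimated_size_mib(reported_params, bytes_per_param=1)

print(f"float32 estimate: {float32_estimate:.2f} MiB (reported: {reported_size_mb} MB)")
print(f"int8 estimate:    {int8_estimate:.2f} MiB (hypothetical)")
```

Under these assumptions the float32 estimate lands within a few hundredths of the reported size. The small gap is plausibly explained by rounding in the parameter count and by file-format overhead, although the abstract does not confirm either.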