Abstract:
Crop diseases pose serious risks to global food security and to farmers'
livelihoods, particularly in developing nations such as Bangladesh, where maize,
tomato, and onion are staple crops. Conventional manual detection is often
inaccessible, slow, and error-prone, which highlights the need for scalable
AI-based solutions. This work proposes a deep learning approach to multi-crop
disease detection using Convolutional Neural Networks (CNNs) and Vision
Transformers (ViTs). Several baseline CNN models were compared with
a custom ViT architecture, with performance measured by accuracy, precision,
recall, and F1-score. In the experiments, the ViT outperformed the CNN
baselines, reaching per-dataset accuracies of 98% on tomato, 96% on onion, and
97% on maize. For multi-crop classification on the pooled datasets, the ViT
achieved a higher overall accuracy of 98.7%, indicating good generalization
across crops. To improve
interpretability, pseudo-segmentation methods were used to visualize the
specific disease-affected regions highlighted by the model. In addition, a
working web application was developed that identifies diseases promptly from a
user-uploaded leaf image, intended as a practical tool for farmers and
agricultural advisors. In the course of
evaluation, Explainable AI (XAI) tools such as LIME and SHAP were applied,
but their integration into deployed systems remains future work.
Overall, the study confirms the value of Vision Transformers for robust,
explainable, and convenient crop disease detection and offers a foundation for
future mobile tools that could help farmers sustain their agriculture.
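
For readers unfamiliar with the four reported metrics, the following is a
minimal sketch of how accuracy, precision, recall, and F1-score can be computed
for a multi-class classifier. It is not the evaluation code used in this work.
Macro-averaging over classes is an assumption, since the abstract does not state
how per-class scores were combined, and the function name and the class labels
in the usage example are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption here) gives every class equal weight,
    regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count per-class true positives, false positives, and false negatives.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # A class with no predictions (or no samples) scores 0 by convention.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical labels, for illustration only.
y_true = ["healthy", "blight", "rust", "blight"]
y_pred = ["healthy", "blight", "blight", "blight"]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example three of four predictions are correct, so accuracy is 0.75,
while the macro-averaged F1-score is lower (0.6) because the "rust" class is
never predicted. This is why accuracy is reported alongside the other three
metrics.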