Abstract:
Deep learning for color fundus images is often limited by small datasets and device-to-device variation in image quality. In this thesis, I use a nine-class dataset of 4,500 images (500 per class) to examine how image augmentation strength affects multi-class retinal disease prediction. The experiments use four convolutional models: EfficientNet-B4, MobileNetV3-Large, DenseNet-121, and a Custom CNN. Each model is trained under four augmentation settings: no augmentation, mild, strong, and advanced. All runs share the same train–validation–test split and training setup, so performance differences can be attributed mainly to the model and the augmentation level. The results show that the effect of augmentation is model-dependent. The Custom CNN performs best without augmentation, DenseNet-121 reaches its peak with mild augmentation, and EfficientNet-B4 performs best with strong augmentation. MobileNetV3-Large benefits most from heavy augmentation: with the advanced setting it achieves the highest overall performance, with accuracy around 0.851 and macro-F1 around 0.849. These findings suggest that augmentation strength should not be chosen as a single fixed recipe for all backbones; instead, it should be tuned per model when designing deep-learning-based retinal disease classification systems.