Abstract:
Classification of medical images is essential to the diagnosis of many diseases. This
paper presents an analysis of multimodal deep learning fusion methods for medical image
classification. To improve the accuracy and robustness of classification, this work
investigates three pre-trained deep learning architectures (VGG19, ResNet50, and
InceptionV3). By fine-tuning these models
with a dataset of medical images that includes categories such as lung and colon
conditions, the research takes advantage of transfer learning. The implementation, written
in TensorFlow, combines image data generators with model ensembling. The fusion strategy
is a conventional ensemble that combines the predictions of the individual models. The
ensemble model performed
well across the classes, achieving an accuracy of approximately 97.43% on the validation
set. The models were trained with the Adam optimizer to minimize categorical cross-entropy
loss. To reduce overfitting and improve generalization, training relied on strategies
such as data augmentation, dropout, batch normalization, and early stopping. Confusion
matrix analysis further showed that the model separates the classes correctly, with
high true positive rates and low false positive and false negative rates across all
classes. The final ensemble model, stored in HDF5 format, provides a solid foundation for
accurate image classification within the dataset.
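
As an illustration of the fusion step, the following minimal sketch averages the class
probabilities produced by several models and selects the class with the highest fused
score. The abstract specifies only a "conventional ensemble approach", so probability
averaging (soft voting) is an assumption here; the function names and the probability
values are hypothetical and do not come from the study.

```python
from typing import List, Sequence


def average_probabilities(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of several models (soft voting)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must predict the same number of classes")
    # Mean probability per class across the models.
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_class(per_model_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    fused = average_probabilities(per_model_probs)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one image from three fine-tuned models.
vgg19_probs = [0.70, 0.20, 0.10]
resnet50_probs = [0.40, 0.50, 0.10]
inceptionv3_probs = [0.60, 0.30, 0.10]

all_probs = [vgg19_probs, resnet50_probs, inceptionv3_probs]
fused = average_probabilities(all_probs)
predicted = predict_class(all_probs)
```

In this example one model favors the second class, but the averaged scores still favor
the first, which shows how an ensemble of this kind can absorb a disagreement from a
single model.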