| dc.description.abstract |
Oral cancer remains a significant global health challenge because late diagnosis reduces survival. Diagnosis currently relies on subjective clinical examination and invasive histopathology. In this paper, we assess three deep learning approaches (Convolutional Neural Networks, InceptionV3, and Vision Transformers) on a multi-source collection of clinical photographs and histopathology images drawn from publicly available datasets. The images were resized, normalized, and augmented prior to patient-level splitting to avoid data leakage. Models were evaluated on accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC, and calibration. Vision Transformers performed best, with a test accuracy of 98.8% and a ROC-AUC of 0.99, surpassing both the CNN and InceptionV3 baselines. While these results demonstrate the potential of Vision Transformers for oral cancer screening, the dataset was not truly multi-source, and the findings require further validation. Future work should construct paired multi-source datasets, validate independently across different patient groups, and examine clinical applications to support early noninvasive screening and informed decision-making. |
en_US |
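
The abstract does not describe how the patient-level split was implemented. As a minimal sketch of the general idea, assuming each image comes with a patient identifier, a split can be made over patients rather than over images so that no patient appears in both sets. The function name `patient_level_split`, the record layout, and the file names below are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) records so no patient spans both sets."""
    # Group image paths by patient (assumes a patient ID is available).
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patients, not images, with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Hold out a fraction of patients (at least one) for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p in patient_ids if p not in test_ids
             for img in by_patient[p]]
    test = [(p, img) for p in patient_ids if p in test_ids
            for img in by_patient[p]]
    return train, test


# Hypothetical records: 10 patients with 5 images each.
records = [(f"patient_{i % 10}", f"img_{i}.png") for i in range(50)]
train, test = patient_level_split(records)
```

Splitting over patient identifiers means that every image from a given patient lands in the same partition, which is the property that guards against leakage between training and test data.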