| dc.description.abstract |
Brain MRI classifiers often struggle to generalize across hospitals because acquisition protocols differ and privacy regulations prevent pooling data for centralized training. Beyond accuracy, clinicians need transparent models that support predictions with anatomically plausible evidence. We develop and evaluate a privacy-preserving, explainable federated learning (FL) framework for classifying brain MRI slices into four categories: glioma_tumor, meningioma_tumor, pituitary_tumor, and no_tumor. Our objective is to align model selection with a quantitative notion of explanation faithfulness as well as generalization performance. We train parameter-efficient CNNs (ShuffleNetV2, RegNetY400, MobileNetV3-Large), deeper CNNs (ResNet-50, DenseNet-121), a compact Custom CNN, and a hybrid model (Swin-T + DenseNet-121 with an MLP head) under synchronous FedAvg, using 10,417 de-identified, single-channel MRI slices distributed across four clients. All backbones share a harmonized preprocessing pipeline and classification head, with inputs resized to 224×224. After each local round, clients compute Grad-CAM++ overlays and a lightweight, deletion-style faithfulness score on a fixed validation subset; they share neither images nor heatmaps with the server, only model weights and scalar summaries (loss, ACC, Macro-F1, and faithfulness mean±std). The server aggregates updates, records round-wise trajectories, selects checkpoints by validation Macro-F1, and breaks ties in favor of higher and more consistent faithfulness. RegNetY400 achieves the best held-out accuracy (Test ACC = 0.9827), followed closely by ShuffleNetV2 and MobileNetV3-Large. The hybrid Swin-T + DenseNet-121 exhibits the smallest cross-client dispersion, indicating particularly stable performance across sites. In terms of explanation quality, ShuffleNetV2 attains the strongest combination of top validation F1 (0.9592 at R20) and deletion-style faithfulness (mean 0.38 at R18), with RegNetY400 ranking second (mean 0.41 at R20). Near their respective best checkpoints, models with higher faithfulness generally also show higher validation F1, suggesting a positive coupling between generalization and explanation quality. Overall, the proposed FL pipeline turns explainability from a purely post-hoc visualization step into a federated training signal, improving accuracy without exposing any raw data. By combining strong backbones with a quantitative, privacy-preserving faithfulness metric, the system yields accurate, interpretable, and auditable models suitable for deployment across many clinical sites. |
en_US |