
Enhancing Early Detection of Alzheimer’s Disease via Vision Transformer Machine Learning Architecture Using MRI Images

Marco Leo; Pierluigi Carcagnì; Marco Del-Coco
2026

Abstract

Computer-aided diagnosis (CAD) systems based on deep learning have shown significant potential for Alzheimer’s disease (AD) stage classification from Magnetic Resonance Imaging (MRI). Nevertheless, challenges such as class imbalance, small sample sizes, and the presence of multiple slices per subject may lead to biased evaluation and statistically unreliable performance, particularly for minority classes. In this study, a Vision Transformer (ViT)-based framework is proposed for multi-class AD classification using a Kaggle dataset containing 6400 MRI slices across four cognitive stages. A subject-wise data-splitting strategy is employed to prevent information leakage between the training and testing sets, and the statistical unreliability of near-perfect scores in underrepresented classes is critically examined. An ablation study is conducted to assess the contribution of key architectural components, demonstrating the effectiveness of self-attention and patch embedding in capturing discriminative features. Furthermore, attention-based visualization maps are incorporated to highlight brain regions influencing the model’s decisions and to illustrate subtle anatomical differences between MildDemented and VeryMildDemented cases. The proposed approach achieves a test accuracy of 97.98%, outperforming existing methods on the same dataset while providing improved interpretability, thereby supporting early and accurate identification of AD stages.
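The subject-wise splitting described above can be illustrated with a minimal sketch (not the authors' implementation): each MRI slice carries its subject ID as a group label, and scikit-learn's `GroupShuffleSplit` guarantees that no subject contributes slices to both the training and the test set. The subject IDs, features, and labels below are toy placeholders.

```python
# Minimal sketch of subject-wise (group-wise) splitting to avoid leakage:
# slices from one subject must never appear in both train and test sets.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in data: 12 slices from 4 subjects (3 slices each).
subject_ids = np.repeat([0, 1, 2, 3], 3)   # group label per slice
X = np.arange(12).reshape(-1, 1)           # placeholder slice features
y = np.repeat([0, 1, 0, 1], 3)             # class label per slice

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_ids))

# Verify the guarantee: train and test subject sets are disjoint.
assert set(subject_ids[train_idx]).isdisjoint(set(subject_ids[test_idx]))
```

A naive slice-level random split would instead scatter one subject's near-identical slices across both sets, inflating test accuracy — the evaluation bias the abstract warns against.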
Istituto di Scienze Applicate e Sistemi Intelligenti "Eduardo Caianiello" - ISASI - Sede Secondaria Lecce
Keywords: accuracy; Alzheimer’s Disease; classification; MRI; Vision Transformer

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/577141
Warning: the displayed data have not been validated by the institution.
