
3D-Vision-Transformer Stacking Ensemble for Assessing Prostate Cancer Aggressiveness from T2w Images

Pachetti E; Colantonio S
2023

Abstract

Vision transformers (ViTs) are a cutting-edge approach in computer vision and are usually applied to two-dimensional data following a transfer learning approach. In this work, we propose a trained-from-scratch stacking ensemble of 3D vision transformers to assess prostate cancer aggressiveness from T2-weighted images, with the aim of helping radiologists diagnose this disease without performing a biopsy. We trained 18 3D vision transformers on T2-weighted axial acquisitions and combined them into two- and three-model stacking ensembles. We defined two metrics for measuring model prediction confidence, trained all the ensemble combinations with five-fold cross-validation, and evaluated their accuracy, prediction confidence, and calibration. In addition, we optimized the 18 base ViTs and compared the best-performing base and ensemble models by re-training them on 100 bootstrap resamples of the training set and evaluating each model on the hold-out test set. We compared the two resulting metric distributions by calculating the median and the 95% confidence interval and by performing a Wilcoxon signed-rank test. The best-performing 3D-vision-transformer stacking ensemble achieved state-of-the-art results in terms of area under the receiver operating characteristic curve (0.89 [0.61-1]) and exceeded the base model's area under the precision-recall curve by 22% (p < 0.001). However, it proved less confident in classifying the positive class.
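To illustrate the two main ingredients of the method described above — stacking and the paired bootstrap comparison — the following is a minimal Python sketch. It uses a logistic-regression meta-learner over simulated base-model probabilities and resamples test-set predictions, whereas the paper uses 3D ViTs as base learners and re-trains models on bootstrapped training sets; the data, the meta-learner choice, and that simplification are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Stand-ins for the positive-class probabilities emitted by three base models.
# In the paper each base model is a 3D vision transformer trained from scratch
# on T2-weighted volumes; here we simulate their outputs for illustration.
n_train, n_test = 200, 60
y_train = rng.integers(0, 2, n_train)
y_test = rng.integers(0, 2, n_test)
P_train = np.clip(0.6 * y_train[:, None] + rng.normal(0.2, 0.25, (n_train, 3)), 0, 1)
P_test = np.clip(0.6 * y_test[:, None] + rng.normal(0.2, 0.25, (n_test, 3)), 0, 1)

# Stacking: a meta-learner is fit on the base models' predictions and combines
# them into a single ensemble probability per sample.
meta = LogisticRegression().fit(P_train, y_train)
ens_prob = meta.predict_proba(P_test)[:, 1]
base_prob = P_test[:, 0]  # a single base model, for comparison

print("ensemble AUROC:", roc_auc_score(y_test, ens_prob))
print("base     AUROC:", roc_auc_score(y_test, base_prob))

# Paired bootstrap (100 resamples) + Wilcoxon signed-rank test on AUPRC.
auprc_ens, auprc_base = [], []
for _ in range(100):
    idx = rng.integers(0, n_test, n_test)
    if y_test[idx].min() == y_test[idx].max():  # need both classes present
        continue
    auprc_ens.append(average_precision_score(y_test[idx], ens_prob[idx]))
    auprc_base.append(average_precision_score(y_test[idx], base_prob[idx]))

stat, p = wilcoxon(auprc_ens, auprc_base)
print(f"median AUPRC: ensemble={np.median(auprc_ens):.3f}, "
      f"base={np.median(auprc_base):.3f}, Wilcoxon p={p:.4g}")
```

In practice the meta-learner should be fit on out-of-fold base-model predictions, as in the paper's five-fold cross-validation protocol, so that the ensemble weights are not biased by base models that have already seen the stacking data.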
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Vision transformers
Ensemble
Prostate cancer
MRI imaging
Deep learning
Classification


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/457625
Citations
  • Scopus 2