Segmentation Variability in Bayesian U-Net versus Manual Annotations: Impact on Radiomic Reproducibility in Lung Tumor CT Images

Damiano, Rossella; Merli, Alessandro; Lanzarone, Ettore; Scalco, Elisa

doi:10.1109/embc58623.2025.11253060

Radiomic analysis is highly sensitive to variations in Region of Interest (ROI) segmentation. Because of their high accuracy, automatic segmentation methods based on Deep Learning (DL) can enhance radiomic reproducibility. Moreover, incorporating uncertainty quantification into these approaches can increase the trustworthiness of DL predictions, particularly when the estimated uncertainty closely reflects the variability observed among expert annotations. This study aims to evaluate whether the uncertainty quantified by DL-based segmentation aligns with inter-expert variability, and to identify the configuration that maximizes radiomic feature reproducibility. To this end, the Monte Carlo Dropout (MCD) approach was integrated into a U-Net model to segment lung tumors on CT images from two publicly available datasets. Tumor masks manually delineated by multiple experts were compared with masks obtained from MCD-based inference at various confidence levels. Radiomic features were extracted from each segmentation, and reproducibility was assessed across combinations of confidence thresholds. The results indicate that the MCD approach can produce segmentations that partially reflect the variability observed in expert annotations, particularly at lower confidence thresholds. In addition, radiomic features remained highly sensitive to segmentation variability, with only about half of them achieving reproducibility under the best conditions.

Clinical relevance: This study supports the introduction and adoption of DL segmentation approaches and radiomic analysis in clinical practice by increasing trust in their predictions compared with manual delineation.
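The abstract does not give implementation details, so what follows is only a minimal sketch, in Python with NumPy, of the general idea behind MCD inference and confidence thresholding. It makes several assumptions that are not from the study. A toy per-voxel logistic model stands in for the U-Net. Dropout zeroes whole feature channels, which resembles channel-wise dropout before a final 1x1 convolution. The "confidence level" is taken to be the fraction of stochastic passes that label a voxel as tumor, and the Dice coefficient is used as an overlap measure. All function names (`mc_dropout_samples`, `masks_at_confidence`, `dice`) are hypothetical.

```python
import numpy as np


def mc_dropout_samples(x, w, p_drop=0.5, n_samples=50, seed=None):
    """Run n_samples stochastic forward passes with dropout kept ON at inference.

    Hypothetical toy stand-in for a U-Net: x has shape (H, W, C) (per-voxel
    feature channels), w has shape (C,); each voxel's tumor probability is a
    logistic function of its channels.
    """
    rng = np.random.default_rng(seed)
    n_channels = x.shape[-1]
    probs = np.empty((n_samples,) + x.shape[:-1])
    for t in range(n_samples):
        # Drop each feature channel with probability p_drop (same mask for all voxels).
        keep = rng.random(n_channels) >= p_drop
        x_t = x * keep / (1.0 - p_drop)          # inverted-dropout rescaling
        logits = x_t @ w                          # contracts channels -> (H, W)
        probs[t] = 1.0 / (1.0 + np.exp(-logits))
    return probs


def masks_at_confidence(probs, thresholds, p_fg=0.5):
    """Binarize each pass at p_fg, then keep voxels labeled tumor in >= c of passes.

    Assumed definition of "confidence level": lower c gives larger masks,
    higher c gives smaller ones.
    """
    freq = (probs >= p_fg).mean(axis=0)           # per-voxel agreement in [0, 1]
    return {c: freq >= c for c in thresholds}


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * inter / total


# Usage on synthetic data (no real CT images involved).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 8))
w = rng.normal(size=8)
probs = mc_dropout_samples(x, w, n_samples=30, seed=1)
masks = masks_at_confidence(probs, [0.1, 0.5, 0.9])
for c, m in masks.items():
    print(f"confidence {c}: {int(m.sum())} voxels, Dice vs 0.5 mask = {dice(m, masks[0.5]):.3f}")
```

Under this definition, masks at higher confidence levels are always contained in masks at lower ones, so the threshold sweeps from a restrictive to a permissive delineation. This loosely mirrors the spread one might see between conservative and generous expert annotators. In a real pipeline, radiomic features would then be extracted from each mask and their agreement compared across thresholds.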