
Exploring a Transformer Approach for Pigment Signs Segmentation in Fundus Images

Sangiovanni M;Frucci M;Riccio D;Brancati N
2022

Abstract

Over the past couple of years, Transformers have become increasingly popular within the deep learning community. Initially designed for Natural Language Processing tasks, Transformers were subsequently adapted to the image analysis field. The self-attention mechanism behind Transformers immediately appeared to be a promising, although computationally expensive, learning approach. However, Transformers do not adapt as well to tasks involving large images or small datasets. This propelled the exploration of hybrid CNN-Transformer models, which seemed to overcome those limitations, thus sparking increasing interest in the field of medical imaging as well. Here, a hybrid approach is investigated for Pigment Signs (PS) segmentation in fundus images of patients suffering from Retinitis Pigmentosa, an eye disorder that eventually leads to complete blindness. PS segmentation is a challenging task due to the high variability of their size, shape and color, and to the difficulty of distinguishing between PS and blood vessels, which often overlap and display similar colors. To address those issues, we use the Group Transformer U-Net, a hybrid CNN-Transformer architecture. We investigate the effects on the learning process of using different losses and of appropriate parameter tuning. We compare the resulting performance with that of the classical U-Net architecture. Interestingly, although the results show room for considerable improvement, they do not suggest a clear superiority of the hybrid architecture. This evidence raises several questions, which we address here but which also deserve further investigation, on how and when Transformers are really the best choice for medical imaging tasks.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Retinitis Pigmentosa
Transformers
Segmentation
Deep Learning
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/417489
Citations
  • Scopus 1