
Brain Tumor MRI Interpretation: Towards a Benchmark for Medical Visual Question Answering

Minutolo, Aniello; Esposito, Massimo
2025

Abstract

This work presents a new dataset built specifically for Visual Question Answering (VQA) on brain tumor MRI scans. The dataset comprises 750 brain tumor MRI images at a resolution of 512 × 512 pixels, together with expert-annotated natural-language question-answer pairs of two types (What/Which and Yes/No) covering three brain tumor categories (glioma, meningioma, and pituitary). To establish a benchmark for this dataset, we propose a dual-stream VQA framework that uses two transformer-based models to handle image feature extraction, question interpretation, and answer generation. We thoroughly evaluate this baseline model on the dataset; the results reveal the task’s inherent complexity and highlight the difficulty of achieving accurate medical VQA. They also underscore the dataset’s utility for advancing multimodal medical support systems and lay the groundwork for future progress in this domain.
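As a rough illustration of the dataset structure described in the abstract, one annotated sample could be represented as follows. This is only a sketch: the class name, field names, label strings, file name, and example question are all hypothetical and are not taken from the dataset itself. Only the two question types, the three tumor categories, and the image resolution come from the abstract.

```python
from dataclasses import dataclass

# Values stated in the abstract; the identifiers themselves are hypothetical.
TUMOR_CATEGORIES = {"glioma", "meningioma", "pituitary"}
QUESTION_TYPES = {"what_which", "yes_no"}
IMAGE_SIZE = (512, 512)  # pixels


@dataclass(frozen=True)
class VQASample:
    """One image/question/answer triple (assumed layout, not the authors' schema)."""
    image_path: str
    question: str
    answer: str
    question_type: str
    tumor_category: str

    def __post_init__(self):
        # Reject labels outside the sets named in the abstract.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")
        if self.tumor_category not in TUMOR_CATEGORIES:
            raise ValueError(f"unknown tumor category: {self.tumor_category!r}")
        # A Yes/No question can only carry a yes or no answer.
        if self.question_type == "yes_no" and self.answer.lower() not in {"yes", "no"}:
            raise ValueError("a yes/no question needs a 'yes' or 'no' answer")


# Made-up example values, for illustration only.
sample = VQASample(
    image_path="img_001.png",
    question="Is a glioma visible in this scan?",
    answer="Yes",
    question_type="yes_no",
    tumor_category="glioma",
)
```

Validating labels when a sample is constructed keeps malformed annotations out of any later training or evaluation step.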
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR - Sede Secondaria Napoli
ISBN: 9789819688883, 9789819688890
Keywords: Brain tumor MRI; Multimodal medical support systems; Visual Question Answering (VQA)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559566