A systematic survey on image description techniques for STEM domains

Buzzi Marina; Galesi Giulio; Leporini Barbara
2026

Abstract

The proliferation of visual data in Science, Technology, Engineering, and Mathematics (STEM) presents accessibility barriers for blind and low-vision (BLV) users. While Artificial Intelligence (AI) can generate alternative descriptions of STEM images, research remains fragmented and practical impact is limited. This systematic survey examines 20 peer-reviewed studies on AI-based STEM visual description, focusing on accessibility and Human–Computer Interaction (HCI). Following PRISMA methodology and a ROBIS-based risk-of-bias assessment, the review analyzes (i) the STEM visuals targeted, (ii) the AI architectures employed, (iii) the datasets and evaluation metrics used, and (iv) the interaction modalities for delivering descriptions. Findings show a shift from static alt-text toward interactive, multimodal systems integrating conversational interfaces, keyboard navigation, audio, and haptic feedback. However, challenges persist, including hallucinations, a shortage of accessibility-first datasets co-designed with BLV users, and overreliance on automatic text-overlap metrics. The survey identifies future HCI priorities: user-controlled verbosity, explainable AI pipelines, and integration of accessible description into mainstream STEM environments.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Istituto di informatica e telematica - IIT
Accessibility
Artificial intelligence
Human-computer interaction
Image description
Systematic survey
Files in this record:
File: Cardia et al_A Systematic Survey on Image Description Techniques for STEM Domains_VOR.pdf
Access: authorized users only
Description: A Systematic Survey on Image Description Techniques for STEM Domains
Type: Publisher's version (PDF)
License: NOT PUBLIC - private/restricted access
Size: 2.17 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/589222