Multi-Modal AI for Multi-Label Retinal Disease Prediction Using OCT and Fundus Images: A Hybrid Approach

Antonio Guerrieri
2025

Abstract

Ocular diseases can significantly affect vision and overall quality of life, and their diagnosis is often time-consuming and dependent on expert interpretation. Whereas previous computer-aided diagnostic systems have focused primarily on medical images, this paper proposes VisionTrack, a multi-modal AI system for predicting multiple retinal diseases, including Diabetic Retinopathy (DR), Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME), drusen, Central Serous Retinopathy (CSR), and Macular Hole (MH), as well as normal cases. The framework integrates a Convolutional Neural Network (CNN) for image-based feature extraction, a Graph Neural Network (GNN) to model complex relationships among clinical risk factors, and a Large Language Model (LLM) to process patient medical reports. By combining these diverse data sources, VisionTrack improves prediction accuracy and offers a more comprehensive assessment of retinal health. Experiments were conducted on two publicly available datasets covering different retinal imaging modalities: RetinalOCT (OCT images) and RFMID (fundus images). The system performed strongly in multi-label disease prediction. On RetinalOCT, it achieved an accuracy of 0.980, an F1-score of 0.979, a recall of 0.978, and a precision of 0.979. On RFMID, it reached an accuracy of 0.989, an F1-score of 0.881, a recall of 0.866, and a precision of 0.897. These results confirm the robustness, reliability, and generalization capability of the proposed approach across imaging modalities, highlighting its potential for early detection, risk assessment, and personalized ophthalmic care.
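The gap between the RFMID accuracy (0.989) and F1-score (0.881) is easier to interpret once the multi-label metrics are written out. The abstract does not say how these metrics are averaged, so the sketch below assumes one common convention: label-wise (Hamming-style) accuracy over every sample-label cell, plus micro-averaged precision, recall, and F1. The function names `label_counts` and `micro_metrics` and the toy data are hypothetical. Under this assumption, true negatives count toward accuracy but not toward F1, so a task with several labels and few positives per image can show high accuracy alongside a noticeably lower F1.

```python
from typing import Dict, List, Tuple

# Rows are samples, columns are disease labels; 1 = present, 0 = absent.
Matrix = List[List[int]]


def label_counts(y_true: Matrix, y_pred: Matrix) -> Tuple[int, int, int, int]:
    """Pool TP, FP, FN, TN over every (sample, label) cell."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # label present and predicted
            elif p:
                fp += 1  # label predicted but absent
            elif t:
                fn += 1  # label present but missed
            else:
                tn += 1  # label absent and not predicted
    return tp, fp, fn, tn


def micro_metrics(y_true: Matrix, y_pred: Matrix) -> Dict[str, float]:
    """Label-wise accuracy plus micro-averaged precision, recall, and F1.

    Assumed convention only; the paper's exact averaging is not stated.
    """
    tp, fp, fn, tn = label_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # True negatives enter accuracy but not precision, recall, or F1.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 4 images, 5 labels, one true label per image.
y_true = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
y_pred = [
    [1, 0, 0, 0, 0],  # correct
    [0, 0, 0, 0, 0],  # misses label 1 (one FN)
    [0, 0, 1, 0, 1],  # correct, plus a spurious label 4 (one FP)
    [0, 0, 0, 1, 0],  # correct
]

metrics = micro_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.9, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

With 3 TP, 1 FP, 1 FN, and 15 TN, the 15 easy negatives lift accuracy to 0.9 while F1 stays at 0.75. Some libraries instead define multi-label accuracy as exact-match (subset) accuracy, which is stricter and would usually give a lower value, so the reported figures should be compared only under the same definition.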
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords

Convolutional Neural Network (CNN)
Graph Neural Network (GNN)
Large Language Model (LLM)
ocular diseases
ophthalmology
retinal image
Files in this record:
sensors-25-04492.pdf (open access; publisher's version, Adobe PDF, 688.21 kB; license: other)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/560169
Citations
  • PubMed Central: not available
  • Scopus: 3
  • Web of Science (ISI): 1