Representation learning of asplenic patient data for disease risk prediction
Teresa Cappuccio (first author); Laura Casalino; Maurizio Giordano; Marcella Vacca; Ilaria Granata (last author)
2025
Abstract
In recent years, the storage and transfer of clinical data in electronic health records (EHRs) have been a significant step forward for research, providing direct and almost instantaneous access to information useful for patient care. Despite this potential, extracting generalisable knowledge from these records remains a challenge: the heterogeneity of the data and the lack of standardised formats hinder the direct use of EHR data for training predictive models of disease-associated risks. Access to the information contained in EHRs is nonetheless crucial for identifying issues and risk factors and for developing new therapies. In this context, word embedding algorithms play an important role in standardising and analysing clinical data in EHRs: they represent medical terms as numerical vectors that capture semantic and syntactic similarities between terms. This approach facilitates the extraction of clinically meaningful patterns and can reveal hidden relationships between features that may help improve the quality of healthcare. As a concrete example, an embedding model was tested on a dataset of clinical records of asplenic patients provided by the Italian Network for Asplenia (INA), a collaborative network of more than 60 Italian hospital centres. The model was used to identify possible relationships between clinical features, predict potential issues related to the disorder, and identify the most suitable therapies for each patient.

| File | Access | Type | Licence | Size | Format |
|---|---|---|---|---|---|
| Abstracts_ICSA_2025.pdf | Open access | Abstract | Creative Commons | 224.66 kB | Adobe PDF |
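The abstract's central idea, representing clinical terms as numerical vectors so that related terms end up close together, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not name the embedding algorithm, so the sketch swaps in plain co-occurrence counts and cosine similarity as a simple stand-in. The toy records, the term names, and the helper functions `cooccurrence_vectors` and `cosine` are all hypothetical and invented for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy records (invented, not INA data): each record is a list
# of clinical terms that appear together for one patient.
records = [
    ["asplenia", "sepsis", "vaccination"],
    ["asplenia", "thrombosis", "anticoagulant"],
    ["splenectomy", "sepsis", "vaccination"],
    ["splenectomy", "thrombosis", "anticoagulant"],
    ["fracture", "cast"],
]


def cooccurrence_vectors(records):
    """Map each term to a vector counting how often it shares a record
    with every other term in the vocabulary."""
    vocab = sorted({term for record in records for term in record})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = {term: [0] * len(vocab) for term in vocab}
    for record in records:
        # Count each unordered pair of distinct terms once per record.
        for a, b in combinations(sorted(set(record)), 2):
            vectors[a][index[b]] += 1
            vectors[b][index[a]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(records)
# Terms that share the same contexts score high even if they never co-occur.
print(cosine(vectors["asplenia"], vectors["splenectomy"]))
# Terms with no shared context score zero.
print(cosine(vectors["asplenia"], vectors["fracture"]))
```

In this toy data, "asplenia" and "splenectomy" never appear in the same record, yet their vectors are similar because they occur alongside the same terms. A trained word embedding model pursues the same distributional idea with dense, learned vectors rather than raw counts.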


