CNR Institutional Research Information System

A GNN model for the prediction of snoRNA-disease associations based on snoRNA secondary structures and LLM disease embedding

Isabella Mendolia; Andrea Licciardi; Antonino Fiannaca; Laura La Paglia; Massimo La Rosa; Alfonso Urso

2024

Abstract

Numerous studies have demonstrated the functional role of small nucleolar RNAs (snoRNAs) in various biological processes associated with the development of complex human disorders. Understanding the connections between snoRNAs and diseases is therefore essential for improving disease detection and therapy. In this work, we propose a graph neural network model to predict unknown snoRNA-disease associations. Our network consists of four layers, each built from a sequence of SAGEConv and GATConv layers. We consider two classes of features, one for snoRNAs and one for diseases, that have not been used in similar works. We generate the snoRNA node features from substructures of varying sizes within their secondary structures; we obtain the disease node features from textual disease descriptions using two large language models, one trained on medical texts and the other trained on different text types, including medical text. Our results indicate that our model outperforms the current state-of-the-art model and, in particular, that the snoRNA features derived from smaller substructures are the most suitable for this problem, whereas the best-performing disease features were those extracted with the model trained specifically on medical texts.
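
The abstract does not say how substructures are extracted or encoded. As a purely illustrative sketch, the Python snippet below assumes that a secondary structure is given in dot-bracket notation and that a "substructure of size k" can be approximated by a contiguous window of k symbols; both are assumptions made here, not the authors' method. The function name `substructure_features` is hypothetical. Under these assumptions, a smaller k gives a short, dense feature vector, while a larger k gives a longer and sparser one.

```python
from collections import Counter
from itertools import product

# Dot-bracket symbols: "(" and ")" mark paired bases, "." marks an unpaired base.
ALPHABET = "()."


def substructure_features(dot_bracket: str, k: int) -> list[float]:
    """Return normalized counts of every length-k window of a dot-bracket string.

    Assumption (not from the paper): a substructure of size k is approximated
    by a contiguous window of k dot-bracket symbols.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if any(symbol not in ALPHABET for symbol in dot_bracket):
        raise ValueError("structure must use only '(', ')' and '.'")

    # Fixed vocabulary of all 3**k possible windows, so every snoRNA
    # maps to a vector of the same length for a given k.
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]

    windows = [dot_bracket[i:i + k] for i in range(len(dot_bracket) - k + 1)]
    counts = Counter(windows)
    total = len(windows)

    # Relative frequency of each window; all zeros if the string is shorter than k.
    return [counts[w] / total if total else 0.0 for w in vocabulary]


if __name__ == "__main__":
    hairpin = "((..))"  # toy structure, not real snoRNA data
    for k in (1, 2, 3):
        features = substructure_features(hairpin, k)
        print(k, len(features), features)
```

Vectors of this kind could serve as node features for the snoRNA side of a snoRNA-disease graph; the disease side would use text embeddings instead, as the abstract describes.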

Year: 2024
Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords: GNN, LLM, prediction, snoRNA, disease
Appears in types: 04.01 Conference proceedings contribution

Files in this item:

File	Size	Format
A_GNN_Model_for_the_Prediction_of_snoRNA-Disease_Associations_Based_on_snoRNA_Secondary_Structures_and_LLM_Disease_Embedding.pdf (authorized users only; type: publisher's version (PDF); license: not public, private/restricted access)	230.56 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/529401
