LP-HCLUS: a novel tool for the prediction of relationships between ncRNAs and human diseases

Domenica D'Elia
Second-to-last author
Formal Analysis
2021

Abstract

The discovery of a functional relationship between human diseases and non-coding RNAs (ncRNAs) is not new. In the last decade, it has helped elucidate the mechanisms of many diseases and improve therapeutic approaches (Lekka and Hall, 2018; Wang et al., 2016; Yang et al., 2014). Nevertheless, the function of many ncRNAs is still unclear or completely unknown; therefore, their role in human diseases is difficult, if not impossible, to identify. We have developed a new system, called LP-HCLUS, that predicts previously unknown disease-ncRNA associations by exploiting multi-type hierarchical clustering techniques. Unlike other approaches, LP-HCLUS can analyse and benefit from heterogeneous networks of interactions/relationships among multiple types of entities (e.g., diseases, ncRNAs, target genes). To this end, the method first estimates the strength of disease-ncRNA associations, exploiting both direct and indirect relationships. It then constructs a hierarchy of heterogeneous clusters based on the known and estimated relationships between diseases and ncRNAs. Finally, LP-HCLUS uses the generated clusters to induce new relationships, associating each of them with a certainty score. We conducted several experiments comparing the performance of LP-HCLUS with that of two competitors: HOCCLUS2 (Pio et al., 2013) and ncPred (Alaimo et al., 2014). In particular, we analysed two datasets: HMDD v3.0, which contains relationships between diseases and miRNAs, and a dataset constructed by integrating several state-of-the-art data sources (Chen et al., 2013; Helwak et al., 2013; Bauer-Mehren et al., 2010; Jiang et al., 2009). The results show that our system outperforms its competitors and can help biologists conduct more focused research.
This conclusion is also confirmed by a qualitative analysis of the predicted associations, which showed that many associations predicted by LP-HCLUS with a high certainty score were subsequently validated and included in a more recent version of the HMDD dataset (v3.2). The method is also valuable because it can be easily transferred to any biological study involving heterogeneous data from different sources and types (e.g., different omics data, chemicals, biochemical and structural data).
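The abstract outlines the method without formulas. As a rough illustration only, the Python sketch below shows two of its ideas on toy data: estimating a disease-ncRNA association strength from direct (known) links and from indirect ncRNA-to-target-gene-to-disease paths, and then turning one cluster of diseases and ncRNAs into new predicted associations with a certainty score. The scoring rules (fraction of target genes linked to the disease; mean strength within the cluster), the data, and the function names are hypothetical assumptions, not taken from LP-HCLUS. The cluster is supplied by hand; the multi-type hierarchical clustering step itself is omitted.

```python
from itertools import product

# Hypothetical toy data; entity names are illustrative, not from HMDD.
KNOWN = {("d1", "miR-a"), ("d1", "miR-b"), ("d2", "miR-b")}  # known (direct) disease-ncRNA links
TARGETS = {  # ncRNA -> target genes (assumed input)
    "miR-a": {"g1", "g2"},
    "miR-b": {"g2", "g3"},
    "miR-c": {"g1", "g3"},
}
GENE_DISEASES = {  # gene -> diseases it is associated with (assumed input)
    "g1": {"d1"},
    "g2": {"d1", "d2"},
    "g3": {"d2"},
}


def indirect_strength(disease, ncrna, targets, gene_diseases):
    """Fraction of the ncRNA's target genes that are linked to the disease.

    A simple ncRNA -> gene -> disease path score, assumed for illustration;
    it is not the estimator used by LP-HCLUS.
    """
    genes = targets.get(ncrna, set())
    if not genes:
        return 0.0
    hits = sum(1 for g in genes if disease in gene_diseases.get(g, set()))
    return hits / len(genes)


def association_strength(disease, ncrna, known, targets, gene_diseases):
    """Known (direct) associations count as 1.0; otherwise use the indirect score."""
    if (disease, ncrna) in known:
        return 1.0
    return indirect_strength(disease, ncrna, targets, gene_diseases)


def induce_from_cluster(diseases, ncrnas, known, targets, gene_diseases):
    """Predict every pair inside one (diseases x ncRNAs) cluster that is not already known.

    Hypothetical scoring rule: each predicted pair gets, as its certainty,
    the mean estimated strength over all pairs in the cluster.
    """
    pairs = list(product(sorted(diseases), sorted(ncrnas)))
    strengths = {p: association_strength(*p, known, targets, gene_diseases) for p in pairs}
    certainty = sum(strengths.values()) / len(pairs)
    return {p: certainty for p in pairs if p not in known}


if __name__ == "__main__":
    # The cluster is given by hand here, standing in for one node of the hierarchy.
    predictions = induce_from_cluster(
        {"d1", "d2"}, {"miR-a", "miR-b"}, KNOWN, TARGETS, GENE_DISEASES
    )
    for (disease, ncrna), score in predictions.items():
        print(f"{disease} - {ncrna}: certainty {score:.3f}")
```

Running the script prints `d2 - miR-a: certainty 0.875`. The only pair in the cluster without a known link receives the cluster's mean strength: three known pairs at 1.0 plus one indirect score of 0.5. Averaging over the whole cluster is one simple way to make predictions inside dense clusters more certain; the actual method may weight the evidence differently.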
Istituto di Tecnologie Biomediche - ITB - Sede Secondaria Bari
machine learning, multi-type hierarchical clustering, heterogeneous networks, non-coding RNAs, disease
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/515073