Fast analytical methods for finding significant labeled graph motifs

2018

Abstract

Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected under a reasonable null hypothesis. Such subgraphs may reveal small-scale interaction features in genomic interaction networks, or notable relationships among actors or among airlines. When nodes are labeled, the labels can carry information such as the genomic entity under study or the dominant genre of an actor. Labeled subgraphs therefore convey information beyond structure alone and lend themselves to a wider range of applications. To identify statistically significant motifs in a given network, we propose an analytical (i.e., simulation-free) method that extends the work of Picard et al. (J Comput Biol 15(1):1-20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide analytical expressions for the mean and variance of the motif count under the Expected Degree Distribution random graph model. Our model handles both induced and non-induced motifs. We have tested our methodology on a wide range of graphs, from protein-protein interaction networks to movie networks. The analytical model is a fast alternative to simulation, usually faster by orders of magnitude, and this advantage increases as graphs grow in size.
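To make the contrast between analytical and simulated motif counts concrete, the following is a minimal sketch and not the method of the paper. It assumes a simplified, unlabeled null model with fixed node weights (Chung-Lu style, edge probability `w[i] * w[j] / sum(w)` capped at 1) in place of the label-dependent Expected Degree Distribution model, and it covers only the mean count of a single motif, the triangle. The function names and the weight vector are hypothetical and chosen for illustration only.

```python
import itertools
import random


def edge_probability(w, total, i, j):
    # Assumed null model: independent edges with probability w_i * w_j / sum(w),
    # capped at 1. This is a simplification, not the model used in the paper.
    return min(1.0, w[i] * w[j] / total)


def expected_triangles(w):
    """Closed-form mean triangle count: sum over node triples of the
    product of the three edge probabilities (edges are independent)."""
    total = sum(w)
    return sum(
        edge_probability(w, total, i, j)
        * edge_probability(w, total, j, k)
        * edge_probability(w, total, i, k)
        for i, j, k in itertools.combinations(range(len(w)), 3)
    )


def simulated_triangles(w, runs, rng):
    """Monte Carlo estimate of the same mean: sample graphs, count triangles."""
    total = sum(w)
    n = len(w)
    grand_total = 0
    for _ in range(runs):
        adj = [[False] * n for _ in range(n)]
        for i, j in itertools.combinations(range(n), 2):
            if rng.random() < edge_probability(w, total, i, j):
                adj[i][j] = adj[j][i] = True
        grand_total += sum(
            1
            for i, j, k in itertools.combinations(range(n), 3)
            if adj[i][j] and adj[j][k] and adj[i][k]
        )
    return grand_total / runs


if __name__ == "__main__":
    weights = [3, 3, 2, 2, 2, 1, 1, 1, 1, 1]  # hypothetical expected degrees
    rng = random.Random(0)
    print("analytical:", expected_triangles(weights))
    print("simulated: ", simulated_triangles(weights, 2000, rng))
```

The analytical value needs one pass over node triples, whereas the simulated estimate repeats a comparable pass for every sampled graph. That is the kind of cost gap the abstract refers to, although the paper's closed-form expressions, which also cover the variance and node labels, are not reproduced here.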
Network mining
Random network models
Labeled graph motifs
Graph algorithms

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/377800