Context-specific essential genes identification and prediction by learning multi-omics and network data

Maurizio Giordano; Lucia Maddalena; Mario Rosario Guarracino; Ilaria Granata
2024

Abstract

Essential genes (EGs) are generally defined as those necessary for the growth and survival of an organism or cell. Gene essentiality is a key concept in genetics, with implications for basic research, evolutionary biology, systems biology, and precision medicine. In these domains, identifying EGs is challenging because of the increasing demand for procedures that capture the context-specificity (e.g. tissue, organism or single cell) of gene essentiality. Within the EG research field, one direction focuses on developing new computational methods for processing data from gene knock-out experiments, under the premise that the more harmful the phenotype observed after gene silencing, the more essential the gene. Gene deletion techniques become complex, costly, and labour- and time-intensive when applied at a genome-wide level. Computational approaches are therefore also needed to build models that predict EGs by learning gene characteristics associated with essentiality status. Moreover, these methods help compensate for the lack of experimental data caused by the inherently limited availability of in vitro models. Prediction models are commonly machine learning models trained in a supervised mode on several genetic characteristics. Some approaches in the literature also exploit physical information automatically learned by deep learning models based on the centrality of genes in protein-protein interaction (PPI) networks. The present work addresses two important tasks in context-specific EG identification, providing novel approaches and related tools: 1) an unsupervised method to identify EGs from gene deletion-derived scores using a binarization scheme based on the Otsu dynamic thresholding algorithm; 2) an ensemble of learners that handles unbalanced data, proposed as the reference machine learning method for EG prediction based on multi-omics and deep learning features.
Multi-omics features comprise genomic, transcriptomic, epigenetic, functional, evolutionary and disease-related characteristics gathered from several source databases and suitably analyzed and mined. The deep learning features are network embeddings, i.e. vector representations of genes that capture their centrality in PPI networks, in line with the centrality-lethality rule: the more central a gene, or its product, the higher its probability of being essential. The methods discussed in this work are provided as software tools in a unified programming framework, HELP (Human Gene Essentiality Labelling & Prediction), and their performance is validated and compared against state-of-the-art methods.
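
To make the first task more concrete, the following is a minimal sketch of two-class Otsu thresholding applied to per-gene knock-out scores. It is not the HELP implementation or its API: the function names `otsu_threshold` and `label_essential`, the labels "E" and "NE", the bin count, and the convention that lower (more negative) scores indicate a more harmful knock-out are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def otsu_threshold(scores: Sequence[float], bins: int = 256) -> float:
    """Return the cut that maximizes the between-class variance of a score histogram."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        raise ValueError("scores must contain at least two distinct values")
    width = (hi - lo) / bins

    # Equal-width histogram of the scores; the maximum is clamped into the last bin.
    counts = [0] * bins
    for s in scores:
        counts[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(scores)
    total_sum = sum(c * m for c, m in zip(counts, centers))

    best_var, best_cut = -1.0, lo
    w0 = 0      # number of scores in the lower class so far
    sum0 = 0.0  # count-weighted sum of bin centers in the lower class
    for i in range(bins - 1):
        w0 += counts[i]
        sum0 += counts[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_cut = between, lo + (i + 1) * width  # upper edge of bin i
    return best_cut


def label_essential(scores_by_gene: Dict[str, float], bins: int = 256) -> Dict[str, str]:
    """Label genes below the Otsu cut as 'E' (assumed: lower score = more essential)."""
    cut = otsu_threshold(list(scores_by_gene.values()), bins)
    return {g: ("E" if s < cut else "NE") for g, s in scores_by_gene.items()}


if __name__ == "__main__":
    # Made-up scores for hypothetical genes, for illustration only.
    toy = {"G1": -1.9, "G2": -2.1, "G3": -1.8, "G4": 0.05,
           "G5": -0.1, "G6": 0.2, "G7": 0.0, "G8": 0.1}
    print(label_essential(toy))
```

Because the cut is derived from the score distribution itself, no labelled genes are required, which is what makes this kind of labelling unsupervised. In a context-specific setting the same function could be run separately on the scores from each tissue or cell line, although how the per-context scores are aggregated beforehand is not specified here.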
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR - Sede Secondaria Napoli
Essential genes, Context-specificity, CRISPR, Multi-omics, Machine Learning, Deep Learning, Network embedding, Light Gradient Boosting Machine
Files in this record:
f1000research-693058.pdf (open access)
Description: slides BBCC2024
Type: Other attached material
License: Creative Commons
Size: 3.21 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/527985