Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis
Giordano M; Maddalena L; Granata I
2023
Abstract
Essential genes (EGs) are fundamental for the growth and survival of a cell or an organism. Identifying EGs is an important issue in many areas of biomedical research, such as synthetic and systems biology, drug development, and mechanistic and therapeutic investigations. Essentiality is a context-dependent, dynamic attribute of a gene that can vary across cells, tissues, or pathological conditions, and wet-lab experimental procedures to identify EGs are costly and time-consuming. Commonly explored computational approaches apply machine learning techniques to protein-protein interaction networks, but they are often unsuccessful, especially for human genes. From a biological point of view, identifying node essentiality attributes is a challenging task; from a data science perspective, suitable graph learning approaches still represent an open problem. Node classification in graph modeling and analysis is a machine learning task that predicts an unknown node property from defined node attributes, with the model trained on both the relational information (the graph edges) and the node attributes. Here, we propose the use of a context-specific integrated network enriched with biological and topological attributes. To tackle the node classification task, we exploit several machine learning and deep learning models. An extensive experimental phase demonstrates the effectiveness of both the network structure and the node attributes for EG identification.
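The abstract frames EG identification as node classification: predicting an unknown node property from node attributes while also using the graph's relational information. As a minimal sketch of that idea, assuming a hypothetical toy gene network and a single hypothetical biological attribute per gene, the code below builds per-node features from biological and topological attributes (degree and local clustering coefficient), mixes in neighborhood information with one mean-aggregation step, and fits a simple nearest-centroid classifier. The gene names, data, labels, and choice of classifier are all illustrative assumptions; this is not the set of machine and deep learning models evaluated in the paper.

```python
from statistics import mean

# Hypothetical toy gene network as an undirected adjacency list (not data from the paper).
GRAPH = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g1", "g3", "g5"},
    "g5": {"g4", "g6"},
    "g6": {"g5"},
}

# Hypothetical biological attribute per gene (e.g. a normalized expression level).
BIO = {"g1": 0.9, "g2": 0.8, "g3": 0.85, "g4": 0.7, "g5": 0.2, "g6": 0.1}


def clustering(graph, node):
    """Local clustering coefficient: fraction of neighbor pairs that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def node_features(graph, bio):
    """Concatenate a biological attribute with topological ones: [bio, degree, clustering]."""
    return {n: [bio[n], float(len(graph[n])), clustering(graph, n)] for n in graph}


def aggregate(graph, feats):
    """One message-passing step: append the mean of each node's neighbor features."""
    out = {}
    for n, f in feats.items():
        nbr = [feats[m] for m in graph[n]]
        out[n] = f + ([mean(col) for col in zip(*nbr)] if nbr else [0.0] * len(f))
    return out


def train_centroids(feats, labels):
    """Nearest-centroid training: one mean feature vector per class label."""
    by_class = {}
    for n, y in labels.items():
        by_class.setdefault(y, []).append(feats[n])
    return {y: [mean(col) for col in zip(*vs)] for y, vs in by_class.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


if __name__ == "__main__":
    feats = aggregate(GRAPH, node_features(GRAPH, BIO))
    # Hypothetical training labels: 1 = essential, 0 = non-essential.
    labels = {"g1": 1, "g2": 1, "g5": 0, "g6": 0}
    cents = train_centroids(feats, labels)
    for gene in ("g3", "g4"):
        print(gene, predict(cents, feats[gene]))
```

Because raw degree has a larger scale than the other features, a realistic pipeline would normally standardize features before any distance-based classifier; the sketch skips this for brevity. Swapping the nearest-centroid step for another classifier, or stacking several aggregation steps, keeps the same overall structure of combining node attributes with network structure.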