A Computational Pipeline for Genome Annotation and Metabolic Pathway Reconstruction in Plants (A case study of Cassava)

Andreas Gisel
2019

Abstract

Genome annotation and the reconstruction of metabolic pathways from genome sequence are necessary steps for studying and understanding the individual components that make up an organism. These components are monitored by technologies that produce high-throughput data sets, which need to be analyzed and interpreted. Little or no annotation exists for non-model organisms, so these data often cannot be processed with existing computational tools. To better understand the biological information encoded in the genome of a non-model organism, we developed a computational pipeline for functional annotation and metabolic pathway reconstruction that uses pattern recognition algorithms to compare the genome with well-annotated information from other (model) organisms. The pipeline implements a top-down method that uses incompletely annotated genome sequences to predict gene functions. Instead of using the incompletely annotated genome to query public databases, conserved patterns from entries in protein databases are used as queries to search a local database of the raw genome sequence and predict proteins. Functions are assigned to the predicted proteins at the same time. The methodology is based on sequence similarity techniques, using MUSCLE for multiple sequence alignment and TBLASTN for the BLAST alignment. The pipeline was implemented in Python and shell scripts. We validated the method using 40 Manihot esculenta gene identifiers and compared the resulting gene annotations with KAAS results in terms of computational time and accuracy. Our set of gene annotations was larger than the KAAS annotation and highly similar to it in terms of enzyme functions. Our pipeline also had a shorter computational time, with a focus on greater data integration. The reconstructed pathway was visualized with Cytoscape, which allows integration of network data, annotated data and expression data.
This pipeline can serve as a useful automated tool to advance research into the biological information contained in the genomes of non-model organisms, and may lead to many metabolic engineering applications.
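
The search-and-assign step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the file names, the E-value cutoff and the mapping from query identifiers to functions are all assumptions made for illustration. The sketch builds the BLAST+ command lines for indexing a raw genome and searching it with protein queries via TBLASTN, then reads the standard 12-column tabular output (`-outfmt 6`) and keeps the best-scoring genome hit per query, attaching the query's known function to it. The MUSCLE alignment used to derive the conserved patterns is omitted.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_commands(genome_fasta, query_fasta, db_name, hits_path):
    """Return the BLAST+ command lines as argument lists (not executed here).

    File names and the 1e-5 E-value cutoff are illustrative assumptions.
    """
    make_db = ["makeblastdb", "-in", genome_fasta,
               "-dbtype", "nucl", "-out", db_name]
    search = ["tblastn", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6", "-evalue", "1e-5", "-out", hits_path]
    return make_db, search


def assign_functions(tabular_text, query_function, max_evalue=1e-5):
    """Keep the best-scoring genome hit per query and attach its function.

    query_function is a hypothetical dict mapping a query ID (a conserved
    protein pattern) to the function annotated for it in the source database.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # discard weak matches
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > current["bitscore"]:
            # Hits on the reverse strand report sstart > send, so normalise.
            start, end = sorted((int(row["sstart"]), int(row["send"])))
            best[row["qseqid"]] = {
                "scaffold": row["sseqid"],
                "start": start,
                "end": end,
                "bitscore": score,
                "function": query_function.get(row["qseqid"], "unknown"),
            }
    return best
```

In practice the two command lists could be passed to `subprocess.run`, and the contents of the hits file to `assign_functions`; a real pipeline would also need to merge overlapping hits and handle several loci per query, which this sketch does not attempt.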
Genome
Metabolic pathways
Computational pipeline
Sequence alignment
Cassava

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/380641