454 GS-FLX TITANIUM PLATFORM: THE EXPERIENCE OF ITB-BA
Domenica D'Elia;Caterina Manzari;Mariano Francesco Caratozzolo;Flaviana Marzano;Marinella Marzano;Andreas Gisel;Saverio Vicario;Bachir Balech;Flavio Licciulli;Apollonia Tullo
2011
Abstract
Fully in line with modern research, which requires advanced technologies to be applied across a wide range of fields, the Institute for Biomedical Technologies in Bari (ITB-BA) has equipped its laboratories with both the Roche 454 Genome Sequencer FLX Titanium and a powerful bioinformatics platform (hardware and software facilities) for managing and analysing NGS data. Because NGS technologies evolve so rapidly, developing and setting up new experimental protocols and bioinformatics analysis pipelines is a challenge that researchers face daily. The major obstacles in NGS "omics" research are, at the experimental level, finding the best way to extract DNA or RNA from samples and obtaining good libraries for sequencing and, at the bioinformatics level, data storage, transfer, and analysis. Classical statistical methods and computational algorithms are inadequate for analysing the large amounts of sequence data produced by the new NGS technologies. Novel analytical strategies are urgently needed to explore new features of sequencing data, integrate diverse genomic and epigenomic data, unravel the structure, organisation, and function of the human genome, understand the fundamental principles of genomic biology, and discover the genetic and non-genetic bases of disease. The Genomics Research team of ITB in Bari is gaining considerable experience with high-throughput sequencing procedures and has developed a new protocol for the preparation and amplification of representative cDNA libraries to be sequenced on NGS platforms. A European patent for this protocol is pending. The Bioinformatics Group focuses on developing bioinformatics tools for analysing data obtained from different NGS platforms in diverse projects, ranging from molecular studies in cancer research to biodiversity studies.
In this respect, ITB-BA, in collaboration with other CNR and academic institutions, is currently involved in several projects studying molecular biodiversity through metagenomics and metatranscriptomics in the biomedical, food, and environmental fields. In particular, we are:
- studying the transcriptome of normal and pathological samples with the aim of identifying genes, new mRNA isoforms, microRNAs and, through genome-wide mapping, transcription factors involved in the etiopathogenesis of human diseases;
- analysing the exome and transcriptome profiles of children of short stature, with particular attention to the involvement of members of the p53 oncosuppressor gene family (p53, p63, p73) in regulating growth-related genes;
- investigating whether only particular genotypes of Epstein-Barr virus (EBV), which ubiquitously infects humans, are associated with the etiopathogenesis of multiple sclerosis;
- investigating the taxonomic complexity of microbial communities living in food-industry "habitats", particularly along the winemaking chain, to shed light on their contribution to the quality, safety, and traceability of the final products;
- categorising soil organisms through DNA barcoding studies.
A series of bioinformatics tools and analysis pipelines has been developed in connection with the studies and research lines mentioned above. Among these:
1. ncRNA analysis of NGS data from RNA-Seq experiments:
1.1. when cDNA is obtained from a total RNA preparation, a great variety of transcripts is obtained in addition to polyadenylated protein-coding mRNAs, including ribosomal RNAs, mitochondrial transcripts, and many functional non-coding RNAs (ncRNAs). A bioinformatics analysis pipeline has been developed to deal with these data. Given a large collection of experimental sequence reads as input, this pipeline identifies and classifies the mitochondrial, ribosomal, and ncRNA fractions and provides a qualitative and quantitative expression profile of known ncRNAs. It then generates a collection of unmapped residual reads (the potential coding fraction) for further analyses;
1.2. a semi-automatic workflow for NGS data analysis to find conserved and novel small RNAs such as miRNAs and siRNAs. The workflow includes raw-data quality control, adapter clipping, splitting by sequence size, mapping to a reference genome, fold-change calculation, and miRNA discovery. This tool, developed in collaboration with other plant and animal science groups, is still being refined to ensure precise prediction of new small RNAs;
2. a new bioinformatics analysis pipeline for characterising and comparing the taxonomic complexity of environmental microbial communities. This pipeline, which integrates existing tools with new software created in our laboratory, is currently being applied to the analysis of winemaking microbiota;
3. a pipeline, currently being set up, to characterise soil eukaryote communities both taxonomically and in terms of phylogenetic diversity. To this end we are developing tools (for estimating phylogenetic diversity) and gaining experience with existing tools (AmpliconNoise).
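The read-partitioning step of the ncRNA pipeline (item 1.1) can be sketched in Python as follows. This is a minimal illustration, not the ITB-BA implementation. It assumes that reads have already been aligned to annotated references and that this alignment yields, for each read, a `(fraction, feature)` label. The function name `classify_reads`, the fraction labels and the feature names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical fraction labels; the pipeline's actual classes and reference
# databases are not specified in the abstract.
FRACTIONS = ("mitochondrial", "ribosomal", "ncRNA")


def classify_reads(
    read_ids: Iterable[str],
    hits: Dict[str, Tuple[str, str]],
) -> Tuple[Counter, Counter, List[str]]:
    """Partition reads into annotated fractions and a residual set.

    `hits` maps a read ID to (fraction, feature_name) for reads that aligned
    to an annotated reference. Returns reads per fraction, reads per known
    ncRNA, and the IDs of reads left unassigned.
    """
    fraction_counts: Counter = Counter()
    ncrna_counts: Counter = Counter()
    residual: List[str] = []
    for rid in read_ids:
        hit = hits.get(rid)
        if hit is None or hit[0] not in FRACTIONS:
            # No annotated hit: keep for downstream analysis (potential coding fraction).
            residual.append(rid)
            continue
        fraction, feature = hit
        fraction_counts[fraction] += 1
        if fraction == "ncRNA":
            ncrna_counts[feature] += 1  # per-ncRNA read counts (expression profile)
    fraction_counts["unmapped"] = len(residual)
    return fraction_counts, ncrna_counts, residual


reads = ["r1", "r2", "r3", "r4", "r5"]
hits = {
    "r1": ("ribosomal", "rRNA_18S"),
    "r2": ("ncRNA", "ncRNA_A"),
    "r3": ("ncRNA", "ncRNA_A"),
    "r4": ("mitochondrial", "mt_transcript_1"),
}
fractions, ncrnas, residual = classify_reads(reads, hits)
print(dict(fractions))  # {'ribosomal': 1, 'ncRNA': 2, 'mitochondrial': 1, 'unmapped': 1}
```

Returning the residual read IDs, rather than only a count, mirrors the pipeline's stated goal of passing the unmapped reads on for further analyses.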
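The adapter-clipping and size-splitting steps of the small RNA workflow (item 1.2) can be illustrated with a minimal sketch. Several details are assumptions that the abstract does not state: a single known 3' adapter (the sequence below is hypothetical), exact matching only, a minimum partial-adapter overlap of 6 nt, and a small-RNA window of 18 to 30 nt. Dedicated trimmers such as cutadapt also tolerate sequencing errors in the adapter, which this sketch does not.

```python
from collections import defaultdict
from typing import Dict, List


def clip_adapter(seq: str, adapter: str, min_overlap: int = 6) -> str:
    """Remove a 3' adapter by exact matching.

    A full adapter occurrence is cut together with everything after it;
    otherwise a partial adapter (its first k bases, k >= min_overlap)
    at the very end of the read is removed.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest possible partial adapter first.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def split_by_size(
    reads: List[str], adapter: str, min_len: int = 18, max_len: int = 30
) -> Dict[int, List[str]]:
    """Clip each read and bin the inserts by length, keeping small-RNA sizes only."""
    bins: Dict[int, List[str]] = defaultdict(list)
    for read in reads:
        insert = clip_adapter(read, adapter)
        if min_len <= len(insert) <= max_len:
            bins[len(insert)].append(insert)
    return dict(bins)


ADAPTER = "TCGTATGCCGTC"  # hypothetical adapter sequence
reads = [
    "A" * 20 + ADAPTER + "GG",  # full adapter -> 20 nt insert
    "C" * 22 + ADAPTER[:7],  # partial adapter at read end -> 22 nt insert
    "G" * 10,  # no adapter, too short -> discarded
]
bins = split_by_size(reads, ADAPTER)
print({length: len(seqs) for length, seqs in bins.items()})  # {20: 1, 22: 1}
```

Binning by insert length makes it easy to inspect the size distribution, for example to check for an enrichment of ~21 to 24 nt reads, before reads are mapped to a reference genome.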
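The fold-change calculation listed in the small RNA workflow (item 1.2) can be sketched as a log2 ratio of normalised counts between two conditions. The abstract does not say how fold changes are computed. Counts-per-million (CPM) normalisation, the pseudocount of 1 and the feature names below are assumptions, and the statistical testing performed by dedicated differential-expression tools is omitted.

```python
import math
from typing import Dict


def log2_fold_change(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2 fold change of condition B over condition A on CPM-normalised counts.

    Assumes both libraries contain at least one read.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    changes: Dict[str, float] = {}
    for name in sorted(set(counts_a) | set(counts_b)):
        # Counts per million correct for different sequencing depths.
        cpm_a = counts_a.get(name, 0) / total_a * 1e6
        cpm_b = counts_b.get(name, 0) / total_b * 1e6
        # The pseudocount keeps the ratio finite for features missing in one library.
        changes[name] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return changes


control = {"miR_x": 50, "miR_y": 150}  # hypothetical small-RNA counts
treated = {"miR_x": 150, "miR_y": 50}
fc = log2_fold_change(control, treated)
print({name: round(value, 2) for name, value in fc.items()})  # {'miR_x': 1.58, 'miR_y': -1.58}
```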
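For the comparison of taxonomic complexity in microbial communities (item 2), diversity indices are a common choice. The sketch below computes richness, the Shannon index and the Gini-Simpson index from per-read taxonomic assignments. The abstract does not say which measures the ITB-BA pipeline uses, so these particular indices, the function name `diversity` and the example genus assignments are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def diversity(taxa: Iterable[str]) -> Dict[str, float]:
    """Richness, Shannon index H' (natural log) and Gini-Simpson index.

    `taxa` holds one taxonomic assignment per read (or per OTU member).
    """
    counts = Counter(taxa)
    n = sum(counts.values())
    proportions = [c / n for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)
    gini_simpson = 1.0 - sum(p * p for p in proportions)
    return {"richness": len(counts), "shannon": shannon, "gini_simpson": gini_simpson}


# Hypothetical genus-level assignments from two samples.
sample_1 = ["Saccharomyces"] * 8 + ["Hanseniaspora"] * 2
sample_2 = ["Saccharomyces"] * 5 + ["Hanseniaspora"] * 5
for name, sample in (("sample_1", sample_1), ("sample_2", sample_2)):
    d = diversity(sample)
    print(name, d["richness"], round(d["shannon"], 3), round(d["gini_simpson"], 3))
```

Both samples contain the same two genera, but the more even sample_2 scores higher on both indices. For this reason, richness alone is usually not enough to compare community complexity.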
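For the phylogenetic-diversity side of item 3, a widely used measure is Faith's PD: the total branch length of the part of the tree that connects the observed taxa. The sketch below computes the rooted variant, in which the paths from the observed tips up to the root are included. It is not necessarily the estimator under development at ITB-BA. The tree representation (a mapping from each node to its parent and branch length) and the toy tree are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Rooted tree as node -> (parent, length of the branch above the node);
# the root maps to (None, 0.0). This representation is an assumption.
Tree = Dict[str, Tuple[Optional[str], float]]


def faith_pd(tree: Tree, observed: Iterable[str]) -> float:
    """Rooted Faith's PD: total length of the branches connecting the
    observed tips to the root, each branch counted once."""
    visited = set()
    total = 0.0
    for tip in observed:
        node: Optional[str] = tip
        # Walk towards the root until we reach a node whose path is already counted.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


tree: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 4.0),
}
print(faith_pd(tree, ["A", "B"]))  # 6.0
print(faith_pd(tree, ["A", "C"]))  # 7.0
```

Stopping the upward walk at the first already-visited node ensures that shared internal branches, such as `n1` above, are counted only once.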