Charting the unknown at twilight: partitioning phylogenetic diversity across samples on Barcode environmental sequencing when reference data are sparse

Saverio Vicario; Bachir Balech; Caterina Manzari; Apollonia Tullo; Giorgio Grillo
2011

Abstract

Introduction
Environmental sequencing of amplicons and, more recently, metagenomics with NGS approaches have for several years been the main strategies for exploring the biodiversity of microbial communities. Soil mesofauna share several features with microorganisms: they can be sampled in bulk, and it is difficult to isolate them and to identify all life stages taxonomically. This makes them suitable for environmental sequencing of target loci ("metagenetics") [1], whereas their genome size makes a metagenomic approach unaffordable. For eukaryotes, and especially for metazoans, the target locus is typically the one proposed by the Barcode of Life Initiative (http://www.barcodeoflife.org/), which for metazoans is the first third of cytochrome oxidase I [2]. Given the sparse availability of reference sequences for Italian soil mesofauna, we implemented an approach based on phylogenetic diversity that makes use of abundance. This approach has the further advantage of revealing the phylogenetic depth at which the samples differ.

Methods
We sampled chestnut soil from 3 localities, using 2 approaches to enrich for mobile mesofauna, hydrophilic and aerophilic respectively. Two further samples were taken from one of the 3 localities using pitfall traps, sorting all the content by Order, and by Species for the family Carabidae and for terrestrial isopods. The first sample has the same proportions of organisms as the pitfall trap, while in the second the amount of biological tissue was equalized among the identified Carabidae species. The barcode region was amplified from both the hydrophilic and the aerophilic mobile mesofauna DNA extracted from each locality, using the "Folmer" universal primers [2] and a high-fidelity DNA polymerase system. We sequenced the amplicons on the Roche 454 pyrosequencing platform with the Ligated Adaptors strategy. This made it possible to amplify without modifying the Folmer universal primers by fusing them to the 454 adaptors (a step required by the Basic Amplicon sequencing strategy).

The 454 reads were denoised with the AmpliconNoise suite [3], and candidate chimeras were filtered out with UCHIME [4]. Reference sequences for the denoised reads were selected from the protein-translated project reference database, BOLD (http://www.boldsystems.org) and NCBI nr using a blastx search. Using the taxonomy of the reference sequences, reads were clustered at the rank of Order. For each cluster, an alignment was built with HMMER (http://hmmer.org/) using the protein alignment of the reference sequences as a guide. RAxML [5] was used to infer the phylogeny of each cluster.

A set of Python scripts estimated the phylogenetic entropy [6] of each sample and of all samples pooled. From these we estimated beta diversity across samples, measured as the exponential of the average Kullback-Leibler divergence between the per-sample and the across-sample phylogenetic diversity. A permutation procedure was used to assess how much of the result was due to inadequate sampling effort rather than to actual biological signal. Tracing the contribution of each branch of the phylogeny to the average Kullback-Leibler divergence identified the lineages responsible for the differences among samples. The taxonomic identity of each lineage could then be traced using the reference sequences included in the phylogenetic inference.

Results
Preliminary results are shown. The blastx procedure identified organisms that were passively carried by the focal organisms, such as metazoan gut parasites and bacteria. A simulation validated the phylogenetic diversity procedure. Results from real-world data are available for the pitfall trap samples and for selected clusters in the other six samples. The procedure identified lineages that changed across the 3 localities, and these were assigned a taxonomy using the nearest reference sequence.
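
The diversity partition described in Methods can be illustrated with a minimal Python sketch. It is not the authors' scripts. It assumes that phylogenetic entropy takes the branch-length-weighted Shannon form H = -sum_b L_b p_b ln p_b, that samples are weighted equally, that branch lengths are used without rescaling, and that the permutation step shuffles reads among samples while keeping sample sizes. The toy tree BRANCHES and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy tree: each branch is (length, set of leaves below it).
BRANCHES = [
    (1.0, {"a"}),
    (1.0, {"b"}),
    (2.0, {"c"}),
    (1.0, {"a", "b"}),
]

def branch_props(counts, branches):
    """Fraction of a sample's reads that descend from each branch."""
    total = sum(counts.values())
    return [sum(counts.get(leaf, 0) for leaf in leaves) / total
            for _, leaves in branches]

def phylo_entropy(props, branches):
    """Assumed entropy form: -sum_b L_b * p_b * ln(p_b)."""
    return -sum(length * p * math.log(p)
                for (length, _), p in zip(branches, props) if p > 0)

def beta_partition(samples, branches):
    """Return (beta, per-branch contributions to the mean KL divergence)."""
    per_sample = [branch_props(counts, branches) for counts in samples]
    n = len(per_sample)
    # Across-sample proportions, with every sample weighted equally.
    pooled = [sum(p[b] for p in per_sample) / n for b in range(len(branches))]
    contrib = []
    for b, (length, _) in enumerate(branches):
        kl_b = sum(p[b] * math.log(p[b] / pooled[b])
                   for p in per_sample if p[b] > 0) / n
        contrib.append(length * kl_b)
    return math.exp(sum(contrib)), contrib

def permutation_pvalue(samples, branches, n_perm=999, seed=0):
    """Share of read-shuffled data sets whose beta reaches the observed one."""
    rng = random.Random(seed)
    observed, _ = beta_partition(samples, branches)
    reads = [leaf for counts in samples
             for leaf, k in counts.items() for _ in range(k)]
    sizes = [sum(counts.values()) for counts in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(reads)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(Counter(reads[start:start + size]))
            start += size
        # Small tolerance guards against floating-point rounding.
        if beta_partition(shuffled, branches)[0] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under these assumptions, the sum of the per-branch contributions equals the entropy of the pooled samples minus the mean per-sample entropy, so the branches with the largest entries in the returned list are the lineages that drive the difference among samples.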
Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari (IBIOM)
Istituto di Tecnologie Biomediche - ITB
phylogenetic diversity
NGS
Molecular biodiversity
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/384626