CNR Institutional Research Information System

Theoretical and Practical Analyses in Metagenomic Sequence Classification

Hend Amraoui; Mourad Elloumi; Francesco Marcelloni; Faouzi Mhamdi; Davide Verzotto

2019

Abstract

Metagenomics is the study of genomic sequences in a heterogeneous microbial sample taken, e.g., from soil, water, the human microbiome, and skin. One of the primary objectives of metagenomic studies is to assign a taxonomic identity to each read sequenced from a sample and then to estimate the abundance of the known clades. With ever-larger metagenomic datasets from high-throughput sequencing technologies now readily available, several fast and accurate methods have been developed that work with reasonable computing requirements. Here we provide an overview of the state-of-the-art methods for the classification of metagenomic sequences, highlighting theoretical factors that seem to correlate well with practical factors and could therefore be useful in choosing a method, or developing a new one, in experimental contexts. In particular, we emphasize that the information derived from the known genomes and ultimately used in the learning and classification processes may create several experimental issues, mostly related to the amount of information used in the processes and its uniqueness, significance, and redundancy; some of these issues are intrinsic to both current alignment-based approaches and compositional ones. This entails the need to develop efficient alignment-free methods that overcome such problems by combining the learning and classification processes in a single framework.
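
To make the two tasks named in the abstract concrete (assigning a taxonomic label to each read, then estimating clade abundance), the following is a minimal toy sketch of a compositional, k-mer-based classifier. It is not the method of the paper or of any tool it surveys: the function names (`build_index`, `classify`, `abundance`), the majority-vote rule, the restriction to clade-unique k-mers, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of clades whose reference contains it."""
    index = {}
    for clade, genome in references.items():
        for kmer in kmers(genome, k):
            index.setdefault(kmer, set()).add(clade)
    return index


def classify(read, index, k):
    """Label a read by majority vote over k-mers unique to one clade.

    Returns None when the read has no clade-unique k-mer or the vote is tied.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        clades = index.get(kmer, ())
        if len(clades) == 1:  # ignore k-mers shared between clades
            votes[next(iter(clades))] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def abundance(reads, index, k):
    """Estimate relative clade abundance from the classified reads only."""
    labels = (classify(read, index, k) for read in reads)
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {clade: n / total for clade, n in counts.items()} if total else {}


# Hypothetical toy references and reads, far shorter than real genomes.
references = {"clade_A": "ACGTACGTTAGC", "clade_B": "TTGACCGGATCA"}
index = build_index(references, k=4)
reads = ["ACGTAC", "GACCGG", "GTTAGC", "GGGGGG"]
print(abundance(reads, index, k=4))
```

Even in this toy form, the sketch touches the issues the abstract raises: k-mers shared by several clades carry redundant information and are discarded, while the choice of `k` controls how much information, and how unique, each k-mer contributes.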


Year: 2019
Organizational units: Istituto di informatica e telematica - IIT
ISBN: 9783030276836
Keywords: Metagenomic sequence classification; Alignment-free algorithms; Genome analysis; Combinatorics; Pattern discovery; Strings
Appears in types: 04.01 Conference proceedings contribution


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/388068
