Benchmarking short- and long-read sequencing technologies for metagenomic profiling of microbiomes
Grazia Visci (co-first author); Elisabetta Notario (co-first author); Mariano Francesco Caratozzolo; Bruno Fosso; Marinella Marzano; Graziano Pesole
2026
Abstract
Two culture-independent methods, amplicon-based sequencing and shotgun metagenomics, have significantly advanced the study of microbial communities. To date, short-read sequencing technologies have enabled high accuracy and deep coverage, while long-read sequencing approaches are increasingly being applied to improve genome assembly, despite challenges related to sequencing errors and nucleic acid input requirements. In this benchmark study, we compared shotgun metagenomics across three sequencing technologies, Illumina (short reads) and PacBio and Nanopore (long reads), using a commercial 20-species mock microbial community with even species representation. Specifically, we evaluated how effectively the data generated by each platform supported genome reconstruction, identification of known taxa, and characterization of functional potential, considering annotated genes, the length of predicted proteins, and the number and types of inferred functions. Illumina sequencing provided high-throughput, high-quality data, but its limited read length precluded complete genome assembly. This affected the functional analysis, leading to an underestimation of coding and non-coding genes. Nanopore sequencing yielded the longest reads, resulting in more contiguous assemblies, although the results were affected by higher error rates and by the choice of assembly method. PacBio offered the best balance between read length and base accuracy, but produced fewer reads. This affected genome coverage for certain taxa, influencing the quality of their assemblies, the completeness of their metagenome-assembled genomes (MAGs), and the accuracy of functional annotation. Nevertheless, PacBio successfully retrieved MAGs for all mock community species, and the genome annotation was consistent with the reference.
By evaluating the strengths and limitations of different NGS technologies and assembly strategies, this benchmark provides a practical framework for selecting the approach that best optimizes data quality in microbiome genome characterization, according to study-specific goals.


