In genomes we trust: assessing genomic reliability within the family Nectriaceae

Villani, A. (first author); Ghionna, V.; Susca, A.; Moretti, A. (penultimate author)
2025

Abstract

Reliable evolutionary inference increasingly depends on public genome resources, yet the effects of uneven assembly quality, incomplete metadata, and biased taxonomic sampling remain poorly quantified. Using the species-rich fungal lineage Nectriaceae as a model system, we analysed 1,530 genome sequence assemblies to assess metadata completeness, sampling representation, and genome quality. One-third of the assemblies lacked essential metadata, sequencing was heavily skewed toward a few agriculturally important lineages, and sampling of many genera was limited or nonexistent. BUSCO and QUAST metrics revealed substantial heterogeneity in assembly quality, with widespread fragmentation and numerous assemblies falling outside expected quality thresholds. From 763 single-copy orthologs identified in 576 higher-quality genomes, we reconstructed a phylogenomic backbone and quantified gene- and site-level concordance across the tree. Although major clades were broadly recovered, extensive gene-tree discordance and a polyphyletic Fusarium nisikadoi species complex revealed unresolved boundaries and conflict among loci. These results show how data quality, incomplete sampling, and discordant genomic histories can constrain phylogenomic resolution, and provide a general framework for improving comparative genomic resources and large-scale evolutionary inference.
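The abstract reports gene-level concordance without defining it. As an illustration only (it is not the authors' pipeline, which the abstract does not describe), the sketch below computes a simplified gene concordance factor (gCF): for each branch of a reference tree, the percentage of gene trees that contain the same bipartition of taxa. It assumes every gene tree contains exactly the same taxa, so every gene tree counts as informative for every branch; tools such as IQ-TREE additionally handle gene trees with missing taxa, and site concordance factors (sCF) are not covered here. The minimal Newick parser and all function names are hypothetical helpers written for this example.

```python
from collections.abc import Iterable


def parse_clusters(newick: str) -> tuple[frozenset[str], list[frozenset[str]]]:
    """Return (all leaf names, leaf set below each internal node).

    Hypothetical minimal parser for well-formed Newick: branch lengths and
    internal node labels (e.g. support values) are skipped; quoted labels
    and [comments] are not supported.
    """
    s = newick.strip().rstrip(";")
    pos = 0
    clusters: list[frozenset[str]] = []

    def read_token() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        return s[start:pos].strip()

    def skip_branch_length() -> None:
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            read_token()

    def node() -> frozenset[str]:
        nonlocal pos
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if s[pos] == "(":
            pos += 1                      # consume "("
            leaves = set(node())
            while s[pos] == ",":
                pos += 1                  # consume ","
                leaves |= node()
            pos += 1                      # consume ")"
            read_token()                  # internal label (support value), ignored
            skip_branch_length()
            clade = frozenset(leaves)
            clusters.append(clade)
            return clade
        name = read_token()               # leaf name
        skip_branch_length()
        return frozenset([name])

    taxa = node()
    return taxa, clusters


def nontrivial_splits(newick: str) -> tuple[frozenset[str], set[frozenset[str]]]:
    """Return (taxa, splits). Each split is stored as the side of the
    bipartition that excludes a fixed reference taxon, so rooted and
    unrooted encodings of the same branch compare equal."""
    taxa, clusters = parse_clusters(newick)
    ref = min(taxa)
    splits: set[frozenset[str]] = set()
    for clade in clusters:
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:   # drop trivial (leaf or root) splits
            splits.add(side)
    return taxa, splits


def gene_concordance(species_tree: str, gene_trees: Iterable[str]) -> dict[frozenset[str], float]:
    """Simplified gCF (%) per reference-tree branch; assumes complete taxon sampling."""
    taxa, species_splits = nontrivial_splits(species_tree)
    gene_split_sets = []
    for gene_tree in gene_trees:
        gene_taxa, gene_splits = nontrivial_splits(gene_tree)
        if gene_taxa != taxa:
            raise ValueError("this sketch requires every gene tree to contain the same taxa")
        gene_split_sets.append(gene_splits)
    if not gene_split_sets:
        raise ValueError("no gene trees supplied")
    return {
        split: 100.0 * sum(split in gs for gs in gene_split_sets) / len(gene_split_sets)
        for split in species_splits
    }


if __name__ == "__main__":
    species = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)80:0.2,E:0.3);"
    genes = ["((A,B),(C,D),E);", "((A,C),(B,D),E);", "((A,B),(C,E),D);"]
    result = gene_concordance(species, genes)
    for split, gcf in sorted(result.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(split), f"gCF = {gcf:.1f}%")
```

In this toy example each split is printed as the side that excludes taxon A, so `['C', 'D', 'E']` denotes the branch grouping A with B. That branch appears in two of the three gene trees (gCF about 66.7%), while the C+D branch appears in only one (about 33.3%). Low values of this kind are what "gene-tree discordance" refers to, even when the reference branch itself is recovered.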
Istituto di Scienze delle Produzioni Alimentari - ISPA
Phylogenomics, Genome quality, Gene-tree discordance, Sampling bias, Nectriaceae
Files in this item:
2025.12.11.693719v1.full.pdf

Open access

Type: Preprint
Licence: Creative Commons
Size: 1.72 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/581064