In genomes we trust: assessing genomic reliability within the family Nectriaceae
Villani, A. (first author); Ghionna, V.; Susca, A.; Moretti, A. (penultimate author)
2025
Abstract
Reliable evolutionary inference increasingly depends on public genome resources, and the effects of uneven assembly quality, incomplete metadata, and biased taxonomic sampling remain poorly quantified. Using the species-rich fungal lineage Nectriaceae as a model system, we analysed 1,530 genome sequence assemblies to assess metadata completeness, sampling representation, and genome quality. One-third of the assemblies lacked essential metadata, sequencing was heavily skewed toward a few agriculturally important lineages, and sampling of many genera was limited or nonexistent. BUSCO and QUAST metrics revealed substantial heterogeneity in assembly quality, with widespread fragmentation and numerous assemblies falling outside expected quality thresholds. From 763 single-copy orthologs identified in 576 higher-quality genomes, we reconstructed a phylogenomic backbone and quantified gene- and site-level concordance across the tree. Although major clades were broadly recovered, extensive gene-tree discordance and a polyphyletic Fusarium nisikadoi species complex revealed unresolved boundaries and conflict among loci. These results show how data quality, incomplete sampling, and discordant genomic histories can constrain phylogenomic resolution, and provide a general framework for improving comparative genomic resources and large-scale evolutionary inference.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2025.12.11.693719v1.full.pdf | Open access | Pre-print document | Creative Commons | 1.72 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


