Autism Spectrum Disorder: Linked-Read Sequencing Reveals New and Undetected Variants

Cupaioli, Fa; Mosca, E; Di Nanni, Noemi; Pelucchi, P; Milanesi, L; Raggi, Me; Villa, L; Mezzelani, A

Introduction: Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by limited social interaction, communication impairments, restricted interests and repetitive behaviors. It emerges in childhood, within the first 3 years of life, and is lifelong, with severe consequences for the individual and their family and very high social costs. In recent decades, the prevalence of ASD has increased dramatically, reaching 1 in 68. Although genetics plays a key role in ASD, its etiology is complex, and recent studies hypothesize a gene-environment interaction. Indeed, causative or predisposing genetic variants have been detected in only 30% of patients, and thousands of genes are involved. Here, for the first time, linked-read whole-genome sequencing of ASD patients is used to access disease-associated regions that cannot be mapped by short-read NGS.

Materials and Methods: Ten children with ASD (diagnosed by ADOS-2 and ADI-R), including 3 pairs of affected siblings, were enrolled; peripheral blood was collected and high-molecular-weight (HMW) DNA was isolated. All procedures were approved by the Ethical Committee, and clinical data were collected in accordance with current privacy laws. HMW DNA, 50 kb in size or greater, was subjected to 10x Genomics microfluidic partitioning and barcoding, followed by Illumina library preparation and whole-genome next-generation sequencing. Data were analyzed with the 10x Long Ranger pipelines to find SNVs, indels and larger structural variants, which were compared with the 1000 Genomes, Genome Aggregation Database and NHLBI-ESP populations. Genes affected by variants were compared with those already associated with ASD in the SFARI database, in large-scale sequencing studies and in bioinformatics predictions.

Results: The linked-read sequencing approach successfully produced haplotype blocks of up to 9 million bp in length. This technique allowed variants to be detected across the whole genome of ASD patients. We found variants in genes already listed in SFARI or previously predicted by bioinformatics analysis, and discovered hundreds of new variants. Among the latter, we focused on those that were homozygous in the 3 sibling pairs and found that some of them are related to neurodevelopment and neurophysiology or to xenobiotic metabolism.

Conclusion: For the first time, we applied linked-read sequencing technology to the study of ASD genomics. This powerful approach allowed us to decipher the genomic heterogeneity of ASD and to highlight variants otherwise undetectable by classic NGS. Indeed, we identified homozygous variants within genes or regulatory regions common to the 3 pairs of affected siblings. Further studies will be performed to validate the new ASD variants by PCR amplification and Sanger sequencing in a large set of ASD samples and in public data; the new biomarkers will then be used to stratify the ASD patient population. Data integration will also be performed to identify pathways and gene networks involved in the disorder, in order to understand disease mechanisms and design target-driven personalized treatments.

Acknowledgements: EU project GEMMA (grant agreement No 825033), EPTRI and CNRBiOmics.