scVAR: integrating genomics and transcriptomics from single-cell RNA-seq —insights from leukemia case studies
Ivan Merelli
2025
Abstract
The advent of high-throughput technologies has accelerated biomedical research by facilitating the investigation of biological complexity at unprecedented resolution. Single-cell RNA sequencing (scRNA-seq) has transformed our ability to deconstruct cellular heterogeneity in complex diseases. Acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for example, are characterized by extensive genetic and phenotypic heterogeneity, making diagnosis and therapy challenging. Although genetic variation is conventionally studied via DNA-based methods, the transcriptome can also be a source of genomic information. Here, we present scVAR, a computational framework that employs variational autoencoders to learn and integrate genetic variation directly from scRNA-seq data. scVAR implements a paired encoder–decoder architecture with a cross-attention–based fusion layer that combines transcriptomic and variant-derived information into a unified latent representation, enhancing the detection of subtle cellular differences under noisy and sparse conditions. We demonstrate its application to leukemia case studies, where scVAR reveals cell identities that are not discernible when transcriptomic or genomic data are analyzed separately. In the datasets analyzed in this study, scVAR identifies approximately 20%–30% more subpopulations than transcriptomic analysis alone, highlighting the benefit of integrating variant information even when coverage is limited. As expected for 3′ scRNA-seq, variant detection is restricted to captured regions, but scVAR maximizes the information available within these constraints. Overall, scVAR bridges the gap between transcriptomics and genomics, providing a broadly applicable platform for the integrative characterization of cell states and disease processes.

| File | Size | Format |
|---|---|---|
| fgene-16-1604484.pdf (open access; license: Creative Commons) | 4.73 MB | Adobe PDF |


