This study presents a novel statistical and computational approach using nonparametric regression, which capitalizes on correlation structure to deal with the high-dimensional data often found in pharmacogenomics, for instance, in Crohn's inflammatory bowel disease. The empirical correlation between the test statistics, investigated via simulation, can be used as an estimate of noise. The theoretical distribution of -log10(p-value) is used to support the estimation of that optimal bandwidth for the model, which adequately controls type I error rates while maintaining reasonable power. Two proposed approaches, involving normal and Laplace-LD kernels, were evaluated by conducting a case-control study using real data from a genome-wide association study on Crohn's disease. The study successfully identified single nucleotide polymorphisms on the NOD2 gene associated with the disease. The proposed method reduces the computational burden by approximately 33% with reasonable power, allowing for a more efficient and accurate analysis of genetic variants influencing drug responses. The study contributes to the advancement of statistical methodology for analyzing complex genetic data and is of practical advantage for the development of personalized medicine.

Statistical Study Design for Analyzing Multiple Gene Loci Correlation in DNA Sequences

Andrea De Gaetano;
2023

Abstract

This study presents a novel statistical and computational approach using nonparametric regression, which capitalizes on correlation structure to deal with the high-dimensional data often found in pharmacogenomics, for instance, in Crohn's inflammatory bowel disease. The empirical correlation between the test statistics, investigated via simulation, can be used as an estimate of noise. The theoretical distribution of -log10(p-value) is used to support the estimation of that optimal bandwidth for the model, which adequately controls type I error rates while maintaining reasonable power. Two proposed approaches, involving normal and Laplace-LD kernels, were evaluated by conducting a case-control study using real data from a genome-wide association study on Crohn's disease. The study successfully identified single nucleotide polymorphisms on the NOD2 gene associated with the disease. The proposed method reduces the computational burden by approximately 33% with reasonable power, allowing for a more efficient and accurate analysis of genetic variants influencing drug responses. The study contributes to the advancement of statistical methodology for analyzing complex genetic data and is of practical advantage for the development of personalized medicine.
2023
Istituto di Analisi dei Sistemi ed Informatica ''Antonio Ruberti'' - IASI
Istituto per la Ricerca e l'Innovazione Biomedica -IRIB
DNA sequence; correlation structure; high-dimensional data; nonparametric regression; case-control study
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/453260
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact