Combining experimental evidences from replicates and nearby species data for annotating novel genomes.

C Angelini; I De Feis
2008

Abstract

For several years now, the amount of life science data (e.g., complete sequenced genomes, 3D structures, DNA chips, mass spectroscopy data) generated by high-throughput experiments has grown exponentially. Analyzing such complex, voluminous, and heterogeneous data, and guiding that analysis with a statistically and mathematically sound methodology, is therefore of paramount importance. Here we make and justify the observation that experimental replicates and phylogenetic data may be combined to strengthen the evidence for identifying transcriptional motifs, a task that seems to be quite difficult with other methods currently in use. We present a case study considering sequence and microarray data from fungal species. Although we show that our methodology can be of immediate practical use to bioinformaticians and biologists for annotating new genomes, the focus here is also on discussing the interesting associated mathematical problems that high-throughput data integration poses.
Istituto Applicazioni del Calcolo ''Mauro Picone''
978-0-7354-0552-3
Bayesian variable selection
MCMC algorithm
Microarray data analysis
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/66114