Computational annotation of UTR cis-regulatory modules through frequent pattern mining

Grillo Giorgio; D'Elia Domenica
2008

Abstract

Motivation: The large volume of data produced by genome sequencing projects has made it possible to describe the genetic content of many organisms as lists of the genes they can express. Although necessary, this knowledge is not sufficient to understand the mechanisms regulating many of the events underlying life (e.g., cell growth, differentiation, development). It is therefore crucial to decipher the control mechanisms that govern genome expression in time and space. To address this problem, we have developed a bioinformatic approach that uses data mining techniques to detect frequent associations of regulatory motifs in the untranslated regions (UTRs) of metazoan transcripts. The idea is to mine frequent combinations of translation regulatory motifs, since their significant co-occurrences could reveal functional relationships important for the post-transcriptional control of genome expression.

Methods: As a test case, the experiments used UTR sequences extracted from the MitoRes database, annotated with information available in the UTRef and UTRsite databases and collected in a relational database named UTRminer, which supports the pattern mining procedure. The mining approach has two steps. First, patterns of regulatory motifs are extracted and annotated as sequences of motifs, with information on their location in the sequence and their mutual distances (spacers). Second, the mutual distances are discretized, and the most frequent sequences of motifs and spacers are discovered by means of a sequence pattern mining algorithm. Frequent sequences are those with a support greater than a user-specified threshold, and the procedure that generates them is guaranteed to be complete.

Results: The analysed UTR sequences come from ten different species. In total, 3896 sequences were analysed, of which 1944 are 5'UTRs and 1952 are 3'UTRs. The motif patterns generated in the first step have a complexity (the number of distinct motifs detected on the same UTR) ranging from 2 to 3 in 5'UTRs and from 2 to 5 in 3'UTRs. Preliminary results, based on observation and comparative analysis of the discovered sequential patterns, add new insights into the post-transcriptional regulatory mechanisms controlling genome expression, and demonstrate the effectiveness of the presented bioinformatic approach in supporting the discovery of motif patterns.
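
The two-step procedure described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input layout (motif name, start, end), the spacer bin edges, the function names and the toy data are all assumptions made for illustration, and a brute-force enumeration of contiguous motif-spacer windows stands in for the sequence pattern mining algorithm, which the abstract does not name. Support is taken here to be the fraction of UTRs that contain a pattern at least once.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record for one motif hit: (motif_name, start, end), end exclusive.
Hit = Tuple[str, int, int]


def discretize(gap: int, bin_edges: Tuple[int, int] = (20, 100)) -> str:
    """Map a spacer length (nt) to a coarse label; the bin edges are illustrative."""
    if gap < bin_edges[0]:
        return "short"
    if gap < bin_edges[1]:
        return "medium"
    return "long"


def to_sequence(hits: List[Hit]) -> List[str]:
    """Step 1: order motifs by position and interleave discretized spacers."""
    ordered = sorted(hits, key=lambda h: h[1])
    seq = [ordered[0][0]] if ordered else []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = max(0, cur[1] - prev[2])  # overlapping motifs count as gap 0
        seq.append(discretize(gap))
        seq.append(cur[0])
    return seq


def frequent_patterns(
    sequences: Dict[str, List[str]], min_support: float
) -> Dict[Tuple[str, ...], float]:
    """Step 2: keep contiguous windows whose support exceeds min_support."""
    counts: Counter = Counter()
    for seq in sequences.values():
        seen = set()
        # Windows start and end on a motif (even indices) and span >= 2 motifs.
        for i in range(0, len(seq), 2):
            for j in range(i + 2, len(seq), 2):
                seen.add(tuple(seq[i : j + 1]))
        counts.update(seen)  # each UTR contributes at most once per pattern
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n > min_support}


# Toy data, invented for the example (not taken from UTRminer).
utrs = {
    "utr1": [("motifA", 5, 15), ("motifB", 25, 35), ("motifC", 200, 210)],
    "utr2": [("motifB", 60, 70), ("motifA", 40, 52)],
    "utr3": [("motifA", 0, 10), ("motifB", 90, 100)],
}
sequences = {name: to_sequence(hits) for name, hits in utrs.items()}
result = frequent_patterns(sequences, min_support=0.5)
```

Because every contiguous window is enumerated, this sketch is complete for contiguous patterns on small inputs, but it scales poorly; a dedicated sequential pattern miner would prune candidates by support instead.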
Istituto di Tecnologie Biomediche - ITB
Bioinformatics
Frequent Pattern Mining
UTR
Regulatory Motifs
Translation

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/170504