Entropies and word frequencies in the Eukaryotic Promoter Database

P Arrigo;A Corana;L Milanesi
1997

Abstract

Understanding the mechanism of gene expression is one of the more relevant problems in genome analysis. Experimental sequencing has revealed the existence of control domains such as promoters, enhancers and silencers. Promoter signals are well documented in various databases. Within these regions lie a number of characteristic sequences involved in the interactions between the DNA and the transcriptional complex. Different approaches have been applied to detect these features in genomic sequences. In the present paper we investigate the statistical properties of a subset of the Eukaryotic Promoter Database (EPD), using entropies as a measure of complexity. An efficient computational approach, suitable for coarse-grain parallelization, is used for fast processing of the symbolic sequence. Results show a large deviation from randomness in the distribution of words in the EPD.
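As an illustration of the kind of measure the abstract refers to, the following is a minimal sketch of a Shannon entropy computed over the frequencies of overlapping words of length k in a symbolic sequence. It is not the authors' implementation: the function names `word_counts` and `block_entropy` are hypothetical, the choice of overlapping words and of base-2 logarithms is an assumption, and the parallel processing mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter


def word_counts(seq, k):
    """Count overlapping words of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def block_entropy(seq, k):
    """Shannon entropy, in bits, of the observed k-word frequency distribution."""
    counts = word_counts(seq, k)
    total = sum(counts.values())
    # Sum -p * log2(p) over every word that actually occurs in the sequence.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For a four-letter alphabet, a sequence in which all words of length k are equally likely would give an entropy of 2k bits, so the gap between 2k and `block_entropy(seq, k)` is one rough way to quantify deviation from randomness. On short sequences this estimate is biased downward by finite sampling, so it should be read as a sketch rather than a calibrated statistic.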
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT
Istituto per lo Studio delle Macromolecole - ISMAC - Sede Milano
Istituto di Tecnologie Biomediche - ITB
3-8265-2657-0
genome sequencing; Eukaryotic Promoter Database; entropies; word frequencies; coarse-grain parallelization; fast sequence analysis
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/317427