Application of Conceptual Clustering to the Recognition of the Hierarchical Structure of Transcriptional Control Domains
Arrigo P
1998
Abstract
One of the most relevant tasks in functional genomics is the discovery of the syntactical rules that drive gene expression. Many tools based on mathematical and biophysical approaches have been applied; these methods are able to detect the binding sites between DNA and transcription factors. Discovering functional correlations between these features is more difficult. Recently, some authors have treated the genome as a linguistic text and applied methods derived from computational linguistics to its analysis. The main difference between linguistics and biolinguistics is that dictionaries and grammatical rules are available in the former, whereas this knowledge is relatively scarce in the latter. The first step towards more complex analysis is the ability to recognize potential functional words along the linear genomic sequence; in other words, we need to reduce the sequence redundancy. In this work, a new combined methodology is applied to a subset of G-protein-coupled receptors in order to evaluate whether nucleotide domains can be detected and to test their relations with structural or functional regions of the corresponding protein. The coding sequence (CDS) can be considered a 'noiseless' text, which makes it easier to evaluate the correlations between features of the genomic sequence and of the proteins. The method combines unsupervised neural clustering with informational and statistical parameters in order to extract and select domains on the nucleotide sequence, translate them into the corresponding peptides, and locate them along the protein sequence. The results obtained on this dataset show a good correlation between the features selected on the CDS and functional regions of the G-protein-coupled membrane receptors. The preprint of the paper is available at the following address: http://www.biocomp.unibo.it/piero/arrigo/title.html

File | Description | Size | Format |
---|---|---|---|
prod_248951-doc_66183.pdf (open access) | Modeling and Simulation of Gene Regulation and Metabolic Pathways | 384.03 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.