The analysis and comparison of mutational spectra is an important problem in molecular biology. To analyse a mutational spectrum, we apply an algorithm based on the SEM subclass approach (Simulation, Expectation, Maximization). The algorithm classifies mutational sites according to their mutation probabilities, with each site belonging to exactly one class. Each class is approximated by a binomial distribution, so any real mutational spectrum is regarded as a mixture of binomial distributions. The separation runs iteratively, and each iteration includes simulation, maximization and estimation procedures. The χ² test is used to evaluate the quality of the classification results. The algorithm was checked on random spectra with preset parameters and on real mutational spectra. Of the 19 real mutational spectra analysed, 17 can be divided into two or more classes of sites, one of which contains mutation hotspots. For the G:C → A:T mutational spectra induced by SN1 alkylating mutagens (11 spectra), the classification accuracy was 0.95. To test different site volumes, each SN1-induced spectrum was divided into G → A and C → T spectra; the classification accuracy for these spectra was 0.96. Analysis of the classification errors suggests that at least some of them cannot be ascribed to faults of the algorithm but are caused by special features of the mutagenesis itself. The results for real data agree well with existing knowledge. The approach we present is an attempt to formalize the concept of a "mutational hotspot". The program implementing the SEM algorithm is available on the Web server (http://www.itba.mi.cnr.it/webmutation).
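
The iterative separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' program: the function names, the initialisation, the fixed iteration count and the handling of empty classes are all assumptions made for illustration. It also assumes that every site shares the same number of trials n (taken here as the total number of mutations in the spectrum) and that the number of classes is chosen in advance. Each pass estimates the class membership probabilities of every site, draws one class per site at random (simulation), and re-estimates the class weights and mutation probabilities from that draw (maximization).

```python
import math
import random


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k mutations in n trials."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp so that log() stays finite
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)


def posteriors(counts, n, weights, probs):
    """Estimation step: posterior class probabilities for every site."""
    rows = []
    for k in counts:
        logs = [math.log(w) + binom_logpmf(k, n, p)
                for w, p in zip(weights, probs)]
        top = max(logs)  # subtract the maximum to avoid underflow
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        rows.append([u / total for u in unnorm])
    return rows


def sem_binomial_mixture(counts, n, n_classes=2, n_iter=200, seed=0):
    """Hypothetical SEM loop for a mixture of binomial classes."""
    rng = random.Random(seed)
    # Assumed initialisation: spread class probabilities over the observed range.
    lo, hi = min(counts) / n, max(counts) / n
    probs = [lo + (c + 0.5) * (hi - lo) / n_classes for c in range(n_classes)]
    weights = [1.0 / n_classes] * n_classes
    classes = range(n_classes)
    for _ in range(n_iter):
        post = posteriors(counts, n, weights, probs)
        # Simulation step: draw one class label per site from its posterior.
        labels = [rng.choices(classes, weights=row)[0] for row in post]
        sizes = [labels.count(c) for c in classes]
        if min(sizes) == 0:
            continue  # assumption: skip the update when a class is empty
        # Maximization step: re-estimate parameters from the drawn partition.
        weights = [s / len(counts) for s in sizes]
        probs = [sum(k for k, z in zip(counts, labels) if z == c) / (n * sizes[c])
                 for c in classes]
    # Final classification: each site goes to its most probable class.
    post = posteriors(counts, n, weights, probs)
    labels = [max(classes, key=lambda c: row[c]) for row in post]
    return weights, probs, labels


# Made-up spectrum: mutation counts per site, with three obvious hotspots.
counts = [1, 0, 2, 1, 3, 0, 1, 2, 0, 1, 25, 30, 1, 2, 28, 0, 1]
n = sum(counts)
weights, probs, labels = sem_binomial_mixture(counts, n)
print(labels)
```

In this made-up spectrum the three sites with 25, 30 and 28 mutations should end up together in the class with the higher estimated mutation probability, which matches the informal notion of a hotspot class. A goodness-of-fit check such as the χ² test mentioned in the abstract would be a separate step and is not shown here.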

The subclass approach for mutational spectrum analysis: Application of the SEM algorithm

Milanesi L
1998

Istituto di Tecnologie Biomediche - ITB

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/257227