Analysis of microarray and RNA-sequencing gene expression profiles through clustering and classification techniques
E. Weitschek, G. Fiscon, G. Felici, P. Bertolazzi
2014
Abstract
A large amount of gene expression data is available to bioinformaticians and biologists thanks to major advances in microarray technology and in next-generation sequencing techniques such as RNA-Seq. Several biological databases and repositories provide raw and normalized gene expression profiles through up-to-date online services. Analyzing gene expression profiles from microarray and RNA-Seq samples demands new, efficient methods from statistics and computer science. This report considers two main types of gene expression data analysis: 1) gene clustering and 2) experiment classification. Gene clustering is the detection of groups of genes that present similar expression patterns; several clustering methods can be applied to group similar genes across gene expression experiments. Experiment classification aims to distinguish between two or more classes to which the samples belong (e.g., different cell types, or diseased vs. healthy samples). This work first provides a general introduction to microarray and RNA-Seq technologies. Gene expression profiles are then investigated by means of pattern recognition methods, using data mining techniques such as classification and clustering. Additionally, the integrated software packages Gene Pattern, Gene Expression Logic Analyzer (GELA), and the TM4 software suite, together with other common analysis tools, are illustrated. To demonstrate pattern discovery in gene expression profiles and experiment classification, the software packages are tested on three real case studies: 1) Alzheimer's disease (AD) vs. healthy mice; 2) multiple sclerosis samples; 3) psoriasis tissues. The experiments performed and the techniques described provide an effective overview of the field of gene expression profile classification and clustering through pattern analysis.
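As a minimal illustration of the two tasks described in the abstract, the sketch below groups genes whose expression profiles are strongly correlated and assigns a new sample to a class. It is not the procedure used by Gene Pattern, GELA, or TM4. It assumes Pearson correlation as the similarity measure, single-linkage grouping with a hypothetical threshold of 0.9, and a nearest-centroid rule for classification. All expression values, gene names, and class labels are invented toy data, not data from the case studies.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cluster_genes(profiles, threshold=0.9):
    """Single-linkage grouping (assumed method): genes whose profiles correlate
    at or above `threshold`, directly or through a chain, share a cluster."""
    names = list(profiles)
    parent = {g: g for g in names}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in names:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


def nearest_centroid(train, labels, sample):
    """Assign `sample` to the class whose mean expression vector is closest
    in squared Euclidean distance."""
    centroids = {}
    for cls in set(labels):
        members = [s for s, lab in zip(train, labels) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(sample, centroids[c])),
    )


# Toy gene clustering: rows are genes, columns are four samples (hypothetical values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 7.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}

# Toy experiment classification: rows are samples, columns are two genes.
train = [[1.0, 5.0], [1.2, 4.8], [3.0, 2.0], [3.2, 1.8]]
labels = ["healthy", "healthy", "AD", "AD"]

if __name__ == "__main__":
    print(cluster_genes(profiles))
    print(nearest_centroid(train, labels, [2.9, 2.1]))
```

On this toy input, geneA/geneB and geneC/geneD each rise together across samples, so they form two clusters, and the new sample lies much closer to the "AD" centroid. Real microarray or RNA-Seq data would normally be normalized (and, for RNA-Seq counts, log-transformed) before such distances are computed.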