CNR Institutional Research Information System

Clustering protein structures with Hadoop

G. Paschina; L. Roverelli; D. D'Agostino; F. Chiappori; I. Merelli

2016

Abstract

Machine learning is a widely used technique in structural biology, since the analysis of large conformational ensembles originating from single protein structures (e.g. derived from NMR experiments or molecular dynamics simulations) can be approached by partitioning the original dataset into sensible subsets, revealing important structural and dynamic behaviours. Clustering is a good unsupervised approach for dealing with these ensembles of structures, in order to identify stable conformations and driving characteristics shared by the different structures. A common problem of applications that implement protein clustering is performance scalability, in particular when loading data into memory. In this work we show how it is possible to improve the parallel performance of the GROMOS clustering algorithm by using Hadoop. The preliminary results show the validity of this approach, providing a hint for future development in this field.
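The abstract does not describe the GROMOS procedure or how the work is divided under Hadoop. What follows is a minimal single-machine Python sketch, assuming the commonly described form of GROMOS clustering (as offered, for example, by GROMACS `gmx cluster -method gromos`): structures are compared by RMSD after optimal superposition, the structure with the most neighbours within a cutoff becomes a cluster centre, it and its neighbours are removed, and the process repeats. The neighbour search is split into row blocks to mimic independent map tasks. The function names, the block split and the cutoff value are illustrative assumptions, not the authors' Hadoop implementation.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) coordinate arrays after optimal
    superposition (Kabsch algorithm, computed from the SVD of the covariance)."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]  # forbid reflections: allow proper rotations only
    e = (a * a).sum() + (b * b).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(e, 0.0) / len(a)))


def map_neighbours(frames, rows, cutoff):
    """Hypothetical 'map' task: for each frame index in `rows`, emit the
    set of other frames lying within `cutoff` RMSD of it."""
    return [
        (i, {j for j in range(len(frames))
             if j != i and kabsch_rmsd(frames[i], frames[j]) <= cutoff})
        for i in rows
    ]


def gromos(neighbours):
    """Greedy GROMOS-style clustering from a dict: frame -> set of neighbours."""
    remaining = set(neighbours)
    clusters = []
    while remaining:
        # Centre = unassigned frame with the most unassigned neighbours
        # (max() keeps the first maximum, so ties go to the lowest index).
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i] & remaining))
        members = ({centre} | neighbours[centre]) & remaining
        clusters.append((centre, sorted(members)))
        remaining -= members
    return clusters


def cluster_ensemble(frames, cutoff, n_blocks=4):
    """Split the neighbour search into row blocks (one per hypothetical map
    task), merge the partial results, then run the sequential GROMOS step."""
    idx = list(range(len(frames)))
    blocks = [idx[k::n_blocks] for k in range(n_blocks)]
    neighbours = {}
    for block in blocks:  # on a cluster these blocks could run independently
        neighbours.update(map_neighbours(frames, block, cutoff))
    return gromos(neighbours)
```

In this sketch the all-pairs RMSD step is the quadratic, memory-hungry part, and each block only needs read access to the frames, which is why it is the natural candidate for data-parallel execution. The greedy centre selection stays sequential but works only on the much smaller neighbour lists. For simplicity each pair is evaluated twice; a real implementation would likely exploit symmetry.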

Year: 2016
Organizational units: Istituto di Matematica Applicata e Tecnologie Informatiche (IMATI); Istituto di Tecnologie Biomediche (ITB)
Language(s): English
External supervisors and coordinators: Angelini C., Rancoita P., Rovetta S.
Volume title: Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015
Conference title: Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB)
Pages: 141-153
ISBN: 978-3-319-44332-4
DOI: https://dx.doi.org/10.1007/978-3-319-44332-4_11
URL: http://link.springer.com/chapter/10.1007/978-3-319-44332-4_11
Publisher: Springer International Publishing
Publisher city: Switzerland
Publisher country: Switzerland
Refereed: Yes, type not specified
Conference dates: 10-12 September 2015
Conference venue: Naples, Italy
Keywords: Hadoop Clustering; protein structures; Molecular dynamics; Data parallel
Scopus ID: 2-s2.0-84981290238
Number of authors: 5
Full text: restricted
All authors: Paschina, G; Roverelli, L; D'Agostino, D; Chiappori, F; Merelli, I
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Project identifier:
Project title: Methods for Integrated analysis of Multiple Omics datasets
Acronym: MIMOMICS
Funding: FP7
Contract no.: 305280
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
prod_357543-doc_130978.pdf (authorized users only)	3.05 MB	Adobe PDF

Description: Clustering Protein Structures with Hadoop. Type: publisher's version (PDF).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/320436
