Towards High-Throughput, Multi-Criteria Protein Structure Comparison and Analysis

Gianluigi Folino;
2010

Abstract

Protein-structure comparison (PSC) is an essential component of biomedical research, as it affects, e.g., drug design, molecular docking, protein folding, and structure-prediction algorithms, and is also essential to the assessment of these predictions. Each of these applications, as well as many others in which molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the multicriteria PSC (MC-PSC) problem. Protein (Structure) Comparison, Knowledge, Similarity, and Information (ProCKSI) (www.procksi.org) provides algorithmic solutions for the MC-PSC problem by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods. The current MC-PSC implementation works well for moderately sized datasets, but it is time-consuming, as it provides a public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. In this paper, we take a key step in that direction by means of a high-throughput distributed reimplementation of ProCKSI for very large datasets. The core of the proposed framework is an innovative distributed algorithm that runs on each compute node in a cluster/grid environment and performs structure comparison of a given subset of the input structures using some of the most popular PSC methods [e.g., universal similarity metric (USM), maximum contact map overlap (MaxCMO), fast alignment and search tool (FAST), distance alignment (DaliLite), combinatorial extension (CE), and template modeling alignment (TMAlign)]. We follow this with a procedure of distributed consensus building.
Thus, the new algorithms proposed here achieve ProCKSI's similarity-assessment quality in a fraction of the time that ProCKSI requires. Our results show that the proposed distributed method can be used efficiently to compare: 1) a particular protein against a very large dataset of protein structures (target-against-all comparison), and 2) a particular very large-scale dataset against itself or against another very large-scale dataset (all-against-all comparison). We conclude the paper by enumerating some of the outstanding challenges for real-time MC-PSC.
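The two comparison modes and the consensus step described in the abstract can be illustrated with a minimal sketch. It assumes that each pairwise comparison is an independent job, that jobs are dealt round-robin to compute nodes, and that every method's score has already been normalized to a similarity in [0, 1]. The plain arithmetic mean used for fusion is a hypothetical stand-in, not ProCKSI's information-fusion procedure, and all function names and structure identifiers are illustrative only.

```python
from itertools import combinations, product
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query structure ID, subject structure ID)


def target_against_all_pairs(target: str, ids: Sequence[str]) -> List[Pair]:
    """Target-against-all: compare one structure with every other structure."""
    return [(target, s) for s in ids if s != target]


def all_against_all_pairs(ids: Sequence[str]) -> List[Pair]:
    """All-against-all within one dataset: every unordered pair, no self-comparisons."""
    return list(combinations(ids, 2))


def dataset_against_dataset_pairs(ids_a: Sequence[str], ids_b: Sequence[str]) -> List[Pair]:
    """All-against-all between two datasets: every (a, b) combination."""
    return list(product(ids_a, ids_b))


def partition(jobs: Sequence[Pair], n_nodes: int) -> List[List[Pair]]:
    """Deal comparison jobs round-robin to n_nodes compute nodes (hypothetical scheme)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    buckets: List[List[Pair]] = [[] for _ in range(n_nodes)]
    for i, job in enumerate(jobs):
        buckets[i % n_nodes].append(job)
    return buckets


def consensus(scores_by_method: Dict[str, Dict[Pair, float]]) -> Dict[Pair, float]:
    """Hypothetical fusion: average each pair's similarity over all methods.

    Assumes every method's score is already normalized to [0, 1], with 1 meaning
    most similar; pairs missing from any method are left out of the consensus.
    """
    per_method = list(scores_by_method.values())
    if not per_method:
        return {}
    common = set(per_method[0]).intersection(*per_method[1:])
    return {pair: mean(scores[pair] for scores in per_method) for pair in common}


if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4"]  # hypothetical structure identifiers
    for node, chunk in enumerate(partition(all_against_all_pairs(ids), 3)):
        print(f"node {node}: {chunk}")
```

Round-robin dealing keeps per-node job counts within one of each other, but it ignores the fact that comparison cost varies with protein length and with the method used, so a practical scheduler might instead balance nodes by estimated cost.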
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/118999