Nowadays, many critical services are provided by complex distributed systems which are the result of the reuse and integration of a large number of components. Given their multi-context nature, these components are, in general, not designed to achieve high dependability by themselves, thus their behavior with respect to faults can be the most disparate. Nevertheless, it is paramount for these kinds of systems to beable to survive failures of individual components, as well as attacks and intrusions, although with degraded functionalities. To provide control capabilities over unanticipated events, we focus on fault handling strategies, particularly on system's reconfiguration. The paper describes a framework which provides fault tolerance of components based applications by detecting failures through monitoring and by recovering through system reconfiguration. The framework is based on Lira, an agent distributed infrastructure for remote control and reconfiguration, and a decision maker for selecting suitable new configurations. Lira allows for monitoring and reconfiguration at components and applications level, while decisions are taken following the feedbacks provided by the evaluation of statistical Petri net models.

A framework for reconfiguration-based fault-tolerance in distributed systems

2004

Abstract

Nowadays, many critical services are provided by complex distributed systems which are the result of the reuse and integration of a large number of components. Given their multi-context nature, these components are, in general, not designed to achieve high dependability by themselves, thus their behavior with respect to faults can be the most disparate. Nevertheless, it is paramount for these kinds of systems to beable to survive failures of individual components, as well as attacks and intrusions, although with degraded functionalities. To provide control capabilities over unanticipated events, we focus on fault handling strategies, particularly on system's reconfiguration. The paper describes a framework which provides fault tolerance of components based applications by detecting failures through monitoring and by recovering through system reconfiguration. The framework is based on Lira, an agent distributed infrastructure for remote control and reconfiguration, and a decision maker for selecting suitable new configurations. Lira allows for monitoring and reconfiguration at components and applications level, while decisions are taken following the feedbacks provided by the evaluation of statistical Petri net models.
2004
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
System Reconfiguration
Model-based analysis
Petri Nets
File in questo prodotto:
File Dimensione Formato  
prod_43726-doc_124682.pdf

solo utenti autorizzati

Descrizione: A framework for reconfiguration-based fault-tolerance in distributed systems
Tipologia: Versione Editoriale (PDF)
Dimensione 381.81 kB
Formato Adobe PDF
381.81 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/36586
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 4
social impact