Efficient fault tolerance: an approach to deal with transient faults in multiprocessor architectures

Chiaradonna S; Di Giandomenico F
1994

Abstract

Dynamic error processing approaches are an important mechanism for increasing the reliability of a multiprocessor system while making efficient use of the available resources. To this end, dynamic error processing must be integrated with a fault treatment approach that aims to optimise resource utilisation. In this paper we propose a diagnosis approach that accounts for transient faults, removes units very cautiously, and balances two conflicting requirements. The first is to avoid removing units that have experienced transient faults and can still be useful to the system; the second is to avoid keeping failed units whose use may lead to premature failure of the system. The proposed fault treatment approach is integrated with a mechanism for dynamic error processing into a complete fault tolerance strategy. Reliability analyses based on the Markov approach and an efficiency evaluation performed by simulation are carried out.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Fault tolerance
Software/program verification

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/360140