Fail-safeness in a multiprocessor system. A distributed strategy based on backward error recovery

1984

Abstract

A method for fault handling is presented, designed for multiprocessor systems supporting concurrent processes that co-operate through message exchange. The proposal is described with reference to a specific system, the MuTEAM prototype developed in Pisa. The requirement was that the system should generate no erroneous output under a single-fault hypothesis. The fault-handling model adopted is based on backward error recovery. The set of all application processes is partitioned into disjoint subsets (called families), which represent the atomic units of recovery. Recovery points are established on communications among families. A single consistent recovery line is maintained, thereby avoiding the domino effect. The model does not rely on mass storage devices; rather, the recovery information for all the processes is kept in the distributed main memory of the system.
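The idea of a single recovery line refreshed at inter-family communications can be illustrated with a toy sketch. This is not the MuTEAM protocol: the names `Family`, `System`, `send`, and `recover` are hypothetical, a plain in-process dictionary stands in for the distributed main memory, and taking a full snapshot at every inter-family message is a simplifying assumption made only for illustration.

```python
import copy


class Family:
    """Hypothetical atomic unit of recovery: a group of processes and their state."""

    def __init__(self, name, state):
        self.name = name
        self.state = state   # live state of the processes in this family
        self.inbox = []      # messages received from other families


class System:
    """Toy model keeping one recovery line for all families, held in memory."""

    def __init__(self, families):
        self.families = {f.name: f for f in families}
        self.recovery_line = self._snapshot()

    def _snapshot(self):
        # Copy the state of every family at once, so the saved line is
        # consistent across families by construction.
        return {
            name: (copy.deepcopy(f.state), copy.deepcopy(f.inbox))
            for name, f in self.families.items()
        }

    def send(self, src, dst, msg):
        # src is kept only for readability of the call site.
        self.families[dst].inbox.append(msg)
        # Assumption: a recovery point is established on each
        # inter-family communication, replacing the previous line.
        self.recovery_line = self._snapshot()

    def recover(self):
        # Backward error recovery: every family returns to the same line,
        # so no family is forced to roll back further (no domino effect).
        for name, (state, inbox) in self.recovery_line.items():
            family = self.families[name]
            family.state = copy.deepcopy(state)
            family.inbox = copy.deepcopy(inbox)


a = Family("A", {"x": 1})
b = Family("B", {"y": 2})
system = System([a, b])

a.state["x"] = 5
system.send("A", "B", "hello")   # recovery line refreshed here

a.state["x"] = 99                # work done after the recovery point
b.state["y"] = -1                # simulated erroneous state
system.recover()
```

After `recover()`, both families hold the state they had when the message was exchanged, and the message itself is still in the receiver's inbox, because sender and receiver were saved together.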
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
fail-safeness
backward error recovery
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/375832