A fault treatment approach to support dynamic redundancy in multiprocessor architectures
Chiaradonna S; Di Giandomenico F
1994
Abstract
Dynamic error processing approaches use an amount of redundancy that depends on whether faults occur, and therefore incur low overhead in the absence of faults. They are an important mechanism for increasing reliability in a multiprocessor system while making efficient use of the available resources. To this end, dynamic error processing must be integrated with a fault treatment approach aimed at optimising resource utilisation. In this paper we propose a diagnosis approach that, accounting for transient faults, removes units very cautiously and tries to balance two conflicting requirements. The first is to avoid removing units that have experienced transient faults and can still be useful to the system; the second is to avoid keeping failed units in the system, whose continued use may lead to premature failure of the system itself. The reliability loss due to the latent-fault phenomenon introduced by the diagnosis measures is then evaluated through a Markov approach. Finally, a dynamic error processing mechanism able to address the latent-fault phenomenon and the proposed fault treatment approach are integrated into a complete fault tolerance strategy. The paper also contains a simulation-based efficiency evaluation of the proposed fault tolerance strategy, compared with more classical strategies.
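The tension between removing units hit by transient faults and keeping units that have actually failed can be illustrated with a simple error-scoring heuristic. The sketch below is hypothetical and is not taken from the paper: `UnitDiagnosis`, `decay`, `threshold` and `first_removal` are illustrative names, and the paper's actual diagnosis rule may differ. Each detected error raises a unit's score, each error-free round shrinks it, and the unit is removed only once the score reaches a threshold.

```python
from dataclasses import dataclass


@dataclass
class UnitDiagnosis:
    """Hypothetical per-unit error score used to decide whether to remove a unit.

    decay: factor in (0, 1) applied after an error-free round, so isolated
           (transient) errors are gradually forgotten.
    threshold: score at or above which the unit is treated as permanently failed.
    """
    decay: float = 0.5
    threshold: float = 3.0
    score: float = 0.0

    def observe(self, error_detected: bool) -> bool:
        """Update the score after one round; return True if the unit should be removed."""
        if error_detected:
            self.score += 1.0          # every detected error counts against the unit
        else:
            self.score *= self.decay   # error-free rounds let old errors fade
        return self.score >= self.threshold


def first_removal(pattern, decay=0.5, threshold=3.0):
    """Return the 1-based round at which the unit is removed, or None if it is kept."""
    unit = UnitDiagnosis(decay=decay, threshold=threshold)
    for round_no, error in enumerate(pattern, start=1):
        if unit.observe(error):
            return round_no
    return None


# Sporadic errors (transient-like) never reach the threshold: the unit is kept.
print(first_removal([True, False, False, True, False, False]))  # None
# Errors in consecutive rounds (permanent-like) lead to removal at round 3.
print(first_removal([True, True, True]))                        # 3
```

Lowering `threshold` (for example to `1.0`) removes the unit on its very first error, discarding units that were only transiently disturbed; raising it keeps failed units in service longer, exposing the system to the latent-fault risk the abstract describes.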
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.