Self-diagnosis of Apemille

1996

Abstract

The APEmille, the third evolution of the APE family of SIMD machines, is structured as a three-dimensional array of processors. In its largest configuration, the number of processors is 4096. Its typical application range covers massive computations (e.g., those needed to solve some problems in physics research), which may require as many as 10^17 floating-point operations. Given the long time needed to complete such jobs, the machine should be able to tolerate the occurrence of multiple faults during the job execution. To this purpose, self-diagnosis capabilities have been incorporated in its design, using an approach inspired by a family of algorithms recently introduced to perform the system-level diagnosis of regular architectures. The machine is partitioned into three subsystems, each structured as a three-dimensional array, which are diagnosed separately using slightly different variants of the same diagnosis algorithm. The system units are tested by means of comparisons, either concurrently with the job execution or during special diagnosis sessions. The strategy to test the units and the diagnosis algorithms are described, and the diagnosis correctness and completeness are evaluated both theoretically and experimentally.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Fault-tolerance
System-level diagnosis
Self-diagnosis
SIMD machines
Grid interconnection

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/366655