In scientific computing, systems often manage computations that require continuous acquisition of of satellite data and the management of large databases, as well as the execution of analysis software and simulation models (e.g. Monte Carlo or molecular dynamics cell simulations) which may require several weeks of continuous run. These systems, consequently, should ensure the continuity of operation even in case of serious faults. HAVmS (High Availability Virtual machine System) is a highly available, "fault tolerant" system with zero downtime in case of fault. It is based on the use of Virtual Machines and implemented by two servers with similar characteristics. HAVmS, thanks to the developed software solutions, is unique in its kind since it automatically failbacks once faults have been fixed. The system has been designed to be used both with professional or inexpensive hardware and supports the simultaneous execution of multiple services such as: web, mail, computing and administrative services, uninterrupted computing, data base management. Finally the system is cost effective adopting exclusively open source solutions, is easily manageable and for general use.
HAVmS Highly Available Virtual machine Computer System Fault Tolerant with Automatic Failback and close to zero downtime
Carlo Gaibisso;
2014
Abstract
In scientific computing, systems often manage computations that require continuous acquisition of of satellite data and the management of large databases, as well as the execution of analysis software and simulation models (e.g. Monte Carlo or molecular dynamics cell simulations) which may require several weeks of continuous run. These systems, consequently, should ensure the continuity of operation even in case of serious faults. HAVmS (High Availability Virtual machine System) is a highly available, "fault tolerant" system with zero downtime in case of fault. It is based on the use of Virtual Machines and implemented by two servers with similar characteristics. HAVmS, thanks to the developed software solutions, is unique in its kind since it automatically failbacks once faults have been fixed. The system has been designed to be used both with professional or inexpensive hardware and supports the simultaneous execution of multiple services such as: web, mail, computing and administrative services, uninterrupted computing, data base management. Finally the system is cost effective adopting exclusively open source solutions, is easily manageable and for general use.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.