Refine
Year of publication
- 1999 (2) (remove)
Document Type
- Preprint (2)
Language
- English (2)
Has Fulltext
- yes (2)
Faculty / Organisational entity
We present a new software architecture in which all concepts necessary to achieve fault tolerance can be added to an appli- cation automatically without any source code changes. As a case study, we consider the problem of providing a reliable service despite node failures by executing a group of replicat- ed servers. Replica creation and management as well as fail- ure detection and recovery are performed automatically by a separate fault tolerance layer (ft-layer) which is inserted be- tween the server application and the operating system kernel. The layer is invisible for the application since it provides the same functional interface as the operating system kernel, thus making the fault tolerance property of the service completely transparent for the application. A major advantage of our ar- chitecture is that the layer encapsulates both fault tolerance mechanisms and policies. This allows for maximum flexibility in the choice of appropriate methods for fault tolerance with- out any changes in the application code.
We describe a technique to make application programs fault tolerant. This techADnique is based on the concept of checkpointing from an active program to one ormore passive backup copies which serve as an abstraction of stable memory. Ifthe primary copy fails, one of the backup copies takes over and resumes processADing service requests. After each failure a new backup copy is created in order torestore the replication degree of the service. All mechanisms necessary to achieveand maintain fault tolerance can be added automatically to the code of a nonADfaulttolerant server, thus making fault tolerance completely transparent for the applicaADtion programmer.