Fault-tolerance in HLA-based distributed simulations

Sammanfattning: Successful integration of simulations within the Network-Based Defence (NBD), specifically use of simulations within Command and Control (C2) environments, enforces a number of requirements. Simulations must be reliable and be able to respond in a timely manner. Otherwise the commander will have no confidence in using simulation as a tool. An important aspect of these requirements is the provision of fault-tolerant simulations in which failures are detected and resolved in a consistent manner. Given the distributed nature of many military simulations systems, services for fault-tolerance in distributed simulations are desirable. The main architecture for distributed simulations within the military domain, the High Level Architecture (HLA), does not provide support for development of fault-tolerant simulations. A common approach for fault-tolerance in distributed systems is check-pointing. In this approach, states of the system are persistently stored through-out its operation. In case a failure occurs, the system is restored using a previously saved state. Given the abovementioned shortcomings of the HLA standard this thesis explores development of fault-tolerant mechanisms in the context of the HLA. More specifically, the design, implementation and evaluation of fault-tolerance mechanisms, based on check-pointing, are described and discussed.