Fault-tolerance in HLA-based distributed simulations
KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
2006 (English). Licentiate thesis, comprehensive summary (Other scientific)
Abstract [en]

Successful integration of simulations within the Network-Based Defence (NBD), specifically the use of simulations within Command and Control (C2) environments, imposes a number of requirements. Simulations must be reliable and able to respond in a timely manner; otherwise the commander will have no confidence in using simulation as a tool. An important aspect of these requirements is the provision of fault-tolerant simulations in which failures are detected and resolved in a consistent manner. Given the distributed nature of many military simulation systems, services for fault-tolerance in distributed simulations are desirable. The main architecture for distributed simulations within the military domain, the High Level Architecture (HLA), does not provide support for the development of fault-tolerant simulations.

A common approach to fault-tolerance in distributed systems is check-pointing. In this approach, states of the system are persistently stored throughout its operation. If a failure occurs, the system is restored from a previously saved state. Given the above-mentioned shortcomings of the HLA standard, this thesis explores the development of fault-tolerance mechanisms in the context of the HLA. More specifically, the design, implementation and evaluation of fault-tolerance mechanisms based on check-pointing are described and discussed.
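The check-pointing idea described above can be illustrated with a minimal sketch. It is not the mechanism developed in the thesis and does not use any HLA API; the class `CheckpointedCounter`, its methods, and the file name `checkpoint.json` are hypothetical names chosen for illustration. The sketch assumes a single process whose state fits in a small JSON document.

```python
import json
import os
import tempfile


class CheckpointedCounter:
    """Toy stateful component standing in for a simulation participant (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.state = {"step": 0, "total": 0}

    def advance(self, value):
        # One unit of simulated work that changes the state.
        self.state["step"] += 1
        self.state["total"] += value

    def save_checkpoint(self):
        # Write to a temporary file, then rename it over the target, so a
        # crash in the middle of a write cannot damage the last good checkpoint.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(self.state, handle)
        os.replace(tmp_path, self.path)

    def restore(self):
        # Load the most recently saved state from persistent storage.
        with open(self.path) as handle:
            self.state = json.load(handle)


def run_demo():
    workdir = tempfile.mkdtemp()
    sim = CheckpointedCounter(os.path.join(workdir, "checkpoint.json"))
    sim.advance(5)
    sim.advance(7)
    sim.save_checkpoint()
    sim.advance(100)  # work done after the checkpoint is lost on failure

    # Simulated failure: a fresh instance recovers from the saved state.
    recovered = CheckpointedCounter(sim.path)
    recovered.restore()
    return recovered.state


if __name__ == "__main__":
    print(run_demo())
```

The trade-off visible even in this toy version is that work performed after the last checkpoint must be redone after a failure, so checkpoint frequency balances storage overhead against the amount of lost work.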

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. viii, 40 p.
Series
Trita-ICT-ECS AVH, ISSN 1653-6363 ; 06:03
Keyword [en]
HLA, fault-tolerance, distributed simulations, federate, federation
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-4063
OAI: oai:DiVA.org:kth-4063
DiVA: diva2:10600
Presentation
2006-06-13, Sal D, KTH-Forum, Isafjordsgatan 39, plan 4, Kista, 14:15
Note
QC 20101111
Available from: 2006-06-25 Created: 2006-06-25 Last updated: 2010-11-11 Bibliographically approved
List of papers
1. NetSim: A Network-based Environment for Modelling and Simulation
2004 (English). In: Proceedings of SimSafe Conference, 2004. Conference paper, Published paper (Refereed)
Abstract [en]

Modelling and Simulation (M&S) is a powerful tool that is used to support training and analysis of military operations, development of military concepts and gradually, it is becoming an integral part of modern C3I systems. As the web has evolved, new ways of carrying out modelling and simulation and realizing C3I systems have emerged. These achievements address some of the research issues considered vital for future development of the M&S/C3I domain. Firstly, web related technologies provide means of overcoming the interoperability barriers, for example through standardized data exchange formats (such as XML), platform independent software (for example Java) and shared knowledge of a domain (semantics). Secondly, networked environments offer ways of setting up virtual organisations, sharing common goals and interests, to efficiently collaborate in problem solving. Finally, computer networks promote efficient sharing of resources, which for example could increase the reuse of existing models or utilize idle processing capacity of computers.

At the Swedish Defence Research Agency (FOI) there is ongoing research targeting the role of network/web-based technologies in M&S, to support defence communities in their work. Our vision comprises an environment supporting the entire M&S process, including conceptualization, scenario definition, design, development and execution. All these tasks should be maintained by a framework for collaboration, which lets users (developers, analysts, administrators, etc.) jointly work on a project. During the first phase of this research, the focus has been on efficient resource sharing and means of collaboration. Through experimental research and implementation of a prototype (NetSim), methods and techniques have been identified to form a framework for collaborative work, resource management and distributed execution.

Following current trends in the development of networked applications, decentralized (Peer-to-Peer) solutions were the primary focus when implementing the prototype. Based on the open-source Peer-to-Peer platform JXTA, two distinct components of our envisioned system were implemented: a decentralized resource management system that deploys a network of workstations for execution of HLA federations, and a collaborative environment for joint modelling of federations. Our results show that the utilization of Peer-to-Peer concepts for resource sharing and collaboration is favourable in terms of scalability, robustness and fault tolerance. The technology allows formation of virtual organisations without the need for intermediate resources such as centralized and powerful servers. However, some aspects of our implementation temporarily rely on central control, thereby diminishing the benefits of the Peer-to-Peer paradigm. Future research will therefore address distributed algorithms for synchronisation of collaborative work and a more flexible and extendable approach to resource management. Furthermore, as many studies have pointed out before, one of the great challenges of any type of Peer-to-Peer system is discovery and matching of resources. This is an area that deserves great attention when planning for the next generation of C3I/M&S tools.

National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-8512 (URN)
Conference
NATO Modeling and Simulation Group, Symposium on C3I and M&S Interoperability, Antalya, Turkey, October 2003.
Note
QC 20100830
Available from: 2008-05-26 Created: 2008-05-26 Last updated: 2010-12-22 Bibliographically approved
2. Peer-to-peer-based resource management in support of HLA-based distributed simulations
2004 (English). In: Simulation (San Diego, Calif.), ISSN 0037-5497, E-ISSN 1741-3133, Vol. 80, no 4-5, 181-190 p. Article in journal (Refereed) Published
Abstract [en]

In recent years, the concept of peer-to-peer computing has gained renewed interest for sharing resources within and between organizations or individuals. This article describes a decentralized resource management system (DRMS) that uses a network of workstations for the execution and storage of high-level architecture (HLA) federations/federates in a peer-to-peer environment. The implementation of DRMS is based on the open-source project JXTA, which represents an attempt to standardize the peer-to-peer domain. DRMS is part of a Web-based simulation environment supporting collaborative design, development, and execution of HLA federations. This study evaluates the possibilities of using peer-to-peer technology for increasing the reuse and availability of simulation components within the defense modelling and simulation community. More specifically, it addresses the necessary adjustments of simulation components to conform to the requirements of the DRMS and shows that JXTA could provide the foundation for a distributed system that increases the possibilities for reusing simulation components.

Keyword
peer-to-peer environment, JXTA, high-level architecture (HLA), federation management, resource management
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-26062 (URN)
10.1177/0037549704045050 (DOI)
000222622100002 ()
2-s2.0-3342915765 (Scopus ID)
Note
QC 20101111
Available from: 2010-11-11 Created: 2010-11-11 Last updated: 2017-12-12 Bibliographically approved
3. A framework for fault-tolerance in HLA-based distributed simulations
2005 (English). In: Proceedings of the 2005 Winter Simulation Conference, 2005, 1182-1189 p. Conference paper, Published paper (Refereed)
Abstract [en]

The widespread use of simulation in future military systems depends, among other things, on the degree of reuse and availability of simulation models. Simulation support in such systems must also cope with failures in software or hardware. Research in fault-tolerant distributed simulation, especially in the context of the High Level Architecture (HLA), has been quite sparse, and the HLA standard itself does not cover fault-tolerance extensively. This paper describes a framework, named Distributed Resource Management System (DRMS), for robust execution of federations. The implementation of the framework is based on Web Services and Semantic Web technology, and provides fundamental services and a consistent mechanism for describing the resources managed by the environment. To evaluate the proposed framework, a federation has been developed that utilizes a time-warp mechanism for synchronization. In this paper, we describe our approach to fault tolerance and give an example to illustrate how DRMS behaves when it faces faulty federates.

Keyword
Computer architecture, Computer simulation, Computer software, Distributed computer systems, Military operations, Robustness (control systems), Semantics, World Wide Web, Distributed Resource Management System (DRMS), High Level Architecture (HLA), Semantic Web technology, Time-warp mechanism
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-8514 (URN)
10.1109/WSC.2005.1574375 (DOI)
000236253401068 ()
2-s2.0-33751568837 (Scopus ID)
0-7803-9519-0 (ISBN)
Note
QC 20100830
Available from: 2008-05-26 Created: 2008-05-26 Last updated: 2010-11-11 Bibliographically approved
4. Evaluation of a Fault-Tolerance Mechanism for HLA-Based Distributed Simulations
2006 (English). In: 20th Workshop on Principles of Advanced and Distributed Simulation, PADS 2006: Singapore; 24 May 2006 through 26 May 2006, 2006, 175-182 p. Conference paper, Published paper (Refereed)
Abstract [en]

Successful integration of Modeling and Simulation (M&S) in the future Network-Based Defence (NBD) depends, among other things, on providing fault-tolerant (FT) distributed simulations. This paper describes a framework, named Distributed Resource Management System (DRMS), for robust execution of simulations based on the High Level Architecture. More specifically, a mechanism for FT in simulations synchronized according to the time-warp protocol is presented and evaluated. The results show that utilization of the FT mechanism, in a worst-case scenario, increases the total number of generated messages by 68% if one fault occurs. When the FT mechanism is not utilized, the same scenario shows an increase in total number of generated messages by 90%. Considering the worst-case scenario a plausible requirement on an M&S infrastructure of the NBD, the overhead caused by the FT mechanism is considered acceptable.
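To put the two reported figures side by side, the following worked comparison may help. It assumes, which the abstract does not state explicitly, that both percentages are measured against the same fault-free baseline of $N$ generated messages; the symbols $N$, $M_{\mathrm{FT}}$ and $M_{\mathrm{noFT}}$ are introduced here for illustration only.

```latex
\begin{align*}
M_{\mathrm{FT}}   &= (1 + 0.68)\,N = 1.68\,N, \\
M_{\mathrm{noFT}} &= (1 + 0.90)\,N = 1.90\,N, \\
\frac{M_{\mathrm{FT}}}{M_{\mathrm{noFT}}} &= \frac{1.68}{1.90} \approx 0.88 .
\end{align*}
```

Under that assumption, recovering with the FT mechanism would generate roughly 12% fewer messages than handling the same single fault without it.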

Keyword
Computer architecture, Computer simulation, Distributed computer systems, Information management, Network protocols, Resource allocation, Distributed Resource Management System (DRMS), Network-Based Defence (NBD), Time-warp protocols
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-24253 (URN)
10.1109/PADS.2006.18 (DOI)
2-s2.0-33751436352 (Scopus ID)
0-7695-2587-3 (ISBN)
Note
QC 20100830
Available from: 2010-08-30 Created: 2010-08-30 Last updated: 2010-11-11 Bibliographically approved

Author
Eklöf, Martin