Securing your system when the roof is on fire with High Availability Clusters

Oct. 17, 2016

Fault-tolerant systems are defined by their ability to keep operating when a component fails. In essence, they must continue processing data no matter the situation, even if the system is on fire. So how do we ensure that data processing continues?

System developers must duplicate the hardware for every critical component of a system and teach the software to re-route the data flow to the alternate hardware once a failure is detected. This brings several challenges, including ensuring that the software reacts only when needed and that it successfully transfers operations to the duplicate hardware.
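One way to picture the "react only when needed" challenge is a heartbeat check that fails over only after several consecutive misses, so a single dropped heartbeat does not trigger a switch. The following is a minimal sketch under stated assumptions, not a description of any particular product: the class name `FailoverRouter`, the unit names, and the threshold of three misses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverRouter:
    """Tracks heartbeats from a primary unit and switches to a standby
    after a run of consecutive misses (illustrative only)."""

    primary: str
    standby: str
    miss_threshold: int = 3  # hypothetical: consecutive misses before failover
    missed: int = 0
    active: str = ""

    def __post_init__(self) -> None:
        # Start by routing everything to the primary unit.
        self.active = self.primary

    def report_heartbeat(self, received: bool) -> str:
        """Record one heartbeat interval and return the unit to route to."""
        if received:
            # Any successful heartbeat clears the miss counter.
            self.missed = 0
        else:
            self.missed += 1
            if self.active == self.primary and self.missed >= self.miss_threshold:
                # Enough consecutive misses: re-route to the duplicate hardware.
                self.active = self.standby
                self.missed = 0
        return self.active
```

In this sketch the router stays on the standby once it has failed over; a real system would also need a policy for failing back and for verifying that the transfer itself succeeded.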

In a High Performance Embedded Computing (HPEC) cluster, there are compute nodes and a cluster manager, also known as the head node. The head node is the connection between the HPEC cluster and the external network. It controls all other devices and eases administration of the compute nodes. Because the cluster manager provisions the nodes, replacing a compute node after a hardware failure is simpler. This reduces the risk of errors and allows a confident node replacement even when the rest of the system may be failing.

While the head node offers us a secure and reliable solution during a hardware failure, the downside remains that the head node is a single point of failure for the entire system.
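A rough back-of-the-envelope calculation shows why a single point of failure matters. The sketch below assumes component failures are independent and uses availability figures that are purely hypothetical, chosen only for illustration. When every part must work, availabilities multiply, so the head node caps the whole system; when any one of several identical units is enough, the combined availability is one minus the chance that all of them fail.

```python
def series_availability(*parts: float) -> float:
    """Availability when every component must work (independence assumed)."""
    total = 1.0
    for availability in parts:
        total *= availability
    return total


def redundant_availability(availability: float, copies: int = 2) -> float:
    """Availability when any one of `copies` identical, independent units suffices."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figures, not measurements of any real system.
head, compute = 0.99, 0.999

# One head node in front of the compute nodes.
single = series_availability(head, compute)

# Two head nodes, either of which can serve the cluster.
paired = series_availability(redundant_availability(head), compute)
```

Under these assumed numbers, `single` is lower than `paired`, because the lone head node drags down the product while a duplicated head node nearly removes its contribution to downtime.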

What is the solution? A high availability setup derived from the High Performance Computing (HPC) world. Download the white paper "HPEC: High Availability by Design" to learn more about:

High Availability clusters
Fault Tolerance Software
HPC applications for HPEC
Cluster Managers
The STONITH process

About the Author

Tammy Carter

Senior Product Manager – OpenHPEC

Tammy Carter is the Senior Product Manager for OpenHPEC products at Curtiss-Wright Defense Solutions, based in Ashburn, Virginia. She has over 20 years of experience designing, developing, and integrating real-time embedded systems in the defense, communications, and medical arenas, and holds an M.S. in Computer Science.
