By Courtney Howard
Technology focus -- High-performance computing (HPC) is well suited to, and is steadily being adopted for, myriad defense and aerospace applications. HPC currently is playing a role in: training and simulation; on-board systems for navigation, defense, and attack; and command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR), explains Bill Mannel, vice president of product marketing at SGI in Fremont, Calif.
“Pilots ‘fly’ a simulator before they take control of the actual aircraft and tank gunnery crews shoot at simulated enemy armored fighting vehicles (AFVs),” Mannel describes. “Weapons systems are increasingly complex and require lots of interaction with sensors as well as a ‘human in the loop.’
Flight control systems have been digital now for a couple of decades. Intelligence data from multiple spectra and media (radar, visual, signals, emissions, infrared, ultraviolet, etc.) is exploding in volume, requiring HPC systems to analyze quickly and provide decomposition in hours vs. days. Many times the data is gathered in real-time and analyzed on the fly, providing data immediately to warfighters in the middle of missions. Encryption/decryption is frequently carried out today on HPC-class machines, as well.”
The reason HPC is used in so many environments and applications is that it brings the capability to do fast computing with commodity or commercial off-the-shelf (COTS) hardware, says Gilad Shainer, senior director of HPC and technical computing at Mellanox Technologies in Sunnyvale, Calif. “You can put servers with memory and CPUs, connect them together, and build a very large-scale supercomputer to do very sophisticated simulations.”
In such an environment, Shainer credits the connectivity between the compute elements, between the CPUs or GPUs, for setting the efficiencies, speed, and compute power of the system. “Providing the capability to send a lot of data between those compute systems, and do it fast and without too much load on the CPUs (central processing units), is the secret sauce behind clusters,” he says.
GPUs (graphics processing units) and GPGPUs (general-purpose graphics processing units) are making HPC more accessible to more people, says Sumit Gupta, senior marketing manager for the Tesla GPU Computing HPC business unit at Nvidia in Santa Clara, Calif. “Every professor in every university around the world can buy a few GPU-based servers and get the performance to do the science that was otherwise restricted to people who could afford to buy big supercomputers. It’s the democratization of HPC -- that’s what GPUs are really enabling.”
Nvidia Tesla GPUs are employed not only in three of the top five supercomputers in the world (Tianhe-1A, Nebulae, and Tsubame 2.0), but also in systems from various defense and aerospace technology leaders, including BAE Systems, Boeing, Honeywell, NASA, Raytheon, Northrop Grumman, Lockheed Martin, and Thales Group. Nvidia equipment also plays a role in mil-aero research and development at SAIC, Sandia National Laboratories, Los Alamos National Laboratory, Lawrence Livermore National Laboratory, and Oak Ridge National Laboratory.
GPUs complement and accelerate CPUs, and help make applications run faster. “The proof is in these top, large supercomputers. They lead the market in new technology deployment, and then it filters down to the rest of the market,” Gupta continues. “That is why we are seeing such widespread use in so many places. HPC is becoming much more mainstream; it’s not just limited to science and engineering.”
Accelerators have been used in mil-aero applications for quite a while, describes Mannel. In fact, SGI engineers have co-developed a specific accelerator for a variety of mil-aero applications, and have deployed field-programmable gate array (FPGA)-based systems over several years. “Newer accelerators such as cGPUs and other custom accelerators (such as Tilera) are being tested, and in some cases deployed in a variety of mil-aero applications,” he says. “The world is still heavily oriented toward CPUs, with GPUs perhaps bringing new interest in the area of accelerations. Some organizations continue to swear by FPGAs but, generally, the use is not growing because of the complexity of the programming environment and the need to ‘re-port’ code to later generations of FPGAs.”
Diving into data processing
Gupta and his Nvidia colleagues are seeing a lot of activity in key mil-aero areas, including satellite image processing, signal and intelligence processing, and video analytics. A wealth of image and video data is being collected via satellite, unmanned aerial vehicles (UAVs), and other devices. “Just being able to process this data in real time or near real time is a big challenge,” Gupta says. “A lot of people have been talking about how they are drowning in data. The challenge is: You can’t hire enough analysts today to sit in front of these displays, watch videos or images, and be able to respond in real time.” Such applications have to rely on compute-intensive analytics methods and high-performance computing.
“This is the data deluge,” Gupta quips. “Traditional technologies just don’t cut it; they aren’t fast enough. All this processing requires a much higher-performance computing solution. This is where the GPU comes in and accelerates all this processing, whether video, image, or signal data. A lot of mil-aero domains are extremely computationally hungry and HPC is needed.”
UAVs continue to shrink in size, weight, and power (SWaP), yet “the problem with [compact and lightweight vehicles] is they are so small that they get jerked around in the wind a lot and the video they acquire is extremely shaky,” Gupta explains. “It is really hard to determine what is happening.” Ikena software from MotionDSP in Burlingame, Calif., performs real-time video stabilization, so it corrects the video and makes it steady. “They could only do this using GPU technology; they just couldn’t do it with traditional processors,” Gupta says.
Military professionals want to deploy high-performance computing and data-intensive processing in ground vehicles. “They want to be able to get the data in real time, instead of sending the data over satellite to a processing center in Germany, for example. They really are trying to do real-time processing in the field,” Gupta adds.
To that end, GE Intelligent Platforms in Charlottesville, Va., delivers embedded systems that are starting to take advantage of GPUs. “I like to think of them as embedded GPU supercomputers,” Gupta says. “They go in ground vehicles, ships, submarines, airplanes, and even missiles and smaller devices, and they can be used for IED (improvised explosive device) detection, image processing, sonar processing -- all kinds of things in each of these vehicles.
“It brings so much computation right to the vehicle, rather than transferring data to some other place,” Gupta adds. “When you are in a hostile environment, you can’t even transfer that much data because you risk detection and interception of the data, so you want to be able to process the data on the vehicle itself. GE IP has been a player in this space for a long time, but being able to use GPUs has really enabled them to go to the next level.”
Mercury Computer Systems in Chelmsford, Mass., has developed a rugged, mobile radar subsystem with very high performance using two new innovations: a GPGPU product based on the Nvidia Fermi architecture, and a 10 Gigabit Ethernet (10GE) standards-based, real-time sensor interface module, explains Anne Mascarin, solution marketing manager at Mercury Computer Systems.
“These products enable unprecedented levels of SWaP optimization for radar applications through the highest TeraFLOP-per-slot compute performance metric and the highest I/O channel density per slot available in the defense industry today,” Mascarin says. “This extraordinary level of performance is required to meet the stringent demands of modern radar, including the ability to search and track smaller, more numerous, and faster targets in the harshest environments.”
Mercury’s application-ready subsystem is based on a 16-slot OpenVPX chassis, while several Intel-based building block modules provide the system’s signal processing capabilities. Its Ensemble 6000 Series Intel Core i7 Dual Core LDS6520 Module acts as the system single-board computer, performs low-end signal processing tasks, and hosts two I/O mezzanine cards. The Ensemble 6000 Series OpenVPX Intel Xeon Dual Quad-Core HDS6600 Module is a high-density signal and data processing engine, harnessing the latest generation of server-class, Nehalem-based, quad-core Intel Xeon processors. Both the LDS6520 and the HDS6600 feature Protocol Offload Engine Technology (POET), Mercury’s protocol-agnostic, multi-standard, switch-fabric technology for system interconnects.
The sensor interface is performed by the Ensemble IO Mezzanine Series IOM-200, a 4x10 Gigabit Ethernet FPGA XMC card resident on the LDS6520. “Together these properties enable fast system throughput required by radar applications,” Mascarin says. “Much of the radar processing is performed by the Ensemble 6000 Series 6U OpenVPX GSC6200 GPU Processing Module. This Nvidia GPU-based module harnesses the tremendous compute power of GPUs for rugged, high-performance radar, speeding through essential radar algorithms, such as adaptive beam-forming, filtering, and pulse compression. Finally, the entire system is switched by the Ensemble 6000 Series OpenVPX SFM6100 module, which provides full inter-board serial RapidIO and Gigabit Ethernet connections in an OpenVPX system.”
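To make one of the radar algorithms named above more concrete, here is a minimal Python sketch of pulse compression done as a matched filter: the received samples are correlated against the known transmit pulse, and the correlation peak marks the echo delay. The 13-chip Barker code, the delay of 20 samples, and every function name are assumptions chosen for illustration. This is not Mercury's implementation, and a fielded system would more likely run an FFT-based version on the GPU.

```python
# Illustrative matched-filter pulse compression in pure Python.
# BARKER_13, the echo delay, and all names are assumptions for this sketch.

BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def pulse_compress(received, pulse):
    """Slide the known pulse across the received samples and correlate."""
    n, m = len(received), len(pulse)
    return [
        sum(received[lag + k] * pulse[k] for k in range(m))
        for lag in range(n - m + 1)
    ]


def make_echo(pulse, delay, total_len):
    """Build a noise-free received signal holding one delayed echo."""
    signal = [0.0] * total_len
    for k, chip in enumerate(pulse):
        signal[delay + k] = float(chip)
    return signal


received = make_echo(BARKER_13, delay=20, total_len=64)
compressed = pulse_compress(received, BARKER_13)
# The lag with the largest correlation is the estimated echo delay.
peak_lag = max(range(len(compressed)), key=lambda i: compressed[i])
```

The long pulse collapses into a single sharp peak, which is why the technique helps a radar separate closely spaced targets without raising peak transmit power.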
“We are leveraging the high-performance, rugged, and upgradeable aspects of this GPGPU innovation into our next-generation radar subsystems and extending it with massive I/O,” says Didier Thibaud, senior vice president and general manager of the Mercury Computer Systems Advanced Computing Solutions business unit. “Together with our industry-leading rugged OpenVPX Intel modules, these new capabilities enable our SWaP-optimized radar subsystems to ‘do more with less’ so we can help our customers meet the challenges of the modern battlefield.”
BAE Systems Mission Systems officials selected integrated Application Ready Subsystems (ARSs) and system integration services from Mercury for its Advanced Radar Target Indication Situational Awareness and Navigation (ARTISAN) 3D Naval Radar Program.
“Mercury has delivered advanced ARS solutions for critical multifunction radar systems like BAE Systems SAMPSON, a key component of the UK’s Royal Navy’s Type 45 destroyer Sea Viper system,” Thibaud says. Mercury’s ARS solutions also power BAE Systems’ ARTISAN 3D radar, the next generation of medium-range radars for the majority of the U.K. Ministry of Defense Royal Navy surface fleet and future aircraft carriers.
“ARTISAN is designed as a main surveillance and target indication radar for surface vessels, from offshore patrol vessels to major warships,” says Chris Jones, ARTISAN project team leader, BAE Systems Mission Systems in the U.K. “It is critical that the signal processing system not only provide enhanced computing performance, but also a clear upgrade path for technology insertions.”
Mercury’s ARSs for the ARTISAN program combine open-architecture, high-density VXS processing modules, a Serial Front Panel Data Port (sFPDP) sensor interface, and RapidIO-based switch fabric with the MultiCore Plus software suite for multicore processing environments.
High-performance computing is a critical capability for many of today’s radar, sonar, and signals intelligence (SIGINT) applications, says David Pursley, product line manager at Kontron America in Poway, Calif. “Almost any application that needs digital signal processing (DSP) could provide more functionality -- higher speed, finer resolution, and more accuracy -- with more computing horsepower.
“The need for high-end processing for these types of applications has always existed, but until recently, compromises had to be made by giving up performance and/or moving to proprietary architectures. In the past few years, there has been an emergence of technologies, such as GPGPU, and open standards, such as VPX and ATCA, supporting HPC. These advances remove the need for such compromises,” Pursley says.
Kontron has been providing high-end, multiprocessor-based HPC systems for mil-aero applications for years in a variety of architectures, including VME, CompactPCI, ATCA, MicroTCA, and, most recently, VPX, explains Pursley. A recent VPX deployment for a mil-aero HPC application employs 15 Kontron VX6060 dual Intel Core i7 blades in each system. The blades communicate to the external world via Fibre Channel and 10 Gigabit Ethernet; amongst each other, they use Gigabit Ethernet as the control plane and a combination of PCI Express and 10 Gigabit Ethernet as the data plane to enable maximal throughput. The application software’s view of the hybrid communication topology is simplified by use of Kontron VXFabric, a lightweight API that allows high-speed, socket-based communication between blades.
The Kontron VX6060 has two Intel Core i7 CPUs onboard. Each CPU, and in fact each core, can be used for DSP via the Core i7’s SSE4 Streaming SIMD Extensions and/or as an x86 general-purpose processor, Pursley notes.
“For high-end parallel processing, there is no doubt that GPGPUs and FPGAs have a significant role,” Pursley says. “We seem to be integrating boards with OpenCL-based GPGPU processing more often with each passing month. This does not mean that the CPU or GPP is de-emphasized. In fact, we are seeing the opposite: a growing trend toward using one type of CPU blade as both the GPP blade(s) and the signal processing blades.
“The ability to use a single CPU blade to implement both the GPP and DSP blades improves maintainability, simplifies logistics, and reduces the total cost of ownership,” Pursley continues. “In large part, this is possible due to the SSE4 SIMD extensions to the x86 architecture and the prevalence of high-speed PCI Express interconnect on x86 blades.”
HPEC heats up
Curtiss-Wright Controls is focused on high-performance embedded computing (HPEC), “in which compute technology and performance levels typically associated with HPC environments are ruggedized and compactly packaged to make them appropriate for the most challenging deployed radar processing, communications intelligence, signal intelligence, and situational-awareness applications,” describes William Pilaud, HPEC systems architect at Curtiss-Wright Controls in San Diego. “Increasingly, customer requirements for new embedded radar/COMINT (communications intelligence)/SIGINT/situational-awareness systems demand HPC technology levels of performance in order to achieve the real-time requirements for the platforms.”
The company is leveraging advanced commercial HPC technology and development tools designed for use with clustered Intel-based HPC systems to provide HPEC solutions for the military’s next-generation, high-performance systems, Pilaud says. A recent situational awareness system combines clusters of GPGPUs, Intel processor-based single-board computers, Intel digital signal processors, and FPGAs to deliver real-time visual sensor information to helmet displays. The HPEC system aggregates data from the platform’s external visual sensors and delivers the appropriate visual data, based on which direction the individual user is looking, to his helmet/goggles, augmenting his vision with 360-degree awareness.
“This challenging application requires several teraflops of processing and many gigabytes-per-second bandwidth, which was essentially unachievable in a SWaP-constrained platform as recently as just two years ago,” Pilaud affirms. “In recent years, GPGPU technology has become increasingly dense and powerful per watt, making it much more attractive for rugged, embedded, situational-awareness applications, such as this vision system.”
The system uses a combination of GPGPUs, Intel Power Architecture processors, and FPGAs with Serial RapidIO (SRIO) interconnects between all the different modules. “SRIO is an ideal fabric for an HPEC architecture because, while it looks very much like high-speed Ethernet or InfiniBand connections to general-purpose processors, it provides a very low-latency, high-bandwidth interconnect that FPGAs can use to stream data directly into the processors -- representing a significant performance advance in the mil-aero embedded world,” Pilaud says.
The HPEC system’s single-board computers handle the SRIO, its FPGAs handle the unique application algorithms, and the GPGPUs handle the sensor data analysis. It is deployed in multiple rugged racks and comprises eighteen 6U OpenVPX boards, more than half of which are GPGPUs, to deliver more than 7 Teraflops of compute power. Pilaud expects to see Curtiss-Wright Controls’ base HPEC platforms adapted to address many types of mil-aero, SWaP-constrained opportunities, because “they are uniquely able to deliver a huge amount of compute power in a very small space,” he says.
Curtiss-Wright’s approach involves: aggregating and packaging the heterogeneous HPEC environment of single-board computers, FPGAs, and GPGPUs, and taking HPC-type technologies originally developed for enterprise back-office environments and, using COTS techniques, providing their benefits to the warfighter, Pilaud says. “In essence, we will be taking supercomputers that used to literally require rooms full of equipment and integrating them into a 19-inch deployable rack.
“One key difference between the HPC and the HPEC world is that mil-aero applications are much more latency sensitive than in the commercial world,” Pilaud continues. “Our tools focus on the fabric interconnects to the module and verify latency differences at the packet level into the different subsystems in a rack. Another very big difference between HPC and HPEC is that we, at the rack level, synchronize our modules to a common clock, which is not done in HPC, to tightly couple boards together so that time latency differences in the fabrics can be identified.”
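The value of a common rack clock can be shown with a short Python sketch: when every module stamps packets against the same time base, a send stamp from one board and a receive stamp from another can be subtracted directly, and links with unusual latency stand out. The link names, nanosecond values, and tolerance below are all hypothetical; this is a sketch of the idea, not Curtiss-Wright tooling.

```python
# Hypothetical sketch: comparing per-link fabric latency when every module
# stamps packets against one shared rack clock. All names and numbers are
# illustrative assumptions.
from statistics import median


def link_latencies(records):
    """Map link name -> latency in ns from (link, sent_ns, received_ns)."""
    return {link: received - sent for link, sent, received in records}


def outlier_links(latencies, tolerance_ns):
    """Return links whose latency differs from the median by > tolerance."""
    mid = median(latencies.values())
    return sorted(
        link for link, value in latencies.items()
        if abs(value - mid) > tolerance_ns
    )


records = [
    ("sbc0->gpu0", 1_000, 1_450),
    ("sbc0->gpu1", 1_000, 1_470),
    ("fpga0->gpu0", 2_000, 2_460),
    ("fpga0->gpu1", 2_000, 2_900),  # deliberately slow link
]
latencies = link_latencies(records)
slow = outlier_links(latencies, tolerance_ns=100)
```

Without a shared clock, the subtraction in `link_latencies` would mix two unrelated time bases and the comparison would be meaningless.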
Finding an efficient way to move data between the processing nodes poses one of the largest hurdles in HPC. “Whether the HPC nodes are architected as a data pipeline, a mesh, or something in between, the architectural bottleneck often comes down to communication throughput,” Pursley describes. Kontron’s open-infrastructure VXFabric enables hardware-speed, board-to-board communications while abstracting away the low-level implementation of the hardware interface. “For example, VXFabric allows the use of today’s ubiquitous high-speed interface, PCI Express, in arbitrary communication topologies via a simple, lightweight socket-based API.”
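For readers unfamiliar with socket-style messaging, the following minimal Python sketch shows the general pattern: a sender writes a length header followed by the payload, and the receiver reads exactly that many bytes. It uses Python's standard `socket` module with a local socket pair standing in for two blades. It does not use or reproduce the VXFabric API, and the function names and payload are assumptions.

```python
# Hypothetical sketch of length-prefixed messaging over a stream socket.
# It uses Python's standard socket module, not the VXFabric API; a local
# socketpair stands in for two blades.
import socket
import struct


def send_frame(sock, payload):
    """Send a 4-byte big-endian length header followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed before frame completed")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one frame written by send_frame."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


blade_a, blade_b = socket.socketpair()
try:
    send_frame(blade_a, b"range-doppler block 0")
    received = recv_frame(blade_b)
finally:
    blade_a.close()
    blade_b.close()
```

The appeal of a socket abstraction is that application code like this stays the same whether the bytes travel over Ethernet or, as described above, over PCI Express.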
“Over the past five years, the amount of data coming from sensors, high-speed video, radar installations, etc. has exploded,” says Ken Owens, CEO of Conduant Corp., a high-performance, high-speed data recording and storage company in Longmont, Colo. High-performance computing creates challenges, as data rates increase from megabytes to gigabytes per second. Conduant engineers design with newer-generation FPGAs and depend on the latest bus architectures, such as PCI Express Gen 2, to meet performance hurdles.
The company is shipping equipment that is able to record data streams at rates that approach 1 gigabyte per second, as well as designing systems that use multiple recorder subsystems to attack multi-stream applications where the aggregate data recording and playback requirements exceed 250 gigabits per second. “Our systems are used by mil-aero users to drive simulation test beds and capture critical radar traces, and as important building blocks in high-speed video capture and playback systems,” Owens mentions. “Working with this spectrum of applications has given us the opportunity to design interface products to support sFPDP, LVDS, and cabled PCI Express. Gone are the days of FPDP and MIL-standard 1553.”
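A back-of-envelope calculation puts those figures in perspective. The Python sketch below assumes decimal units and rounds each recorder up to a flat 1 gigabyte per second, which is an assumption based on the "approach 1 gigabyte per second" figure rather than a Conduant specification; the function names are illustrative.

```python
# Back-of-envelope sizing. The 1 GB/s per-recorder rate is an assumption
# taken from "approach 1 gigabyte per second"; decimal units are assumed.
import math

BITS_PER_BYTE = 8


def recorders_needed(aggregate_gbit_per_s, per_recorder_gbyte_per_s):
    """Smallest number of recorders whose combined rate meets the aggregate."""
    aggregate_gbyte_per_s = aggregate_gbit_per_s / BITS_PER_BYTE
    return math.ceil(aggregate_gbyte_per_s / per_recorder_gbyte_per_s)


def storage_tbyte(gbyte_per_s, seconds):
    """Capacity in decimal terabytes needed to record for the duration."""
    return gbyte_per_s * seconds / 1000


count = recorders_needed(250, 1.0)    # 250 Gb/s is 31.25 GB/s
capacity = storage_tbyte(31.25, 60)   # one minute at the aggregate rate
```

Under these assumptions, a 250-gigabit-per-second aggregate stream works out to 32 recorder subsystems running in parallel, and a single minute of capture fills nearly 2 terabytes.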
HPC applications can operate on large datasets spread across dozens, hundreds, or thousands of computing nodes, and can result in data starvation, whereby one or more nodes in a cluster are waiting for data. Prism FP technology from Samplify Systems in Santa Clara, Calif., can losslessly compress floating-point values and is used to accelerate HPC applications that are input/output (I/O) bound. “Our technology applies to a variety of applications, from car crash simulations, computational fluid dynamics (CFD), and weather forecasting to any problem using finite-element analysis (FEA) to solve physics equations,” says Allan Evans, vice president of marketing, Samplify. “Samplify’s Prism FP accelerates the data transfers between computing nodes, decreasing data starvation and increasing sustained throughput.”
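The general idea of lossless floating-point compression can be sketched in a few lines of Python. The scheme below is a generic one chosen for illustration (XOR each value's IEEE-754 bit pattern with its predecessor, then deflate with `zlib`); it is not Samplify's Prism FP algorithm, whose internals the article does not describe. The sample data and function names are assumptions.

```python
# Illustrative only: a generic lossless scheme (XOR of successive IEEE-754
# bit patterns, then zlib). This is NOT Samplify's Prism FP algorithm.
import struct
import zlib


def compress_floats(values):
    """Pack doubles, XOR each with its predecessor, then deflate."""
    bits = [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]
    deltas = [b ^ p for b, p in zip(bits, [0] + bits[:-1])]
    raw = struct.pack(f"<{len(deltas)}Q", *deltas)
    return zlib.compress(raw, 9)


def decompress_floats(blob):
    """Invert compress_floats and return the original doubles exactly."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}Q", raw)
    values, prev = [], 0
    for d in deltas:
        prev ^= d
        values.append(struct.unpack("<d", struct.pack("<Q", prev))[0])
    return values


# Smooth, repetitive sample data standing in for a simulation field.
field = [300.0 + 0.5 * (i % 16) for i in range(4096)]
blob = compress_floats(field)
restored = decompress_floats(blob)
```

Because the round trip is bit-exact, fewer bytes cross the interconnect without changing the numerical result, which is the property that matters for I/O-bound simulations.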
Interconnects are integral to the efficiency of HPC systems. Connecting compute elements, the CPU and GPU, in a very efficient way enables effective simulations and computations, and provides the ability to run more simulations a day and reduce time to market for newly designed mil-aero devices. Mellanox provides fast interconnect solutions, whether Ethernet- or InfiniBand-based, for connecting servers and storage and forming high-performance computing clusters.
“InfiniBand has become the de facto connectivity for HPC because it delivers the highest throughput in the market with very low CPU overhead, so you can really get a lot of CPU efficiency or simulations productivity for the end user,” Shainer says. Mil-aero customers are adopting InfiniBand for many reasons: to run many more simulations because the system is faster; to increase system efficiencies, which results in faster ROI; and to have simulations that scale to detailed, sophisticated designs for next-generation products. “The models are going to be much more complex; therefore, you’re going to want to go to a bigger system to do those kinds of simulations -- and the interconnect is critical. Many HPC simulations in this area are being done on InfiniBand-based systems.”
“HPC is going ‘commodity,’” Shainer insists. “In the past, for HPC, you had to buy very expensive proprietary, symmetric multiprocessing (SMP) solutions that weren’t within reach for many people. It’s now a commodity -- more folks can use it and do much better designs, reduce time to market, and build safer products.” HPC capabilities will continue to drive higher-performance, faster compute systems, which is important, he says. “We’ll also see HPC go to other places, such as enterprise data centers, because of the efficiencies it provides.”
All high-performance technologies will be required to do more with less, per recent defense acquisition reform policies, Mascarin predicts. “They must be more versatile, able to operate in multiple environments and to perform multiple tasks simultaneously. Going forward, the requirement for compliance to open standards will become even more stringent, as the requirements for interoperability and upgradeability increase.”
“As we look to the future, SWaP can no longer be the singular decision criteria in the development of systems. While these remain critical, these systems must now be safe and secure,” says Glenn Beck, industrial segment marketing manager, Aerospace & Defense, Freescale Semiconductor in Austin, Texas. “They must provide protection against theft of functionality, theft of data, and theft of uniqueness.
“Presently, systems that have this capability have to be implemented utilizing multiple devices, which exposes busses between devices and drives up cost,” Beck adds. Freescale’s QorIQ family is designed to protect against these threats by implementing secure boot, strong multi-core domain separation, tamper detection of physical and network attacks, and remote secure debug and device updates. “This comprehensive trust architecture provides assured computing without compromising the performance of the mission.”
Owens detects various trends: faster data-rate requirements, rugged and solid-state storage requests, and multi-data channel applications. In aerospace and defense, he says, “the market demands remain the same: faster, cheaper, smaller!”
GE Intelligent Platforms
Mercury Computer Systems Inc.
Silicon Graphics International Corp. (SGI)