The general-purpose graphics processing unit (GPGPU) from companies like NVIDIA and AMD is bringing high-performance embedded parallel processing to SWAP-constrained signal processing in unmanned vehicles and other persistent-surveillance applications.
The general-purpose graphics processing unit, a computer chip better known as the GPGPU, represents one of the biggest breakthroughs in years for high-performance embedded computing (HPEC) in aerospace and defense applications.
This powerful chip, which began life in the last decade as a graphics-processing engine aimed at high-end computer gaming, has emerged as a massively parallel processor that not only lends itself to complex floating-point processing, but also is proving easy enough to program to appeal to a broad range of military embedded systems.
In addition, GPGPU technology is progressing on a technological trajectory roughly equal to that of Moore's Law, meaning its power doubles and its physical size shrinks every couple of years.
The primary designers of GPGPU chips in the U.S. today are NVIDIA Corp. in Santa Clara, Calif., and Advanced Micro Devices Inc. (AMD) in Sunnyvale, Calif. Much of AMD's GPGPU expertise came from the company's 2006 acquisition of ATI Technologies Inc. in Markham, Ontario.
Put all this together, and the GPGPU is becoming the cornerstone of digital signal processing in aerospace and defense applications like radar and sonar signal processing, image processing, hyperspectral sensor imaging, signals intelligence, electronic warfare, and persistent surveillance.
|The 6U OpenVPX GSC6201 module from Mercury Systems with dual general-purpose graphics processing units (GPGPUs) is for high-end radar, electronic warfare (EW), and image processing.|
The reference to "graphics" in GPGPU has become somewhat of a misnomer, particularly in aerospace and defense applications where the chip's parallel-processing nature is its biggest attraction.
"The GPU is a massively parallel device that is strictly focused on many threads and parallel large data sets," explains Marc Couture, director of product management at Mercury Systems in Chelmsford, Mass. "This is versus a CPU [central processing unit] where they are focused on single-threaded applications." The GPGPU, Couture points out, has a much higher rate of processing floating-point operations per second than a CPU can.
"Companies like Mercury are always looking at different types of compute engines, and it was apparent from the start that GPUs are collections of many cores that can do floating-point math, almost like a large grid," Couture explains.
The device "is a high-performance embedded parallel processor, which is a better description of what we are trying to achieve with this, than GPGPU," says Dustin Franklin, GPGPU applications engineer at GE Intelligent Platforms in Charlottesville, Va.
As an embedded parallel processing engine, the GPGPU is particularly well suited to data-intensive sets of information, such as digital images made of many pixels. "With thousands of processors you can process an image pixel-by-pixel," says Roger Stein, product marketing manager at Themis Computer in Fremont, Calif. "You can directly bring those pixels into a GPGPU and process those pixels in parallel. That approach involves a lot of the activity going on now."
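The pixel-by-pixel parallelism Stein describes can be sketched in a few lines. The example below is plain Python, not GPU code, and the function names are illustrative; the point is the access pattern: every output pixel depends only on its own input pixel, so a GPGPU can assign one thread per pixel and run them all at once.

```python
# Illustrative sketch (plain Python, not GPU code) of a per-pixel operation.
# Because no pixel's result depends on any other pixel, the work is
# "embarrassingly parallel" -- the shape of workload a GPGPU excels at.

def to_grayscale(pixel):
    """Convert one (r, g, b) pixel to a luminance value (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def process_image(pixels):
    # On a CPU this map runs sequentially; on a GPGPU each call would be
    # an independent thread, one per pixel.
    return [to_grayscale(p) for p in pixels]

image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
print(process_image(image))
```

A real GPGPU implementation would express `to_grayscale` as a CUDA or OpenCL kernel and launch one instance per pixel, but the data-independence property that makes the launch legal is exactly the one visible here.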
At one time, parallel processing computers were large, complex, and often fragile machines with architectures that only the most dedicated computer scientists could program. Until the GPGPU burst on the scene, parallel processors were out of the question for the vast majority of embedded computing applications.
Driven by the computer gaming industry, GPGPU technology is becoming an affordable alternative for a growing number of embedded computing applications. One expert says GPGPUs fall in a market niche of medium computing power at medium pricing. Perhaps they are not for low-end, cost-sensitive applications, but they also are not in the realm of the most expensive parallel processing architectures.
"They are very good at doing floating point in a constant stream through radar, surveillance, or information functions," says William Pilaud, high-performance computing architect for Curtiss-Wright Controls Defense Solutions in Ashburn, Va.
Not for everyone
Pilaud cautions that GPGPUs are not one-size-fits-all approaches to high-performance embedded computing. "If you need to do a lot of FFTs [fast Fourier transform calculations], then this technology is great," he says. "But if you need to make decisions in this stream, GPGPUs fall apart. They don't have the decision mechanisms in each core that the CPUs have; they do not have the ability to change on the data stream like CPUs do."
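Pilaud's FFT-stream workload is worth making concrete. The sketch below is plain Python (a naive DFT, purely for illustration; real systems would use a batched GPU FFT library): a stream of radar pulses arrives, and the identical transform is applied to each pulse with no data-dependent branching, which is why the work maps so cleanly onto thousands of identical GPU cores.

```python
# Illustrative sketch (plain Python): the batched-FFT stream Pilaud describes.
# Every pulse gets the same transform, with no decisions made mid-stream --
# a GPGPU would run one transform per pulse across its cores simultaneously.
import cmath

def dft(samples):
    """Naive discrete Fourier transform of one pulse (O(n^2), for illustration)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def process_stream(pulses):
    # Each pulse is independent of the others, so all transforms
    # could execute in parallel.
    return [dft(p) for p in pulses]

pulses = [[1, 0, 0, 0], [1, 1, 1, 1]]  # two toy 4-sample pulses
spectra = process_stream(pulses)
print([round(abs(z), 6) for z in spectra[1]])  # constant pulse -> energy in DC bin
```

The branchy counterexample Pilaud raises would be code that inspects each sample and takes a different path depending on its value; that control-flow divergence is what "falls apart" on a GPU's lockstep cores.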
What that means is that GPGPUs never will match the ability of CPUs, such as the Intel Core i7 microprocessor, to run interrupt-driven applications like missile-warning systems and smart motion control. Still, in applications that need to work on large chunks of data, few embedded computing technologies can match the GPGPU.
"The GPU isn't for everybody-only those that can exploit parallelism in their data and meet their thresholds," says GE's Franklin.
"The caveat is GPGPUs have massive throughput, but they can't turn around results in microseconds or nanoseconds," says Mercury's Couture. "In electronic warfare, you have to turn things around in nanoseconds or single-digit microseconds. We need to look at GPUs in terms of what limitations they have."
One other important design concern with the GPGPU involves power and heat. Compared one-for-one with CPUs and field-programmable gate arrays (FPGAs), GPGPUs consume a lot of power and generate a lot of waste heat.
"These GPGPUs burn 250 watts themselves, and dissipating that heat is another issue," says Themis' Stein. Mobile military embedded computing applications, moreover, use DC power supplies that can have trouble supplying the kind of power levels that GPGPUs need, he says.
Typically, the more powerful the GPGPU, the more electricity it needs to run, and it may be some time before GPGPUs combine that level of performance with power consumption low enough for some of the more demanding embedded computing applications.
"There are GPGPUs that are more intensive, but at some point we can no longer embed them, because they are too power-hungry and generate too much heat," says Mercury's Couture.
Despite that, embedded computing designers should weigh the potential benefits of some of the more exotic thermal-management approaches as they decide whether or not to use GPGPUs. "We have been doing things like Air Flow-By cooling, where we use air in a different way to cool some of these hotter processors and multicore devices," Couture explains. "Some of those more esoteric cooling technologies are viable today for some of the more powerful GPGPUs."
Mercury's Air Flow-By cooling technique is for air-cooled, conduction-cooled, and VITA 48 subsystem chassis in high-powered radar, electro-optical, signals intelligence, and electronic warfare applications on ground vehicles and aircraft.
Graphics processor roots
Even though digital signal processing with GPGPUs on the surface has little to do with graphics processing, the original graphics nature of these chips has a fundamental influence on signal processing for imagery, radar, sonar, SIGINT, and similar complex calculations. Applying GPGPUs to signal processing "is like reversing the architecture of a graphics card," explains Themis' Stein.
"We use the GPGPU for deconstructing the world for exploitable information that conveys useful stuff about the environment you're in," says GE's Franklin. "They are good at both things: rendering the world, and deconstructing it."
It's a fortunate thing that GPGPUs lend themselves so well to military signal processing applications, because systems designers are able to capitalize on massive commercial investments in graphics processing technology. It's almost like getting tremendous embedded parallel processing capability for free, and the rise of the GPGPU in aerospace and defense applications follows exactly the letter and spirit of using commercial off-the-shelf (COTS) technology for military uses.
"The day job of graphics processors is still doing graphics," Franklin says. "Billions of dollars are made on graphics for gaming. Every family of chips that NVIDIA makes costs $2 billion in research and develop- ment. We're riding the wave of graphics and consumer graphics."
Not only are the GPGPU chips themselves broadening from graphics-only devices into signal processing, but GPGPU software programming languages also are moving toward signal processing and general-purpose processing.
"GPU devices were focused on graphics, and you were able to use general-purpose processing by using graphics languages like OpenGL," says Devon Yablonski, product systems engineer at Mercury Systems. "NVIDIA and AMD have embraced the general-purpose computing world."
|Curtiss-Wright Controls Defense Solutions offers the 3U VPX3-491 NVIDIA Fermi GPGPU-based compute engine.|
GPGPU design advantages
Software programming languages are playing a big part in the growing popularity of GPGPU technology for aerospace and defense digital signal processing: namely the Open Graphics Library (OpenGL) language, the CUDA parallel-processing programming language created by NVIDIA, and the latest Open Computing Language (OpenCL).
Before OpenGL, CUDA, and OpenCL, programming massively parallel-processing computers was a difficult task involving a small cadre of experts and arcane programming languages.
The new languages, particularly OpenCL, are helping make GPGPU technology accessible to software programmers familiar with the C and C++ languages. Moreover, OpenCL is evolving such that eventually this language may be in common use not only for GPGPUs, but also for CPUs and FPGAs.
Such a future development may lend itself to heterogeneous embedded computing architectures involving combinations of CPUs, FPGAs, and GPGPUs, all programmed and maintained in the same software language.
"A great combination is FPGAs and GPGPUs, with an Intel CPU in the mix," says Mercury's Couture. GPGPUs and FPGAs don't change directly very quickly. That's where an Intel CPU could help in the overall picture. You could program a GPGPU, CPU, and FPGA as an open slice. This is moving toward programming all these different processors in a heterogeneous computing slice with OpenCL."
The GPGPU content of open software libraries also is growing, making GPGPU software even more widely available. "There's lots of off-the-shelf Linux that I can download and add to my GPGPU stuff," says Curtiss-Wright's Pilaud.
"The types of processing work being deployed in embedded computing has classically been done by specialized processing elements like FPGAs and DSPs [dedicated digital signal processors]," says Themis' Stein. "These types of systems take some specialized processing expertise." Open programming languages for GPGPUs are changing much of that, he says. "GPGPUs, although you program them similarly to DSPs and FPGAs, are less specialized by using OpenCL."
This kind of programming software, moreover, lends itself to a relatively painless evolution of GPGPUs in embedded computing over time. "Because the GPGPU is a regular arrangement of cores, it basically scales over time as they provide more and more cores inside the devices," Stein says. "Software doesn't have to be rewritten as GPGPUs scale up in cores over time."
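The programming model behind the scaling Stein describes can be mimicked in a few lines. In OpenCL, a "kernel" is written from the point of view of a single work-item identified by its global id, and the runtime launches one instance per point of an index space; because the source code never mentions the core count, the same kernel spreads across however many cores a future device provides. The sketch below imitates that model in plain Python (purely illustrative; real OpenCL kernels are written in a C dialect, and the function names here are invented stand-ins for the OpenCL runtime):

```python
# Illustrative sketch (plain Python) of OpenCL's execution model: a kernel
# expresses the work of ONE work-item; the runtime launches one instance
# per point of the index space, spread across whatever cores exist.

def vector_add_kernel(gid, a, b, out):
    """What an OpenCL C kernel would express as: out[i] = a[i] + b[i]."""
    out[gid] = a[gid] + b[gid]

def enqueue_nd_range(kernel, global_size, *args):
    # Stand-in for the runtime's kernel launch. A real OpenCL runtime
    # dispatches these instances in parallel; here they run in a loop.
    for gid in range(global_size):
        kernel(gid, *args)

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
out = [0] * 4
enqueue_nd_range(vector_add_kernel, 4, a, b, out)
print(out)  # [11, 22, 33, 44]
```

Because the kernel is written per work-item rather than per core, scaling a device from hundreds to thousands of cores changes the schedule but not the source, which is the "software doesn't have to be rewritten" property Stein points to.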
Performance per watt
One of the most attractive aspects of the GPGPU for embedded systems designers is its performance per watt, a primary consideration for systems that require small size, weight, and power consumption, better known as SWAP.
"The SWAP potential of GPGPUs is great," says Themis' Stein. "The GPGPU uses a lot of power, but in a way it can replace a lot of general-purpose processors because it has so many cores. You could replace six 1U servers with one serving using a GPGPU. You could consolidate a lot of the processing you need to do."
Small size, weight, and power consumption are particularly important to current and future applications like unmanned vehicles that require big processing power in small dimensions.
"If you have a size issue-which every airplane or UAV [unmanned aerial vehicle] does-and you don't have a lot of area, and you could effectively cool these things, then a GPGPU has a lot of attraction," says Curtiss-Wright's Pilaud. "If you're not scared of a couple of hundred watts, you will look at the GPGPU seriously.
"The GPGPU can make a system more efficient," Pilaud continues. "It can give you the same per- formance-if not better-but flatten your power profile. Everybody's game now is to increase the computational power per watt."
As if parallel processing, software programmability, and performance per watt weren't enough, GPGPU makers are making improvements that will make the chips even faster for digital signal processing. Previously, one complaint about GPGPUs was their latency, the extra time it takes to move data between CPUs and GPGPUs in integrated systems.
Mercury, for example, uses an approach called StreamDirect that enables data to move quickly directly from the I/O source such as a sensor to the GPU to improve system latency. "GPUs depend on an Intel processor as their proxy, but the Intel was always getting in the way," says Mercury's Yablonski.
NVIDIA's GPUDirect, introduced in 2010, does much the same thing. At first, GPUDirect supported accelerated communication with network and storage devices over InfiniBand. Since its introduction, NVIDIA experts have added support for peer-to-peer communication between GPUs, optimized APIs for video solutions, and support for RDMA between GPUs and third-party devices.
GPUDirect enables third-party network adapters, solid-state drives, and other devices to read and write CUDA host and device memory directly to eliminate CPU overhead.
"GPUDirect does DMA memory directly to and from the GPGPU without having the CPU in the middle," says GE's Franklin. Now the GPUDirect RDMA allows an FPGA, disk, or Ethernet to stream data directly to the GPGPU with no CPU middleman whatsoever."
This capability will offer direct benefits to aerospace and defense systems, such as electronic warfare. "It used to take a millisecond, but now takes 20 to 50 microseconds to do a signals intelligence or EW benchmark calculation," Franklin says. "Before GPUDirect we couldn't do that. After GPUDirect, it was 20 microseconds, and this was acceptable for certain EW applications."
Advantech Corp., Industrial Automation Group, Cincinnati, Ohio
AAEON Electronics Inc. Hazlet, N.J.
ADLINK Technology Irvine, Calif.
AMD Embedded Products Sunnyvale, Calif.
Animated Media Inc. Toronto
Aspen Systems Inc. Wheat Ridge, Colo.
Asus Computer International Inc. Fremont, Calif.
Cavium Networks Mountain View, Calif.
Corvalent Cedar Park, Texas
Creative Electronic Systems SA Geneva, Switzerland
Curtiss-Wright Controls Defense Solutions Ashburn, Va.
EIZO Tech Source Altamonte Springs, Fla.
Eurotech Columbia, Md.
EVGA USA Brea, Calif.
Galaxy Microsystems Hong Kong
GE Intelligent Platforms Charlottesville, Va.
HABEY Intelligent Technology Co. Ltd. Walnut, Calif.
Kontron Poway, Calif.
Matrox Graphics Inc. Dorval, Quebec
MEN Micro Inc. Ambler, Pa.
Mercury Computer Systems Chelmsford, Mass.
MotionDSP Inc. Burlingame, Calif.
NVIDIA Corp. Santa Clara, Calif.
One Stop Systems Inc. Escondido, Calif.
Parvus Corp. Salt Lake City, Utah
Quantum3D Inc. San Jose, Calif.
RadiSys Corp. Hillsboro, Ore.
Scan Engineering Telecom Voronezh, Russia
Systerra Computer GmbH Wiesbaden, Germany
Themis Computer Fremont, Calif.
VersaLogic Corp. Portland, Ore.
Vision4ce LLC Severna Park, Md.
WOLF Industrial Systems Inc. Uxbridge, Ontario