Systems designers once were ready to write the obituary of the dedicated digital signal processing chip as they grew to be enamored with the latest generations of field-programmable gate arrays, yet the high costs and difficulty programming FPGAs are giving engineers reason to take a second look at DSPs.
By Ben Ames
With applications such as radar and image processing, speed used to mean everything in digital signal processing (DSP), so military designers chose specialized DSP chips. Then Moore's Law pushed prices down and performance up, and designers found they could save money by using a general-purpose processor like the PowerPC. They also found that they could boost speed in certain functions by programming their own field-programmable gate arrays (FPGAs).
After a honeymoon period, the gloss is wearing off the FPGA. Designers still love its speed and flexibility but they are growing weary of its cost and complexity. Some vendors are building hybrid boards, combining the best features of FPGAs and PowerPCs or dedicated DSPs. And other vendors are selling designer DSPs, each built to serve a single niche market.
Designers demand easy-to-use, low-cost processors, even as they push for greater computing speed and performance density. On most platforms those are conflicting goals so some vendors are building specialized digital signal processors. Instead of focusing solely on raw speed, these engineers are building chips that maximize on-chip memory, I/O bandwidth, or power efficiency.
A typical speed benchmark measures the time it takes a DSP to perform one fast Fourier transform (FFT), says Chuck Millet, TigerSHARC product manager for Analog Devices Inc. (ADI) in Norwood, Mass. That score relies heavily on a chip's clock speed, such as 600 megahertz or 1.2 gigahertz. But in real-world applications, engineers often perform many FFTs sequentially, an operation that demands high I/O capability as well as clock speed.
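Millet's distinction between a single-FFT benchmark and a stream of back-to-back FFTs can be illustrated with a rough model. The sketch below is not ADI's benchmark; it simply assumes that compute and data movement overlap perfectly, that each sample is 8 bytes (a complex single-precision value), and that every block must be moved onto and off the chip once. The function name and all numbers are hypothetical.

```python
def sustained_ffts_per_second(n_points, fft_time_s, io_bytes_per_s,
                              bytes_per_sample=8):
    """Estimate the back-to-back FFT rate for a streaming workload.

    Assumptions (not from the article): compute and I/O overlap fully,
    and each block is moved in once and out once.
    """
    # Rate the core could sustain if data were always available.
    compute_rate = 1.0 / fft_time_s
    # Bytes moved per FFT: one input block plus one output block.
    bytes_moved = 2 * n_points * bytes_per_sample
    # Rate the I/O path can feed and drain blocks.
    io_rate = io_bytes_per_s / bytes_moved
    # The slower of the two limits the sustained rate.
    return min(compute_rate, io_rate)


# Hypothetical 1,024-point FFT taking 10 microseconds, with 1 GB/s of I/O.
baseline = sustained_ffts_per_second(1024, 10e-6, 1e9)
# Doubling the clock halves the FFT time but leaves I/O unchanged.
faster_clock = sustained_ffts_per_second(1024, 5e-6, 1e9)
print(baseline, faster_clock)
```

With these made-up figures the I/O path, not the clock, sets the sustained rate, so the faster clock buys nothing. That is the gap between a single-FFT score and real streaming performance.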
Mercury Computer Systems' MCJ6 FCN module is optimized for applications in front-end data processing for signal intelligence and radar. It consists of a 6U VME board with two Virtex-II Pro P70 FPGAs connected to a RACE++ switch fabric.
ADI officials sell their new 600-MHz TigerSHARC processor (ADSP-TS201) as "the industry's best floating-point performance per watt," a measure of efficiency as well as speed. The company also says the TigerSHARC is designed to network well with additional processors for applications in high-performance computing such as communications, defense, medical imaging, and industrial instrumentation.
ADI officials also tout the TigerSHARC's 24 megabits of on-chip memory. For jobs such as synthetic aperture radar, that capacity lets the chip avoid the bottleneck of off-chip memory and run faster than a PowerPC or Pentium, Millet says.
Likewise, ADI competitor Intel announced in March that it would cease naming its processors by their clock speeds. To emphasize other performance metrics than simple speed, company marketers will label their products with numbers from 300 to 700, with larger numbers connoting more features.
Designers of defense and telecommunications electronics are also striving for higher performance density, packing powerful computers into small spaces. "You have to keep the power down below five watts," Millet says. "This gives the customer more freedom on where he can place the components because they're easier to cool and to pack in more densely."
At the same time, those designers are trying to keep their design cycles fast and cheap, to get their products in the field quickly. For most applications, that means staying away from specialized chips such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs), Millet says. Most military programs simply do not have the economies of scale to justify the investment.
FPGAs will always have a place in digital signal processing. They are very fast at performing discrete operations and work well in the data path between processors, Millet says. The downside is that they draw a lot of power, are difficult to cool, and are tough to program and debug.
Board vendors downplay FPGA
Putting an FPGA alongside other processors in your DSP path is not a panacea, some designers warn.
"In the military space, heterogeneous processing is becoming all the rage, but I've never been a big fan of it," says Jeff Milrod, president of Bittware in Concord, N.H. "Applications for FPGAs are important but limited; it's just very hard to implement algorithms on an FPGA. I think the pendulum will swing back when customers find how hard it is and they'll come back to using a DSP for 95 percent of the job and using an FPGA only when they absolutely have to."
In fact, that is a good description of Atlantis, the board architecture that drives Bittware's new T2-PCI. The product is a PCI-X plug-in card featuring a Virtex-II Pro FPGA from Xilinx and four TigerSHARC 201 "Danube" DSPs from Analog Devices. "Danube is still in prerelease but we've shipped hundreds of them now and it just cranks," Milrod says. "It's faster than a PowerPC straight up and also it uses less power so you can fit more of them on a board."
That is different from the common design approach of using one costly FPGA to replace most of the dedicated DSPs on a board, then splitting up computing tasks between the remaining chips.
"We're somewhat conflicted because usually people use the FPGA in heterogeneous designs for preprocessing but we use it only for control, I/O, switching, or routing," Milrod says. "Preprocessing isn't the strength of the FPGA in our architecture because we already have so much stinkin' power in the TigerSHARCs."
Of course, a fast board needs a fast data feed so the other major industry trend is fabrics. "Switch fabric is here. We've been looking at it for five years as a disruptive technology but we've been holding off on picking a path, because it was too amorphous. But now it's here," he says. "The market's still fragmented, just as the parallel-bus market was in its early days. But with the ability to make reconfigurable interfaces in FPGAs, all our interfaces will eventually use Serial RapidIO, Stargen, PCI Express, or whatever."
DSPs fit niches
Customers are clamoring for specialized DSP designs, agreed Wallace Scott, strategic marketing manager for military DSPs at Texas Instruments in Dallas. In March, company planners released "the industry's first digital signal processor qualified for space-based applications." The SMV320C6701 combines high performance with radiation tolerance for applications in satellite communications and control systems, planetary landers and rovers, and sample collectors.
With a 140-MHz clock rate, it offers 840 million floating-point operations per second (megaflops) of performance and 100 kilorads of total-dose radiation tolerance. The device meets military performance standard MIL-PRF-38535, qualified manufacturer list, Class V (space).
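The two figures quoted for the chip are linked by simple arithmetic: peak floating-point rate is the clock rate times the number of floating-point operations the chip can issue per cycle. The sketch below works backward from the published numbers. The operations-per-cycle value is inferred from them, not stated by Texas Instruments in this article, and the function names are illustrative.

```python
def peak_mflops(clock_mhz, flops_per_cycle):
    """Peak rate in millions of floating-point operations per second."""
    return clock_mhz * flops_per_cycle


def implied_flops_per_cycle(mflops, clock_mhz):
    """Operations per clock cycle implied by a quoted peak rate."""
    return mflops / clock_mhz


# Figures from the article: 140-MHz clock, 840 megaflops.
per_cycle = implied_flops_per_cycle(840, 140)
print(per_cycle)                 # operations per cycle implied by the numbers
print(peak_mflops(140, 6))       # reproduces the quoted peak figure
```

A peak figure of this kind assumes every cycle issues the maximum number of operations; sustained rates on real algorithms are normally lower.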
Another DSP flavor favored by designers of military and aerospace electronics is plastic packaging. "Enhanced plastic is the hot button in the industry now for customer interest," Scott says. "Eighty to 90 percent of new military designs are incorporating plastic integrated circuits."
That growth is largely cost driven, since plastic parts are cheaper than comparable ceramic components, he says. Still, planners at Texas Instruments will continue making ceramic packaged chips to support relatively old platforms. In fact, they have recently released improved products in that line such as the fixed-point digital signal processors SMJ320C6415 and SMJ320F2812.
The T2-PCI board from Bittware, in Concord, N.H., uses a Xilinx Virtex-II Pro FPGA for I/O and routing but uses four TigerSHARC 201 DSPs from Analog Devices for most of the computing.
DSP board vendors who formerly made general-purpose components are now making more specialized products, agrees Manuel Uhm, senior manager of strategic marketing at Spectrum Signal Processing in Burnaby, British Columbia.
These changes are more about packaging than technology, as they integrate more functionality onto their boards to make them a better fit for vertical niche markets. "They're not trying to meet everyone's requirements because then you usually meet nobody's," Uhm says. "They're targeting the markets with greatest opportunity, including commercial wireless and military communications."
Spectrum engineers created the SDR-3000, an integrated signal-processing platform optimized for software-defined radio. Based on the CompactPCI form factor, it uses Serial RapidIO data fabric. Its flexible API (application programming interface) lets customers use different types of DSP devices on a single algorithm, combining FPGAs, dedicated DSPs, and PowerPCs on one platform.
In August, the U.S. Department of Defense chose the SDR-3000 as the hardware set for the Joint Tactical Radio System (JTRS). DOD researchers now use it to validate JTRS waveforms for acceptance in the Software Communications Architecture (SCA).
Other customers include the U.S. National Aeronautics and Space Administration (NASA), which chose SDR-3000 as a prototyping platform for the Goddard Space Flight Center's satellite-to-satellite communication network, called the Cross Link Integrated Development Environment (CLIDE).
Thales Underwater Systems will use an SDR-3000 to detect mines from an unmanned underwater vehicle (UUV). The remote-controlled vehicles use sonar to search for explosives in dangerous, shallow water.
DSPs deliver speed
Speed and efficiency have always been goals for DSP boards, but high-performance computing — known as HPC or supercomputing — is restricted to rooms packed with servers.
That may change soon. Vendors are now producing silicon designed for high-performance computing on DSP boards, says Steve Paavola, chief technical officer of Sky Computers in Chelmsford, Mass. "FPGAs were designed to be general-purpose glue, programmable for many tasks. But other silicon is more purpose-built for computation. Silicon is cheap and now people are using novel architectures to tie it together."
The new platforms will deliver significant performance per watt. The DSP architecture is less flexible than an FPGA design, yet the new platforms will run 64 processors that all perform operations in the same clock cycle; that much parallelism delivers very high bandwidth, Paavola says.
These high-performance DSP units will be good fits for applications in radar (like pulse compression), medical imaging, and supercomputers. Engineers at the main vendor firms and at DARPA are just starting to sample the chips now, so designers of military devices have not yet adopted them.
One thing is sure — there will never be a silver bullet for DSP. "There's no one right approach," Paavola says. "It depends on the applications, the platform, and skills at the customer site."
FPGAs stay flexible
Meanwhile, users are still trying to program their FPGAs.
"Let's face it, it's not as easy to program FPGAs as DSPs," says Rodger Hosking, vice president of Pentek Inc. in Upper Saddle River, N.J. "There are lots of C coders out there, but you need a hardware engineer to program an FPGA. It's more specialized skill to handle the I/O pins and load skewing between data and clocks."
One solution is modeling tools like Matlab and Simulink from The Mathworks, in Natick, Mass., or Getae from Blue Horizons, in Mount Laurel, N.J. Companies like Celoxica, in Abingdon, England, offer code compilers that translate C code into FPGA language.
Customers can also buy off-the-shelf intellectual property (IP) cores and save the time of reinventing traditional signal-processing functions. Pentek itself offers three different IP cores and other FPGA vendors offer more.
Still, military users are loyal to the technology because many have found that the parallel architecture of FPGAs is good for high-level, intensive signal processing.
They put an FPGA in series with high-speed peripherals so each board has a mix of different signal-processing technologies. That is crucial for applications such as software-defined radio because the FPGA does intensive signal processing on data to digest or massage it so the general-purpose processor can handle the volume.
In fact, FPGAs are so fast that a traditional bus architecture presents a data-flow bottleneck. So customers who are building high-density, embedded, real-time systems are demanding fast switched-network fabrics such as RapidIO, built on the VME bus under the VITA 41 (VXS) standard. Another option is the new VITA 46 standard.
In the long term, fabrics like VITA 46 will mature to offer higher bit rates, and FPGA vendors will perform more processing on the chip itself, as with the PowerPC built into Xilinx's Virtex-II Pro FPGA, Hosking says.
FPGAs find a sweet spot
Designers should not write off the FPGA yet. Customers have been increasing their demand for digital signal processors since the terror attacks of Sept. 11, 2001, using them for jobs such as image recognition and security, says Robert Bielby, senior director of strategic solutions marketing at FPGA designer Xilinx Inc. in San Jose, Calif.
At the same time, Pentagon designers are using more DSPs for initiatives like meshing their shared common database and the accompanying need to filter, encrypt, and classify packets. And designers at the U.S. Department of Homeland Security's TSWG (technical support working group) are busily using DSPs to create tools for facial recognition, audio authentication, and explosives detection.
To meet that demand, vendors are building more specialized chips. "General purpose microprocessors are good at doing many different jobs, but not great at any one. FPGAs can be great at doing a single job," Bielby says.
Pentagon planners' goal of designing a digital battlefield demands even higher performance in signal processing. "They want to stream audio and video from the battlefield to a UAV [unmanned aerial vehicle] to the Pentagon and back again, in real time," says Narinder Lall, Xilinx DSP marketing manager. "To meet that bandwidth requirement, you need a highly-parallel architecture."
For parallel architectures in signal processing, there is only one solution: the FPGA, experts say. One challenge is the struggle of many systems with bottlenecks of data flow on and off the board. Two years ago, Xilinx engineers solved this problem by embedding a PowerPC in the fabric of their Virtex-II Pro FPGA.
In April, they released a ruggedized version for applications in military and aerospace designs. The Q-Pro version is qualified for military packaging, temperature, and radiation, Lall says. These FPGAs are a good way to reduce cost in low-volume military projects and are designed for applications like satellite communications, automated target recognition, and radar beam forming, he says.
FPGAs play to their strength
"FPGAs are good for parallel processing and PowerPCs are good for sequential processing," says Joe Jacob, technical product manager for DSP products at Dy 4 Systems in Leesburg, Va.
"There will always be a need for heterogeneous systems with a PowerPC at the back end for decision making and an FPGA at the front end for repetitive jobs like filtering and transforms."
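The division of labor Jacob describes can be sketched in a few lines. This is only an illustration of the idea, not Dy 4's design: the "front end" below is a boxcar (moving-average) FIR filter with decimation, the kind of fixed, repetitive arithmetic that maps well to an FPGA, and the "back end" is data-dependent decision logic of the sort a PowerPC would run. The function names, tap count, decimation factor, and threshold are all hypothetical.

```python
def front_end_filter(samples, taps=4, factor=4):
    """Repetitive stage: boxcar FIR filter, keeping every `factor`-th output.

    The same multiply-accumulate pattern runs on every block regardless
    of the data, which is what makes it a candidate for an FPGA.
    """
    reduced = []
    for start in range(0, len(samples) - taps + 1, factor):
        window = samples[start:start + taps]
        reduced.append(sum(window) / taps)
    return reduced


def back_end_decide(reduced, threshold):
    """Decision stage: branchy, data-dependent logic on the reduced stream."""
    return [index for index, value in enumerate(reduced) if value > threshold]


# Sixteen raw samples with a burst in the third block of four.
raw = [0] * 8 + [8] * 4 + [0] * 4
reduced = front_end_filter(raw)          # 16 samples in, 4 values out
hits = back_end_decide(reduced, 4)       # indices of blocks above threshold
print(reduced, hits)
```

The front end cuts the data volume by the decimation factor before the back end sees it, so the sequential processor handles a quarter of the samples in this example.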
As their density and complexity rise, battlefield electronics need ever-faster digital signal processing. Platforms like radar, imaging, and signal intelligence all rely on high-end DSPs. So industry engineers are looking for ways to fit FPGAs into their processing streams.
Dy 4 designers have an answer — the CHAMP-FX card. Each has two Xilinx Virtex-II Pro FPGAs (each containing its own PowerPC), backed up with high-capacity I/O interfaces and onboard memory.
Those FPGAs are fast but they also are hot. "We have to do innovative things to get the heat off the card," Jacob says. The CHAMP-FX includes thermal sensors for die and board temperatures and a current sensor on the power supply. Together, those built-in features prevent a system from running too hot and damaging the chip.
For data flow, the CHAMP-FX uses StarFabric and a flexible IPC (interprocessor communications) interface. "When the next fabric comes along, we'll port our interface to that," he says. "The customer will be minimally affected by underlying hardware and we'll help preserve their investment in software."
The CHAMP-FX is designed to quickly filter torrents of data. That means it's appropriate for tasks such as electronic warfare countermeasures, where a ship or plane processes an enemy's radar signal and transmits a false reflection.
Another typical job would be automatic target recognition for high-resolution cameras, in which the chip would analyze patterns like paint color on a jeep, quickly determining if a target is friend or foe.
FPGAs stay in the mix
Engineers continue to build mixed boards for digital signal processing in applications such as radar and signal intelligence. "FPGAs still have their largest value as a high-speed spigot on the front end, doing data reduction before sending bulk data to PowerPCs on the back end," says Richard Jaenicke, director of product marketing for Mercury Computer Systems in Chelmsford, Mass.
Designers can have the best of both worlds if they divide the labor for computing signal-processing algorithms between different breeds of chip. The best example is the Xilinx Virtex-II Pro, an FPGA with an onboard, embedded PowerPC, he says. It offers such low latency through its high-speed serializing and deserializing that designers can even use it for electronic warfare functions like signal jamming.
Of course, nothing comes for free. "FPGAs today are very difficult to program," Jaenicke says. "There's a big carrot sitting there, like running your algorithm 20 times faster. But there's also a big price to pay."
To solve the problem, FPGA vendors are offering standard development tools and companies like The Mathworks make system modeling and code compiling tools.
Another FPGA drawback is power. "The FPGA today is very power hungry," he says. "But that's coming down over time. The good news is that performance per watt is increasing. Whereas PowerPC watts per processor is going up over time and performance per watt is growing very slowly."
Poor power management can have a big influence on certain platforms. On a UAV, it could decrease the time an aircraft could spend over its target. Mercury engineers balance these design tradeoffs by integrating FPGA computing into their RACE++ VME systems with the new MCJ6 FCN (FPGA compute node), scheduled to ship in May.
The MCJ6 FCN module consists of a 6U VME board with two Virtex-II Pro P70 FPGAs connected to a RACE++ switch fabric via an on-board crossbar. That enables customers to run compute-intensive, repetitive portions of applications on the FPGAs, while sending complex applications to a PowerPC processor via the fabric.
FPGAs stay flexible
Battlefield sensors can soak up vast amounts of information, flooding digital signal-processor boards with data. That presents a special challenge for designers of military electronics who must find a balance point between latency and throughput, says Andrew Reddig, president of TEK Microsystems Inc., Chelmsford, Mass.
"It's more efficient to send big, fully buffered packets, but there's less latency when you simply send each bit as soon as you receive it," he says. "When digital signal processing is deployed on an FPGA, you can tweak that balance. You can even adjust it on the fly, for instance if you're changing your radar from acquisition to tracking."
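Reddig's tradeoff can be put in rough numbers. The sketch below is a simplified model, not Tekmicro's implementation: it assumes each packet carries a fixed-size header, and that the only latency of interest is the time spent waiting for a packet's payload to fill at a constant input rate. The function name and every figure are hypothetical.

```python
def packet_tradeoff(payload_bytes, header_bytes, input_bytes_per_s):
    """Return (link efficiency, buffering latency in seconds) for a packet size.

    Efficiency is the share of transmitted bytes that are payload.
    Latency is how long the first byte waits while the packet fills.
    """
    efficiency = payload_bytes / (payload_bytes + header_bytes)
    fill_latency_s = payload_bytes / input_bytes_per_s
    return efficiency, fill_latency_s


# Hypothetical 16-byte header and a 1-megabyte-per-second input stream.
small = packet_tradeoff(16, 16, 1e6)      # tiny packets: low latency, poor efficiency
large = packet_tradeoff(4096, 16, 1e6)    # big packets: high efficiency, more latency
print(small, large)
```

Because the payload size is just a parameter, logic on an FPGA could change it at run time to favor latency or throughput as the operating mode changes, which is the adjustment Reddig describes.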
Tekmicro designers build devices with PowerPCs and FPGAs, not dedicated DSPs. They make rugged, air-cooled platforms for applications in fighter jets and UAVs.
"We're focused on relentless data flowing in fast from the outside world when there's no way for the card to say 'Stop, I'm busy.' We work in cases where people are saturating the system to 95 percent of a board's capacity," Reddig says.
In one application, a customer had to perform image correction for the reconnaissance camera on an F/A-18 jet fighter-bomber. Their challenge was how to calibrate pixels for the high-speed digital photography without adding extra weight to the plane. So they replaced the existing five PowerPC cards in the sensor pod with three FPGAs. The resulting product saved both slots and weight.
That sounds great, but to achieve such a result the customer must be skilled enough to program an FPGA. "We have two different kinds of customers; one who thinks of algorithms as their secret sauce, and another who just wants a black box that spits out FFTs," Reddig says.
Many customers feel burned by the complexity of FPGA development. "People coming down from using an ASIC are happy with saving money. But people coming up from using a PowerPC are used to typing six lines of code. Now they have to type another 10,000 lines after the board vendor sold them an FPGA as being simple to program," he says.
"It's a mismatch of resources and expectations driven by overselling the tools. Customers who are not used to making their own hardware say 'We bought this FPGA and expected to have it out in a month but now it's six months later and we're still working on it'."
Designers at Tekmicro solve the problem with the PowerRACE-3, a high-performance I/O processor equipped with both FPGA and PowerPC processors. It performs FFTs, pulse compression, and image processing 10 to 15 times faster than straight PowerPCs. And its FPGA developer's kit makes it easy to program, especially combined with the company's "tagged and bagged" IP cores of the basic signal processing functions.