Systems integrators still grapple with issues that call for dedicated DSPs, high-end general-purpose processors, or field-programmable gate arrays in today’s signal- and data-intensive computing applications.
By Ben Ames
Hardwired digital signal processors (DSPs) are still the cheapest way to handle small computing loads, but they lack the power to process loads of data on today’s digital battlefield.
A proliferation of sensors, networks, and portable computers has boosted the importance of digital signal processing. Although many designers have turned to field-programmable gate arrays (FPGAs), these devices are not the perfect solution; they can add size, weight, and power consumption. These considerations are crucial because engineers are constantly shrinking the size of military platforms, from handheld software-defined radios to unmanned vehicles.
When it comes to building an entire electronic system, efficiency is just as important as pure power for today’s designers of DSP systems. Transformational programs like the joint tactical radio system and the future combat systems need computing with small size and high performance. This demand for small size, low weight, and low power, called SWAP, creates a new balancing act for designers of DSP technology.
More to DSP than pure speed
Military customers of DSP boards always want the same thing. “They are pragmatic; they want to get the most done within their cost budget, power budget, or weight budget. You hit one first, always,” says Ian Stalker, product manager for advanced multicomputing with Curtiss-Wright Controls Embedded Computing in Kanata, Ontario.
That means speed is not everything. There are three legs to the stool of DSP performance, he says: raw numerical floating-point performance, the processor’s bandwidth into memory, and processor-to-processor speed.
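The three-legged stool can be sketched as a simple bottleneck estimate: whichever leg takes longest sets the pace for the whole job. The sketch below is a rough illustration, not a vendor benchmark. The function name, the workload sizes, and the hardware rates are all hypothetical, and it assumes the three activities overlap perfectly.

```python
def bottleneck(flops, mem_bytes, link_bytes, peak_flops, mem_bw, link_bw):
    """Return (limiting_leg, seconds) for one workload on one processing node.

    Assumes compute, memory traffic, and interprocessor traffic overlap
    perfectly, so the slowest of the three determines the run time.
    """
    times = {
        "compute": flops / peak_flops,         # floating-point work
        "memory": mem_bytes / mem_bw,          # processor-to-memory traffic
        "interconnect": link_bytes / link_bw,  # processor-to-processor traffic
    }
    leg = max(times, key=times.get)
    return leg, times[leg]


# Hypothetical workload and hardware figures, chosen only for illustration.
leg, seconds = bottleneck(
    flops=2e9, mem_bytes=4e9, link_bytes=1e8,
    peak_flops=8e9, mem_bw=2e9, link_bw=1e9,
)
print(f"limited by {leg}: {seconds:.2f} s")
```

With these made-up numbers the job is memory-bound, so a faster processor alone would not shorten it.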
As the speed of processors rises, communication speed between boards must also rise. To meet that constant demand, Curtiss-Wright engineers are working on products in the VITA 46 standard, adopting it for single-board computers and especially for multiprocessor signal-processing boards.
“The standard is truly high-speed switched serial technology. We’ve been doing that with StarFabric for a number of years, and that remains an excellent technology for signal processing,” Stalker says. “In two or three years, it will be eclipsed for new designs by the new fabrics, such as Serial Rapid IO, PCI Express, or Advanced Switching.”
Either way, the change will be large. StarFabric runs at 600 megabits per second, good enough for the VME 64x standard, but Serial Rapid IO runs at 3.125 gigabits per second.
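To put those two link rates in perspective, the following sketch computes how long each would take to move one block of data. It uses only the raw rates quoted above and ignores encoding and protocol overhead, so real throughput would be lower. The 64-mebibyte block size and the function name are assumptions made for illustration.

```python
def transfer_seconds(payload_bytes, rate_bits_per_s):
    """Ideal time to move a payload over a link, ignoring all overhead."""
    return payload_bytes * 8 / rate_bits_per_s


BLOCK_BYTES = 64 * 2**20  # hypothetical 64 MiB block of sensor data

# Raw link rates as quoted in the article.
rates = {
    "StarFabric": 600e6,         # 600 megabits per second
    "Serial Rapid IO": 3.125e9,  # 3.125 gigabits per second
}

for name, rate in rates.items():
    ms = transfer_seconds(BLOCK_BYTES, rate) * 1e3
    print(f"{name}: {ms:.1f} ms")

speedup = rates["Serial Rapid IO"] / rates["StarFabric"]
print(f"raw speedup: {speedup:.2f}x")
```

On raw rate alone the newer fabric is a little more than five times faster.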
“With a VME card we can do a gigabyte per second between cards. VITA 46 will open that up to 10 gigabytes per second between cards,” Stalker says. “We will build multiprocessor boards with Serial Rapid IO fabric and a Freescale 8641 dual-core processor. Those products are about a year away.”
Using those PowerPC processors, the company makes the CHAMP-AV IV, a VME processor board with four 1.5-GHz, model 7448 processors running over a StarFabric interconnect. Its newest product is the Manta QX3, a quad processor card that also uses StarFabric as its board-to-board interconnect, uniting four model 7457 PowerPCs.
The Manta uses an architecture called symmetrical multiprocessing (SMP), which connects all the processors on a board to one shared memory block. In contrast, a CHAMP board assigns each processor its own memory bank, a design called independent node architecture. The total amount of memory is about the same, up to 2 gigabytes, but the products have different strengths.
Curtiss-Wright engineers also are designing signal-processing boards with FPGAs. The company’s CHAMP-FX board uses FPGAs from Xilinx Inc. in San Jose, Calif., and Curtiss-Wright plans to stick with that brand, he says.
“The FPGA is a great answer for customers with concerns about size, weight, and power; the classic example is a fast jet radar. It is worth the additional effort to do the computing in logic and replace a lot of PowerPCs. That technology will grow at a faster rate than conventional processors,” Stalker says.
Digital signal processing without a DSP
The growth of FPGAs does not mean they will swamp the marketplace. Customers still want choice. “We see a strengthening of the same trends from a year ago: people want to do their DSP chores on a variety of platforms,” says Rodger Hosking, vice president at Pentek Inc. in Upper Saddle River, N.J.
The most popular choice is FPGAs, but people also use general-purpose processors such as PowerPCs. A PowerPC with the AltiVec vector unit is not built specifically for real-time DSP applications, but designers often use it in those applications anyway.
“DSP is not necessarily a device anymore, but a method or a technology being implemented on many different types of devices,” Hosking says. “We used to talk about a chip, but now we’re talking about operations. Some customers are even doing it on Pentiums. They’re getting fast enough now that, if you can get away with the rate, it could be a lot cheaper.”
FPGAs are a common choice for many platforms, but they are famously complex to program. Pentek engineers, as a result, try to ease the process by offering a hardware platform and software tools to support customer-developed IP (intellectual property) cores and custom algorithms.
The company has added a line of FPGA processing platforms to its family of high-speed A-D converters. The models 6821 and 6822 are single- and dual-channel, 12-bit, 215-megahertz VME boards with two Xilinx Virtex-II Pro FPGAs, each with 6 million gates of configurable logic. They use switched serial fabric interfaces in the VXS (VITA 41) standard.
“We’re processing data ASAP upfront to avoid shipping data across the backplane, with the DSP located immediately behind the A-D converters at 100 GHz bandwidth,” Hosking says.
The company’s model 6826 is similar: a 2-gigahertz, dual-channel A-D board. “It’s the same concept but 10 times faster. If you’re dealing with data at that rate, you don’t want to send it through too many interfaces,” he says.
In response, Pentek designers include large memory banks on each board to capture all new data within 10 percent of the allotted time. This enables the processors to run for 90 percent of the time budget. This “elastic buffer” is a good match for jobs like post-acquisition processing of radar pulses.
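The arithmetic behind such an elastic buffer can be sketched as follows. The 10 percent capture window comes from the description above. The burst rate (2 gigasamples per second at one byte per sample), the 1-millisecond period, and the function name are assumptions for illustration, not Pentek specifications.

```python
def elastic_buffer(burst_rate_bytes_per_s, period_s, capture_fraction=0.10):
    """Size a capture buffer and the average rate needed to drain it.

    Data arrives in a burst during the first `capture_fraction` of each
    period; the processors then have the rest of the period to work on it.
    """
    capture_s = period_s * capture_fraction
    process_s = period_s - capture_s
    buffer_bytes = burst_rate_bytes_per_s * capture_s  # memory needed per burst
    drain_rate = buffer_bytes / process_s              # sustained processing rate
    return buffer_bytes, drain_rate


# Hypothetical figures: 2 GB/s burst, one burst every millisecond.
buffer_bytes, drain_rate = elastic_buffer(2e9, 1e-3)
print(f"buffer: {buffer_bytes / 1e3:.0f} kB per burst")
print(f"processors must sustain: {drain_rate / 1e6:.0f} MB/s")
```

Under these assumptions the processors need to sustain only one-ninth of the burst rate, which is the point of buffering the capture.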
For work with software-defined radio, Pentek designed the model 7140 PMC module, using a Virtex-II Pro unit as a software radio transceiver, sandwiched between two 105-MHz, 14-bit A-D converters for input and two 500-MHz, 16-bit D-A converters for output. To move data on and off the mezzanine card, the board uses the XMC (VITA 42) switched serial fabric interface, in addition to a standard PCI bus.
FPGAs are not the best solution to every problem. Telecommunications customers favor a classic digital signal processor like Texas Instruments’ C64 family, because its engines are tailored to specific jobs, such as Viterbi and Turbo decoding. That means they do not have to spend cycles on the main processor doing signal processing. Pentek designers are planning to include Texas Instruments processors on some future boards, Hosking says.
“Customers won’t have to spend dollars or development time putting the task in an FPGA, because it’s already in an ASIC [application-specific integrated circuit] that is inherently lower power, cheaper cost, and takes up less space. FPGAs are wonderful things. They can do almost anything, but you will pay the price,” Hosking says. “You’ll find there are a lot more C programmers out there than true FPGA gurus. So if you can do it in an ASIC, you should. But if the algorithm is too complex or the data rate is overwhelming, then use an FPGA. Throw parallel hardware at the algorithm and get it done.”
If the designer’s real-time budget is less than 10 microseconds, an FPGA is often the only way to do it, he says. If 10 microprocessors are used in parallel, much more memory, power, interface, and other peripherals would be needed. The numbers are clear: most general-purpose processors have four hardware multipliers available, while the Virtex-II Pro 50 has 232 hardware multipliers, and they work in parallel, not sequential order.
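A rough way to see what those multiplier counts mean is to count clock cycles for a filter that needs one multiplication per tap. The multiplier counts (four and 232) come from the figures above. The 232-tap filter and the function name are hypothetical, and the sketch ignores clock-rate differences, additions, and memory access.

```python
import math


def cycles_per_output(taps, multipliers):
    """Clock cycles to finish all multiplications for one output sample,
    assuming each multiplier completes one multiplication per cycle."""
    return math.ceil(taps / multipliers)


TAPS = 232  # hypothetical filter length, one multiplication per tap

for name, multipliers in (("general-purpose processor", 4),
                          ("Virtex-II Pro 50", 232)):
    cycles = cycles_per_output(TAPS, multipliers)
    print(f"{name}: {cycles} cycle(s) per output sample")
```

The processor has to reuse its four multipliers in sequence, while the FPGA can, in principle, compute every tap in the same cycle.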
Board vendors try to save size
Choosing a DSP platform is also about size. Many COTS board vendors are looking for business with military transformation projects. But platforms like the joint tactical radio system (JTRS) and future combat systems (FCS) present a demanding technology challenge: they have a low size, weight, and power (SWAP) budget, yet they require high-performance signal processing.
FPGAs are the only answer, says Manuel Uhm, DSP marketing manager at Xilinx. One FPGA does consume more power than an ASIC, but FPGAs are so powerful that a single unit can replace a collection of processors.
“Those programs require significant digital signal processing-much more than a dedicated DSP or general-purpose processor can provide. I can replace two racks of PowerPCs with one board of FPGA,” he says. “If you can do it on a single DSP, you’re going to do that. But JTRS and FCS need higher performance.”
In the future, engineers will handle these challenges with computer-on-module design. “People usually think of this solution with general-purpose processors like a Pentium or PowerPC, but system-on-chip with FPGA is already there,” Uhm says.
Since FPGAs are not efficient at handling basic control functions, system-on-chip (SOC) devices also include an embedded general-purpose processor.
For example, on a JTRS platform, the software communication architecture (SCA) framework runs on top of CORBA (the common object request broker architecture) and a real-time operating system. If an engineer can integrate all those things on a single device, he will save space, heat management, power, and cost.
JTRS Cluster 1 devices use an IBM 440 as the embedded general-purpose processor, while the Xilinx Virtex-IV uses the less powerful, more efficient IBM 405, Uhm says. Another popular fixed-point processor choice is the Intel XScale processor, while the Freescale PowerQUICC line (the 8540 or 84x) is popular for floating-point jobs.
This approach saves power and footprint for a military market that is always looking to save size, from manpack JTRS Cluster 5 radios to unattended ground sensors and missile nosecones.
A leader among COTS board vendors making system-on-chip solutions is ISR Technologies in Montreal, Uhm says. That company produces an SOC software-defined radio modem called the ACM-1000.
Especially for radios, saving power is more than just a way to reduce heat; power management can also lead to better endurance. “You just can’t have radios out there that die after an hour in the field,” he says. That is one reason Xilinx’s new Virtex-IV runs with 50 percent less power than the Virtex-II.
Power-efficient processing
Military designers are creating increasing numbers of platforms for portable use, in everything from unmanned aerial vehicles to man-portable, battery-powered devices. That means board vendors are struggling to control size, weight, and power (called SWAP in the business) as opposed to blindly boosting processing power.
“Power creates cooling requirements: as we’re packing more and more into deployment platforms, you have to ask how much heat can you get off the platform? How can you manage the batteries? So we’re trying to stay within customers’ power budget,” says Lee Pucker, chief technology officer at Spectrum Signal Processing, Burnaby, British Columbia.
Two customers recently moved from the Virtex-II to Virtex-IV model FPGA, solely to save power. Another way to perform efficient computing is with a dedicated digital signal processor such as Texas Instruments’ C64. That ASIC has great signal-processing horsepower per watt, compared to a general-purpose processor, he says.
“Customers don’t always need the latest and greatest processor technology. A lot of the latest processors coming out are power hogs, and people are aware of it. So customers are choosing a mix of digital signal processors and general-purpose processors, and mapping their applications onto that.”
Military electronics designers also are in a hurry. They frequently demand more integrated products, to save time compared to sourcing a collection of board-level components, he says.
That makes the company’s SDR-3000 MRDP popular. Customers use the MILCOM Rapid-Prototype Development Platform to speed development with “platform-aware software tools.” The product provides a software-defined radio platform with a commercial-off-the-shelf (COTS) “RF to Ethernet” radio platform for military communications.
Such integrated solutions are crucial in this era of changing standards. “There is a plethora of new standards for moving data, and customers are a bit scared. There is no favorite to be winner, whether it’s PCI Express, Rapid IO, or Infiniband. As long as standards are in flux, there’s a potential you could choose a technology that will leave you stranded. So we enable the customer to follow an evolutionary path, and avoid a forklift upgrade.”
One Spectrum customer recently chose Gigabit Ethernet instead of the VME or PCI bus architectures, simply because Gigabit Ethernet can communicate with nearly any other standard.
Video processing demands FPGAs
Some of the most challenging signal-processing applications are software-defined radio and video compression, says Brian Tithecott, director of sales and marketing for FPGA computing products at SBS Technologies in Waterloo, Ontario.
Cameras on unmanned vehicles and battlefield sensors create huge amounts of data. Onboard computers then combine that video with graphical images to overlay it with metadata such as GPS, time, and targeting.
“They must do this in hardware because they need it in real-time,” he says. “We use PMCs and FPGAs for data acquisition from a camera or imaging sensor, condition and compress the signal, and then move it onto the vehicle’s network to share it either on or off the vehicle.”
The programming flexibility of FPGAs is also a strength for those applications, because cameras have so many unique qualities, from frame rates to signal formats.
FPGAs are a better choice than general-purpose processors or dedicated DSPs for customers worried about size, weight, and power, he says. Some high-end commercial ASICs could handle the data flow in a laboratory, but they are not rugged enough for battlefield applications.
Many military designers today choose FPGAs from Xilinx, but SBS engineers use components from Altera Corp. in San Jose, Calif. That company’s Stratix II product is built on 90-nanometer geometry, has 9 megabits of embedded memory, and typically runs at 250 MHz.
Other companies using Altera FPGAs for military applications include BittWare in Concord, N.H., and Penguin Telecom in Washington, which uses Altera hardware in its Joint Tactical Radio System (JTRS) Cluster 5 product.
Looking for efficiency
“We’ve always been a purely Analog Devices house, because TigerSharc offers fast floating-point performance with low power draw,” says Darren Taylor, BittWare’s vice president of sales and marketing.
That is a sign of its history; TigerSharc was first developed for the wireless infrastructure market, such as 3G base stations and telecommunications. BittWare is still following the Analog Devices roadmap, and planning to use its efficient TigerSharc model 202 and 203 processors for multiprocessor military applications.
“The TigerSharc alone is slower than a PowerPC, but it uses less wattage, especially when they’re combined together on a big board. You could use eight TigerSharcs in the place of two PowerPCs,” he says.
For applications in video sensors and battlefield cameras, BittWare engineers will use the ADI Blackfin processor to build a board called BlackPoint, says Jeff Milrod, BittWare’s president and chief executive officer. “Blackfin is like TigerSharc’s little brother; it’s good at video compression so it will compete with Texas Instruments for military applications.”
For the very fastest digital signal-processing jobs, the company builds boards with Altera FPGAs. “I don’t think things will go 100 percent FPGA or 100 percent DSP. For floating-point operations, FPGA doesn’t make sense, but TigerSharc is very compelling,” Milrod says.
Still, customers have many choices. Options like the PowerPC have a larger “ecosystem” of code libraries than the TigerSharc. So last fall, the company made its first acquisition. BittWare bought EZ-DSP of Belfast, Northern Ireland, to take advantage of its TigerSharc libraries.
Now company engineers are taking another step to make it easier for military designers to choose Analog Devices processors; the company will soon launch a new real-time operating system based on ADI’s real-time kernel, called VDK.
Another sign of shifting customer demand is form factor. BittWare has always made PCI and Compact PCI boards, but soon will move into VME. Next, it will add a new ruggedized line. In fact, it already has a customer: a high-speed French railway will use a ruggedized BittWare board for braking.
For future platforms, BittWare designers are experimenting with MicroTCA, a new design that would use the Advanced Mezzanine Card (AMC) and Serial Rapid IO to create a small box for military applications. Just as the 3U VME shape is popular for airborne applications, MicroTCA could find a home with space-limited, high-performance tasks. The standard is not yet ratified by PICMG.
Board vendors fighting physics
Military planners use a baseline for embedded DSP performance called the “DARPA performance continuum.” The scale measures the ability of high-performance multicomputing DSP systems to fulfill changing demands over the lifetime of a given platform, such as Joint STARS or Rivet Joint surveillance aircraft.
“That measure has been stuck in the mud for the last three to five years; you ask suppliers what they expect from the next generation and it used to be a 2× to 4× improvement, but is now 1.6× to 1.7×,” says Ed Hennessy, vice president of North American operations for Nallatech in Nashua, N.H.
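The gap between those per-generation figures widens quickly once it compounds. The sketch below applies each quoted gain over several product generations. The three-generation horizon and the function name are assumptions chosen for illustration.

```python
def compounded(per_generation_gain, generations):
    """Cumulative improvement after a number of product generations."""
    return per_generation_gain ** generations


GENERATIONS = 3  # hypothetical horizon, for illustration only

# Per-generation gains quoted above: formerly 2x to 4x, now 1.6x to 1.7x.
for gain in (2.0, 4.0, 1.6, 1.7):
    total = compounded(gain, GENERATIONS)
    print(f"{gain}x per generation -> {total:.1f}x after {GENERATIONS} generations")
```

Over three generations, the older range compounds to between 8x and 64x, while the current range yields roughly 4x to 5x.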
The problem is that board vendors are constrained by the laws of physics. “I went to HPEC last year, and we discussed how to get back on track with Moore’s Law. Cooling, packaging, and power consumption have all become really tough challenges,” he says, referring to the High Performance Embedded Computing conference.
Engineers in the COTS embedded market simply don’t have enough resources to solve those problems, so they will have to look to the larger commercial market for answers, he says.
For highest performance, military engineers could soon adopt the commercial reliance on blade-server-based solutions. They will also look to that market for a packaging solution, to ease the complex move from air-cooled to liquid-cooled boards.
Technology pushes processor speed
Digital signal processors will keep getting denser and faster. Texas Instruments recently broke ground on a 300-millimeter wafer fabrication plant in Richardson, Texas, that will enable its engineers to push below 90-nanometer process geometries, says Wallace Scott, strategic marketing manager for military DSPs at Texas Instruments, Dallas, Texas.
Designing at that tiny scale will let them pack more wires and gates on each microchip, keeping up with Moore’s Law of escalating processor power.
Military designers view this progress with a mix of admiration and suspicion. They are constantly asking for faster computers, and yet they question the reliability of nanometer-scale integrated circuits.
Those thinner oxides and smaller geometries can lead to faster device wear-out, radiation sensitivity, and electro-migration issues, Scott says. That is especially true for long-life applications in space and for high-altitude avionics. For example, Air Force officials working on the Space-Based Radar say that instrument could remain in orbit for 10 to 14 years, which is far beyond the usual lifetime for radiation-sensitive electronics, he says.
In the meantime, Texas Instruments engineers continue to improve the current generation of processors built on the 90-nanometer scale. The latest fixed-point DSP from Texas Instruments is the C6455, a 16-bit processor now sampling at speeds of 720 MHz, 850 MHz, and 1 GHz. To speed up decoding operations, it has two high-performance embedded coprocessors, a Viterbi decoder and a Turbo decoder. Using a Serial Rapid IO interface, it will perform best in military applications with multiple processors, such as radar and imaging, as well as commercial tasks like telecom infrastructure, medical imaging, and video conferencing.
For operations that demand a floating-point DSP, Texas Instruments has released the C672x series, designed for applications such as radar, sonar, and image processing. The TMS320C672x is the next generation of the company’s C67x family of high-performance 32-/64-bit processors.
In August 2004, the company released the SMV320C6701, which it calls the fastest QML Class V (space qualified) floating-point DSP on the market. Optimized for applications on satellites, interplanetary probes, and satellite relay stations, the unit is rad-tolerant and runs at 140 MHz. Customers say its power consumption is low per computational horsepower, though details depend on the specific peripherals, Scott says.
Design tools ease engineering
Military electronics designers want it all: they ask board makers for high-end compute power at low cost and low heat dissipation. They need those processors for sensor-laden platforms on the network-centric battlefield, especially unmanned aerial vehicles, says Stuart Heptonstall, DSP product marketing manager at Radstone Technology in Towcester, England.
At the same time, they are frustrated by the increasing complexity of digital signal-processing systems. So they ask board vendors to build more fully integrated products, assembling more complete computers instead of delivering components.
To meet that need, Radstone has launched Axis, advanced multiprocessing integrated software that helps customers build better DSP applications. It is produced at a new facility in Billerica, Mass.
The software suite includes two choices for signal- and image-processing libraries. The Vector library is open-standard, easing code portability between platforms. The Radstone library is proprietary, optimized for quicker execution times.
“We used to rely on third-party libraries, but found that we couldn’t modify the code to optimize it for algorithms,” Heptonstall says.
Future releases in the product line will include Axis Flow, an interprocessor communications tool that will use StarFabric as a medium between boards and between processors, and Axis GUI, a graphical user interface to tie the whole package together.
Radstone designers also created StarSwitch, a 6U VME-format, rugged, StarFabric switch board. Intended for large radar applications, multiple antennas, and beam forming, it combines four StarFabric ports from the backplane, tying together four of Radstone’s standard G4DSP-XE boards. Users can combine multiple boards in powerful, flexible configurations including any-board-to-any-board topologies. In addition, future Radstone boards will include programmable logic.
The Freescale PowerPC is also a popular option for digital signal processing, despite the recent move by Apple Computer, Cupertino, Calif., to begin using Intel processors in Macintosh computers.