New interconnects give DSPs a whole new level of flexibility

Digital signal processing is not what it used to be, as new generations of programmable devices, backplane databuses, and switched-network fabrics give systems designers more options than ever before to create fast and accurate processors.

By J.R. Wilson

The technology for digital signal processing (DSP) continues to advance dramatically on all fronts, and some experts predict that the next significant breakthrough could restructure the entire industry.

These single-board digital signal processors from Spectrum Signal Processing are part of a new generation of high-throughput devices.

"Someday we will see one chip that can process the entire channel, straight out of RF to the baseband and back, then you just scale up by adding more processors," predicts Manuel Uhm, senior manager for strategic marketing at Spectrum Signal Processing Inc. in Burnaby, British Columbia.

"Today, though, you are better off breaking up the processing chain by using those elements that are better suited to individual tasks," Uhm says. Designers today are well-advised to continue building hybrid systems of field programmable gate-arrays (FPGAs) combined with dedicated DSPs or PowerPC general-purpose processors for at least the next couple of years until the long-promised system on a chip (SOC) becomes reality, he says.

"SOC could definitely be a disruptive technology, but it isn't there yet as a feasible solution," Uhm says. "It will have a big impact on the Texas Instruments and Motorolas and Xilinxs of the world, which probably are working on their own solutions, because what we have out there today — PowerPC, FPGAs,and dedicated DSPs — could all be superceded by SOC."

Experts at the Texas Instruments (TI) DSP Group in Houston and the Motorola Networking and Computing Systems Group in Austin, Texas, say they agree with the general concept but have a slightly different view of where system-on-a-chip technology currently stands.

Pentek's Model 4292 VME board uses four TMS320C6203 DSP processors.

"Not only does Motorola recognize SOC as the future, but we already are moving in that direction," says Raj Handa, director of business development and technology marketing for the company's PowerQUICC (quad integrated communications controller) line.

"The PowerQUICC 8540 is essentially an SOC design," he maintains, adding that company engineers designed it as an integrated processor, while the PowerPC is a stand-alone host processor. "We recognized that, moving into the future, we have to have a very flexible architecture and a platform that can rapidly comprehend additional IP. If you look at the 8540 architecture, it is platform-based, SOC, and we have a very fast switched fabric setting in the middle of it that lends itself to different kinds of IP on this platform."

Ray Simar, TI's chief DSP architect, whose team primarily looks at advanced architectures, says system-on-a-chip is the natural progression of migrating what currently is on a board onto a chip — and TI, he asserts, is there.

"We often see multiple DSPs and FPGAs and sometimes general-purpose processors in a base station application, sometimes with a PowerPC running as a general systems processor," Simar says. "It's not that one is better than another, but rather the right partitioning for an overall system design." In the next level of integration, Simar says customers will seek to migrate board-level designs onto chips by integrating several different DSPs and an application-specific integrated circuit (ASIC) on one piece of silicon.

"We think multiple DSPs on a single piece of silicon will be in great demand in the future, first in base stations, in signal processing around the antenna or transceiver, and in the transcoder, where all the voice channels are processed," Simar says. "In both those, we see multiple DSPs needing to be integrated. We really don't see it as DSP vs FPGA vs ASIC; we see engineers in general looking at overall system requirements. The stand-alone DSP won't go away, but you will see an increased use of multiple DSPs on a single piece of silicon, which will itself become a stand-alone element."

Such an evolution, he adds, will define system-on-a-chip somewhat differently from how some people currently see it: "The idea that SOC will displace everything is not as meaningful as it might appear because, if you look at the block diagrams of what is being done with integration on a chip, they look remarkably like the diagram of the board. So you need to define the system to determine the right mix of DSP and interface and I/O, but it will be done at the chip rather than board level. That has led us to do the multi-DSP design."

But with increasing demands for power, data streaming, and complex operations for signal processing, others believe the future for the dedicated chip is limited.

"The dedicated DSP is becoming more suited for more heavily embedded applications that don't need to benefit from custom programming or need to be quite as configurable," says Rodger Hosking, vice president of Pentek in Upper Saddle River, N.J.

Heat and power
In the meantime, the DSP world continues to confront how to best deal with power requirements, heat dissipation, data density and processing speed.

The Motorola MSC8101 DSP blends a SC140 DSP core, programmable communications processor module, and 60x bus interface.

One recent development in DSP cooling involves new single-board processors that are compatible with the most advanced multiprocessor networks. "For the first time you can now get conduction-cooled DSP boards that are connected by a switched fabric," says Richard Jaenicke, marketing director for Mercury Computer Systems in Chelmsford, Mass. "It was only a few months ago that Mercury announced conduction-cooled DSP boards connected by RACE++ switched fabric. Before that, if you were trying to build a DSP system using more than a handful of processors, you couldn't do it with conduction-cooled systems, so you had to build a pressurized cabin to hold air-cooled boards. Conduction cooling enables you to put these boards in any nook and cranny or deploy them in a pod."

Other solutions involve spray cooling, heat sinks, and fan-cooled chassis; some are more effective than others. Part of the cooling solution may also lie with backplanes that use switched fabric, which will change some of the requirements. Nonetheless, power consumption remains a significant issue that new technology can aggravate, at least in the short term.

"PowerPCs are great processors, but are running about 10 watts apiece and pretty soon you're out of the spec as to what a PCI or CompactPCI chassis can support," says Spectrum's Uhm. "You have similar problems with FPGAs, so you have a lot of horsepower at your disposal, but things are starting to melt." Uhm points out that chip designers built dedicated DSPs for low power consumption, yet developed the PowerPC for the Macintosh desktop computer where power consumption was not a serious issue.

This Mercury RACE++ Series conduction-cooled Quad G4 module contains four PowerPC 7410 microprocessors connected via a switched fabric.

"We are still getting calls for dedicated DSPs for military applications because, MIPS per watt, they provide the best solution available today for many uses where power consumption is critical," Uhm says. "I don't think the dedicated DSP chips will disappear anytime soon. The roadmap doesn't call for lower-power-consumption devices until mid to end 2003, so it tops out right now in terms of processing performance per watt."

TI's Simar says the effort to push more and more capability onto a single chip is greatly increasing the designer's power and heat requirements and concerns. Energy-per-function calculations are increasingly aggressive. As a result, TI officials have moved to protect their share of the DSP market, which currently stands at more than 40 percent, with new products. Those include the programmable TMS320C64x DSP, with speeds as fast as 1.1 GHz and code-compatibility with earlier chips, and the programmable TMS320C55x, touted as requiring only 15 percent of the power of the most power-efficient DSP available today while delivering five times the performance.

"Another thing we've seen is a bigger need for software programmability in the field," Simar says. "So if a base station is deployed, it often includes extra performance headroom that will enable remote diagnostics and downloading new functionality. You also see that in cellphones. So programmability in DSP is a big issue for rapid deployment and to future-proof them."

Despite the growing popularity of their PowerPC line for DSP applications, Motorola executives say they also believe their own line of dedicated DSPs will remain viable for some time to come. Motorola designers built their line of dedicated DSPs in a joint venture with Agere Systems in Allentown, Pa., which is spinning off from Lucent Technologies.

"The StarCore digital signal processor, such as the SC140, has a communications processor module core from the PowerQUICC family and an external PowerPC bus interface," notes C.V. Shridhar, strategic marketing manager for the PowerQUICC line. "The core itself runs at 300 MHz and the power dissipation is .5251, which compares well to any competitor."

Complete systems
Another growing trend for DSP manufacturers is the demand for complete subsystems rather than just boards, a trend that Jaenicke says has become standard in Europe. There, and increasingly in the United States, customers are demanding not only hardware and low-level software, but also an overall multiprocessor framework for their applications.

The Model 4294 Quad G4 PowerPC board from Pentek targets signal processing applications.

Mercury's Jaenicke concedes that cooling is a continuing issue with the PowerPC, yet he says improvements in the current fourth-generation (G4) family of Motorola 7400 series PowerPC microprocessors (MPC7400) with AltiVec technology offer short-term solutions to that and other problems cited by Uhm.

The power consumption and heat generation of the G4.5 processor are decreasing, Jaenicke says. Nevertheless, the recently announced 7455 processor draws about double the power of the original G4, even in the version intended for embedded applications, he points out. Still, the new processor has significant advantages.

"As part of the maturing of AltiVec as DSP, we're beginning to use it for more than just floating-point apps and working it into fixed-point applications," Jaenicke says. "The beauty there is the 128-bit AltiVec register that can hold four floating point numbers also can be used for sixteen 8-bit numbers. That makes AltiVec a very good processor for image processing on pixel data. So where AltiVec has always been aimed at the high end of DSP — floating point — it now can be effective at the lower end, as well, so it is making broader inroads into DSP."

The PowerPC G4 family offers a RISC processor that can run any type of operating system, as well as perform single instruction multiple data (SIMD) multiprocessing on digital signals, says Vincent Chuffart, single board computer product manager for Thales in Raleigh, N.C.

"For signal processing, you have to do the same operation many times on large data sets. The SIMD helps do that one operation simultaneously on multiple data at the same time," he says. "DSP dedicated chips can only do SIMD and are very specialized for things like cell phones, but for military DSP applications are not as efficient as the G4. So companies with DSP expertise are now being asked to apply that capability to bigger efforts, such as sonars, rather than continuing with dedicated DSP chips. Now they can use the same chip throughout a system. Because it is running a Unix-type OS, standardized RISC software can be used throughout instead of designing their own tools from the ground up for each platform."

Motorola's Networking and Computing Systems Group in Austin, Texas, tends to announce the next implementation of that architecture at international microprocessor forums and is expected to do so again this fall. Company leaders already have said that subsequent Motorola processors with AltiVec technology could address markets and applications where designers must balance performance with power, price, and peripheral integration.

"They have already announced 1 gigahertz versions of the G4.5," Jaenicke says, "and when you multiply the impact of AltiVec on that, with four floating point numbers at a time, that's a pretty good effective clock rate."

Altivec and RapidIO
There has been disappointment in the DSP world that the current proposed architecture for G5 (the MPC8500 series) is not AltiVec-enabled, leading many to continue banking on further improvements in the G4 line until release of the G6 — perhaps by the end of 2003. The G5 architecture does, however, call for a built-in RapidIO interface, which also is something many in the industry are anticipating; the G6 is expected to have AltiVec and RapidIO, although performance parameters are defined by the telecommunications and computer industries, not the military.

This Texas Instruments graph shows the roadmap of the company's TMS320C6x family of DSPs.

"The intention is for RapidIO to propagate across all product lines, but we can't get into timeframes," says Motorola's Handa. "We are making a very strong investment in the G4 architecture and will take it all the way through the next process revisions to provide higher and higher performance levels. I can't comment about what comes beyond that, except to say we will draw out from the G4 architecture, while maintaining software and hardware compatibility, over the next 12 to 36 months. So those who are designing us in at up to 1GHz, can expect much more of the same while maintaining pin and software compatibility. As we map the G4 family into the next generation, for example, there will be dramatic decreases in heat at equivalent megahertz."

Systems designers consider RapidIO and Infiniband, both new entrants among the switched serial interface standards for future systems, to be complementary as well as competitive. RapidIO, for example, was designed for processor-to-memory transactions. It has very low latency, but lacks the protocol specifications that Infiniband provides for system-to-system communications. Thus, developers are looking to parallel RapidIO for chip-to-chip communications, serial RapidIO to move data from board to board over the backplane, and Infiniband for system-to-system communications.
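
The division of labor described above amounts to a simple lookup from communication scope to interconnect. The sketch below restates it as a table; the enum, dictionary, and function names are hypothetical, and the mapping reflects only the preferences reported in this article, not a rule from any of the standards.

```python
from enum import Enum

class Scope(Enum):
    CHIP_TO_CHIP = "chip-to-chip"
    BOARD_TO_BOARD = "board-to-board"
    SYSTEM_TO_SYSTEM = "system-to-system"

# Mapping taken from the article's description; illustrative, not normative.
INTERCONNECT_FOR_SCOPE = {
    Scope.CHIP_TO_CHIP: "parallel RapidIO",
    Scope.BOARD_TO_BOARD: "serial RapidIO",
    Scope.SYSTEM_TO_SYSTEM: "Infiniband",
}

def pick_interconnect(scope: Scope) -> str:
    """Return the interconnect the article associates with a given scope."""
    return INTERCONNECT_FOR_SCOPE[scope]

for scope in Scope:
    print(f"{scope.value}: {pick_interconnect(scope)}")
```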

"The parallel RapidIO standard is set and serial RapidIO is very close," Uhm says. "We are able to offer increasingly dense solutions, so you can do a lot more in a chassis than ever before, which will reduce the requirement for data to move from chassis to chassis. For the foreseeable future, chassis-to-chassis will remain a requirement, but with serial and parallel RapidIO, you have a lot of efficiencies from using the same protocol stack as opposed to trying to interface to a completely different protocol stack for Infiniband."

That is especially true for military DSP applications, where systems usually contain many boards and rarely only one.

"In that case, it makes more sense to connect RapidIO across the backplane between boards because the types of transactions being done between DSP chips on different boards are the same as those being done between chips on the same board," Jaenicke says. "The collection of boards is acting as one big system. So it really comes down to whether a given set of boards is acting as a single system — which means connecting with RapidIO — or is each board acting as a separate system, in which case you would use Infiniband to connect them."

As designers compact more and more system functions onto single boards — or even single chips — these distinctions begin to blur.

Sonar systems, for example, offer an insight into the evolution of DSP applications, explains Thales's Chuffart.

"Initially, sonar was made of a mix of RISC PowerPC boards and DSP. Now they can run RISC boards everywhere, using AltiVec," he says. "Big radar and sonar shops now do not have to go to a black box approach — which is not exactly standard — but can now choose COTS elements to run their machines, using PowerPC tools, libraries, OS and sometimes middleware."

Radar and sonar applications typically separate into three processes, Chuffart explains. "You digitize the signal, with a massive amount of data flowing into the machine at high rates," he says. "The first element that takes care of that data flow is systematic signal processing. On the other end of the machine you have the user of the result, such as a radar image. That involves data processing, which for years has been using a standard RISC chip to output the results to a network.

"In the middle, you have what some people call heterogeneous signal processing, which involves signal processing computations that are dependent on what the data contains — the number of aircraft being tracked, for example," Chuffart continues. "That traditionally used only DSP chips and boards. Now the FPGA is relevant for the first part, because they are not intelligent, but are very fast and efficient. But by using PowerPC AltiVec, you can get rid of the middle part and do both signal and data processing together. That is key, because it helps resolve bottlenecks that usually are not discovered until you have the system fully operational with a full input of data."

Thales engineers have spent the past year creating a core machine approach to deal with this new paradigm, Chuffart says. Thales experts have tested the various parts together and delivered a pre-integrated, standards-based machine to which the customer can add his own parts, without extensive customization, company officials say. However, it also means a change of mindset in the customer community.

Software issues
"Our customers must integrate how they design the application software, which not all of them are ready to do because of the initial investment required, not to mention changing from the way things have always been done. That's not easy," Chuffart notes. "You can almost predict from blueprints how a dedicated DSP will behave. But now you are using chips designed originally for desktop computing, with L1 and L2 caches. Those are less predictable and that means relinquishing some predictive engineering control and going more to trial and error. But it is possible because they can do that research using commodity parts. It's a big change in culture for all of these people. And all because of a single chip — the PowerPC AltiVec."

Because of the PowerPC's growing importance to signal processing, developers are producing a lot of third-party software to support it. That includes support from Wind River for the full VxWorks real-time OS, as well as libraries such as VSIPL (vector signal image processing library), a collection of digital signal processing algorithms defined by a consortium of industry, government, and academia.

Hybrid approaches to DSP are becoming far more common, especially with the PowerPC taking on more of the heavy duties. Those include systems with FPGAs and ASICs, but new technologies also are on the horizon for future integration.

"We (Pentek) have incorporated FPGAs into all of our new products introduced in the past year, using the (Xilinx) Virtex II FPGA most recently," Hosking says. "The most effective place we've placed that is between the front end and the general-purpose processor, such as the PowerPC. This allows us to take a signal from an ASIC at the front end, for example, and run it through the FPGA to do decoding or real-time FFT (fast Fourier transform) for SIGINT (signals intelligence), which allows identification of the frequency of an unknown radio signal you want to intercept.

"The inclusion of FPGAs in the system components we build, between the front end I/O data converters or software radio components and the processors, has given us the ability to do intensive, focused signal processing operations more efficiently than burdening the general-purpose processor with that operation," Hosking says. "It's a division of labor issue and the FPGAs are more well-suited to some of these very intensive signal processing tasks that otherwise would consume huge chunks of processing capability of a general-purpose processor."

Backplane evolution
Drawing sufficient power through the VME or PCI bus in an embedded application to run these new, more powerful components has been one of the challenges of the new designs. Spectrum, whose engineers prefer CompactPCI, is using switched-fabric solutions.

"We believe CompactPCI is still the superior form factor, especially when you have to consider such things as hot-swap," Uhm says. "In the switched fabric domain you have PICMG 2.16 (PCI Industrial Computer Manufacturers Group), which provides a standard interface for ethernet, which is well understood, with a switched fabric. We have serial RapidIO running over a CompactPCI backplane today and a lot of switched fabrics are just not designed to run over VME yet.

"The goal is open standards, but the reality is we aren't there today and there is still a huge legacy base of VME that will have to be supported for the foreseeable future," Uhm says. "It's complicated, but it will get better. Once the switched-fabric war settles down — probably in the next year or two — it will reduce a lot of the confusion. That will mean going from some 60 standards to perhaps 3. Then you will see solutions that are more integrated, although that is probably at least three years away."

The Motorola MPC7455 combines high performance with power efficiency for a wide range of host processor applications.

Jaenicke argues that the legacy VME base makes switching to CompactPCI more difficult because the switch changes the backplane.

"A better solution is a new proposal called VXS, which is the VME switched serial specification," he says. "That proposal has been drafted by Motorola Computer Group, Mercury, and a few other companies to principally add a switched serial interconnect to the VME spec. With the combination of a switched serial backplane, VME has more power per slot and therefore a bright future, which is good news to military system designers who depend on VME for rugged systems deployment as well as backward compatibility to legacy systems."

Uhm agrees, insofar as military applications are concerned, but notes the central fact of all modern technology development — the military is a marginal factor: "The biggest advantage VXS has is addressing the legacy needs of customers, which is significant in the military space. But we see a lot of solutions that make more sense if legacy is not an issue," Uhm says.

Most of these solutions, he says, are based on CompactPCI today and other standards tomorrow, such as PICMG 3.0 advanced telecommunications architecture (AdvancedTCA), which is a whole new standard that leverages off CompactPCI but addresses almost all the major issues of the VME/CompactPCI bus. This approach, he says, will accommodate larger board sizes, which gives the designer more area to work with when building a system. It also provides greater space between the slots (more airflow for cooling), and allows the system to draw more power through the backplane.

Jaenicke notes VXS will work with any of the new switched serial standards, including RapidIO and Infiniband, another plus given the questions over which to use as connected clusters of servers evolve from stand-alone boxes into rack-mountable, two 1U-high boxes, with several in a single rack (1U = 1.75").

"The next step in this trend is for each of these systems to turn into a blade, a vertically mounted system, with several of them next to each other in a single box," he says. "So instead of being 19 inches wide and very thin, you would have multiple servers in a box, vertically mounted, like boards in a chassis, with each board having system-level functionality. That configuration could be 3U high and 1.2 inches wide, with multiples of that then fit into a 19-inch wide chassis. In that configuration, each blade would still be a system and the communications between systems would be across the backplane, using Infiniband."
