Liquid cooling enters the mainstream

Oct. 16, 2019
Until recently, the use of moving liquid to remove excess heat from embedded computing processors was considered an expensive and unreliable luxury, but escalating demand and maturing technologies finally are putting this technology within reach.

Thermal management technology for high-performance embedded computing — the need to cool or sometimes heat components to keep them within their stated performance parameters — can be slow moving and sometimes a design afterthought. There are exceptions, however, and over the past year the most notable changes have involved liquid cooling.

Removing excess heat from electronics components like central processing units (CPUs), data converters, general-purpose graphics processing units (GPGPUs), and some optical interconnects today is falling into the realm of liquid cooling, where only a few years ago this approach widely was considered exotic, expensive, risky, and out of bounds for most applications.

Today it’s different. “Liquid cooling is more accessible today,” says Jim Shaw, executive vice president of engineering at rugged computing specialist Crystal Group Inc. in Hiawatha, Iowa. “The whole industry is gaining much more experience in liquid cooling; it’s not just the F-35. It’s becoming more widely used, and we are getting better at it.”

The military embedded computing industry today is seeing explosive growth in liquid cooling — particularly the approach of getting liquid into the cold plate itself with quick disconnects, says Shaun McQuaid, director of product management at Mercury Systems in Andover, Mass. “It is so much more efficient to cool with liquid — if you have liquid available,” McQuaid says.

Despite the growing popularity, affordability, and reliability of liquid cooling, however, many other thermal management techniques are available to remove the ever-growing amount of heat generated in high-performance embedded computing systems. These techniques range from traditional conduction and convection cooling, to hybrid approaches that blend conduction and convection cooling, new heat-transfer materials, custom approaches, and good old-fashioned engineering to make the most of heat transfer in advanced computing architectures.

The heat problem

Despite the relatively mundane nature of electronics thermal management when compared to the latest powerful processors, heat represents an issue that no systems designer can ignore — and the problem grows by the day.

“Heat never sleeps. It’s harder to deal with this year than it was last year,” says Chris Ciufo, chief technology officer at General Micro Systems in Rancho Cucamonga, Calif. “The military wants to apply more processing at the tip of the spear, which means operating in harsher environments that we didn’t do even three years ago.”

As examples, Ciufo cites land vehicles that drive behind enemy lines to collect data, then download that data, and later deploy to deal with the problem. “Today there’s more processing in that vehicle,” Ciufo says. “It may link up with the Continental U.S., or with operators in other vehicles looking at actionable intelligence and deciding what to do with it. There are more GPGPUs, more rackmount servers, and more processing required on these sensor platforms. These trends in basic semiconductors are making heat much more difficult to deal with.”

To deal with customer demands for increased processing power on deployed military systems, embedded systems are moving to 200 Watts and more per 6U slot, and perhaps half that per 3U slot, says Ivan Straznicky, chief technology officer of advanced packaging at the Curtiss-Wright Corp. Defense Solutions segment in Ashburn, Va. “There are even chips out there dissipating in the 100- to 200-Watt range.”

The problem is growing at such a pace that within one or two years current embedded computing thermal management approaches could become inadequate to handle the heat from high-performance GPGPUs. “By then we probably will run out of headroom with the ANSI/VITA 48 and conduction-cooling standards,” Straznicky warns.

The cooling demands of high-performance CPUs are not far behind. “The last two or three years have been extraordinary in the amount of power the Intel Xeon processors can dissipate,” says Crystal Group’s Shaw. “Customers are very interested in having that core speed and that number of cores for parallel processing in mission-critical systems. We’re talking about 250-Watt CPUs in a relatively small space.”

Also driving demand for powerful processing components in embedded computing is interest in artificial intelligence and machine vision, which use many NVidia GPGPUs. “Combine that with the CPU challenge, and all of a sudden you are in a whole new category of thermal management challenges, while trying to keep system size and weight under control,” Shaw says.

“We are not at a crisis point yet, but there is continued appetite for high-end GPGPUs, where the chip itself will dissipate 200 Watts,” says Mercury’s McQuaid. “Combine GPGPUs and CPUs on a board, and increase the speed of Ethernet, and now you have to think about cooling those switches themselves. In the transition to optics, the transceivers require more cooling, which will push the cooling envelope.”

Liquid to the rescue

Fortunately, increasing pressures for thermal management in embedded computing are coming at the same time as increasing accessibility of liquid cooling, which removes heat either by flowing liquid near hot components, or channeling liquid through system cold plates to quicken the removal of heat.

Liquid cooling not only is a solid choice for new systems designs, but also for upgrades in which hot processors force a migration away from traditional conduction or convection cooling. “The move to liquid is being driven by the need for additional capability in the same footprint that originally had conduction-cooled modules,” says Mercury’s McQuaid. “It’s also driven by the maturing of the liquid cooling infrastructure; today, for example, we have quick disconnects that prove there will not be leaks. With the ANSI/VITA 48.4 standard, we have seen a dramatic increase in the number of systems that are using that technology directly at the module level.”

The ANSI/VITA 48.4 Liquid Flow Through (LFT) standard of the VITA Open Standards, Open Markets embedded computing trade association in Oklahoma City, defines the basic dimensions, heat exchanger, mechanical assembly, and chassis interface for LFT cooling for 6U VPX plug-in modules. An ANSI/VITA 48.4 design involves an integrated liquid-to-air heat exchanger that provides the needed fluid for cooling via quick-disconnect coupling assemblies. The system’s plug-in modules use liquid that flows through a core heat exchanger located within the heatsink of the module to cool the electronic components on the circuit boards. The quick disconnects provide the chassis coolant inlet and outlet.

Earlier this year, embedded computing chassis and enclosures specialist Elma Electronic Inc. in Fremont, Calif., added ANSI/VITA 48.4 liquid flow through cooling capabilities to the company’s line of OpenVPX embedded computing development platforms.

“As new industry standards are developed, our customers are looking for products that will help them put these standards into practice,” said Ram Rajan, senior vice president of engineering at Elma in a prepared statement. “The new LFT development platform is intended for proof of concept testing as well as development tasks, giving designers the ability to test and incorporate this new liquid cooling methodology successfully.”

Elma’s liquid-cooled chassis has a built-in heat exchanger that uses either water or a high-flow-rate water/glycol mixture to provide cooling capacity of 500 Watts, and includes a flow indicator that shows whether liquid is circulating.
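To put a capacity figure like Elma’s 500 Watts in perspective, a first-order energy balance shows how little coolant flow liquid cooling actually needs. The sketch below assumes plain water and a 10-Kelvin allowable coolant temperature rise; neither value comes from Elma, and real designs depend on the specific coolant and heat exchanger.

```python
# First-order coolant flow sizing from the energy balance Q = m_dot * c_p * dT.
# Assumptions (illustrative, not Elma's figures): plain water coolant with
# specific heat c_p = 4186 J/(kg*K), density 1000 kg/m^3, and a 10 K
# allowable temperature rise across the chassis heat exchanger.

def required_flow_lpm(heat_watts, delta_t_k, cp=4186.0, density=1000.0):
    """Volumetric coolant flow in liters per minute to absorb heat_watts."""
    m_dot = heat_watts / (cp * delta_t_k)   # mass flow, kg/s
    return m_dot / density * 1000.0 * 60.0  # convert m^3/s to L/min

flow = required_flow_lpm(500.0, 10.0)
print(f"{flow:.2f} L/min")  # roughly 0.72 L/min for a 500 W chassis
```

Under these assumptions, less than a liter per minute of water absorbs the full 500 Watts, which is why a compact pump and quick-disconnect loop can out-perform large volumes of blown air.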

A separate industry trend — growing acceptance of 3-D printing — also is adding momentum to the popularity of liquid cooling. “If you have a module with liquid cooling, you have a path for the liquid to flow through the module — a tube through the middle,” says Mercury’s McQuaid. “In the past that had to be drilled or built in multiple pieces. With additive manufacturing, you can 3-D print that, and it’s a lot more cost effective, quicker, and more efficient.”

Crystal Group is offering the FORCE line of rugged rackmount servers with a self-contained thermal management option with a pump, heat radiator, and liquid reservoir that pulls liquid over CPUs and over heat plates. This is only one example of the company’s growing offerings in liquid cooling. “I think Crystal Group is now an expert in liquid cooling for embedded computing challenges,” Shaw says. “Now we are rock-solid.”

Hybrid cooling approaches

Just five years ago the embedded computing landscape was dominated by two kinds of thermal management: conduction cooling, which conducts heat away from components with cold plates, heat pipes, and wedge locks; and convection cooling, which cools with fan-blown air and heat sinks.

Conduction cooling was considered the more rugged approach, as it involved no moving parts and could be ruggedized against the effects of shock and vibration. Convection cooling, on the other hand, was relatively inexpensive, but introduced the risks of failing fans and particulate contamination from blown air. Each was sufficient for removing the heat levels of the day.
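Why conduction cooling eventually runs out of headroom can be seen in a back-of-envelope estimate: junction temperature rises roughly as power times the sum of thermal resistances from die to cold wall. The resistance values below are illustrative assumptions for a wedge-locked conduction-cooled card, not measured figures from any vendor.

```python
# Sketch of conduction-cooling headroom: T_junction = T_wall + P * sum(R_th).
# The thermal resistances (K/W) are assumed values for illustration only.

def junction_temp_c(power_w, wall_temp_c, resistances_k_per_w):
    """Estimated junction temperature for a given card power and cold wall."""
    return wall_temp_c + power_w * sum(resistances_k_per_w)

# Assumed path: die-to-case, case-to-heat-frame, frame-through-wedge-lock-to-wall
path = [0.10, 0.15, 0.25]  # K/W each

# At a typical 71 C cold wall, a 75 W card stays near a ~110 C junction limit,
# but a 200 W card along the same path overshoots badly.
print(junction_temp_c(75.0, 71.0, path))
print(junction_temp_c(200.0, 71.0, path))
```

With these assumed numbers, the same thermal path that comfortably serves a 75-Watt card pushes a 200-Watt card far past safe junction temperatures, which is the headroom problem Straznicky describes.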

Contemporary heat levels from modern CPUs, GPGPUs, and other components, however, have forced the conduction and convection camps to join forces. The result involves three open-systems thermal management standards for VPX systems: ANSI/VITA 48.5 Air Flow Through (AFT) cooling, ANSI/VITA 48.7 Air Flow-By cooling, and ANSI/VITA 48.8 Air Flow Through cooling.

These designs combine the air-tight protection of conduction cooling with the efficient and flexible cooling of convection. AFT cooling was pioneered by Curtiss-Wright Defense Solutions and Northrop Grumman Corp., while Air Flow-By cooling started at Mercury Systems.

AFT offers cooling capacity of as much as 200 Watts per card slot to support high-power embedded computing applications like sensor processing; it’s environmentally sealed to accommodate harsh military operating conditions. AFT passes air through the chassis heat frame, preventing the ambient air from contacting the electronics while dramatically shortening the thermal path to the cooling air, Curtiss-Wright officials say.
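The 200-Watt-per-slot figure can be translated into a rough airflow requirement with the same energy balance used for liquid, applied to air. The air properties and 15-Kelvin allowable temperature rise below are my assumptions, not Curtiss-Wright specifications.

```python
# Rough per-slot airflow estimate for a 200 W AFT card: Q = m_dot * c_p * dT.
# Assumed values (illustrative): sea-level air density 1.2 kg/m^3,
# c_p = 1005 J/(kg*K), and a 15 K allowable air temperature rise.

def required_cfm(heat_watts, delta_t_k, cp=1005.0, density=1.2):
    """Airflow in cubic feet per minute needed to carry away heat_watts."""
    m_dot = heat_watts / (cp * delta_t_k)  # mass flow, kg/s
    m3_per_s = m_dot / density             # volumetric flow, m^3/s
    return m3_per_s * 35.3147 * 60.0       # convert to CFM

print(f"{required_cfm(200.0, 15.0):.0f} CFM per slot")  # about 23 CFM
```

Under these assumptions a 200-Watt slot needs on the order of 23 CFM of sealed duct air, and the figure climbs quickly at altitude where air density drops, which is one reason designers look to liquid as powers rise further.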

A gasket mounted inside the chassis seals the card’s internal air passage to the chassis side walls, and shields the internal electronics from the blown air. Each card has an isolated thermal path, rather than sharing cooling air among several cards.

Air Flow-By cooling, meanwhile, cools both sides of each module to balance cooling performance. It encapsulates circuit boards in heat-exchanger shells that cool both sides of the board by flowing air across each side. The heat exchanger shell protects against airborne contaminants, electromagnetic interference (EMI), electrostatic discharge (ESD), and provides an extra layer of physical security.

Air Flow-By maintains the card’s standard 1-inch pitch, and offers a 25-percent reduction in processor temperature for dual Intel Xeon processors; a 33-percent increase in processor frequency at that reduced temperature; a fivefold increase in mean time between failures (MTBF); and a 25-percent reduction in weight of the processor module, according to Mercury.

Last August Mercury Systems officials started offering full ANSI/VITA 48.7-ratified Air Flow-By cooling technology design packages through VITA for the efficient cooling of high-performance embedded computing modules that dissipate in excess of 200 Watts of heat.

To aid other companies in the design and conversion of modules to Air Flow-By technology, Mercury is making available detailed design packages on the VITA website giving members access to this technology.

Mercury’s McQuaid points out that there is room in the embedded computing business for ANSI/VITA 48.5 and ANSI/VITA 48.7 designs, which he says are not competing standards. “The key thing industry needs to know is there is a place for both of them,” McQuaid says. “In the 3U world, Air Flow Through might be more efficient, and in 6U the Air Flow By might be more efficient.”

In the future, the Air Flow Through and Air Flow By standards may prove insufficient to handle new generations of hot-running high-performance computing. “In the 3U world we feel the pain of not being able to bring out enough I/O,” McQuaid explains. “I anticipate a combination of thermal requirements that drives us to additional I/O on the backplane connectors, and we’ll have to revisit the standards while still preserving the VITA ecosystem.”

Materials and architectures

Aside from direct industry-standard electronics cooling, thermal management also can involve new engineering techniques. “Sometimes we may have to adapt other factors like more air flow, widening the pitch between cards, and using more cooling fins,” McQuaid says.

General Micro Systems uses a proprietary thermal-management technology called RuggedCool to cool 300-Watt Intel Xeon processors in high-performance embedded computing applications. The technology relies on liquid silver not only to move heat away from the processor, but also to cushion the processor against the effects of shock and vibration.

Essentially the RuggedCool approach uses one surface of copper, one surface of aluminum, and a layer of silver sandwiched in-between. Materials like liquid silver make the RuggedCool technology expensive, but it is intended for applications for which nothing else will suffice.

Extreme cold operations

The notion of electronics thermal management typically involves how to cool hot components, but what about electronics that must operate in extremely cold environments? “The opposite of cooling techniques is at the bottom end of temperature: what do you do when the system gets very, very cold?” asks GMS’s Ciufo. GMS engineers have an architectural approach to speed the warming of electronics to operating parameters.

“We found a way to provide self-heating of electronics such that the non-race-critical parts are not operating in an unreliable state, but other components work and start heating-up the system until the critical components come up to their minimum temperatures,” Ciufo explains. Race conditions involve components that operate at slightly different clock speeds because of low temperatures, which cause them to operate unreliably.

“Our systems can power-up much more quickly, so the time to wake up is much shorter because of the way we have intelligently built the system,” Ciufo says. 
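The staged warm-up Ciufo describes can be sketched as a simple power-sequencing loop: temperature-tolerant parts power first, and the clock-sensitive components stay in reset until their minimum operating temperature is reached. The component names and the temperature-sensor callback below are hypothetical placeholders, not a real GMS interface.

```python
# Hedged sketch of a staged cold-start sequence: tolerant components power
# first so their dissipation warms the chassis; race-prone critical parts
# are held in reset until the sensor reads above the minimum temperature.
# read_temp_c and all component names are illustrative assumptions.

def warm_up(read_temp_c, min_temp_c=-40.0):
    """Return the power-up action sequence for a cold start."""
    actions = [("heater_rail", "on"),           # tolerant parts power first
               ("critical_cpu", "hold_in_reset")]
    # Dissipation from the tolerant parts warms the box; poll the sensor
    # until the clock-sensitive components reach minimum temperature.
    while read_temp_c() < min_temp_c:
        pass
    actions.append(("critical_cpu", "release_reset"))
    return actions

# Simulated sensor that starts cold and warms past the -40 C threshold:
temps = iter([-55.0, -41.0, -35.0])
print(warm_up(lambda: next(temps)))
```

The key design point is that warm-up time depends on how much heat the tolerant components can dump into the chassis, which is why GMS frames this as an architectural choice rather than an add-on heater.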
