Today's most pressing design issues in rugged rackmount mobile servers

Advanced surveillance, reconnaissance, and situational awareness on the battlefield demand more powerful computing than ever before. That means moving computer servers from the data center into the mobile command post on the front lines, which requires innovations in ruggedization, packaging, and thermal management.

An interview with Chris A. Ciufo, chief technology officer and vice president of product marketing at General Micro Systems Inc. in Rancho Cucamonga, Calif.

Question: What are some of today's most pressing environmental conditions and design issues when it comes to rugged servers for the battlefield?

Answer: The "battlefield" refers to an environment that includes everything from land to sea to air, and even space. The short answer is that servers need to operate in all of these domains and survive the operating environment. Without going into the specifics of each, let me drill down into a few representative examples. I'll also omit Space for the sake of brevity, because we understand that a rackmount server like our S2U wouldn't fly in a space station or in a spacecraft; instead a form factor like our ultra-small mobile class SG502-LP is needed. In fact, it is destined to fly on a planned mission to Mars as a sort of communications server.

On land, servers are most commonly battlefield deployed in three "platform" types: in fixed buildings; in temporary command posts; and installed in mobile vehicles like Humvees, MRAPs, Strykers, and other ground vehicles.

Note that in buildings, tents, and trailers, servers are typically air-cooled rackmount equipment, 19 inches wide and stacked with other gear such as RAID drives, power supplies, Ethernet switches, and sometimes rackmount radios. In vehicles, servers might be rackmount and installed in suitcase-like transit cases; however, increasingly they are fully conduction-cooled, small form factor sealed chassis that, unlike rackmount equipment, don't use fans for cooling. They are more robust and purpose built and perform Xeon-class workloads, but the assurance of extreme ruggedness comes at a price.

These environments are all very different. Air-cooled (convection) servers use fans and pure commercial temperature components; in fact, this equipment might be the same gear used in enterprise installations that run the Internet "cloud". But every image you have ever seen of a computer server room shows a hot-side/cold-side row of servers that suck in chilled air and blow out hot air -- in fact, very hot air. Intel Xeon processors dissipate as much as 120 watts each, so a typical multi-CPU server puts out as much heat as a hair dryer. That's fine in an air-conditioned server room but not so ideal in burned-out battlefield command post buildings or in mobile operations tents. In these locations, services like the Army or Air Force need to bring portable air conditioners -- big trailers with generators and chillers -- to keep the operators, and especially the equipment, cool enough to operate without overheating.

Server heat is a killer

And for servers, heat is a killer. While the components (likely) won't burn up, commercial temperature components are rated for zero to 70 degrees Celsius, and when too hot they start to misbehave. Intel processors are designed to not exceed about 100 C on the die (the silicon inside the IC package). When they get close to their maximum temperature, the CPU starts to "throttle" itself, slowing down the clock to lower its power dissipation and the device temperature. When this happens, the server slows down and its performance suffers. You don't want a powerful server operating in "limp" mode; what a waste. Lives could be lost.

Rackmount servers are by definition air cooled. When they are operated without air conditioners -- say, in that building, tent, or Humvee transit case -- how efficiently air is blown across the system can greatly influence the server's ability to avoid throttling. It's equally important to get the heat from the hot components (like the processors) onto the heat sinks across which the air is going to blow.

The General Micro Systems S2U rugged rackmount server is air cooled. If it is operated in an air-conditioned environment that's all the better; however, it is designed to operate at full speed in non-air-conditioned environments and will operate at hotter temperatures for a longer period of time before the Xeon processors reach their maximum temperature. Here's why.

There are two hot-swappable fan tray assemblies that each contain six independently controlled fans. At 10,000 rpm per fan, hundreds of CFM are available to the entire 19-inch chassis to keep the system cool. But besides adequate airflow, it's important to get that air where it's needed: to the heat sinks on top of the dual Xeon CPUs. GMS includes a patented heat sink assembly that is nearly 6 inches by 9 inches, or almost the entire surface of the 6U VPX "motherboard". This much surface area -- plus the height of the vertical fins -- gives the air a massive area to cool. The more area available, the more cooling potential.
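As a rough illustration of why airflow volume matters, the airflow needed to carry away a given heat load can be estimated from the heat capacity of air. This is a back-of-the-envelope sketch, not GMS design data: the air properties are sea-level approximations, the 10 C allowable air temperature rise is an assumed figure, and the 240-watt load simply counts two 120-watt Xeons and ignores the rest of the server.

```python
# Assumed sea-level air properties; real values vary with altitude and temperature.
AIR_DENSITY_KG_M3 = 1.2        # kg per cubic meter, roughly 20 C dry air
AIR_SPECIFIC_HEAT = 1005.0     # J/(kg*K), approximate
M3_PER_S_TO_CFM = 2118.88      # cubic meters per second -> cubic feet per minute


def required_airflow_cfm(power_watts, air_temp_rise_c):
    """Airflow (CFM) needed to remove power_watts with the given air temperature rise."""
    if air_temp_rise_c <= 0:
        raise ValueError("air temperature rise must be positive")
    m3_per_s = power_watts / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT * air_temp_rise_c)
    return m3_per_s * M3_PER_S_TO_CFM


if __name__ == "__main__":
    # Two 120 W CPUs, assumed 10 C rise between inlet and exhaust air.
    print(f"{required_airflow_cfm(240.0, 10.0):.1f} CFM")
```

The relationship is linear in power and inversely proportional to the allowable temperature rise, so a hotter inlet (which leaves less headroom for the air to warm up) demands proportionally more airflow.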

Additionally, the S2U uses "Two Cool" technology: one set of fans pushes air across the heat sink assembly, while a second set pulls air out another part of the system. Along the way, additional cooler inlet air is inter-mixed, which counterbalances the amount of now-warmer air that has just moved across the heat sink. The twelve fans are also individually controlled via an in-system Baseboard Management Controller that monitors multiple in-system temperature sensors. This allows the fan speed to be increased or decreased as a way of "tuning" the airflow in the chassis for maximum cooling.
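The interview does not describe the actual BMC control algorithm, but a simple proportional scheme shows what per-fan "tuning" from temperature sensors could look like. Everything in this sketch is hypothetical except the 10,000 rpm maximum and the twelve-fan count: the function names, the temperature thresholds, and the minimum duty cycle are assumptions chosen for illustration.

```python
MAX_RPM = 10_000        # per-fan maximum quoted in the interview
MIN_DUTY = 0.30         # assumed floor so the fans never stall
RAMP_START_C = 60.0     # assumed sensor temperature where fans begin to ramp
FULL_SPEED_C = 85.0     # assumed sensor temperature where fans run flat out


def fan_duty(sensor_temp_c):
    """Map one sensor temperature to a duty cycle between MIN_DUTY and 1.0."""
    if sensor_temp_c <= RAMP_START_C:
        return MIN_DUTY
    if sensor_temp_c >= FULL_SPEED_C:
        return 1.0
    fraction = (sensor_temp_c - RAMP_START_C) / (FULL_SPEED_C - RAMP_START_C)
    return MIN_DUTY + fraction * (1.0 - MIN_DUTY)


def plan_fan_speeds(zone_temps_c):
    """Return an rpm setpoint per fan, one temperature reading per fan zone."""
    return [round(fan_duty(temp) * MAX_RPM) for temp in zone_temps_c]


if __name__ == "__main__":
    # Twelve hypothetical zone readings: hotter near the CPUs, cooler elsewhere.
    readings = [55, 58, 62, 70, 78, 88, 88, 78, 70, 62, 58, 55]
    print(plan_fan_speeds(readings))
```

Controlling each fan from its own zone reading lets the fans nearest the hot spots run fast while the others stay slower, which is the general idea behind individually controlled fans.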

Metallic bath to remove heat

Besides the fans themselves, GMS has adapted the company's patent-pending RuggedCool technology from conduction-cooled systems to air-cooled systems. This system uses a viscous metallic "bath" in which the processor's contact "slug" sits, creating a very low thermal resistance path from the hot processor die to the final air-cooled heat sink. The concept is simple, the thermodynamics complex, but the result is that there's less than a 10-degree temperature rise from the hot die to the heat sink -- a very efficient thermal path. This means that over 90 percent of the heat from the processor makes it to the heat sink and into the air stream. The GMS S2U therefore can operate in a hotter ambient battlefield environment without throttling the CPUs. For an air-cooled server operated on a battlefield that may not always have air conditioning, this is an ideal design.
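A simple series thermal-resistance model shows why a small die-to-heat-sink temperature rise buys ambient headroom. This is a simplified sketch under stated assumptions, not a GMS thermal analysis: the 100 C die limit, 120-watt CPU power, and 10-degree rise come from the interview, while the 0.25 C/W heat-sink-to-air resistance and the 25-degree comparison case are made-up illustrative values.

```python
def die_to_sink_resistance(temp_rise_c, cpu_power_w):
    """Thermal resistance (C per watt) implied by a temperature rise at a given power."""
    return temp_rise_c / cpu_power_w


def max_ambient_c(die_limit_c, cpu_power_w, die_to_sink_rise_c, sink_to_air_c_per_w):
    """Hottest inlet air at which the die stays at or below its limit.

    Series model: T_die = T_ambient + P * R_sink_to_air + rise_die_to_sink.
    """
    return die_limit_c - die_to_sink_rise_c - cpu_power_w * sink_to_air_c_per_w


if __name__ == "__main__":
    SINK_TO_AIR = 0.25  # C/W, assumed value for a large finned sink with forced air
    low_rise = max_ambient_c(100.0, 120.0, 10.0, SINK_TO_AIR)
    high_rise = max_ambient_c(100.0, 120.0, 25.0, SINK_TO_AIR)  # hypothetical comparison
    print(f"implied die-to-sink resistance: {die_to_sink_resistance(10.0, 120.0):.3f} C/W")
    print(f"max ambient with 10 C rise: {low_rise:.0f} C")
    print(f"max ambient with 25 C rise: {high_rise:.0f} C")
```

In this model every degree saved between the die and the heat sink is a degree of additional ambient temperature the server can tolerate before the CPU reaches its throttle point.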

Note that for conduction-cooled battlefield servers like the GMS Xeon E5-based SO302 or S402-LC, there are no fans but the RuggedCool technology similarly moves heat directly to the box's mounting cold plate. Again, the processors can run hotter and avoid throttling since the GMS server design gets the heat away very efficiently. The conduction-cooled servers are also designed for harsher treatment, including MIL-S-901 shock and MIL-STD-810 vibration, to name a few.

Question: How have your customers' requirements for rugged battlefield servers influenced your design choices concerning the King Cobra S2U rugged server?

Answer: Earlier we discussed three common battlefield scenarios, but those were only for ground-based systems. It's important to also include ship- and airborne-based systems into this mix. When considering customers' requirements for the full battlefield of Land, Sea and Air (we'll again omit Space), there are more design criteria and tradeoffs to consider.

Earlier, we spoke extensively about how the GMS rackmount servers, such as the air-cooled S2U "King Cobra" or 1U multi-domain S1U-MD, are successfully cooled using air that might not even be "chilled" from air conditioning. This requirement to use ambient uncooled air to cool servers drove the GMS Two Cool and RuggedCool technologies.

But in other areas besides environmental (temperature, shock, vibration, water ingress, EMI, etc.), battlefield servers have unique requirements. One of them is reliability. For rackmount servers like S2U, having the ability to quickly replace a module -- either due to failure or an upgrade -- drives the need for modularity and hot-swap line-replaceable units (LRUs).

S2U mounts in a rack, but every module of the system from power supply and fan assemblies, to VPX-based "motherboard" and drive assemblies, can be yanked out and replaced in seconds. This "100 percent LRU" capability makes possible battlefield two-level maintenance. In contrast, a typical COTS 1U or 2U server has the whole server as "the unit". If there's a failure, it's the server that's replaced. The GMS S2U design planned for everything in the server to be swapped out while on the battlefield, in the ship and underway, or in the air on a reconnaissance mission. This is particularly important in a submarine, for example, where carrying a large quantity of spare servers is just impractical. More useful is carrying a few replacement modules should a replacement swap out become necessary.

Application code reuse

There's another battlefield requirement that's not so obvious but is a huge factor for customers. It's application code reuse. Many large defense contractors have multi-platform systems -- say a command module with moving maps, sensor fusion, and database retrieval that overlays data on the unfolding mission scenario. In one instance this command system may reside in an ATR or vetronics chassis and be mounted in an armored vehicle or in a wide-body aircraft. In other instances, it may be an air-cooled rack on a ship. And in still another instance, it may need to be shoe-horned into a small form factor system on a multi-mission ground vehicle like a Stryker, MRAP, or future JLTV-like vehicle.

It's important that the same application software be portable across many different server types so the customer merely chooses the format of the server based upon the installation. GMS rugged servers are code compatible from one type to the next within the same processor family. The GMS S2U is VPX-based, and that same VPX motherboard and VPX Ethernet switch/storage module can be deployed in an ATR chassis. Or that same Xeon E5-based processing engine can be used in our S402-LC conduction-cooled small form factor box.

You asked about our "design choices", and this application portability example is a key point. The GMS rugged servers are based upon a computer-on-module "engine" that houses the processor or processors subsystem: Xeon E5, Xeon D, and future processor types. The engine is the same be it used in a VPX server blade, a small form factor conduction-cooled chassis, an air-cooled 19-inch rackmount, or even sandwiched in our super-thin RuggedView smart panel PC displays with Intel Xeon processor D CPUs.

Question: Historically what are some of the industry's lessons learned when it comes to rugged battlefield servers; what has not worked in the past, what has worked, and how do those lessons learned translate into the King Cobra S2U design?

Answer: We've intimated some of the "lessons learned" in the earlier questions. But let me rephrase my answers to very specifically answer your question -- including what has and hasn't worked.

Firstly, pure commercial rackmount servers are widely deployed on the battlefield. Besides continental U.S.-based DoD installations for enterprise, systems integration labs (SIL) and for training purposes, the biggest user of rackmount servers is the US Navy. And it's because the shipboard environment is very tolerant of commercial-type, commercial temperature equipment. Below decks on ships it's cramped but air conditioned and full of equipment performing myriad tasks. Commercial servers are cost-effective, work well enough, and certainly handle the workload of large networks, storage, and processing found aboard.

Throw-away servers

But at a not-long-ago visit to a Navy installation, GMS CEO Ben Sharfi noticed a stockpile of new, brand name servers. When he asked his escort what they were for, the reply came that they were all spares for a certain ship. They were needed, Sharfi learned, because they were widely deployed on the ship and spares were constantly being swapped in for units needing repair. This lesson is what catalyzed GMS to extend our rugged, conduction cooled expertise into the rackmount space. If we could be "ballpark competitive" with the pure commercial vendors while bringing higher MTBF, ruggedization, and the ability to operate longer in hotter ambient environments, then customers would find value in GMS products. That has indeed proven to be the case.

Another lesson learned is easy enough to see in any enterprise server installation. There are rows and rows of 19-inch racks filled with servers, Ethernet switches, storage arrays, power supplies, and -- in the case of purpose-built data centers such as central office switches or shipboard command management systems -- specialty I/O interfaces. This is the model that runs the Internet as well as Navy ships and DoD operations centers. Clearly this works well.

Yet even an aircraft carrier -- a "city" with over 6,000 people -- has limited space and cooling capacity for 1500-Watt servers. Separating all of these functions into separate boxes (server, network, storage, etc.) is convenient for logistics but it wastes space. The typical server has lots of "air" inside of it; it's not typically densely packaged. Yet one of GMS's key differentiators in all of our product lines is maximizing packaging density: more functions per cubic inch per watt. We do this in our VPX blade servers, our smart panel PC products, and in our small form factor conduction-cooled systems.

So we applied the same principle to the S2U in a 19-inch rack. Instead of a 1U, dual socket Xeon server with a couple of disk drives and several Ethernet LAN ports, we "crammed" 15U of equivalent rackmount functions into only 2U of height: CPUs, 22 one Gigabit Ethernet ports, four 10 Gigabit Ethernet ports, 12 drive bays, auxiliary power supply or 4-slot PCI Express add-in cage, router, GPGPU processors, and more. The list is extensive. So basically, while the enterprise server room model works well and is very applicable to some battlefield installations like ships, it definitely has room for efficiency improvements.

One final lesson learned pertaining to small-form-factor-systems is in ground vehicles. The U.S. Army, when upgrading some Stryker variants to the WIN-T Increment 2 (data on the move) configuration, was forced to remove some crew seating to add the vetronics computer equipment -- some of which came from GMS. Adding WIN-T is essential to the Army's battlefield network, but giving up crew seating is a high price. This was not an easy decision because transporting fewer crew in exchange for on-the-move network connectivity slightly reduces the soldier effectiveness per vehicle.

In collaboration with prime contractor General Dynamics Mission Systems and the US Army, GMS was able to help collapse up to five separate WIN-T boxes down to a single GMS small form factor, conduction-cooled server. This was not only a packaging/density exercise, but was a testimonial to General Dynamics' creativity in virtualizing their system using Intel's Xeon processor D. So now one box has multiple LAN ports and many VMs running what used to be spread across several discrete boxes. The COTS technology plus ingenuity made this a lesson learned.

Question: With such a wealth of high-performance embedded computing (HPEC) options available, why are rugged servers necessary on today's battlefield, and how do you differentiate rugged battlefield embedded computing from rugged battlefield servers?

Answer: You're asking for the practical, on-the-battlefield differences between HPC/HPEC, embedded computing, and servers. "Embedded" computers really encompass all three, yet for convenience, the market chooses to divide them. If every apple, orange, and nectarine was simply called "fruit" all the time it would be hard to have a conversation about them without confusion.

High Performance Computing (HPC) or specifically the "Embedded" version of this (HPEC) generally describes either extremely powerful multiprocessor systems -- be they conduction cooled in small form factor boxes, ATR chassis, or in rackmount systems -- or processor systems with single-function co-processors. HPC/HPEC systems are meant for the very highest horsepower computing jobs, such as data mining a large database or culling through gigabits of image pixels or RF sensor waveforms to find patterns. Examples include target tracking, image processing, synthetic aperture radar, and artificial intelligence. HPC/HPEC systems are distinguished usually by fat data pipes feeding in sensor data, plus a combination of (usually) Intel-type server processors -- Xeon or Xeon D -- combined with FPGAs, DSPs, or GPUs plus large memory arrays. Specialty algorithms are run on the combination processors/co-processors to perform the system's main tasks. These might be real-time systems or Windows-based systems, but the algorithms are always highly analytical complex math functions such as FFTs, DCTs, or waveform processing like OFDM.
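To make "finding patterns in sensor waveforms" concrete, here is a minimal sketch of an FFT picking the dominant tone out of a sampled signal. It is a textbook recursive radix-2 Cooley-Tukey FFT in plain Python, not code from any GMS product; the function names and the synthetic test signal are illustrative assumptions. Deployed HPEC systems would run heavily optimized versions of this kind of math on FPGAs, DSPs, or GPUs.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_bin(samples):
    """Index of the strongest non-DC frequency bin in the lower half of the spectrum."""
    spectrum = fft(samples)
    half = len(samples) // 2
    return max(range(1, half), key=lambda k: abs(spectrum[k]))


if __name__ == "__main__":
    n = 64
    # Synthetic "sensor" signal: a strong tone in bin 5 plus a weaker one in bin 12.
    signal = [
        math.sin(2 * math.pi * 5 * i / n) + 0.3 * math.sin(2 * math.pi * 12 * i / n)
        for i in range(n)
    ]
    print("dominant frequency bin:", dominant_bin(signal))
```

The recursion halves the problem at each level, which is what makes the FFT far cheaper than a direct transform and why it maps well onto parallel co-processors.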

In defense, these systems are found in deployed platforms like reconnaissance aircraft or latest-generation fighter aircraft, as well as certain types of mission-specific ground vehicles. In all cases, space and power are at a premium, and convection air for cooling is a rarity. Conduction-cooled platforms are the norm here, in either small form factor like GMS builds, or very dense VPX ATR/vetronics-style chassis which GMS also builds. These systems can be based upon a server architecture, but the need for algorithm co-processors to meet the performance needs tends to favor purpose-built architectures like our SB2002-SW "Blackhawk" with add-in FPGA co-processor, or server-class Xeon-equipped VPX single-board computers with one or two CPUs plus co-processor cards.

The role of co-processors

On the other hand, in stationary (or slowly mobile op center) installations, plus ships, rackmount servers can perform HPC/HPEC when add-in co-processors do the heaviest algorithm lifting. Servers like our Intel Xeon E5-based S2U accept a variety of co-processors from Altera/Intel, Xilinx, Nvidia, and Texas Instruments. These algorithm processors plug into I/O slots we have for PCI Express, XMC, PCI Express-Mini, 3U VPX, and GMS's SAM I/O formats. In this case, a traditional rackmount server can be configured as an HPC system, although the market tends not to call rackmount equipment "embedded". So I suppose this configuration is only "HPC", not "HPEC", although it meets the same system function.

Lastly, "plain old embedded computing" systems can indeed be configured as an HPC/HPEC system as described above, but that distinction of adding the algorithm co-processor typically redefines that embedded system as "HPEC". But to further confuse the reader, if a general purpose embedded system eliminates some of the common PC-like architecture -- Intel CPU, USB, moderate performance Ethernet LAN, and so on -- but chooses a purpose-built processor that's geared for, say, image processing such as an ARM Cortex A10 and InfiniBand inter-box connectivity, then that embedded computer is no longer "generic". It's not the usual Windows- or Linux-based command processor system but is now performing a specific function -- in this case, algorithm processing -- so its designation changes to HPEC.

Let's put one last fine point on the differences between battlefield computers and battlefield servers, regardless of whether the form factor is small, conduction- or air-cooled, or rackmount. What makes an embedded computer into a "server" isn't well defined, but GMS defines micro-server or server as:
-- More than eight (8) virtual machines (VMs)
-- Sufficient Ethernet ports to utilize at least 50 percent of the available VMs
-- Sufficiently huge memory to service all VMs at full clock speed
-- Network-attached storage (NAS) disk drives (at least two and typically four or more)
-- Runs a server operating system in Windows, Linux, or other; plus supports a virtual machine hypervisor such as VMware or equivalent.

Using this definition, a rugged embedded Windows 10 PC with Intel's latest Core i7 Kaby Lake CPU might be considered a "server", but it would be a low performing one with just barely enough memory or horsepower to handle eight VMs (using two threads per core). As well, more drives and LAN ports would need to be added. This loaded-up embedded computer could be made to meet our definition of "server", but it wouldn't be an ideal machine.

Micro-server technology

Instead, GMS offers "micro-server" embedded products based upon Intel Xeon processor D with up to 16 cores (32 VMs) and 64 gigabytes of ECC RAM. Up to 16 one Gigabit Ethernet ports and two 10 Gigabit ports are sufficient to service the VMs, while up to four removable drives cover the NAS. And these products can be as small as 5.4 x 6.5 x 3.5 inches at only 7 pounds.
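The VM counts quoted in this discussion follow from simple arithmetic: with two hardware threads per core, the figures correspond to one VM per thread. The sketch below only restates that arithmetic; the function name and the one-VM-per-thread reading are assumptions, and real VM capacity also depends on memory, I/O, and workload.

```python
def vm_capacity(cores_per_cpu, sockets=1, threads_per_core=2):
    """Upper-bound VM count, assuming one VM per hardware thread."""
    return cores_per_cpu * threads_per_core * sockets


if __name__ == "__main__":
    print("4-core Core i7:", vm_capacity(4))        # the eight-VM PC example
    print("16-core Xeon D:", vm_capacity(16))       # the micro-server example
    print("two 16-core CPUs:", vm_capacity(16, sockets=2))  # hypothetical dual-socket case
```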

"Server" based products -- in conduction or air cooling -- typically use one or two Intel Xeon E5 v4 server processors with up to 18 cores/36 VMs each. The GMS 2U rackmount S2U "King Cobra" server, for example, has up to 512 gigabytes of memory, 15 removable front panel drives, and a whopping 24 Ethernet ports. Clearly this is in a whole different class than a four core Kaby Lake processor. We also build a similar type of server that's entirely conduction cooled and slimmed down in size and features -- but still boasts one or two processors, up to four removable drives, and an incredible 26 Ethernet ports. This one called the S402-SW is used in vetronics systems for ground vehicles as literally an on-the-go rugged vehicle server.

Question: What kinds of military design-ins do you expect for the King Cobra S2U rugged server, and what time frame are you anticipating for these design-ins?

Answer: In brief, rackmount battlefield servers are going into tactical operations centers (TOCs), terrestrial installations, vehicles, ships, submarines, wide-body aircraft, and interdiction platforms.

GMS has had excellent success with the S2U rackmount server, mostly because it follows from the company's established differentiators in the defense market:
-- Modular, processor "engine" based design
-- Ultra-high density: we quote "15U of functionality in only 2U of rack space"
-- It's unique, mating multiple functions that are normally separate into a single box
-- The flexibility of so many add-in I/O modules means that what is normally an inflexible rackmount server can now accept program-specific technology cards in a rack configuration. This preserves our customers' investments in legacy hardware and software.

Military applications

We're seeing design wins today in the following defense applications: forward-deployed operations centers; mobile tactical command posts; vehicle-mounted network infrastructure for semi-permanent battlefield operations; shipboard; wide-body C4ISR and EW platforms; and airborne command infrastructure that links to onboard and SATCOM networks. Every one of these platform types is an existing S2U design win in various stages of funding.

It would be redundant to repeat here why S2U is seeing such success in the market, but GMS has long been the battlefield rugged server company. We've done it with VME and VPX, we've done it with small form factors, and we did it when Intel announced the micro-server class Xeon processor D two years ago. All GMS did with S2U was apply our engine-based approach -- and our propensity to be the "Swiss Army knife of embedded systems" -- to the design philosophy behind S2U.

But there is another reason for the current design wins, and we believe it will help us as we go forward. We've made every component of S2U modular, replaceable, and hot-swappable. This "LRU approach" includes the VPX blade "motherboard" and the network COM board. I've been quoted saying "no one ever got fired for designing in VPX" and that lowers the barrier to the S2U. Customers recognize that a VPX-based server is more costly than a Taiwan-sourced motherboard server, but it's designed from the ground up for defense systems. That has a calming effect on our customers, knowing that our VPX blade is also software compatible with the S2U and can be essentially reused in an ATR.

We expect more S2U design wins in the markets and platforms listed above. And of the current crop of customers and prospects, we find that rackmount servers tend to go into production faster than conduction-cooled small form factor servers. So we'll continue to press our advantages in this market. And we have more new product variants on the roadmap, too.

Chris A. Ciufo is Chief Technology Officer and VP of Product Marketing at General Micro Systems, Inc. As founder or co-creator of defense industry publications COTS Journal and Military Embedded Systems, Ciufo's role at GMS keeps a finger on the industry's technology pulse to recommend the company's technology direction and roadmaps. He serves as technical intermediary between Sales and Engineering, and works closely with the company's CMO on strategic direction. Ciufo is a veteran of the semiconductor, COTS, and defense industries, where he held engineering, marketing and executive level positions. He has published over 100 technology-related articles. He holds a BS EE/Materials Science and participates in defense industry organizations and consortia. His hobbies include boating and antique mechanical systems.