Notes from a parallel universe II

Oct. 17, 2017



Last time, we looked at the progression of devices from Intel that have dominated the rugged embedded computing market at the high performance end, both for SBCs and for signal processing. A graphic showing the story to date was omitted, so here it is (below). It depicts the progression of processors from 1st Gen through the current 7th Gen, and shows example SBCs in both 3U and 6U formats, plus the timeline for significant architectural introductions that gave us inflection points on the performance curve, especially in the single precision floating point (SP FP) vector operations that tend to dominate embedded signal processing.

We also looked forward to the introduction of AVX512 to this line of processors. Since then, Intel has made some more details public. Particularly relevant is the fact that some CPUs (in the ‘server’ line) will have both AVX512 and two fused multiply-add (FMA) execution units, while some (in the ‘client’ line) will get AVX512, but only one FMA unit.

This has some implications for our applications. If you consider purely peak theoretical GFLOPS, then client CPUs will match previous processors with AVX2 (assuming the same base operating frequency): current AVX2 CPUs have two FMA units on a 256-bit pipeline, so a single FMA unit on a 512-bit pipeline handles the same number of elements per cycle. Meanwhile, server chips, with two FMA units on the 512-bit pipeline, will get double the peak performance.
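The peak comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, not a vendor formula: it assumes 32-bit single-precision elements, counts one FMA as two floating-point operations per element, and ignores frequency changes under vector load. The function names, the core count, and the clock rate in the example are illustrative assumptions, not figures from any particular CPU.

```python
def peak_sp_flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak single-precision FLOPs per core per clock cycle (theoretical)."""
    lanes = vector_bits // 32   # number of 32-bit floats per vector
    flops_per_fma = 2           # one multiply plus one add per lane
    return lanes * flops_per_fma * fma_units


def peak_sp_gflops(vector_bits: int, fma_units: int, cores: int, ghz: float) -> float:
    """Scale the per-cycle figure by core count and clock rate in GHz."""
    return peak_sp_flops_per_cycle(vector_bits, fma_units) * cores * ghz


avx2_dual_fma = peak_sp_flops_per_cycle(256, 2)      # AVX2, two FMA units
avx512_single_fma = peak_sp_flops_per_cycle(512, 1)  # 'client' AVX512
avx512_dual_fma = peak_sp_flops_per_cycle(512, 2)    # 'server' AVX512

print(avx2_dual_fma, avx512_single_fma, avx512_dual_fma)  # prints: 32 32 64

# Hypothetical 4-core part at 2.0 GHz, for illustration only.
print(peak_sp_gflops(512, 2, cores=4, ghz=2.0))  # prints: 512.0
```

Under these assumptions the client configuration lands on the same per-cycle figure as AVX2 with two FMA units, and the server configuration doubles it.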

However, a developer needs to consider how achievable this performance is, because the peak figure assumes that the algorithm being performed can issue back-to-back dual FMA operations, which is only possible in certain circumstances.

Our initial testing has shown that we can achieve the doubling in performance when executing a complex matrix product which heavily utilizes FMA instructions. Other algorithms may not fare so well depending on the instruction mix. On the other hand, some algorithms will run faster on a 512-bit pipeline with one FMA unit than on a 256-bit pipeline with two FMA units. As the saying goes: your mileage may vary.
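To see why a complex matrix product suits FMA hardware so well, it helps to write out the inner operation. The sketch below is an illustration rather than the code used in the testing described above: it decomposes one complex multiply-accumulate into four real multiply-add steps, each of which could map to a single FMA instruction in vectorized native code. Plain Python does not fuse these operations, and the names `complex_mac` and `complex_dot` are hypothetical.

```python
def complex_mac(acc_re, acc_im, a_re, a_im, b_re, b_im):
    """Accumulate a*b into acc using four multiply-add steps.

    Each line has the shape acc = acc +/- x * y, which is the pattern an
    FMA unit executes as one instruction (not fused here in Python).
    """
    acc_re = acc_re + a_re * b_re   # multiply-add 1
    acc_re = acc_re - a_im * b_im   # multiply-add 2 (subtract form)
    acc_im = acc_im + a_re * b_im   # multiply-add 3
    acc_im = acc_im + a_im * b_re   # multiply-add 4
    return acc_re, acc_im


def complex_dot(a, b):
    """Dot product of two complex sequences: the inner loop of a matrix product."""
    acc_re, acc_im = 0.0, 0.0
    for x, y in zip(a, b):
        acc_re, acc_im = complex_mac(acc_re, acc_im, x.real, x.imag, y.real, y.imag)
    return complex(acc_re, acc_im)


a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1 + 4j]
print(complex_dot(a, b))  # prints: (11+14j)
```

Every arithmetic step in the inner loop is a multiply-add, so there is little else in the instruction mix to stall the FMA units. An algorithm with more standalone adds, shuffles, or loads per multiply would not keep two FMA units as busy.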

Enter ARM

What, then, is going on in the non-Intel world? To date this has been mostly dominated by Power Architecture. PowerPC processors were long the chip of choice for power-efficient processing thanks to the AltiVec vector engine, and for applications requiring safety certification thanks to architectural features, including memory structure. A hiatus in the availability of AltiVec reduced their attractiveness for signal processing for a few years, and now we are faced with a sparse roadmap.

Enter ARM. Like Power, ARM is a set of architectures for RISC processors that are licensed to, and manufactured by, a number of vendors, including NXP/Qualcomm, NVIDIA (in its Tegra system-on-chip devices), TI, Broadcom and many more. In addition, FPGA vendors including Xilinx, Altera/Intel, and Microsemi have ARM cores hardwired into some of their system-on-chip products.

ARM devices can include the NEON SIMD extension which provides 128-bit wide processing of various data types including SP FP. Some have the Mali integrated graphics processor which can support GPGPU processing via OpenCL as well as graphics with OpenGL. Add to this large core counts, multiple PCIe lanes, multiple Ethernet ports supporting rates up to 100GbE, and DDR4 memory support; do all this in a power envelope just over 30W, and it gets interesting.

Many ARM implementations also include features that are expected of modern, secure systems, such as hardware support for virtualization, secure boot, and so on.

Abaco has a rich portfolio of board and system products based on high-performance Intel processors and power-efficient ARM processors to support a wide variety of rugged, embedded applications. We would love to hear what you need for your next program.

About the Author

Peter Thompson

Sr Bus Dev Mgr

Peter Thompson is senior business development manager for High Performance Embedded Computing. He first started working on High Performance Embedded Computing systems when a 1 MFLOP machine was enough to give him a hernia while carrying it from the parking lot to a customer’s lab. He is now very happy to have 27,000 times more compute power in his phone, which weighs considerably less.
