TRENDS: Board vendors push for new COTS reliability standards

Many experts in the single-board computer industry have abandoned the MIL-HDBK-217 standard for calculating mean time between failures (MTBF)

May 1st, 2000

By John McHale

Many experts in the single-board computer industry have abandoned the MIL-HDBK-217 standard for calculating mean time between failures (MTBF). This guideline, experts say, does not adequately account for plastic parts and custom application-specific integrated circuits - better known as ASICs.

Yet these same board designers express frustration when they turn toward their own methods to determine reliability. Often they find that great disparities in MTBF appear between board products that have essentially the same components.

"It's pretty simple - everybody does their own thing," says Glenn Benninger, senior engineer at the Commercial Technology Support Branch of the Naval Surface Warfare Center division in Crane, Ind. "A standard is needed to even out the playing field."

There is a "helluva lot of interest for a standard methodology for qualifying MTBF," says Ray Alderman, executive director for the VME International Trade Association (VITA) in Scottsdale, Ariz.

Electronics engineers who work on telecommunications and military equipment deal with crucial systems, and so need some way to measure aberrant behavior in board products, Alderman says. Once a standard exists, everybody in the industry must conform to it, he adds.

"It is absolutely essential that we have an industry standard," says Duncan Young, director of marketing at DY 4 Systems Inc., a supplier of rugged VME boards in Kanata, Ontario. Commercial vendors who do not understand extreme environmental conditions sometimes generate ludicrous MTBF numbers, he continues.

"I think [a new MTBF standard] would be wonderful," says Richard Copra, senior vice president of marketing at Vista Controls in Santa Clarita, Calif. Some competing boards have the same components as Vista boards, yet "their MTBF numbers are an order of magnitude better than ours," Copra says. That does not make sense unless the competitors are exaggerating their numbers, he claims.

"Some companies use a lot of physics and science and come up with a reasonable number, while others throw out a wild guess," Benninger says.

The wild guess is called marketing swag, in which a company picks an MTBF number that looks good but exaggerates its product's actual reliability, Benninger explains.

This practice often works to the disadvantage of companies that do a good job of arriving at an accurate MTBF number, Benninger says. These companies might not get the contracts because their competitors might have prettier numbers, he adds.

This becomes a problem when ordering spares. The company with the more accurate failure rate will call for more spares. Meanwhile, the one chosen for the contract has a better number on paper, and leads customers to believe they need fewer spares.

The company with the incorrect failure-rate number usually promises to deal with any problems, Benninger adds.
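The spares arithmetic behind this concern can be sketched in a few lines of Python. This is only an illustration, not a method described by anyone quoted here: it assumes a constant failure rate, so that expected failures equal total operating hours divided by MTBF, and it assumes failures follow a Poisson distribution. The fleet size, operating hours, and both MTBF figures are hypothetical.

```python
import math

def expected_failures(units: int, hours_per_unit: float, mtbf_hours: float) -> float:
    """Expected failure count, assuming a constant failure rate (1 / MTBF)."""
    return units * hours_per_unit / mtbf_hours

def spares_needed(expected: float, confidence: float = 0.95) -> int:
    """Smallest stock n with P(failures <= n) >= confidence (Poisson assumption)."""
    term = math.exp(-expected)  # P(0 failures)
    cumulative = term
    n = 0
    while cumulative < confidence:
        n += 1
        term *= expected / n    # P(n) computed from P(n - 1)
        cumulative += term
    return n

# Hypothetical fleet: 100 boards running around the clock for one year.
UNITS, HOURS = 100, 8760.0
careful = spares_needed(expected_failures(UNITS, HOURS, 50_000.0))     # measured MTBF
optimistic = spares_needed(expected_failures(UNITS, HOURS, 500_000.0)) # inflated MTBF
print(f"spares with measured MTBF: {careful}")
print(f"spares with inflated MTBF: {optimistic}")
```

Under these assumptions, a tenfold difference in the quoted MTBF translates directly into a much smaller spares order, even if the boards themselves are identical.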

Before microprocessors had a standard for measuring performance, competing vendors would cheat like crazy, Copra says. Once standards came out, companies had to be honest, he adds.

The current military standard, MIL-HDBK-217, is too conservative and is not friendly toward custom technology such as ASICs, Benninger says.

Benninger says he would like to see something like the Bellcore standard, originally developed at Bell Laboratories and currently popular in the telecommunications industry, used to determine MTBF. The Bellcore approach is more understanding of technologies like ASICs, he adds.

MIL-HDBK-217 also hammers the suppliers on plastic parts, Copra says. Whatever the new standard is, it must take plastic devices into consideration, he adds.

MIL-HDBK-217 is too theoretical, contends Jerry Gipper, director of business development and planning at the Motorola Computer Group in Tempe, Ariz.

There are three ways to calculate MTBF, Gipper says. The first is using theoretical formulas such as those in MIL-HDBK-217, which relies on many formulas without much actual testing, he adds.
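A formula-based prediction of the kind Gipper describes can be illustrated with a simplified series model, in which the board's failure rate is the sum of its parts' failure rates and MTBF is the reciprocal of that sum. The sketch below is an assumption-laden illustration only: the part categories and failure rates are hypothetical, and it omits the quality, environment, and temperature adjustment factors that real handbook methods apply.

```python
def series_mtbf_hours(parts):
    """MTBF of a board modeled as parts in series.

    parts: iterable of (quantity, failures_per_million_hours) pairs.
    Assumes constant failure rates and that any part failure fails the board.
    """
    total_fpmh = sum(quantity * fpmh for quantity, fpmh in parts)
    if total_fpmh <= 0:
        raise ValueError("total failure rate must be positive")
    return 1_000_000.0 / total_fpmh

# Hypothetical bill of materials; the rates are illustrative, not handbook values.
board = [
    (1, 2.0),    # custom ASIC
    (4, 0.5),    # memory devices
    (50, 0.02),  # passive components
]
print(f"predicted MTBF: {series_mtbf_hours(board):,.0f} hours")
```

Because the result depends entirely on the per-part rates fed in, two vendors using different rate tables for the same components can publish very different MTBF figures.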

"In defense of military standards, new versions do come out as data comes in, but they still tend to be conservative," Copra notes.

The second is actual data, which draws on results from the real service lifetime of a product, Gipper explains. MTBF figures calculated from actual data can be 10 times better than theoretical predictions, he says. However, a product needs to be out in the market for a few years before actual data emerges, Gipper notes.
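A field-data estimate of this kind is commonly computed as accumulated operating hours divided by the number of observed failures. The following is a minimal sketch under that assumption; the fleet size, service time, and failure count are hypothetical and do not come from any vendor quoted here.

```python
def field_mtbf_hours(total_operating_hours: float, failures: int) -> float:
    """Point estimate of MTBF from field data: operating hours per failure."""
    if failures <= 0:
        # With no recorded failures there is no point estimate yet, which is
        # one reason a product must be fielded for some time first.
        raise ValueError("at least one observed failure is required")
    return total_operating_hours / failures

# Hypothetical fleet: 500 boards, each in service for two years of continuous use.
fleet_hours = 500 * 2 * 8760.0
print(f"field MTBF: {field_mtbf_hours(fleet_hours, failures=20):,.0f} hours")
```

The estimate only stabilizes once enough hours and failures have accumulated, which matches Gipper's point about needing years of market exposure.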

Motorola Computer Group officials prefer the third way, which is demonstration, Gipper says. His company uses accelerated life testing - also known as highly accelerated life testing (HALT) - which simulates the lifetime of a product over several weeks through increased environmental and stress testing, Gipper explains.
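To show how a few weeks of stress testing can stand in for years of service, the sketch below uses the Arrhenius temperature-acceleration model, a widely used textbook relationship. It is an assumption for illustration, not a description of Motorola's procedure: the article does not say which model the company uses, and the activation energy and temperatures here are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in electron-volts per kelvin

def arrhenius_acceleration(use_temp_c: float, stress_temp_c: float,
                           activation_energy_ev: float = 0.7) -> float:
    """Acceleration factor between use and stress temperatures (Arrhenius model).

    The 0.7 eV default is an illustrative assumption; real values depend on
    the failure mechanism being accelerated.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)

def equivalent_field_hours(test_hours: float, acceleration_factor: float) -> float:
    """Field hours represented by a given number of stress-test hours."""
    return test_hours * acceleration_factor

# Hypothetical test: three weeks at 105 C standing in for service at 40 C.
af = arrhenius_acceleration(use_temp_c=40.0, stress_temp_c=105.0)
print(f"acceleration factor: {af:.1f}")
print(f"equivalent field hours: {equivalent_field_hours(21 * 24, af):,.0f}")
```

A temperature-only model like this covers just one stress; tests that also apply vibration or thermal cycling would need additional models, and the result is only as good as the assumed activation energy.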

Motorola engineers sometimes then use actual data to verify their demonstration calculations, Gipper says.

Vista also uses HALT to determine a product's MTBF, says Gorky Chin, vice president of advanced technology at Vista. If you really want to understand the physics of failure, HALT is the right choice, Chin adds.

HALT extrapolates from actual test data and is the ideal test for any product going into a mission- or life-critical application, Chin claims.

DY 4 engineers also perform HALT on their products "as a test of design integrity," Young says. It is not useful in determining lifetime reliability, Young notes. However, it does give a view of the design margins, which enables engineers to select the right components for particular environments, Young explains.
