Reliability HotWire: eMagazine for the Reliability Professional

Issue 2, April 2001

Reliability Basics

Specifications and Product Failure Definitions

Before life data can be analyzed or even collected, the definition of failure must be established. While this may seem like obvious advice, the lack of generally accepted definitions for performance-related failures can result in misunderstandings over validation and reliability specifications, wasted test time and squandered resources. The process of characterizing the reliability and defining the failures for a product is directly related to the product's mission.

A textbook definition of reliability is: 

The conditional probability, at a given confidence level, that the equipment will perform its intended functions satisfactorily or without failure, i.e., within specified performance limits, at a given age, for a specified length of time, function period, or mission time, when used in the manner and for the purpose intended while operating under the specified application and operation environments with their associated stress levels. [1]

With all of the conditions removed, this boils down to defining reliability as the probability that the product will perform its intended mission without failing. The definition of reliability springs directly from the product mission, in that product failure is the inability of the product to perform its defined mission.
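Once a life distribution has been assumed for the product, this probability can be computed directly for a given mission time. The following sketch is purely illustrative: the two-parameter Weibull model and the parameter values are assumptions chosen for the example, not figures from the article.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability that a unit survives past time t, assuming a
    two-parameter Weibull life distribution (beta = shape,
    eta = scale, or characteristic life)."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical product: characteristic life eta = 1,000 hours,
# shape beta = 1.5, mission time = 200 hours.
mission_reliability = weibull_reliability(200, 1.5, 1000)
# mission_reliability is roughly 0.914
```

In other words, under these assumed parameters the product would complete a 200-hour mission without failure about 91% of the time.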

Reliability Specifications
In order to develop a good reliability program for a product, the product must have well-defined reliability specifications. These specifications should address most, if not all, of the conditions in the reliability definition above, including mission time, usage limitations, operating environment, etc. In many instances, this requires a detailed description of how the product is expected to perform from a reliability standpoint. A single metric, such as MTBF, is inadequate as the sole reliability specification, and the specification that a product will be "no worse" than the previous model is also insufficient. An ambiguous reliability specification leaves a great deal of room for error, and this can result in a poorly understood and unreliable product reaching the field.

Of course, there may be situations where an organization lacks the reliability background or history to easily define specifications for a product's reliability. In these instances, an analysis of existing data from previous products may be necessary. If enough information exists to characterize the reliability performance of a previous product, it should be a relatively simple matter to transform this historical characterization into specifications for the desired reliability performance of the new product.

Financial concerns must also be taken into account when formulating reliability specifications. Planning for warranty and production part costs is a significant part of financially planning for the release of a new product, and financial inputs such as these help establish a picture of the required reliability. However, financial wishful thinking should not be the sole determinant of the reliability specifications, as it can lead to problems such as unrealistic goals, specifications that change regularly to fit test results, or test results that get "fudged" in order to conform with unrealistic expectations.
A realistic reliability specification couples the financial goals for the product with a good understanding of its actual performance; balancing the two produces a specification that is both detailed and achievable.
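The inadequacy of MTBF as a sole metric can be shown numerically. In the hedged sketch below, two hypothetical designs share the same MTBF of 1,000 hours, but because their assumed Weibull shape parameters differ, their reliabilities over a 100-hour mission differ substantially (all parameter values are illustrative assumptions):

```python
import math

def weibull_reliability(t, beta, eta):
    """Survival probability at time t for a two-parameter
    Weibull distribution (beta = shape, eta = scale)."""
    return math.exp(-((t / eta) ** beta))

mtbf = 1000.0    # hours, identical for both hypothetical designs
mission = 100.0  # hours

# Design A: random failures (beta = 1, i.e. exponential),
# for which eta equals the MTBF.
r_a = weibull_reliability(mission, 1.0, mtbf)

# Design B: wear-out failures (beta = 3). For a Weibull
# distribution, MTBF = eta * gamma(1 + 1/beta), so solve for eta.
eta_b = mtbf / math.gamma(1 + 1 / 3.0)
r_b = weibull_reliability(mission, 3.0, eta_b)

# r_a is roughly 0.905, while r_b is roughly 0.999 --
# the same MTBF, but very different mission reliabilities.
```

Quoting only "MTBF = 1,000 hours" hides this difference entirely, which is why the specification needs mission time and expected failure behavior, not a single number.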

Universal Failure Definitions
Another important foundation for a reliability program is the development of universally agreed-upon definitions of product failure. This may seem a bit silly, in that it should be fairly obvious whether or not a product has failed [2], but it is quite necessary for a number of reasons.

One of the most important reasons is that different groups within the organization may have different definitions as to what sort of behavior actually constitutes a failure. This is often the case when comparing the different practices of design and manufacturing engineering groups. Identical tests performed on the same product by these groups may produce radically different results simply because the two groups have different definitions of product failure. For a reliability program to be effective, there must be a commonly accepted definition of failure for the entire organization. Of course, this definition may require a little flexibility depending on the type of product, development phase, etc., but as long as everyone is familiar with the commonly accepted definition of failure, communications will be more effective and the reliability program will be easier to manage.

Another benefit of having universally agreed-upon failure definitions is that it will minimize the tendency to rationalize away failures on certain tests. This can be a problem, particularly during product development, as engineers and managers tend to overlook or diminish the importance of failure modes that are unfamiliar or not easily replicable. This tendency is only human, and a person who has spent a great deal of time developing a product may feel justified in writing off an oddball failure as a "glitch" or as being due to some other external error. However, this type of mentality also results in products being released into the field with poorly defined but very real failure modes. Having a specific failure definition that applies to all or most types of tests will help alleviate this problem.

That said, a degree of flexibility is called for in the definition of failure, particularly with complex products that may have a number of distinct failure modes. For this reason, it may be advisable to have a multi-tiered failure definition structure that can accommodate the behavioral vagaries of complex equipment. The following three-level list of failure categories is given as an example:

  • Type I - Failure – Severe operational incidents that would definitely result in a service call, such as part failures, unrecoverable equipment hangs, DOAs, consumables that fail/deplete before their specified life, onset of noise, and other critical problems. These constitute “hard-core” failure modes that would require the services of a trained repair technician to recover.
  • Type II - Intervention - Any unplanned occurrence or failure of product mission that requires the user to manually adjust or otherwise intervene with the product or its output. These tend to be “nuisance failures” that can be recovered by the customer, or with the aid of phone support. Depending on the nature of the failure mode, groups of the Type II failures could be upgraded to Type I if they exceed a predefined frequency of occurrence.
  • Type III - Event - Events will include all other occurrences that do not fall into either of the categories above. This might include events that cannot be directly classified as failures, but whose frequency is of engineering interest and would be appropriate for statistical analysis. Examples include failures caused by test equipment malfunction or operator error.

During testing, all of these occurrences should be logged with codes that separate the three failure types. Other test-process-related issues, such as deviations from test plans, should be logged in a separate test log. There should be a timely review of logged occurrences to ensure proper classification prior to metric calculation and reporting.
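To make the bookkeeping concrete, the three-tier scheme above can be sketched in code. Everything here — the mode names, the upgrade threshold of three occurrences, and the helper function — is a hypothetical illustration of the classification and upgrade rule, not a prescribed implementation:

```python
from collections import Counter
from enum import Enum

class Occurrence(Enum):
    TYPE_I = 1    # Failure: requires a trained repair technician
    TYPE_II = 2   # Intervention: recoverable by the user/phone support
    TYPE_III = 3  # Event: other occurrences of engineering interest

# Hypothetical rule: a Type II failure mode logged more than
# three times during a test is escalated to Type I.
UPGRADE_THRESHOLD = 3

def classify_log(entries):
    """entries: list of (mode_name, Occurrence) tuples from a test log.
    Returns a dict mapping each mode to its final classification
    after applying the frequency-based upgrade rule."""
    type_ii_counts = Counter(
        mode for mode, typ in entries if typ is Occurrence.TYPE_II
    )
    final = {}
    for mode, typ in entries:
        if typ is Occurrence.TYPE_II and type_ii_counts[mode] > UPGRADE_THRESHOLD:
            final[mode] = Occurrence.TYPE_I  # nuisance mode, but too frequent
        else:
            final[mode] = typ
    return final

# Example log: four paper jams (Type II), one hard failure,
# one operator error.
log = [("paper_jam", Occurrence.TYPE_II)] * 4 + [
    ("fan_failure", Occurrence.TYPE_I),
    ("operator_error", Occurrence.TYPE_III),
]
result = classify_log(log)
# "paper_jam" is upgraded to TYPE_I; the others keep their codes.
```

Keeping the classification in code like this makes the review step auditable: the raw log entries are preserved, and the upgrade rule is applied consistently before any metrics are calculated.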

[1] Kececioglu, Dimitri, Reliability Engineering Handbook, Vol. 1, Prentice-Hall, 1991.
[2] This is closely related to the concept of product mission. A good baseline definition of failure is the inability of the product to perform its mission. From that basic definition, more detailed categorizations of failure can be developed.

ReliaSoft Corporation

Copyright © 2001 ReliaSoft Corporation, ALL RIGHTS RESERVED