Reliability HotWire: eMagazine for the Reliability Professional

Issue 1, March 2001

Reliability Basics
Characterizing Your Product’s Reliability

The most general purpose of life data analysis is to characterize the life behavior of a product. We will assume that the product in question is non-repairable (i.e., it is used from its initial turn-on and run until it fails, at which time it is replaced). The behavior of repairable units is generally more difficult to analyze, and will be dealt with at a later time.

In simpler terms, the purpose of reliability analysis is to indicate the probability of success for a specified time. This probability is called the reliability, and it is always associated with a given time. That is, the probability of success is a function of time, so a reliability value is meaningful only when paired with the time to which it applies. For example, a specification may call for a 90% reliability at 100 hours of operation. This means that the product has a 90% probability of running for 100 hours without failure. It can also be interpreted as meaning that 90% of a population of such products will run for 100 hours, while the other 10% will have failed before 100 hours.
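The population interpretation above can be sketched in a few lines of Python. This is only an illustration: the failure times are made-up values, the function name `empirical_reliability` is hypothetical, and the sketch assumes every unit was run to failure.

```python
def empirical_reliability(failure_times, t):
    """Fraction of units still running at time t.

    Assumes every unit in the sample was run until it failed.
    """
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical failure times in hours for ten units (illustrative only).
failure_times = [62, 120, 135, 150, 180, 210, 240, 260, 300, 350]

r_100 = empirical_reliability(failure_times, 100)
print(f"Estimated reliability at 100 hours: {r_100:.0%}")
```

In this made-up sample, nine of the ten units are still running at 100 hours, which corresponds to the 90% figure in the specification example.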

Other reliability/time combinations will hold true for the same product. For example, the products in the previous example may have a reliability at 200 hours of 75%. The relationship between reliability and operation time for a product can generally be characterized by a continuous reliability function or curve, which represents reliability as a function of time. This function is usually denoted as R(t), with R representing the dependent variable reliability and t representing the independent variable time. A graphical representation of such a function is shown in the following figure:

[Figure: Reliability vs. Time Plot]

This curve represents the probability that the product is still operating without failure at each point in its lifetime, and is one of the fundamental measures in life data analysis. (There are other reliability metrics that are closely related to the reliability function as well, and we will discuss these in future issues.)
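As a rough numerical illustration of a reliability function, the sketch below assumes a two-parameter Weibull form for R(t). The article does not name a distribution, so this choice is purely an assumption, as are the function names. The parameters are solved so that the curve passes through the two example points given earlier, 90% at 100 hours and 75% at 200 hours.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta) for an assumed two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


def fit_weibull_two_points(t1, r1, t2, r2):
    """Solve for (beta, eta) so that R(t1) = r1 and R(t2) = r2.

    Uses the linearized form ln(-ln R) = beta * ln(t) - beta * ln(eta).
    """
    y1 = math.log(-math.log(r1))
    y2 = math.log(-math.log(r2))
    beta = (y2 - y1) / (math.log(t2) - math.log(t1))
    eta = t1 / (-math.log(r1)) ** (1.0 / beta)
    return beta, eta


# The two reliability/time pairs used as examples in the text.
beta, eta = fit_weibull_two_points(100.0, 0.90, 200.0, 0.75)

for t in (50, 100, 200, 400):
    print(f"R({t}) = {weibull_reliability(t, beta, eta):.3f}")
```

Two points determine the curve only because a specific functional form was assumed. In practice the form and its parameters would be estimated from failure data rather than from a pair of specification values.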

One other reliability metric that merits a brief discussion is the mean life, or MTBF/MTTF. It is widely used because of its simplicity, but it is very easy to become overly reliant on it. The mean life is often thought to be synonymous with a reliability of 50%, which is true only when the mean and the median of the life distribution coincide. When they do not, using the MTBF in this way may result in misleading characterizations of a product’s reliability. For a detailed discussion of the unsuitability of the MTBF as the sole reliability metric, see "The Limitations of Using the MTTF as a Reliability Specification," published in Volume 1, Issue 1 of Reliability Edge.

A reliability characterization of this kind is the result of the analysis of life test data or field failure data. These data take the form of the amount of time it took for each of a number of units to fail. This concept sometimes does not sit well with those involved in the product development process who, quite understandably, feel uncomfortable with associating their carefully designed products with failure. However, the fact remains that all products will eventually fail if operated for a long enough period of time. In order to characterize when failure is likely to happen, failure data are required.
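To make the earlier caution about mean life concrete, the sketch below assumes an exponential life distribution, for which R(t) = exp(-t / MTTF). The exponential model and the 1000-hour MTTF are assumptions chosen only for illustration. Under this model the reliability at the mean life is exp(-1), roughly 37%, not 50%.

```python
import math


def exponential_reliability(t, mttf):
    """R(t) = exp(-t / mttf) for an assumed exponential life distribution."""
    return math.exp(-t / mttf)


mttf = 1000.0  # hypothetical mean life in hours

# Reliability at the mean life itself.
r_at_mttf = exponential_reliability(mttf, mttf)

# Time at which reliability is actually 50% (the median life).
median_life = mttf * math.log(2)

print(f"Reliability at the MTTF ({mttf:.0f} h): {r_at_mttf:.3f}")
print(f"Time at which reliability is 50%: {median_life:.0f} h")
```

Under this assumption, a product specified only by its MTTF would have well over half of its population failed by the time the mean life is reached.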

These failure data may be the result of a reliability or life test conducted in a controlled environment, the purpose of which is to operate units to failure in order to obtain data for reliability analysis. Ideally, all of the units put on test should be operated until they fail, resulting in a data set consisting of complete data. Sometimes this is not possible due to time and budgetary constraints, and there will be accumulated test time for units that did not fail. These are known as suspended data, and while they are less informative than complete failure data, they should not be discarded. The information they contain (the amount of time units have run without failing) is also important in the assessment of a product’s reliability. Further, one must take care that the conditions under which the data set is obtained are very close to those the product will see during normal operation. Otherwise, the data obtained from the test may lead to inaccurate reliability results, which may in turn lead to poor business decisions.

The problem of operating conditions is not a concern when analyzing field failure data, for the units under analysis were operated under actual use conditions. One drawback of field failure data is that it may consist primarily of suspended data. Another drawback is that it may be tainted or incomplete. For example, field data obtained for reliability analysis may have originally been collected for another purpose, such as financial warranty tracking. In some cases, such data may not include all of the information required to perform a good reliability analysis. There may also be large portions of information missing, that is, large segments of the field population which are unaccounted for. Have they failed? How long have they been running? Are they still in operation? The answers to these questions are very important in the analysis of field data, and if this information cannot be provided for a large segment of the product's population, a field data analysis may return grossly inaccurate results. It is generally a good idea to have a reliability professional involved in the development of field data collection systems in order to avoid some of these pitfalls.
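The point that suspended data should not be discarded can be illustrated with a small sketch. It uses the Kaplan-Meier estimator, a standard nonparametric method that the article itself does not mention, chosen here only because it is short to write out. The test data and the `kaplan_meier` helper are hypothetical.

```python
def kaplan_meier(observations):
    """Nonparametric reliability estimate from complete and suspended data.

    observations: list of (time, failed) pairs, where failed=False marks a
    suspension (the unit was still running when observation stopped).
    Returns a list of (time, reliability) pairs, one per failure. Tied
    failure times yield repeated entries; the last one is the estimate.
    """
    # Sort by time; at equal times, count failures before suspensions.
    ordered = sorted(observations, key=lambda obs: (obs[0], not obs[1]))
    at_risk = len(ordered)
    reliability = 1.0
    curve = []
    for time, failed in ordered:
        if failed:
            reliability *= (at_risk - 1) / at_risk
            curve.append((time, reliability))
        at_risk -= 1  # the unit leaves the risk set either way
    return curve


# Hypothetical test of six units: three failures, three suspensions (hours).
test_data = [(150, True), (200, False), (340, True),
             (400, False), (560, True), (600, False)]

with_suspensions = kaplan_meier(test_data)
failures_only = kaplan_meier([obs for obs in test_data if obs[1]])

for label, curve in (("with suspensions", with_suspensions),
                     ("failures only", failures_only)):
    points = ", ".join(f"R({t}) = {r:.3f}" for t, r in curve)
    print(f"{label}: {points}")
```

In this made-up data set, dropping the suspended units makes the product look worse at every failure time, because the running time accumulated by units that did not fail is ignored.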

This article has given a very basic overview of how to characterize the reliability of a product, along with the background needed to gather the data for such an analysis and generate a reliability curve. We will look at the process by which reliability data are analyzed in the next issue of Reliability HotWire.

ReliaSoft Corporation

Copyright © 2001 ReliaSoft Corporation, ALL RIGHTS RESERVED