Sources of Reliability Data
Part 1 - Reliability Testing Basics
Reliability testing is the cornerstone of a reliability engineering program. It provides the most detailed form of life data, in that the conditions under which the data are collected can be carefully controlled and monitored. Furthermore, reliability tests can be designed to uncover particular suspected failure modes and other problems. The type of reliability testing a product undergoes will change at different points in its life-cycle, but the overriding goal is to ensure that the data from all or most of the tests are generated under similar enough conditions that an "apples-to-apples" comparison can be made of the product's reliability characteristics at different points in the product's life.
A properly designed
series of tests, particularly during the product's earlier design stages,
can generate data that would be useful in the implementation of a
reliability growth tracking program. This will provide information helpful
in making management decisions regarding scheduling, development cost
projections and so forth. This information will also be useful in planning
the development cycle of future products.
Reliability Test Design
Designing reliability tests can sometimes lead to a catch-22 situation in
that a certain amount of information is required about the life of a
product in order to design the most efficient life tests. Often,
reliability or test engineers are looking for a magic formula that will
allow them to obtain precise, accurate information on the life of their
products by testing small numbers of units for short periods of time.
Unfortunately, no test plan can meet all of these requirements. It must be kept in mind that skimping on test units or test time will almost always result in greater uncertainty in the test results.
Ideally, a reliability
test would be one in which a relatively large number of units are tested
to failure. Although the concept of a large number of failures on test may be anathema to design engineers, the information that these tests produce is necessary to successfully model the life behavior of the product. The more failures a reliability test produces, the more precise the results of the analysis will be, as illustrated in the sketch below. This is especially important for products that are new or otherwise have little historical information about their reliability. When developing tests for such products, it is advisable to test as many units as is feasible in order to obtain large quantities of information. This will help provide a precise early estimate of the product's reliability, which may make it possible to reduce the scope of testing later in the development process.
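As a rough illustration of how the number of failures drives precision, the sketch below uses the standard chi-square confidence bound for a time-terminated test under an assumed constant failure rate (exponential life distribution). The failure counts, test times and 90% confidence level are made-up values chosen only to show the trend; the MTBF point estimate is held fixed while the number of failures grows.

```python
# Sketch: how the lower confidence bound on MTBF tightens as the number of
# failures increases, assuming an exponential (constant failure rate) model
# and a time-terminated test. All numbers below are illustrative only.
from scipy.stats import chi2

CONF = 0.90          # one-sided confidence level
MTBF_POINT = 1000.0  # hours; point estimate held fixed across scenarios

for r in (2, 5, 20, 50):                 # number of observed failures
    total_time = r * MTBF_POINT          # accumulated unit-hours giving the same point estimate
    # Standard lower bound for a time-terminated exponential test:
    #   MTBF_lower = 2*T / chi2(CONF; 2r + 2)
    lower = 2.0 * total_time / chi2.ppf(CONF, 2 * r + 2)
    print(f"{r:3d} failures: MTBF point estimate = {MTBF_POINT:.0f} h, "
          f"90% lower bound = {lower:.0f} h")
```

With only a couple of failures the lower bound sits far below the point estimate; with dozens of failures it moves much closer. That is the quantitative face of the "more failures, more precision" argument.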
Once detailed initial
reliability information has been collected and analyzed, it can be used to
design reliability acceptance or demonstration tests. These tests, which
usually occur later in the development process, are used to demonstrate
that the reliability of a product is no worse than a certain level. It is
normally assumed that no failures will occur on such tests. However, in
order to effectively design such tests, a certain amount of information
about the product under test is required. At a minimum, one must be able
to estimate the distribution that the life of the product follows, and the
value of the shape parameter of that distribution. With this information,
one can design a test that will demonstrate that the products have met a
minimum reliability requirement at a given confidence, provided that there
are no unanticipated failures during the test.
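To make the last point concrete, the sketch below shows one common way such a zero-failure demonstration test can be sized: the parametric binomial (success run) approach, in which an assumed Weibull shape parameter is used to trade test duration against sample size. The target reliability, confidence level, shape parameter and times used here are illustrative assumptions, not recommendations.

```python
# Sketch: sizing a zero-failure (success run) demonstration test using the
# parametric binomial approach with an assumed Weibull shape parameter.
# All inputs are illustrative assumptions.
import math

def units_required(rel_target, conf, t_demo, t_test, beta):
    """Number of units that must survive a test of length t_test (with no
    failures) to demonstrate reliability rel_target at time t_demo with
    confidence conf, assuming a Weibull life distribution with shape beta."""
    # Reliability at the demonstration time maps to reliability at the test
    # time through the Weibull shape: ln R(T) = (T/t_d)^beta * ln R(t_d)
    n = math.log(1.0 - conf) / (((t_test / t_demo) ** beta) * math.log(rel_target))
    return math.ceil(n)

def test_time_required(rel_target, conf, t_demo, n_units, beta):
    """Test length needed per unit when the sample size is fixed instead."""
    ratio = math.log(1.0 - conf) / (n_units * math.log(rel_target))
    return t_demo * ratio ** (1.0 / beta)

# Example: demonstrate 90% reliability at 1000 hours with 90% confidence,
# assuming beta = 1.5, by testing each unit for 2000 hours with no failures.
print(units_required(0.90, 0.90, 1000.0, 2000.0, 1.5))
# Or: how long must 10 units run (failure-free) to make the same demonstration?
print(test_time_required(0.90, 0.90, 1000.0, 10, 1.5))
```

If even one failure occurs, the demonstration cannot be made at the planned sample size and test time, which is why some prior knowledge of the shape parameter and the expected failure modes is so important when designing these tests.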
Customer Usage Profiling
An important requirement for designing useful reliability tests is to have
a good idea of how the product is actually going to be used in the field.
The tests should be based on a realistic expectation of the customer
usage, rather than estimates or "gut feelings" about the way the customer
will use the product. Tests based on mere speculation may leave the product inadequately tested, and it may then run into operational difficulties in the field because use stress levels are higher than anticipated. On the other hand, tests designed with a strong basis of information on how the product will be used will be more realistic and will result in an optimized design that exhibits fewer failures in the field.
Customer usage profiles can be set up that actively gather information on how customers are actually using an organization's product. This can range from a simple questionnaire to sophisticated instrumentation of the product that feeds back detailed information about its operation. An incentive is often useful to get customers to sign on for a usage measurement program, particularly if it is an intrusive process that involves the installation of data collection equipment. However, customers are often eager to participate, knowing that the information they provide will ultimately result in a more reliable and user-friendly product.
Test Types
In many cases, the type of testing that a product undergoes will change as
the product's design becomes mature and the product moves from the initial
design stages to final design release and production. Nevertheless, it is
a good practice to continue to collect internally-generated data
concerning the product's reliability performance throughout the life-cycle
of the product. This will strengthen the reliability growth analysis and
help provide correlation between internal test results and field data. A
brief summary of various types of reliability tests is presented next.
Development Testing
Development testing occurs during the early phases of the product's
life-cycle, usually from project inception to product design release. It
is vital to be able to characterize the reliability of the product as it
progresses through its initial design stages so that the reliability
specifications will be met by the time the product is ready for release.
With a multitude of design stages and changes that could affect the
product's reliability, it is necessary to closely monitor how the
product's reliability grows and changes as the product design matures.
There are a number of different test types that can be run during this
phase of a product's life-cycle to provide useful reliability information:
- Component-level Testing - Although component-level testing can continue throughout the development phase of a product, it is most likely to occur very early on. This may be due to the lack of availability of parts in the early stages of the development program. There may also be special interest in the performance of a specific component if it has been radically redesigned, or if there is a separate or individual reliability specification for that component. In many cases, component-level testing is undertaken to begin characterizing a product's reliability even though full system-level test units are unavailable or prohibitively expensive. Even so, system-level reliability can still be characterized from component-level testing, provided that sufficient understanding exists to characterize the interaction of the components. If this is the case, the system-level reliability can be modeled based on the configuration of components and the results of component reliability testing, using such tools as ReliaSoft's BlockSim (a simple roll-up of this kind is sketched after this list).
- System-level
Testing - Although the results of component-level tests can be
used to characterize the reliability of the entire system, there is no
substitute for testing the entire system, particularly if that is how
the reliability is specified. That is, if the technical specifications
state a reliability goal for a specific system or configuration of
components, that entire system or configuration should be tested to
compare the actual performance with the stated goal. Although early system-level test units may be difficult to obtain, it is advisable to perform reliability tests at the system level as early as possible. At the very least, comprehensive system-level testing should be performed immediately prior to the product's release for manufacturing, in order to verify design reliability. During such system-level reliability testing, the units under test should come from a homogeneous population and should be devoted solely to the reliability test. The results of the reliability test could be skewed or confounded by "piggybacking" other tests along with it, and this practice should be avoided. A properly run system-level reliability test can provide valuable engineering information above and beyond the raw reliability data.
- Environmental
and Accelerated Testing - It may be necessary in some cases to
institute a series of tests where the system is tested at extreme
environmental conditions, or with other stress factors accelerated above
the normal levels of use. It may be that the product would not normally
fail within the time constraints of the test, and it needs to have the
stress factors accelerated in order to get any meaningful data within a
reasonable time. In other cases, it may be necessary to simulate
different operating environments based on where the product is intended
to be sold or operated. Regardless of the cause, tests like these should
be designed, implemented and analyzed with care. Depending on the nature
of the accelerating stress factors, it is easy to draw incorrect
conclusions from the results of these tests. A good understanding of the proper accelerating stresses and the design limits of the product is necessary in order to implement a meaningful accelerated reliability test. For example, one would not want to design an accelerated test that overstresses the product and introduces failure modes that would not normally be encountered in the field. Given that there have been many incredible claims about the capability of accelerated testing and the improbably high acceleration factors that can supposedly be achieved, care needs to be taken when setting up this type of reliability testing program; a simple acceleration factor calculation is sketched after this list. (SHAMELESS PLUG: ReliaSoft's ALTA software is one of the few applications solely dedicated to the analysis of accelerated test data. The new version, ALTA PRO, is the only commercial software capable of providing analytical results for time-varying stress tests, such as step-stress tests. For more information on ALTA and ALTA PRO, see http://ALTA.Reliasoft.com.)
- Shipping Tests - Although shipping tests do not necessarily qualify as reliability tests per se, shipping tests or simulations should be a prerequisite to reliability testing, because the effects of shipping often have an impact on the reliability that the customer experiences. As such, it may be useful to incorporate shipping tests alongside the normal reliability testing. For example, it may be a good idea to put the units of a final design release reliability test through a non-destructive shipping test prior to the actual reliability testing in order to better simulate actual use conditions.
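As a rough sketch of the component-to-system roll-up mentioned under Component-level Testing above, the code below combines component reliabilities into a system reliability, assuming a purely series configuration, statistical independence between components, and Weibull life distributions whose parameters (made-up values here) would come from the component-level tests. Real configurations with redundancy or shared loads call for a full system model such as one built in BlockSim.

```python
# Sketch: rolling component-level Weibull test results up to a system-level
# reliability estimate, assuming a series configuration and independent
# components. The shape/scale parameters below are illustrative only.
import math

# (beta, eta_hours) as estimated from component-level reliability tests
components = {
    "power_supply": (1.3, 40000.0),
    "controller":   (0.9, 120000.0),
    "motor":        (2.1, 25000.0),
}

def weibull_reliability(t, beta, eta):
    """Probability that a component survives to time t."""
    return math.exp(-((t / eta) ** beta))

def series_system_reliability(t, comps):
    """In a series system, every component must survive, so the system
    reliability is the product of the component reliabilities."""
    r = 1.0
    for beta, eta in comps.values():
        r *= weibull_reliability(t, beta, eta)
    return r

mission_time = 8760.0  # one year of continuous operation, in hours
print(f"Estimated system reliability at {mission_time:.0f} h: "
      f"{series_system_reliability(mission_time, components):.3f}")
```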
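The acceleration factors mentioned under Environmental and Accelerated Testing above are easier to reason about with a concrete example. The sketch below assumes that temperature is the accelerating stress and that the Arrhenius model applies; the activation energy and temperatures are hypothetical values, and a real program would estimate them from accelerated test data (for example, with a tool like ALTA) rather than assume them.

```python
# Sketch: computing a temperature acceleration factor under an assumed
# Arrhenius model. Activation energy and temperatures are illustrative only.
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV per kelvin

def arrhenius_af(ea_ev, t_use_c, t_accel_c):
    """Acceleration factor between use and accelerated temperatures (deg C)."""
    t_use_k = t_use_c + 273.15
    t_accel_k = t_accel_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_accel_k))

# Example: assumed 0.7 eV activation energy, 40 C use, 85 C accelerated test
af = arrhenius_af(0.7, 40.0, 85.0)
print(f"Acceleration factor: {af:.1f}")
print(f"1000 test hours at 85 C ~ {1000 * af:.0f} equivalent hours at 40 C")
```

Note how sensitive the factor is to the assumed activation energy and temperatures; this is one reason that improbably high claimed acceleration factors deserve scrutiny.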
Manufacturing Testing
The testing that goes on after a product design has been released for
production generally tends to measure the process rather than the product,
under the assumption that the released product design is final and good.
However, this is not necessarily the case, as post-release design changes
or feature additions are not uncommon. That notwithstanding, it is still
possible to obtain useful reliability information from manufacturing-type
testing without diluting any of the process-oriented information that
these tests are designed to produce.
- Functionality Testing & Burn-In - This type of testing usually falls under the category of operation verification. A large proportion, if not all, of the products coming off the assembly line are put on a very short test in order to verify that they are functioning. In some situations, they may be run for a predetermined "burn-in" time in order to weed out units that would otherwise fail early in the field. Although it may not be possible to collect detailed reliability information from this type of testing, what is lost in data quality is made up for in quantity. With the proper structuring, these tests can provide a fairly good picture of the early-life reliability behavior of the product, as sketched after this list.
- Extended Post-Production Testing - This type of testing is usually implemented near the end of, or shortly after, the product design's release to production. It is useful to structure these tests to be identical to the final reliability verification tests conducted at the end of the design phase, so that the effects of the production process on the reliability of the product can be assessed. In many cases, the units that undergo reliability testing prior to the onset of actual production are hand-built or carefully adjusted before the tests begin. By replicating these tests with actual production units, potential problems in the manufacturing process can be identified before many units are shipped.
- Design/Process
Change Verification - This type of testing is similar to the
extended post-production testing in that it should closely emulate the
reliability verification testing that takes place at the end of the
design phase. This type of testing should occur at regular intervals
during production, or immediately following a post-release design change
or a change in the manufacturing process. These changes can have a
potentially large effect on the reliability of the product, and these
tests should be adequate - in terms of duration and sample size - to
detect such changes.
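Picking up the point about burn-in data above, the sketch below shows one simple way such pass/fail results can be turned into an early-life reliability estimate: treat each unit's burn-in as a success/failure observation at a fixed early-life time and compute an exact (Clopper-Pearson) upper confidence bound on the early failure fraction. The counts and confidence level are illustrative assumptions, and this deliberately ignores the exact failure times within the burn-in period.

```python
# Sketch: estimating early-life failure probability from burn-in pass/fail
# counts, with an exact binomial (Clopper-Pearson) upper confidence bound.
# The counts below are illustrative only.
from scipy.stats import beta

n_units = 5000     # units put through the burn-in period
n_failures = 7     # units that failed during burn-in
conf = 0.90        # one-sided confidence level

point_estimate = n_failures / n_units
# One-sided Clopper-Pearson upper bound on the failure fraction
upper_bound = beta.ppf(conf, n_failures + 1, n_units - n_failures)

print(f"Observed early-life failure fraction: {point_estimate:.4f}")
print(f"{conf:.0%} upper bound on failure fraction: {upper_bound:.4f}")
print(f"Early-life reliability (lower bound): {1 - upper_bound:.4f}")
```

If the exact times to failure during burn-in are recorded, a full life data analysis of that early interval would of course give a much richer picture than this simple pass/fail treatment.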
This gives just a brief
overview of some of the aspects of reliability testing. When it comes to
designing and implementing these tests, the philosophy that "more is
better" holds true - more units on test, more units run until failure.
With more data, the results will be more precise, with less uncertainty.
This will allow the engineer to make a better estimate of the product's
"real life" behavior. However, there is only way to categorize a product's
true behavior, and that is to investigate the reliability of the product
in the hands of the users (your customers).
In next month's issue, we will look at sources of field data.