Sources of Reliability Data
Part 1 - Reliability Testing
Basics
Reliability testing is
the cornerstone of a reliability engineering program. It provides the most
detailed forms of life data in that the conditions under which the data
are collected can be carefully controlled and monitored. Furthermore, the
reliability tests can be designed to uncover particular suspected failure
modes and other problems. The type of reliability testing a product
undergoes will change at different points of its life-cycle, but the
overriding goal is to ensure that data from all or most of the tests are
generated under similar enough conditions that an "apples-to-apples"
comparison can be made of the product's reliability characteristics at
different points in the product's life.
A properly designed
series of tests, particularly during the product's earlier design stages,
can generate data that would be useful in the implementation of a
reliability growth tracking program. This will provide information helpful
in making management decisions regarding scheduling, development cost
projections and so forth. This information will also be useful in planning
the development cycle of future products.
Reliability Test Design
Designing reliability tests can sometimes lead to a catch-22 situation in
that a certain amount of information is required about the life of a
product in order to design the most efficient life tests. Often,
reliability or test engineers are looking for a magic formula that will
allow them to obtain precise, accurate information on the life of their
products by testing small numbers of units for short periods of time.
Sadly, no test plan can meet all of these
requirements. It must be kept in mind that skimping on test units or test
time will almost always result in greater uncertainty in the results of
the test.
Ideally, a reliability
test would be one in which a relatively large number of units are tested
to failure. Although the prospect of a large number of failures on test may
be anathema to design engineers, the information that these tests
produce is necessary to successfully model the life behavior of the
product. The more failures that a reliability test produces, the more
precise the results of the analysis will be. This is especially important
for products that are new or otherwise have little historical information
about their reliability. When developing tests for such products, it is
advisable to test as many products as is feasible in order to obtain large
quantities of information. This will help produce a precise early
estimate of the product's reliability, which may in turn reduce the
scope of testing needed further along in the development process.
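To put a rough number on "more failures, more precision," here is a minimal sketch. It assumes exponentially distributed lifetimes and a test in which every unit is run to failure; both are simplifying assumptions chosen for illustration, not something the discussion above prescribes. Under those assumptions, the approximate two-sided bounds on mean life are the estimate multiplied by exp(+/- z / sqrt(r)), where r is the number of failures, so the ratio of the upper bound to the lower bound depends only on r. The function name `bound_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def bound_ratio(num_failures: int, confidence: float = 0.90) -> float:
    """Ratio of upper to lower approximate two-sided bound on mean life.

    Assumes exponential lifetimes and a complete (all-failed) sample, with
    bounds computed on the log of the mean-life estimate.
    """
    if num_failures < 1:
        raise ValueError("at least one failure is required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Standard normal quantile for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.exp(2.0 * z / math.sqrt(num_failures))


# The ratio shrinks toward 1 as the number of failures grows.
for r in (5, 10, 20, 50, 100):
    print(f"{r:>4} failures -> upper/lower bound ratio {bound_ratio(r):.2f}")
```

A ratio close to 1 means tight bounds. Because the ratio shrinks only with the square root of the number of failures, halving the uncertainty (on the log scale) takes roughly four times as many failures.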
Once detailed initial
reliability information
has been collected and analyzed, it can be used to design reliability acceptance or
demonstration tests. These tests, which usually occur later in the
development process, are used to demonstrate that the reliability of a
product is no worse than a certain level. It is normally assumed that no
failures will occur on such tests. However, in order to effectively design
such tests, a certain amount of information about the product under test
is required. At a minimum, one must be able to estimate the distribution
that the life of the product follows, and the value of the shape parameter
of that distribution. With this information, one can design a test that
will demonstrate that the products have met a minimum reliability
requirement at a given confidence, provided that there are no unanticipated
failures during the test.
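As an illustration of how the distribution and shape parameter feed into a demonstration test, the sketch below sizes a zero-failure test. It assumes a Weibull life distribution with a known shape parameter (beta) and uses the common binomial argument that, if n units all survive the test, then 1 - confidence = R_test ** n. The function name and the example numbers are hypothetical, and this is one textbook approach rather than the method of any particular software package.

```python
import math


def zero_failure_sample_size(r_req: float, t_req: float, t_test: float,
                             beta: float, confidence: float) -> int:
    """Units needed to demonstrate reliability r_req at time t_req.

    Assumes a Weibull life distribution with known shape `beta`, a test of
    duration t_test per unit, and zero failures allowed during the test.
    """
    if not (0.0 < r_req < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("r_req and confidence must be between 0 and 1")
    if t_req <= 0 or t_test <= 0 or beta <= 0:
        raise ValueError("times and beta must be positive")
    # Under Weibull, ln R(t_test) = ln R(t_req) * (t_test / t_req) ** beta.
    ln_r_test = math.log(r_req) * (t_test / t_req) ** beta
    # Zero-failure binomial: (1 - confidence) = R_test ** n, solved for n.
    return math.ceil(math.log(1.0 - confidence) / ln_r_test)


# Hypothetical requirement: 90% reliability at 1000 hours, 90% confidence.
print(zero_failure_sample_size(0.90, 1000.0, 1000.0, beta=2.0, confidence=0.90))
print(zero_failure_sample_size(0.90, 1000.0, 2000.0, beta=2.0, confidence=0.90))
```

Testing each unit for longer than the requirement time reduces the number of units needed, and the size of that reduction depends directly on the assumed shape parameter. This is why a poor estimate of the shape parameter can invalidate the test design.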
Customer Usage Profiling
An important requirement for designing useful reliability tests is to have
a good idea of how the product is actually going to be used in the field.
The tests should be based on a realistic expectation of the customer
usage, rather than estimates or "gut feelings" about the way the
customer will use the product. Tests based on mere speculation may result
in a product that may not have been rigorously tested and consequently may
run into operational difficulties due to use stress levels being higher
than anticipated. On the other hand, tests that are designed with a strong
basis of information on how the product will be used will be more
realistic and result in an optimized design that will exhibit fewer
failures in the field.
Customer usage profiles
can be set up that actively gather information on how the customers are
actually using an organization's product. This can range from a simple
questionnaire to a sophisticated instrumentation of the product that feeds
back detailed information about its operation. An incentive is often
useful to get customers to sign on for a usage measurement program,
particularly if it is an intrusive process that involves the installation
of data collection equipment. However, customers are often eager to
participate, knowing that the information they provide will ultimately
result in a more reliable and user-friendly product.
Test Types
In many cases, the type of testing that a product undergoes will change as
the product's design becomes mature and the product moves from the initial
design stages to final design release and production. Nevertheless, it is
a good practice to continue to collect internally-generated data
concerning the product's reliability performance throughout the life-cycle
of the product. This will strengthen the reliability growth analysis and
help provide correlation between internal test results and field data. A
brief summary of various types of reliability tests is presented next.
Development Testing
Development testing occurs during the early phases of the product's
life-cycle, usually from project inception to product design release. It
is vital to be able to characterize the reliability of the product as it
progresses through its initial design stages so that the reliability
specifications will be met by the time the product is ready for release.
With a multitude of design stages and changes that could affect the
product's reliability, it is necessary to closely monitor how the
product's reliability grows and changes as the product design matures.
There are a number of different test types that can be run during this
phase of a product's life-cycle to provide useful reliability information:
- Component-level Testing - Although component-level testing can continue
throughout the development phase of a product, it is most likely to
occur very early on. This may be due to the lack of availability of
parts in the early stages of the development program. There may also
be special interest in the performance of a specific component if it
has been radically redesigned, or if there is a separate or individual
reliability specification for that component. In many cases,
component-level testing is undertaken to begin characterizing a
product's reliability even though full system-level test units are
unavailable or prohibitively expensive. However, system-level
reliability characterization can be achieved through component-level
testing. This is possible if sufficient understanding exists to
characterize the interaction of the components. If this is the case,
the system-level reliability can be modeled based on the configuration
of components and the result of component reliability testing, using
such tools as ReliaSoft's BlockSim.
- System-level Testing - Although the results of component-level tests can be
used to characterize the reliability of the entire system, there is no
substitute for testing the entire system, particularly if that is how
the reliability is specified. That is, if the technical specifications
state a reliability goal for a specific system or configuration of
components, that entire system or configuration should be tested to
compare the actual performance with the stated goal. Although early
system-level test units may be difficult to obtain, it is advisable to
be able to perform reliability tests at the system level as early as
possible. At the very least, comprehensive system-level testing should
be performed immediately prior to the product's release for
manufacturing, in order to verify design reliability. During such
system-level reliability testing, the units under test should be from
a homogeneous population, and should be devoted solely to the
reliability test. The results of the reliability test could be skewed
or confounded by "piggybacking" other tests along with it,
and this practice should be avoided. A properly run system-level
reliability test will be able to provide valuable engineering
information above and beyond the raw reliability data.
- Environmental and Accelerated Testing - It may be necessary in some cases to
institute a series of tests where the system is tested at extreme
environmental conditions, or with other stress factors accelerated
above the normal levels of use. It may be that the product would not
normally fail within the time constraints of the test, and it needs to
have the stress factors accelerated in order to get any meaningful
data within a reasonable time. In other cases, it may be necessary to
simulate different operating environments based on where the product
is intended to be sold or operated. Regardless of the cause, tests
like these should be designed, implemented and analyzed with care.
Depending on the nature of the accelerating stress factors, it is easy
to draw incorrect conclusions from the results of these tests. A good
understanding of the proper accelerating stresses and the design
limits of the product is necessary to be able to implement a
meaningful accelerated reliability test. For example, one would not
want to design an accelerated test that would overstress the product
and introduce failure modes that would not normally be encountered in
the field. Given that there have been a lot of incredible claims about
the capability of accelerated testing and the improbably high
acceleration factors that can supposedly be produced, care needs to be
taken when setting up this type of reliability testing program. (SHAMELESS
PLUG: ReliaSoft's ALTA software is one of the few applications solely
dedicated to the analysis of accelerated test data. The new version,
ALTA PRO, is the only commercial software capable of providing analytical results for time-varying
stress tests, such as step-stress tests. For more information on ALTA and
ALTA PRO, see
http://ALTA.Reliasoft.com.)
- Shipping Tests - Although shipping tests do not necessarily qualify as reliability
tests per se, shipping tests or simulations should be a prerequisite
to reliability testing. This is because the effects of shipping will
often have an impact on the reliability of the product that the
customer experiences. As such, it may be useful to incorporate
shipping tests alongside the normal reliability testing. For example,
it may be a good idea to put the units of a final design release
reliability test through a non-destructive shipping test prior to the
actual reliability testing in order to better simulate actual use
conditions.
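The component-level testing item above notes that system-level reliability can be modeled from the configuration of components and the results of component tests. A minimal sketch of that idea follows, assuming the components fail independently of one another and that the configuration can be described with simple series and parallel blocks. The component names and reliability values are hypothetical, and this is not meant to represent how any particular software tool performs the calculation.

```python
from math import prod
from typing import Iterable


def series(reliabilities: Iterable[float]) -> float:
    """All blocks must work: system reliability is the product."""
    return prod(reliabilities)


def parallel(reliabilities: Iterable[float]) -> float:
    """At least one block must work: 1 minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical system: two redundant pumps feeding a controller and a valve.
# Each value is a component reliability at the same mission time, as might
# be estimated from component-level tests.
pump_pair = parallel([0.90, 0.90])
system = series([pump_pair, 0.99, 0.95])
print(f"redundant pump pair: {pump_pair:.4f}")
print(f"system:              {system:.6f}")
```

If the components interact (shared loads, common-cause failures), the independence assumption no longer holds and the simple products above will misstate the system reliability. This is the "sufficient understanding" caveat in practice.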
Manufacturing Testing
The testing that goes on after a product design has been released for
production generally tends to measure the process rather than the product,
under the assumption that the released product design is final and good.
However, this is not necessarily the case, as post-release design changes
or feature additions are not uncommon. That notwithstanding, it is still
possible to obtain useful reliability information from manufacturing-type
testing without diluting any of the process-oriented information that
these tests are designed to produce.
- Functionality Testing & Burn-In - This type of testing usually falls
under the category of operation verification. A large proportion, if
not all, of the products coming off of the assembly line are put on a
very short test in order to verify that they are functioning. In some
situations, they may be run for a predetermined "burn-in"
time in order to weed out those units that would have early-life ("infant mortality")
failures in the field. Although it may not be possible to collect
detailed reliability information from this type of testing, what is
lost in quality is made up for in quantity. With the proper
structuring, these tests can provide a fairly good picture of
early-life reliability behavior of the product.
- Extended Post-Production Testing - This type of testing is usually
implemented near the time of, or shortly after, the product design release
to production. It is useful to structure these types of tests to be
identical to the final reliability verification tests conducted at the
end of the design phase. This is to be able to assess the effects of
the production process on the reliability of the product. In many
cases, the test units that undergo reliability testing prior to the
onset of actual production are hand-built or carefully adjusted prior to
the beginning of the reliability tests. By replicating these tests
with actual production units, potential problems in the manufacturing
process can be identified before many units get shipped.
- Design/Process Change Verification - This type of testing is similar to the
extended post-production testing in that it should closely emulate the
reliability verification testing that takes place at the end of the
design phase. This type of testing should occur at regular intervals
during production, or immediately following a post-release design
change or a change in the manufacturing process. These changes can
have a potentially large effect on the reliability of the product, and
these tests should be adequate - in terms of duration and sample size
- to detect such changes.
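To illustrate why a burn-in period can weed out early-life failures, as described in the burn-in item above, the sketch below compares field reliability with and without burn-in. It assumes a Weibull life distribution with a shape parameter below 1 (a decreasing failure rate). The Weibull model, the function names and all parameter values are assumptions made for illustration only.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to time t under a two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


def reliability_after_burn_in(t: float, burn_in: float,
                              beta: float, eta: float) -> float:
    """Probability of surviving t more hours, given survival of burn-in."""
    return (weibull_reliability(burn_in + t, beta, eta)
            / weibull_reliability(burn_in, beta, eta))


# Hypothetical early-life behavior: beta < 1 means a decreasing failure rate.
beta, eta = 0.5, 10_000.0
mission, burn_in = 100.0, 48.0
without = weibull_reliability(mission, beta, eta)
with_burn_in = reliability_after_burn_in(mission, burn_in, beta, eta)
print(f"first {mission:.0f} h, no burn-in:   {without:.4f}")
print(f"first {mission:.0f} h after burn-in: {with_burn_in:.4f}")
```

With a shape parameter of exactly 1 (constant failure rate) the two numbers are identical, and with a shape parameter above 1 burn-in only uses up useful life, so the benefit depends on the early-life failure behavior actually observed in the data.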
This gives just a brief
overview of some of the aspects of reliability testing. When it comes to
designing and implementing these tests, the philosophy that "more is
better" holds true - more units on test, more units run until
failure. With more data, the results will be more precise, with less
uncertainty. This will allow the engineer to make a better estimate of the
product's "real life" behavior. However, there is only one way to
characterize a product's true behavior, and that is to investigate the
reliability of the product in the hands of the users (your customers). In
next month's issue, we will look at sources of field data.