Reliability for Complex Systems
Part 2 - Simulation
In the previous issue, we discussed
methods to analytically determine the reliability of a complex system.
While the analytical method has a number of advantages, such as being able
to determine the pdf or the failure rate for the entire system,
there are also some drawbacks. A major disadvantage of the analytical approach
for complex systems is the complexity of the solutions. Calculating the
analytical reliability solution for a sizable complex system may tax the
resources of even the most powerful PC. In situations such as this, it may
be more advantageous to use simulation to determine the complex system's
reliability. This article discusses the methodology used by ReliaSoft's
BlockSim to simulate system reliability. (NOTE: You may want to
download a free evaluation version of BlockSim in order to perform some of
the analyses described in this article.)
A complex system is one that cannot be broken down into groups of series and
parallel components. In many cases it is not easy to recognize which
components are in series and which are in parallel in a complex system.
The following network is a good example of such a complex system:
As the figure illustrates, this system cannot be broken down into a group of
series and parallel systems. If the system can be broken down into
series/parallel configurations, it is a relatively simple matter to
determine the mathematical or analytical formula that describes the
system's reliability. However, for a complex system, determination of the
system reliability becomes more involved.
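For reference, the series/parallel case mentioned above reduces to two standard formulas. The block below states them under the assumption that the n components fail independently, with R_i(t) denoting the reliability of component i (a symbol introduced here for illustration):

```latex
R_{\text{series}}(t) = \prod_{i=1}^{n} R_i(t),
\qquad
R_{\text{parallel}}(t) = 1 - \prod_{i=1}^{n} \bigl(1 - R_i(t)\bigr)
```

A system made only of nested series and parallel groups can be reduced by applying these two expressions repeatedly. A complex system has no such reduction, which is what motivates the simulation approach.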
In this article, we will look at some of the techniques that can be employed to
determine a system's reliability via simulation. It is assumed that the
reliability values for the components have been determined using standard
(or accelerated) life data analysis techniques, so that the reliability
function for each component is known. With this component-level
reliability information available, simulation can then be performed to
determine the reliability of the entire system.
Simulation in system reliability analysis is based on the Monte Carlo
simulation method that generates random failure times from each
component's failure distribution. The overall system reliability is then
obtained by simulating system operation and empirically calculating the
reliability values for a series of time values. Through the use of
computers, simulation has become a very popular analysis tool. Simulation
is simple to apply, and it can produce results for problems that are rather
difficult to solve analytically. On the other hand, simulation methods also
have certain drawbacks, not the least of which is that the results depend
on the number of simulations, which leads to a lack of repeatability. Other
drawbacks are that systems with static components (i.e., components in
which the reliability does not change with time) cannot be simulated, and
that most of the reliability optimization and allocation techniques cannot
be applied.
To illustrate how Monte Carlo data points are generated, we will demonstrate
how to generate times to failure based on a two-parameter Weibull
distribution with beta equal to two ($\beta = 2$) and eta equal to 100
($\eta = 100$). The reliability equation for the two-parameter Weibull
distribution is
$$R(T) = e^{-\left(\frac{T}{\eta}\right)^{\beta}}$$
where 0 < R(T) < 1. If we assume that the values of R(T) are
uniformly distributed over the interval between 0 and 1, then we can let
U, a uniformly distributed random number in the same interval,
represent R(T). Substituting U for R(T), $\beta = 2$ and $\eta = 100$,
and solving for T yields:
$$T = \eta \left(-\ln U\right)^{1/\beta} = 100 \left(-\ln U\right)^{1/2}$$
This equation is valid for any uniform random number U, 0 < U < 1. The
procedure is then repeated using newly generated random numbers, U, until
the desired number of simulated failure times, T, is reached.
The same methodology, using different equations, is used for other
distributions.
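The inversion described above can be sketched in a few lines of Python. This is a minimal illustration, not BlockSim's implementation: the function names and the `seed` argument are hypothetical, and Python's standard `random` module stands in for whatever generator the software actually uses.

```python
import math
import random

def weibull_failure_time(u, beta, eta):
    """Solve R(T) = exp(-(T/eta)**beta) for T, given R(T) = u with 0 < u < 1."""
    return eta * (-math.log(u)) ** (1.0 / beta)

def sample_failure_times(n, beta, eta, seed=None):
    """Draw n simulated failure times; a fixed seed makes the stream repeatable."""
    rng = random.Random(seed)
    times = []
    while len(times) < n:
        u = rng.random()          # uniform on [0, 1)
        if u > 0.0:               # the inversion requires 0 < U < 1
            times.append(weibull_failure_time(u, beta, eta))
    return times

# beta = 2 and eta = 100, as in the text above.
times = sample_failure_times(10_000, beta=2.0, eta=100.0, seed=1)

# The fraction of simulated units surviving past T = 100 should be close to
# the analytical value R(100) = exp(-1).
surviving = sum(t > 100.0 for t in times) / len(times)
print(f"empirical R(100) = {surviving:.4f}, analytical = {math.exp(-1):.4f}")
```

Comparing the empirical survival fraction with the analytical reliability is a quick sanity check that the generated times follow the intended distribution.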
The system simulation methodology is based on the Monte Carlo
simulation method, which was described in the previous section. This is
different from the analytical methodology discussed in last month's issue.
While one can perform a Monte Carlo simulation based on the results of the
analytical system reliability solution, this should not be confused with
the methodology described below, which uses Monte Carlo simulation of the
individual components to estimate the overall system reliability.
In BlockSim, the reliability simulation option requires a number of inputs. The first
input is the end time at which the reliability is to be estimated. The
second input is the number of increments. The end time is divided into the
number of increments specified. When the simulation is performed, a table
of reliabilities and instantaneous failure rates is generated for each
incremental time up to the end time. However, only the instantaneous
failure rate estimation is affected by the number of increments. The
Use Seed option allows the user to choose the seed value for the
generation of random numbers. Use of the same seed value will result in
identical simulation results, provided the other inputs remain the same.
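To see why the number of increments would affect only the failure rate estimate, consider the sketch below. It divides the end time into equal increments and estimates the failure rate at each one with a backward finite difference. That estimator is an assumption chosen for illustration (the article does not say how BlockSim computes the failure rate), and an analytical Weibull reliability with $\beta = 2$ and $\eta = 100$ stands in for simulated values to keep the example short. All function names are hypothetical.

```python
import math

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability, R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def increment_times(end_time, increments):
    """Split (0, end_time] into equally spaced evaluation times."""
    step = end_time / increments
    return [step * i for i in range(1, increments + 1)]

def failure_rate_table(end_time, increments, reliability):
    """Return (time, reliability, estimated failure rate) rows.

    ASSUMED estimator: (R(t_prev) - R(t)) / (dt * R(t_prev)), a backward
    finite difference that depends on the increment width dt.
    """
    step = end_time / increments
    rows = []
    r_prev = 1.0  # R(0) = 1
    for t in increment_times(end_time, increments):
        r_now = reliability(t)
        rate = (r_prev - r_now) / (step * r_prev) if r_prev > 0 else float("nan")
        rows.append((t, r_now, rate))
        r_prev = r_now
    return rows

def weibull_r(t):
    return weibull_reliability(t, beta=2.0, eta=100.0)

coarse = failure_rate_table(100.0, 10, weibull_r)[-1]    # row at t = 100
fine = failure_rate_table(100.0, 100, weibull_r)[-1]     # row at t = 100
print(coarse)
print(fine)
```

Both rows report the same reliability at t = 100, because the reliability at a given time does not depend on how the interval is divided. Only the rate column changes, and under this estimator the finer grid lands closer to the analytical Weibull failure rate.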
The next two inputs for the simulation, the number of inner loops and the
number of outer loops, can be found on the Setup page of the
Reliability/Maintainability Simulation window. The product of the two
values will determine the total number of simulations to be performed. The
number of inner loops indicates the number of simulation points to be
generated for each component. The number of outer loops indicates the
number of repetitions of the inner loops. For example, with 1000 inner
loops and 10 outer loops, 1000 simulation points are generated first, and
the system reliability is estimated from the outcomes of those 1000 runs.
This is then repeated 10 times, each time with a new stream of random
numbers for the simulation points, which yields 10 different system
reliability values, each obtained from 1000 runs. The average of these 10
reliability values is the returned system reliability at the specified
time.
In summary, the
simulation procedure consists of the following steps:
Step 1 - Decide on the number of points to generate (Inner Loops).
Step 2 - For each run, generate a random number between 0 and 1.
Step 3 - Obtain a failure time for each component based on this random
number.
Step 4 - Keep the smallest time-to-failure with the corresponding
component (i.e., time-to-failure with a value less than the desired
end time).
Step 5 - Check which components or combination of components cause
the system to fail.
Step 6 - The unreliability of the system is the number of times the
system was found to have failed divided by the total number of runs.
The reliability of the system is 100% minus the unreliability.
Step 7 - Return to Step 2 and repeat the procedure for the desired
number of cycles (Outer Loops).
Step 8 - The reliability of the system is the summation of the
reliabilities of the Outer Loops divided by the number of Outer Loops
(i.e., the average reliability).
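The steps above can be sketched as follows. This is a simplified reading of the procedure, not BlockSim's code. It assumes, purely for illustration, a five-component bridge network described by its minimal path sets, identical Weibull components, and a separate random number for each component in each run (one interpretation of Steps 2 and 3). All names are hypothetical.

```python
import math
import random

# ASSUMPTION: a five-component bridge network given by its minimal path sets.
# The system works while every component on at least one path still works.
BRIDGE_PATHS = [("A", "B"), ("C", "D"), ("A", "E", "D"), ("C", "E", "B")]
COMPONENTS = ("A", "B", "C", "D", "E")

def weibull_time(u, beta, eta):
    """Failure time from a uniform random number u in (0, 1]."""
    return eta * (-math.log(u)) ** (1.0 / beta)

def system_failed(failed, paths):
    """Step 5: the system fails when every path contains a failed component."""
    return all(any(c in failed for c in path) for path in paths)

def simulate_reliability(end_time, beta, eta, inner_loops, outer_loops,
                         seed=None, paths=BRIDGE_PATHS, components=COMPONENTS):
    rng = random.Random(seed)
    estimates = []
    for _ in range(outer_loops):                        # Step 7
        failures = 0
        for _ in range(inner_loops):                    # Step 1
            failed = set()
            for name in components:
                u = 1.0 - rng.random()                  # Step 2: u in (0, 1]
                t = weibull_time(u, beta, eta)          # Step 3
                if t < end_time:                        # Step 4
                    failed.add(name)
            if system_failed(failed, paths):            # Step 5
                failures += 1
        estimates.append(1.0 - failures / inner_loops)  # Step 6
    return sum(estimates) / outer_loops                 # Step 8

# Illustrative inputs: beta = 2, eta = 100, 1000 inner loops, 10 outer loops.
estimate = simulate_reliability(end_time=100.0, beta=2.0, eta=100.0,
                                inner_loops=1000, outer_loops=10, seed=1)
print(f"estimated system reliability at T = 100: {estimate:.4f}")
```

Describing the system by its path sets keeps the failure check independent of the network layout, so a different configuration only requires a different `paths` list.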
System Simulation Example
In order to illustrate these principles, consider the following system:
Given that components A through E are identical, with a two-parameter
Weibull failure distribution with a beta value of 1.2 ($\beta = 1.2$) and
an eta value of 1230 ($\eta = 1230$), determine the reliability of the
system at 1500 hours. Note that
the Start and End blocks cannot fail. The Reliability/Maintainability
Simulation utility in BlockSim is used for this example. Since we are not
solving for system reliability using analytical techniques, the
reliability equation for the system cannot be obtained. However, a table
of reliability vs. time can be generated. First, open the
Reliability/Maintainability Simulation window. On the Reliability page,
specify an End Time of 1500 hrs, 15 Increments, and a Seed Value
of 1. When you perform the simulation, these settings will generate a
table of 15 reliability values with the corresponding times and failure
rates.
On the Setup page of the Reliability/Maintainability Simulation window,
specify 5 Outer Loops and 10,000 Inner Loops.
This means that 10,000 random times-to-failure will be generated for each
component. In each trial, the components' failure times are compared to the
simulation time increment. If the components that have failed by that time
cause the system to fail, a failure is counted against the system. The
system reliability is the ratio of the number of successes to the number of
trials (in this case, there are 10,000 trials). The process is repeated 5
times, and the results are averaged to get a system reliability value at
each time increment. When the simulation is complete, the Results Panel
window will appear with the table of results.
According to that table, the reliability of the system at 1500 hours is
0.1738, or 17.38%. This gives a simple demonstration of how system
reliability simulation works. While the technique is simple, it requires
many repetitions to produce a realistic solution, so a computer is needed
to perform the analysis in a timely fashion.
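A rough stand-alone version of this example is sketched below, using the stated inputs ($\beta = 1.2$, $\eta = 1230$, an end time of 1500 hours, 15 increments, 10,000 inner loops and 5 outer loops). The system diagram is not reproduced here, so the layout is an assumption: components A through E are arranged as a bridge network. Python's random generator also differs from BlockSim's, so a seed of 1 will not reproduce the published table. For a bridge of identical independent components, the analytical value at 1500 hours is about 0.175, so estimates near the reported 0.1738 are plausible under that assumption.

```python
import math
import random

# ASSUMPTION: components A-E form a bridge network (minimal path sets below).
PATHS = [("A", "B"), ("C", "D"), ("A", "E", "D"), ("C", "E", "B")]
NAMES = ("A", "B", "C", "D", "E")

def system_failure_time(times, paths=PATHS):
    """A path survives until its first component fails; the system survives
    until its last path fails."""
    return max(min(times[c] for c in path) for path in paths)

def reliability_table(end_time=1500.0, increments=15, beta=1.2, eta=1230.0,
                      inner_loops=10_000, outer_loops=5, seed=1):
    rng = random.Random(seed)
    step = end_time / increments
    grid = [step * i for i in range(1, increments + 1)]
    totals = [0.0] * increments
    for _ in range(outer_loops):
        survived = [0] * increments
        for _ in range(inner_loops):
            # One Weibull failure time per component, by inversion.
            times = {n: eta * (-math.log(1.0 - rng.random())) ** (1.0 / beta)
                     for n in NAMES}
            t_sys = system_failure_time(times)
            # Count a success at every increment the system survives to.
            for i, t in enumerate(grid):
                if t_sys >= t:
                    survived[i] += 1
        for i in range(increments):
            totals[i] += survived[i] / inner_loops
    # Average the outer-loop estimates at each time increment.
    return [(t, total / outer_loops) for t, total in zip(grid, totals)]

table = reliability_table()
for t, r in table:
    print(f"{t:7.0f}  {r:.4f}")
```

Computing one system failure time per run and comparing it against every increment gives the whole reliability-versus-time table from a single set of random draws, rather than re-simulating at each time.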
In future issues of the Reliability HotWire, we will look at how simulation
can be used to determine a system's availability as well as its