In last month's Reliability Basics, we examined the reliability function - what it is and how it can be used. The concept of the lifetime distribution was introduced, as was the probability density function (pdf), which mathematically defines that function. The pdf for a particular distribution will contain a number of parameters, which can then be used in other functions derived from the pdf.
In this issue, we will look at how to begin estimating the parameters of a lifetime distribution from test data. These estimates can then be used to construct reliability functions and plots, as well as other life data statistics, such as the MTBF. The simplest and longest-used method of parameter estimation is probability plotting. This methodology involves plotting the failure times on specially constructed plotting paper to determine how well the data fit a given distribution and, if the fit is acceptable, to estimate the distribution's parameters.
A distribution's probability plotting paper is constructed by linearizing the cumulative distribution function (cdf), or unreliability function, of the distribution. Once this has been done, the scales for the x- and y-axes of the distribution's plotting paper can be constructed and the plotting can commence. As an example, we will use the well-known Weibull distribution. The cdf, or unreliability function, of the two-parameter Weibull distribution is given by:
$$Q(T) = 1 - e^{-\left(\frac{T}{\eta}\right)^{\beta}}$$
where β and η are parameters (the shape and scale parameters, respectively). We now need to linearize this function into the form y = mx + b:
$$\ln\left(1 - Q(T)\right) = -\left(\frac{T}{\eta}\right)^{\beta}$$
$$\ln\left(-\ln\left(1 - Q(T)\right)\right) = \beta \ln(T) - \beta \ln(\eta)$$
If we now set:
$$y = \ln\left(-\ln\left(1 - Q(T)\right)\right), \qquad x = \ln(T)$$
the cdf equation can now be rewritten as:
$$y = \beta x - \beta \ln(\eta)$$
This is now a linear equation, with a slope of β and an intercept of -β ln(η). Now the x- and y-axes of the Weibull probability plotting paper can be constructed. The x-axis is simply logarithmic, since x = ln(T). The y-axis is slightly more complicated, since it must represent:
$$y = \ln\left(-\ln\left(1 - Q(T)\right)\right)$$
where Q(T) is the unreliability. In a similar fashion, the cdfs for other lifetime distributions can be linearized to construct the probability plotting paper. The final result for Weibull probability plotting paper looks like the following:
Note that since the mathematical expression for the cdf differs from distribution to distribution, the structure of the plotting paper will differ from distribution to distribution as well. These different types of plotting papers can be obtained through engineering supply stores or, more commonly, generated with various software packages. Probability papers generated by ReliaSoft's Weibull++ software for some commonly used distributions can be found at http://www.weibull.com/GPaper/index.htm.
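The axis transformation described above can be checked numerically. The following is a minimal Python sketch, not part of the original article: the function names and the parameter values (β = 1.5, η = 100) are assumptions chosen purely for illustration. It generates points from an exact two-parameter Weibull cdf, maps them onto the transformed axes, and confirms that they fall on a straight line with slope β and intercept -β ln(η).

```python
import math

def weibull_cdf(t, beta, eta):
    """Unreliability Q(T) of the two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def plot_x(t):
    """Position on the x-axis of Weibull paper: x = ln(T)."""
    return math.log(t)

def plot_y(q):
    """Position on the y-axis of Weibull paper: y = ln(-ln(1 - Q))."""
    return math.log(-math.log(1.0 - q))

# Assumed parameter values and times, for illustration only.
beta, eta = 1.5, 100.0
times = [10.0, 50.0, 100.0, 200.0]

points = [(plot_x(t), plot_y(weibull_cdf(t, beta, eta))) for t in times]

# Slope between each pair of successive points on the transformed axes.
slopes = [(y1 - y0) / (x1 - x0)
          for (x0, y0), (x1, y1) in zip(points, points[1:])]

for s in slopes:
    print(round(s, 6))
```

Every printed slope should agree with the assumed β to within floating-point error, which is the property that makes a straight line on Weibull paper evidence of a Weibull fit.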
The question now arises of how to plot our failure times on the plotting paper. We can see that the x-axis values will correspond to our failure times, since x = ln(T). Each x-axis value is simply the natural logarithm of each time-to-failure. What about the corresponding y-coordinate values to go with our x-coordinate failure times? Taking another look at the y-axis equation:
$$y = \ln\left(-\ln\left(1 - Q(T)\right)\right)$$
we see that the y-coordinate is based on Q(T), or the unreliability. This means that we need to come up with unreliability estimates for each of our failure times in order to plot the data on a two-dimensional plot. These unreliability estimates are accomplished with what are called median ranks.
Median ranks are based on a solution for the cumulative binomial distribution, based on sample size and failure number. The median ranks represent the 50% confidence level ("best guess") estimate for the true unreliability for a failure, based on the total number of failures and the order number (first, second, etc.) of the failure in question. There is also an approximation that can be used to estimate median ranks, called Benard's approximation. It has the form:
$$MR \approx \frac{j - 0.3}{N + 0.4}$$
where N is the total number of failures and j is the failure order number. We will not delve any further into the derivation of the median ranks, other than to say that tables of median ranks can be found in many statistics and life data texts.
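To make the median rank calculation concrete, here is a short Python sketch that is not part of the original article. It compares Benard's approximation with median ranks obtained by solving the cumulative binomial equation numerically. The function names, the bisection approach, and the sample size of six are all assumptions made for illustration.

```python
import math

def benard_median_rank(j, n):
    """Benard's approximation to the median rank of failure j out of n."""
    return (j - 0.3) / (n + 0.4)

def binomial_tail(p, j, n):
    """Probability that at least j of n units have failed, given that each
    unit has failed with probability p."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(j, n + 1))

def exact_median_rank(j, n, tol=1e-12):
    """Solve binomial_tail(p, j, n) = 0.5 for p by bisection.

    binomial_tail increases from 0 to 1 as p goes from 0 to 1,
    so a single root exists in that interval.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_tail(mid, j, n) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 6  # assumed sample size, for illustration only
for j in range(1, n + 1):
    print(j, round(benard_median_rank(j, n), 4), round(exact_median_rank(j, n), 4))
```

For small samples such as this one, the two columns should differ only in the third decimal place, which is why the approximation is usually adequate for hand plotting.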
Based on Benard's approximation, we can now calculate unreliability estimates for each of our failure times. These are shown in the following table:
Now that we have y-coordinate values to go with the x-coordinate failure times, we can plot our failure data on a Weibull probability plot:
The failure times plotted on Weibull probability paper fall in a fairly linear fashion, indicating that our choice of the two-parameter Weibull distribution was valid. If the points did not seem to follow a straight line, we might want to consider using another lifetime distribution to analyze the data. We can now draw a best-fit line through the points.
This line represents the model of the unreliability, as expressed by the linearized unreliability function, or cdf.
Determination of β, or the Weibull slope, is relatively easy. As we saw when we were discussing the linearization of the two-parameter Weibull cdf, the slope of the linear equation is simply β. In other words, the slope of the fitted line on the Weibull probability plot is equal to the Weibull slope (or shape parameter), β. Many types of Weibull plotting paper have scales that allow one to read the slope of the line directly, rather than having to calculate it based on "rise over run."
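If the paper has no slope scale, the "rise over run" calculation can be carried out from any two points read off the fitted line, provided that both are first converted to the transformed coordinates x = ln(T) and y = ln(-ln(1 - Q)). The Python sketch below is an illustration only: the two readings are hypothetical and are not taken from the plot in this article.

```python
import math

def weibull_y(q):
    """Transformed y-coordinate on Weibull paper: ln(-ln(1 - Q))."""
    return math.log(-math.log(1.0 - q))

# Hypothetical readings from a fitted line: (time in hours, unreliability).
t1, q1 = 10.0, 0.10
t2, q2 = 100.0, 0.90

rise = weibull_y(q2) - weibull_y(q1)   # change in transformed unreliability
run = math.log(t2) - math.log(t1)      # change in ln(time)
beta_estimate = rise / run

print(round(beta_estimate, 3))
```

Measuring the rise and run with a ruler on the printed paper gives the same answer only if the paper's two axes use equal physical scaling, which is one reason a dedicated slope scale is convenient.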
By drawing a line parallel to the best-fit model line through the slope scale, we can see that the estimate for β for this data set is approximately 1.4.
Some mathematical manipulation of the Weibull cdf, or unreliability, equation is required to determine the estimate of η, the Weibull scale parameter. The two-parameter Weibull unreliability function is given by:
$$Q(T) = 1 - e^{-\left(\frac{T}{\eta}\right)^{\beta}}$$
We want to be able to read the value of η from the x-axis time scale, which can be expressed mathematically as T = η. Substituting this into the Weibull unreliability function at T = η, we get:
$$Q(\eta) = 1 - e^{-\left(\frac{\eta}{\eta}\right)^{\beta}} = 1 - e^{-1} \approx 0.632$$
Hence, η is where our best-fit unreliability model line intersects with a horizontal line extended from the 63.2% level of the unreliability, or y-axis, scale:
As the graphic shows, the best-fit model line intersects the 63.2% unreliability line at approximately 44 hours. Therefore, the estimate for η for our data is 44 hours.
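The whole graphical procedure can also be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the method used to produce the plots in this article: the six failure times are hypothetical (they are not the article's data set), and an ordinary least squares fit of y on x stands in for the line drawn by eye. The slope of the fitted line gives the estimate of β, and the time at which the line crosses y = 0 (the 63.2% unreliability level) gives the estimate of η.

```python
import math

# Hypothetical complete data set in hours; NOT the data behind the article's plots.
failure_times = sorted([12.0, 24.0, 35.0, 48.0, 63.0, 90.0])
n = len(failure_times)

# x-coordinates: natural log of each time-to-failure.
xs = [math.log(t) for t in failure_times]

# y-coordinates: transformed median ranks (Benard's approximation).
ys = []
for j in range(1, n + 1):
    q = (j - 0.3) / (n + 0.4)
    ys.append(math.log(-math.log(1.0 - q)))

# Ordinary least squares of y on x, standing in for the hand-drawn line.
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# The slope is beta; the intercept is -beta * ln(eta), so eta = exp(-b / beta).
beta_hat = slope
eta_hat = math.exp(-intercept / slope)

# Unreliability predicted by the fitted model at T = eta_hat.
q_at_eta = 1.0 - math.exp(-((eta_hat / eta_hat) ** beta_hat))

print("beta estimate:", round(beta_hat, 3))
print("eta estimate (hours):", round(eta_hat, 1))
print("Q at eta:", round(q_at_eta, 3))
```

Because the line is computed rather than drawn, two analysts running this sketch on the same data would obtain identical estimates, whereas two hand-drawn lines generally would not.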
This illustrates the basics of probability plotting for complete data using a two-parameter Weibull example. The methodology can be more difficult for other types of analysis. For example, if the data set contained suspensions, we would have to be able to account for them. This is dealt with by modifying the median rank values for the failure times, although that particular methodology exceeds the scope of this article.
There are also shortfalls to this method of parameter estimation. Besides the most obvious, which is the amount of effort required, manual probability plotting is not always consistent in the results. Two people plotting a straight line through a set of points will not always draw this line the same way, thus coming up with slightly different results. In addition, probability plotting can be very difficult for analyzing large sample sizes (e.g. warranty data). This method was used primarily before the widespread use of computers that could easily perform the calculations for more complicated parameter estimation methods, such as least squares and maximum likelihood methods. These methods will be discussed in future issues of Reliability HotWire.
Copyright © 2001 ReliaSoft Corporation, ALL RIGHTS RESERVED