Appendix A.1. Frequency vs. Bayesian Probability
Statistical description of the outcome of an experiment seeking to determine the value of an unknown parameter is essential under two scenarios: (a) when the analyst cannot make a deterministic statement about the value of the parameter, or (b) when the parameter itself is inherently random, so it does not assume a single value. An example of the first scenario is when the parameter has a fixed single value, such as a physical constant, but the experimental procedure introduces unavoidable sources of uncertainty, resulting in seemingly random outcomes for the measured parameter value. Mathematically, this is described as follows:

yi = z + εi,  i = 1, 2, …, N,  (A1)

where yi is the ith recorded or inferred measurement of the parameter's unknown true value z. The measurements are contaminated by a random error εi resulting from a number of uncontrolled experimental conditions. The estimate of the true value z may be improved by repeating the same experiment a number of times N or by conducting N other experiments. This type of inference is referred to as Bayesian inference, and the parameter z is referred to as an epistemic parameter.
In the second scenario, the parameter is inherently random, as dictated by the physics, such as the number of counts in a Geiger counting experiment; hence, its measured value is expected to exhibit random behavior, which may be described mathematically as follows:

yi = zi + εi,  i = 1, 2, …, N.  (A2)

Unlike the previous equation, the true parameter value zi is inherently random, because it changes its value every time it is recorded. Such a value is referred to as an aleatory parameter. The recorded values zi may be split into two terms emulating Equation (A1), as follows:

zi = z̄ + δi.  (A3)

Whereas Equations (A1) and (A3) are mathematically similar, their interpretations are distinctly different. Equation (A1) asserts that z, the true value of the parameter, is single-valued, but its inference is obfuscated by uncontrolled random experimental errors. If a high-precision experiment is employed, then the random term in Equation (A1) will be significantly diminished, ultimately approaching zero in the limit of perfect and/or many repeated measurements. In Equation (A3), the split into the two terms z̄ and δi is artificial. It represents a mathematically convenient way to describe an aleatory parameter in terms of a constant term z̄, representing the mean parameter value, and a random term δi with zero mean, representing deviations from z̄. Note that the random term δi will not vanish even if perfect measurements are possible. Therefore, z̄ may be thought of as a mathematical feature that compactly describes the distribution of the random values assumed by the aleatory parameter. Other mathematical features may also be derived, such as the standard deviation σt of the random term δi, and the tolerance interval containing a certain portion of the random values, which is defined in the next subsection.
Note that the extracted features, z̄ and σt, have fixed single values, despite the aleatory nature of the parameter they describe. The implication is that one may resort to Bayesian or epistemic methods to infer the features of an aleatory parameter. Conversely, frequentist or aleatory methods may be used to describe an epistemic parameter. For example, the variable y in Equation (A1) may be regarded as an aleatory parameter, since it is contaminated by uncontrolled random errors. If the PDF of the aleatory parameter can be captured, then its mean value may be used to infer the unknown epistemic parameter value z. Examples of the interaction between aleatory and epistemic methods are shown below.
Figure A1 plots two PDFs that describe the true values assumed by typical epistemic and aleatory parameters. The epistemic parameter is described by a delta function (in black) centered around the true value, a mathematical abstraction of a PDF with zero spread. The other distribution describes an aleatory parameter, which assumes a range of values resulting from its inherent randomness. The two types of uncertainties are relevant to the TSURFER methodology. The variable y in Equation (A1) may represent the measured keff value from a number of experiments in which the measurement process is contaminated by numerous sources of aleatory and epistemic uncertainties. The experimentalist typically lumps these uncertainties together in the form of a single source, and the standard deviation of that source may be reported with each measurement. The variable z may represent the cross sections, for which the true mean values are unknown.
Cross sections are an example of aleatory parameters that are treated as epistemic in the TSURFER methodology. In theory, cross sections characterize the probability of interaction between a nucleus and a neutron, which is an inherently random event, implying an aleatory treatment. However, the cross section evaluation procedures [36] (a combination of differential and integral measurements and analytical methods) result in a mean value estimate (dotted blue line) that is significantly different from the true mean value of their aleatory spread (dotted black line), thereby justifying their treatment as epistemic.
Figure A1. PDF Examples for True Values of Epistemic and Aleatory Parameters.
Appendix A.2. Statistical Sampling and Inference Analyses
As with deterministic calculations, two types of analysis can be defined in statistics: the sampling analysis and the inference analysis, representing the equivalents of the deterministic forward and inverse analyses, respectively. In the sampling analysis, the true PDF is known, and the goal is to generate samples that are consistent with that PDF, often to be employed in subsequent downstream calculations, such as sampling the cross sections from their prior PDFs to calculate the corresponding PDF of keff for a target application or critical experiment. Consistency can be measured in a number of ways, such as by comparing features (e.g., mean, standard deviation, tolerance intervals) extracted from the true PDF and from the available samples. In the inference analysis, N samples of the parameter, or of functions thereof (e.g., samples of experimental keff, which are related to the target application keff and/or the nuclear data), are presented with the goal of determining the true PDF that generated the samples. In many practical applications, the objective is to determine features like the PDF tolerance levels rather than defining the entire PDF. This objective represents the key focus of this work.
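As a minimal illustration of the two analyses (a generic sketch, not TSURFER code; the distribution parameters and sample size below are hypothetical), the following Python snippet draws samples from a known normal PDF (sampling analysis) and then recovers features such as the mean, standard deviation, and an empirical 95th percentile from those samples (inference analysis).

```python
import numpy as np

rng = np.random.default_rng(2024)

# Sampling analysis: the true PDF is known (here, normal with mean 1 and
# standard deviation 2); generate N samples consistent with that PDF.
true_mean, true_std, N = 1.0, 2.0, 1000
samples = rng.normal(true_mean, true_std, size=N)

# Inference analysis: given only the samples, estimate features of the
# (presumed unknown) PDF: mean, standard deviation, and an empirical 95th
# percentile playing the role of a one-sided tolerance level.
est_mean = samples.mean()
est_std = samples.std(ddof=1)
est_p95 = np.percentile(samples, 95.0)

print(f"estimated mean = {est_mean:.3f} (true {true_mean})")
print(f"estimated std  = {est_std:.3f} (true {true_std})")
print(f"estimated 95th percentile = {est_p95:.3f} "
      f"(true {true_mean + 1.645 * true_std:.3f})")
```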
An important byproduct of sampling analysis is the ability to determine the probability that a sample is drawn from a certain range of values within the distribution. This is a key requirement for the inference analysis when a single measurement is available, as illustrated in the next subsection. Statements such as the following are typical: Prob(z < zH) = 0.9, where z is an aleatory parameter and zH is some fixed upper limit. This statement asserts that there is a 90% chance that a single sample has a value lower than zH. The statement, while probabilistically posed, has the following deterministic interpretation: if the entire population's values are known (which is only achieved with an infinite number of samples), then exactly a fraction p of these values will be below zH. Even though this statement has no uncertainty, it is unrealizable in practice. With a finite number of samples, this fraction p is no longer an exact deterministic number. This can be demonstrated by drawing N samples and recording the fraction of values that are below zH, then repeating the experiment by drawing N new samples, and so forth. In each experiment, a different fraction p is calculated, representing the fraction of the N values that are below zH. The recorded values of p are expected to have a distribution of their own, as shown in the three subplots on the left in Figure A2 for different values of N. Note that the distribution of p values intuitively shrinks as N increases, ultimately converging to a delta function centered around the exact value of p, which can be calculated from the z distribution for the given threshold value zH; for example, in a normal distribution with a zero mean and a unit standard deviation, p = 95% for zH = 1.65. These are the numerical values employed in Figure A2. The plots show that when N is finite, there is always a non-zero probability of calculating values for p that are less than the true value of 95%. For example, in the case of N = 1000, the distribution of p values is approximately centered around the 95% value, suggesting that there is a 50% chance that the threshold value of zH = 1.65 will contain at least 95% of the distribution. There is an equal 50% chance that a coverage of less than 95% will be obtained. Clearly, if N continues to increase, a 50% chance that the calculated p value will be less than 95% will continue to exist; however, the range of calculated p values shrinks closer to the 95% value.
If zH is multiplied by a number greater than 1.0, then the probability of covering at least 95% of the population will increase above 50%, as demonstrated in the three subplots on the right in Figure A2, which employ a threshold value of zH = 2.0. These graphs suggest that when N > 1000, there is a near-zero (in reality, a very small) chance that the calculated p value will be less than 95%, thus providing much more confidence than 50%. Hence, the confidence in the estimated coverage can be increased by simply increasing the size of the tolerance interval. This illustrates the basic concept behind tolerance interval estimation techniques. Details on how this can be achieved for various distributions are presented below.
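The finite-sample behavior described above can be reproduced with a short numerical experiment. The sketch below (illustrative only; the seed and repetition count are arbitrary choices) draws N standard normal samples, records the fraction falling below the threshold zH, and repeats the experiment many times to build the distribution of that fraction, mirroring the subplots of Figure A2.

```python
import numpy as np

rng = np.random.default_rng(42)
z_H = 1.65          # threshold; true coverage is about 95% for a standard normal
n_repeats = 5000    # number of repeated experiments (arbitrary choice)

for N in (10, 100, 1000):
    # Each row is one experiment of N samples; p is the fraction below z_H.
    samples = rng.standard_normal((n_repeats, N))
    p = (samples < z_H).mean(axis=1)
    below_true = (p < 0.95).mean()
    print(f"N = {N:5d}: mean p = {p.mean():.4f}, std of p = {p.std():.4f}, "
          f"fraction of experiments with p < 0.95 = {below_true:.2f}")
```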
Figure A2. Tolerance Interval α-Coverage with Finite Samples.
With epistemic uncertainty, the sampling analysis is trivial, since all samples will assume the same value. In reality, the true PDF is unknown, so the samples are generated based on a prior PDF representing the best knowledge about the parameter. In the inference analysis, the goal is to improve the estimate of the true PDF in order to approach the shape of the delta function. Thus, upon the introduction of new measurements, a successful inference analysis for epistemic uncertainty will generate PDFs whose spread successively shrinks, ultimately converging to the true value in the form of a delta function. With aleatory uncertainty, the sampling analysis is tasked with generating samples with different values to emulate the aleatory nature of the parameter. Although this is a straightforward process, the additional step of generating a cumulative distribution function (CDF) with values between 0 and 1 is required. Details on this step are not relevant to the current discussion. The goal of the inference analysis is to use the available samples to recover the true PDF. In most practical problems, it is much more convenient to infer features extracted from the PDF rather than the full PDF. For example, with a normal distribution, it suffices to know the mean and standard deviation to fully describe the distribution. Other features are also valuable, such as the K-coverage parameter, which defines an interval that, for normal PDFs, is often centered around the mean and stretched by the standard deviation, according to the following:

[z̄ − Kσt, z̄ + Kσt],  (A4)

where z̄ and σt are the true mean and standard deviation of the distribution. The K-coverage parameter defines an interval which covers a portion α of the area under the PDF. Note that α has units of probability or fraction, and K is dimensionless, such that Kσt has the units of the original parameter z. Both are referred to as the coverage parameters, with K determining the size of the interval and α determining the associated portion covered by that interval. Tables may be found relating K and α for many key distributions, including the normal distribution, χ2-distribution, t-distribution, and non-central t-distribution.
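For the normal distribution, the relationship between K and α can also be evaluated directly from the cumulative distribution function rather than looked up in tables; the short sketch below (an illustration using SciPy) computes α for a two-sided K-coverage interval and inverts the relationship for a requested coverage.

```python
from scipy.stats import norm

# Two-sided coverage of the interval [mean - K*sigma, mean + K*sigma]
# for a normal distribution: alpha = Phi(K) - Phi(-K).
for K in (1.0, 1.645, 1.96, 2.0, 3.0):
    alpha = norm.cdf(K) - norm.cdf(-K)
    print(f"K = {K:5.3f}  ->  two-sided coverage alpha = {alpha:.4f}")

# Inverse relationship: the K needed for a requested two-sided coverage.
for alpha in (0.90, 0.95, 0.99):
    K = norm.ppf(0.5 + alpha / 2.0)
    print(f"alpha = {alpha:.2f}  ->  required K = {K:.3f}")
```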
In aleatory analysis, the K-coverage provides information on the interval that is expected to contain most of the variation of the parameter. This is valuable in manufacturing processes that involve assurances of product quality. In epistemic analysis, the K-coverage identifies the interval that is most likely, with confidence α, to contain the real value of the parameter. This information is key to quantifying risk as measured by the consequences of the parameter being outside this interval. This interval is referred to as the tolerance interval, and it represents the goal of this work, where the z parameter refers to the bias calculated by TSURFER, and the objective is to create an upper limit zH that covers α = 95% of the distribution with a 95% level of confidence. As mentioned earlier, if one employed an infinite number of samples N to calculate the tolerance limit, the confidence would be 100% that zH covers 95% of the distribution. Therefore, the goal is to find a mathematical approach to estimate a multiplier K that brings the confidence up to a user-defined value.
Appendix A.3. Interaction between Aleatory and Epistemic Methods
This section discusses the key enabling mathematical principle that allows the analyst to seamlessly transition between the aleatory and epistemic analysis methods and interpretations. The true PDF that generated the samples in Equations (A1) and (A2) can be described as follows, assuming a normal PDF, which is one of the most commonly used PDFs in statistics:

Pr(z) ∝ exp(−(z − z̄)²/(2σt²)),  (A5)

where the proportionality implies that a constant is needed to normalize the area under the PDF to 1 (its exact definition is omitted since it does not benefit the discussion). In an aleatory setting, this PDF describes the distribution of all the random values z assumed by the aleatory parameter, with mean value z̄ and standard deviation σt, and it is typically referred to as the sampling distribution. A basic statistical result states that if N measurements of z are recorded, then the following can be shown [2]:

z̄ = (1/N) Σi zi and σt² = (1/N) Σi (zi − z̄)² in the limit as N → ∞,  (A6)

and furthermore, the integral

p = ∫_{zL}^{zH} Pr(z) dz  (A7)

measures the frequency (or fraction of times) with which the parameter z assumes values between the two limits zL and zH. This statement implies that the parameter is expected to assume values outside this interval with a probability of 1 − p. This interpretation of Equation (A5) represents the basis of the sampling analysis. As mentioned above, if N is indeed infinite, then the statements in Equations (A6) and (A7) produce exact values for these quantities.
In the inference analysis, the objective is to build an estimate of the PDF in Equation (A5) using a finite number of samples. More importantly, if additional samples become available, then the estimated PDF must be updated so that it converges to the true PDF given in Equation (A5). If only a single measurement is available, then the integral in Equation (A7) allows for the assertion that there is a probability p that the single sample belongs to the interval (zL, zH). As discussed below, this provides an aleatory-based approach for estimating confidence intervals for quantities such as the true mean and standard deviation of an aleatory parameter.
In the epistemic setting, this same PDF may be written as follows:

Pr(z̄) ∝ exp(−(z − z̄)²/(2σt²)).  (A8)

This PDF has the same form as that in Equation (A5), but the z value is assumed to be single-valued and known, and z̄ is the unknown. This describes a scenario in which a single sample has been measured, and the objective is to infer the value of the unknown parameter z̄. In the statistical literature [5], this PDF is often denoted by Pr(z̄ | z) and is referred to as the distribution conditioned on the measurements; it is referred to herein as the inference distribution to distinguish it from the sampling distribution. Unlike the aleatory case, this PDF does not imply that z̄ assumes random values; instead, it measures the analyst's degree-of-belief about the true value given the single measurement z. For example, if the first sample z1 is very far from the true mean z̄, then the corresponding PDF will be centered around z1, implying that z̄ is likely to be close to z1. Although this may seem incorrect, it is the best guess possible based on the single measurement z1. Mathematically, the interval in Equation (A7) will be used to measure the analyst's confidence that the true value lies between zL and zH. This probability p hedges against the analyst's lack of complete knowledge about the true value of z̄ by stating that there is a 1 − p chance that the true value might lie outside the given interval. With very few samples and high measurement uncertainty, the probability 1 − p is expected to be high, but it will gradually approach zero via the consolidation approach (explained in the following section) as more measurements are accumulated.
Appendix A.4. Consolidation of PDFs
A key mathematical tool used in the inference analysis consolidates the PDFs obtained from multiple independent sources or different experiments. This mathematical approach accounts for the unique differences between epistemic and aleatory uncertainties. For the aleatory case, consider the following two PDFs, based on N1 and N2 independently recorded samples of an aleatory parameter z, respectively:

Pr1(z) ∝ exp(−(z − z̄1)²/(2σ1²)),  Pr2(z) ∝ exp(−(z − z̄2)²/(2σ2²)).

The first (second) PDF implies that, based on the available N1 (N2) samples, the mean value of the samples is z̄1 (z̄2), and the standard deviation of the samples is σ1 (σ2). As these PDFs have the meaning of frequency, they can be consolidated according to the following rule:

Pr(z) = [N1 Pr1(z) + N2 Pr2(z)]/(N1 + N2).  (A9)

This straightforward averaging approach is consistent with the frequency interpretation of aleatory uncertainty. It is equivalent to combining all samples from the two experiments into one experiment and recalculating the frequencies of the various assumed parameter values. This consolidation approach ensures that if the two PDFs are exactly the same, then the resulting PDF will assume the same shape, which is intuitively sound.
For the epistemic case, consider the following two PDFs:

Pr1(z̄) ∝ exp(−(z̄ − z1)²/(2σ1²)),  Pr2(z̄) ∝ exp(−(z̄ − z2)²/(2σ2²)).

Each of these two PDFs provides a measure of the degree-of-belief about the true value of the mean z̄ based on a single measurement. In practical applications, z1 may represent the best estimate from prior knowledge, which is the reference value of keff for a given model of a critical benchmark experiment, and z2 is the corresponding real measurement of keff. For a critical benchmark model describing a thermal flux spectrum, typical values for σ1 are on the order of 500–700 pcm, and a typical value for σ2, representing the measurement uncertainty, is on the order of 150 pcm. Bayesian consolidation of these two PDFs gives [37]

Pr(z̄) ∝ Pr1(z̄) Pr2(z̄).  (A10)

This consolidation approach recognizes that, with additional knowledge, the consolidated PDF must converge to the true delta-function-looking PDF, since the parameter has a single value rather than a range of values, as is the case with an aleatory parameter. Multiplying the PDFs achieves this goal. (A simple social example is provided here for illustration. Consider asking a random person which road to take to reach a certain destination, with the options limited to Road A or Road B. The first individual recommends Road B and claims to be 80% confident. If another individual provides the same information independently of the first individual, then the overall consolidated confidence that Road B is the correct way should increase. One can show that the confidence increases to (0.8 × 0.8)/(0.8 × 0.8 + 0.2 × 0.2) = 0.94. However, if the individuals provide contradicting information (e.g., if the second individual recommends Road A with 80% confidence), then the consolidated knowledge should assign equal confidence to both roads, 50% each, representing the state of complete ignorance. This behavior is achieved via PDF multiplication; averaging the PDFs would not result in increased confidence when consistent knowledge is gleaned from multiple sources.) This consolidation approach ensures that if similar PDFs are consolidated, then the resulting PDF will have a smaller spread, indicating higher confidence in the true value of the parameter. This approach is depicted in Figure A3.
Figure A3. Bayesian Consolidation of Two Prior PDFs.
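The effect of multiplying two normal degree-of-belief PDFs can be written in closed form: the product is again normal, with a precision-weighted mean and a reduced standard deviation. The sketch below is a generic illustration (the prior mean of 0 pcm, measurement of 250 pcm, and the 600/150 pcm uncertainties are hypothetical values chosen within the ranges quoted above, not TSURFER output), and it also reproduces the Road A/Road B arithmetic.

```python
import numpy as np

def consolidate_normals(z1, sigma1, z2, sigma2):
    """Product of two normal PDFs (Bayesian consolidation): the result is
    normal with a precision-weighted mean and a smaller standard deviation."""
    w1, w2 = 1.0 / sigma1**2, 1.0 / sigma2**2
    mean = (w1 * z1 + w2 * z2) / (w1 + w2)
    sigma = np.sqrt(1.0 / (w1 + w2))
    return mean, sigma

# Hypothetical prior knowledge (~600 pcm uncertainty) consolidated with a
# measurement (~150 pcm uncertainty), both expressed in pcm.
mean, sigma = consolidate_normals(0.0, 600.0, 250.0, 150.0)
print(f"consolidated mean = {mean:.1f} pcm, consolidated sigma = {sigma:.1f} pcm")

# Road A/Road B example: two independent 80%-confident recommendations of
# Road B consolidate (by multiplication and renormalization) to:
p = 0.8 * 0.8 / (0.8 * 0.8 + 0.2 * 0.2)
print(f"consolidated confidence in Road B = {p:.2f}")
```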
Appendix A.5. Aleatory and Epistemic Mean Value Estimation
One interesting observation is that while z may be an aleatory parameter, its derived features z̄, σt, and K are epistemic, because their true values are single-valued. The implication is that one can transition between the two viewpoints as dictated by the problem, as demonstrated in this and the next few subsections. This subsection starts with the simplest inference problem, in which the true mean z̄ is unknown; however, σt is known, and the PDF is known to be normal, which defines the relationship between K and p, leaving z̄ as the only unknown. This problem may be used to describe both epistemic and aleatory uncertainties per Equations (A1) and (A3), respectively. For illustration, assume that one is interested in estimating the true value of an epistemic parameter per Equation (A3). The Bayesian approach employs a sample zi to calculate an epistemic PDF for z̄ of the form:

Pr(z̄ | zi) ∝ exp(−(zi − z̄)²/(2σt²)).

According to Bayes, the best-estimate epistemic PDF that consolidates these N PDFs (assuming no prior knowledge) is given by

Pr(z̄ | z1, …, zN) ∝ Πi Pr(z̄ | zi),  (A11)

where it can be shown that [6]

Pr(z̄ | z1, …, zN) ∝ exp(−N(z̄ − (1/N) Σi zi)²/(2σt²)).  (A12)

The resulting PDF is normal and is centered around the mean value of the samples, (1/N) Σi zi, and its standard deviation is σt/√N, i.e., smaller than σt, the true standard deviation of z, by a factor of √N. As N goes to infinity, the resulting PDF converges to a delta function centered around the true mean value z̄. The implication here is that while the parameter z is aleatory, the inference analysis has adopted an epistemic setting to extract a feature of the aleatory parameter PDF: that is, its mean value z̄.
The same inference problem can be solved using an aleatory setting. This requires defining an estimator, which is a function of the samples that can be used to estimate the unknown quantity of interest, here the true mean z̄. The idea of using an estimator is inspired by the features definition in Equation (A12). This follows because it is desirable to use an estimator that converges to the true value with an infinite number of samples. If the number of samples is finite, then these very definitions are expected to produce random values, thus allowing for an aleatory treatment. (The same approach for an estimator is discussed in relation to Figure A2.) The following estimator zm of the mean is thus intuitively reasonable:

zm = (1/N) Σi zi.  (A13)

Note that zm is a new aleatory parameter; it can be thought of as the mathematical average of N aleatory parameters, all having the same PDF as the aleatory parameter z. To generate a sample for zm, N samples must be generated for the parameter z, and their average must be calculated. (This basic approach is used by TSURFER, whereby N experiments are employed to estimate the bias, with each experiment producing a single sample measurement; see the discussion in Section 2.2 regarding Equation (7).) Assuming the true PDF of z is known, the true PDF of zm can be predicted as [1] (this can be verified numerically by creating many samples for zm, each of which is generated using N random samples of z, and finally building a histogram for the zm samples; the process can be repeated using different values of N, and as N increases, the histogram for zm will shrink at a rate proportional to 1/√N):

Pr(zm) ∝ exp(−N(zm − z̄)²/(2σt²)).  (A14)
Interestingly, this PDF has the same mean value as the true mean value of z, which is z̄, and its standard deviation is related to that of z by σt/√N. The implication is that if this experiment could be run many times, with each run producing a sample for zm, then a distribution would be generated that is centered around the true mean value z̄ with a standard deviation of σt/√N. An example is shown in Figure A4 for a normal distribution with z̄ = 1 and σt = 2, for various values of N, with N = 1 representing the original distribution.
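The numerical verification described in the parenthetical note above is easy to carry out. The sketch below (illustrative only; the seed and repetition count are arbitrary) builds many samples of zm for several values of N and confirms that the spread of zm shrinks in proportion to 1/√N, as in Figure A4.

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean, true_std = 1.0, 2.0   # values used for Figure A4 in the text
n_repeats = 50000                # number of virtual experiments (arbitrary)

for N in (1, 10, 30, 100):
    # Each virtual experiment draws N samples of z and averages them to
    # produce one sample of the estimator zm.
    zm = rng.normal(true_mean, true_std, size=(n_repeats, N)).mean(axis=1)
    print(f"N = {N:3d}: mean of zm = {zm.mean():.3f}, std of zm = {zm.std():.3f}, "
          f"sigma_t/sqrt(N) = {true_std / np.sqrt(N):.3f}")
```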
However, this approach is not a practical solution, since it requires many executions of the experiment. If this approach were feasible, then a PDF could have been built for the original parameter z directly, rather than employing an estimator zm. To overcome this challenge, recall the approach employed above to establish confidence with a single measurement zm. Since the goal is to estimate the fixed value of z̄, an epistemic treatment per Equation (A8) is possible. Therefore, the following equation can be used to characterize confidence about z̄:

Pr(z̄ | zm) ∝ exp(−N(zm − z̄)²/(2σt²)).  (A15)

This is the same PDF as the true PDF of the estimator zm in Equation (A14), but it is now viewed as a function of z̄, with zm serving as a fixed, single-valued measurement from the experiment. Per Equation (A7), a confidence p can be established that the true value lies within the interval (zm − Kσt/√N, zm + Kσt/√N).
Figure A4. Distribution of the samples' mean with known standard deviation.
A discussion of an aleatory approach to characterize confidence, using the concept of the confidence interval, follows. The objective is to find an interval that contains the true mean z̄. As discussed above, such intervals are typically described using a single K-coverage parameter. As depicted in Figure A4 (subplot with N = 30), there is a p chance that the estimated value zm, which represents a single sample drawn from the distribution in Equation (A11), will lie in the interval z̄ ± Kσt/√N (represented by the two blue lines surrounding the mean value indicated by the green line), i.e.,

Prob(z̄ − Kσt/√N < zm < z̄ + Kσt/√N) = p.

This statement can also be interpreted as follows: there is a p chance that the estimator's value will be within no more than Kσt/√N units of the true mean z̄. Using the numerical values p = 0.95, K = 2, N = 100, σt = 1.5, and zm = 1.2, there is a 95% chance that |z̄ − 1.2| < 0.3; i.e., the true mean is over- or under-estimated by at most 0.3 units from the value 1.2. The interval [0.9, 1.5] is called a 95% confidence interval for z̄, and it allows the above equation to be rewritten as follows:

Prob(zm − Kσt/√N < z̄ < zm + Kσt/√N) = p.

These two statements are equivalent simply because the true standard deviation of the estimator is known. The first statement shows that zm cannot be more than Kσt/√N away from the true mean with confidence p, and the second statement indicates that the true mean cannot be more than Kσt/√N away from the single sample zm with the same confidence p. The second statement can be readily calculated using the Bayesian PDF by integrating over z̄ between the two limits zm − Kσt/√N and zm + Kσt/√N. If interpreted in an aleatory sense, this statement hedges only once, emphasizing the fact that if the experiment is repeated M times, then (1 − p)M of these intervals will fail to contain the true mean z̄. It is interesting to note here that the true mean is an epistemic parameter, whereas the confidence interval concept is aleatory. If the confidence interval is interpreted in a Bayesian sense (it is then called a credible interval), then, based on the single measurement, the analyst thinks that the true value may be outside the interval with confidence 1 − p, thus measuring the lack of confidence in the single measurement as 1 − p.
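The numerical example above can be reproduced in a few lines; the sketch below (illustrative only) computes the half-width Kσt/√N and the resulting 95% confidence interval around the single estimate zm = 1.2. Note that the text rounds K to 2, whereas the exact normal value is about 1.96.

```python
import numpy as np
from scipy.stats import norm

sigma_t, N, zm = 1.5, 100, 1.2
p = 0.95
K = norm.ppf(0.5 + p / 2.0)          # about 1.96; the text rounds to K = 2
half_width = K * sigma_t / np.sqrt(N)
print(f"K = {K:.2f}, half-width = {half_width:.3f}")
print(f"{100 * p:.0f}% confidence interval for the true mean: "
      f"[{zm - half_width:.2f}, {zm + half_width:.2f}]")
```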
This discussion highlights the mechanics of an inference analysis that started with an aleatory approach to form an aleatory estimator, per Equation (A13), and its associated PDF, per Equation (A14), and then switched to an epistemic approach, per Equation (A15), to characterize confidence using a single measurement. Note that the aleatory approach required the selection of an estimator, a suitable functional form to estimate the quantity of interest with the given samples. Clearly, this decision can be difficult for general quantities of interest, as shown below for quantities such as the K-coverage parameter, in which both the true mean and the standard deviation are unknown. Thus, a clear advantage of the Bayesian approach is that it does not require the use of estimators. It only requires interpretation of the original PDF as a measure of confidence for the unknown quantity given a single measurement, in the sense of Equation (A8). With additional measurements, a straightforward approach employing multiplication of PDFs, as in Equation (A11), results in a PDF that automatically improves the estimate of the mean. Interestingly, it could be argued that the Bayesian approach provides insight into which estimator function should be used for aleatory analysis, simply because its definition emerges naturally following the successive multiplication of the N PDFs from the N samples.
As explained above, the mean value represents one possible feature that may be extracted from the PDF. Other important features include an interval, or a range of values, that covers a user-defined portion of the distribution. This was defined earlier via the K-coverage parameter, which covers an α portion of the distribution. This interval is referred to as the tolerance interval in both frequentist and Bayesian statistics. As an example, using the sampling distribution in Equation (A5), it can be shown that the interval upper-limited by z̄ + 1.96σt contains 97.5% of the population of random values for an aleatory parameter, or represents a 97.5% degree-of-belief for the true value of an epistemic parameter. As discussed above, if infinite samples are available to estimate z̄, then the upper limit of this interval would be exact. Employing the simple estimator zm + Kσt (with K = 1.96) for the true upper threshold z̄ + 1.96σt of this tolerance interval, the sampling distribution of zm + Kσt follows directly from Equation (A14): it is simply a shifted version of the distribution of zm, since the true standard deviation σt is known. When using a single estimate of zm, it is clear that there is a 50% chance that the estimated zm will be higher than the true mean z̄, causing the estimated upper threshold zm + Kσt to be higher than the true upper threshold z̄ + 1.96σt. The implication is that there is a 50% chance that the estimated tolerance interval, which is based on a single estimate of zm, will contain less than 97.5% of the population of random values. To increase the confidence above 50%, the value of K can be increased above 1.96 to make up for an under-estimated zm value: a higher value of K is needed to ensure the following:

zm + Kσt ≥ z̄ + 1.96σt,

which may be re-arranged as follows:

K ≥ 1.96 + (z̄ − zm)/σt.  (A19)

This equation shows that if it can be ensured with 100% confidence that K remains above the noted value, then the estimated tolerance interval will contain at least 97.5% of the population of z values. This is only possible if K is infinite, because a single sample of zm can potentially have a very small value. In practice, it suffices to ensure that zm is only a few standard deviations away from the mean, which is expected to occur with a very high probability. For example, per Equation (A14), there is only a 2.5% chance that zm < z̄ − 1.96σt/√N. Inserting this into Equation (A19), it is straightforward to see that, with (100 − 2.5)% = 97.5% confidence, the value K = 1.96(1 + 1/√N) ensures that at least 97.5% of the z values will be contained in the interval upper-limited by zm + Kσt. This is referred to as an upper tolerance limit with 97.5% probability and 97.5% confidence. If N is infinite, then K approaches the minimum value required to contain 97.5% of the population with 100% confidence. As N decreases, K must increase to make up for the uncertainty in zm, and this reduces the confidence below 100%, because it is unrealistic to select K to be infinite.
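Under the assumptions of this example (normal population, known σt, 97.5% target coverage), the adjusted multiplier K = 1.96(1 + 1/√N) can be tabulated directly, and the resulting confidence can be checked by brute-force sampling. The sketch below is a numerical check of that reasoning, not a general-purpose tolerance routine; the sample sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean, sigma_t = 0.0, 1.0
target = true_mean + 1.96 * sigma_t        # true 97.5% upper threshold
n_repeats = 20000                          # virtual experiments (arbitrary)

for N in (5, 10, 50, 200):
    K = 1.96 * (1.0 + 1.0 / np.sqrt(N))    # adjusted multiplier from the text
    zm = rng.normal(true_mean, sigma_t, size=(n_repeats, N)).mean(axis=1)
    upper = zm + K * sigma_t               # estimated upper tolerance limit
    confidence = (upper >= target).mean()  # fraction of experiments covering 97.5%
    print(f"N = {N:3d}: K = {K:.3f}, estimated confidence = {confidence:.3f}")
```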
In the previous example, both the aleatory and epistemic treatments led to similar results. It is thus instructive to determine when the two approaches would differ. The key difference lies in the consolidation of prior knowledge. To understand this difference, assume that a confidence interval has been established for the true mean based on N1 measurements of z. These results can be rendered using the estimator in Equation (A13) in the form of the PDF in Equation (A14). When additional measurements are made using N2 samples, a new PDF may be independently constructed. The Bayesian approach only requires access to the prior PDF that was generated with the N1 samples in order to consolidate it with the new PDF. However, the aleatory consolidation requires not only the prior PDF, but also the number of samples used to generate it, as well as the functional form of the estimator, to ensure consistency between the two sets of samples N1 and N2. Effectively, using the aleatory approach is equivalent to conducting a third virtual experiment that combines all available samples and employs a unified estimator. In practical settings, when knowledge is obtained from multiple sources, the details of the inference process (e.g., the exact functional form of the estimator and the number of samples) are often not well documented. Results are often communicated in a minimal manner, for example as a confidence interval and/or the associated PDF. Therefore, it becomes difficult to justify consolidating knowledge from multiple sources as shown in Equation (A9). To overcome this problem, approximate methods [16,38,39] have been developed to consolidate knowledge for the tolerance interval from multiple sources. This problem is nonexistent with the Bayesian consolidation approach, as reflected in Equation (A10). Therefore, Bayesian methods have the advantage of consolidating knowledge from multiple sources, precluding the need to track the details of the inference analysis from each source.
Appendix A.6. Standard Deviation Inference
This subsection extends the inference analysis to characterize confidence in the estimation of the true standard deviation σt using N samples of the aleatory parameter z. The PDF shape is assumed to be normal, and only σt is assumed to be unknown. The following section addresses the more general case in which both the mean and the standard deviation are unknown. Emulating the true value obtained with an infinite number of samples per Equation (A6), an estimator of the standard deviation is

sm = sqrt[(1/N) Σi (zi − z̄)²].  (A20)

As shown above, the estimator sm is expected to be an aleatory parameter, since it is based on a finite number of samples. An analytical expression can be derived for the true PDF of sm, which is generated numerically for different values of N, with σt = 2, in Figure A5. It can be shown that the distribution of N sm²/σt² is given by a χ2-distribution with N degrees of freedom [2]. With increasing N, the distribution approaches a normal distribution, and the corresponding distributions for sm (i.e., dividing the χ2-distribution by N and taking the square root) ultimately converge to a delta function centered around the true value of the standard deviation. (As mentioned above, while the χ2-distribution can be analytically derived, it can be numerically verified by running an experiment in which N samples are generated, from which an estimate of the variance is calculated; the process is repeated many times, building a histogram of the recorded variance samples. If all the recorded variances are divided by the true variance, then the normalized form of the χ2-distribution is obtained. The value of numerical verification of the estimator distribution is stressed because, when the true PDF is not normal, the analytical results relating the confidence to the K-coverage parameter can no longer be developed, so numerical methods must suffice. This provides a clear, intuitive method for developing the relationship between p and K, as shown in the subsequent subsections.)
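The numerical verification described in the parenthetical note can be written compactly. The sketch below (illustrative; normal samples with a known mean, and arbitrary values of N, σt, and the seed) repeatedly computes the estimator of Equation (A20) and checks that N sm²/σt² behaves like a chi-square variable with N degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
true_mean, sigma_t, N = 0.0, 2.0, 5
n_repeats = 100000                          # virtual experiments (arbitrary)

# Estimator of Equation (A20): known mean, division by N.
z = rng.normal(true_mean, sigma_t, size=(n_repeats, N))
sm2 = ((z - true_mean) ** 2).mean(axis=1)
q = N * sm2 / sigma_t**2                    # should follow chi-square with N DOFs

print(f"sample mean of q = {q.mean():.3f} (chi-square mean = {N})")
print(f"sample var  of q = {q.var():.3f} (chi-square variance = {2 * N})")
print(f"empirical P(q < N) = {(q < N).mean():.3f}, "
      f"analytical = {chi2.cdf(N, df=N):.3f}")
```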
Figure A5. Distribution of the samples' standard deviation.
It can be shown that the shape of this distribution is unaffected by the true value, so it is possible to describe the PDF in terms of the normalized variable sm/σt. (It is also common to work directly with the variance rather than the standard deviation, since the χ2-distribution is a function of the variance; however, this subtlety is bypassed here for the sake of simpler notation.) As in the previous analysis, this PDF allows for calculation of the confidence p that a single sample, such as sm, will lie within a given interval. As the goal is often to estimate an upper bound on the true standard deviation σt, the following integral can be used to describe the confidence in the estimated value sm:

p = ∫_{σt}^{∞} Pr(sm) dsm.  (A21)

This integral can be readily calculated, since the form of Pr(sm) is analytically known. The integral is intuitive because it asserts that all estimates sm greater than σt contribute to the probability p. The implication is that there is a 1 − p chance that the true value σt will exceed the estimated value sm, which is given by:

1 − p = ∫_{0}^{σt} Pr(sm) dsm.

In practical applications, with N being relatively small, the value of 1 − p might be too high to be satisfactory. Two useful mathematical approaches may be used to reduce this probability without requiring a significant increase in N. The first approach involves the use of a multiplier, which serves as a conservative measure against the under-estimation of the standard deviation. This is achieved by multiplying the estimated sm by a fixed multiplier K, converting the integral above to the following:

1 − p = ∫_{0}^{σt/K} Pr(sm) dsm.

This shrinks the upper limit of the integration by a factor of K, hence increasing the confidence p that the new limit Ksm is above the true standard deviation σt. This practice is very common among engineers, in which an additional multiplier is used to obtain a more conservative upper bound on uncertain quantities of interest.
The second approach is to generate two estimates of the standard deviation, sm1 and sm2, and take their maximum, max(sm1, sm2), to represent the best estimate of the standard deviation. It is easy to show that

Prob[max(sm1, sm2) < σt] = (1 − p)².

This equation implies that the maximum of the two estimates will be lower than the true standard deviation σt only if both estimates are below that limit, which results in squaring the probability, since the two estimates are independently drawn from Pr(sm). This approach is known as order statistics [40,41], an example of which is Wilks statistics [14,15]. Combining the two approaches shown above, i.e., using a K multiplier and the maximum of n independent estimates of the standard deviation, results in a much higher confidence p:

p = 1 − [∫_{0}^{σt/K} Pr(sm) dsm]^n.

If the integral in Equation (A21) is initially equal to 0.22 for N = 3, implying an initial confidence of p = 0.78, then K = 1.25 and n = 2 increase the confidence to 0.98. These numbers may be interpreted as follows. First, nN = 6 samples of the aleatory parameter z must be drawn and divided into n = 2 batches, each containing N = 3 samples. From each batch, a single estimate of the standard deviation, sm1 and sm2, is calculated per Equation (A20), and the maximum of the two is multiplied by K = 1.25 to form the best estimate of the standard deviation. This estimate is expected to be higher than the true value with p = 98% confidence. The implication is that if this experiment is repeated M times, then the estimated values will be lower than the true value in 0.02M of the repetitions, i.e., 1 − p = 0.02.
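The combined effect of the K multiplier and the max-of-n order statistic can also be estimated by direct simulation under stated assumptions (normal samples, known mean, the estimator of Equation (A20)). The sketch below uses generic illustrative values of N, K, and n rather than the specific numbers quoted above; the confidence it reports is an empirical estimate for those assumed inputs.

```python
import numpy as np

rng = np.random.default_rng(11)
true_mean, sigma_t = 0.0, 1.0
N, K, n = 10, 1.2, 2            # batch size, multiplier, number of batches (illustrative)
n_repeats = 50000               # virtual experiments (arbitrary)

# For each virtual experiment, draw n batches of N samples, compute the
# standard deviation estimate of Equation (A20) for each batch (known mean),
# take the maximum over the batches, and apply the multiplier K.
z = rng.normal(true_mean, sigma_t, size=(n_repeats, n, N))
sm = np.sqrt(((z - true_mean) ** 2).mean(axis=2))   # shape (n_repeats, n)
estimate = K * sm.max(axis=1)

confidence = (estimate >= sigma_t).mean()
print(f"estimated confidence that K*max(sm) bounds sigma_t: {confidence:.3f}")
```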
The previous discussion provides a method to estimate the true standard deviation of the PDF of an aleatory parameter. The confidence in the analysis is described using a probability measure p. The language suggests a single hedging, as before. However, these results can be used to introduce the concept of double hedging presented earlier. Recall that σt is a fixed feature extracted from the true PDF of z. It may be interpreted as the square root of the average squared distance from the mean. In this interpretation, it is only necessary to hedge once, as in Prob(sm > σt) = 0.98. The double hedging arises if sm is interpreted as a K-coverage interval with K = 1, i.e., the interval z̄ ± sm, thus representing the interval that captures approximately 68% of the normally distributed random values; this interval is referred to as a tolerance interval with K = 1. By definition, a tolerance interval is sought to capture a certain portion α of the population. If the distribution is known, then there is a one-to-one relationship between the K-coverage parameter and its corresponding α. For example, in a normal distribution, K = 1 corresponds to α = 0.683, K = 2.0 to α = 0.955, and K = 3.0 to α = 0.997 (values rounded to the third significant figure), and so on. When the true standard deviation is known, these statements are exact, and their confidence is 100%. The hedging here refers only to the aleatory nature of future samples. However, when only a single estimate of the standard deviation is available, it is possible, with probability 1 − p, that sm is less than σt, thus implying that the coverage α will be less than its true value. Using the true parameter PDF, one can calculate the exact coverage α corresponding to different estimates sm, and the associated confidence p can be calculated using the χ2-distribution. This information can be translated into economic or safety metrics that can be used to determine the appropriate values for the required coverage and the confidence p.
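Under the same assumptions as above (normal population with a known mean, the estimator of Equation (A20)), the trade-off between coverage and confidence can be tabulated as follows: for a candidate estimate s, the exact coverage of the K = 1 interval follows from the true normal PDF, and the chance of obtaining an estimate at least that large follows from the chi-square distribution. The sketch below is one illustrative way of expressing that trade-off; the numerical values of σt and N are hypothetical.

```python
import numpy as np
from scipy.stats import norm, chi2

sigma_t, N = 2.0, 10   # true standard deviation and sample size (illustrative)

for s in (0.7 * sigma_t, 0.9 * sigma_t, 1.0 * sigma_t, 1.2 * sigma_t):
    # Exact coverage of the K = 1 interval [mean - s, mean + s] under the true PDF.
    alpha = 2.0 * norm.cdf(s / sigma_t) - 1.0
    # Confidence of obtaining an estimate sm at least this large
    # (N*sm^2/sigma_t^2 follows a chi-square distribution with N DOFs).
    p = chi2.sf(N * s**2 / sigma_t**2, df=N)
    print(f"s = {s:.2f}: coverage alpha = {alpha:.3f}, confidence p = {p:.3f}")
```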
If a Bayesian approach is used, then the same PDF, Pr(sm), can be used to characterize confidence in the true value σt by treating sm as a fixed value and σt as the unknown, as shown in the previous subsection. The PDFs corresponding to different estimates can then be consolidated via multiplication [6], in the same manner as Equation (A10).
Appendix A.7. Simultaneous Inference of Mean and Standard Deviation
This subsection discusses the more general case in which both the mean and the standard deviation are to be inferred from the samples. As before, the first step is to select estimators for the mean and the standard deviation:

zm = (1/N) Σi zi,  sm = sqrt[(1/(N − 1)) Σi (zi − zm)²].

Note the two changes in the definition of sm: (1) it employs the mean value estimator zm, since the true mean z̄ is unknown, and (2) it divides by N − 1 rather than N. This is explained in terms of the available number of degrees of freedom (DOFs), a key concept in statistical inference. If there are N random samples of the parameter z, then they may be regarded as a vector in an N-dimensional space, denoted by z = [z1 z2 … zN]^T. A coordinate system is required to describe the components, with the samples representing the components in the standard coordinate system. Clearly, different coordinate systems will result in different components, obtained via

ζi = qi^T z,

where ζi is the ith component in a different coordinate system, as described by N orthonormal vectors q1, …, qN. It is important to ensure that the statistical properties of the samples, as calculated by the selected estimators, remain invariant to the choice of the coordinate system. Invariance implies that if an estimator of the standard deviation is calculated, then it should have the same value regardless of the coordinate system employed. As shown below, when the estimated mean zm is subtracted from all the samples, it can be shown that there is a coordinate system that will always have one of its N components zeroed out. This implies that the true number of DOFs has been diminished by one. To show this, recall that the sample mean is given by

zm = (1/√N) e^T z,  with e = (1/√N) [1 1 … 1]^T.

This equation shows that the sample mean is the projection (weighted by 1/√N) of the random vector z onto the unit vector e. The residual vector is thus given by

z − zm [1 1 … 1]^T = (I − e e^T) z,

where (I − e e^T) is an orthogonal projector designed to remove the component along the vector e. This implies that, while the residual vector has N components, it only has N − 1 random DOFs, with the component along the direction e zeroed out.
As the estimated standard deviation should be invariant to linear transformation, the estimator function should recognize that the transformed random vector, while still random, has only N − 1 random components. Therefore, the mean may be thought of as a mathematical feature extracted from the random vector component along the direction e, and the standard deviation may be considered another feature extracted from the remaining N − 1 components, which are independent of the component along the direction e. As expected, this ensures that the two estimators are independent, as shown in the scatter plot provided in Figure A6, with the x-axis representing the sample mean values and the y-axis representing the sample standard deviations.
Figure A6. Joint distribution of samples' means and standard deviations.
The distribution of the sm values is discussed first. As described in Appendix A.6, the true PDF of (N − 1) sm²/σt² can be shown analytically to be the χ2-distribution with N − 1 DOFs. This is not surprising, because the removal of the sample mean reduces the DOFs by one. It also ensures that the distribution of sm is independent of the distribution of zm. This implies that the distribution of sm will be the same regardless of the value of zm, even if the true value z̄ is unknown. Hence, the discussion in Appendix A.6 still applies when estimating the true standard deviation, even when the true mean value is not known. Therefore, one can calculate the confidence that the estimated standard deviation bounds the true standard deviation as follows:

p = ∫_{σt}^{∞} Pr(sm) dsm,

where Pr(sm) is now derived from the χ2-distribution with N − 1 DOFs.
If the mean is known, then this formula can be used to establish a tolerance interval with K-coverage and confidence p, as shown in the previous section. As discussed above, if the true standard deviation is known, the confidence becomes p = 1.0. However, if neither the true mean nor the true standard deviation is known, then different samples will result in different sizes of the tolerance interval, thereby resulting in different coverages α. That is, for every sample of the mean and standard deviation, a different K-coverage parameter would be required to achieve the same coverage α. Unlike the case presented in Appendix A.5, in which K must compensate for uncertainties in the mean value only, in this case K must compensate for uncertainties in both the estimated mean and the estimated standard deviation. Figure A7 illustrates this scenario, in which samples of the mean and standard deviation are shown for the case of a normal distribution with z̄ = 1 and σt = 2, as indicated by the red point. All values are calculated based on a small number of samples: N = 10. The purple line traces the values that satisfy zm + K sm = z̄ + 1.65σt with K = 1.65, namely zm + 1.65 sm = 4.3, which corresponds to a coverage of 95%. All samples above the line will result in tolerance intervals with upper threshold values that are larger than the value z̄ + 1.65σt, which is the minimum required to have 95% coverage. All values below the line will result in lower upper thresholds and hence coverage that is lower than 95%. Consider, for example, the black point (with zm = 0.25 and sm = 1.5), which requires a K value of 2.7 to reach 95% coverage. Thus, the goal of the tolerance interval estimation reduces to finding the minimum value of K that ensures, with 95% probability, that the estimated mean and standard deviation will produce an upper threshold value greater than z̄ + 1.65σt.
Figure A7. Joint distribution of mean values and standard deviation.
First, consider the purple line that passes through the true mean z̄ and the true standard deviation σt. The points above that line are numerically calculated to be 46% of the total, which is not exactly equal to 50% because the distribution of standard deviation values is not symmetric for low values of N. Above N = 50, the distribution of the standard deviation approaches the symmetric normal shape. Next, consider the green line, which traces the points satisfying the same equation with an increased value of K. The number of points above this line increases to approximately 88%, meaning that the confidence associated with the upper threshold value of the tolerance interval increases from 46% to 88% with a moderate increase in the K value. This implies that, by increasing the value of K, one can have higher confidence that a single estimate of the sample mean and standard deviation will produce an upper threshold for the 95% tolerance interval. This exercise can be repeated by gradually increasing the K value until 95% of the points are above the line; the required value is found to be 2.9 for this example. Fortunately, the joint distribution of zm and sm can be analytically derived, allowing one to find the distribution of K values, which can be integrated numerically to obtain the minimum value required to reach a probability of 95%. This distribution is called the non-central t-distribution, which is discussed below [1,2].
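The required multiplier can be obtained from the non-central t-distribution without the graphical exercise. A standard result for one-sided normal tolerance limits (used here as an illustration via SciPy's nct distribution) gives the factor as the p-quantile of a non-central t variable with N − 1 degrees of freedom and non-centrality √N·K, divided by √N; for N = 10, 95% coverage, and 95% confidence, this reproduces the value of about 2.9 quoted above.

```python
import numpy as np
from scipy.stats import nct, norm

def one_sided_tolerance_factor(N, coverage, confidence):
    """One-sided normal tolerance factor: the multiplier Km such that
    zm + Km*sm bounds the coverage-quantile of the population with the
    requested confidence (standard non-central t result)."""
    K = norm.ppf(coverage)                 # e.g., about 1.645 for 95% coverage
    delta = np.sqrt(N) * K                 # non-centrality parameter
    return nct.ppf(confidence, df=N - 1, nc=delta) / np.sqrt(N)

for N in (5, 10, 30, 100):
    Km = one_sided_tolerance_factor(N, coverage=0.95, confidence=0.95)
    print(f"N = {N:3d}: required K = {Km:.3f}")
```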
Using the two estimators for the mean and standard deviation, define the following estimator:

√N (zm − z̄)/sm.

This estimator is inspired by the quantity appearing in the exponent of Equation (A14), with σt replaced by sm. It recognizes that the true value of the standard deviation is unknown, so it is reasonable to try the sample standard deviation in its place. This quantity has a t-distribution [42], which looks very similar to the normal distribution, with a distinction of practical relevance only for small values of N. For low values of N, the t-distribution is wider than the normal distribution, with heavier tails, so the K-coverage parameter for a given portion of the distribution is expected to be larger than that calculated from the normal distribution. Repeating the previous analysis from Appendix A.5, the estimated upper threshold must satisfy

zm + Km sm ≥ z̄ + Kσt,

leading to the following minimum value of the estimator Km:

Km ≥ (z̄ − zm)/sm + Kσt/sm,

which ensures with 100% probability that the tolerance limit zm + Km sm covers at least α% of the population, where α represents the exact coverage obtained with the true parameters z̄, σt, and K. The distribution of the Km values is called the non-central t-distribution [43], as shown in Figure A8 for the case of z̄ = 1.0, σt = 2.0, and K = 1.65, corresponding to 95% coverage, shown as a red vertical line. As demonstrated above, if the objective is to ensure 95% coverage with a given probability p%, then the K value must be increased until the area under the curve to the left of that value is at least p%.
Figure A8. Noncentral t-distribution.