A Study on the \bar{X} and S Control Charts with Unequal Sample Sizes

Park, Chanseok; Wang, Min

doi:10.3390/math8050698


by Chanseok Park ¹ and Min Wang ²,*

¹ Applied Statistics Laboratory, Department of Industrial Engineering, Pusan National University, Busan 46241, Korea

² Department of Management Science and Statistics, The University of Texas at San Antonio, San Antonio, TX 78249, USA

* Author to whom correspondence should be addressed.

Mathematics 2020, 8(5), 698; https://doi.org/10.3390/math8050698

Submission received: 12 March 2020 / Revised: 24 April 2020 / Accepted: 30 April 2020 / Published: 2 May 2020

(This article belongs to the Special Issue Statistical Simulation and Computation)


Abstract:

The control charts based on

\bar{X}

and S are widely used to monitor the mean and variability of variables and can help quality engineers identify and investigate causes of the process variation. The usual requirement behind these control charts is that the sample sizes from the process are all equal, whereas this requirement may not be satisfied in practice due to missing observations, cost constraints, etc. To deal with this situation, several conventional methods were proposed. However, some methods based on weighted average approaches and an average sample size often result in degraded performance of the control charts because the adopted estimators are biased towards underestimating the true population parameters. These observations motivate us to investigate the existing methods with rigorous proofs and we provide a guideline to practitioners for the best selection to construct the

\bar{X}

and S control charts when the sample sizes are not equal.

Keywords:

control chart; unequal sample sizes; unbiasedness; relative efficiency; ARL; SDRL

MSC:

26A51; 26D20; 33B15; 62F99; 62P30; 65C60

1. Introduction

Control charts, also known as Shewhart control charts [1,2,3], have been used to determine if a manufacturing process is in a state of control. In particular, the

\bar{X}

and S charts have been widely used to monitor or detect the mean and variability of variables. Here, a variable is a quality characteristic measured on a numerical scale. For example, variables include continuous measurement process data such as length, pressure, width, temperature, and volume, in a time-ordered sequence.

Due to their importance and usefulness in real-life applications, these traditional univariate control charts continue to receive much attention in the literature. They are commonly adopted for continuously monitoring large volumes of data and for process control in the Industry 4.0 framework; see, for example, [4,5,6], among others. Of particular note, [7] developed the qcr package in R to generate Shewhart-type charts and obtained numerical results of interest for process quality control. More recently, based on the concept of data depth, Ref. [8] proposed a novel alternative for constructing control charts when the critical to quality (CTQ) variables of the process are functional, and also developed Phase I and II control charts for stabilizing and monitoring the processes, respectively.

The traditional Shewhart control charts mentioned above consist of the upper and lower control limits (UCL and LCL, for short) and the center line (CL). The American Standard is based on

CL \pm 3 \cdot SE

control limits with an ideal false alarm rate (FAR) of 0.27% while the British Standard is based on

CL \pm 3.09 \cdot SE

with an ideal FAR of 0.20%, where

SE

denotes the standard error.
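The two false alarm rates quoted above can be reproduced with a short calculation. The sketch below assumes that the plotted statistic is exactly normal and that the control limits are known rather than estimated; the function name `false_alarm_rate` is illustrative and does not come from the paper.

```python
import math

def false_alarm_rate(k: float) -> float:
    """Two-sided tail probability P(|Z| > k) for a standard normal Z."""
    # erfc(k / sqrt(2)) equals 2 * (1 - Phi(k)), where Phi is the normal CDF.
    return math.erfc(k / math.sqrt(2.0))

far_american = false_alarm_rate(3.0)   # limits at CL +/- 3 SE
far_british = false_alarm_rate(3.09)   # limits at CL +/- 3.09 SE
print(f"{far_american:.4%} {far_british:.4%}")
```

Under these idealized assumptions the two values round to the 0.27% and 0.20% figures stated in the text.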

The usual requirement behind these control charts is that the sample sizes from the process are all equal. In practice, however, this requirement often cannot be met because of erroneous or missing observations. In such a setting, the three conventional approaches below are widely used to deal with unequal sample sizes:

(i): A weighted average approach in calculating $\bar{\bar{X}}$ and ${\bar{S}}^{2}$ .
(ii): Control limits based on an average sample size.
(iii): A weighted average in calculating $\bar{S}$ .

For more details, see Subsection 6.3.2 of Montgomery [9] and Subsection 3.8.B of ASTM [10]. The first approach uses variable-width control limits which are determined by the sample-specific values such as

n_{i}

,

A_{3}

,

B_{3}

, and

B_{4}

. To estimate the scale parameter, a weighted average of the sample variances is calculated first and its square root is then taken. The second approach uses fixed-width control limits, which are based on the average of the sample sizes. For more details on these two methods, see Subsection 6.3.2 of Montgomery [9]. The third approach is very similar to calculating

{\bar{S}}^{2}

in the first approach. However, it uses a weighted average of sample standard deviations directly to estimate the scale parameter. For more details, see Subsection 3.8.B of ASTM [10]. It is known that these three approaches may be satisfactory when the sample sizes are not very different. Given that the average of the sample sizes is not necessarily an integer in general, a practical alternative to the second approach is the use of a modal sample size.

However, when these ad hoc approaches are used, the parameter estimators are biased; they underestimate the true population parameters, as will be shown in Remark 2 in Section 2 and Remarks 3 and 4 in Section 3. These underestimating approaches can degrade the performance of the control charts. Nonetheless, these biased methods are widely covered in many popular textbooks. These observations motivate us to clarify these conventional methods and investigate other existing methods, especially when the samples are not equal in size. Through rigorous proofs, we provide a guideline for the best selection of the methods to construct the

\bar{X}

and S control charts.

This paper is organized as follows. In Section 2, we provide two location estimators and four scale estimators with unequal sample sizes and show that they are all unbiased. In Section 3, we provide the variances of the estimators considered in this paper and show the inequality relations among them. In Section 4, we provide the relative efficiency of the methods and conduct simulations to compare the performance of the location and scale estimators. In Section 5, we illustrate how to construct various Shewhart-type control charts (i.e., S,

S^{2}

, and

\bar{X}

charts) using the provided estimators. In Section 6, we provide empirical estimates of the average run length (ARL) and the standard deviation of the run length (SDRL) obtained through extensive Monte Carlo simulations. Three real-data examples are presented in Section 7 for illustrative purposes. Some concluding remarks are given in Section 8.

2. Estimation of Process Parameters with Unequal Sample Sizes

In this section, we provide two location estimators and four scale estimators for the process parameters under the assumption that the sample sizes may differ. In parametric statistical quality control, the underlying distribution is used to construct the control charts. The quality characteristic is assumed to be normally distributed, which is the most widely used assumption in practice. Under this assumption, we show that the estimators provided in this section are all unbiased.

We assume that we have m samples whose sizes may differ. Let

X_{i j}

be the jth observation in the ith sample (subgroup) of size

n_{i}

from a stable manufacturing process, where

i = 1, 2, \dots, m

and

j = 1, 2, \dots, n_{i}

. We also assume that

X_{i j}

are independent and identically distributed as normal with mean

μ

and variance

σ^{2}

.

2.1. Location Parameter

Using the ith sample above, the sample mean and the sample variance are given by

{\bar{X}}_{i} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} X_{i j}, and S_{i}^{2} = \frac{1}{n_{i} - 1} \sum_{j = 1}^{n_{i}} {(X_{i j} - {\bar{X}}_{i})}^{2},

where

i = 1, 2, \dots, m

. Montgomery [9] provides two location estimators of the population mean parameter

μ

in Equations (6.2) and (6.30) in his book, which are given by

\begin{matrix} {\bar{\bar{X}}}_{A} & = \frac{{\bar{X}}_{1} + {\bar{X}}_{2} + \dots + {\bar{X}}_{m}}{m} = \frac{1}{m} \sum_{i = 1}^{m} {\bar{X}}_{i} \end{matrix}

(1)

and

\begin{matrix} {\bar{\bar{X}}}_{B} & = \frac{n_{1} {\bar{X}}_{1} + n_{2} {\bar{X}}_{2} + \dots + n_{m} {\bar{X}}_{m}}{n_{1} + n_{2} + \dots + n_{m}} = \frac{1}{N} \sum_{i = 1}^{m} n_{i} {\bar{X}}_{i}, \end{matrix}

(2)

where

n_{i} \geq 2

and

N = \sum_{i = 1}^{m} n_{i}

.

These grand averages can be used as the CL on the

\bar{X}

chart. Since

E ({\bar{X}}_{i}) = μ

for

i = 1, 2, \dots, m

, it is easily seen that

E ({\bar{\bar{X}}}_{A}) = μ

and

E ({\bar{\bar{X}}}_{B}) = μ

, showing that these two estimators are unbiased. In addition, the variances of

{\bar{\bar{X}}}_{A}

and

{\bar{\bar{X}}}_{B}

are given by

Var ({\bar{\bar{X}}}_{A}) = σ^{2} \sum_{i = 1}^{m} n_{i}^{- 1} / m^{2}

and

Var ({\bar{\bar{X}}}_{B}) = σ^{2} / N

, which results in

Var ({\bar{\bar{X}}}_{A}) \geq Var ({\bar{\bar{X}}}_{B})

. Thus,

{\bar{\bar{X}}}_{B}

is preferred to

{\bar{\bar{X}}}_{A}

. It should be noted that, for the case of an equal sample size, we have

{\bar{\bar{X}}}_{A} = {\bar{\bar{X}}}_{B} = \bar{\bar{X}}

, where

\bar{\bar{X}} = \sum_{i = 1}^{m} {\bar{X}}_{i} / m

.
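As a small illustration of Equations (1) and (2), the sketch below computes both grand averages from raw subgroups. The data values and the function name `grand_means` are made up for illustration and are not taken from the paper.

```python
from statistics import fmean

def grand_means(samples):
    """Return (X_A, X_B): the unweighted and the size-weighted grand averages."""
    means = [fmean(s) for s in samples]
    sizes = [len(s) for s in samples]
    x_a = fmean(means)                                               # Equation (1)
    x_b = sum(n * xb for n, xb in zip(sizes, means)) / sum(sizes)    # Equation (2)
    return x_a, x_b

# Hypothetical subgroups of sizes 2, 4, and 3.
samples = [[10.1, 9.8], [10.4, 10.0, 9.9, 10.3], [9.7, 10.2, 10.0]]
x_a, x_b = grand_means(samples)
print(x_a, x_b)
```

Note that the size-weighted average in Equation (2) coincides with the plain average of all N pooled observations, which is one way to see why its variance is σ²/N.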

2.2. Scale Parameter

It is well known that

S_{i}^{2}

is an unbiased estimator of

σ^{2}

. However,

S_{i}

is not an unbiased estimator of

σ

, as shown below. Since

(n_{i} - 1) S_{i}^{2} / σ^{2}

is distributed as the gamma with shape

(n_{i} - 1) / 2

and scale 2, we have

E [{\{\frac{(n_{i} - 1) S_{i}^{2}}{σ^{2}}\}}^{1 / 2}] = \frac{2^{1 / 2} \cdot Γ (n_{i} / 2)}{Γ ((n_{i} - 1) / 2)} .

Then, we have

E [S_{i}] = \sqrt{\frac{2}{n_{i} - 1}} \frac{Γ (n_{i} / 2)}{Γ ((n_{i} - 1) / 2)} \cdot σ = c_{4} (n_{i}) \cdot σ,

(3)

where

c_{4} (n_{i}) = \sqrt{\frac{2}{n_{i} - 1}} \frac{Γ (n_{i} / 2)}{Γ ((n_{i} - 1) / 2)} .

(4)

This shows that

S_{i} / c_{4} (n_{i})

is actually an unbiased estimator of

σ

. Note that

c_{4} (\cdot)

is the normal-consistent unbiasing factor, which is a function of the sample size. To the best of our knowledge, the

c_{4}

notation was originally used in ASQC [11]. For more details on

c_{4} (\cdot)

, the interested reader is referred to Vardeman [12]. In Appendix A, we also provide an approximate calculation of

c_{4} (\cdot)

which can be used for practically easier calculation.
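For readers who want exact values rather than an approximation, Equation (4) can be evaluated directly. The following is a minimal sketch, assuming only the Python standard library; working with log-gamma values is a design choice made here to avoid overflow of the gamma function for large sample sizes, and the name `c4` simply mirrors the notation of the text.

```python
import math

def c4(n: float) -> float:
    """Unbiasing factor c4(n) from Equation (4); requires n > 1."""
    # Gamma(n/2) / Gamma((n-1)/2), computed on the log scale for stability.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1.0) / 2.0)
    return math.sqrt(2.0 / (n - 1.0)) * math.exp(log_ratio)

for n in (2, 5, 10, 25):
    print(n, round(c4(n), 4))
```

For n = 2 the formula reduces to sqrt(2/π), which gives a convenient sanity check.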

Thus, we can estimate

σ

using

S_{i} / c_{4} (n_{i})

which are all unbiased. Analogous to Equation (1), one can use

{\bar{S}}_{A} = \frac{S_{1} / c_{4} (n_{1}) + S_{2} / c_{4} (n_{2}) + \dots + S_{m} / c_{4} (n_{m})}{m} = \frac{1}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})},

(5)

which is clearly unbiased for

σ

. Since

\sum_{i = 1}^{m} E [S_{i}] = \sum_{i = 1}^{m} c_{4} (n_{i}) \cdot σ

from Equation (3), the estimator below

{\bar{S}}_{B} = \frac{S_{1} + S_{2} + \dots + S_{m}}{c_{4} (n_{1}) + c_{4} (n_{2}) + \dots + c_{4} (n_{m})} = \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})}

(6)

is also unbiased for

σ

. In addition, we consider the following unbiased estimator proposed by Burr [13]

{\bar{S}}_{C} = \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} .

(7)

This estimator is actually the best linear unbiased estimator (BLUE) and we provide the proof below.

Theorem 1.

The estimator

{\bar{S}}_{C}

in Equation (7) is the BLUE.

Proof.

First, we consider a linear unbiased estimator in the form of

\sum_{i = 1}^{m} w_{i} S_{i}

. Then, its variance and expectation are given by

Var (\sum_{i = 1}^{m} w_{i} S_{i}) = \sum_{i = 1}^{m} w_{i}^{2} \{1 - c_{4} {(n_{i})}^{2}\} σ^{2} and E (\sum_{i = 1}^{m} w_{i} S_{i}) = \sum_{i = 1}^{m} w_{i} c_{4} (n_{i}) σ .

To obtain the BLUE, we need to minimize

Var (\sum_{i = 1}^{m} w_{i} S_{i})

with the unbiasedness condition

E (\sum_{i = 1}^{m} w_{i} S_{i}) = σ

. Thus, our objective is to minimize

\sum_{i = 1}^{m} w_{i}^{2} \{1 - c_{4} {(n_{i})}^{2}\} subject to \sum_{i = 1}^{m} w_{i} c_{4} (n_{i}) = 1,

which can be easily solved by using the method of Lagrange multipliers. The auxiliary function with the Lagrange multiplier

λ

is given by

Ψ = \sum_{i = 1}^{m} w_{i}^{2} \{1 - c_{4} {(n_{i})}^{2}\} - λ \{\sum_{i = 1}^{m} w_{i} c_{4} (n_{i}) - 1\} .

Setting

\partial Ψ / \partial w_{k} = 0

gives

2 w_{k} \{1 - c_{4} {(n_{k})}^{2}\} - λ c_{4} (n_{k}) = 0

, which results in

w_{k} = \frac{λ c_{4} (n_{k})}{2 \{1 - c_{4} {(n_{k})}^{2}\}} .

(8)

Multiplying both sides of Equation (8) by

c_{4} (n_{k})

and then summing over the m samples, we have

\sum_{i = 1}^{m} w_{i} c_{4} (n_{i}) = \frac{λ}{2} \sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}} .

Since

\sum_{i = 1}^{m} w_{i} c_{4} (n_{i}) = 1

, we first solve the above for

λ

and then substitute it into Equation (8), which provides

w_{k} = \frac{\frac{c_{4} (n_{k})}{1 - c_{4} {(n_{k})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}},

which results in

{\bar{S}}_{C} = \sum_{i = 1}^{m} w_{i} S_{i}

. This completes the proof. □

Remark 1.

It should be noted that, for the case of an equal sample size (

n_{1} = n_{2} = \dots = n_{m} = n

), we can easily show that

{\bar{S}}_{A} = {\bar{S}}_{B} = {\bar{S}}_{C} = \bar{S} / c_{4} (n)

, where

\bar{S} = \sum_{i = 1}^{m} S_{i} / m

.

One can also incorporate the pooled sample variance in estimating

σ

given by

S_{p}^{2} = \frac{\sum_{i = 1}^{m} (n_{i} - 1) S_{i}^{2}}{N - m},

(9)

where

N = \sum_{i = 1}^{m} n_{i}

again. However,

S_{p}

is not unbiased for

σ

although

S_{p}^{2}

is. This is because

E [S_{p}] = c_{4} (N - m + 1) σ .

(10)

Based on this, Burr [13] suggested the following unbiased estimator of

σ

,

{\bar{S}}_{D} = \frac{S_{p}}{c_{4} (N - m + 1)} .

(11)

Remark 2.

The weighted average approach introduced in Subsection 6.3.2 of Montgomery [9] uses the pooled sample standard deviation

S_{p}

from Equation (9) to estimate σ. However, since

c_{4} (x) < 1

, which will be shown in Lemma 1, it is immediate from Equation (10) that

S_{p}

clearly underestimates the true parameter σ.

We have introduced the four unbiased scale estimators of

σ

which are denoted by

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

, and

{\bar{S}}_{D}

. A natural question arises: which of the four estimators should be recommended for estimating

σ

in practical applications? In the following section, we address this question by establishing inequalities among the variances of the estimators under consideration.

3. Inequalities of the Variances of the Scale Estimators

We first obtain the variance of

{\bar{S}}_{A}

in Equation (5), which is given by

Var ({\bar{S}}_{A}) = \frac{1}{m^{2}} \sum_{i = 1}^{m} \frac{1}{c_{4} {(n_{i})}^{2}} Var (S_{i}) .

(12)

Using Equation (3) and the unbiasedness property of

S_{i}^{2}

, we have

Var (S_{i}) = E (S_{i}^{2}) - E {(S_{i})}^{2} = σ^{2} - c_{4} {(n_{i})}^{2} σ^{2} = σ^{2} \{1 - c_{4} {(n_{i})}^{2}\} .

(13)

Substituting Equation (13) into (12), we have

Var ({\bar{S}}_{A}) = \frac{σ^{2}}{m^{2}} \sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\} .

(14)

Similarly, the variance of

{\bar{S}}_{B}

in Equation (6) is easily obtained as

Var ({\bar{S}}_{B}) = \frac{\sum_{i = 1}^{m} Var (S_{i})}{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}} = σ^{2} \cdot \frac{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}}

(15)

and that of

{\bar{S}}_{C}

in Equation (7) is also obtained as

Var ({\bar{S}}_{C}) = \frac{σ^{2}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} .

(16)

Finally, for the case of

{\bar{S}}_{D}

, we have

Var ({\bar{S}}_{D}) = E ({\bar{S}}_{D}^{2}) - E {({\bar{S}}_{D})}^{2} .

(17)

Using Equation (11) and the unbiasedness property of

{\bar{S}}_{D}

, we can rewrite Equation (17) as

Var ({\bar{S}}_{D}) = \frac{1}{c_{4} {(N - m + 1)}^{2}} E (S_{p}^{2}) - σ^{2} .

Since

S_{p}^{2}

is also unbiased for

σ^{2}

, we have

Var ({\bar{S}}_{D}) = σ^{2} \{\frac{1}{c_{4} {(N - m + 1)}^{2}} - 1\} .

(18)

Next, we answer the question of how to choose the best of the four scale estimators by comparing their variances. To be more specific, we prove the following inequality relations among the variances of the four scale estimators.

Lemma 1.

The function

c_{4} (x)

defined in Equation (4) is monotonically increasing and

\sqrt{\frac{2 x - 3}{2 x - 2}} < c_{4} (x) < 1 .

Proof.

Using

(x - 1) / 2 \cdot Γ ((x - 1) / 2) = Γ ((x + 1) / 2)

, we can rewrite

c_{4} (\cdot)

in Equation (4) as

c_{4} (x) = \sqrt{\frac{2}{x - 1}} \frac{Γ (x / 2)}{Γ ((x - 1) / 2)} = {[\frac{Γ {(x / 2)}^{2}}{Γ ((x - 1) / 2) Γ ((x + 1) / 2)}]}^{1 / 2},

(19)

where

x \geq 2

. Watson [14] showed that the function

θ (x) = - x + x \frac{Γ (x) Γ (x + 1)}{Γ {(x + 1 / 2)}^{2}}

(20)

is monotonically decreasing. It is noteworthy that the function

θ (x)

was motivated by Wallis’ famous infinite product for

π

[15]. We can rewrite

c_{4} (x)

using

θ (x)

in Equation (20) as

c_{4} (x) = \frac{1}{\sqrt{\frac{θ ((x - 1) / 2)}{(x - 1) / 2} + 1}} .

(21)

Since both

θ ((x - 1) / 2)

and

1 / \{(x - 1) / 2\}

are positive and monotonically decreasing,

\frac{1}{c_{4} (x)} = \sqrt{θ ((x - 1) / 2) \cdot \frac{1}{(x - 1) / 2} + 1}

is also decreasing. Thus,

c_{4} (x)

is monotonically increasing. Watson [14] and Mortici [16] also showed that

\sqrt{x + \frac{1}{4}} < \frac{Γ (x + 1)}{Γ (x + \frac{1}{2})} \leq \sqrt{x + \frac{1}{π}} .

Multiplying each of the above terms by

\sqrt{2 / (2 x + 1)}

and simplifying, we obtain

\sqrt{\frac{4 x + 1}{4 x + 2}} < c_{4} (2 x + 2) \leq \sqrt{\frac{2 x + 2 / π}{2 x + 1}} < 1 .

For convenience, we let

x^{*} = 2 x + 2

. Then, we have

\sqrt{\frac{2 x^{*} - 3}{2 x^{*} - 2}} < c_{4} (x^{*}) < 1,

which completes the proof. □
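Lemma 1 can be spot-checked numerically. The sketch below evaluates c4 on the integers 2 through 50 and tests both the monotonicity and the two-sided bound; this is a finite check over an arbitrarily chosen range, not a substitute for the proof, and the variable names are illustrative.

```python
import math

def c4(n):
    """Unbiasing factor from Equation (4)."""
    return math.sqrt(2.0 / (n - 1.0)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1.0) / 2.0))

sizes = range(2, 51)
values = [c4(n) for n in sizes]
lower = [math.sqrt((2 * n - 3) / (2 * n - 2)) for n in sizes]

# Strictly increasing in n, and squeezed between the lower bound and 1.
is_increasing = all(a < b for a, b in zip(values, values[1:]))
within_bounds = all(lo < v < 1.0 for lo, v in zip(lower, values))
print(is_increasing, within_bounds)
```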

Lemma 2

(Chebyshev’s sum inequality). If

a_{1} \geq a_{2} \geq \dots \geq a_{m}

and

b_{1} \geq b_{2} \geq \dots \geq b_{m}

, then we have

m \sum_{i = 1}^{m} a_{i} b_{i} \geq \sum_{i = 1}^{m} a_{i} \cdot \sum_{i = 1}^{m} b_{i} .

Similarly, if

a_{1} \leq a_{2} \leq \dots \leq a_{m}

and

b_{1} \geq b_{2} \geq \dots \geq b_{m}

, then we have

m \sum_{i = 1}^{m} a_{i} b_{i} \leq \sum_{i = 1}^{m} a_{i} \cdot \sum_{i = 1}^{m} b_{i} .

Proof.

First, we will consider the case where both

a_{i}

and

b_{i}

are ordered in the same direction (both non-increasing, as stated above, or both non-decreasing). In this case, both

(a_{i} - a_{j})

and

(b_{i} - b_{j})

have the same sign, or at least one of them is zero. Then, the value of

(a_{i} - a_{j}) (b_{i} - b_{j})

is positive or zero for any i and j, which results in

\sum_{i = 1}^{m} \sum_{j = 1}^{m} (a_{i} - a_{j}) (b_{i} - b_{j}) \geq 0 .

Expanding the double sum and rearranging, we have

m \sum_{i = 1}^{m} a_{i} b_{i} \geq \sum_{i = 1}^{m} a_{i} \cdot \sum_{i = 1}^{m} b_{i} .

Next, we consider the case when

a_{i}

is increasing and

b_{i}

is decreasing. In this case,

(a_{i} - a_{j})

and

(b_{i} - b_{j})

have different signs, or at least one of them is zero. Then, the value of

(a_{i} - a_{j}) (b_{i} - b_{j})

is negative or zero for any i and j. Similar to the above approach, we have

\sum_{i = 1}^{m} \sum_{j = 1}^{m} (a_{i} - a_{j}) (b_{i} - b_{j}) \leq 0,

which results in

m \sum_{i = 1}^{m} a_{i} b_{i} \leq \sum_{i = 1}^{m} a_{i} \cdot \sum_{i = 1}^{m} b_{i}

. This completes the proof. □
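The algebra omitted in the proof of Lemma 2 is a routine expansion of the double sum. It is written out here for completeness, using only the symbols of the lemma:

```latex
\sum_{i=1}^{m} \sum_{j=1}^{m} (a_{i} - a_{j}) (b_{i} - b_{j})
  = \sum_{i=1}^{m} \sum_{j=1}^{m}
      \left( a_{i} b_{i} - a_{i} b_{j} - a_{j} b_{i} + a_{j} b_{j} \right)
  = 2 m \sum_{i=1}^{m} a_{i} b_{i}
      - 2 \sum_{i=1}^{m} a_{i} \cdot \sum_{i=1}^{m} b_{i} .
```

Hence the double sum is nonnegative exactly when m \sum a_{i} b_{i} \geq \sum a_{i} \cdot \sum b_{i}, and nonpositive exactly when the reverse inequality holds.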

It should be noted that the above inequality is named after Pafnuty Lvovich Chebyshev (1821–1894), who mentioned it in a brief note [17]. He provided it in an integral form, and his original proof can be found in Chebyshev [18]. For more details, readers are also referred to Besenyei [19] and Section 2.17 of Hardy et al. [20].

Theorem 2.

We have

Var ({\bar{S}}_{A}) \geq Var ({\bar{S}}_{B}) .

Proof.

For convenience, we rearrange the sample sizes so that

n_{1} \leq n_{2} \leq \dots \leq n_{m}

. Then, it is immediate from Equations (14) and (15) that it suffices to show

\frac{1}{m^{2}} \sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\} \geq \frac{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}} .

Since

c_{4} (x)

is increasing from Lemma 1, it is easily seen that

1 / c_{4} {(x)}^{2} - 1

is decreasing. For convenience, let

a_{i} = c_{4} (n_{i})

and

b_{i} = 1 / c_{4} {(n_{i})}^{2} - 1

. Then, we have

a_{1} \leq a_{2} \leq \dots \leq a_{m}

and

b_{1} \geq b_{2} \geq \dots \geq b_{m}

. Thus, we observe from Lemma 2 that

\sum_{i = 1}^{m} c_{4} (n_{i}) \cdot \sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\} \geq m \cdot \sum_{i = 1}^{m} \{\frac{1}{c_{4} (n_{i})} - c_{4} (n_{i})\} .

(22)

Applying Lemma 2 again with

a_{i} = c_{4} (n_{i})

(increasing) and

b_{i} = 1 / c_{4} (n_{i}) - c_{4} (n_{i})

(decreasing), we have

\sum_{i = 1}^{m} c_{4} (n_{i}) \cdot \sum_{i = 1}^{m} \{\frac{1}{c_{4} (n_{i})} - c_{4} (n_{i})\} \geq m \cdot \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\},

which results in

\sum_{i = 1}^{m} \{\frac{1}{c_{4} (n_{i})} - c_{4} (n_{i})\} \geq \frac{m \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{\sum_{i = 1}^{m} c_{4} (n_{i})} .

(23)

Comparing Equations (22) and (23), we have

\sum_{i = 1}^{m} c_{4} (n_{i}) \cdot \sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\} \geq \frac{m^{2} \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{\sum_{i = 1}^{m} c_{4} (n_{i})},

which results in

\frac{1}{m^{2}} \sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\} \geq \frac{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}} .

This completes the proof. □

Theorem 3.

We have

Var ({\bar{S}}_{B}) \geq Var ({\bar{S}}_{C}) .

Proof.

We have the variances of

{\bar{S}}_{B}

and

{\bar{S}}_{C}

from Equations (15) and (16), which are given by

Var ({\bar{S}}_{B}) = \frac{σ^{2} \cdot \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}} and Var ({\bar{S}}_{C}) = \frac{σ^{2}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} .

Thus, it suffices to show

{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2} \leq \sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}} \cdot \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\} .

(24)

For convenience, we let

a_{i} = c_{4} (n_{i}) / \sqrt{1 - c_{4} {(n_{i})}^{2}}

and

b_{i} = \sqrt{1 - c_{4} {(n_{i})}^{2}}

. Then, it is immediate from the Cauchy–Schwarz inequality,

{\{\sum a_{i} b_{i}\}}^{2} \leq \{\sum a_{i}^{2}\} \{\sum b_{i}^{2}\}

that the inequality in Equation (24) holds. This completes the proof. □

Lemma 3.

The function

\frac{c_{4} {(x)}^{2}}{1 - c_{4} {(x)}^{2}}

is concave.

Proof.

It is immediate from Equation (21) that we have

\frac{c_{4} {(x)}^{2}}{1 - c_{4} {(x)}^{2}} = \frac{(x - 1) / 2}{θ ((x - 1) / 2)} .

(25)

For convenience, we let

y = (x - 1) / 2

. Then, it suffices to show

y / θ (y)

is concave. Bustoz and Ismail [21] showed that

θ (y)

is completely monotonic on

[- 1 / 2, \infty)

using the representation by Watson [14]. For more details on complete monotonicity, refer to Section XIII.4 of Feller [22]. It is well known that completely monotonic functions are log-convex. For example, see Lemma 4.3 of Merkle [23], Theorem 1 of Fink [24], Exercise 6 in Section 2.1 of Niculescu and Persson [25], and Equation (3.4) of van Haeringen [26] with

n = 0

and

m = 1

, among others.

Thus,

log θ (y)

is convex so

- log θ (y)

is concave. Since

log y

is also concave,

log (y / θ (y)) = log y + \{- log θ (y)\}

is concave. This implies that

y / θ (y)

is log-concave. The log-concavity of

y / θ (y)

guarantees that it is concave. This completes the proof. □

Theorem 4.

We have

Var ({\bar{S}}_{C}) \geq Var ({\bar{S}}_{D}) .

Proof.

From Equations (16) and (18), we have

Var ({\bar{S}}_{C}) = \frac{σ^{2}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} and Var ({\bar{S}}_{D}) = σ^{2} \{\frac{1}{c_{4} {(N - m + 1)}^{2}} - 1\} .

Thus, it suffices to show

\frac{c_{4} {(N - m + 1)}^{2}}{1 - c_{4} {(N - m + 1)}^{2}} \geq \sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}} .

It is immediate from Lemma 3 that

c_{4} {(x)}^{2} / \{1 - c_{4} {(x)}^{2}\}

is concave. Then, using Jensen’s inequality, we have

\frac{c_{4} {(\bar{n})}^{2}}{1 - c_{4} {(\bar{n})}^{2}} \geq \frac{1}{m} \sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}},

where

\bar{n} = \sum_{i = 1}^{m} n_{i} / m

. Thus, it suffices to show

\frac{c_{4} {(N - m + 1)}^{2}}{1 - c_{4} {(N - m + 1)}^{2}} \geq \frac{m c_{4} {(\bar{n})}^{2}}{1 - c_{4} {(\bar{n})}^{2}} .

(26)

Using Equation (25) in Lemma 3, we have

\begin{matrix} \frac{c_{4} {(N - m + 1)}^{2}}{1 - c_{4} {(N - m + 1)}^{2}} = \frac{(N - m) / 2}{θ ((N - m) / 2)} \end{matrix}

(27)

and

\begin{matrix} \frac{m c_{4} {(\bar{n})}^{2}}{1 - c_{4} {(\bar{n})}^{2}} = \frac{m (\bar{n} - 1) / 2}{θ ((\bar{n} - 1) / 2)} = \frac{(N - m) / 2}{θ ((\bar{n} - 1) / 2)}, \end{matrix}

(28)

where

θ (x)

is defined in Equation (20). Comparing Equations (27) and (28), we need to show

θ ((N - m) / 2) \leq θ ((\bar{n} - 1) / 2) .

Since

(\bar{n} - 1) / 2 \leq (N - m) / 2

and

θ (x)

is decreasing from Watson [14], the above inequality in Equation (26) holds. This completes the proof. □

In combination with the inequalities in Theorems 2–4, we have the following result:

Var ({\bar{S}}_{A}) \geq Var ({\bar{S}}_{B}) \geq Var ({\bar{S}}_{C}) \geq Var ({\bar{S}}_{D}) .

(29)
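The ordering in Equation (29) can be illustrated numerically from Equations (14), (15), (16), and (18). The sketch below sets σ² = 1 so that each returned value is the variance divided by σ²; the function name `variance_factors` and the example sizes are assumptions made here for illustration.

```python
import math

def c4(n):
    """Unbiasing factor from Equation (4)."""
    return math.sqrt(2.0 / (n - 1.0)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1.0) / 2.0))

def variance_factors(sizes):
    """Var/sigma^2 for S_A, S_B, S_C, S_D given the subgroup sizes."""
    m, N = len(sizes), sum(sizes)
    c = [c4(n) for n in sizes]
    var_a = sum(1.0 / ci ** 2 - 1.0 for ci in c) / m ** 2        # Equation (14)
    var_b = sum(1.0 - ci ** 2 for ci in c) / sum(c) ** 2         # Equation (15)
    var_c = 1.0 / sum(ci ** 2 / (1.0 - ci ** 2) for ci in c)     # Equation (16)
    var_d = 1.0 / c4(N - m + 1) ** 2 - 1.0                       # Equation (18)
    return var_a, var_b, var_c, var_d

print(variance_factors([3, 5, 10]))
```

For equal sample sizes the first three values coincide, in line with Remark 1, while the value for the pooled estimator remains the smallest.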

Lemma 4.

The

c_{4} (x)

defined in Equation (4) satisfies

\frac{1}{m} \sum_{i = 1}^{m} c_{4} (n_{i}) \leq c_{4} (\bar{n}) .

Proof.

First, we will show that

c_{4} (x)

is concave. Taking the logarithm of

c_{4} (x)

in Equation (19), we have

log c_{4} (x) = log Γ (\frac{x}{2}) - \frac{1}{2} log Γ (\frac{x - 1}{2}) - \frac{1}{2} log Γ (\frac{x + 1}{2}),

where

x \geq 2

. It is well known that the second derivative of

log Γ (x)

can be expressed as the sum of the series

\frac{d^{2}}{d x^{2}} log Γ (x) = \sum_{k = 0}^{\infty} \frac{1}{{(x + k)}^{2}} .

(30)

For more details, see Merkle [27] and Section 11.14 (iv) of Schilling [28].

Using Equation (30), we have

\begin{matrix} \frac{d^{2}}{d x^{2}} log c_{4} (x) & = \frac{1}{4} \sum_{k = 0}^{\infty} \frac{1}{{(x / 2 + k)}^{2}} - \frac{1}{8} \sum_{k = 0}^{\infty} \frac{1}{{(x / 2 - 1 / 2 + k)}^{2}} - \frac{1}{8} \sum_{k = 0}^{\infty} \frac{1}{{(x / 2 + 1 / 2 + k)}^{2}} \\ & = \frac{1}{4} \sum_{k = 0}^{\infty} [\frac{1}{{(x / 2 + k)}^{2}} - \frac{1}{2 {(x / 2 - 1 / 2 + k)}^{2}} - \frac{1}{2 {(x / 2 + 1 / 2 + k)}^{2}}] \\ & = - \frac{1}{4} \sum_{k = 0}^{\infty} \frac{\frac{3}{4} {(x / 2 - 1 / 2 + k)}^{2} + \frac{3}{4} (x / 2 - 1 / 2 + k) + \frac{1}{8}}{{(x / 2 + k)}^{2} {(x / 2 - 1 / 2 + k)}^{2} {(x / 2 + 1 / 2 + k)}^{2}} < 0 . \end{matrix}

Thus,

c_{4} (x)

is log-concave which guarantees that

c_{4} (x)

is concave. Then, using Jensen’s inequality, we have

\frac{1}{m} \sum_{i = 1}^{m} c_{4} (n_{i}) \leq c_{4} (\bar{n}),

where

\bar{n} = \sum_{i = 1}^{m} n_{i} / m

. This completes the proof. □

Remark 3.

It is worth mentioning that one may also consider an alternative approach based on an average sample size

\bar{n} = \sum_{i = 1}^{m} n_{i} / m

. For example, see Section 6.3.2 of Montgomery [9]. In such a setting, one may consider the following estimator of σ

{\bar{S}}^{*} = \frac{\bar{S}}{c_{4} (\bar{n})},

(31)

where

\bar{S} = \frac{1}{m} \sum_{i = 1}^{m} S_{i} .

(32)

However, this estimator is not unbiased because

E [{\bar{S}}^{*}] = \frac{\frac{1}{m} \sum_{i = 1}^{m} E [S_{i}]}{c_{4} (\bar{n})} = \frac{\frac{1}{m} \sum_{i = 1}^{m} c_{4} (n_{i})}{c_{4} (\bar{n})} \cdot σ .

Because

\frac{1}{m} \sum_{i = 1}^{m} c_{4} (n_{i}) \leq c_{4} (\bar{n})

from Lemma 4,

{\bar{S}}^{*}

can underestimate the true parameter σ. However, for the case of an equal sample size

(n_{1} = n_{2} = \dots = n_{m} = n)

, we have

{\bar{S}}^{*} = \bar{S} / c_{4} (n)

so that

{\bar{S}}_{A} = {\bar{S}}_{B} = {\bar{S}}_{C} = {\bar{S}}^{*}

. Thus, in this special case of an equal sample size,

{\bar{S}}^{*}

is unbiased.

Remark 4.

There is another alternative in Subsection 3.8.B of ASTM [10], which is based on the weighted average of sample standard deviations given by

{\bar{S}}_{w} = \frac{n_{1} S_{1} + n_{2} S_{2} + \dots + n_{m} S_{m}}{n_{1} + n_{2} + \dots + n_{m}} = \frac{\sum_{i = 1}^{m} n_{i} S_{i}}{N} .

(33)

Then, we have

E [{\bar{S}}_{w}] = \frac{\sum_{i = 1}^{m} n_{i} c_{4} (n_{i})}{N} \cdot σ .

It deserves mentioning that the estimator above still underestimates the true parameter σ because

c_{4} (x) < 1

from Lemma 1. In addition, for the case of an equal sample size, we have

{\bar{S}}_{w} = \bar{S}

, where

\bar{S} = \frac{1}{m} \sum_{i = 1}^{m} S_{i}

.
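The size of the bias described in Remarks 3 and 4 can be quantified by the multiplicative factor in each expectation. The sketch below computes these two factors for a given set of sample sizes; the function names and the example sizes are hypothetical, and a factor below one means that the estimator underestimates σ on average.

```python
import math

def c4(n):
    """Unbiasing factor from Equation (4); also valid for non-integer n > 1."""
    return math.sqrt(2.0 / (n - 1.0)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1.0) / 2.0))

def bias_factor_average_size(sizes):
    """E[S*]/sigma for the average-sample-size estimator in Equation (31)."""
    m = len(sizes)
    n_bar = sum(sizes) / m
    return (sum(c4(n) for n in sizes) / m) / c4(n_bar)

def bias_factor_weighted(sizes):
    """E[S_w]/sigma for the weighted average in Equation (33)."""
    return sum(n * c4(n) for n in sizes) / sum(sizes)

sizes = [3, 5, 10]
print(bias_factor_average_size(sizes), bias_factor_weighted(sizes))
```

With equal sample sizes the first factor equals one, matching the closing statement of Remark 3, whereas the second factor reduces to c4(n) because no unbiasing correction is applied in Equation (33).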

Theorem 5.

Let

S_{N}^{2} = \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} {(X_{i j} - \bar{\bar{X}})}^{2} / (N - 1)

and

\bar{\bar{X}} = \frac{1}{N} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} X_{i j}

. Then,

\bar{\bar{X}}

and

{\bar{S}}_{E} = S_{N} / c_{4} (N)

are the uniform minimum variance unbiased estimators of μ and σ, respectively. Thus, we have

Var ({\bar{S}}_{D}) \geq Var ({\bar{S}}_{E}) .

Proof.

One can obtain the uniform minimum variance unbiased estimator (UMVUE) using complete sufficient statistics as described in Theorem 7.3.23 of Casella and Berger [29]. For more details on the UMVUE, one can refer to Definition 1.6 of Lehmann and Casella [30].

The control charts we have developed are under the assumption that

X_{i j}

are independent and identically distributed as normal with mean

μ

and variance

σ^{2}

which leads to the joint complete sufficient statistics for

μ

and

σ^{2}

given by

\bar{\bar{X}} = \frac{1}{N} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} X_{i j}

and

\sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} {(X_{i j} - \bar{\bar{X}})}^{2}

, respectively. Since

E [\bar{\bar{X}}] = μ

and

E [{\bar{S}}_{E}] = σ

,

\bar{\bar{X}}

and

{\bar{S}}_{E}

are the UMVUEs of

μ

and

σ

, respectively. This completes the proof. □

Remark 5.

It is noteworthy that, analogous to Equation (18), we have

Var ({\bar{S}}_{E}) = σ^{2} \{\frac{1}{c_{4} {(N)}^{2}} - 1\},

(34)

which also results in

Var ({\bar{S}}_{D}) \geq Var ({\bar{S}}_{E})

since

c_{4} (x)

is increasing from Lemma 1.

Remark 6.

Although

{\bar{S}}_{E}

can attain the minimum variance, we do not adopt this estimator to construct the control charts. The main reason is as follows. Consider the case that

X_{i j}

are independent and normally distributed with sample-specific means

μ_{i}

for

j = 1, 2, \dots, n_{i}

and variance

σ^{2}

. Then, the pooled sample variance,

S_{p}^{2}

, is a complete sufficient statistic. See Example 2.3 in Section 2.2 of Lehmann and Casella [30]. This implies that

{\bar{S}}_{E}

is better under the null hypothesis

H_{0} : μ_{1} = μ_{2} = \dots = μ_{m}

(in control), whereas

{\bar{S}}_{D}

is better under the alternative (out of control). Thus, if the process is out of control, the control charts using

{\bar{S}}_{E}

could have wider control limits, which may increase the rate of incorrectly concluding that the process is in control when it is actually out of control.

In addition, one can think of the case of non-constancy of σ. In this heteroscedasticity case, Burr [13] mentioned that

{\bar{S}}_{C}

seems preferable to

{\bar{S}}_{D}

. We think that this case should be investigated more thoroughly in a sequel paper.

4. Comparison of the Performance

In this section, we provide the relative efficiency of the methods to compare their performance. We also carried out Monte Carlo simulations to compare the empirical biases and variances.

4.1. Relative Efficiency

When we compare the performance of unbiased estimators (say,

{\hat{θ}}_{1}

and

{\hat{θ}}_{0}

), the relative efficiency (RE) is widely used in the statistics literature. See Section 2.2 of Lehmann [31] for more details. The RE of

{\hat{θ}}_{1}

with respect to

{\hat{θ}}_{0}

is given by

RE ({\hat{θ}}_{1} ∣ {\hat{θ}}_{0}) = \frac{Var ({\hat{θ}}_{0})}{Var ({\hat{θ}}_{1})},

where

{\hat{θ}}_{0}

is a reference estimator. In general, the estimator with the smaller variance of the two estimators is used as a reference estimator so that

RE ({\hat{θ}}_{1} ∣ {\hat{θ}}_{0}) \leq 1

.

To estimate the location parameter, we considered

{\bar{\bar{X}}}_{A}

in Equation (1) and

{\bar{\bar{X}}}_{B}

in Equation (2). Then, the RE of

{\bar{\bar{X}}}_{A}

with respect to

{\bar{\bar{X}}}_{B}

is easily obtained as

RE ({\bar{\bar{X}}}_{A} ∣ {\bar{\bar{X}}}_{B}) = \frac{m^{2}}{(\sum_{i = 1}^{m} n_{i}) \cdot (\sum_{i = 1}^{m} n_{i}^{- 1})} .

Note that

RE ({\bar{\bar{X}}}_{A} ∣ {\bar{\bar{X}}}_{B}) \leq 1

where the equality holds if and only if

n_{1} = n_{2} = \dots = n_{m}

due to the inequality of the arithmetic mean and the harmonic mean [32].
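As a small illustration of this formula, the sketch below evaluates the RE for an equal-size design and for the sample sizes of Example 1 in Section 7. The function name `re_location` is hypothetical and is not part of the authors' code.

```r
# RE of the unweighted grand mean (A) relative to the weighted grand mean (B):
# m^2 / (sum(n_i) * sum(1 / n_i))
re_location <- function(ni) {
  m <- length(ni)
  m^2 / (sum(ni) * sum(1 / ni))
}

re_equal   <- re_location(c(10, 10, 10))
re_unequal <- re_location(c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50))
c(equal = re_equal, unequal = re_unequal)
```

With equal sizes the ratio is 1 up to rounding error, and with the unequal sizes it reduces to 100/121, so the unweighted mean loses efficiency only when the sizes differ.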

For the case of the scale parameter, we considered the five estimators. Among them,

{\bar{S}}_{E}

has the smallest variance. Thus, it can be used as a reference to compare the performance of the scale estimators and the RE is then given by

RE ({\bar{S}}_{j} ∣ {\bar{S}}_{E}) = \frac{Var ({\bar{S}}_{E})}{Var ({\bar{S}}_{j})},

where

j = A, B, C, D

. For notational brevity, we denote

RE ({\bar{S}}_{j}) = RE ({\bar{S}}_{j} ∣ {\bar{S}}_{E}) .

(35)

It is immediate from Equations (14)–(16), (18), and (34) that we have

\begin{matrix} RE ({\bar{S}}_{A}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{m^{2}}{\sum_{i = 1}^{m} \{\frac{1}{c_{4} {(n_{i})}^{2}} - 1\}}, \\ RE ({\bar{S}}_{B}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{{\{\sum_{i = 1}^{m} c_{4} (n_{i})\}}^{2}}{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}}, \\ RE ({\bar{S}}_{C}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}, \end{matrix}

and

\begin{matrix} RE ({\bar{S}}_{D}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{c_{4} {(N - m + 1)}^{2}}{1 - c_{4} {(N - m + 1)}^{2}} . \end{matrix}

It can be easily seen from Equations (29) and (34) that we have

RE ({\bar{S}}_{A}) \leq RE ({\bar{S}}_{B}) \leq RE ({\bar{S}}_{C}) \leq RE ({\bar{S}}_{D}) \leq 1

. In particular, when

n_{1} = n_{2} = \dots = n_{m}

, we have

RE ({\bar{S}}_{A}) = RE ({\bar{S}}_{B}) = RE ({\bar{S}}_{C}) \leq RE ({\bar{S}}_{D})

.
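The four closed-form REs above are straightforward to evaluate. The following sketch, with a hypothetical helper `re_scale`, assumes the standard gamma-function definition of c_4 and uses the sample sizes of Example 1 in Section 7, so the output can be compared with the percentages reported there.

```r
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# REs of the unbiased scale estimators A-D relative to the reference estimator E
re_scale <- function(ni) {
  m   <- length(ni)
  N   <- sum(ni)
  c4i <- c4(ni)
  ref <- 1 / c4(N)^2 - 1          # Var(S_E) / sigma^2, Equation (34)
  cD  <- c4(N - m + 1)
  c(A = ref * m^2 / sum(1 / c4i^2 - 1),
    B = ref * sum(c4i)^2 / sum(1 - c4i^2),
    C = ref * sum(c4i^2 / (1 - c4i^2)),
    D = ref * cD^2 / (1 - cD^2))
}

re <- re_scale(c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50))
round(100 * re, 2)   # REs in percent
```

The ordering A ≤ B ≤ C ≤ D ≤ 1 stated above should be visible directly in the output.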

We have considered the RE to compare the performance of the above unbiased estimators. However,

{\bar{S}}^{*}

in Equation (31) and

{\bar{S}}_{w}

in Equation (33) are not unbiased as mentioned in Remarks 2 and 3, respectively. In addition,

\bar{S}

in Equation (32) is not unbiased as easily seen from Equation (3). Thus, it is reasonable to consider the mean square error (MSE) to obtain the RE since the MSE can be regarded as an overall measure of bias and dispersion. For other uses and modifications of the RE, one can refer to Park et al. [33,34] and Ouyang et al. [35], which considered the ratio of the determinants of the covariance matrices, that is, the generalized variances [36,37]. Since the MSE is the same as the variance for unbiased estimators (

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

, and

{\bar{S}}_{D}

), the RE based on the MSE is the same as that based on the variance in this unbiased case. In addition, the variance of

{\bar{S}}_{E}

is the same as its MSE. Thus, we consider the RE of

\bar{S}

,

{\bar{S}}^{*}

, and

{\bar{S}}_{w}

as follows. We denote them by

RE (\bar{S}) = \frac{Var ({\bar{S}}_{E})}{MSE (\bar{S})}, RE ({\bar{S}}^{*}) = \frac{Var ({\bar{S}}_{E})}{MSE ({\bar{S}}^{*})}, and RE ({\bar{S}}_{w}) = \frac{Var ({\bar{S}}_{E})}{MSE ({\bar{S}}_{w})} .

We next obtain their biases, which follow easily from Equation (3):

\begin{matrix} Bias (\bar{S}) & = [\frac{1}{m} \sum_{i = 1}^{m} c_{4} (n_{i}) - 1] σ, \\ Bias ({\bar{S}}^{*}) & = [\frac{\sum_{i = 1}^{m} c_{4} (n_{i})}{m c_{4} (\bar{n})} - 1] σ, \end{matrix}

and

\begin{matrix} Bias ({\bar{S}}_{w}) & = [\frac{\sum_{i = 1}^{m} n_{i} c_{4} (n_{i})}{N} - 1] σ . \end{matrix}

In addition, the variances are also easily obtained by using

Var (S_{i}) = σ^{2} \{1 - c_{4} {(n_{i})}^{2}\}

so that we have

\begin{matrix} Var (\bar{S}) & = \frac{σ^{2}}{m^{2}} \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}, \\ Var ({\bar{S}}^{*}) & = \frac{1}{c_{4} {(\bar{n})}^{2}} \frac{σ^{2}}{m^{2}} \sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\}, \end{matrix}

and

\begin{matrix} Var ({\bar{S}}_{w}) & = \frac{σ^{2}}{N^{2}} \sum_{i = 1}^{m} n_{i}^{2} \{1 - c_{4} {(n_{i})}^{2}\} . \end{matrix}

Considering that the MSE is the variance plus the squared bias, we can obtain the RE based on the MSE

\begin{matrix} RE (\bar{S}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{m^{2}}{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\} + {\{\sum_{i = 1}^{m} c_{4} (n_{i}) - m\}}^{2}}, \\ RE ({\bar{S}}^{*}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{m^{2} c_{4} {(\bar{n})}^{2}}{\sum_{i = 1}^{m} \{1 - c_{4} {(n_{i})}^{2}\} + {\{\sum_{i = 1}^{m} c_{4} (n_{i}) - m c_{4} (\bar{n})\}}^{2}}, \end{matrix}

and

\begin{matrix} RE ({\bar{S}}_{w}) & = \{\frac{1}{c_{4} {(N)}^{2}} - 1\} \cdot \frac{N^{2}}{\sum_{i = 1}^{m} n_{i}^{2} \{1 - c_{4} {(n_{i})}^{2}\} + {\{\sum_{i = 1}^{m} n_{i} c_{4} (n_{i}) - N\}}^{2}} . \end{matrix}
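The MSE-based REs of the three conventional estimators can be sketched in the same way. The helper `re_mse` below is hypothetical, and it assumes that \bar{n} is the plain arithmetic mean N/m (possibly non-integer), which the gamma-function form of c_4 can accept directly.

```r
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# MSE-based REs of the conventional (biased) estimators relative to S_E
re_mse <- function(ni) {
  m    <- length(ni)
  N    <- sum(ni)
  c4i  <- c4(ni)
  cbar <- c4(mean(ni))             # c4 at the average sample size (assumed N / m)
  ref  <- 1 / c4(N)^2 - 1          # Var(S_E) / sigma^2
  c(S     = ref * m^2 / (sum(1 - c4i^2) + (sum(c4i) - m)^2),
    Sstar = ref * m^2 * cbar^2 / (sum(1 - c4i^2) + (sum(c4i) - m * cbar)^2),
    Sw    = ref * N^2 / (sum(ni^2 * (1 - c4i^2)) + (sum(ni * c4i) - N)^2))
}

re_eq   <- re_mse(c(10, 10, 10))   # equal sizes: S and Sw coincide
re_uneq <- re_mse(c(3, 5, 7))      # unequal sizes: all three differ
rbind(equal = re_eq, unequal = re_uneq)
```

In the equal-size case the entries for S and Sw agree, in line with Remark 4.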

4.2. Empirical Biases and Variances

The RE is a useful statistical tool to compare the performance of unbiased estimators. However, the conventional methods such as in Equations (31)–(33) are biased. To compare these with the methods provided here, we obtain the empirical biases and variances by carrying out Monte Carlo simulations.

We consider two cases: an equal sample size and unequal sample sizes. We first generated samples of equal size with

n_{1} = n_{2} = n_{3} = 3

, with

n_{1} = n_{2} = n_{3} = 10

, and with

n_{1} = n_{2} = n_{3} = 20

. We next generated samples of unequal sizes with

n_{1} = 3

,

n_{2} = 5

,

n_{3} = 7

, with

n_{1} = 5

,

n_{2} = 10

,

n_{3} = 15

, and with

n_{1} = 10

,

n_{2} = 20

,

n_{3} = 30

. Again, let

X_{i j}

denote the jth observation in the ith sample (subgroup) of size

n_{i}

. Then,

X_{i j}

were generated from the normal distribution with mean

μ_{0} = 100

and standard deviation

σ_{0} = 10

and we obtained the scale estimates including the unbiased estimators (

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

,

{\bar{S}}_{E}

) and the conventional methods (

\bar{S}

,

{\bar{S}}^{*}

,

{\bar{S}}_{w}

). In order to obtain the empirical biases and variances of these estimates, we repeated this simulation ten million times (

I = 10^{7}

) and the results are summarized in Table 1 for the case of an equal sample size and Table 2 for the case of unequal sample sizes. In addition, the MSEs along with the squared empirical biases (red) and variances (light blue) are plotted in Figure 1 for the case of an equal sample size and in Figure 2 for the case of unequal sample sizes. In the figures,

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

,

\bar{S}

,

{\bar{S}}^{*}

, and

{\bar{S}}_{w}

are denoted by A, B, C, D, S, S*, and Sw, respectively.
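A scaled-down version of this experiment can be sketched as follows. It is not the authors' simulation code: it uses only 20,000 replications instead of ten million, compares just the pooled unbiased estimator (D) with the conventional average (S), and the seed and helper names are arbitrary choices.

```r
set.seed(2020)
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

ni     <- c(3, 5, 7)      # one of the unequal-size settings described above
mu0    <- 100
sigma0 <- 10
m      <- length(ni)
N      <- sum(ni)
reps   <- 20000           # far fewer than the 10^7 replications in the paper

one_run <- function() {
  Si <- sapply(ni, function(n) sd(rnorm(n, mu0, sigma0)))
  Sp <- sqrt(sum((ni - 1) * Si^2) / (N - m))   # pooled standard deviation
  c(D = Sp / c4(N - m + 1),                    # unbiased pooled estimator
    S = mean(Si))                              # conventional average (biased)
}

est  <- replicate(reps, one_run())   # 2 x reps matrix of estimates
bias <- rowMeans(est) - sigma0       # empirical biases
vars <- apply(est, 1, var)           # empirical variances
mse  <- vars + bias^2
rbind(bias = bias, variance = vars, MSE = mse)
```

With this many replications the empirical bias of D should be close to zero, while that of S should be clearly negative.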

Comparing the simulation results in Table 1, we can observe that the empirical results of

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

, and

{\bar{S}}^{*}

are the same for the case of an equal sample size. These results are quite reasonable as pointed out in Remark 3. We can notice that the empirical biases are not exactly zero, although quite negligible; this is due to the random error inherent in Monte Carlo simulation. In addition,

\bar{S}

and

{\bar{S}}_{w}

have the same results as pointed out in Remark 4. On the other hand, for the case of unequal sample sizes, they all have different results.

For the case of an equal sample size, the empirical variances and biases are noticeably different for a smaller sample size, but they get closer as the sample size increases, as also shown in Figure 1. For the case of unequal sample sizes, the empirical biases get smaller as the sample sizes increase. However, the variances are still noticeably different even with large sample sizes.

Comparing the empirical variances only,

{\bar{S}}_{w}

performs very well, but it is seriously biased. Thus, it is reasonable to compare the MSEs along with the biases and we can conclude that

{\bar{S}}_{D}

is overall the best. Note that we do not recommend the use of

{\bar{S}}_{E}

, even though

{\bar{S}}_{E}

always has the best results in all the measures. This is because it can lead to degraded performance when the process is out of control, as mentioned in Remark 6.

5. Construction of the Control Charts with Unequal Sample Sizes

We briefly introduce how to construct the control charts and then discuss how to implement the estimators provided here to construct the S chart in Section 5.1 and improve the S chart using the probability limits in Section 5.2. We also discuss the construction of the

\bar{X}

chart in Section 5.3.

In general, we construct statistical quality control charts based on two phases [9,38], usually denoted by Phase-I and Phase-II. In Phase-I, one establishes control limits with a set of stable manufacturing process data. In Phase-II, we monitor the process using the control limits obtained in Phase-I. We assume that we have m samples from a stable manufacturing process (Phase-I) and that the samples may have different sizes, denoted by

n_{i}

where

i = 1, 2, \dots, m

. Then, we monitor the process with a sample of size

n_{k}

(Phase-II).

5.1. The S Chart

From the statistical asymptotic theory (for example, Corollarys 6–10 of Arnold [39]), we can have an approximate distribution

\frac{S_{k} - E (S_{k})}{SE (S_{k})} \overset{•}{\sim} N (0, 1),

where

S_{k}

is the sample standard deviation with sample size

n_{k}

. In order to construct the

CL \pm 3 \cdot SE

control limits, we can set up

(S_{k} - E (S_{k})) / SE (S_{k}) = \pm 3

. Solving this for

S_{k}

, we have

E (S_{k}) \pm 3 \cdot SE (S_{k}) = c_{4} (n_{k}) σ \pm 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} σ .

Since

σ

is generally unknown, we need to estimate

σ

. One can use

{\bar{S}}_{A}

in Equation (5),

{\bar{S}}_{B}

in Equation (6),

{\bar{S}}_{C}

in Equation (7), and

{\bar{S}}_{D}

in Equation (11). Using

{\bar{S}}_{A}

, we can construct the S chart with sample size

n_{k}

as follows:

\begin{matrix} {UCL}_{A} & = \frac{c_{4} (n_{k})}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})} + 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{1}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})} \\ {CL}_{A} & = \frac{c_{4} (n_{k})}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})} \\ {LCL}_{A} & = \frac{c_{4} (n_{k})}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})} - 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{1}{m} \sum_{i = 1}^{m} \frac{S_{i}}{c_{4} (n_{i})}, \end{matrix}

where we assign zero to

LCL

if it is negative. Next, using

{\bar{S}}_{B}

, we construct the S chart as follows:

\begin{matrix} {UCL}_{B} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})} + 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})} \\ {CL}_{B} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})} \\ {LCL}_{B} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})} - 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{\sum_{i = 1}^{m} S_{i}}{\sum_{i = 1}^{m} c_{4} (n_{i})} . \end{matrix}

In addition, using

{\bar{S}}_{C}

, we can construct the S chart as follows:

\begin{matrix} {UCL}_{C} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} + 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} \\ {CL}_{C} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} \\ {LCL}_{C} & = c_{4} (n_{k}) \cdot \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} - 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{\sum_{i = 1}^{m} \frac{c_{4} (n_{i}) S_{i}}{1 - c_{4} {(n_{i})}^{2}}}{\sum_{i = 1}^{m} \frac{c_{4} {(n_{i})}^{2}}{1 - c_{4} {(n_{i})}^{2}}} . \end{matrix}

Of particular note is that, for the case of an equal sample size (

n_{1} = n_{2} = \dots = n_{m} = n

) in Phase-I, it is easily seen that

{UCL}_{A} = {UCL}_{B} = {UCL}_{C}

,

{CL}_{A} = {CL}_{B} = {CL}_{C}

, and

{LCL}_{A} = {LCL}_{B} = {LCL}_{C}

. Thus, we have the S chart below with sample size

n_{k}

to use in Phase-II

\begin{matrix} UCL & = \frac{c_{4} (n_{k})}{c_{4} (n)} \bar{S} + \frac{3 \sqrt{1 - c_{4} {(n_{k})}^{2}}}{c_{4} (n)} \bar{S} \\ CL & = \frac{c_{4} (n_{k})}{c_{4} (n)} \bar{S} \\ LCL & = \frac{c_{4} (n_{k})}{c_{4} (n)} \bar{S} - \frac{3 \sqrt{1 - c_{4} {(n_{k})}^{2}}}{c_{4} (n)} \bar{S}, \end{matrix}

(36)

where

\bar{S} = \sum_{i = 1}^{m} S_{i} / m

. We assign zero to

LCL

if it is negative. Furthermore, if we assume

n_{k} = n

, we have

\begin{matrix} UCL & = \bar{S} + \frac{3 \sqrt{1 - c_{4} {(n)}^{2}}}{c_{4} (n)} \bar{S} = B_{4} (n) \bar{S} \\ CL & = \bar{S} \\ LCL & = \bar{S} - \frac{3 \sqrt{1 - c_{4} {(n)}^{2}}}{c_{4} (n)} \bar{S} = B_{3} (n) \bar{S}, \end{matrix}

where

B_{3} (n) = max \{1 - 3 \sqrt{1 - c_{4} {(n)}^{2}} / c_{4} (n), 0\}

, and

B_{4} (n) = 1 + 3 \sqrt{1 - c_{4} {(n)}^{2}} / c_{4} (n)

. This is a well-known S chart introduced in the quality control literature. For example, see the chart in Equation (6.27) of Montgomery [9]. This indicates that the proposed S chart includes the existing S chart as a special case.

Using

{\bar{S}}_{D}

, we construct the S chart as follows:

\begin{matrix} {UCL}_{D} & = c_{4} (n_{k}) \cdot \frac{S_{p}}{c_{4} (N - m + 1)} + 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{S_{p}}{c_{4} (N - m + 1)} \\ {CL}_{D} & = c_{4} (n_{k}) \cdot \frac{S_{p}}{c_{4} (N - m + 1)} \\ {LCL}_{D} & = c_{4} (n_{k}) \cdot \frac{S_{p}}{c_{4} (N - m + 1)} - 3 \sqrt{1 - c_{4} {(n_{k})}^{2}} \cdot \frac{S_{p}}{c_{4} (N - m + 1)}, \end{matrix}

where

S_{p} = {\{\sum_{i = 1}^{m} (n_{i} - 1) S_{i}^{2} / (N - m)\}}^{1 / 2}

and

N = \sum_{i = 1}^{m} n_{i}

. Unlike the previous cases, this control chart is not the same as the one in Equation (36) even for the case of an equal sample size.
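The four sets of S chart limits above can be computed together. The sketch below is an independent illustration and not the authors' Schart() function from Appendix B; the name `s_chart_limits` is hypothetical, and the data are the summary statistics of Example 1 in Section 7, so the result can be compared with the output shown there.

```r
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# CL +/- 3 SE limits of the S chart for a Phase-II sample of size nk
s_chart_limits <- function(Si, ni, nk) {
  m   <- length(ni)
  N   <- sum(ni)
  c4i <- c4(ni)
  sigma_hat <- c(
    A = mean(Si / c4i),
    B = sum(Si) / sum(c4i),
    C = sum(c4i * Si / (1 - c4i^2)) / sum(c4i^2 / (1 - c4i^2)),
    D = sqrt(sum((ni - 1) * Si^2) / (N - m)) / c4(N - m + 1))
  cl   <- c4(nk) * sigma_hat
  half <- 3 * sqrt(1 - c4(nk)^2) * sigma_hat
  cbind(LCL = pmax(cl - half, 0),   # a negative LCL is set to zero
        CL  = cl,
        UCL = cl + half)
}

ni  <- c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50)
Si  <- c(4.35, 4.03, 2.43, 3.56, 3.10, 3.30, 4.18, 4.30, 2.09, 2.67)
lim <- s_chart_limits(Si, ni, nk = 25)
lim
```

Each row shares the same multipliers c_4(n_k) and 3\sqrt{1 - c_4(n_k)^2}; only the estimate of σ changes from row to row.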

5.2. The S and $S^{2}$ Charts with Probability Limits

We can improve the above S charts by using probability limits as mentioned in Section 4.7.4 of Ryan [40]. It follows from the following Chi-square distribution result

\frac{(n_{k} - 1) S_{k}^{2}}{σ^{2}} \sim χ_{n_{k} - 1}^{2}

that

P [χ_{1 - α / 2, n_{k} - 1}^{2} \leq \frac{(n_{k} - 1) S_{k}^{2}}{σ^{2}} \leq χ_{α / 2, n_{k} - 1}^{2}] = 1 - α,

(37)

where

χ_{γ, ν}^{2}

is the

γ

th upper quantile of the Chi-square distribution with

ν

degrees of freedom. Rewriting Equation (37) about

S_{k}

, we then have

P [σ \cdot \sqrt{\frac{χ_{1 - α / 2, n_{k} - 1}^{2}}{n_{k} - 1}} \leq S_{k} \leq σ \cdot \sqrt{\frac{χ_{α / 2, n_{k} - 1}^{2}}{n_{k} - 1}}] = 1 - α .

Thus, we can construct the S chart with probability limits such that

UCL = σ \sqrt{χ_{α / 2, n_{k} - 1}^{2} / (n_{k} - 1)}

,

CL = σ

, and

LCL = σ \sqrt{χ_{1 - α / 2, n_{k} - 1}^{2} / (n_{k} - 1)}

. In practice, since

σ

is unknown, we need to estimate

σ

. Thus, with the estimator

\hat{σ}

, we can obtain

\begin{matrix} UCL & = \hat{σ} \cdot \sqrt{\frac{χ_{α / 2, n_{k} - 1}^{2}}{n_{k} - 1}} \\ CL & = \hat{σ} \\ LCL & = \hat{σ} \cdot \sqrt{\frac{χ_{1 - α / 2, n_{k} - 1}^{2}}{n_{k} - 1}} . \end{matrix}

In the above, one can construct the S chart with probability limits by using

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

or

{\bar{S}}_{D}

instead of

\hat{σ}

.

Next, we consider the construction of the

S^{2}

chart. We rewrite Equation (37) about

S_{k}^{2}

and we then have

P [σ^{2} \cdot \frac{χ_{1 - α / 2, n_{k} - 1}^{2}}{n_{k} - 1} \leq S_{k}^{2} \leq σ^{2} \cdot \frac{χ_{α / 2, n_{k} - 1}^{2}}{n_{k} - 1}] = 1 - α .

Using the above along with

{\hat{σ}}^{2} = S_{p}^{2}

where

S_{p}^{2}

is the pooled sample variance denoted in Equation (9), we can also construct the

S^{2}

chart as follows:

\begin{matrix} UCL & = S_{p}^{2} \cdot \frac{χ_{α / 2, n_{k} - 1}^{2}}{n_{k} - 1} \\ CL & = S_{p}^{2} \\ LCL & = S_{p}^{2} \cdot \frac{χ_{1 - α / 2, n_{k} - 1}^{2}}{n_{k} - 1} . \end{matrix}

Note that none of

{\bar{S}}_{A}^{2}

,

{\bar{S}}_{B}^{2}

,

{\bar{S}}_{C}^{2}

, or

{\bar{S}}_{D}^{2}

is unbiased for

σ^{2}

, whereas

S_{p}^{2}

is unbiased. Thus, it is not recommended to use any of the four estimators to construct the

S^{2}

chart.
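The probability limits for both charts can be sketched with the Chi-square quantile function in R. Since qchisq() returns lower-tail quantiles by default, the upper quantile χ^2_{α/2, ν} corresponds to qchisq(1 - α/2, ν). The choice α = 0.0027 and the use of the Example 1 summary statistics with n_k = 25 are assumptions made for illustration only.

```r
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

ni <- c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50)
Si <- c(4.35, 4.03, 2.43, 3.56, 3.10, 3.30, 4.18, 4.30, 2.09, 2.67)
m  <- length(ni)
N  <- sum(ni)

Sp2 <- sum((ni - 1) * Si^2) / (N - m)   # pooled sample variance
S_D <- sqrt(Sp2) / c4(N - m + 1)        # unbiased estimate of sigma

nk    <- 25
alpha <- 0.0027                         # assumed false alarm rate
df    <- nk - 1
q_lo  <- qchisq(alpha / 2, df)          # upper (1 - alpha/2) quantile
q_hi  <- qchisq(1 - alpha / 2, df)      # upper alpha/2 quantile

s_limits  <- c(LCL = S_D * sqrt(q_lo / df), CL = S_D, UCL = S_D * sqrt(q_hi / df))
s2_limits <- c(LCL = Sp2 * q_lo / df,       CL = Sp2, UCL = Sp2 * q_hi / df)
rbind(S = s_limits, S2 = s2_limits)
```

Unlike the CL ± 3 SE limits, these limits are not symmetric about the center line, and the lower limit is always positive.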

5.3. The $\bar{X}$ Chart

From the statistical asymptotic theory, we have

\frac{{\bar{X}}_{k} - E ({\bar{X}}_{k})}{SE ({\bar{X}}_{k})} \overset{•}{\sim} N (0, 1),

where

{\bar{X}}_{k}

is the sample mean with sample size

n_{k}

. In order to construct the

CL \pm 3 \cdot SE

control limits, we can set up

({\bar{X}}_{k} - E ({\bar{X}}_{k})) / SE ({\bar{X}}_{k}) = \pm 3

. Solving this for

{\bar{X}}_{k}

, we have

E ({\bar{X}}_{k}) \pm 3 \cdot SE ({\bar{X}}_{k}) = μ \pm \frac{3 σ}{\sqrt{n_{k}}} .

Since

μ

and

σ

are unknown in practice, we need to estimate them. With the estimates

\hat{μ}

and

\hat{σ}

, we have

\begin{matrix} UCL & = \hat{μ} + \frac{3 \hat{σ}}{\sqrt{n_{k}}} \\ CL & = \hat{μ} \\ LCL & = \hat{μ} - \frac{3 \hat{σ}}{\sqrt{n_{k}}} . \end{matrix}

To estimate

μ

, one can use

{\bar{\bar{X}}}_{A}

defined in Equation (1) or

{\bar{\bar{X}}}_{B}

defined in Equation (2). To estimate

σ

, one can use any of

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

, and

{\bar{S}}_{D}

.

As an illustration, using

{\bar{\bar{X}}}_{B}

and

{\bar{S}}_{A}

, we have the

\bar{X}

chart as follows:

\begin{matrix} {UCL}_{A} & = {\bar{\bar{X}}}_{B} + \frac{3 {\bar{S}}_{A}}{\sqrt{n_{k}}} \\ {CL}_{A} & = {\bar{\bar{X}}}_{B} \\ {LCL}_{A} & = {\bar{\bar{X}}}_{B} - \frac{3 {\bar{S}}_{A}}{\sqrt{n_{k}}} . \end{matrix}

Similarly, we can construct various

\bar{X}

charts using a total of eight combinations of

μ

and

σ

estimators.
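A compact sketch of these limits is given below. It is an independent illustration rather than the authors' Xbarchart() function from Appendix B; the name `xbar_chart_limits` is hypothetical. The location estimate is taken to be the sample-size-weighted grand mean, which reproduces the value 53.8 reported for \bar{\bar{X}}_B in Example 1 of Section 7.

```r
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# CL +/- 3 SE limits of the X-bar chart for a Phase-II sample of size nk
xbar_chart_limits <- function(Xbari, Si, ni, nk) {
  m   <- length(ni)
  N   <- sum(ni)
  c4i <- c4(ni)
  mu_hat <- sum(ni * Xbari) / N     # weighted grand mean
  sigma_hat <- c(
    A = mean(Si / c4i),
    B = sum(Si) / sum(c4i),
    C = sum(c4i * Si / (1 - c4i^2)) / sum(c4i^2 / (1 - c4i^2)),
    D = sqrt(sum((ni - 1) * Si^2) / (N - m)) / c4(N - m + 1))
  half <- 3 * sigma_hat / sqrt(nk)
  cbind(LCL = mu_hat - half, CL = mu_hat, UCL = mu_hat + half)
}

ni    <- c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50)
Xbari <- c(55.7, 54.6, 52.6, 55.0, 53.4, 55.2, 53.3, 52.3, 53.7, 54.3)
Si    <- c(4.35, 4.03, 2.43, 3.56, 3.10, 3.30, 4.18, 4.30, 2.09, 2.67)
lim   <- xbar_chart_limits(Xbari, Si, ni, nk = 25)
lim
```

Swapping the weighted mean for the unweighted mean of the subgroup means would give the remaining four of the eight combinations.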

Remark 7.

It should be noted that, when

{\bar{S}}_{D}

is used, the

\bar{X}

chart is somewhat different from the above chart and is given by

\begin{matrix} {UCL}_{D} & = \bar{\bar{X}} + \frac{3 S_{p}}{c_{4} (N - m + 1) \sqrt{n_{k}}} \\ {CL}_{D} & = \bar{\bar{X}} \\ {LCL}_{D} & = \bar{\bar{X}} - \frac{3 S_{p}}{c_{4} (N - m + 1) \sqrt{n_{k}}} . \end{matrix}

As shown in Section 4,

{\bar{S}}_{D}

performs better than

{\bar{S}}_{A}

,

{\bar{S}}_{B}

, and

{\bar{S}}_{C}

. However, to the best of our knowledge, the

\bar{X}

chart based on

{\bar{S}}_{D}

has not been widely used probably because the calculation of the normal-consistent unbiasing factor

c_{4}

is difficult especially with a large argument value. Most textbooks provide the values of

c_{4} (n)

only for

n \leq 25

. In Appendix A, for an easy calculation, we provide an approximation for

c_{4} (n)

which is highly accurate within one unit in the ninth decimal place for

n > 25

. Thus, using this, one can easily calculate the LCL and UCL of the

\bar{X}

chart based on

{\bar{S}}_{D}

.
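The numerical difficulty with a large argument can be seen directly in R. The sketch below contrasts a naive evaluation of c_4 through gamma(), which overflows once the argument of the gamma function exceeds roughly 171, with an evaluation on the log scale through lgamma(). The log-scale form is a technique swapped in here for illustration; it is not the approximation given in Appendix A.

```r
# Naive form: the gamma function overflows to Inf for large arguments
c4_naive <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# Log-scale form: the ratio of gamma functions is computed as exp(difference of logs)
c4_log <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

small_naive <- c4_naive(25)
small_log   <- c4_log(25)

# N - m + 1 = 541 in Example 1: Inf / Inf gives NaN in the naive form
large_naive <- suppressWarnings(c4_naive(541))
large_log   <- c4_log(541)

c(small_naive = small_naive, small_log = small_log,
  large_naive = large_naive, large_log = large_log)
```

For moderate n both forms agree, whereas for n = 541 only the log-scale form returns a finite value.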

6. Average Run Length and Standard Deviation of Run Length

To compare the performance of the control charts based on the various estimators, we obtained empirical estimates of the ARL and the SDRL using extensive Monte Carlo simulations. For this simulation, we considered the

\bar{X}

chart. In this chart, we only estimated the location with

\hat{μ} = {\bar{\bar{X}}}_{B}

because the RE of

{\bar{\bar{X}}}_{B}

is better than that of

{\bar{\bar{X}}}_{A}

as shown in Section 4. For the scale, we considered seven different estimates (

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

,

\bar{S}

,

{\bar{S}}^{*}

,

{\bar{S}}_{w}

) and we denoted the charts based on these scale estimates by A, B, C, D, S, S*, and Sw, respectively.

We assume that we have m samples (

n_{1}, n_{2}, \dots, n_{m}

) in Phase-I. Again, let

X_{i j}

denote the jth observation in the ith sample (subgroup) of size

n_{i}

. Then,

X_{i j}

’s were generated from the normal distribution with mean

μ_{0} = 10

and standard deviation

σ_{0} = 5

and we obtained the location estimate (

{\bar{\bar{X}}}_{B}

) and the seven scale estimates. Using these estimates, we constructed the seven control charts based on

CL \pm 3 \cdot SE

control limits with FAR 0.27%. Then, we monitored the process with a new sample of size

n_{k}

from the same normal distribution and obtained the run length in Phase-II. We repeated this simulation one million times (

I = 10^{6}

to obtain the run lengths and then estimated the ARL and SDRL based on these run lengths. Note that the simulation results remain the same under different parameter values of

μ_{0}

and

σ_{0}

. This is quite reasonable since the normal distribution is a location-scale family. The results do, however, depend somewhat on the number of samples and the combination of the sample sizes.

We generated

m = 15

samples under five different scenarios. The sample sizes of each scenario are given by

Scenario I: n_{1} = n_{2} = \dots = n_{5} = 3, n_{6} = n_{7} = \dots = n_{10} = 10, n_{11} = n_{12} = \dots = n_{15} = 17,

Scenario II: n_{1} = n_{2} = \dots = n_{5} = 5, n_{6} = n_{7} = \dots = n_{10} = 10, n_{11} = n_{12} = \dots = n_{15} = 15,

Scenario III: n_{1} = n_{2} = \dots = n_{5} = 7, n_{6} = n_{7} = \dots = n_{10} = 10, n_{11} = n_{12} = \dots = n_{15} = 13,

Scenario IV: n_{1} = n_{2} = \dots = n_{5} = 9, n_{6} = n_{7} = \dots = n_{10} = 10, n_{11} = n_{12} = \dots = n_{15} = 11,

Scenario V: n_{1} = n_{2} = \dots = n_{5} = 10, n_{6} = n_{7} = \dots = n_{10} = 10, n_{11} = n_{12} = \dots = n_{15} = 10.

We considered Scenarios I–IV for the cases of unequal sample sizes and Scenario V for the case of an equal sample size as a reference. Under each of these scenarios, we estimated the ARL and SDRL as described above. The simulation results are provided in Table 3.
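The simulation design can be sketched on a much smaller scale as follows. This is not the authors' code, and it takes two shortcuts that are assumptions of the sketch: the Phase-I statistics are drawn from their known sampling distributions under normality instead of from raw data, and each run length is drawn from the geometric distribution implied by the conditional false alarm probability of the estimated limits. The Phase-II sample size n_k = 10, the seed, and the 20,000 replications are arbitrary choices; only the charts D and S are compared, under Scenario I.

```r
set.seed(123)
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

ni     <- rep(c(3, 10, 17), each = 5)   # Scenario I
m      <- length(ni)
N      <- sum(ni)
mu0    <- 10
sigma0 <- 5
nk     <- 10      # Phase-II sample size (assumed for this sketch)
reps   <- 20000   # far fewer than the 10^6 replications in the paper

sim_run_lengths <- function(scale_fun) {
  df <- rep(ni - 1, each = reps)
  # Phase-I subgroup SDs: S_i^2 ~ sigma0^2 * chisq(n_i - 1) / (n_i - 1)
  Si <- matrix(sigma0 * sqrt(rchisq(reps * m, df) / df), nrow = reps)
  mu_hat  <- rnorm(reps, mu0, sigma0 / sqrt(N))   # weighted grand mean
  sig_hat <- apply(Si, 1, scale_fun)
  se_k <- sigma0 / sqrt(nk)
  half <- 3 * sig_hat / sqrt(nk)
  # Conditional false alarm probability of an in-control Phase-II sample mean
  p <- pnorm(mu_hat - half, mu0, se_k) +
       pnorm(mu_hat + half, mu0, se_k, lower.tail = FALSE)
  rgeom(reps, p) + 1   # run length = index of the first signal
}

est_D <- function(S) sqrt(sum((ni - 1) * S^2) / (N - m)) / c4(N - m + 1)
est_S <- function(S) mean(S)

rl_D <- sim_run_lengths(est_D)
rl_S <- sim_run_lengths(est_S)
ideal_arl <- 1 / (2 * pnorm(-3))   # ARL with known parameters
rbind(D = c(ARL = mean(rl_D), SDRL = sd(rl_D)),
      S = c(ARL = mean(rl_S), SDRL = sd(rl_S)))
```

Because the plain average of the S_i underestimates σ, its limits are narrower and its estimated ARL should come out clearly below that of the chart based on D.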

Note that the ideal ARL under the normal distribution is around

1 / 0.0027 \approx 370

with FAR 0.27%. However, this ideal value is obtained when using the true

μ_{0}

and

σ_{0}

. In practice, we need to estimate these parameters in Phase-I. Thus, with such an uncertainty due to estimation, the target ARL can be different from the ideal ARL of 370. We observed that the empirical results of A, B, C, and S* have the same value under the case of an equal sample size (Scenario V) and these results have the same tendency when the RE measure is used in Section 4.2. These results are expected as pointed out in Remark 3. The estimated ARL of D (362.58) is very close to that of A, B, C, and S* (364.36). In addition, the estimated SDRL of D (537.31) is very close to that of A, B, C, and S* (541.45). This minor difference may be due to a random phenomenon of the Monte Carlo simulation. Thus, it is quite reasonable to assume that the target ARL is around 360 and the target SDRL is around 540.

In what follows, we analyze and compare our results based on these target values. Note that we did not consider the values from the control charts based on S and Sw because they have much smaller ARL values than the others, which is mainly due to the fact that they have serious negative bias even with the case of an equal sample size as seen in Table 1.

In Scenario I, there is a serious difference in terms of the sample sizes (3, 10, 17). The ARL values of the charts based on A and B seriously overshoot the target value and their SDRLs are far above the target value. The ARL values of the charts using S and Sw seriously undershoot the target value, while the ARL of the chart using S* undershoots it slightly. These results are closely related to the RE. In Table 2,

\bar{S}

and

{\bar{S}}_{w}

have noticeable negative values of the biases while

{\bar{S}}^{*}

has a moderate negative bias. This implies that these scale estimators underestimate the true value, which results in narrower control limits. With narrower control limits, one obtains a smaller ARL. When it comes to the SDRL, the SDRLs of the charts using A and B seriously overshoot the target and those using S and Sw undershoot the target. The chart using S* overshoots the target. These results are similar to those with the RE.

In Scenarios II–IV, we have observations similar to those in Scenario I. Because the differences in sample sizes are less serious, the observations are not as dramatic as in Scenario I, but their tendencies are quite similar. For example, in Scenario IV, the sample sizes differ only slightly, so these results are quite close to those in Scenario V (equal sample size).

Scenario V, as mentioned earlier, serves as a reference. In this case, we can also observe that the charts using S and Sw have the same results and their ARL values seriously undershoot the target value. In Remark 4, we pointed out

{\bar{S}}_{w} = \bar{S}

for the case of an equal sample size. We can also observe this in Figure 1a. As mentioned earlier, their underperformance is due to a serious negative bias as seen in Table 1. The results show that the charts based on C and D perform very well. However, when the samples are very small (small number of samples with small sample size), we expect that the chart based on D can be slightly better than that based on C as shown in Figure 1. In practice, with samples of reasonable size, one can use the charts based on either C or D.

7. Illustrative Examples

Here, we provide three real-data examples to illustrate the application of the proposed methods to the control charts. All computations were carried out using the R language [41,42]. The R functions for the

\bar{X}

and S charts can be obtained in Appendix B.

Example 1.

We consider the data set presented earlier in Table 30 in Section 3.31 of ASTM [10]. The data sets were obtained from ten shipments whose sample sizes were not equal. The sample sizes are 50, 50, 100, 25, 25, 50, 100, 50, 50, 50. The corresponding sample means are given by 55.7, 54.6, 52.6, 55.0, 53.4, 55.2, 53.3, 52.3, 53.7, 54.3 and the corresponding sample standard deviations are 4.35, 4.03, 2.43, 3.56, 3.10, 3.30, 4.18, 4.30, 2.09, 2.67, respectively. Using these, we have

{\bar{\bar{X}}}_{A} = 54.01

,

{\bar{\bar{X}}}_{B} = 53.8

,

{\bar{S}}_{A} = 3.420251

,

{\bar{S}}_{B} = 3.420254

,

{\bar{S}}_{C} = 3.405517

, and

{\bar{S}}_{D} = 3.491055

.

Using the R functions provided in Appendix B, we can obtain the following results. In the R function, A, B, C, and D denote the control limits based on

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

, respectively. The R function Xbarchart() uses

{\bar{\bar{X}}}_{B}

as a default for the CL of the

\bar{X}

chart since

{\bar{\bar{X}}}_{B}

performs better as seen in Section 4.1.

> ni = c(50, 50, 100, 25, 25, 50, 100, 50, 50, 50)

> Xbari = c(55.7, 54.6, 52.6, 55.0, 53.4, 55.2, 53.3, 52.3, 53.7, 54.3)

> Si = c(4.35, 4.03, 2.43, 3.56, 3.10, 3.30, 4.18, 4.30, 2.09, 2.67)

> Xbarchart(Xbari, Si, ni=ni, nk=25)

LCL CL UCL

A 51.74785 53.8 55.85215

B 51.74785 53.8 55.85215

C 51.75669 53.8 55.84331

D 51.70537 53.8 55.89463

> Schart(Si, ni, nk=25)

LCL CL UCL

A 1.911697 3.384818 4.857940

B 1.911699 3.384822 4.857945

C 1.903462 3.370238 4.837013

D 1.951272 3.454889 4.958505

The control limits are calculated using Methods A–D and the results are summarized in Table 4. The results using Methods A–C are quite close but those using Method D are slightly different from others.

In addition, the variances of

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

, and

{\bar{S}}_{E}

are obtained from Equations (14)–(16), (18), and (34), respectively, and we have

Var ({\bar{S}}_{A}) = 0.0011375146 \cdot σ^{2}

,

Var ({\bar{S}}_{B}) = 0.0011348232 \cdot σ^{2}

,

Var ({\bar{S}}_{C}) = 0.0009301593 \cdot σ^{2}

,

Var ({\bar{S}}_{D}) = 0.0009263542 \cdot σ^{2}

, and

Var ({\bar{S}}_{E}) = 0.0009111612 \cdot σ^{2}

. Thus, the REs with respect to

{\bar{S}}_{E}

are easily obtained from Equation (35) and we have

RE ({\bar{S}}_{A}) = 80.10 %

,

RE ({\bar{S}}_{B}) = 80.29 %

,

RE ({\bar{S}}_{C}) = 97.96 %

, and

RE ({\bar{S}}_{D}) = 98.36 %

.

Example 2.

We consider the data set presented earlier in Table 32 in Section 3.31 of ASTM [10]. The data sets were obtained from 21 tension testing machines whose sample sizes were not equal. The sample sizes are 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5. The corresponding sample means are given by 73.8, 71.0, 74.2, 71.0, 70.0, 67.0, 73.5, 71.2, 71.2, 71.2, 71.6, 71.2, 74.2, 74.6, 72.4, 75.3, 69.0, 71.8, 72.8, 69.8, 69.00, and the corresponding sample standard deviations are 1.10, 0.71, 0.45, 1.41, 0.00, 2.35, 1.91, 1.79, 0.45, 0.45, 0.55, 0.55, 0.84, 0.55, 0.55, 0.50, 0.71, 0.84, 0.45, 1.30, 0.00, respectively. Using these, we have

{\bar{\bar{X}}}_{A} = 71.70476

,

{\bar{\bar{X}}}_{B} = 71.65243

,

{\bar{S}}_{A} = 0.8869858

,

{\bar{S}}_{B} = 0.8861882

,

{\bar{S}}_{C} = 0.8762927

, and

{\bar{S}}_{D} = 1.014672

.

Using the R functions in Appendix B with the above data sets, the control limits are easily calculated and we summarized the results in Table 5.

As was illustrated in Example 1, the variances of

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

, and

{\bar{S}}_{E}

are obtained from Equations (14)–(16), (18), and (34), respectively, and we have

Var ({\bar{S}}_{A}) = 0.006484797 \cdot σ^{2}

,

Var ({\bar{S}}_{B}) = 0.006477515 \cdot σ^{2}

,

Var ({\bar{S}}_{C}) = 0.006434091 \cdot σ^{2}

,

Var ({\bar{S}}_{D}) = 0.006116037 \cdot σ^{2}

, and

Var ({\bar{S}}_{E}) = 0.004913916 \cdot σ^{2}

. Then, the REs with respect to

{\bar{S}}_{E}

are given by

RE ({\bar{S}}_{A}) = 75.78 %

,

RE ({\bar{S}}_{B}) = 75.86 %

,

RE ({\bar{S}}_{C}) = 76.37 %

, and

RE ({\bar{S}}_{D}) = 80.34 %

.

Example 3.

We consider the data set presented earlier in Table 6.4 of Montgomery [9] which includes 113 measurements (in millimeters) of the diameters of piston rings for an automotive engine produced by a forging process. The data were obtained from 25 samples and the sample sizes are given by 5, 3, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5, 3, 5, 3, 5, 4, 5, 5, 3, 5, 5, 5, 5, 5. The corresponding sample means are calculated as 74.010, 73.996, 74.008, 74.003, 74.003, 73.996, 73.999, 73.997, 74.004, 73.998, 73.994, 74.001, 73.994, 73.990, 74.008, 73.997, 73.999, 74.007, 73.998, 74.008, 74.000, 74.002, 74.002, 74.005, 73.998, and the corresponding sample standard deviations are 0.0148, 0.0046, 0.0147, 0.0091, 0.0122, 0.0099, 0.0055, 0.0123, 0.0064, 0.0063, 0.0029, 0.0042, 0.0100, 0.0153, 0.0087, 0.0078, 0.0115, 0.0070, 0.0085, 0.0068, 0.0122, 0.0074, 0.0119, 0.0087, 0.0162. Using these, we have

{\bar{\bar{X}}}_{A} = 74.00068

,

{\bar{\bar{X}}}_{B} = 74.00066

,

{\bar{S}}_{A} = 0.01010231

,

{\bar{S}}_{B} = 0.01012067

,

{\bar{S}}_{C} = 0.01030545

, and

{\bar{S}}_{D} = 0.01032266

.
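The estimates above can be recomputed from the summary statistics alone. The following is a minimal sketch that mirrors the estimator formulas coded in Appendix B; the helper `c4` and the vector names `Xbarbar` and `Sbar` are our own labels for illustration.

```r
# Summary statistics of Example 3, copied from the text above.
ni <- c(5, 3, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5, 3, 5, 3, 5, 4, 5, 5, 3, 5, 5, 5, 5, 5)
Xbari <- c(74.010, 73.996, 74.008, 74.003, 74.003, 73.996, 73.999, 73.997,
           74.004, 73.998, 73.994, 74.001, 73.994, 73.990, 74.008, 73.997,
           73.999, 74.007, 73.998, 74.008, 74.000, 74.002, 74.002, 74.005, 73.998)
Si <- c(0.0148, 0.0046, 0.0147, 0.0091, 0.0122, 0.0099, 0.0055, 0.0123,
        0.0064, 0.0063, 0.0029, 0.0042, 0.0100, 0.0153, 0.0087, 0.0078,
        0.0115, 0.0070, 0.0085, 0.0068, 0.0122, 0.0074, 0.0119, 0.0087, 0.0162)

# Normal-consistent unbiasing factor c4(n), evaluated on the log scale.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

m <- length(ni)        # number of samples
N <- sum(ni)           # total number of observations
c4ni <- c4(ni)
w <- 1 - c4ni^2

# Location estimators: unweighted (A) and sample-size weighted (B) grand means.
Xbarbar <- c(A = mean(Xbari), B = sum(ni * Xbari) / N)

# Scale estimators A-D, as coded in Appendix B.
Sbar <- c(
  A = mean(Si / c4ni),
  B = sum(Si) / sum(c4ni),
  C = sum(c4ni * Si / w) / sum(c4ni^2 / w),
  D = sqrt(sum((ni - 1) * Si^2) / (N - m)) / c4(N - m + 1)
)

print(signif(Xbarbar, 9))
print(signif(Sbar, 7))
```

Because only the rounded summary statistics are used, agreement with the reported values should be expected up to rounding.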

As in the two examples above, the control limits are calculated from these data and summarized in Table 6. In addition, the variances of

{\bar{S}}_{A}

,

{\bar{S}}_{B}

,

{\bar{S}}_{C}

,

{\bar{S}}_{D}

, and

{\bar{S}}_{E}

are obtained as

Var ({\bar{S}}_{A}) = 0.006472658 \cdot σ^{2}

,

Var ({\bar{S}}_{B}) = 0.006390116 \cdot σ^{2}

,

Var ({\bar{S}}_{C}) = 0.006020000 \cdot σ^{2}

,

Var ({\bar{S}}_{D}) = 0.005697867 \cdot σ^{2}

, and

Var ({\bar{S}}_{E}) = 0.004474206 \cdot σ^{2}

. Using these, the REs with respect to

{\bar{S}}_{E}

are calculated as

RE ({\bar{S}}_{A}) = 69.12 %

,

RE ({\bar{S}}_{B}) = 70.02 %

,

RE ({\bar{S}}_{C}) = 74.32 %

, and

RE ({\bar{S}}_{D}) = 78.52 %

.
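The variance factors and REs of this example depend only on the sample sizes. The sketch below is illustrative rather than a restatement of Equations (14)–(16), (18), and (34): the expressions are assumed forms that follow from Var(S_i) = (1 − c4(n_i)²)σ² for a normal sample, and the factor for the reference estimator is taken to be that of a single sample of size N. Both assumptions reproduce the figures reported above; the names `V` and `RE` are ours.

```r
# Sample sizes of Example 3.
ni <- c(5, 3, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5, 3, 5, 3, 5, 4, 5, 5, 3, 5, 5, 5, 5, 5)

# Normal-consistent unbiasing factor c4(n), evaluated on the log scale.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

m <- length(ni)
N <- sum(ni)
c4ni <- c4(ni)

# Variance factors (multiples of sigma^2); assumed forms, see the lead-in.
V <- c(
  A = sum((1 - c4ni^2) / c4ni^2) / m^2,
  B = sum(1 - c4ni^2) / sum(c4ni)^2,
  C = 1 / sum(c4ni^2 / (1 - c4ni^2)),
  D = 1 / c4(N - m + 1)^2 - 1,
  E = 1 / c4(N)^2 - 1          # assumption: single sample of size N
)

# Relative efficiency (in percent) with respect to the reference estimator E.
RE <- 100 * V[["E"]] / V[c("A", "B", "C", "D")]

print(signif(V, 7))
print(round(RE, 2))
```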

8. Conclusions

In this paper, we have considered several unbiased location and scale estimators for the process parameters of the

\bar{X}

and S control charts when the sample sizes are not necessarily equal. These estimators are essential for constructing the control limits of the Shewhart-type control charts. A natural question is: among these unbiased estimators, which one should be recommended in practical applications? We answered this question by establishing, through rigorous proofs, the inequality relations among the variances of these estimators. We also showed that the conventional ad hoc methods can degrade the performance of the control charts, mainly because the adopted estimators are all biased and tend to underestimate the true scale parameter.

We also reported the relative efficiencies of the proposed and conventional methods, together with empirical estimates of the ARL and the SDRL obtained from extensive Monte Carlo simulations. We observed that the chart based on

{\bar{S}}_{D}

outperforms the others under consideration from both theoretical and numerical points of view. The only difficulty in using

{\bar{S}}_{D}

lies in calculating the normal-consistent unbiasing factor

c_{4}

for a large argument (large sample size) without professional software. To resolve this problem, we provided an approximation of

c_{4}

as a function of the sample size, which can be easily evaluated with an ordinary calculator even for a large argument and is accurate to within one unit in the ninth decimal place for n > 25.

The theoretical results reveal an interesting and useful connection between statistics and mathematics. For example, the normal-consistent unbiasing factor

c_{4}

can be expressed through the Watson representation, which helps one understand the behavior of

c_{4}

in depth. We thus expect that the new findings about the

c_{4}

can help quality engineers develop more useful results in various engineering statistics fields.

Note that all the theoretical and empirical results of this paper rely on the assumption of homoscedasticity, that is, a common

σ

for all samples. In ongoing work, we are investigating these estimators more thoroughly for the heteroscedastic case.

Author Contributions

C.P. developed methodology and R functions; M.W. investigated mathematical formulas; and C.P. and M.W. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2017R1A2B4004169).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CL	center line
LCL	lower control limit
UCL	upper control limit
ARL	average run length
SDRL	standard deviation of run length
FAR	false alarm rate
MSE	mean square error
BLUE	best linear unbiased estimator
UMVUE	uniformly minimum variance unbiased estimator
RE	relative efficiency

Appendix A. Calculation of $c_{4} (n)$

Many textbooks provide the table for the values of

c_{4} (n)

. However, to our knowledge, these tables are limited to

n \leq 25

. To construct the control charts using the proposed methods, it is important to calculate

c_{4} (n)

accurately, especially for

n > 25

.

The calculation of

c_{4} (n)

requires evaluating the gamma function or a factorial, which overflows for a large value of n. To avoid this problem, one can use the log-gamma function to calculate

c_{4} (n)

, or an approximation technique for

c_{4} (n)

can be used for an easier and simpler calculation. For example, two well-known approximations below are widely used for

c_{4} (n)

:

c_{4 a} (n) \approx \frac{4 n - 4}{4 n - 3} \quad \text{and} \quad c_{4 b} (n) \approx \sqrt{\frac{4 n - 5}{4 n - 3}} .

For more details, refer to Chapter 3 of ASTM [10]. These approximations are based on Stirling’s formula [43] and are accurate to within one unit in the fourth and fifth decimal places for

n > 25

, respectively.
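The overflow issue and the two classical approximations can be illustrated with a short sketch. The log-gamma expression is the one used in Appendix B; the function names `c4_naive`, `c4`, `c4a`, and `c4b` are our own labels.

```r
# Direct evaluation through gamma(): breaks down once gamma() overflows.
c4_naive <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# Log-scale evaluation, as in the Appendix B code: stable for large n.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# The two classical approximations quoted above.
c4a <- function(n) (4 * n - 4) / (4 * n - 3)
c4b <- function(n) sqrt((4 * n - 5) / (4 * n - 3))

n <- c(10, 30, 400)
res <- data.frame(
  n         = n,
  naive     = suppressWarnings(c4_naive(n)),  # not finite for n = 400
  log_gamma = c4(n),
  c4a       = c4a(n),
  c4b       = c4b(n)
)
print(res, digits = 10)
```

For n = 400 the direct ratio of gamma functions is no longer finite in double precision, whereas the log-gamma form still returns a value just below one.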

However, a better approximation can be obtained using Wallis’ inequality. For more details on Wallis’ inequality, one can refer to Wallis [15], Stedall [44], and Kazarinoff [45]. Here, we provide two approximations such that

\begin{matrix} c_{4 c} (n) & \approx \sqrt{\frac{1}{n - 1}} \cdot \sqrt[4]{n^{2} - 3 n + \frac{5}{2}} \end{matrix}

and

\begin{matrix} c_{4 d} (n) & \approx \sqrt{\frac{1}{n - 1}} \cdot \sqrt[8]{n^{4} - 6 n^{3} + 14 n^{2} - 15 n + 6}, \end{matrix}

which are accurate within one unit in the seventh and ninth decimal places, respectively, for

n > 25

. For more details, see Mortici [16].

To compare the approximations above with the true value, we calculate the relative error times

10^{6}

of each approximation which is given by

ϵ_{j} (n) = |\frac{c_{4 j} (n) - c_{4} (n)}{c_{4} (n)}| \times 10^{6},

where

j = a, b, c, d

. The relative errors for

n = 10, 20, 30, 40, 50

are calculated and summarized in Table A1. As shown in the table,

c_{4 d} (n)

provides the best approximation and the accuracy gets better for a larger value of n as expected. We can also observe that the approximation is quite accurate even for smaller values of n so that it can be practically used for

n \geq 10

, say. This approximation can be useful for field engineers and practitioners because

c_{4 d} (n)

can be calculated using a regular calculator.
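The relative errors of the four approximations can be recomputed with a few lines of R. This is a minimal sketch in which the exact value is taken from the log-gamma expression; the names `approximations` and `eps` are ours. For the most accurate approximation at larger n, the relative error approaches the limits of double-precision arithmetic, so only the leading digits should be trusted.

```r
# Exact c4(n) on the log scale.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

# The four approximations given in this appendix.
approximations <- list(
  a = function(n) (4 * n - 4) / (4 * n - 3),
  b = function(n) sqrt((4 * n - 5) / (4 * n - 3)),
  c = function(n) sqrt(1 / (n - 1)) * (n^2 - 3 * n + 5 / 2)^(1 / 4),
  d = function(n) sqrt(1 / (n - 1)) *
        (n^4 - 6 * n^3 + 14 * n^2 - 15 * n + 6)^(1 / 8)
)

n <- c(10, 20, 30, 40, 50)

# Relative error times 10^6; one row per approximation, one column per n.
eps <- t(sapply(approximations, function(f) abs((f(n) - c4(n)) / c4(n)) * 1e6))
colnames(eps) <- n

print(signif(eps, 6))
```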

Table A1. The relative errors

ϵ_{j} (n)

for

n = 10, 20, 30, 40, 50

.

n	10	20	30	40	50
$c_{4} (n)$	$\frac{128}{105} \sqrt{\frac{2}{π}}$	$\frac{65536}{230945} \sqrt{\frac{38}{π}}$	$\frac{33554432}{145422675} \sqrt{\frac{58}{π}}$	$\frac{34359738368}{172308161025} \sqrt{\frac{78}{π}}$	$\frac{70368744177664}{56433306445425} \sqrt{\frac{2}{π}}$
$ϵ_{a} (n)$	322.5166918	79.761632484	35.2405932138	19.7566428390	12.6174044235
$ϵ_{b} (n)$	63.4846795	6.814116424	1.9195517402	0.7896677690	0.3982547452
$ϵ_{c} (n)$	5.7906964	0.264861432	0.0472214077	0.0141995968	0.0056418668
$ϵ_{d} (n)$	0.1547649	0.001535605	0.0001158804	0.0000191278	0.0000047817

Appendix B. R Codes for Illustrative Examples

# Control limits (LCL, CL, UCL) of the X-bar chart for the scale estimators A-D.
#   Xbari, Si, ni: sample means, sample standard deviations, and sample sizes
#   nk: size of the sample to be monitored
#   CL: grand-mean estimator ("XB" weighted by sample size, "XA" unweighted)
#   FAR: false alarm rate
Xbarchart = function(Xbari,Si,ni,nk,CL=c("XB","XA"),FAR=0.002699796){
  CL = match.arg(CL)
  if (CL=="XA") {
    Xbarbar = mean(Xbari)
  } else if (CL=="XB") {
    Xbarbar = sum(ni*Xbari) / sum(ni)
  } else {
    stop("Choose the Xbarbar: \"XA\" or \"XB\".")
  }
  N = sum(ni)
  m = length(ni)
  # c4(ni), evaluated with the log-gamma function to avoid overflow
  c4ni = sqrt(2/(ni-1))*exp(lgamma(ni/2) - lgamma((ni-1)/2))
  one.minus.c4sq = 1-c4ni^2
  # Scale estimators A, B, C, and D
  S = numeric(4)
  S[1] = sum(Si/c4ni) / m
  S[2] = sum(Si) / sum(c4ni)
  S[3] = sum(c4ni*Si/one.minus.c4sq) / sum(c4ni^2/one.minus.c4sq)
  c4Nm1 = sqrt(2/(N-m))*exp(lgamma((N-m+1)/2) - lgamma((N-m)/2))
  S[4] = sqrt( sum((ni-1)*Si^2) / (N-m) ) / c4Nm1
  z.cut = qnorm(1-FAR/2)
  OUT = array(dim=c(4,3))
  for ( i in 1:4 ) {
    OUT[i,] = Xbarbar + c(-z.cut*S[i]/sqrt(nk),0,z.cut*S[i]/sqrt(nk))
  }
  colnames(OUT) = c("LCL", "CL", "UCL")
  rownames(OUT) = c("A", "B", "C", "D")
  return(OUT)
}

# Control limits (LCL, CL, UCL) of the S chart for the scale estimators A-D.
#   Si, ni: sample standard deviations and sample sizes
#   nk: size of the sample to be monitored; FAR: false alarm rate
Schart = function(Si,ni,nk,FAR=0.002699796){
  z.cut = qnorm(1-FAR/2)
  # c4 for the monitored sample size nk and for each observed sample size ni
  c4nk = sqrt(2/(nk-1))*exp(lgamma(nk/2) - lgamma((nk-1)/2))
  c4ni = sqrt(2/(ni-1))*exp(lgamma(ni/2) - lgamma((ni-1)/2))
  N = sum(ni)
  m = length(ni)
  one.minus.c4sq = 1-c4ni^2
  # Scale estimators A, B, C, and D
  S = numeric(4)
  S[1] = sum(Si/c4ni) / m
  S[2] = sum(Si) / sum(c4ni)
  S[3] = sum(c4ni*Si/one.minus.c4sq) / sum(c4ni^2/one.minus.c4sq)
  c4Nm1 = sqrt(2/(N-m))*exp(lgamma((N-m+1)/2) - lgamma((N-m)/2))
  S[4] = sqrt( sum((ni-1)*Si^2) / (N-m) ) / c4Nm1
  OUT = array(dim=c(4,3))
  for ( i in 1:4 ) {
    CL = c4nk*S[i]
    # The lower limit is truncated at zero
    LCL = max(CL - z.cut*sqrt(1-c4nk^2)*S[i], 0)
    UCL = CL + z.cut*sqrt(1-c4nk^2)*S[i]
    OUT[i,] = c(LCL, CL, UCL)
  }
  colnames(OUT) = c("LCL", "CL", "UCL")
  rownames(OUT) = c("A", "B", "C", "D")
  return(OUT)
}
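As a small self-contained illustration of the limit formulas coded above, the sketch below recomputes the limits of Method D for Example 3 with a monitored sample of size 5. It starts from the rounded estimates reported in Example 3 (the weighted grand mean 74.00066 and the estimate 0.01032266) rather than from the raw data, so the results agree with Table 6 only up to rounding. The variable names are ours.

```r
FAR   <- 0.002699796               # false alarm rate (3-sigma limits)
z.cut <- qnorm(1 - FAR / 2)

# Normal-consistent unbiasing factor c4(n), evaluated on the log scale.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

Xbarbar <- 74.00066                # weighted grand mean reported in Example 3
S_D     <- 0.01032266              # scale estimate D reported in Example 3
nk      <- 5                       # size of the sample to be monitored
c4nk    <- c4(nk)

# X-bar chart: center line plus or minus z.cut * S / sqrt(nk).
xbar_limits <- Xbarbar + c(LCL = -1, CL = 0, UCL = 1) * z.cut * S_D / sqrt(nk)

# S chart: center line c4(nk) * S, half-width z.cut * sqrt(1 - c4(nk)^2) * S,
# with the lower limit truncated at zero.
s_CL     <- c4nk * S_D
s_half   <- z.cut * sqrt(1 - c4nk^2) * S_D
s_limits <- c(LCL = max(s_CL - s_half, 0), CL = s_CL, UCL = s_CL + s_half)

print(round(xbar_limits, 5))
print(signif(s_limits, 7))
```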

References

1. Shewhart, W.A. Quality Control Charts. Bell Syst. Tech. J. 1926, 5, 593–603.
2. Shewhart, W.A. Quality Control. Bell Syst. Tech. J. 1927, 6, 722–735.
3. Shewhart, W.A. Economic Control of Quality of Manufactured Product; Van Nostrand Reinhold: Princeton, NJ, USA, 1931.
4. Kourti, T.; MacGregor, J.F. Multivariate SPC Methods for Process and Product Monitoring. J. Qual. Technol. 1996, 28, 409–428.
5. Flores, M.; Fernández-Casal, R.; Naya, S.; Tarrío-Saavedra, J.; Bossano, R. ILS: An R package for statistical analysis in Interlaboratory Studies. Chemom. Intell. Lab. Syst. 2018, 181, 11–20.
6. Golshan, M.; MacGregor, J.F.; Bruwer, M.J.; Mhaskar, P. Latent Variable Model Predictive Control (LV-MPC) for trajectory tracking in batch processes. J. Process Control 2010, 20, 538–550.
7. Flores, M. qcr: Quality Control Review; R Package Version 1.2. Available online: https://cran.r-project.org/package=qcr (accessed on 21 April 2020).
8. Flores, M.; Naya, S.; Fernández-Casal, R.; Zaragoza, S.; Raña, P.; Tarrío-Saavedra, J. Constructing a Control Chart Using Functional Data. Mathematics 2020, 8, 58.
9. Montgomery, D.C. Statistical Quality Control: A Modern Introduction, 7th ed.; John Wiley & Sons: Singapore, 2013.
10. ASTM E11. Manual on Presentation of Data and Control Chart Analysis, 9th ed.; Luko, S.N., Ed.; American Society for Testing and Materials: West Conshohocken, PA, USA, 2018.
11. ASQC. ASQC Standard A-1 (Proposed): Definitions, Symbols, Formulas and Tables for Control Charts. Ind. Qual. Control 1967, 24, 217–221.
12. Vardeman, S.B. A brief tutorial on the estimation of the process standard deviation. IIE Trans. 1999, 31, 503–507.
13. Burr, I.W. Control Charts for Measurements with Varying Sample Sizes. J. Qual. Technol. 1969, 1, 163–167.
14. Watson, G.N. A Note on Gamma Functions. Edinb. Math. Notes 1959, 42, 7–9.
15. Wallis, J. Arithmetica Infinitorum; University of Oxford: Oxford, UK, 1656.
16. Mortici, C. New approximation formulas for evaluating the ratio of gamma functions. Math. Comput. Model. 2010, 52, 425–433.
17. Chebyshev, P.L. Sur les expressions approximatives des intégrales définies par les autres prises entre les même limites. In Oeuvres de P. L. Tchebychef I–II, Vol. 2; Markov, A.A., Sonin, N., Eds.; Imprimerie de l’Academie Imperiale des Sciences: St. Petersbourg, Russia, 1882; pp. 716–719.
18. Chebyshev, P.L. Sur une série qui fournit les valeurs extrêmes des intégrales, lorsque la fonction sous le signe est décomposée en deux facteurs. In Oeuvres de P. L. Tchebychef I–II, Vol. 2; Markov, A.A., Sonin, N., Eds.; Imprimerie de l’Academie Imperiale des Sciences: St. Petersbourg, Russia, 1883; pp. 405–419.
19. Besenyei, A. Picard’s Weighty Proof of Chebyshev’s Sum Inequality. Math. Mag. 2018, 91, 366–371.
20. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: London, UK, 1934.
21. Bustoz, J.; Ismail, M.E.H. On Gamma Function Inequalities. Math. Comput. 1986, 47, 659–667.
22. Feller, W. An Introduction to Probability Theory and Its Applications, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1966; Volume II.
23. Merkle, M. Completely Monotone Functions: A Digest. In Analytic Number Theory, Approximation Theory, and Special Functions; Milovanović, G.V., Rassias, M.T., Eds.; Springer: New York, NY, USA, 2014; pp. 347–364.
24. Fink, A. Kolmogorov-Landau inequalities for monotone functions. J. Math. Anal. Appl. 1982, 90, 251–258.
25. Niculescu, C.P.; Persson, L.E. Convex Functions and Their Applications: A Contemporary Approach; Springer: New York, NY, USA, 2006.
26. Van Haeringen, H. Inequalities for Real Powers of Completely Monotonic Functions. J. Math. Anal. Appl. 1997, 210, 102–113.
27. Merkle, M. Logarithmic Convexity and Inequalities for the Gamma Function. J. Math. Anal. Appl. 1996, 203, 369–380.
28. Schilling, R.L. Measures, Integrals and Martingales; Cambridge University Press: Cambridge, UK, 2005.
29. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury: Pacific Grove, CA, USA, 2002.
30. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
31. Lehmann, E.L. Elements of Large-Sample Theory; Springer: New York, NY, USA, 1999.
32. Gwanyama, P.W. The HM-GM-AM-QM Inequalities. Coll. Math. J. 2004, 35, 47–50.
33. Park, C.; Leeds, M. A Highly Efficient Robust Design Under Data Contamination. Comput. Ind. Eng. 2016, 93, 131–142.
34. Park, C.; Ouyang, L.; Byun, J.H.; Leeds, M. Robust design under normal model departure. Comput. Ind. Eng. 2017, 113, 206–220.
35. Ouyang, L.; Park, C.; Byun, J.H.; Leeds, M. Robust Design in the Case of Data Contamination and Model Departure. In Statistical Quality Technologies: Theory and Practice (ICSA Book Series in Statistics); Lio, Y., Ng, H., Tsai, T.R., Chen, D.G., Eds.; Springer: Cham, Switzerland, 2019; pp. 347–373.
36. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2003.
37. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Prentice-Hall Inc.: Englewood Cliffs, NJ, USA, 2007.
38. Vining, G. Technical Advice: Phase I and Phase II Control Charts. Qual. Eng. 2009, 21, 478–479.
39. Arnold, S.F. Mathematical Statistics; Prentice-Hall: Englewood Cliffs, NJ, USA, 1990.
40. Ryan, T.P. Statistical Methods For Quality Improvement, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2000.
41. Gentleman, R.; Ihaka, R. The R language. In Proceedings of the 28th Symposium on the Interface; Billard, L., Fisher, N., Eds.; The Interface Foundation of North America: Fairfax Station, VA, USA, 1991.
42. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020; Available online: http://www.r-project.org (accessed on 29 April 2020).
43. Robbins, H. A Remark on Stirling’s Formula. Am. Math. Mon. 1955, 62, 26–29.
44. Stedall, J.A. Arithmetica Infinitorum: John Wallis 1656; Springer: New York, NY, USA, 2004; (English Translation from the Original Book Written in Latin).
45. Kazarinoff, D.K. On Wallis’ formula. Edinb. Math. Notes 1956, 40, 19–21.

Figure 1. (a)

n_{1} = n_{2} = n_{3} = 3

, (b)

n_{1} = n_{2} = n_{3} = 10

, and (c)

n_{1} = n_{2} = n_{3} = 20

.

Figure 2. (a)

n_{1} = 3, n_{2} = 5, n_{3} = 7

, (b)

n_{1} = 5, n_{2} = 10, n_{3} = 15

, and (c)

n_{1} = 10, n_{2} = 20, n_{3} = 30

.

Table 1. The empirical bias, variance, MSE, and RE with an equal sample size.

	${\bar{S}}_{A}$	${\bar{S}}_{B}$	${\bar{S}}_{C}$	${\bar{S}}_{D}$	${\bar{S}}_{E}$	$\bar{S}$	${\bar{S}}^{*}$	${\bar{S}}_{w}$
	( $n_{1} = n_{2} = n_{3} = 3$ )
Bias	0.0005	0.0005	0.0005	0.0004	0.0008	$- 1.1372$	0.0005	$- 1.1372$
Var	9.1087	9.1087	9.1087	8.6503	6.4354	7.1539	9.1087	7.1539
MSE	9.1087	9.1087	9.1087	8.6503	6.4354	8.4473	9.1087	8.4473
RE	0.7065	0.7065	0.7065	0.7439	1.0000	0.7618	0.7065	0.7618
	( $n_{1} = n_{2} = n_{3} = 10$ )
Bias	0.0001	0.0001	0.0001	0.0002	0.0002	$- 0.2733$	0.0001	$- 0.2733$
Var	1.9007	1.9007	1.9007	1.8689	1.7388	1.7982	1.9007	1.7982
MSE	1.9007	1.9007	1.9007	1.8689	1.7388	1.8729	1.9007	1.8729
RE	0.9148	0.9148	0.9148	0.9304	1.0000	0.9284	0.9148	0.9284
	( $n_{1} = n_{2} = n_{3} = 20$ )
Bias	0.0001	0.0001	0.0001	0.0001	0.0002	$- 0.1305$	0.0001	$- 0.1305$
Var	0.8885	0.8885	0.8885	0.8811	0.8511	0.8655	0.8885	0.8655
MSE	0.8885	0.8885	0.8885	0.8811	0.8511	0.8825	0.8885	0.8825
RE	0.9578	0.9578	0.9578	0.9660	1.0000	0.9644	0.9578	0.9644

Table 2. The empirical bias, variance, MSE, and RE with unequal sample sizes.

	${\bar{S}}_{A}$	${\bar{S}}_{B}$	${\bar{S}}_{C}$	${\bar{S}}_{D}$	${\bar{S}}_{E}$	$\bar{S}$	${\bar{S}}^{*}$	${\bar{S}}_{w}$
	( $n_{1} = 3$ , $n_{2} = 5$ , $n_{3} = 7$ )
Bias	0.0008	0.0008	0.0009	0.0009	0.0007	$- 0.7140$	$- 0.1211$	$- 0.6164$
Var	5.4627	5.2936	4.3850	4.2500	3.6337	4.5640	5.1653	3.8867
MSE	5.4627	5.2936	4.3850	4.2500	3.6337	5.0738	5.1800	4.2667
RE	0.6652	0.6864	0.8287	0.8550	1.0000	0.7162	0.7015	0.8517
	( $n_{1} = 5$ , $n_{2} = 10$ , $n_{3} = 15$ )
Bias	0.0006	0.0006	0.0003	0.0003	0.0002	$- 0.3496$	$- 0.0783$	$- 0.2792$
Var	2.5013	2.4512	1.8990	1.8685	1.7388	2.2825	2.4126	1.7990
MSE	2.5013	2.4512	1.8990	1.8685	1.7388	2.4047	2.4188	1.8770
RE	0.6952	0.7094	0.9157	0.9306	1.0000	0.7231	0.7189	0.9264
	( $n_{1} = 10$ , $n_{2} = 20$ , $n_{3} = 30$ )
Bias	$- 0.0001$	$- 0.0001$	0.0001	0.0001	0.0002	$- 0.1634$	$- 0.0331$	$- 0.1319$
Var	1.1229	1.1138	0.8885	0.8811	0.8511	1.0777	1.1064	0.8657
MSE	1.1229	1.1138	0.8885	0.8811	0.8511	1.1044	1.1075	0.8831
RE	0.7579	0.7641	0.9579	0.9659	1.0000	0.7706	0.7684	0.9637

Table 3. Estimated ARL and SDRL with

n_{k} = 10

.

	A	B	C	D	$\bar{S}$	${\bar{S}}^{*}$	${\bar{S}}_{w}$
		Scenario I
ARL	475.03	456.02	363.61	361.84	257.78	343.39	270.79
SDRL	1301.18	1184.11	536.61	531.45	499.21	777.03	387.18
		Scenario II
ARL	390.41	388.19	364.59	362.56	269.79	357.12	274.27
SDRL	654.28	643.59	540.99	533.90	421.37	586.10	391.26
		Scenario III
ARL	370.63	370.25	363.84	361.77	273.95	361.84	275.39
SDRL	565.01	563.36	538.59	530.64	400.49	549.63	392.54
		Scenario IV
ARL	364.28	364.23	363.63	361.77	275.28	363.39	275.32
SDRL	539.37	539.16	537.54	531.26	393.40	537.95	392.44
		Scenario V (equal sample size)
ARL	364.36	364.36	364.36	362.58	275.95	364.36	275.95
SDRL	541.45	541.45	541.45	537.31	395.12	541.45	395.12

Table 4. Control limits for the

\bar{X}

and S charts for Example 1.

	$\bar{X}$ Chart			S Chart
Method	LCL	CL	UCL	LCL	CL	UCL
			sample size, $n_{k} = 25$
A	51.74785	53.80	55.85215	1.911697	3.384818	4.857940
B	51.74785	53.80	55.85215	1.911699	3.384822	4.857945
C	51.75669	53.80	55.84331	1.903462	3.370238	4.837013
D	51.70537	53.80	55.89463	1.951272	3.454889	4.958505
			sample size, $n_{k} = 50$
A	52.34891	53.80	55.25109	2.369028	3.402846	4.436665
B	52.34891	53.80	55.25109	2.369030	3.402850	4.436669
C	52.35516	53.80	55.24484	2.358823	3.388188	4.417553
D	52.31887	53.80	55.28113	2.418070	3.473290	4.528509
			sample size, $n_{k} = 100$
A	52.77392	53.80	54.82608	2.683351	3.411625	4.139899
B	52.77392	53.80	54.82608	2.683354	3.411629	4.139903
C	52.77834	53.80	54.82166	2.671792	3.396929	4.122065
D	52.75268	53.80	54.84732	2.738900	3.482250	4.225600

Table 5. Control limits for the

\bar{X}

and S charts for Example 2.

	$\bar{X}$ Chart			S Chart
Method	LCL	CL	UCL	LCL	CL	UCL
			sample size, $n_{k} = 4$
A	70.32195	71.65243	72.98291	0	0.8171958	1.851804
B	70.32314	71.65243	72.98171	0	0.8164609	1.850139
C	70.33799	71.65243	72.96687	0	0.8073440	1.829480
D	70.13042	71.65243	73.17444	0	0.9348355	2.118381
			sample size, $n_{k} = 5$
A	70.46241	71.65243	72.84244	0	0.8337539	1.741710
B	70.46348	71.65243	72.84137	0	0.8330041	1.740144
C	70.47676	71.65243	72.82810	0	0.8237026	1.720713
D	70.29110	71.65243	73.01375	0	0.9537773	1.992439

Table 6. Control limits for the

\bar{X}

and S charts for Example 3.

	$\bar{X}$ Chart			S Chart
Method	LCL	CL	UCL	LCL	CL	UCL
			sample size, $n_{k} = 3$
A	73.98317	74.00066	74.01816	0	0.008952937	0.02299266
B	73.98313	74.00066	74.01819	0	0.008969207	0.02303445
C	73.98281	74.00066	74.01851	0	0.009132968	0.02345501
D	73.98278	74.00066	74.01854	0	0.009148216	0.02349417
			sample size, $n_{k} = 4$
A	73.98551	74.00066	74.01582	0	0.009307435	0.02109109
B	73.98548	74.00066	74.01584	0	0.009324349	0.02112941
C	73.98521	74.00066	74.01612	0	0.009494595	0.02151520
D	73.98518	74.00066	74.01615	0	0.009510446	0.02155112
			sample size, $n_{k} = 5$
A	73.98711	74.00066	74.01422	0	0.009496024	0.01983717
B	73.98709	74.00066	74.01424	0	0.009513281	0.01987322
C	73.98684	74.00066	74.01449	0	0.009686976	0.02023607
D	73.98681	74.00066	74.01451	0	0.009703148	0.02026986

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
