1. Introduction
When there is more than one quality characteristic to be monitored in a process, multivariate control charts can be employed for process monitoring. In a multivariate normal process, there are two process parameters, namely the mean vector and the variance–covariance matrix. Simultaneous monitoring of these parameters is usually preferred over monitoring only one of them because of its better overall performance. There are two main schemes for simultaneously monitoring the process parameters: (i) a single-chart scheme and (ii) a double-chart scheme. In a single-chart scheme, one chart is used for both process parameters, whereas in a double-chart scheme, one chart is employed for each process parameter. A single-chart scheme is usually preferred because of its simplicity: only one control chart needs to be administered, whereas a double-chart scheme requires two control charts to be administered at the same time. Reynolds and Gyo-Young [1], Hawkins and Maboudou-Tchao [2], and Zhang and Chang [3] studied double-chart cases, and Khoo [4], Zhang et al. [5], Wang et al. [6], Sabahno et al. [7,8,9], Sabahno and Khoo [10], and Sabahno [11] investigated single-chart cases. Sabahno et al. [9] proposed new memory-less statistical control charts with fixed and adaptive design parameters (sample size, sampling interval, and control limits) for the simultaneous monitoring of multivariate normal process parameters. They used two statistics, each monitoring one process parameter, but ultimately combined them into a single statistic.
Machine-learning (ML) techniques have been extensively used for process monitoring with control charts for different purposes, such as dimension reduction, pattern recognition, change-point estimation, signal detection, identification, and fault diagnosis. The artificial neural network (ANN) is the most widely used ML technique in this regard. Some of the most notable research employing ML techniques in control charts was conducted by Chang and Ho [12], Niaki and Abbasi [13,14], Cheng and Cheng [15], Abbasi [16], Salehi et al. [17], Hosseinifard et al. [18], Weese et al. [19], Escobar and Morales-Menendez [20], Apsemidis et al. [21], Mohd Amiruddin et al. [22], Diren et al. [23], Yeganeh et al. [24], Mohammadzadeh et al. [25], Sabahno and Amiri [26], and Yeganeh et al. [27,28,29].
ML structures themselves have rarely been used to construct control charts in the literature. Niaki and Abbasi [14] developed a perceptron neural network for monitoring and classifying mean shifts in multi-attribute processes. Hosseinifard et al. [18] developed three ANN control charts for monitoring simple linear profile parameters (the intercept, the slope, and the residual variance). One of their control charts handled both detection and identification, while the other two were used only for detection. Mohammadzadeh et al. [25] developed an SVR (support vector regression) control chart for monitoring a logistic profile by extending Hosseinifard et al.'s [18] work. Sabahno and Amiri [26] developed different statistical and machine-learning-based control charts with fixed and variable design parameters to monitor generalized linear regression profiles. Yeganeh et al. [27] extended Hosseinifard et al.'s [18] work to social network surveillance. Yeganeh et al. [28] proposed an ANN-based control chart to monitor binary surgical outcomes, while in another study, Yeganeh et al. [29] proposed ML-based control charts for monitoring autocorrelated profiles.
The previously mentioned papers utilized machine-learning techniques for regression to construct control charts. While our approach in this paper is similar to theirs, by proposing a special input set for the ML structures (our first input scenario), we extend them to multivariate processes and enable simultaneous monitoring of their parameters. Moreover, while they each employed a single ML structure to build their control charts, we utilize multiple machine-learning techniques and extensively compare them in different scenarios. Additionally, all the ML control charts mentioned above trained their ML structures using only a random shift size to represent out-of-control data; in contrast, we employ an alternative method in addition to this approach.
In this paper, for the first time in the literature, ANN-, SVM (support vector machine)-, and RF (random forest)-based control charts are proposed for the simultaneous monitoring of multivariate normal process parameters. We use two different input scenarios (in one of them, for the first time, the two statistics used by Sabahno et al. [9] are utilized as the inputs) and two different control scenarios (detection and detection–identification). We also use two training methods to see which type of dataset suits each ML structure better (in one, we train the ML structures with small shifts only, and in the other, for the first time, we train them with both small and large shifts). The ML control charts are developed for the cases of two, three, and four quality characteristics, and their performances are compared with one another and with the statistical control charts proposed by Sabahno et al. [9].
This paper is structured as follows. In Section 2, the machine-learning control charts are developed. In Section 3, different models are developed, and then extensive numerical analyses in each scenario under different separate and simultaneous shift sizes are conducted. In Section 4, an illustrative example is presented. In Section 5, concluding remarks and suggestions for future research are discussed.
2. Machine-Learning Control Charts
In this section, three ML control charts are proposed. After obtaining the ML structure for each control chart, which is explained later in Section 3, the upper control limit (UCL) of each control chart is obtained using the algorithm given below. First, note that the design parameters of a control chart are the sample size n, the sampling interval t, and the probability of type-I error α.
The monitoring strategy for each ML control chart is as follows:
If at sample i, the ML structure's output ≤ UCL, then the process is in control;
If at sample i, the ML structure's output > UCL, then the process is declared out of control.
The following algorithm can be used to compute the UCLs:
- Step 1.
Choose a value for α and n;
- Step 2.
Choose and train an ML structure;
- Step 3.
Obtain the initial value of the UCL by generating 10,000 in-control samples, running them through the ML structure, sorting the resulting outputs in ascending order, and choosing the [10,000(1 − α)]th value in the range;
- Step 4.
Run 10,000 simulations and adjust the UCL so that the average of the 10,000 run lengths (ARL) equals 1/α.
The best way to estimate the UCL is to employ the above algorithm. However, it can be very time- and energy-consuming, especially when many numerical analyses are conducted and many UCLs must be estimated for different parameter settings. An easier way with almost the same accuracy is to obtain the value of the UCL by generating 10,000 in-control samples, sorting their ML outputs in ascending order, and choosing the [10,000(1 − α)]th value in the range; this is repeated 10,000 times, and the average of these 10,000 values is taken as the UCL. This proposed approach works very well for memory-less schemes in most cases; however, it is better to confirm its result using Step 4 of the above algorithm.
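To make the simplified procedure concrete, the following is a minimal R sketch. The names fit (a trained ML regression structure) and gen_incontrol_inputs() (a function returning one in-control sample's input values as a one-row data frame) are hypothetical placeholders, and the input columns must match whatever inputs the structure was trained on.

```r
# Sketch of the simplified UCL estimation: take the empirical (1 - alpha)
# quantile of the ML outputs over 10,000 in-control samples, repeat the
# procedure, and average the resulting quantiles.
estimate_ucl <- function(fit, gen_incontrol_inputs, alpha = 0.005,
                         n_samples = 10000, n_repeats = 10000) {
  ucl_values <- replicate(n_repeats, {
    newdata <- do.call(rbind, replicate(n_samples, gen_incontrol_inputs(),
                                        simplify = FALSE))  # in-control inputs
    outputs <- as.numeric(predict(fit, newdata))            # ML structure's outputs
    sort(outputs)[floor(n_samples * (1 - alpha))]           # the [10,000(1 - alpha)]th value
  })
  mean(ucl_values)  # average of the repeated quantile estimates
}
```

With the default arguments this is computationally heavy; a smaller number of repeats can be used for a quick approximation before confirming the result with Step 4 of the algorithm.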
Instead of defining a statistic and developing a control chart based on it, one can use supervised ML techniques. Since in this research we only need linear (continuous) outputs from the ML techniques (because, as mentioned before, we then apply a control-limit rule to their linear outputs to classify the process as in or out of control), the types of ML techniques that can be used are limited, as not all ML techniques are capable of regression. Fortunately, however, the most popular ML techniques, namely ANN, SVM, and RF, are all capable of generating linear (regression) outputs.
In the following subsections, each of the ML techniques is briefly described. It should also be noted that after the ML structures are trained, they are tested and validated by checking that the desired in-control performance of the control chart is achieved; this is explained later in Section 3.
2.1. ANN Control Chart
The ANN is one of the most popular ML techniques and mimics human brain activity. It has one input layer, one or more hidden layers, and one output layer, and each layer contains at least one node. ANNs with more than one hidden layer are called deep neural networks (deep learning). ANNs can be used for both classification and regression. Determining the number of nodes in each layer and the number of hidden layers is very important in ANNs and is usually carried out by trial and error. For simple problems, one hidden layer usually works best. Regarding the number of nodes in the hidden layer, although there are some rules of thumb, there is no solid rule, and the number of nodes should be determined so as to obtain the desired performance.
Another problem in ANNs is determining the optimal values of the connection weights and node biases. Several optimization algorithms are used in ANNs for this purpose, among which are gradient descent, stochastic gradient descent, mini-batch gradient descent, Broyden–Fletcher–Goldfarb–Shanno (BFGS), momentum, Nesterov accelerated gradient, Adagrad, AdaDelta, and Adam. In this research, we use the BFGS method, a quasi-Newton algorithm, simply because it is the only optimization method that our selected R package (the 'nnet' package) uses (for more information about the computer package, refer to Section 3). The BFGS method overcomes some of the weaknesses of plain gradient descent by seeking a stationary point of the cost function; the cost function, in this case, is the mean-squared error (MSE) for regression problems and the cross-entropy (negative log-likelihood) for classification problems.
In this paper, the ANN technique is used for regression, and the trained ANN structure's output for sample i is ANNi. The different input scenarios, training methods, and output (control) scenarios that we use in this research for each ANN structure are explained in Section 3.
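As an illustration of how such a structure can be trained with the 'nnet' package, the sketch below assumes a hypothetical data frame train_df with input columns T2 and W (the first input scenario of Section 3) and a response y equal to 0 for in-control and 1 for out-of-control data; new_sample_df holds the inputs computed from a newly drawn sample.

```r
library(nnet)

# Sketch: training an ANN regression structure on labeled in/out-of-control data.
ann_fit <- nnet(y ~ T2 + W, data = train_df,
                size = 4,       # nodes in the single hidden layer
                linout = TRUE,  # linear (regression) output instead of logistic
                maxit = 500)    # iteration limit for the BFGS-based optimizer

# The chart statistic for a new sample is simply the trained structure's output.
ann_output <- as.numeric(predict(ann_fit, newdata = new_sample_df))
```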
2.2. SVR Control Chart
The SVM is a kernel-based method and a powerful classification and regression technique. It can be used for classification, novelty detection (one-class classification), and regression (in which case it is called SVR). By ignoring errors smaller than a certain threshold and creating a tube around the true output, the SVM can perform regression.
SVMs usually work in two main steps: (i) transformation of the input space into a higher-dimensional feature space through a non-linear mapping function; and (ii) construction of the separating hyperplane with maximum distance from the closest points (called support vectors) of the training set. It has been shown that maximizing the margin of separation improves the generalization ability of the classifier/regressor. Training an SVM amounts to solving a quadratic optimization problem.
In SVMs, the calculation of dot products in a high-dimensional space can be avoided by introducing a kernel function, which allows all the necessary computations to be performed directly in the input space. The most popular kernel functions are linear, polynomial, radial basis, and sigmoid. In this research, we use an R package (the 'e1071' package, which is explained further in Section 3) for training and optimization, and we do not apply any optimization algorithms of our own to find the optimal values of the SVM structure's parameters. There are, however, several optimization algorithms for SVMs, such as sequential minimal optimization (SMO), the trust region Newton method (TRON), and chunking. The package we chose uses the SMO algorithm, which solves the SVM's quadratic programming problem by analytically optimizing two Lagrange multipliers at a time instead of relying on a general-purpose numerical solver. The adopted package implements epsilon-support vector regression (ε-SVR) for regression problems; its cost function aims to minimize the deviations between the predicted and actual values using an ε-insensitive loss function. For classification problems, it implements the hinge loss function, which penalizes misclassifications through a linear error term. To construct a control chart, we use the SVM technique for regression; therefore, it is called SVR.
As in the ANN case, the trained SVR structure's output at sample i is SVRi, and the different input scenarios, training methods, and output (control) scenarios that we use in this research for each SVR structure are explained in Section 3.
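A corresponding sketch with the 'e1071' package, under the same hypothetical train_df and new_sample_df as in the ANN example, could look as follows.

```r
library(e1071)

# Sketch: training an SVR structure on the labeled in/out-of-control data.
svr_fit <- svm(y ~ T2 + W, data = train_df,
               type = "eps-regression",  # epsilon-SVR rather than classification
               kernel = "linear")        # a radial kernel was needed in some later scenarios

svr_output <- as.numeric(predict(svr_fit, newdata = new_sample_df))
```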
2.3. RFR Control Chart
Decision trees are tree-structured classifiers/regressors with three types of nodes: (i) the root node, which is the initial node, represents the entire sample, and may be split further into more nodes; (ii) the interior nodes, which represent the features of a dataset and are connected to other nodes through branches that represent the decision rules; and (iii) the leaf nodes, which represent the outcomes. In the regression case (the case we use in this paper), prediction starts at the root of the tree and follows splits based on variable outcomes until a leaf node is reached and a real-valued result is returned. Although decision trees are best known for classification, they work very well in regression cases too. The most popular decision-tree algorithms are Iterative Dichotomiser 3 (ID3), C4.5, and CART (classification and regression tree).
Random forest (RF) is an ML technique that uses ensemble learning for regression or classification. Ensemble learning combines the predictions of multiple machine-learning models to make a more accurate prediction than a single model; in the RF case, the single model is a decision tree, and the predictions are combined by averaging in the regression case (called RFR) or by choosing the class with the maximum number of occurrences in the classification case. The RF technique is powerful and accurate, and it overcomes the over-fitting issue of individual decision trees. It usually performs well on many problems, including those with non-linear relationships among the features. The most popular RF algorithm was introduced by Leo Breiman; it builds a forest of uncorrelated trees using a CART-like procedure. The computer package we use (the 'randomForest' package) employs a CART algorithm and uses the mean-squared error (MSE) as the cost function for regression problems and the Gini impurity or the cross-entropy as the cost function for classification problems. The main issue in using RF is choosing the number of trees to include in the model. Using more trees is not always better, as it might unnecessarily and significantly increase the computational time. The best number of trees varies from problem to problem and should be determined so as to obtain minimum error as well as the desired performance.
As before, the trained RFR structure's output at sample i is RFRi.
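For completeness, a matching sketch with the 'randomForest' package, again using the hypothetical train_df and new_sample_df from the previous examples, is given below.

```r
library(randomForest)

# Sketch: training a random forest regressor (RFR) on the labeled data.
# With a numeric 0/1 response, randomForest() fits a regression forest whose
# prediction is the average of the individual trees' outputs.
rfr_fit <- randomForest(y ~ T2 + W, data = train_df, ntree = 100)

rfr_output <- as.numeric(predict(rfr_fit, newdata = new_sample_df))
```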
3. Model Development and Analysis
In this section, different control–input–training scenarios are modeled for the above three ML control charts, and numerical analyses are then conducted for each. For the first time, we consider several input sets, training methods, and control scenarios to see which works best for each ML structure when building a multivariate control chart for the simultaneous monitoring of process parameters. There are two input scenarios for each ML structure, and two different training methods are used for each. The two training methods are (i) training the ML structures with only a certain small shift size to familiarize them with out-of-control situations, or (ii) training them with both a certain small and a certain large shift size. The same number of in-control and out-of-control data is used in both training cases. Note that in all training scenarios, the ML structures are trained with an output of 0 representing an in-control situation and an output of 1 representing an out-of-control situation.
In the first input scenario, the two statistics employed by Sabahno et al. [9] are used. They used them in their statistical scheme to develop a multivariate control chart for the simultaneous monitoring of the process parameters; we instead use these two statistics as the inputs of the ML structures. By doing so, we easily enable the ML structures to handle multivariate processes without adding complexity. These statistics are Hotelling's T2 and W.
When the in-control process parameters are known (μ0 and Σ0), the Hotelling T2 statistic is evaluated for each sample i to monitor the process mean vector (μ) as follows:
T²i = n (X̄i − μ0)′ Σ0⁻¹ (X̄i − μ0),
where n is the sample size and X̄i is the sample's mean vector.
Regarding the process variability (Σ), they used a second statistic, W, which is a function of the sample's variance–covariance matrix Si, the in-control matrix Σ0, the sample size n, and the number of quality characteristics p; its exact form is given in Sabahno et al. [9].
In the second input scenario, however, we use all the elements of the sample mean vector and variance–covariance matrix as the inputs.
For example, in the case of two quality characteristics, we have five inputs: X̄1, X̄2, S1², S2², and S12 (the covariance), where X̄j = (1/n) Σk xjk, Sj² = [1/(n − 1)] Σk (xjk − X̄j)², and S12 = [1/(n − 1)] Σk (x1k − X̄1)(x2k − X̄2), with xjk denoting the kth observation of the jth quality characteristic in the sample.
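As a small illustration, the five inputs of the second scenario can be computed in R as sketched below, where x is a hypothetical n × 2 matrix holding one sample (n = 10 observations in rows, the two quality characteristics in columns).

```r
# Sketch: building the second-scenario input vector from one sample (p = 2).
xbar <- colMeans(x)            # sample means
S    <- cov(x)                 # sample variance-covariance matrix
inputs <- c(xbar[1], xbar[2],  # the two sample means
            S[1, 1], S[2, 2],  # the two sample variances
            S[1, 2])           # the sample covariance
```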
Note that according to the above paragraphs, as the process dimension (the number of quality characteristics) increases, the number of inputs in the second input scenario also increases. However, the number of inputs in the first input scenario is always two (T2 and W). This makes it easier and more efficient to use the first input scenario in higher dimensions.
We also consider two types of control schemes in this paper. In the first type, the goal is only the detection of the out-of-control situation, and in the other type, the goal is to identify the responsible process parameters as well. In the first one, we only have one chart/output with which we determine whether the process is in or out of control. In the second one, we have several charts/outputs to identify the responsible variables, as well as process parameters, for the out-of-control situation. In summary, two input scenarios, two training scenarios, and two control scenarios are involved.
Moreover, three different numbers of quality characteristics are considered, i.e., p = 2, p = 3, and p = 4. However, to reduce the paper’s size, we only consider the p = 3 case with the first input scenario and the p = 4 case with the first input scenario and the first control scenario (only detection). It should also be noted that as the process dimension (the number of variables) increases, the amount of data and diversity of the dataset increases as well in both training methods. In addition, since using the second input scenario is directly affected by the number of variables, increasing the number of variables increases the number of inputs in this scenario. In addition, as the dimension increases, the number of control charts using the second control scenario increases as well, and that could increase the false-alarm rates.
The ARL (average run length) and SDRL (standard deviation of run length) are used in this paper to measure each chart's performance, but in this section, we only make comparisons based on the ARL. The ARL is the average number of samples taken before a control chart issues an out-of-control signal. A large ARL is preferred when the process is in control, and an ARL as low as possible is preferred when the process is out of control.
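In simulation terms, both measures can be estimated as sketched below, where run_length() is a hypothetical function that keeps sampling from the (in-control or shifted) process until the chart signals and returns the number of samples taken.

```r
# Sketch: estimating the ARL and SDRL of a chart from simulated run lengths.
run_lengths <- replicate(10000, run_length())
ARL  <- mean(run_lengths)   # average run length
SDRL <- sd(run_lengths)     # standard deviation of run length
```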
The 'svm' function of the 'e1071' package in R is employed to train the SVR structures, the 'nnet' package is used for the ANN structures, and the 'randomForest' package is used for the RFR structures. In general, we try to use the default hyperparameters in each package; those we had to change are explained separately in each scenario. However, one setting we changed in all these packages is the output type, which we changed from a classification output (the default) to a linear (regression) output.
Note that most of the mean and variation shifts in both the p = 2 and p = 3 cases are chosen to match those of Sabahno et al. [9] so that the proposed control charts in each scenario can be compared with their statistical charts. To avoid adding additional tables for the comparisons, we do not repeat their tables here and refer readers to their work (the ARLs in their Table 4 for the p = 2 case and in their Table 5 for the p = 3 case are the subjects of comparison). Therefore, we only include the comparison results in this paper. Moreover, similar to Sabahno et al. [9], the sample size we use in this paper is 10 (n = 10).
3.1. Scenario a: Control Charts for Detection
In scenario a, the cases in which only signal detection is important are investigated. Since there is only one control chart involved, assuming that the probability of type-I error α is equal to 0.005, the in-control performance measure is computed as ARL = 1/α = 1/0.005 = 200.
3.1.1. Scenario a1 (Control Type a, Input Set 1)
As mentioned before, only two inputs are considered for each ML control chart in this scenario: the T2 and W statistics.
Scenario a11 (Control Type a, Input Set 1, Training Method 1)
In this section, the shift size we select for the mean shifts is 0.2, and for the variance and covariance shifts, it is 1.2 times the in-control values. For the out-of-control dataset, we consider 250 data with only μ1 shifted; 250 data with only μ2 shifted; 250 data with both means shifted; 250 data with only σ1² shifted; 250 data with only σ2² shifted; 250 data with both σ1² and σ2² shifted; 250 data with only the covariance shifted; 250 data with σ1², σ2², and the covariance shifted; and 250 data with all the parameters shifted together. For the in-control dataset, the same amount of data, i.e., 9 × 250 = 2250 data, is included. We train each ML structure with these in-control and out-of-control datasets.
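A minimal sketch of how two of these blocks could be generated is given below; the in-control mean vector and covariance matrix are assumed values chosen only for illustration, and each generated sample of size n = 10 would subsequently be summarized by its T2 and W inputs.

```r
library(MASS)  # for mvrnorm()

n      <- 10
mu0    <- c(0, 0)                               # assumed in-control mean vector
Sigma0 <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)   # assumed in-control covariance matrix

# 250 samples with only the first mean shifted by 0.2:
mu_shifted <- mu0 + c(0.2, 0)
block_mean_shift <- replicate(250, mvrnorm(n, mu = mu_shifted, Sigma = Sigma0),
                              simplify = FALSE)

# 250 samples with only the first variance inflated to 1.2 times its in-control value:
Sigma_shifted <- Sigma0
Sigma_shifted[1, 1] <- 1.2 * Sigma0[1, 1]
block_var_shift <- replicate(250, mvrnorm(n, mu = mu0, Sigma = Sigma_shifted),
                             simplify = FALSE)
```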
In this scenario, a linear kernel is used for the SVR structure and the RMSE is computed as 0.51. For the ANN structure, four nodes (twice the number of inputs) in the hidden layer are used and the trained ANN structure has an RMSE of 0.49. For the random forest package, 100 trees are used with an RMSE of 0.53. The UCLs of the ANN, SVR, and RFR control charts are computed as 0.7073, 1.016, and 0.920144, respectively. The result of this analysis, which is reported in
Table 1, shows that the SVR and ANN charts perform better than the RFR chart. In general, as the mean shift increases, the SVR chart performs better, and as the variation shift increases, the ANN chart performs better. Another interesting result is that, as the shift size increases beyond the values the ML structure was trained with, the RFR chart performs worse, such that under a mean shift of size 2, it is not able to detect the shift at all. This phenomenon, however, does not occur for the ANN and SVR charts.
Moreover, by comparing the results of
Table 1 to Sabahno et al.’s [
9] Table 4, one can see that in all cases, at least one of the ML control charts performs better than all of their proposed control charts (both the fixed-parameter and the adaptive ones). Although we do not use any adaptive strategies, the proposed ML charts perform even better (much better in most cases) than their adaptive control charts.
In the case of three quality characteristics (p = 3), for the out-of-control dataset, we consider 150 data with only μ1 shifted; 150 data with only μ2 shifted; 150 data with only μ3 shifted; 150 data with both μ1 and μ2 shifted; 150 data with both μ1 and μ3 shifted; 150 data with both μ2 and μ3 shifted; 150 data with only σ1² shifted; 150 data with only σ2² shifted; 150 data with only σ3² shifted; 150 data with both σ1² and σ2² shifted; 150 data with both σ1² and σ3² shifted; 150 data with both σ2² and σ3² shifted; 150 data with only the covariance shifted; 150 data with all three variances and the covariance shifted; and 150 data with all the parameters shifted together. For the in-control dataset, we again include the same amount of data, which in this case is 2250 data. The same hyperparameters as in the case of
p = 2 are used here as well for the ML structures. The RMSE of the ANN, SVR, and RFR structures are computed as 0.49, 0.51, and 0.53, respectively, and the computed UCLs are 0.724, 0.9955, and 0.9305, respectively. The results in
Table 2 show that the RFR chart mostly performs better in higher dimensions (compared with
Table 1), especially under large mean shifts, where the deterioration in performance under shifts larger than the trained value is much less noticeable. For the ANN and SVR charts, the behavior is mixed (in some cases they perform better, in others worse); however, they still perform better than the RFR chart even in higher dimensions. The other conclusions derived for the case of
p = 2 are valid here as well.
Moreover, by comparing the results in
Table 2 to Sabahno et al.’s [
9] Table 5, it is evident that in most compared cases, at least one of the proposed ML control charts performs better than all their proposed control charts (fixed design parameters and adaptive ones). Only in the case of (0.5, 0.8, 0.5) mean shift, together with no/small variation shift, does at least one of their proposed charts perform a little bit better than the best of the three proposed charts in this paper.
In the case of four quality characteristics (p = 4), for the out-of-control dataset, we consider 125 data for each of the following shift combinations: each single mean shifted (μ1, μ2, μ3, or μ4; four combinations); each pair of means shifted (six combinations); each triple of means shifted (four combinations); each single variance shifted (σ1², σ2², σ3², or σ4²; four combinations); each pair of variances shifted (six combinations); each triple of variances shifted (four combinations); only the covariance shifted; all four variances and the covariance shifted; and all the parameters shifted together. For the in-control dataset, we again include the same amount of data, which in this case is 31 × 125 = 3875 data. The same hyperparameters as in the case of p = 2 are used here as well for the ML structures. The RMSEs of the ANN, SVR, and RFR structures are computed as 0.49, 0.51, and 0.52, respectively, and the computed UCLs are 0.7164, 1.0439, and 0.9332, respectively.
The results in
Table 3 show that the RFR chart mostly performs better again in higher dimensions (compared with both
Table 1 and
Table 2). For the ANN and SVR charts, the behavior is again mixed (in some cases they perform better, in others worse); however, they still perform better than the RFR chart even in higher dimensions. The other conclusions derived for the case of
p = 2 are valid here as well.
Scenario a12 (Control Type a, Input Set 1, Training Method 2)
In this scenario, we use the same input set as before, but with the second training method (trained with both small and large shifts). Here, for the out-of-control dataset, the shift size for the small mean shifts is 0.2; for the large mean shifts, it is 1; for the small variance and covariance shifts, it is 1.2 times the in-control values; and for the large variance and covariance shifts, it is 2 times the in-control values. For the small shifts, the following are considered: 150 data with only μ1 shifted; 150 data with only μ2 shifted; 150 data with both means shifted; 150 data with only σ1² shifted; 150 data with only σ2² shifted; 150 data with both variances shifted; 150 data with only the covariance shifted; 150 data with both variances and the covariance shifted; and 150 data with all the parameters shifted together. Similarly, for the large shifts, we have 150 data with only μ1 shifted; 150 data with only μ2 shifted; 150 data with both means shifted; 150 data with only σ1² shifted; 150 data with only σ2² shifted; 150 data with both variances shifted; 150 data with only the covariance shifted; 150 data with both variances and the covariance shifted; and 150 data with all the parameters shifted together. The total number of out-of-control data in this scenario is 2700; therefore, the same number of in-control data is used. For all the ML structures, the same hyperparameters as in the previous case are used. The RMSEs for the ANN, SVR, and RFR structures are 0.44, 0.53, and 0.47, respectively. The UCLs of the ANN, SVR, and RFR control charts are computed as 0.8658, 0.5958, and 0.9729, respectively. The results for this case are reported in
Table 4.
The results in
Table 4 show that the ANN and SVR charts perform better with the second training method only in the cases of no/small variation shifts. On the contrary, this training method suits the RFR chart the most, considering it performs better than with the previous training method in all cases. In terms of which of the three charts performs better in this scenario, despite the significant improvements in the performance of the RFR chart, it still cannot outperform the ANN and SVR charts: the ANN chart performs better in most compared cases, and in the rest (no or small variation shifts), despite the loss of performance under this training method, the SVR chart still performs better than the others. In addition, unlike with the previous training method, the RFR chart does not experience a deterioration in performance as the shift size becomes larger than the trained value.
Moreover, by comparing the results of
Table 4 to Sabahno et al.’s [
9] Table 4, one can again see that in all the cases, at least one of the proposed ML control charts performs better (mostly much better) than all their charts.
In the case of p = 3, for the out-of-control dataset, the following are considered: 125 data with μ1 shifted 0.2 and 125 data shifted 1; 125 data with μ2 shifted 0.2 and 125 data shifted 1; 125 data with only μ3 shifted 0.2 and 125 data shifted 1; 125 data with μ1 and μ2 shifted 0.2 and 125 data shifted 1; 125 data with μ1 and μ3 shifted 0.2 and 125 data shifted 1; 125 data with μ2 and μ3 shifted 0.2 and 125 data shifted 1; 125 data with σ1² shifted 1.2 times and 125 data shifted 2 times; 125 data with σ2² shifted 1.2 times and 125 data shifted 2 times; 125 data with σ3² shifted 1.2 times and 125 data shifted 2 times; 125 data with σ1² and σ2² shifted 1.2 times and 125 data shifted 2 times; 125 data with σ1² and σ3² shifted 1.2 times and 125 data shifted 2 times; 125 data with σ2² and σ3² shifted 1.2 times and 125 data shifted 2 times; 125 data with only the covariance shifted 1.2 times and 125 data shifted 2 times; 125 data with all three variances and the covariance shifted 1.2 times and 125 data shifted 2 times; and 125 data with all the parameters shifted together with small shift sizes and 125 data shifted together with large shift sizes. For the in-control dataset, we include the same amount of data, which in this scenario is 3750 data. By keeping the previous hyperparameters, the RMSEs of the ANN, SVR, and RFR structures are 0.44, 0.52, and 0.5, respectively. The UCLs are computed as 0.8635, 0.5781, and 0.9783, respectively. The results are presented in
Table 5.
This table shows that again under no/small variation shifts, the SVR chart mostly performs better than the other two, but as the variation shift increases, the other two begin to perform better than the SVR chart, with the ANN chart having the best performance. By comparing
Table 2 and
Table 5 (the first and second training methods), the conclusions drawn for the
p = 2 case (comparing
Table 1 and
Table 4) can be drawn for the
p = 3 case as well. By comparing the cases of
p = 3 and
p = 2 (
Table 4 and
Table 5), we realize that the conclusions are similar to those of Scenario a11.
Moreover, by comparing the results of
Table 5 to Sabahno et al.’s [
9] Table 5, we can see that with the second training method, at least one of the ML control charts performs better than all their control charts, in all the shift cases.
In the case of four quality characteristics (p = 4), for the out-of-control dataset, we consider, for each of the following shift combinations, 100 data with the small shift sizes and 100 data with the large shift sizes (mean shifts of 0.2 and 1, respectively; variance and covariance shifts of 1.2 times and 2 times the in-control values, respectively): each single mean shifted (four combinations); each pair of means shifted (six combinations); each triple of means shifted (four combinations); each single variance shifted (four combinations); each pair of variances shifted (six combinations); each triple of variances shifted (four combinations); only the covariance shifted; all four variances and the covariance shifted; and all the parameters shifted together. For the in-control dataset, we include the same amount of data, which in this scenario is 31 × 200 = 6200 data.
By keeping the previous hyperparameters, the RMSE of the ANN, SVR, and RFR structures are 0.43, 0.53, and 0.46, respectively. The UCLs are computed as 0.856, 0.5108, and 0.9735, respectively. The results are presented in
Table 6.
This table shows that again under no/small variation shifts, the SVR chart mostly performs better than the other two, but as the variation shift increases, the other two begin to perform better than the SVR chart, with the ANN chart having the best performance. By comparing
Table 3 and
Table 6 (the first and second training methods), the conclusions drawn for the
p = 2 case (comparing
Table 1 and
Table 4) can be drawn for the
p = 4 case as well. The results in
Table 6 show that all the charts mostly perform better in higher dimensions when the second training method is used.
3.1.2. Scenario a2 (Control Type a, Input Set 2)
As mentioned before, different inputs for the ML structures are considered in this scenario. The sample’s mean vector and variance–covariance matrix are used and in the case of p = 2, there are going to be five inputs (individual variances, means, and covariance) as described in the following subsections.
3.2. Scenario b: Control Charts for Detection and Identification
In scenario b, several ML structures are involved, and consequently, several control charts are used in each control scheme. Each process parameter has its own output (control chart), and if the corresponding chart signals, it means that this parameter (and consequently the variable associated with it) has shifted. However, as the number of quality characteristics increases, the number of control charts (outputs) also increases in this scenario, which might increase the false-alarm rate. As such, this scenario should be used with more caution, especially in larger dimensions. In addition, even before conducting any numerical analysis (next subsections), an overall worse performance compared to scenario a is expected, because more than one control chart is monitored at the same time. That said, the advantage of this scenario is not its performance but its ability to identify the responsible variables and process parameters at the cost of sacrificing a little performance.
In the case of p = 2, since we have five process parameters, namely μ1, μ2, σ1², σ2², and the covariance, five ML structures, each with its own control chart, are required. These charts are monitored together, and obtaining a signal from any one of them means that the process is out of control. Before we start developing control charts for each scenario, we should mention that it was more difficult to choose suitable hyperparameters in this scenario to obtain the desired ARL performance, especially in the case of p = 3. Therefore, we tried many combinations of hyperparameter values for the ML structures to obtain the desired overall performance. The desired performance, in this case, is computed (assuming independence of the control charts as well as equality of their type-I error probabilities) as α_individual = 1 − (1 − α_overall)^(1/m), where m is the number of control charts, and ARL = 1/α. Consequently, for the case of p = 2, in which a maximum of five control charts is required (m = 5), the above formula gives α_individual ≈ 0.001. This means that each control chart's individual in-control ARL is about 998, but all of them together should have an in-control ARL of 200. Similarly, for the case of p = 3, the individual α values are computed as 0.000716, which results in an individual ARL of about 1397. Note that, for simplicity, we assume that all the covariances are equal in the case of p = 3. Therefore, we only have one control chart for monitoring the covariance, making a total of seven control charts (m = 7, i.e., three to monitor the means, three to monitor the variances, and one to monitor the covariance).
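The per-chart values follow directly from the formula above, as the short R check below shows (assuming independent charts and a joint α of 0.005).

```r
alpha_joint <- 0.005
individual_alpha <- function(m) 1 - (1 - alpha_joint)^(1 / m)

individual_alpha(5)        # p = 2, m = 5: about 0.0010, i.e., a per-chart ARL of about 998
individual_alpha(7)        # p = 3, m = 7: about 0.000716
1 / individual_alpha(7)    # per-chart ARL of about 1397
```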
3.2.1. Scenario b1 (Control Type b, Input Set 1)
Similar to Scenario a1, the inputs, in this case, are T2 and W statistics. Again, two different training methods are applied on each control chart (one trained with only small shifts and the other one trained with small and large shifts).
Scenario b11 (Control Type b, Input Set 1, Training Method 1)
In this scenario, the ML structures are trained with only small shift sizes. First, we generate 1000 in-control data that are used in all the control charts. Second, we generate 1000 out-of-control data with 0.2 shifts in μ1 and use them for the control chart that monitors μ1; 1000 out-of-control data with 0.2 shifts in μ2 and use them for the control chart that monitors μ2; 1000 out-of-control data with shifts of 1.2 times the in-control σ1² and use them for the control chart that monitors σ1²; 1000 out-of-control data with shifts of 1.2 times the in-control σ2² and use them for the control chart that monitors σ2²; and 1000 out-of-control data with shifts of 1.2 times the in-control covariance and use them for the control chart that monitors the covariance. Now that we have five datasets, we need to train five different ML structures. For the ANN scheme, for monitoring μ1, μ2, σ1², σ2², and the covariance, we use 5, 5, 5, 5, and 4 nodes in the hidden layers, respectively, to obtain an overall in-control ARL of 200. The RMSEs of these ANN structures are, respectively, 0.49, 0.49, 0.48, 0.48, and 0.49. Finally, the UCLs are, respectively, computed as 1.0292, 1.1628, 0.8846, 1.048, and 0.8393.
For the SVR scheme, the major difference was that the linear kernel did not work for all the structures in this scenario, and we had to try the radial kernel as well (other kernel types did not work at all). The kernel types used for the μ1, μ2, σ1², σ2², and covariance control charts are, respectively, radial, radial, radial, radial, and linear. The RMSEs are 0.52, 0.52, 0.52, 0.53, and 0.51, and the UCLs are 1.1174, 1.1204, 1.1089, 1.0972, and 1.283. Regarding the RFR scheme, 100 trees still worked well for each structure in this scenario. The RMSEs of the RFR structures are 0.53, 0.54, 0.52, 0.52, and 0.52, and the UCLs are 0.9576, 0.9409, 0.9597, 0.9681, and 0.9433.
According to the reported results in
Table 9, all the control charts mostly experience a deterioration in performance compared to when only one control chart is used (
Table 1). However, the RFR scheme mostly performs better in this control case when the mean shift is large. The worst deterioration in performance can be seen in the SVR scheme. It even experiences a deterioration in performance under shifts larger than the trained value in this scenario (unlike
Table 1). The RFR scheme, however, experiences that effect only when the variation shift size is very large in this scenario. Of the three, the ANN is the best-performing scheme; its performance is even very close to that reported in
Table 1 (used only for detection).
Moreover, by comparing the results of
Table 9 to Sabahno et al.’s [
9] Table 4, one can still see that in most cases (except mostly in the case of the (0.5, 0.8) mean shift together with no/small variation shift), at least one of the proposed ML control charts performs better than all of their proposed control charts, even though their control charts are only designed for detection.
In the case of three quality characteristics, the construction of the dataset is the same as in the case of p = 2, except that here we add the following: 1000 out-of-control data with 0.2 shifts in μ3, used for the control chart that monitors μ3, and 1000 out-of-control data with shifts of 1.2 times the in-control σ3², used for the control chart that monitors σ3². For the ANN scheme, for monitoring μ1, μ2, μ3, σ1², σ2², σ3², and the covariance, we use 5, 5, 5, 5, 5, 5, and 4 nodes in the hidden layers, respectively, with RMSEs of 0.49, 0.49, 0.49, 0.49, 0.49, and 0.48, and UCLs of 0.9169, 1.2852, 0.9164, 0.822, 0.8787, 0.8193, and 0.7305. For the SVR scheme, the kernels used are, respectively, radial, radial, radial, radial, radial, radial, and linear. The RMSEs are 0.51, 0.51, 0.51, 0.52, 0.52, 0.52, and 0.52, and the UCLs are 1.1506, 1.0763, 1.0906, 1.0908, 1.1556, 1.0914, and 1.6437. Regarding the RFR scheme, the smallest numbers of trees that we had to use for each structure (to obtain the overall ARL of 200) are, respectively, 100, 100, 100, 500, 100, 300, and 100. The RMSEs are 0.53, 0.53, 0.53, 0.53, 0.52, 0.53, and 0.52, and the UCLs are 0.9485, 0.9585, 0.9474, 0.9577, 0.9647, 0.9647, and 0.9677. The results of this analysis are reported in
Table 10.
According to the results in
Table 10, the ANN chart performs better than the others. By comparing
Table 2 and
Table 10, one can conclude that when identification is also a goal, all the schemes perform worse. Also, both the SVR and RFR schemes experience a significant deterioration in performance as the shift size becomes larger than the trained value in this scenario. By comparing
Table 9 and
Table 10 (
p = 2 and
p = 3 cases), we realize that the ANN scheme mostly performs worse in higher dimensions and the SVR scheme mostly performs better (except under no variation shift). The RFR scheme mostly performs better in higher dimensions only under small mean shifts and very large variation shifts.
Moreover, by comparing the results in
Table 10 to Sabahno et al.’s [
9] Table 5, we can see that the number of cases in which at least one of our control charts performs better than all of their charts is almost equal to the number of cases in which at least one of their proposed charts performs better than all of ours. Once again, their charts are designed only for detection, and in this scenario, we are looking for both detection and identification; therefore, this performance deterioration is to be expected.
Scenario b12 (Control Type b, Input Set 1, Training Method 2)
In this scenario, the ML structures are trained with both small and large shift sizes. Here again, 1000 in-control data are generated to be used in all the control charts. Regarding the out-of-control dataset, we generate 500 out-of-control data with shifts of 0.2 and 500 out-of-control data with shifts of 1 in μ1 and use them for the control chart that monitors μ1; 500 out-of-control data with shifts of 0.2 and 500 out-of-control data with shifts of 1 in μ2 and use them for the control chart that monitors μ2; 500 out-of-control data with shifts of 1.2 times and 500 out-of-control data with shifts of 2 times the in-control σ1² and use them for the control chart that monitors σ1²; 500 out-of-control data with shifts of 1.2 times and 500 out-of-control data with shifts of 2 times the in-control σ2² and use them for the control chart that monitors σ2²; and finally, 500 out-of-control data with shifts of 1.2 times and 500 out-of-control data with shifts of 2 times the in-control covariance and use them for the control chart that monitors the covariance. As five datasets are considered, five different ML structures are required for each scheme.
For the ANN scheme, for μ1, μ2, σ1², σ2², and the covariance, we use five nodes in all the hidden layers. The RMSEs of these ANN structures are, respectively, 0.42, 0.42, 0.46, 0.46, and 0.43. Finally, the UCLs are computed as 1.1086, 0.9762, 0.9519, 0.985, and 1.007. For the SVR scheme, we use the radial kernel for all the structures. The RMSEs are 0.45, 0.44, 0.49, 0.49, and 0.46, and the UCLs are 1.0793, 1.1018, 1.0266, 1.0194, and 1.0103. Regarding the RFR scheme, 100 trees worked well for each structure. The RMSEs are 0.45, 0.45, 0.49, 0.49, and 0.45, and the UCLs are 0.9998, 0.9998, 0.9937, 0.9961, and 0.999.
The results in
Table 11 show that in most cases, the ANN scheme performs better than the other charts. However, under no variation shift, and small variation shifts when the mean shift is also small, the RFR scheme performs better. By comparing the results of this table with its equivalent scenario when only detection was the goal (comparing
Table 4 and
Table 11), one can see the performance deterioration in all the schemes when identification is added to the goal as well. However, again, as in the previous case (Scenario b11), the deterioration is lowest for the ANN chart. In addition, by comparing the results of the two training methods (Scenario b11/
Table 9 and Scenario b12/
Table 11), one can see that the SVR and RFR schemes benefit from the second training method, with the RFR benefiting the most so that we do not see any deterioration in performance under shifts larger than the trained value. The ANN scheme, however, experiences a slight deterioration in performance in most cases.
Moreover, by comparing the results in
Table 11 to Sabahno et al.’s [
9] Table 4, we realize that although more cases are added to those in which at least one of their proposed control charts performs better, the cases in which at least one of the proposed ML schemes performs better than all of their proposed control charts are still more numerous, even though their control charts are designed only for detection.
In the case of three quality characteristics, the construction of the dataset is similar to the p = 2 case, the only difference being that, since two more process parameters (μ3 and σ3²) and their charts are added in the case of p = 3, 500 out-of-control data with shifts of 0.2 and 500 out-of-control data with shifts of 1 in μ3 are generated for the control chart that monitors μ3, and 500 out-of-control data with shifts of 1.2 times and 500 out-of-control data with shifts of 2 times the in-control σ3² are generated for the control chart that monitors σ3².
For the ANN control charts to monitor μ1, μ2, μ3, σ1², σ2², σ3², and the covariance, five nodes are utilized in each hidden layer, with RMSEs of 0.38, 0.38, 0.38, 0.45, 0.44, 0.44, and 0.27, as well as UCLs of 0.913, 0.9082, 0.96518, 0.9581, 0.9339, 0.9797, and 0.9951. For the SVR control charts, the kernels used are all radial. The RMSEs are 0.39, 0.39, 0.39, 0.48, 0.48, 0.48, and 0.29, and the UCLs are 1.0722, 1.0839, 1.1073, 1.0892, 1.0128, 1.0823, and 1.0281. Regarding the RFR control charts, the smallest numbers of trees that we had to use for each structure (to obtain the overall ARL of 200) are 100, 100, 100, 500, 300, 300, and 100. The RMSEs are 0.4, 0.4, 0.4, 0.45, 0.46, 0.46, and 0.29, and the UCLs are 0.7026, 0.7052, 0.7254, 0.3872, 0.4453, 0.4077, and 0.8023. Note that for the RFR structures in this scenario, for the first time, we had to activate a feature offered by the 'randomForest' package called 'corr.bias', which performs bias correction for the regression model. Without that feature activated, we were not able to obtain the desired performance, no matter how many trees we tried.
The result of this analysis is reported in
Table 12. This table shows that in this case, the RFR scheme performs better than the other two in all the shift cases. By comparing this case with the first training method (
Table 10 and
Table 12), we realize that the RFR scheme performs better with the second training method, and on the contrary, the ANN and SVR schemes mostly perform worse. Comparing the
p = 3 and
p = 2 cases (
Table 11 and
Table 12), the RFR scheme mostly performs better in higher dimensions. On the other hand, the ANN and SVR charts mostly perform worse in higher dimensions (however, under very large mean shifts, the SVR scheme performs better). The highest deterioration in performance belongs to the ANN scheme. By comparing two process control scenarios (
Table 5 and
Table 12), one can see that all the schemes experience deterioration in performance, more so for the SVR scheme, as unlike the previous control case, it experiences a significant deterioration in performance as the shift size becomes larger than the trained value.
Moreover, by comparing the results in
Table 12 to Sabahno et al.’s [
9] Table 5, we can see that in the cases of large mean shifts and/or large variation shifts, at least one of their proposed charts performs better than all our proposed charts. Otherwise, at least one of our charts performs better than all their charts.
3.2.2. Scenario b2 (Control Type b, Input Set 2)
The individual elements of each sample’s mean vector and variance–covariance matrix are considered as the inputs in this scenario.
4. An Illustrative Example
A real case originally discussed by Hawkins and Maboudou-Tchao [
2] regarding a healthcare process for monitoring blood pressure and heart rate is used in this section to illustrate the application of the proposed ML schemes. The main risk indicators, in this case, are heart attack and stroke. The quality characteristics are x1 = systolic blood pressure, x2 = diastolic blood pressure, and x3 = heart rate. They follow a multivariate normal distribution whose in-control parameter values are given in Hawkins and Maboudou-Tchao [2].
To identify the quality characteristic and the process parameter responsible for a chart signal, the second control scenario is used for this practical case. Based on the results of the numerical analyses section, the first training method is used for the ANN and SVR schemes and the second training method for the RFR scheme (because the results showed that the RFR scheme performs better with the second training method). The mean shift size used for training is 0.2σ (note that in the numerical analyses section, since all the standard deviations were equal to 1, we simply used 0.2 × 1 = 0.2) for small shifts, and similarly, it is 1σ for large shifts. The shifted variances used for training are as in Section 3 (because in both cases, they are multiplied by a coefficient). In addition, we assume that detection of a covariance shift is not a priority for the quality system; therefore, we only consider six control charts, for monitoring μ1, μ2, μ3, σ1², σ2², and σ3². The UCLs are computed using the algorithm proposed in Section 2 with α = 0.005 and n = 10. Also, the same R packages as in the simulation study section are used in this section. Similarly, the only changes we made to those packages' default settings were changing the output type to regression and changing the number of trees in the RFR scheme, the number of nodes in the hidden layer in the ANN scheme, and the kernel type in the SVR scheme.
For the SVR scheme, radial kernels are used in all the structures. The RMSEs are 0.47, 0.47, 0.51, 0.44, 0.46, and 0.52, and the UCLs are 1.0714, 1.0792, 1.0735, 1.0744, 1.0033, and 1.0932. For the ANN scheme, five nodes are used in the hidden layers. The RMSEs are 0.45, 0.45, 0.49, 0.42, 0.44, and 0.49, and the UCLs are 0.9934, 1.7028, 1.0451, 0.9771, 0.9736, and 0.8285. For the RFR scheme, the numbers of trees used for the structures are, respectively, 100, 100, 100, 300, 100, and 500; the bias-correction feature is turned on only for the last structure. The RMSEs are 0.4, 0.4, 0.43, 0.38, 0.41, and 0.47, and the UCLs are 0.9858, 0.98, 0.9999, 0.9986, 0.9993, and 0.9914.
To see how each control chart reacts to an out-of-control situation, we apply an artificial shift to the process and take ten consecutive samples from it. To do so, we shift the third mean by 1.1 and the third variance by a factor of 1.4. It should be noted that we use the detection-plus-identification scenario; therefore, we have six control charts for each of the proposed ML schemes, and if any of those six charts signals in a scheme, we declare the process out of control.
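The decision rule for one monitoring step can be sketched as follows; fits (the list of the six trained regression structures of a scheme), ucls (the vector of their UCLs), and new_inputs (the inputs computed from the newly drawn sample) are hypothetical names.

```r
# Sketch: applying the six charts of one scheme to a new sample.
outputs <- sapply(fits, function(f) as.numeric(predict(f, new_inputs)))
signals <- outputs > ucls        # TRUE wherever a chart signals
out_of_control <- any(signals)   # any signal declares the process out of control
which(signals)                   # indices of the responsible parameter charts
```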
The results of ten consecutive random samplings from the process for each control scheme, i.e., SVR, ANN, and RFR, are reported in
Table 15,
Table 16 and
Table 17, respectively. Note that, since there are six control charts involved in each scheme (a total of 18 control charts), their outputs are presented in tables to reduce the paper's size.
Regarding the SVR scheme (Table 15), none of its control charts signal during the first ten consecutive samplings. Regarding the ANN scheme (Table 16), the ANN6 chart, which is responsible for the detection of σ3² shifts, signals at samples no. 2 and no. 8, and the ANN3 chart, which is responsible for the detection of μ3 shifts, signals at sample no. 10. Regarding the RFR scheme (Table 17), its RFR1 and RFR3 charts, which are responsible for the detection of μ1 and μ3 shifts, respectively, both signal at sample no. 10. Clearly, in the case of RFR1, we received a false alarm from the RFR scheme, considering there is no shift in the first mean.
Note that this was only a simple example to show how each proposed control scheme can be implemented in practice; based on only ten samples, no comparisons can be made. Performance comparisons were the purpose of the previous section.
5. Concluding Remarks
This paper proposed new control charts for the simultaneous monitoring of the mean vector and the variance–covariance matrix of multivariate normal processes. For the first time, machine-learning techniques were used for this purpose. The three ML techniques used are the ANN, SVM, and RF. We obtained linear outputs from these ML structures and then applied control chart rules to decide whether the process is in control or out of control. Two different input sets and two different training methods were employed for the proposed ML structures. In the first input set, two statistics (one representing the process mean vector and the other the process variability) were employed, and in the second input set, we used each process parameter of each quality characteristic separately as an input. In the first training method, we only trained the ML structures with a small shift size, and in the other method, both a small and a large shift size were considered. We also used two different process control scenarios. In the first scenario, the only goal was the detection of an out-of-control situation, regardless of which variable and process parameter is responsible for it. In the second scenario, besides detection, identifying which variable(s)/process parameter(s) are responsible for the signal was also a goal, which required several control charts to be monitored together.
For each of these control–input–training scenarios, the ML structures were trained, and control charts were developed. Numerical analyses were performed for the cases of two, three, and four quality characteristics. The results, in general, showed that depending on which control–input–training scenario is used, as well as on the number of variables, each of these ML control charts performed better in some cases, and there is no absolute winner among them. However, considering how its decision-making procedure works (splitting tree branches), the RFR scheme tended to perform better when there were more inputs, more diverse training, and more quality characteristics (higher dimensions). However, this did not happen in all cases (except for the diverse-training part), and most importantly, even when its performance was improved by all these diversities, it did not mean that it would perform better than the ANN and SVR charts (which, in fact, in most cases it did not). It was also concluded that when identification was also a goal, the charts performed worse. However, this deterioration in performance was usually smallest for one ML scheme (which one differed by scenario).
We also compared the proposed ML charts with some recently developed multivariate statistical control charts with fixed and adaptive chart parameters (designed only for detection). For the case of p = 2, the results showed that in the detection-only scenario and with the first input set, at least one of the proposed ML charts performed better than all of their proposed charts in all the shift cases, even though our proposed schemes all use fixed design parameters. With the second input set together with the first training method, our proposed charts performed better in most cases, and together with the second training method, they performed better in all cases. Regarding the detection–identification scenario, our proposed ML charts still performed better in more cases, even though their charts were designed only for detection, which usually implies better performance by default. For the case of p = 3, the results showed that in the detection-only scenario, at least one of the ML charts performed better than all of their proposed charts in most shift cases, and with the second training method, in all the shift cases. In the detection–identification scenario, the number of cases in which at least one of our proposed schemes performed better than all of their charts and vice versa was almost the same. However, keep in mind that, unlike their charts, our proposed charts are also capable of identification. Lastly, an illustrative example based on a healthcare-related practical case was presented to show how the proposed schemes can be implemented in practice.
Highlighting the primary focus of this paper, our investigation centered on the utilization of diverse machine-learning techniques for constructing control charts, effectively substituting traditional statistical methods. This exploration involved rigorous testing of different input sets and training methodologies to surpass the performance of statistical control charts. While our study concentrated on specific control charts, along with a limited selection of input sets and training approaches, it opens an opportunity for further exploration of a wider range of control charts, as well as more diverse input sets and training methods, thus broadening the horizons of research in this field.
For future developments, one might be interested in trying different input sets, training methods, and even output sets for ML structures. Adding adaptive features to the proposed control charts would also be a major improvement. Since ML control charts have rarely been developed, developing them for many other applications and comparing them to traditional statistical control charts might also be interesting. In particular, since all the developed ML control charts are so far memory-less, developing memory-type ML control charts and comparing them to memory-type statistical control charts might also be interesting for some researchers. In addition, how to train the ML structures in the case of unknown distributions is another challenge that might be worth investigating by some other researchers.