1. Introduction
Many methods for a preliminary check of the normality of a distribution are available, such as the box plot, the quantile–quantile (Q–Q) plot, the histogram, or inspection of the empirical skewness and kurtosis. However, the results of those methods are often inconclusive or not precise enough [1,2,3]. Considering how important it is to know the level of certainty with which we can claim that the sample's characteristic X is normally distributed, more formal and precise methods are needed. Normality tests yield the best results: based on them one can, at a chosen level of significance, determine not only whether the sample elements fit the normal distribution at all but also quantify the concordance with the normal distribution [1,4,5,6,7,8,9]. Because of these properties, a wide range of normality tests has been developed [1,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17].
The next challenge was determining which test satisfies as many of the criteria for statistical tests as possible. For instance, a test primarily has to be powerful. Usually, the power of a test is computed only through simulations [2,5,6,9,10,11,12,13,14,15,16], and often so is the distribution of the test statistic [2,5,9], because the conditions for the weak law of large numbers or the central limit theorem are not satisfied. Hence, the problem of determining the distribution of the test statistic remains unsolved [2,9,18,19,20]. A further challenge is knowing the appropriate number of simulations [20,21,22]. Additionally, the power of a test varies across groups of alternative distributions (symmetric, asymmetric) and across sample sizes [9,10,11,12,13,14,15,16].
On certain occasions, some normality tests are more powerful than others, yet they are much slower to apply and harder to implement, which diminishes the importance of their contribution [1,3,6,10,11,17]. Another issue is that the power of many tests differs between symmetric and asymmetric alternative distributions. We also need to consider some generally less powerful tests, because some of them (such as the Jarque–Bera test or the D'Agostino test) remain useful [10,11]. Overcoming the mentioned problems would, in turn, resolve many other issues.
Currently, the most used normality tests are the Kolmogorov–Smirnov test and the Chi-squared test, followed by the Shapiro–Wilk test and the Anderson–Darling test [1,2,10,11,12,13,14,15,16], even though some other tests are more powerful [10]. This indicates how important simplicity of implementation and fast performance are, even for low-power tests [5,10].
In this paper, our goal is to contribute to this topic by developing a new normality test based on the 3σ rule. We define a new zone function that quantifies the deviation of the empirical distribution function (EDF) from the cumulative distribution function (CDF) of the observed sample's characteristic. The test statistic is the mean of the zone-function values evaluated at the sample elements.
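To make the construction concrete, the following minimal Python sketch is illustrative only: the paper's actual zone function is the one defined by Formula (2), while here a simplified stand-in assigns each sample element a score according to the band into which the deviation of the EDF from the hypothesized normal CDF falls, and the statistic is the mean of these scores. The band boundaries below are placeholder choices, not the paper's values.

```python
# Illustrative sketch only: a simplified stand-in for the zone function,
# not the paper's Formula (2).
import numpy as np
from scipy.stats import norm

def quantile_zone_statistic(sample, mu, sigma):
    """Mean zone score of the EDF-vs-CDF deviation over the sample elements."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    edf = np.arange(1, n + 1) / n                 # EDF evaluated at the order statistics
    cdf = norm.cdf(x, loc=mu, scale=sigma)        # hypothesized normal CDF at the same points
    dev = np.abs(edf - cdf)                       # pointwise deviation of EDF from CDF
    # Placeholder zone boundaries on the 1/sqrt(n) fluctuation scale of the EDF
    # (an analogue of the 3-sigma-rule zones; the paper uses its own boundaries).
    bands = np.array([0.5, 1.0, 1.5]) / np.sqrt(n)
    zones = np.searchsorted(bands, dev)           # zone score 0, 1, 2 or 3 per element
    return zones.mean()
```

Larger values of the statistic indicate a larger average departure of the EDF from the hypothesized CDF, so such a test rejects normality for large values.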
In [23,24], we developed new Shewhart-type control charts [25,26] based on the 3σ rule. The basic idea is to use the empirical distribution function of the means of all the samples to be controlled. We form the control lines by using the quantiles of those means under the normal distribution, since a normal distribution is assumed and used in quality control and is justified by the central limit theorem [24,26].
Using the same principle to analyze individual samples, the same control chart can be used for a preliminary analysis of the normality of the reference sample distribution; here, the variance of the distribution is σ². Defining a proper statistic through an adequate function that quantifies the level of sample deviation from the normal distribution, based on the control-chart zones, enables us to do the above [23].
These were our first steps on this subject, and they brought us to the idea of developing a new test of normality by modifying some of the ideas in [23,24].
In this paper, we define the zone function given in [23] with a modification that makes it applicable when the sample is not in an "in control" state, since in normality testing, unlike in quality control, a few outliers do not necessarily mean rejection of the null hypothesis. Outliers do not significantly change the value of our test statistic unless there are many of them. In many other tests the opposite happens, which causes rejection of the null hypothesis even when it should not be rejected.
Finally, we provide the main characteristics of the test statistic's distribution, a table of its values for various probabilities and sample sizes, and a power analysis based on a simulation study. We discuss both the known-parameter and estimated-parameter cases. We use the sample mean and the corrected sample variance as estimators, both of which are unbiased and reliable [2,5]. The conclusions that place our test statistic above the others rely on the results we obtained through Monte Carlo simulations and on the results and comparative analyses available for the other tests in [5,6,9,10,11,12,13,14,15,16].
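As an illustration of this simulation workflow only (the replication counts, sample sizes, and significance levels below are arbitrary choices, not the settings used for the paper's tables), the following hedged sketch estimates upper-tail critical values of the statistic under the null hypothesis by Monte Carlo; it reuses the illustrative quantile_zone_statistic defined above.

```python
# Hedged sketch: Monte Carlo estimation of critical values under H0,
# using the illustrative statistic from the previous sketch.
import numpy as np

rng = np.random.default_rng(0)

def critical_value(n, alpha=0.05, n_sim=20_000, mu=0.0, sigma=1.0):
    """Upper-tail critical value of the statistic under H0, estimated by simulation."""
    stats_h0 = [quantile_zone_statistic(rng.normal(mu, sigma, n), mu, sigma)
                for _ in range(n_sim)]
    return float(np.quantile(stats_h0, 1 - alpha))

# A small critical-value table over a few sample sizes and significance levels.
table = {(n, a): critical_value(n, a) for n in (20, 50, 100) for a in (0.05, 0.01)}
```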
An important note is that multivariate normality testing is a topic still in need of research because of the difficulty of identifying the proper test in certain circumstances [1,27,28]. Even though many results have been obtained [1,27], new approaches are still being developed and improved [27,28,29,30]. The test we develop in this paper can be a solid foundation for continuing research in multivariate normality testing, either by extending our solution or by investigating new ones based on similar principles.
4. Comparative Analysis
In this section, we compare the power values of our test with those of the most used normality tests. We provide average power values for various alternative distributions and sample sizes for the mentioned tests and for our Quantile-Zone test. The alternative distributions listed in Table 2 and Table 3 are the ones for which we calculated the average power values of our test.
We compared our test with the Kolmogorov–Smirnov test [4] and its variant for estimated parameters (the Lilliefors test) [5], the Chi-squared test [8], the Shapiro–Wilk test [9], and the Anderson–Darling test [7]. We discuss our test in both the known-parameter and estimated-parameter variants.
For the other tests, we use results approximated by the bisection method from those obtained in [10]. Note that in [10], more alternative distributions were used, but they were not reported separately; instead, the authors provided average power values. Additionally, we avoided using alternative distributions that differ from the null distribution so much that a histogram alone would suffice for rejecting the hypothesis, since that would inflate the power values of all the discussed tests, i.e., the results would improve with big data [34]. If identical alternative distributions were used, the advantage of our test would be even greater.
In this way, we can see that although our results are not as thorough and precise (in [10], 1,000,000 simulations were performed for every alternative distribution), they are still accurate and reliable enough.
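As a hedged illustration of how such an empirical power comparison can be set up (it does not reproduce the study in [10]; the alternative distribution, sample size, and replication count are arbitrary, and the Chi-squared and Lilliefors variants are omitted for brevity), the sketch below estimates rejection rates at the 5% level for one asymmetric alternative, reusing the illustrative statistic and critical value from the earlier sketches.

```python
# Hedged sketch: empirical power of several tests against one alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, n_sim = 50, 0.05, 5_000
draw_alt = lambda: rng.exponential(1.0, n) - 1.0   # standardized exponential (mean 0, sd 1)

c_qz = critical_value(n, alpha)                    # H0 critical value from the previous sketch
rejections = {"Quantile-Zone": 0, "Kolmogorov-Smirnov": 0,
              "Shapiro-Wilk": 0, "Anderson-Darling": 0}
for _ in range(n_sim):
    x = draw_alt()
    rejections["Quantile-Zone"] += quantile_zone_statistic(x, 0.0, 1.0) > c_qz
    rejections["Kolmogorov-Smirnov"] += stats.kstest(x, "norm", args=(0.0, 1.0)).pvalue < alpha
    rejections["Shapiro-Wilk"] += stats.shapiro(x).pvalue < alpha
    ad = stats.anderson(x, dist="norm")
    rejections["Anderson-Darling"] += ad.statistic > ad.critical_values[2]   # 5% level
empirical_power = {test: count / n_sim for test, count in rejections.items()}
```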
We also note that the standard deviation of the power values is smaller than 0.3 in all the exposed cases, and in the cases indicated in the tables it is smaller than 0.2 or even smaller than 0.01. Hence, in most cases, changing the alternative distribution does not have a significant effect on the variation of the average power value, especially considering the choice of the alternative distributions and the rare exceptions where the empirical power is lower (see the second paragraph of Section 2.2 and the tables in Section 2.3, the power analysis).
The following tables and figures show the results of the comparison.
As we can see, in both the known- and estimated-parameter cases, and for both symmetric and asymmetric alternative distributions, the Quantile-Zone test is the most powerful, even for the smaller sample sizes (Table 7 and Table 8). For large samples, the Quantile-Zone test has the same power in both the known- and estimated-parameter variants.
The average powers of the other tests are similar; therefore, choosing among them could depend only on the alternative distribution or the sample size. In other words, the other tests we mentioned could be considered roughly equally powerful. Therefore, the Quantile-Zone test is the best choice for normality testing in all the circumstances we considered.
All the figures and tables in the power-analysis subsections, together with Table 7, Table 8, and Figure 8, indicate no consistency issues in our test. Moreover, our test has better consistency properties than the other most used tests, since the slope of the approximation of our test's power-function curve is steeper than those of the other tests (Figure 8). Even if that were not the case, the higher average power values of our test would outweigh any consistency issues.
5. Real Data Example
To control the quantity of protein in milk, we take 48 packages of 100 g from the production line. The measurements yielded the following results (in %): 3.04, 3.12, 3.12, 3.22, 3.09, 3.13, 3.21, 3.18, 3.10, 3.18, 3.21, 3.18, 3.04, 3.11, 3.17, 3.06, 3.13, 3.12, 3.11, 3.07, 3.15, 3.05, 3.14, 3.18, 3.11, 3.21, 3.22, 3.13, 3.06, 3.07, 3.17, 3.22, 3.05, 3.19, 3.18, 3.20, 3.08, 3.20, 3.21, 3.09, 3.05, 3.14, 3.22, 3.08, 3.19, 3.18, 3.21, 3.06. The concentration of protein in milk is usually between three and four percent. We consider two cases.
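For reference, a minimal sketch (assuming Python with NumPy) that enters the 48 measurements and computes the EDF points plotted in Figure 9 and Figure 10:

```python
# The 48 protein measurements (%) and the points (x_(i), i/n) of their EDF.
import numpy as np

protein = np.array([
    3.04, 3.12, 3.12, 3.22, 3.09, 3.13, 3.21, 3.18, 3.10, 3.18, 3.21, 3.18,
    3.04, 3.11, 3.17, 3.06, 3.13, 3.12, 3.11, 3.07, 3.15, 3.05, 3.14, 3.18,
    3.11, 3.21, 3.22, 3.13, 3.06, 3.07, 3.17, 3.22, 3.05, 3.19, 3.18, 3.20,
    3.08, 3.20, 3.21, 3.09, 3.05, 3.14, 3.22, 3.08, 3.19, 3.18, 3.21, 3.06,
])
n = len(protein)                                               # 48 packages of 100 g
x = np.sort(protein)
edf_points = np.column_stack([x, np.arange(1, n + 1) / n])     # (x_(i), F_n(x_(i)))
```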
5.1. Known Parameters Case
We assume that the milk packages meet the standard if the protein concentration follows a normal distribution with known parameters. We test this using the Quantile-Zone test.
Calculating the EDF of this sample and plotting the corresponding points, we obtained the results shown in Figure 9. Using the results given in Figure 9 and Formula (2), we obtain the value of the test statistic. For the chosen level of significance, the critical region is given in Table 1. Since the value of the test statistic falls into the critical region, we reject the null hypothesis, i.e., the protein concentration in milk does not follow the specified normal distribution.
5.2. Estimated Parameters Case
We assume that the milk packages meet the standard if the protein concentration follows a normal distribution, with its parameters now estimated from the sample. We test this using the Quantile-Zone test.
Calculating the EDF of this sample and plotting the corresponding points, we obtained the results shown in Figure 10. Using the results given in Figure 10 and Formula (2), we obtain the value of the test statistic. For the chosen level of significance, the critical region is given in Table 4. Since the value of the test statistic falls into the critical region, we reject the null hypothesis, i.e., the protein concentration in milk is not normally distributed.
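A hedged sketch of this estimated-parameters computation, reusing the protein data and the illustrative statistic from the earlier sketches (the actual decision uses Formula (2) and the critical value from Table 4):

```python
# Estimated-parameters variant on the protein data (illustrative statistic only).
mu_hat = protein.mean()                  # sample mean
sigma_hat = protein.std(ddof=1)          # corrected sample standard deviation
t_value = quantile_zone_statistic(protein, mu_hat, sigma_hat)
# Compare t_value with the critical value from Table 4 for n = 48 and the chosen alpha.
```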
Even though the EDF here is essentially well located, the normality of the distribution is not confirmed. That is because 8.3% of the sample elements (4 of the 48 measurements) equal 3.22, which is incompatible with the basic properties of a normal distribution.
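The proportion of tied values can be verified directly from the data:

```python
# Share of measurements equal to 3.22: 4 of 48, i.e., about 8.3%.
share_322 = np.mean(np.isclose(protein, 3.22))
```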