1. Introduction
Many products are formed by mixing two or more ingredients. One or more properties of each product are generally of interest to the manufacturer, who is responsible for mixing the ingredients. In every case, the measured property of the final product depends on the percentages or proportions of the individual ingredients present in the formulation [1]. In the general mixture problem, the measured response is assumed to depend only on the proportions of the ingredients present in the mixture and not on the amount of the mixture. These proportions are connected by a linear constraint of the following type:

$$x_1 + x_2 + \cdots + x_q = \sum_{i=1}^{q} x_i = 1, \qquad 0 \le x_i \le 1. \tag{1}$$

In some cases, the proportions of the mixture components may be subject to additional restrictions such as

$$0 \le L_i \le x_i \le U_i \le 1$$

for one or more components. Experiments with mixtures are generally modeled using Scheffé polynomial models [2]. The linear Scheffé model has the following form:

$$E(y) = \sum_{i=1}^{q} \beta_i x_i. \tag{2}$$

Likewise, Scheffé's quadratic model can be described as follows:

$$E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j, \tag{3}$$

where the $\beta_i$ and $\beta_{ij}$ are unknown parameters that must be estimated, generally using least squares. Due to Equation (1), the quadratic Scheffé model contains only linear and cross-product terms: the intercept and the pure quadratic terms are absorbed by the mixture constraint.
A general mixture model, in matrix terms, can be presented as $y = \mathbf{x}'\boldsymbol{\beta} + \varepsilon$ or, for the full data set, $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. To estimate the parameters in the $\boldsymbol{\beta}$ vector via least squares, the expression $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is commonly used, where the covariance matrix is $\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The vector of fitted values is given by $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$ and the residual vector is $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$. The error vector $\boldsymbol{\varepsilon}$ is assumed to follow a normal distribution, that is, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$.

In regression analysis, the least squares method is widely employed for parameter estimation due to its simplicity and efficiency under certain conditions. However, several alternative methods offer advantages in specific scenarios. One such method is Maximum Likelihood Estimation (MLE), which estimates parameters by maximizing the likelihood function, provided the data follow a specific probability distribution. MLE is widely applicable and efficient, particularly in complex models or under non-normal distributions, though it demands more computational resources than the least squares method. Robust regression techniques, including Huber regression and M-estimators, are designed to handle outliers and violations of assumptions such as homoscedasticity. These methods diminish the influence of extreme values on parameter estimates, making them more reliable when outliers are present. Ridge regression is another alternative, particularly useful when multicollinearity exists among the predictor variables. By introducing a regularization term (λ) into the least squares objective function, ridge regression shrinks the regression coefficients, thereby reducing overfitting, a feature especially valuable in high-dimensional datasets.
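These matrix expressions can be sketched numerically as follows. The three-component design points and responses below are illustrative values invented for this example, not data from the article:

```python
import numpy as np

# Illustrative design matrix X: each row is a mixture (x1, x2, x3) summing to 1
# (three vertices and three binary midpoints of the simplex).
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])
y = np.array([10.2, 8.1, 6.3, 11.0, 9.4, 7.2])  # made-up responses

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)    # beta_hat = (X'X)^{-1} X'y
y_fit = X @ beta_hat                        # vector of fitted values
e = y - y_fit                               # residual vector

n, p = X.shape
sigma2_hat = e @ e / (n - p)                # unbiased estimate of sigma^2
cov_beta = sigma2_hat * np.linalg.inv(XtX)  # covariance matrix of beta_hat
```

Solving the normal equations with `np.linalg.solve` rather than forming the explicit inverse is numerically preferable for the estimates themselves; the inverse is computed here only because the covariance matrix requires it.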
If an exact linear dependence exists between the columns of $\mathbf{X}$, that is, if there is a set of constants $c_1, \dots, c_p$, not all zero, such that $\mathbf{Xc} = \mathbf{0}$, then the matrix $\mathbf{X}$ has rank less than $p$ (the number of predictor variables), and the inverse of $\mathbf{X}'\mathbf{X}$ does not exist. In this case, many software packages would give an error message and would not calculate the inverse. However, if the linear dependence is only approximate, that is, $\mathbf{Xc} \approx \mathbf{0}$, then we have the condition usually identified as collinearity or ill-conditioning (some authors use the term multicollinearity) [3]. In this case, many software packages proceed to calculate $(\mathbf{X}'\mathbf{X})^{-1}$ without any warning of the potential problem [4]. In the presence of ill-conditioning, the computer routines used to calculate $(\mathbf{X}'\mathbf{X})^{-1}$ can give erroneous results; this means that the least squares solution may be incorrect. Even if $\hat{\boldsymbol{\beta}}$ is correct, the variances of the $\hat{\beta}_j$, given by the diagonal terms of $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$, can be inflated by ill-conditioning [5].
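The difference between an exact and an approximate linear dependence can be demonstrated with a short numpy sketch (the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.1, 0.8, size=20)
x2 = rng.uniform(0.1, 0.8, size=20)
x3 = 1.0 - x1 - x2   # mixture constraint: x1 + x2 + x3 = 1

# Adding an intercept column alongside all q components creates the exact
# dependence Xc = 0 with c = (-1, 1, 1, 1): the rank drops below p = 4,
# and the information matrix X'X is (numerically) singular.
X_bad = np.column_stack([np.ones(20), x1, x2, x3])
rank = np.linalg.matrix_rank(X_bad)      # 3, not 4
cond = np.linalg.cond(X_bad.T @ X_bad)   # astronomically large
```

This is exactly why Scheffé models for mixtures omit the intercept: with the constraint of Equation (1), a constant column is a linear combination of the component columns.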
Prescott et al. (2002) [3] have shown that the quadratic Kronecker model is a quadratic model specification that is less susceptible to ill-conditioning than the Scheffé model. They concluded that the quadratic Kronecker model always reduces the maximum eigenvalue of the information matrix ($\mathbf{X}'\mathbf{X}$), reducing ill-conditioning. When collinearity among the terms of the Scheffé model is a possibility, and the appearance of the complete model form is of concern, using the model known as the Slack Variable model makes sense. References to the Slack Variable model form appear in [4,6,7,8,9,10,11,12,13,14,15,16]. The justification for using the Slack Variable model is that it offers a lower degree of collinearity in the adjusted model, which means less numerical instability in the estimation of the model coefficients [17,18]; likewise, it reduces the variances of the individual estimated regression coefficients, reduces the correlation between the estimators, and makes the model less dependent on the precise location of the design points. The authors of [4] have presented a literature review on Slack Variable models for mixture experiments.
Emphasizing the practical aspects of selecting appropriate mixture designs, numerous techniques have been developed for both the design and analysis of mixture experiments. These advancements focus on constructing and interpreting mixture models as well as analyzing mixture data. While substantial progress has been made in designing experiments for Scheffé models [
19], the optimal design for more general blending models, particularly those extending the Becker class, has not yet been systematically addressed. These models, which account for nonlinear blending effects, are crucial in practical applications where the relationships between response variables, parameters, and mixture factors are more complex. To address this gap, ref. [
20] proposed methods for determining D- and A-optimal designs for general blending models. By reformulating the optimal experimental design problem into an optimization problem—using either Nonlinear Programming or Mixed Integer Nonlinear Programming—their approach is applicable to quadratic and special cubic blending models, including those in the H2 class introduced by Becker. These methods are exemplified through applications in combustion science and fuel property characterization. Furthermore, a new class of models has been introduced to unify and extend existing statistical methodologies for analyzing mixture experiments [
21]. This class integrates the Scheffé and Becker models, thereby broadening the range of mixture component effects that can be captured. These models maintain simplicity when appropriate but are flexible enough to accommodate more complex phenomena as required. The proposed methodology significantly extends traditional approaches, with supplementary material available for further exploration.
Draper and John (1977) [22] have presented another model alternative that proposes the inclusion of inverse terms. Models that include inverse terms are augmentations of the Scheffé polynomial with additional terms of the form $\beta_{-i} x_i^{-1}$, included to account for a possible extreme change in the response as $x_i$ approaches zero [22]. Draper and John (1977) [22] have argued that a model with inverse terms offers a better quality of fit than the Scheffé polynomial model when an extreme change in the response behavior exists.
When any component of the mixture is equal to zero ($x_i = 0$), in order to include an inverse term such as $x_i^{-1}$ in the model, it is necessary to add a small positive quantity, say $c$, to each component of the mixture ($x_i$) [22]. Draper and John (1977) [22] suggested that a general rule for defining the value of $c$ is simply to use a value between 0.02 and 0.05, and they have mentioned that values of $c$ in that range are suitable for most problems. It is important to highlight that the determination of the value of $c$ is not defined formally or based on some mathematical criterion. The value of $c$ is left completely to the discretion of the researcher, without any reference or criteria. Therefore, the definition of $c$ through some formal criterion is a research opportunity.
Although the model with inverse terms can offer a better fit, in this article, we show that this model has certain disadvantages regarding the conditioning of the information matrix used to estimate the parameters by least squares. Accordingly, a strategy is proposed to minimize the problems arising from poor conditioning of the information matrix. This strategy consists of determining the value of $c$ for which the model presents the best numerical stability.

As a result of this research, it was observed that the conditioning of the information matrix, and consequently the numerical stability of the model, varies with different values of $c$. Therefore, it is proposed that the value of $c$ be determined based on the model's numerical stability. Specifically, the value of $c$ should be selected to optimize the conditioning of the information matrix, ensuring the most accurate least squares estimates.
2. Methods
In this section, we discuss various mathematical methods used in modeling mixture experiments, specifically focusing on the limitations and enhancements of the Scheffé polynomial model when dealing with extreme responses as certain components of the mixture approach boundary values (often zero). We also aim to introduce techniques such as pseudo components and how they can improve the numerical stability of the model, as well as provide a method to determine the adequacy of polynomial models in mixture designs.
In Section 2.1, we discuss how the Scheffé polynomial model can be modified using inverse terms to address extreme changes in the response when one or more components of a mixture approach zero. The equations provided extend the Scheffé model to account for these conditions, provided that no component value is exactly zero, although values may be very small and positive. The need to handle these situations arises because the basic Scheffé model struggles when the response changes drastically near a boundary.

Section 2.2 introduces the concept of pseudo components, which are used to simplify mixture models when components have lower limits (i.e., boundary constraints). The transformation of components into pseudo components helps to handle extreme changes in responses. The equations provided show how to transform the original component values into pseudo components to account for boundary effects, and this section emphasizes the need to add a small value (denoted as $c$) when a component's value is zero in order to stabilize the model.

Section 2.3 explains how to assess the adequacy of the polynomial models used for mixtures in terms of numerical stability and collinearity. It describes how to calculate the condition number (CN) of the information matrix as a measure of numerical stability, with smaller CN values indicating better model conditioning. Additionally, it proposes using the CN to determine the appropriate value of $c$, which is essential for improving the stability of the pseudo component transformations. Finally, Section 2.4 discusses alternative problem formulations and regularization methods.
2.1. Mixture Model with Inverse Terms
As mentioned before, the Scheffé model can be re-parameterized; in fact, there are several ways to write a polynomial model of any order using the mixture restriction of Equation (1) [
3].
When there is an extreme change in the response as the value of a certain component ($x_i$) tends to a boundary (usually zero), the Scheffé model cannot cope well with this situation [6]. For modeling extreme changes in the response as the values of some components tend to zero, [22] proposed the following models:

$$E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i=1}^{q} \beta_{-i} x_i^{-1} \tag{5}$$

and

$$E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j + \sum_{i=1}^{q} \beta_{-i} x_i^{-1}. \tag{6}$$

Equations (5) and (6) are modifications of the Scheffé polynomial model with additional terms of the form $\beta_{-i} x_i^{-1}$. It is assumed that the value of $x_i$ never reaches zero; however, it can be very close to zero, that is, $x_i \ge \delta > 0$, where $\delta$ is some small quantity defined for each application of this model. Likewise, if only a couple of components are likely to produce extreme changes in the response as $x_i \to 0$, then only those inverse terms are included in the model [1].
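Building the design matrix for a first-order inverse-term model of the form (5) amounts to appending $x_j^{-1}$ columns for the selected components. A small sketch (the helper name `inverse_term_design` and the data are our own, for illustration):

```python
import numpy as np

def inverse_term_design(X, inv_cols):
    """Augment a linear Scheffé design matrix with x_j^{-1} columns
    for the component indices listed in inv_cols."""
    inv_part = 1.0 / X[:, inv_cols]
    return np.hstack([X, inv_part])

# Three-component mixtures, all strictly positive so x^{-1} is defined.
X = np.array([
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
    [1/3, 1/3, 1/3],
])
X_aug = inverse_term_design(X, [0])   # inverse term for x1 only
```

Including the inverse term only for the components expected to produce an extreme response, as the text suggests, keeps the model small and limits the conditioning penalty these columns introduce.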
2.2. Use of Pseudo Components
When lower limit restrictions like those shown below are considered,

$$0 \le L_i \le x_i$$

for all $i = 1, 2, \dots, q$ and $\sum_{i=1}^{q} L_i = L < 1$, lower pseudo components (L-pseudo components) are suggested for use in place of the components in original units ($x_i$) [1]:

$$x_i' = \frac{x_i - L_i}{1 - L}. \tag{7}$$
If an extreme change in the response occurs as $x_i$ approaches the bound $L_i$ (the smallest value of component $i$), we define new lower bounds $L_i^* = L_i - c$, where $c > 0$ is a small quantity, so instead of the previous lower pseudo component definition shown in Equation (7), the transformation is redefined as follows [1]:

$$x_i' = \frac{x_i - L_i^*}{1 - \sum_{j=1}^{q} L_j^*} = \frac{x_i - L_i + c}{1 - L + qc}. \tag{8}$$
Draper and John (1977) [22] mentioned that when the value of any component of the mixture is equal to zero ($x_i = 0$), then in order to include a term such as $x_i^{-1}$ in the model, a small positive amount, say $c$, must be added to each value of $x_i$. This is equivalent to working again with the pseudo components $x_i'$, where this time they are defined as follows [1]:

$$x_i' = \frac{x_i + c}{1 + qc}. \tag{9}$$
As mentioned in the Introduction, the value of $c$ has not been defined by a formal criterion, leaving it to the discretion of the researcher. In the Results and Discussion, we describe how to define the value of $c$ so that an improvement in the numerical stability of the model is obtained.
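The pseudo-component transformations described in this section are straightforward to implement. The sketch below uses our own helper names and illustrative data:

```python
import numpy as np

def l_pseudo(X, L):
    """L-pseudo components: x_i' = (x_i - L_i) / (1 - sum(L))."""
    L = np.asarray(L, dtype=float)
    return (X - L) / (1.0 - L.sum())

def c_pseudo(X, c):
    """Pseudo components after adding c to every component:
    x_i' = (x_i + c) / (1 + q*c), with q the number of components."""
    q = X.shape[1]
    return (X + c) / (1.0 + q * c)

# A point with zero components becomes strictly positive after the
# transformation, so inverse terms 1/x_i' are well defined; note that
# the transformed rows still sum to one.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
])
Z = c_pseudo(X, 0.03)
```

Both transformations are affine maps of the simplex onto itself, so a model fitted in pseudo components can always be translated back to the original units.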
2.3. Adequacy of Polynomial Models for Mixtures
Next, we describe the procedure for determining the adequacy of the polynomial model used in terms of the conditioning of the information matrix. In this case, the objective is to determine the presence of collinearity (see the works of Cornell and Gorman (2003) [23] and Prescott et al. (2002) [3]).

Let $\lambda_1, \lambda_2, \dots, \lambda_p$ be the eigenvalues of the information matrix ($\mathbf{X}'\mathbf{X}$), that is, the solutions to the determinant equation $|\mathbf{X}'\mathbf{X} - \lambda\mathbf{I}| = 0$, which is a polynomial with $p$ roots.
There are several definitions of the condition number (CN) of a matrix. The definition generally used in statistical applications is the square root of the ratio of the maximum to the minimum eigenvalue of $\mathbf{X}'\mathbf{X}$, denoted by the following [3]:

$$\mathrm{CN} = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}.$$

(Some authors omit the square root and simply take the ratio of the maximum to the minimum eigenvalue.) If $\mathbf{X}'\mathbf{X}$ is close to singular, then $\lambda_{\min}$ is close to 0 and the CN is extremely large. Small values of $\lambda_{\min}$ and large values of the CN indicate the presence of collinearity. A smaller CN indicates more stability (better conditioning) in the least squares estimates.
This article uses the CN to determine the level of numerical stability of the model. Likewise, it is proposed to define the value of $c$ based on the CN of the information matrix. That is, the value of $c$ that results in the lowest CN is the value used for the transformation into pseudo components of the form (9).
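The selection rule just described can be sketched as a simple grid search: for each candidate $c$ in the Draper–John range, transform the design, build the model matrix, and keep the $c$ with the smallest CN. The design points, the model (linear terms plus an inverse term in the first component only), and the grid below are illustrative choices of ours:

```python
import numpy as np

def condition_number(M):
    """CN = sqrt(lambda_max / lambda_min) for a symmetric matrix M."""
    eig = np.linalg.eigvalsh(M)
    return np.sqrt(eig.max() / eig.min())

def cn_for_c(X_raw, c):
    """CN of the information matrix of a first-order inverse-term model
    fitted on the pseudo components z_i = (x_i + c) / (1 + q*c)."""
    q = X_raw.shape[1]
    Z = (X_raw + c) / (1.0 + q * c)      # transformation of the form (9)
    D = np.hstack([Z, 1.0 / Z[:, [0]]])  # linear terms plus 1/z1
    return condition_number(D.T @ D)

# Simplex-lattice style points; several have x1 = 0, hence the need for c > 0.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])
grid = np.linspace(0.02, 0.05, 31)       # the range suggested in [22]
cns = [cn_for_c(X, c) for c in grid]
c_star = grid[int(np.argmin(cns))]       # c giving the best conditioning
```

A finer grid, or a one-dimensional minimizer over $c$, can replace the coarse sweep; the point is only that the CN is a computable function of $c$, so its minimizer gives a formal criterion.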
When there is no constant column in the $\mathbf{X}$ matrix, the adjusted total sum of squares in the denominator of the multiple correlation coefficient ($R_j^2$) must be replaced [3]. We define $\mathbf{x}_j$ as the $j$-th column of $\mathbf{X}$ and $\mathbf{X}_{-j}$ as the matrix that results when column $\mathbf{x}_j$ is deleted from $\mathbf{X}$. Then, $R_j^2$ is the multiple correlation coefficient obtained by regressing $\mathbf{x}_j$ on $\mathbf{X}_{-j}$. When the first column of $\mathbf{X}$ is a constant column, $R_j^2$ is usually calculated, for $j = 2, \dots, p$, as

$$R_j^2 = 1 - \frac{\mathrm{SSE}_j}{\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2},$$

where $\mathrm{SSE}_j$ is the residual sum of squares from regressing $\mathbf{x}_j$ on $\mathbf{X}_{-j}$. When there is no constant column in the $\mathbf{X}$ matrix, the unadjusted multiple correlation coefficient can be obtained by

$$R_{0j}^2 = 1 - \frac{\mathrm{SSE}_j}{\mathbf{x}_j'\mathbf{x}_j}.$$
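The unadjusted coefficient $R_{0j}^2 = 1 - \mathrm{SSE}_j/\mathbf{x}_j'\mathbf{x}_j$ can be computed directly. The sketch below (our own helper and synthetic data) regresses column $j$ on the remaining columns without an intercept:

```python
import numpy as np

def r2_unadjusted(X, j):
    """Unadjusted multiple correlation of column j on the remaining columns:
    R0j^2 = 1 - SSE_j / (x_j' x_j), with no mean correction, for design
    matrices that contain no constant column."""
    xj = X[:, j]
    X_minus = np.delete(X, j, axis=1)
    b, *_ = np.linalg.lstsq(X_minus, xj, rcond=None)
    sse = np.sum((xj - X_minus @ b) ** 2)
    return 1.0 - sse / np.sum(xj ** 2)

# Synthetic example: the third column is almost a multiple of the first,
# so its R0j^2 should be very close to 1 (near-linear dependence).
rng = np.random.default_rng(1)
a = rng.uniform(0.1, 0.9, 15)
b = rng.uniform(0.1, 0.9, 15)
X = np.column_stack([a, b, 2 * a + 0.001 * rng.normal(size=15)])
r2 = r2_unadjusted(X, 2)
```

Values of $R_{0j}^2$ near 1 flag the columns involved in a near-dependence, complementing the single overall CN diagnostic.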
2.4. Alternative Problem Formulations and Regularization Methods
While model reformulation, such as the inclusion of inverse terms or the Slack Variable model, can help mitigate collinearity and improve stability, there are additional approaches that can further enhance the conditioning of the Fisher information matrix and ensure more reliable results. One of the primary concerns with mixture models, especially in the presence of ill-conditioning, is the direct inversion of the matrix $\mathbf{X}'\mathbf{X}$, which may fail in the case of near-linear dependencies between the predictor variables. In such cases, reformulating the problem can be highly beneficial. For instance, instead of solving the linear equations directly, regularization methods can be introduced to modify the objective function and reduce the influence of poorly conditioned parameters [24].
As a first step, ridge regression can be applied to the least squares problem. This method adds a penalty term proportional to the square of the regression coefficients to the least squares objective function. The regularization parameter (λ) effectively shrinks the coefficients, which helps stabilize the inversion of the $\mathbf{X}'\mathbf{X}$ matrix, improving the conditioning of the Fisher information matrix [25]. By incorporating ridge regression into the mixture model estimation, the overall numerical stability of the model is enhanced, particularly in the presence of collinearity among the mixture components. Alternatively, Lasso regression, which uses L1 regularization, can also help reduce collinearity issues. This method introduces a sparsity constraint, under which some coefficients are driven exactly to zero, thereby reducing the number of predictors and mitigating the risk of ill-conditioning. In the context of mixture models, where the components may be highly correlated, Lasso can help identify the most influential predictors, ensuring better stability and interpretability [26].
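A minimal sketch of the ridge estimate described above, $\hat{\boldsymbol{\beta}}_\lambda = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}'\mathbf{y}$, on a deliberately ill-conditioned synthetic design (our own example, not data from the article):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y. For lam > 0 the matrix being
    inverted is positive definite, hence well conditioned even when X'X
    itself is nearly singular."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ill-conditioned design: the third column nearly duplicates the first.
rng = np.random.default_rng(2)
x1 = rng.uniform(size=30)
X = np.column_stack([x1,
                     rng.uniform(size=30),
                     x1 + 1e-6 * rng.normal(size=30)])
y = X @ np.array([1.0, 2.0, 1.0]) + 0.01 * rng.normal(size=30)

beta_ridge = ridge(X, y, lam=0.1)
```

Adding λ to every eigenvalue of $\mathbf{X}'\mathbf{X}$ bounds the smallest eigenvalue away from zero, which is precisely why the condition number of the regularized matrix is smaller than that of $\mathbf{X}'\mathbf{X}$ itself.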
In addition to ridge and Lasso regression, robust regression techniques, such as Huber regression and M-estimators, can also be employed to enhance the numerical stability of mixture models. These methods are designed to handle outliers and deviations from the assumptions of homoscedasticity, which can improve the overall conditioning of the Fisher matrix by reducing the influence of extreme values. For mixture experiments where the response is sensitive to outliers or noise, robust regression techniques are highly recommended [
27]. Moreover, quantile regression could be considered in cases where the central tendency of the data is not fully representative of the underlying process. Unlike the ordinary least squares method, which focuses on the mean of the distribution, quantile regression estimates conditional quantiles, offering a more flexible and robust approach that can improve the robustness and conditioning of the estimated parameters [
28].
The conditioning of the Fisher information matrix can also be influenced by the choice of optimality criteria used to fit the model. For example, instead of minimizing the sum of squared residuals, more robust or alternative loss functions can be considered. Huber loss, for example, is less sensitive to outliers and provides a more stable estimation when the response is prone to extreme values or non-normal errors. By minimizing the Huber loss, the impact of outliers is reduced, which in turn helps in improving the conditioning of the Fisher matrix. Additionally, alternative design criteria, such as D-optimality or A-optimality, can be applied when selecting mixture designs. These criteria aim to maximize the information provided by the experimental design, improving the conditioning of the Fisher information matrix by ensuring that the mixture components are well spread and sufficiently informative. By optimizing the experimental design based on these criteria, it is possible to achieve better numerical stability and more reliable parameter estimates in the presence of collinearity [
29].
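As an illustration of the robust alternatives mentioned above, a Huber M-estimate can be computed by iteratively reweighted least squares. This is a generic textbook sketch (our own code and synthetic data), not an algorithm taken from the article:

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber M-estimate via iteratively reweighted least squares.
    Observations with standardized residuals beyond `delta` get
    weight delta/|u| < 1, damping the influence of outliers."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12    # robust scale (MAD)
        u = np.abs(r) / s
        w = np.minimum(1.0, delta / np.maximum(u, 1e-12))  # Huber weights
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, X.T @ (w * y))    # weighted LS step
    return beta

# Synthetic line with one gross outlier: the robust fit stays near the
# true coefficients while OLS would be pulled toward the outlier.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.uniform(size=40)])
y = X @ np.array([1.0, 3.0]) + 0.05 * rng.normal(size=40)
y[0] += 10.0
beta_rob = huber_irls(X, y)
```

The same reweighting view explains the conditioning remark in the text: downweighting extreme observations changes the effective information matrix $\mathbf{X}'\mathbf{W}\mathbf{X}$, reducing the leverage of aberrant points.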
4. Conclusions
In this article, it was found that the conditioning of the information matrix, which represents the numerical stability of the model adjusted by least squares, changes with different values of $c$ in the transformation into lower limit pseudo components. An information matrix with poor conditioning can result in highly correlated estimators with high standard errors. Likewise, the estimated model becomes highly dependent on the precise location of the experimental points, due to the inflation of the variance of the estimators. This reflects the importance of taking the numerical stability of the model into consideration when determining the model to be used. For the data used in this article, the first-order inverse model is a useful model to be considered; likewise, it contains fewer terms.
We can conclude that the numerical stability of the model can be improved by defining the value of $c$ based on the CN. It is important to highlight that this determination offers a mathematical criterion for practitioners who wish to include an inverse term in the regression model. Based on the results derived in this article, we recommend its use in the mixture experiment setting, both in design search algorithms and in model fitting.
There are several potential extensions of this work that could lead to further advancements in statistical analysis, particularly in the context of mixture experiments and regression models. Building on the findings related to the correlation of estimators and variance inflation, extensions of this work could integrate regularization techniques, such as ridge regression or Lasso, to further control for multicollinearity and stabilize parameter estimates. This could be particularly useful in complex mixture systems with many variables or when data are scarce. Investigating the robustness of the inverse models and their numerical stability in the presence of noisy or incomplete data could be another useful extension. Since experimental data are often imperfect, evaluating how well these models perform under different levels of noise or outliers could refine their applicability in real-world scenarios.