Non-Linear Saturated Multi-Objective Pseudo-Screening Using Support Vector Machine Learning, Pareto Front, and Belief Functions: Improving Wastewater Recycling Quality

Besseris, George

doi:10.3390/app14219971

Open AccessArticle

Non-Linear Saturated Multi-Objective Pseudo-Screening Using Support Vector Machine Learning, Pareto Front, and Belief Functions: Improving Wastewater Recycling Quality

by

George Besseris

Mechanical Engineering Department, The University of West Attica, 12241 Egaleo, Greece

Appl. Sci. 2024, 14(21), 9971; https://doi.org/10.3390/app14219971

Submission received: 25 August 2024 / Revised: 17 October 2024 / Accepted: 28 October 2024 / Published: 31 October 2024

(This article belongs to the Special Issue AI in Wastewater Treatment)

Download

Browse Figures

Versions Notes

Abstract

:

Increasing wastewater treatment efficiency is a primary aim in the circular economy. Wastewater physicochemical and biochemical processes are quite complex, often requiring a combination of statistical and machine learning tools to empirically model them. Since wastewater treatment plants are large-scale operations, the limited opportunities for extensive experimentation may be offset by miniaturizing experimental schemes through the use of fractional factorial designs (FFDs). A recycling quality improvement study that relies on non-linear multi-objective multi-parameter FFD (NMMFFD) datasets was reanalyzed. A published NMMFFD ultrafiltration screening/optimization case study was re-examined regarding how four controlling factors affected three paper mill recycling characteristic responses using a combination of statistical and machine learning methods. Comparative machine learning screening predictions were provided by (1) quadratic support vector regression and (2) optimizable support vector regression, in contrast to quadratic linear regression. NMMFFD optimization was performed by employing Pareto fronts. Pseudo-screening was applied by decomposing the replicated NMMFFD dataset to single replicates and then testing their replicate repeatability by introducing belief functions that sought to maximize credibility and plausibility estimates. Various versions of belief functions were considered, since the novel role of the three process characteristics, as independent sources, created a high level of conflict during the information fusion phase, due to the inherent divergent belief structures. Correlations between two characteristics, but with opposite goals, may also have contributed to the source conflict. The active effects for the NMMFFD dataset were found to be the transmembrane pressure and the molecular weight cut-off. The modified adjustment was pinpointed to the molecular weight cut-off at 50 kDa, while the optimal transmembrane pressure setting persisted at 2.0 bar. This mixed-methods approach may provide additional confidence in determining improved recycling process adjustments. It would be interesting to implement this approach in polyfactorial wastewater screenings with a greater number of process characteristics.

Keywords:

wastewater; recycling; process screening; fractional factorial designs; Pareto front; support vector machine learning; Lenth statistics; belief functions

1. Introduction

Water is a precious commodity with a rapidly dwindling supply. The scarcity of usable water has great geopolitical repercussions in the global circular economy. Consequently, wastewater stocks now represent a high-demand resource worldwide [1,2]. The 17 United Nations Sustainable Development Goals (UN SDGs) exclusively devote SDG #6 to “Water and sanitation”. Nevertheless, the circularity of water strongly interacts with several other crucial SDGs. To mention just a few, water accessibility impacts the pace of economic growth (SDG #8), which, in turn, is closely connected to the “Sustainable consumption and production” goal (SDG #12), and, eventually, it is water logistics that undisputably lay the foundations for the future of “Sustainable cities” (SDG #11) [3,4,5,6]. Therefore, the water economy and its sparkplug—water innovation—are critical aspects of sustainable development. The circular economy’s 10 R strategies—Refuse, Rethink, Reduce, Reuse, Repair, Refurbish, Remanufacture, Repurpose, Recycle, and Recover—certainly provide ample pathways for improvement [7,8,9,10]. Besides being cost efficient, technologies are developing to identify and exploit opportunities for recycling, recovering, and reusing wastewater stocks. Every application is built to focus on very specific needs—domestic (potable/non-potable) use, agricultural or aquacultural use, industrial use, recreational use, landscape irrigation, etc. [11,12,13]. The purpose of water usage is important from an engineering perspective, because it also designates the preferred physicochemical process to be undertaken in order to regulate the nutrient content profile of the treated wastewater stocks. Of course, the origins of water feeder streams (blue, green, gray, black, or yellow) will lead to them requiring different treatment, because of characteristics related to their biochemical oxygen demand (BOD) and their chemical oxygen demand (COD), their nitrogen and phosphorus content, and the level of presence of solids, pathogens, and micropollutants.

In recent water research, it has emerged that it is difficult to simultaneously exploit multiple types of resources solely relying on a single technology, especially when municipal and industrial wastewater stocks are so different [14]; only a narrow range of water products can be treated this way. The treatment of industrial wastewater streams is particularly demanding due to their potential to carry hazardous pollutants, thus making the removal of heavy metal ions a complex problem from an engineering and economic perspective [15,16,17]. There are numerous technologies that are currently available that depend on (1) classical removal techniques such as the membrane filtration, the adsorption unit operations, the Fenton processes, and advanced oxidation kinetics and (2) more contemporary combination-type applications such as membrane bioreactors and electrically assisted membrane adsorption bioreactors [18,19,20,21]. To achieve the level of resource efficiency and technology effectiveness that the circular economy transition framework foresees, smart innovation in wastewater “reuse, reduce, recycle and recover” processes should be accelerated [22]. Regardless of the wastewater source and the type of technology employed, efficiency innovations should boost a separation unit’s removal performance.

To achieve optimal performance in wastewater treatment processes, it is essential to screen and model contaminant removal efficiencies [23,24]. The high complexity of the chemical physics involved in the separation processes hampers the practical quantification of uncertainty in researching filtration phenomena at a plant scale. As a result, an extensive experimental search may be necessary in order to empirically profile and prioritize—with confidence—the main influences that control the quality/process characteristics. Such studies may also be under stringent restrictions that consider, as a pivotal parameter, the time to respond to a process improvement call, as well as the actual (required) duration to complete a full schedule of experiments. To overcome such hindrances, there are several design of experiments (DOE) procedures—in the realm of statistical engineering—that may assist process engineers and scientists [25,26,27,28,29,30]. DOE tools and techniques may aid in expediting customized quality improvement projects by optimizing research efforts and reducing the lead time required to arrive at consistent solutions. Moreover, DOE methodologies are part of the systematized Lean Six Sigma toolbox that is utilized to couple lean-and-green engineering with statistical engineering to assist achieving innovative and sustainable results [31,32,33,34]. A quick and economical way to program structured DOE trials is to implement fractional factorial designs (FFDs) [25,26]. Non-linear orthogonal arrays (OAs) are FFDs that have been applied in resolving essential research issues in filtration and recycling in order to improve the quality performance of a wastewater operation [35,36]. In published research, FFDs are implemented in experimental schemes, which are appropriate for both replicated and unreplicated trials; the choice usually depends on the degree of difficulty to gather enough tailor-made measurements. Replicated FFD datasets are easier to treat by using classical methods such as regression analysis and analysis of variance (ANOVA) [25,26,37]. Unreplicated factorial experiments, while indispensable in environmental analysis and water management, pose more problems in the analysis phase [38,39,40]. Researchers, in the search for FFDs, also seek to maximize the number of effects studied. This leads to the saturation of the selected FFD scheme. Consequently, the quantification of a residual error cannot be attained, which makes the overall effort rather subjective. To overcome this problem, there are a great number of multifactorial techniques that have been developed to interpret unreplicated experiments in different ways [41]. This paper will concentrate on a very compact FFD design suitable for simultaneously detecting potential curvilinear effects. It is the Taguchi-type L₉(3⁴) OA, which can program minimal experiments for up to four controlling factors [26]; it can be conveniently extended to resolve realistic output conditions that simultaneously involve several characteristics [26]. This sampling approach has been suggested due to the influence of the Taguchi method in the screening and optimization of chemometrical systems [42]. Although the Taguchi method is primarily used for optimally adjusting multiple controls, aiming to improve the performance of a single-characteristic response, methods have been devised to handle multi-characteristic cases, too [43,44,45].

The rationale behind this work is the use of the two-culture approach in aquametric modeling, i.e., statistical inference and algorithmic information [46]. The statistical analysis part is novel because the proposed method uses a well-known specialized technique, which is devoted to treating unreplicated datasets, so as to evaluate the repeatability of the predictions in replicated trials, using elements from evidence theory [47]. The selected multi-parameter screener is the classical Lenth method [48], with critical values for the four-factor three-level L₉(3⁴) OA adjusted by Ye and Hamada [49]. The examined algorithmic solver, operating on the fully replicated L₉(3⁴) OA dataset, is the support vector machine learning approach, which is an alternative artificial intelligence technique to neural networks [50]. Water innovation projects aspire to use the predictive capabilities of artificial intelligence to accomplish improved efficiencies in intricate wastewater operations [51,52,53]. Moreover, it has been suggested that there is value in aligning DOE techniques—in the Lean Six Sigma methodologies—to artificial intelligence technology in industrial applications [54]. This work expands on several recent attempts to adopt and apply the broad know-how from modern algorithmic engines in order to profile state-of-the-art wastewater processes, counting on small-structured datasets to describe the influence of multiple inputs on multiple outputs [55,56,57].

Additional motivation for this work came from the fact that small samples offer opportunities for exploration and evaluation when examined with a variety of methods. Usually, standard errors are larger in data that are collected from smaller samples, regardless of whether the data shape estimators are known or not. This may or may not affect the homoskedasticity assumption when contrasting between different controls. Unequivocally, in unreplicated–saturated multi-parameter FFD experiments, the size of the residual error is not known at all. Furthermore, issues of competing interactions occasionally linger when analyzing the main effects in the FFD datasets. The manifestations of interactions might also depend on the sample size, as well as on the spacing between the operating endpoints for the engaged controlling factors. This might be an important aspect in establishing the reliability of screening/optimization predictions, since the complication of confounding in FFD dataset profiling might be an interfering agent. Furthermore, there is always the risk of two types of correlations in a study that monitors multiple outputs (1) between the examined characteristic responses and (2) between the selected central tendency and the variability measure that is used to summarize the replicated characteristic responses. It should be noted that if a Taguchi approach is adopted for the replicated dataset of a single characteristic, then a “two-response unreplicated” form automatically arises from the necessary conversion of the replicate data entries to mean and signal-to-noise ratio (SNR) vectors.

The new proposal involves decomposing the replicated dataset and applying Lenth screening as many times as the number of replications. Using belief theory [47,58], the repeatability of the screening results is assessed on two levels: (1) seeking agreement for the labeled “strong-effect” groups across separate replications and (2) combining profiling results for the different characteristic responses to obtain a single “overall” screening solution for the active effects. This screening paradigm emulates real situations in which there may be partial knowledge with regard to the dichotomized classes of effects, belonging to either active or inactive groups. This uncertain and imprecise information, which forms the frequency of occurrence, has to be shaped according to a statistical threshold that is based on a pseudo-error [48]. The motivation behind the use of belief functions stems from a need to re-address the subjective nature of the saturated–unreplicated FFD dataset by quantifying the frequency of occurrence of the active effects across all replications. For this reason, attempts have been made to study screening outcome frequencies by introducing three different combination rules with respect to (1) the normalized combination rule (Dempster–Shafer theory [59,60,61,62,63,64]), (2) the conjunctive combination rule (Smets theory [65,66,67]), and (3) Yager’s theory [68,69]. To complement the diversity of the three combination rules, two decision-making criteria are implemented in the framework of the belief functions to seek (1) the maximum credibility and (2) the maximum plausibility. By blending criteria for different characteristics, a simultaneous prediction across characteristics is rated for a single overall classification (active/inactive). This is accomplished by considering each respective mass function vector as a different “opinion” to be combined, and perhaps discounted, wherever there is demonstrated ignorance. Discounting may be used if some intermediate threshold is achieved by the “not-so-active” effects. Even so, decision-making under ignorance may be quantified in terms of a degree of optimism.

Finally, the support vector machine learning approach [50,70,71] is used to perform regression analysis in an attempt to size the FFD dataset effects regardless of their curvature trends. Two support vector machine learning versions are employed to reinforce the predictions from an ordinary quadratic regression analysis, by testing (1) the quadratic model and (2) the optimizable model. The adoption of the support vector machine learning approach attempts to test the capability of this method, which is known to require small training samples, on a small, dense, but also structured FFD dataset. The rest of this paper is organized into four sections. The following section outlines the methodology, making a brief referral to theoretical concepts, introducing the case study, and stating the computational aids used to carry out the required calculations. In the “Results” section, screening predictions are provided by statistical and algorithmic analyzers. In the “Discussion” section, a comparison of multi-response optimizer solutions is included, using belief functions and Pareto fronts. A “Conclusion” section wraps up with the new insights that were gleaned form this effort.

2. Materials and Methods

2.1. The Case Study Data

The revisited case study is a real wastewater process optimization project that was recently published [36]. The recycling raw feed solution was paper-and-cardboard effluent from a papermill factory that received some pretreatment before this attempt to conduct a comprehensive wastewater ultrafiltration performance improvement. This case study is appealing from several perspectives: (1) it was a real application in a large ultrafiltration operation, (2) it used modern technology, (3) it promoted the “small-data” approach, (4) it was a lean-and-green quality enhancement project, (5) it dealt with very complex physicochemical processes, (6) it considered a multi-characteristic process, (7) it performed multifactorial/multi-parameter adjustments, (8) it investigated potential non-linearities in the relationships between inputs and outputs, (9) it “screened-and-optimized” an operation in a single step (Taguchi method), (10) it implemented a saturated FFD sampling scheme, and (11) it attempted a concurrent process enhancement. In brief, the three examined quality characteristics used to monitor the wastewater ultrafiltration performance status of the papermill factory were (1) the average permeate flux (J_p in L/(m²h)), (2) the chemical oxygen demand rejection rate (COD in %), and (3) the cumulative flux decline (SFD). The synchronous optimization goals were to maximize the average permeate flux and the COD rejection rate, as well as to minimize the cumulative flux decline. The controlling inputs that were designated to screen and optimize the overall ultrafiltration unit operation performance were (1) the transmembrane pressure (TMP), (2) the cross-flow velocity (CFV), (3) the temperature (T), and (4) the molecular weight cut-off (MWCO). The ranges of the four controlling factors were assigned as (1) TMP (1.0–3.0 bar), (2) CFV (0.463–1.041 m/s), (3) T (15–30 °C), and (4) MWCO (10–100 kDa); the four controlling factors are coded for ease of reference in this work as A, B, C, and D, respectively. The recommended trial design scheme was the Taguchi-type L₉(3⁴) OA [26], which was chosen in order to also accommodate potential quadratic effects. The experiments were performed in triplicate.

There are several motivating aspects behind an alternative OA dataset analysis path. First of all, it is a “small-data” problem. Thus, it is instructive to try other procedures in order to evaluate how different approaches approximate a final solution. This is particularly intriguing because the experiments were replicated and the repeatability of the measurements was deemed satisfactory. The authors used the Taguchi method to screen and optimize the three process characteristics, one response at a time. However, the Taguchi toolbox does not include a solver to carry out the multi-response problem. Surely, there are several successful applications showing how to resolve the multivariate problem in specific FFD datasets; there are techniques available such as MANOVA, desirability analysis, gray relational analysis, and so forth. But in the DOE research literature, there is no clear consensus about any specific treatment. Justifiably, the authors of the ultrafiltration case study used the utility concept, which is another useful option.

Another motivating factor arose from an observation during a preliminary inquiry, in which a strong correlation between two characteristics was revealed: the average permeate flux and the cumulative flux decline. It was not surprising that the two characteristics were positively correlated. This occurrence would simply cause the omission of one of the two correlated characteristic responses. Then, the concurrent screening-and-optimization procedure would proceed with a simpler model that would only involve two characteristic responses, instead of three. Nevertheless, a complication emerges as the two correlated process characteristics are set up to be conditioned on opposite optimization goals. Therefore, instead, all three process characteristics should partake in the screening-and-optimization scheme. The empirical model structure is used; however, the absence of interactions, the stochastic magnification of a dispersion estimation at small samples in an unknown population, and the statistical limitations of using means and signal-to-noise-ratio measures in small-data conditions might also be debated and might be additional motivating agents.

From the interaction plots in Figure 1, it appears that interactions visually prevail in most pairings, according to their patterns in each of the three-characteristic-response depictions. They may or may not be statistically significant, though. If the interaction effects are statistically significant, then they might influence the results that were originally related to the proposed quadratic model. Furthermore, different screening approaches may detect, in different ways, the interplay of influences emanating from the interplay between the stochastic uncertainty and the epistemic uncertainty. The “missed-out” interactions might not be a rare circumstance in DOE, because the confounding effect suspiciously lurks in the FFD screening schemes. Other issues that may be influential may be connected to maintaining homoscedasticity in ANOVA treatments under the small-sample condition; the randomization of non-normal small data may or may not appear as normal. Ostensibly, the variation around a small-sample mean still depends on the sample size n (∝1/

\sqrt{n}

). Then, small-data standard errors are magnified, raising the chances that the confidence intervals between different settings might overlap.

2.2. Data Analysis

2.2.1. Lenth Statistics and Belief Functions

The FFD sampler that was shown to guide the data collection task in the published case study is a non-linear L₉(3⁴) OA trial planner with a maximum capacity for handling four controlling factors, each examined at three preset settings, while demanding only nine specific factorial setting combinations. Its key feature is its reduced trial volume requirement, down to merely 1/9 of the demand that is otherwise necessitated by a full factorial design. This lean-and-green curvilinear-orientated design is populated with factorial settings in order to make full use of the recipe maker; i.e., it is adjusted to saturate the planner matrix. The recipe maker is an m × n array in which the n rows designate the factorial setting combinations in their preset trial run schedules. The m columns indicate the positions of the investigated controlling factors on the OA planner. This specific design offers an opportunity to quantify quadratic effects. In the examined saturation mode, the relationship between the number of runs and the number of factors becomes n = 2 × m + 1. The m controlling factor vectors are defined as X_j for 1 ≤ j ≤ m (m ϵ N), and their associated entries (settings) are symbolized as x_ij for 1 ≤ i ≤ n (n ϵ N), and 1 ≤ j ≤ m. The three-level notation comprises k_ij fixed levels for each jth factor (1 ≤ j ≤ m and 1 ≤ i ≤ n) with 1 ≤ k_ij ≤ 3 (k_ij ϵ N). The multi-characteristic response matrix is defined as R = {r_icd}, with 1≤ i ≤ n, 1 ≤ c ≤ L (L ϵ N), and 1 ≤ d ≤ D (D ϵ N), where L is the total number of examined characteristic responses and D is the total number of times that the OA scheme has been replicated.

The proposed approach uses D times the Lenth test [48] on each individual saturated OA replicate datapoints, using the critical values that were computed by Ye and Hamada [49] for the individual error rate (IER) and the experiment-wise error rate (EER), respectively. This means that the Lenth statistics are collected D times on the D “unreplicated” OA dataset screenings. If the estimated effects are ε_i_j for 1 ≤ i ≤ D and 1 ≤ j ≤ m, then the pseudo standard error (PSE) is

{P S E}_{i} = 1.5 \cdot {m e d i a n}_{|ε_{i j}| < 2.5 s_{o i}} |ε_{i j}| with s_{o i} = 1.5 \cdot m e d i a n |ε_{i j}| and t_{L, i, j} = \frac{ε_{i j}}{{P S E}_{i}}

where the

t_{L, i, j}

is the Lenth statistic for the jth effect, as it is evaluated after processing the ith OA-replicate dataset.

This work, in a practical manner, attempts to take into account influences on screening solutions due to stochastic uncertainty and epistemic uncertainty. The stochastic uncertainty contributions are fathomed by using the Lenth method on the replicate-partitioned saturated–unreplicated FFD datasets. Furthermore, the theory of belief functions is implemented to incorporate elements from the evidence theory in order to deal with the epistemic uncertainty part. By adopting Dempster–Shafer theory [59,60,61,62,63,64], the process used to perform the fusion of uncertain information is greatly facilitated, hence aiding our understanding of the influence of uncertainty in decision-making. Each separate OA dataset replication is treated as a different evidence opportunity. Unreplicated–saturated OA datasets may be viewed as “unmodeled” relationships, which might lead to “imprecise” predictions. This is because the unexplainable error remains an unknown after each factorial screening has been completed. The state of the belief in the assertion about a group of suggested active factors will be captured by combining successive screening results using evidence combination. Since the experiments are replicated, the prediction of strong effects can be cast in simple probabilities, which are translated into masses according to the selected evidence combination rules. To form those masses, it is necessary to simply count the number of times that an effect appears to be labeled as strong, out of the D times that the OA-scheme has been replicated. To be declared as a strong effect, its Lenth statistic must be greater than the corresponding critical value. To test for strong evidence in rounding up the active factor group, only the effects that exceed the critical value of EER are retained from each individual OA dataset replication. This approach encompasses the idea of the independence of information sources (different replications) and not disengaging source reliability, while enabling the embedding of the potential variability in the level of conflict among the sources of evidence into the decision-making process. To revisit some rudimentary notions about belief functions, firstly, it is constructive to define the frame of discernment, Ω, which is the set of N exclusive and exhaustive hypotheses {H_i} for 1 ≤ i ≤ N. Consequently, the power set, P(Ω), consists of 2^N hypotheses in the Ω: P(Ω) = {∅, {H₁}, {H₂}, …, {H_N}, {H₁ ∪ H₂}, …, Ω}. The mass function, m, is defined as the frequency of occurrences, in ratio form, for each of the Ω elements in the closed range [0,1], i.e., m: P(Ω)→[0,1]. Given a subset A with A ⊆ Ω and m(A) > 0, subset A is called a focal element. The focal element and its corresponding mass function constitute a belief structure. The mass function possesses two properties:

\sum_{A \subseteq Ω} m^{Ω} (A) = 1 and m^{Ω} (\emptyset) = 0

The definition for the basic belief function, bel(X), ∀ X ⊆ Ω and A ⊆ Ω (A ≠ ∅), is

b e l (X) = \sum_{A \subseteq X} m^{Ω} (A)

Similarly, the definition for the plausibility function, pl(X), is

p l (X) = \sum_{A \cap X \neq \emptyset} m^{Ω} (A)

To effectuate the intermixing of several pieces of evidence, there are several combination rules that permit straightforward evaluation by incorporating information with respect to the degree of conflict among sources. Among the most popular combination rules, three alternatives will be employed: (1) the normalized rule (the Dempster–Shafer rule), (2) the conjunctive combination rule (the Smets rule) [65,66,67], and (3) Yager’s rule [68,69]. The Dempster–Shafer rule of combination for the two-source case,

m_{1 \oplus 2}^{Ω} (A)

, is defined as

m_{1 \oplus 2}^{Ω} (A) = \{\begin{matrix} \frac{f_{12} (A)}{1 - f_{12} (\emptyset)} \forall A \subseteq Ω a n d A \neq \emptyset \\ 0 i f A = \emptyset \end{matrix}

where f₁₂(A): 2^Ω→[0,1] is the conjunctive rule function:

f_{12} (A) = \sum_{X \cap Y = A} m_{1}^{Ω} (X) m_{2}^{Ω} (Y) \forall A \subseteq Ω

The TBM conjunctive combination rule,

m_{1 \cap 2}^{Ω} (X),

therefore, is defined as

where the intersection symbol in subscript form denotes, only in this case, the conjunctive operator.

Finally, Yager’s combination rule,

m_{12}^{Ω}

, is

m_{12}^{Ω} (A) = \{\begin{matrix} f_{12} (A) \forall A \subseteq Ω, A \neq Ω a n d A \neq \emptyset \\ f_{12} (Ω) + f_{12} (\emptyset) i f A = Ω \\ 0 i f A = \emptyset \end{matrix}

To provide a broader perspective on the decision-making process, the application of the decision rules according to belief functions will be based on (1) the maximization of plausibility and (2) the maximization of credibility.

2.2.2. Screening Using Support Vector Machine Learning Variations

Input–output data may be denoted in paired sets of (x_i, y_i), with 1 ≤ i ≤ n. Support vector machine learning may be used in regression applications because it relies on a small number of observations from the training data to form the optimal hyperplane [70,71,72]. Therefore, support vector machine learning is used in order to linearly fit a function, in generic terms, y = f(x), in order to best capture this input–output variable relationship. The goal is to estimate an optimal hyperplane, which is a linear function that dichotomizes the maximal margin that is created by the separation of different classes of the input–output vector datapoints. The datapoints assist in forming the maximal margin that are defined as the support vectors (1 ≤ i ≤ l). In simplified form, the function is written as

y = f(x) = <w,x> + b where w ⊆ R^d and b ∈ R

where <w,x> is the inner product of w and x, and b is the intercept. The Vapnik formulation [70,71,72] relies on the slack variables ξ_i and ξ_i^* to tackle situations where infeasible constraints on the optimization problem become relevant to the goal of minimizing the norm of w. Therefore, this extended formulation for the objective function becomes

Minimize \frac{1}{2} {‖w‖}^{2} + C \cdot \sum_{i = 1}^{l} (ξ_{i} + ξ_{i}^{*})

The positive constant C balances the levelness of the fitted function in competition with those fitting errors, which appear to be larger than the prespecified error constant, ε. The two conditions to accomplish this are given by the following inequalities:

\begin{matrix} y_{i} - 〈w, x_{i}〉 - b \leq ε + ξ_{i} with ξ_{i} \geq 0 \\ - (y_{i} - 〈w, x_{i}〉 - b) \leq ε + ξ_{i}^{*} with ξ_{i}^{*} \geq 0 \end{matrix}

By composing a primal objective function, a Lagrangian is used, which incorporates the objective function and the two conditions with two sets of multipliers each, these are (1) η_i and η_i^* to adjust the two slack variables ξ_i and ξ_i^* and (2) α_i and α_i^* to hinge the two conditions on the Lagrangian function. By obtaining the extrema of the Lagrangian function, it has been shown that the w and b are straightforward to estimate:

w = \sum_{i = 1}^{l} (α_{i} - α_{i}^{*}) x_{i}

and

\max \{y_{i} - 〈w, x_{i}〉 - ε| α_{i} < C o r α_{i}^{*} > 0} \leq b \leq \min \{y_{i} - 〈w, x_{i}〉 - ε| α_{i} > 0 o r α_{i}^{*} < C}

Thus, the hyperplane is defined by estimating w based only on the support vectors, and its equation is simply <w,x> + b = 0. The function between input and output is given, then, by the support vector expansion:

y = f (x) = \sum_{i = 1}^{l} (α_{i} - α_{i}^{*}) 〈x_{i}, x〉 + b

2.3. The Methodological Outline

The proposed methodological approach may be summarized in the following steps:

(1): Define the group of the controlling factors and the group of the quality characteristic responses that would form the examined input–output relationship.
(2): Organize the input adjustments according to the non-linear fractional factorial planner of choice that best accommodates all settings.
(3): Ensure that research time and costs are minimized by saturating the sampling scheme and by curbing the number of replicates.
(4): Test the replication adequacy, data normality, and correlations among the characteristic responses, extending the checks within each participating factorial setting, too.
(5): Perform factorial screening on the small multi-response dataset using a series of unreplicated single-response Lenth tests on separate replications for each characteristic response.
(6): Implement combination rules based on the evidence theory of belief functions using criteria for maximum plausibility and maximum credibility to identify the leading regressors.
(7): Compare factorial activeness predictions among characteristic responses using quadratic regression, quadratic support vector machine learning, and optimizable support-vector machine learning by contrasting the goodness-of-fit performances among the different results.
(8): Propose a synchronous optimal solution using a multi-objective Pareto front approach.
(9): Confirm and discuss optimal solution differences with other known multi-response, multi-parameter solvers.

2.4. The Computational Support

The pairwise correlations between the three quality characteristics, the combined data normality tests (Kolmogorov–Smirnov and Shapiro–Wilk tests), and their respective kurtosis and skewness estimations for each individual characteristic, as well as the normality tests per factorial setting for each individual characteristic, have been computed using the statistical software IBM SPSS (v.29). Using the same commercial software, the F-test change for retaining potential regressors, the Akaike Information Criterion (AIC), the Amemiya Prediction Criterion (PC), Mallows’ Prediction Criterion (CP), the Schwartz Bayesian Criterion (SBC), the Durbin–Watson test statistic, and the VIF collinearity score were also all estimated for each (individual-characteristic) best-fitted model to ensure agreement among goodness-of-fit measures and conditional metrics.

The specialized computational work was carried out on the statistical freeware platform R (v. 4.3.0) [73]. The basic visual dataset screening of the three characteristic responses (the average permeate flux, COD rejection rate, and cumulative flux decline) was conducted using boxplots, violin plots, bag plots, and Q-Q plots from the R-package “graphics” (v. 4.3.0), the “vioplot()” module from the R-package “vioplot” (v. 0.5.0) [74], the “bagplot()” [75] module from the R-package “aplpack” (v. 1.3.5), and the module “qq_conf_plot()” from the R-package “qqconf” (v. 1.3.1). The design “stripes” was selected as the preferred contrasted display option from the “plotsummary()” module of the R-package “aplpack” (v. 1.3.5). The L₉(3⁴) OA array was constructed using the module “param.design()” from the R-package “DOE.base” (v.1.2-2). Quadratic regression analysis of the three quality characteristics in three-level-coded form (−1, 0, 1) was accomplished by the module “lm()” in the R-package “stats” (4.3.0). The R-package “ibelief” (v. 1.3.1) was deployed to complete the engagement of belief functions by the mass function declaration, the evidence combination, and decision-making. Three combination rules were tested to combine masses in the “DST()” module: (1) the Smets criterion, (2) the Dempster–Shafer criterion, and (3) Yager’s criterion. Two decision rules were explored in the module “decisionDST()”: (1) the maximum plausibility and (2) the maximum credibility.

By implementing the computational software MATLAB (v.R2024a), fitting performances were compared using (1) quadratic regression, (2) quadratic support vector machine learning, and (3) optimizable support vector machine learning for each individual characteristic response on their combined datasets; the feature importance scores, which were sorted using the F-test algorithm, supplemented the predictions for the three individual characteristic responses. Finally, two MATLAB (v.R2024a) Pareto front procedures were employed to deal with the synchronous multi-response, multi-parameter optimization task: (1) the genetic algorithm module for multi-objective optimization (“gamultiobj()”) and (2) the “paretosearch()” solver.

3. Results

3.1. Initial Data Screening

The data screening process begins with tiled QQ-plots for each replicate dataset and quality characteristic. As shown in Figure 2, all nine datasets appear symmetrically distributed and fit closely around their centerlines. On the other hand, exploring replicate shape patterns by violin plot screening [74] reveals several instances of skewed motifs (Figure 3). Median locations are maintained uniform across different replicates. However, the distribution shapes within each quality characteristic may also vary for different settings. For example, in the case of the average permeate flux, the “bulging pattern” of the data distribution and its orientation are not uniform; the mode of the second and third settings bottoms toward lower J_p values (Figure 3A). Tail effects are more exaggerated in the second replicate. Similar comments may be extended to the COD rejection rate behavior, only this time, it is the first replicate dataset that exhibits stronger skewness, while the third replicate is more balanced (Figure 3B). Therefore, it is not clear whether a propensity is present and whether it is co-operative across all characteristics. Finally, the cumulative flux decline data are more consistently concentrated toward lower values (Figure 3C); skewness tendencies are more uniform for this quality characteristic. At first glance, there is no indication that the three quality characteristics might share parallel trends.

To probe potential relationships among characteristic responses, the pairwise correlations of the three quality characteristics have been tabulated in Table 1; the upper and lower confidence intervals were set at 95% (IBM SPSS v.29). Prior to estimating their respective correlations, the data from the three replicates were consolidated into a single group for each quality characteristic. It can be observed that there is a strong positive correlation between the average permeate flux and the cumulative flux decline. This positive correlation is expected, given the underlying physical principles of permeate flux and flux decline. Their correlation coefficient is estimated to have a value of 0.83; the 95% confidence interval ranges from 0.66 to as high as 0.92. This implies that the factorial modeling may be simplified by reducing the original three-response problem to a two-response problem. It can also be observed that there is a weaker negative correlation between the COD rejection rate and the cumulative flux decline since their estimated correlation coefficient value of −0.39 is statistically significant (α = 0.05). To determine whether it was plausible to further simplify the model structure, a bagplot screening [75] was performed (Figure 4). It is clear that the only correlation worth retaining is indeed that between the average permeate flux and the cumulative flux decline (middle bagplot), which illustrates the “sufficiently elongated” bag effect.

It is observed that there are no outlier points beyond the bagplot fence. The zone between the loop and the fence is narrow—for all three bagplots—an indication that the formed bags solidly support the depth 2D median estimations.

3.2. Multifactorial Solver Screening and Selection

A customary approach to recovering any statistical strong effects is to use the Lenth method [48]. To test at the same time for predictor non-linearity and effect repeatability, three independent evaluations (unreplicated arrangement) were conducted for each replicate dataset. The critical values to test the effect significance were attempted at two statistical thresholds in order to control (1) the individual error rate (IER) and (2) the experiment-wise error rate (EER), both at a level of 0.05. The IER and EER cut-off values for I = 8 were taken from Tables 5 and 6 in reference [49]. The IER and EER values are 2.20 and 4.87, respectively. The outcomes from the individual screenings are listed in Table 2. Regarding the average permeate flux, the statistically strong effects are identified to the linear part of factors A (TMP) and D (MWCO). The |t_Lenth| values of factor A are repeatable (7.72, 8.02, 7.41) and exceed both the IER and EER thresholds. Similarly, the |t_Lenth| values of factor D are also repeatable (6.71, 7.01, 6.55) and exceed both the respective IER and EER cut-offs. The linear part of factor C appears to be IER-significant in one of the three datasets. The quadratic part of factor A appears to be IER-significant in two out of the three replications. Considering the COD rejection rate, it is a consistent finding that no effect transpires as influential in all three attempts. Finally, with respect to the cumulative flux decline, the linear terms of factors A and D appear to be IER-significant in two out of the three replications, returning |t_Lenth| values of (2.74, 4.08) and (3.16, 4.10), respectively. Given the outcomes from the repeated application of the Lenth-type saturated–unreplicated screening on separate replicate datasets, additional insights were sought by also using single-response predictions in combined replicate datasets. Quadratic linear regression, the quadratic support vector machine, and the optimizable support vector machine (MATLAB R2024a) were employed to compare predictions across the different approaches.

In Figure 5, the feature importance scores for the average permeate flux, the COD rejection rate, and the cumulative flux decline are displayed. The feature importance scores are sorted using the F-test algorithm (MATLAB R2024a). Only the influence of factors A and D on the average permeate flux is sizable (Figure 5A). This result agrees with the general solution to the Lenth screening method. Considering the COD rejection rate, the predominant effect is identified to factor D (Figure 5B). This outcome conflicts with the Lenth diagnostics, which detected no effect. Finally, the cumulative flux decline appears to be modulated by both factors A and D (Figure 5C). This finding could be said to be in partial agreement with the respective solution using the Lenth statistics, as it was found that it occurred in two-thirds of the respective replications, and it was also only IER-significant.

The above results, which relied on sorting the feature importance scores, were common to all three tested solvers. Table 3 lists the prediction performance for the three attempted solvers in terms of (1) RMSE, (2) MSE, (3) R², (4) MAE, and (5) MAPE %. The performance indicators for the trained model predictions of the average permeate flux output show that quadratic linear regression and the optimizable support vector machine solutions are comparable. The modeling metrics for quadratic linear regression were reproducibly computed to be (1) a coefficient of determination of 0.91, (2) an MAE estimation of 4.90, and (3) an MAPE % estimation of 12.80. According to the warning message that was issued from the MATLAB package, the optimizable SVM results were not reproducible. They are shown here for demonstration purposes.

Each time the solver was run, the combination of the optimized hyperparameters—the type of kernel function, the Box constraint, the epsilon value, and the automated decision-making on whether to standardize the dataset or not—were altered by the solver, never indicatively improving on the performance that was achieved by quadratic regression analysis. Consequently, it was decided to suspend the analysis for the other two characteristic responses using the optimizable SVM. The performance of quadratic regression analysis against the performance of the quadratic SVM in the COD rejection rate response became more emphatic, as shown in Table 3. The quadratic regression achieved RMSE, R², MAE, and MAPE% estimates of 1.90, 0.91, 1.61, and 3.76, respectively. The difference in performance in terms of the coefficient of determination between the quadratic linear regression and the quadratic SVM is indicative of their estimated values of 0.91 vs. 0.71. Finally, the corresponding comparison of the multifactorial solvers was also completed for the cumulative flux decline response (Table 3). The solver preference again favors the linear regression technique in all prediction performance estimators. The respective quadratic regression analysis provided lower RMSE, MAE, and MAPE% estimates at 0.63, 0.48, and 9.24, respectively. The difference in performance, in terms of the coefficient of determination, between quadratic linear regression and the quadratic SVM, is nevertheless not so distinctive, since their two trained models returned R² estimations of 0.91 and 0.88, respectively.

3.3. Multifactorial Optimal Settings with Quadratic Effects

Upon completing the solver performance testing on the trained models in the previous subsection, it was decided that the quadratic regression model would be used to proceed to the factorial analysis phase. In Table 4, the regression coefficients are tabulated for each of the three quality characteristics. The multiple adjusted R² values for the combined replicated datasets were computed, and their values were found to be higher than 0.94 for all three characteristic responses. Using a Bonferroni family-wise error rate of 0.05, the linear and quadratic part of factor A (TMP) is retained for the average permeate flux, as well as the linear term only of factor D (MWCO). Regarding the COD rejection rate response, the statistically contributing effects are identified to (1) the linear part of factor A (TMP), (2) the linear term of factor B (CFV), (3) the quadratic part of factor C (temperature), and (4) the linear and quadratic terms of factor D (MWCO). Finally, the cumulative flux decline response is modulated by (1) the linear term of factor A (TMP), (2) the linear term of factor B (CFV), (3) the linear term of factor C (temperature), and (4) the linear term of factor D (MWCO).

The resulting response functions for the individual structural modeling of the three quality characteristics are

J_p = 57.37 + 18.04⋅A − 8.83⋅A² + 15.81⋅D

(1)

COD = 38.71 + 2.01⋅A + 3.29⋅B + 3.58⋅C² − 5.32⋅D + 4.07⋅D²

(2)

SFD = 5.53 + 1.58⋅A − 0.51⋅B + 0.46⋅C + 1.70⋅D

(3)

The maximization of the average permeate flux response is attained by maximizing factor D (MWCO) and setting it at 100 kDa. Moreover, the optimal setting for factor A is simply estimated at

\frac{\partial J_{p}}{\partial A} = 0 \Rightarrow A = \frac{- 18.04}{2 \cdot (- 8.83)} \approx 1.0 (the third setting in the (- 1, 0, 1) standardization of the original levels)

Hence, the optimal setting for factor A (TMP) is 3.0 bar. For optimal COD rejection, factors A (TMP), B (CFV), and C (temperature) should be set at their highest values: 3.0 bar, 1.041 m/s, and 30.0 °C, respectively. Additionally, to obtain the optimal setting for factor D,

\frac{\partial C O D}{\partial D} = 0 \Rightarrow D = \frac{- (- 5.32)}{2 \cdot (4.07)} \approx 0.65

The optimal setting is within the second and third preset adjustment; it is closer to the third setting. Hence, for factor D(MWCO), the recommendation seems to be a setting of 100 kDa. Finally, in minimizing the cumulative flux decline response, factors A (TMP), B (CFV), C (temperature), and D (MWCO) may be simply set at 1.0 bar, 1.041 m/s, 15.0 °C, and 10 kDa, respectively. Based on this analysis, the optimal settings for the average permeate flux and COD rejection rate point in the same direction as expected. If the cumulative flux decline is to be taken in account, there will be agreement only on the CFV optimal setting for all three characteristics. However, it is the TMP and MWCO factors that will control the direction of concurrent optimization, depending on the weights that each characteristic response will receive in a realistic scenario.

3.4. Multi-Objective Information Fusion Using Lenth Screening Statistics and Belief Functions

Belief functions from the evidence theory will be utilized to assist the assessment of the information fusion from the “three subjective sources”, i.e., the average permeate flux, COD rejection rate, and cumulative flux decline. The repeated unreplicated–saturated Lenth profiler predictions straightforwardly provide the related information to form the empirical mass functions. From the Lenth statistics of Table 2, we accumulated the frequency of occurrences on a factorial basis. Thus, there are only two factors (A and D) that appear to generate substantial variability in some of the three sources (process characteristics), and to a different extent. The exclusive and exhaustive propositions that will define a frame of discernment need to be partitioned to 2² (=4) propositions. Consequently, the power set, P(Ω), can be easily formulated as

P(Ω) = {∅, ‘A&D active’, ‘A&D inactive’, Ω}

From Table 2, it can be noted that regarding the three independent J_p-response screenings of the average permeate flux response, strong and consistent profiling outcomes (>EER = 4.87) were identified to only two factors, A and D; this occurrence was observed in three out of three screenings. In the power set, this is counted as the focal element of the proposition “A&D active”, with a contributing mass in the belief structure: m (‘A&D active’) = 1, and, due to the mass function of the power set for the average permeate flux, m_jp is m_jp = (0, 1, 0, 0). In a similar fashion, the profiling of the COD rejection rate response failed to uncover any active effects; thus, factors A and D are deemed inactive. As a result, we have m (‘A&D inactive’) = 1, and the mass function for the COD rejection rate will be m_COD = (0, 0, 1, 0). The last “source” is the cumulative flux decline, which is substantially differentiated with respect to the behaviors of the two previous characteristics. In this case, two out of the three screening predictions suggest that factors A and D are active. However, both outcomes are only significant with respect to the threshold based on the individual error rate (>IER = 2.20); both of them are below the cut-off value, which was dictated by the experiment-wise error rate (<EER = 4.87). Consequently, the mass function for the cumulative flux decline becomes m_SFD = (0, 0.667, 0.333, 0). But the information from this source is not of the same parity as that for the average permeate flux response. Hence, it should be discounted before entering the fusion process. Two discounting rates were chosen for testing, one at d = 0.90, which is fairly optimistic, and one at d = 0.5, which may be considered more neutral.

After generating the three mass functions, it was revealed that there was high conflict among the three sources. Quantitively, this is shown in Table 5. Regardless of the discounting rate, the “closed world” assumption information fusion (Dempster–Shafer combination rule) could not be carried out. The “open world” assumption (Smets combination rule) outcome is the empty set, while Yager’s combination rule and the Dubois–Prade disjunctive combination rule [76] both agreed on transferring either mass to complete ignorance.

In light of this manifestation, a conflict management exercise is demonstrated. In Table 6, it is shown that the conflict was clearly attributed to two specific sources: the average permeate flux and COD rejection rate.

Acquired by removing the average permeate flux source, the two-source combination results for the COD rejection rate and the cumulative flux decline are shown in Table 7. It is the inactiveness proposition of the two factors A and D that receives the maximum degree of belief, according to the Dempster–Shafer normalized criterion and irrespective of the discounting factor value. However, the Smets conjunctive combination rule, Yager’s combination rule, and the Dubois–Prade combination rule are dependent on the discounting factor. At a discounting rate of 0.5, the inactiveness of the two factors is associated with a degree of belief of 0.633. At a discounting rate of 0.9, this degree of belief is lowered to 0.397 for all three ruled outcomes. The last scenario is to test the information fusion of the two sources of the COD rejection rate and the cumulative flux decline. In Table 8, it is shown that it is the activeness of the two factors, A and D, that receive the maximum degree of belief from applying the Dempster–Shafer normalized criterion, irrespective of the discounting factor value. The larger degree of belief for Yager’s rule and the Dubois–Prade rule occurs at a discount factor of 0.5, while for the Smets rule, a matching degree of belief is computed for a discount factor of 0.9; in all three cases, this value is 0.835. This high estimation may be due to the fact that the two sources were found earlier to be correlated. However, if a non-idempotent rule is used (the average of the mass functions), then the degree of belief of factor-activeness is lowered to 0.56 [77].

4. Discussion

Considering the various prediction solutions obtained in the “Results” section, new opportunities have arisen for exploring underlying relationships. A quick visual way to recap the dataset similarities is to employ the “stripes” design for an informative plot summary. Its standardizing facility, in terms of furnishing a concentrated and robust description of the combined dataset propensities, is shown in Figure 6. Likewise, the 50% boxplot area is narrower for the average permeate flux and the COD rejection rate; it is distinctively larger for the cumulative flux decline, which might imply that it is susceptible to greater variability. It is surprising that in all three datasets, it is the 25% of the data below the median that is confined to a thinner (not-symmetrical) stripe. However, it is only the distribution curve of the cumulative flux decline that exposes a skewness, in view of the fact that the data mode is situated in the vicinity that favors lower measurements. To affirm the potentiality of such visual artifacts, tests of normality were carried out according to the Kolmogorov–Smirnov and Shapiro–Wilk procedures. The combined replications for each quality characteristic response are compared in Table 9. Both tests confirm the symmetry of the measurements, as shown in Table 9 (IBM SPSS v.29). To assess the shape statistics for the three characteristic responses, the skewness and kurtosis estimations are provided in Table 10 (IBM SPSS v.29). The low skewness statistic values, along with their mediocre standard errors, may attest to the symmetry for each of the three datasets. On the other hand, the kurtosis estimations may not be so discerning between mesokurtic or platykurtic tendencies.

To further support the above arguments, it is beneficial to have a robust view of the contrasting trends in the combined datasets, for each of the three characteristic responses, on QQ-plot displays. Figure 7 shows that in all three graphs, datapoints above the trendline consistently outnumber those below it. However, all datapoints, in all three cases, are fairly well behaved, since they are all well contained within their 95% confidence bands.

The next step is to inspect for any instabilities within each factor setting and delineate possible sources of discrepancies in the resulting predictions. From Table 11, the data that are traced to the third setting of factor A (TMP) appear to deviate from normality (IBM SPSS v.29) only in the case of the COD rejection rate—a result in agreement with both statistical procedures performed (Kolmogorov–Smirnov test and Shapiro–Wilk test at α = 0.05). For the factor B (CFV) settings, there are occurrences when normality may be debated, mostly at the first level, while there is a split decision on the second level for both the average permeate flux and the cumulative flux decline datasets (Table 12). The normality screening results are differentiated among the settings of factor C (temperature). Both statistical tests reject the normality assumption, at the second and third level for the COD rejection rate response data and at the first level for the cumulative flux decline response data (Table 13). Finally, there are more mixed data propensities in connection to factor D (MWCO) (Table 14).

Both tests agree on the digressing normality trends in the COD rejection rate response data related to the third setting of factor D, and, respectively, to the second level for the cumulative flux decline response data. The average permeate flux response data receive a split decision that it is identified to the first level.

The visualization of this exhibited anisotropy, within each factor setting, is greatly facilitated by depicting factorial effects in level-partitioned violin plots. Screening for the average permeate flux response accentuates this wider diversification of the behavior of the effects, which is attributed to specific factorial settings (Figure 8). Using this screening approach, it emerges that factor A and factor D are the effects that might influence the average permeate flux the most; the strength differential is clearly formed by the adjustments of the first and third level. Violin lengths and the bulging motifs of the distributions also broadly vary. For example, for factor A, the data associated with the first level are rather symmetric, while the next two levels generate data with alternating skewed protrusions. Even more impressive are the effects owing to factor B, which is deemed, nevertheless, inactive. The first two levels produce data that are reverse images of each other, while the data created by the third level are shaped into an “ink pot”. Consistent skewness in all three levels is exhibited only in connection to factor D.

A strong non-uniform dispersion in the COD rejection rate response data is portrayed in Figure 9. The only strong effect is identified to factor D, which, through its first- and second-level adjustments, generated non-overlapping violin patterns, thus demonstrating the lowering effect of factor D. The conditional influence of the second level of the temperature is somehow unexpected, compared to the first and third levels. A great variety of motifs are produced in reference to the cumulative flux decline response data (Figure 10). The length, skewness, and bulging characteristics of the violins may not be considered predictable for any of the four controlling factors. What may be discerned, though, is that factors A and D may appear active when tuned and contrasted at their respective end levels.

Screening results that take into account the sequential evaluation of change statistics, aa well as the accompanying significance of the F-test score change when successively entering and retaining additional regressors, are tabulated for all three quality characteristics in Table 15, Table 16 and Table 17. The inclusion of particular factorial contributions in the active effects list is justified, and it is individually accounted for. In all cases, the achieved significance that measured the change in the modeling effort was below an error rate of α = 0.05 (IBM SPSS v.29). All regression models improved significantly with the addition of key factorial contributions, as indicated by the Akaike Information Criterion (AIC), the Amemiya Prediction Criterion (PC), the Mallows’ Prediction Criterion (CP), and the Schwartz Bayesian Criterion (SBC) (IBM SPSS v.29). There is a collective agreement among the four model selection criteria about finalizing the group of statistically important effects, which were deemed worth retaining. Particularly impressive is the change in the performance of the CP, which sharply varied from the first to the last predictor for (1) the average permeate flux from 94.014 to 3.549, (2) the COD rejection rate from 27.376 to 3.594, and (3) the cumulative flux decline from 138.756 to 5.000. It is important to note that with this method, factor B is additionally retained for the COD rejection rate response. Furthermore, to model the cumulative flux decline, it is indicated that all four effects should be retained.

As can be seen in Table 15 and Table 17, the Durbin–Watson statistic estimations for the regression modeling of the average permeate flux and cumulative flux decline suggest that there is no autocorrelation in the residuals. On the contrary, the residual analysis of the regression results for the COD rejection rate (Table 16) suggests that there is a strong negative autocorrelation. In Table 15, Table 16 and Table 17, it becomes evident that there are no multicollinearity issues among the examined predictors for any of the three quality characteristic responses; the variance inflation factor estimations were very stable, at a value of 1.0, in all instances.

To synchronously find an optimal solution for the three quality responses, two MATLAB (R2024a) procedures were employed: (1) the genetic algorithm module for multi-objective optimization (“gamultiobj”) and (2) the “paretosearch” solver. Both modules require that the objective functions be in the form of functions that collectively must be setup to be minimized. To conveniently proceed with this phase, Equations (1)–(3) from the “Results” section will be utilized. Since the responses of the average permeate flux and the COD rejection rate are intended to be maximized, the initialized objectives that enter the minimization MATLAB procedures of the “gamultiobj” and “paretosearch” solvers should be simply transformed to their inverse responses, 1/J_p and 1/COD, respectively, in order to fit in the host (platform) minimization scheme. The generated 3D Pareto fronts, after running the two procedures, are shown in Figure 11. On the labeled axis of the two graphs, “Objective(1)” and “Objective (2)” are the modified responses, 1/J_p and 1/COD, respectively, which are computed from Equations (1) and (2). “Objective(3)” is simply the SFD response, which is taken unchanged from Equation (3). The spread of the three-way objective scores does not show exactly the same pattern for the two solvers. However, there is an area in both 3D solution spaces that both solvers agree on. To refine the solution space, the three 2D Pareto fronts are drawn in Figure 12 for the “gamultiobj” solver version. It is discerned that for the relationships of Objective(1) vs. Objective(2), Objective(2) vs. Objective(3), and Objective(2) vs. Objective(1), a reasonable front outcome could, respectively, be (1) (3.5, 0.02), (2) (2, 0.019), and (3) (0.02, 0.019). It appears that favorable output values would be translated to J_p ~ 50 L/(m²h), COD ~ 50%, and SDF ~ (2, 3.5). The solution that more closely compromises all three responses, given the fact that J_p and SFD are positively correlated but also are constrained by opposite optimization goals, is described by run #6: TMP = 2.0 bar, CFV = 1.041 m/s, T = 15 °C, and MWCO = 50 kDA. Likewise, the three 2D Pareto front profiles, using the “paretosearch()” solution, are shown in Figure 13. Basically, the outcome solution this time is (1) (3.5, 2), (2) (2.5, 0.0185), and (3) (0.02, 0.0175). By inverting the Objective(1) and Objective(2) scores for J_p and COD, respectively, the solution again matches the one dictated by run #6.

5. Conclusions

In the circular economy, innovations in wastewater treatment are a priority. From an engineering perspective, wastewater treatment is a complex physicochemical process. Part of its complexity is tied to the wastewater resource origins, the available technological options, and the end usage of the water products. Small-data optimization may aid in guiding wastewater process improvement for a multitude of conditions at a factory level. Fractional factorial design trials can reduce the need for large-scale experiments, minimizing operational downtime. However, small-data problems require multilateral analysis, which is facilitated by a combination of statistical and algorithmic screening/optimization methods. An interesting case, due to its high criticality, involves the improvement of the colloidal removal efficiency in a paper-mill effluent treatment system due to four controlling factors, while at the same time tempering filtration membrane fouling issues. The analysis showed a positive correlation between the average permeate flux and the cumulative flux decline, although they had opposite optimization goals. A variety of informational and statistical tools were used to provide additional insights into the multi-objective multi-parameter FFD problem. Replicate Lenth screening statistics were converted to belief structures for strong effects. The feasibility of a concurrent solution was examined by treating the characteristic responses as different sources; various combination rules in the belief function framework were tested. A three-characteristic information fusion process led to high conflict. Therefore, different scenarios revealed that the credibility was maximized using the two responses: the average permeate flux and the cumulative flux decline. The transmembrane pressure and the molecular weight cut-off were the active factors in agreement with the feature importance score, which was estimated from the F-test algorithm in the support vector machine learning regression solver. Using the three-process-characteristic Pareto-front optimizer, the optimal solution matched the original results, only suggesting a molecular weight cut-off setting at 50 kDa. This approach can be expanded to include a larger number of process characteristics and consider potential interactions between different controlling factors.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that were used in this work were taken from ref. [36] in Table 4.

Conflicts of Interest

The author declares no conflicts of interest.

References

European Investment Bank. Wastewater as a Resource; European Investment Bank: Luxembourg, 2022; Available online: https://www.eib.org/attachments/publications/wastewater_as_a_resource_en.pdf (accessed on 9 August 2024).
World Bank. Wastewater Is Not a Waste; World Bank Group: New York, NY, USA, 2024; Available online: https://www.worldbank.org/en/news/video/2020/03/19/wastewater-is-not-a-waste (accessed on 9 August 2024).
United Nations. Sustainable Development Goal 6: Clean Water and Sanitation; United Nations: New York, NY, USA, 2024; Available online: https://www.un.org/sustainabledevelopment/water-and-sanitation/ (accessed on 9 August 2024).
United Nations. Sustainable Development Goal 8: Decent Work and Economic Growth; United Nations: New York, NY, USA, 2024; Available online: https://www.un.org/sustainabledevelopment/economic-growth/ (accessed on 9 August 2024).
United Nations. Sustainable Development Goal 11: Sustainable Cities and Communities; United Nations: New York, NY, USA, 2024; Available online: https://www.un.org/sustainabledevelopment/cities/ (accessed on 9 August 2024).
United Nations. Sustainable Development Goal 12: Responsible Consumption and Production; United Nations: New York, NY, USA, 2024; Available online: https://www.un.org/sustainabledevelopment/sustainable-consumption-production/ (accessed on 9 August 2024).
Morseletto, P.; Mooren, C.E.; Munaretto, S. Circular economy of water: Definition, strategies and challenges. Circ. Econ. Sustain. 2022, 2, 1463–1477. [Google Scholar] [CrossRef]
Koseoglu-Imer, D.Y.; Oral, H.V.; Sousa Coutinho Calheiros, C.; Krzeminski, P.; Güçlü, S.; Almeida Pereira, S.; Surmacz-Gorska, J.; Plaza, E.; Samaras, P.; Binder, P.M.; et al. Current challenges and future perspectives for the full circular economy of water in European countries. J. Environ. Manag. 2023, 345, 118627. [Google Scholar] [CrossRef] [PubMed]
Peydayesh, M.; Mezzenga, R. The circular economy of water across the six Continents. Chem. Soc. Rev. 2024, 53, 4333–4348. [Google Scholar] [CrossRef]
Dragomir, V.D.; Dumitru, M. The state of the research on circular economy in the European Union: A bibliometric review. Clean. Waste Syst. 2024, 7, 100127. [Google Scholar] [CrossRef]
Fernandes, E.; Cunha Marques, R. Review of Water Reuse from a Circular Economy Perspective. Water 2023, 15, 848. [Google Scholar] [CrossRef]
Tzanakakis, V.A.; Capodaglio, A.G.; Angelakis, A.N. Insights into Global Water Reuse Opportunities. Sustainability 2023, 15, 13007. [Google Scholar] [CrossRef]
Capodaglio, A.G. Urban Wastewater Mining for Circular Resource Recovery: Approaches and Technology Analysis. Water 2023, 15, 3967. [Google Scholar] [CrossRef]
Soo, A.; Kim, J.; Shon, H.K. Technologies for the wastewater circular economy—A review. Desal. Water Treat. 2024, 317, 100205. [Google Scholar] [CrossRef]
Dhokpande, S.R.; Deshmukh, S.M.; Khandekar, A.; Sankhe, A. A review outlook on methods for removal of heavy metal ions from wastewater. Sep. Purif. Technol. 2024, 350, 127868. [Google Scholar] [CrossRef]
Zhang, C.; Zhao, G.; Jiao, Y.; Quana, B.; Lua, W.; Su, P.; Tang, Y.; Wang, J.; Wu, M.; Xiao, N.; et al. Critical analysis on the transformation and upgrading strategy of Chinese municipal wastewater treatment plants: Towards sustainable water remediation and zero carbon emissions. Sci. Total Environ. 2023, 896, 165201. [Google Scholar] [CrossRef]
Sniatała, B.; Al-Hazmi, H.E.; Sobotka, D.; Zhai, J.; Mąkinia, J. Advancing sustainable wastewater management: A comprehensive review of nutrient recovery products and their applications. Sci. Total Environ. 2024, 937, 173446. [Google Scholar] [CrossRef] [PubMed]
Ahmed, M.; Mavukkandy, M.O.; Giwa, A.; Elektorowicz, M.; Katsou, E.; Khelifi, O.; Naddeo, V.; Hasan, S.W. Recent developments in hazardous pollutants removal from wastewater and water reuse within a circular economy. npj Clean Water 2022, 5, 12. [Google Scholar] [CrossRef]
Botelho Junior, A.B.; Tenório, J.A.S.; Espinosa, D.C.R. Separation of Critical Metals by Membrane Technology under a Circular Economy Framework: A Review of the State-of-the-Art. Processes 2023, 11, 1256. [Google Scholar] [CrossRef]
Addagada, L.; Goel, M.; Shahid, M.K.; Prabhu, S.V.; Chand, S.; Sahoo, N.K.; Rout, P.R. Tricks and tracks in resource recovery from wastewater using bio-electrochemical systems (BES): A systematic review on recent advancements and future directions. J. Water Process Eng. 2023, 56, 104580. [Google Scholar] [CrossRef]
Davis, M.L. Water and Wastewater Engineering: Design Principles and Practice, 2nd ed.; McGraw Hill: New York, NY, USA, 2019. [Google Scholar]
Vinayagam, V.; Sikarwar, D.; Das, S.; Pugazhendhi, A. Envisioning the innovative approaches to achieve circular economy in the water and wastewater sector. Environ. Res. 2024, 241, 117663. [Google Scholar] [CrossRef] [PubMed]
Burn, D.H.; McBean, E.A. Optimization modelling of water quality in an uncertain environment. Water Resour. Res. 1985, 21, 934–940. [Google Scholar] [CrossRef]
Rehana, S.; Rajulapati, C.R.; Ghosh, S.; Karmakar, S.; Mujumdar, P. Uncertainty Quantification in Water Resource Systems Modeling: Case Studies from India. Water 2020, 12, 1793. [Google Scholar] [CrossRef]
Box, G.E.P.; Hunter, W.G.; Hunter, J.S. Statistics for Experimenters—Design, Innovation, and Discovery; Wiley: New York, NY, USA, 2005. [Google Scholar]
Taguchi, G.; Chowdhury, S.; Wu, Y. Quality Engineering Handbook; Wiley: Hoboken, NJ, USA, 2004. [Google Scholar]
Kumar, R.; Maurya, A.; Raj, A. Emerging technological solutions for the management of paper mill wastewater: Treatment, nutrient recovery and fourth industrial revolution (IR 4.0). J. Water Process Eng. 2023, 53, 103715. [Google Scholar] [CrossRef]
Anderson-Cook, C.M.; Lu, L.; Brenneman, W.; De Mast, J.; Faltin, F.; Freeman, L.; Guthrie, W.; Hoerl, R.; Jensen, W.; Jones-Farmer, A.; et al. Statistical engineering—Part 1: Past and present. Qual. Eng. 2022, 34, 426–445. [Google Scholar] [CrossRef]
Anderson-Cook, C.M.; Lu, L.; Brenneman, W.; De Mast, J.; Faltin, F.; Freeman, L.; Guthrie, W.; Hoerl, R.; Jensen, W.; Jones-Farmer, A.; et al. Statistical engineering—Part 2: Future. Qual. Eng. 2022, 34, 446–467. [Google Scholar] [CrossRef]
Schall, S. Statistical engineering: Synergies with established engineering disciplines. Qual. Eng. 2022, 34, 468–472. [Google Scholar] [CrossRef]
Chugani, N.; Kumar, V.; Garza-Reyes, J.A.; Rocha-Lona, L.; Upadhyay, A. Investigating the green impact of lean, six sigma and lean six sigma: A systematic literature review. Int. J. Lean Six Sigma 2017, 8, 7–32. [Google Scholar] [CrossRef]
Yadav, Y.; Kaswan, M.S.; Gahlot, P.; Duhan, R.K.; Garza-Reyes, J.A.; Rathi, R.; Chaudhary, R.; Yadav, G. Green Lean Six Sigma for sustainability improvement: A systematic review and future research agenda. Int. J. Lean Six Sigma 2023, 14, 759–989. [Google Scholar] [CrossRef]
Yadav, S.; Samadhiya, A.; Kumar, A.; Majumdar, A.; Garza-Reyes, J.A.; Luthra, S. Achieving the sustainable development goals through net zero emissions: Innovation-driven strategies for transitioning from incremental to radical lean, green and digital technologies. Resour. Conserv. Rec. 2023, 197, 107094. [Google Scholar] [CrossRef]
Elemure, I.; Dhakal, H.N.; Leseure, M.; Radulovic, J. Integration of Lean Green and Sustainability in Manufacturing: A Review on Current State and Future Perspectives. Sustainability 2023, 15, 10261. [Google Scholar] [CrossRef]
Abou-Shady, A. Recycling of polluted wastewater for agriculture purpose using electrodialysis: Perspective for large scale application. Chem. Eng. J. 2017, 323, 1–18. [Google Scholar] [CrossRef]
Sousa, M.R.S.; Lora-García, J.; López-Pérez, M.-F.; Santafé-Moros, A.; Gozálvez-Zafrilla, J.M. Operating Conditions Optimization via the Taguchi Method to Remove Colloidal Substances from Recycled Paper and Cardboard Production Wastewater. Membranes 2020, 10, 170. [Google Scholar] [CrossRef]
Fisher, R.A. Statistical Methods, Experimental Design, and Scientific Inference; Oxford University Press: Oxford, UK, 1990. [Google Scholar]
Stewart-Oaten, A.; Bence, J.R.; Osenberg, C.W. Assessing effects of unreplicated perturbations: No simple solutions. Ecology 1992, 73, 1396. [Google Scholar] [CrossRef]
Pagliari, P.H.; Ranaivoson, A.Z.; Strock, J.S. Options for statistical analysis of unreplicated paired design drainage experiments. Agr. Water Manag. 2021, 244, 106604. [Google Scholar] [CrossRef]
Pilar Callao, M. Multivariate experimental design in environmental analysis. Trends Anal. Chem. 2014, 62, 86–92. [Google Scholar] [CrossRef]
Hamada, M.; Balakrishnan, N. Analyzing unreplicated factorial experiments: A review with some new proposals. Stat. Sin. 1998, 8, 1–41. [Google Scholar]
Stone, R.A.; Veevers, A. The Taguchi influence on designed experiments. J. Chemometr. 1994, 8, 103–110. [Google Scholar] [CrossRef]
Derringer, G.; Suich, R. Simultaneous optimization of several response variables. J. Qual. Tech. 1980, 12, 214–219. [Google Scholar] [CrossRef]
Carlson, R.; Nordahl, A.; Barth, T.; Myklebust, R. An approach to evaluating screening experiments when several responses are measured. Chemom. Intell. Lab. Syst. 1991, 12, 237–255. [Google Scholar] [CrossRef]
Besseris, G.J. Concurrent multiresponse multifactorial screening of an electrodialysis process of polluted wastewater using robust non-linear Taguchi profiling. Chemom. Intell. Lab. Syst. 2020, 200, 103997. [Google Scholar] [CrossRef]
Breiman, L. Statistical modeling: The two cultures. Stat. Sci. 2001, 16, 199–231. [Google Scholar] [CrossRef]
Shafer, G.A. Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976. [Google Scholar]
Lenth, R.V. Quick and easy analysis of unreplicated factorials. Technometrics 1989, 31, 469–473. [Google Scholar] [CrossRef]
Ye, K.Q.; Hamada, M.J. Critical values of the Lenth method for unreplicated factorial designs. Qual. Tech. 2000, 32, 57–66. [Google Scholar] [CrossRef]
Vapnik, V. Statistical Learning Theory; Wiley-Interscience: New York, NY, USA, 1998. [Google Scholar]
Singha, S.; Pasupuleti, S.; Singha, S.S.; Singh, R. Prediction of groundwater quality using efficient machine learning technique. Chemosphere 2021, 276, 130265. [Google Scholar] [CrossRef]
Malviya, A.; Jaspal, D. Artificial intelligence as an upcoming technology in wastewater treatment: A comprehensive review. Environ. Technol. Rev. 2021, 10, 177–187. [Google Scholar] [CrossRef]
Hanoon, M.S.; Ahmed, A.N.; Fai, C.M.; Birima, A.H.; Razzaq, A.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Application of artificial intelligence models for modeling water quality in groundwater: Comprehensive review, evaluation and future trends. Water Air Soil Poll. 2021, 232, 411. [Google Scholar] [CrossRef]
George, M.; Blackwell, D.; Rajan, D. Lean Six Sigma in the Age of Artificial Intelligence: Harnessing the Power of the Fourth Industrial Revolution; McGraw-Hill: New York, NY, USA, 2019. [Google Scholar]
Besseris, G. Micro-Clustering and Rank-Learning Profiling of a Small Water-Quality Multi-Index Dataset to Improve a Recycling Process. Water 2021, 13, 2469. [Google Scholar] [CrossRef]
Besseris, G. Wastewater Quality Screening Using Affinity Propagation Clustering and Entropic Methods for Small Saturated Nonlinear Orthogonal Datasets. Water 2022, 14, 1238. [Google Scholar] [CrossRef]
Besseris, G. Datacentric Similarity Matching of Emergent Stigmergic Clustering to Fractional Factorial Vectoring: A Case for Leaner-and-Greener Wastewater Recycling. Appl. Sci. 2023, 13, 11926. [Google Scholar] [CrossRef]
Denceux, T. Decision-making with belief functions: A review. Int. J. Approx. Reason. 2019, 109, 87–110. [Google Scholar] [CrossRef]
Dempster, A.P. Upper and lower probabilities induced by a multi-valued mapping. Ann. Math. Stat. 1967, 38, 325–339. [Google Scholar] [CrossRef]
Dempster, A.P. A generalization of Bayesian inference. J. Royal Stat. Soc. B 1968, 30, 205–247. [Google Scholar] [CrossRef]
Dempster, A.P. New methods for reasoning towards posterior distributions based on sample data. Ann. Math. Stat. 1966, 37, 355–374. [Google Scholar] [CrossRef]
Shafer, G. Constructive decision theory. Int. J. Approx. Reason. 2016, 79, 45–62. [Google Scholar] [CrossRef]
Shafer, G. A mathematical theory of evidence turns 40. Int. J. Approx. Reason. 2016, 79, 7–25. [Google Scholar] [CrossRef]
Shafer, G. Dempster’s rule of combination. Int. J. Approx. Reason. 2016, 79, 26–40. [Google Scholar] [CrossRef]
Smets, P. The combination of evidence in the transferable belief model. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 447–458. [Google Scholar] [CrossRef]
Smets, P.; Kennes, R. The transferable belief model, Artif. Intell. 1994, 66, 191–243. [Google Scholar]
Smets, P. Analyzing the combination of conflicting belief functions. Inform. Fusion 2007, 8, 387–412. [Google Scholar] [CrossRef]
Yager, R.R. Decision making under Dempster-Shafer uncertainties. Int. J. General Sys. 1992, 20, 233–245. [Google Scholar] [CrossRef]
Yager, R.R. Decision making using minimization of regret, Int. J. Approx. Reason. 2004, 36, 109–128. [Google Scholar] [CrossRef]
Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Smola, A.J.; Scholkopf, B. A tutorial on support vector regression. Stat. Comp. 2004, 14, 199–222. [Google Scholar] [CrossRef]
R Core Team. R (Version 4.3.1): A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 16 June 2023).
Hintze, J.L.; Nelson, R.D. Violin plots: A box plot-density trace synergism. Am. Stat. 1998, 52, 181–184. [Google Scholar] [CrossRef]
Rousseeuw, P.J.; Ruts, I.; Tukey, J.W. The bagplot: A bivariate boxplot. Am. Stat. 1999, 53, 382–387. [Google Scholar] [CrossRef]
Dubois, D.; Prade, H. Representation and combination of uncertainty with belief functions and possibility measures. Comp. Intel. 1988, 4, 244–264. [Google Scholar] [CrossRef]
Murphy, C. Combining belief functions when evidence conflicts. Decision Support Syst. 2000, 29, 1–9. [Google Scholar] [CrossRef]

Figure 1. Two-factor interaction plots of the four controlling factors (A–D) for (A) the average permeate flux, (B) the COD rejection rate, and (C) the cumulative flux decline.

Figure 2. QQ-plot replicate screening of the three quality characteristics: (1) average permeate flux (J_p), (2) COD rejection rate (COD), and (3) cumulative flux decline (SFD).

Figure 3. Violin plot replicate screening of the three quality characteristics: (A) average permeate flux (J_p), (B) COD rejection rate (COD), and (C) cumulative flux decline (SFD).

Figure 4. Bagplot screening of the three quality characteristics: (1) average permeate flux (J_p), (2) COD rejection rate (CODr), and (3) cumulative flux decline (SFD).

Figure 5. Feature importance scores that accompany the quadratic and optimizable support vector machine: (A) average permeate flux (J_p), (B) COD rejection rate (COD), and (C) cumulative flux decline (SFD) characteristic responses in combined replicate datasets (MATLAB R2024a).

Figure 6. “Stripes” design plot summary for the combined dataset of (A) average permeate flux (J_p), (B) COD rejection rate (COD), and (C) cumulative flux decline (SFD).

Figure 7. QQ-plot depictions of the combined dataset of (A) average permeate flux (J_p), (B) COD rejection rate (COD), and (C) cumulative flux decline (SFD).

Figure 8. Violin plot factorial screening for the average permeate flux (J_p) at all three preset levels.

Figure 9. Violin plot factorial screening for the COD rejection rate (COD) at all three preset levels.

Figure 10. Violin plot factorial screening for the cumulative flux decline (SFD) at all three preset levels.

Figure 11. Three-objective Pareto front generated by: (A) the “gamultiobj()” module and (B) the “paretosearch()” solver (MATLAB R2024a). On the axis, the labels stand for the minimized Objective(1)—1/J_p, Objective(2)—1/COD, and Objective(3)—SFD.

Figure 12. Three 2D Pareto front profiles generated by the “gamultiobj()” module (MATLAB R2024a): (A) Objective(1) vs. Objective(3), (B) Objective(2) vs. Objective(3), and (C) Objective(2) vs. Objective(1).

Figure 13. Three 2D Pareto front profiles generated by the “paretosearch()” module (MATLAB R2024a): (A) Objective(1) vs. Objective(3), (B) Objective(2) vs. Objective(3), and (C) Objective(2) vs. Objective(1).

Table 1. Pairwise correlations between the three quality characteristics: (1) average permeate flux (J_p), (2) COD rejection rate (COD), and (3) cumulative flux decline (SFD) (IBM SPSS v.29).

QC1 vs. QC2	Correlation	Count	Lower C.I.	Upper C.I.
Jp-COD	−0.219	27	−0.553	0.175
SFD-COD	−0.388	27	−0.669	−0.010
SFD-Jp	0.830	27	0.658	0.920

Confidence interval level set at 95%.

Table 2. Non-linear factorial effect screening for the three characteristics and the three replicated datasets using Lenth statistics.

	Quality Characteristic
Factorial Effects	J_p			COD			SFD
Replicate #	1	2	3	1	2	3	1	2	3	IER	EER
A	7.72	8.02	7.41	0.73	0.60	0.67	2.74	1.68	4.08	2.20	4.87
A²	3.53	2.41	0.95	0.28	0.07	0.08	0.89	0.06	1.13	2.20	4.87
B	0.36	0.42	0.77	1.38	0.83	1.12	1.04	0.65	0.85	2.20	4.87
B²	1.14	0.45	0.58	0.02	0.13	0.27	0.48	0.68	0.03	2.20	4.87
C	0.13	3.49	0.12	0.37	0.24	0.01	0.67	0.74	0.83	2.20	4.87
C²	0.45	0.89	0.75	0.60	0.74	0.67	0.34	0.21	0.51	2.20	4.87
D	6.71	7.01	6.55	2.00	1.64	1.67	3.16	1.81	4.10	2.20	4.87
D²	0.88	0.15	0.38	0.76	0.74	0.80	0.48	0.32	0.16	2.20	4.87

Table 3. Performance of the modeling based on linear regression (quadratic), quadratic support vector machine learning, and optimizable support vector machine learning for (1) average permeate flux (J_p), (2) COD rejection rate (COD), and (3) cumulative flux decline (SFD) characteristic responses in combined replicate datasets (MATLAB R2024a).

Characteristic response: average permeate flux (J_p)
Model type	Status	RMSE	MSE	R²	MAE	MAPE %	Hyperparameters
Linear regression	Trained	6.29	39.58	0.91	4.90	12.80	Terms—quadratic
SVM	Trained	8.07	65.11	0.86	6.73	17.20	Kernel function—quadratic Standardize data—yes
SVM	Trained	6.43	41.31	0.91	5.03	12.69	Optimized hyperparameters Kernel function—cubic Box constraint—8.2907 Epsilon—0.35806; standardize data—yes
Characteristic response: COD rejection rate (COD)
Model type	Status	RMSE	MSE	R²	MAE	MAPE %	Hyperparameters
Linear regression	Trained	1.90	3.61	0.91	1.61	3.76	Terms—quadratic
SVM	Trained	3.42	11.70	0.71	2.65	6.45	Kernel function—quadratic Standardize data—yes
Characteristic response: cumulative flux decline (SFD)
Model type	Status	RMSE	MSE	R²	MAE	MAPE %	Hyperparameters
Linear regression	Trained	0.63	0.40	0.91	0.48	9.24	Terms—quadratic
SVM	Trained	0.73	0.53	0.88	0.61	11.35	Kernel function—quadratic Standardize data—yes

Table 4. Regression coefficients (quadratic model) for the characteristic responses: (1) average permeate flux (J_p), (2) COD rejection rate (COD), and (3) cumulative flux decline (SFD).

	Quality Characteristic
	J_p		COD		SFD
Coefficients	Estimate	p-Value	Estimate	p-Value	Estimate	p-Value
Intercept	57.37	<0.001	38.71	<0.001	5.53	<0.001
A	18.04	<0.001	2.01	<0.001	1.58	<0.001
A²	−8.83	<0.001	−0.4	0.461	0.54	0.014
B	−0.18	0.886	3.29	<0.001	−0.51	<0.001
B²	−290	0.191	0.79	0.155	−0.49	0.024
C	−2.62	0.047	−0.56	0.083	0.46	<0.001
C²	2.85	0.198	3.58	<0.001	0.11	0.591
D	15.81	<0.001	−5.32	<0.001	1.7	<0.001
D²	1.54	0.479	4.07	<0.001	0.36	0.09
Multiple-Adjusted R²	0.94		0.96			0.95

Table 5. Mass function estimations for the three sources (average permeate flux, COD rejection rate, and cumulative flux decline) and four combination rule criteria at two discounting rates for the third source.

Discounting Rate	Mass Function Element	Dempster–Shafer Rule	Yager’s Rule	Smets Rule	Dubois–Prade Rule
0.5	1	0	0	1	0
	2	NaN	0	0	0
	3	NaN	0	0	0
	4	NaN	1	0	1
0.9	1	0	0	1	0
	2	NaN	0	0	0
	3	NaN	0	0	0
	4	NaN	1	0	1

Table 6. Mass function estimations for the two sources (average permeate flux and COD rejection rate) and four combination rule criteria.

Mass Function Element	Dempster–Shafer Rule	Yager’s Rule	Smets Rule	Dubois–Prade Rule
1	0	0	1	0
2	NaN	0	0	0
3	NaN	0	0	0
4	NaN	1	0	1

Table 7. Mass function estimations for the two sources (COD rejection rate and cumulative flux decline) and four combination rule criteria at two discounting rates for the third cumulative flux decline.

Discounting Rate	Mass Function Element	Dempster–Shafer Rule	Yager’s Rule	Smets Rule	Dubois–Prade Rule
0.5	1	0	0	0.335	0
	2	0	0	0	0
	3	1	0.665	0.665	0.665
	4	0	0.335	0	0.335
0.9	1	0	0	0.603	0
	2	0	0	0	0
	3	1	0.397	0.397	0.397
	4	0	0.603	0	0.603

Table 8. Mass function estimations for the two sources (average permeate flux and cumulative flux decline) and four combination rule criteria at two discounting rates for the third cumulative flux decline.

Discounting Rate	Mass Function Element	Dempster–Shafer Rule	Yager’s Rule	Smets Rule	Dubois–Prade Rule
0.5	1	0	0	0.297	0
	2	1	0.835	0.703	0.835
	3	0	0.165	0	0.165
	4	0	0	0	0
0.9	1	0	0	0.165	0
	2	1	0.703	0.835	0.703
	3	0	0	0	0
	4	0	0.297	0	0.297

Table 9. Tests of normality for the combined dataset of (A) average permeate flux (J_p), (B) COD rejection rate (COD), and (C) cumulative flux decline (SFD) (IBM SPSS v.29).

	Kolmogorov–Smirnov Test ^a			Shapiro–Wilk Test
	Statistic	Df	Sig.	Statistic	Df	Sig.
J_p	0.096	27	0.200 *	0.980	27	0.860
COD	0.082	27	0.200 *	0.968	27	0.549
SFD	0.124	27	0.200 *	0.947	27	0.182