2.1. Influence of Input Variables on the Responses
A scatterplot graph is a significant plot in statistical analysis as it measures the effect of each process variable on the output response. With each process variable comprising three coded values (−1, 0, 1), a response versus process variable graph was plotted for each variable (
Figure 1). The relationship between the process variable and the response was first assumed to be a linear model. Thus, the gradient of the plots may be used to represent the correlation. A correlation with a positive value signifies that a direct proportionality exists between the process variable and the response output, whereas a correlation with a negative value indicates that the process variable is indirectly proportional to the response output. The correlations (displayed on the top left corner of the graphs) for magnetite nanoparticle dosage, temperature, and voltage supply were −0.100 (
Figure 1a), 0.215 (
Figure 1b), and −0.055 (
Figure 1c), respectively. Therefore, increasing either magnetite nanoparticle dosage or voltage supply resulted in a decrease in biogas yield, whereas a rise in temperature was followed by a rise in biogas yield.
The increasing order of the absolute values of the correlations revealed the following: temperature (0.215) > magnetite nanoparticles dosage (0.100) > voltage supply (0.055). Therefore, these results indicated that temperature had the highest effect on biogas yield, whereas voltage supply had the least effect on biogas yield. Similarly, %COD removed and current density were mostly influenced by temperature; their respective correlations (not displayed on the figures) were 0.324 and 0.243 for temperature, −0.222 and −0.177 for magnetite nanoparticle dosage, and −0.148 and −0.045 for voltage supply, respectively.
These results offer a starting point for statistical analysis. The assumption that was made in this section was that the data points resemble a linear model, however, this had to be verified using the fit summary analysis.
2.2. Regression Model and Fit Summary
Response transformation is a significant tool of any data analysis. Response power transformation in statistical analysis encompasses the use of several mathematical functions to the response/s. The mathematical functions that were investigated by the Design Expert software for transformation were: no transformation, natural log, inverse square root, square root, base 10 log, logit, inverse, arcsine square root, and power. The transformations for biogas yield, COD removal, and current density models are shown in
Table 1. The power transform depends on the maximum response/minimum response ratio [
20]. A maximum response/minimum response ratio above 10 normally indicates that the chosen mathematical function must be transformed. On the other hand, a ratio below 3 indicates that the transformation has a small effect [
20]. The minimum responses for biogas yield, %COD removed, and current density were 435.0 mL/g VS
fed, 53.0%, and 12.0 mA/m
2, whereas the maximum responses were 568.0 mL/g VS
fed, 98.0%, and 26.8 mA/m
2, respectively. Thus, the maximum response/minimum response ratios for biogas yield, COD removal, and current density were 1.31, 1.85, and 2.23, respectively. For all models, the maximum response/minimum response ratios were below 3, and this indicates that no transformation was required and that transforming the output variables would not make much difference.
With no transformation required on the responses, the next step is to determine the type of models. The Design Expert software presents several useful statistical tables that can be used to identify the type of model to select for extensive study. An important table in statistical analysis that is useful for selecting the model type is the fit summary table. The fit summary section collects the significant statistical parameters used to select the starting point for the actual model.
Table 2,
Table 3 and
Table 4 show fit summaries for biogas yield, %COD removed, and current density, respectively. Linear, cubic, quadratic, and two-factor interaction (2FI) models are the type of models that were investigated. In statistical analysis, the model with the highest
p-value, F-value, predicted R
2, and adjusted R
2 is regarded as the model that will best fit the data points [
21]. For all responses, the quadratic model had the highest lack-of-fit
p-value and predicted R
2 values. On the other hand, the cubic model had the greatest adjusted R
2 values, followed by the quadratic model. Therefore, these results suggested that the best fit model was either the quadratic model or the cubic model. However, the cubic model was aliased since there are no sufficient unique design points to predict all coefficients of the model. In fact, if this model was chosen, the least-squares estimations would not be unique, resulting in 2D plots with shapes that are misleading. For this reason, the quadratic model was selected in this investigation, and because most statistical values of this type of model were the highest.
2.3. Analysis of Variance (ANOVA) Test and Fit Statistics
The analysis of variance (ANOVA) test is one of the most significant tests in statistical analysis. The ANOVA test helps a researcher to observe the selected effects and the coefficients of the model [
20].
Table 5,
Table 6 and
Table 7 depict the ANOVA outputs for biogas yield, %COD removed, and current density, respectively. F-value, Probability > F, coefficient of determination R
2, and lack-of-fit are statistical values that assess how the chosen regression model best fits the investigational data points [
16]. An F test is used to ascertain the significance of the means between operating conditions. The F-values of 886.81, 161.67, and 175.87 denoted that the regression models were significant for biogas yield, %COD removed, and current density. The model of biogas yield had the highest F-value of 886.81, which denoted that it was the most robust model. The
p-value is another important statistical parameter that is strongly connected with the F-value. In statistical analysis, a
p-value denotes the probability for the regression model [
20]. From the ANOVA tables, the
p-values were less than 0.0001 for biogas yield, %COD removed, and current density. Therefore, for the biogas yield, %COD removed, and current density models, there is <0.0001% probability that F-values this large (886.81, 161.67, and 175.87) are likely due to noise. The null hypothesis has to be discarded if the
p-value is significantly small; in other words, a
p-value below the alpha value (∝) [
21,
22]. This investigation was taken at a confidence interval (%CI) of 95%, thus, the alpha value was
. Therefore, a
p-value below 0.05 is regarded as significant, whereas a
p-value greater than 0.1 is not significant and thus should be ignored. For biogas yield and %COD removed, the
p-values for the terms A, B, C, AC, A
2 B
2, and C
2 were below 0.05, suggesting that these model terms will have a significant effect if added to the models. On the other hand, the values that were below 0.05 for current density were A, C, AC, A
2 B
2, and C
2. For biogas yield and %COD removed, the terms with probability values above 0.10 were AB and BC, whereas for current density, the terms were B, AB and BC. These terms were not significant and therefore had to be removed from the models.
Another statistical term that is important in ANOVA is the lack-of-fit test. The selected model should have insignificant lack-of-fit [
20]. This condition occurs when the
p-value is greater than 0.10. For all models, the lack-of-fit values were above 0.10, which means the proposed regression models fit well.
Equations (1)–(3) show model equations (in coded form) for biogas yield (mL/g VS
fed), COD removed (%), and current density (mA/m
2), respectively [
23]:
The model equations, expressed in terms of actual input variables, for biogas yield (mL/g VS
fed), COD removed (%), and current density (mA/m
2) are shown in Equations (4)–(6), respectively:
Another significant table in statistical analysis is the fit statistics table (
Table 8). The table contains important statistical terms, namely standard deviation, mean, coefficient of variation, coefficient of determination (R
2), adjusted R
2, predicted R
2, adequate precision, and PRESS. In statistical analysis, the R
2 term is used to calculate how well the proposed regression model fit the investigational data points [
24]. Essentially, the R
2 coefficient measures the percentage of change of the response variable (y) in the neighborhood of y that is described by the regression model. The coefficient of determination lies between 0 and 1. A value that is approximately equal to 1 is recommended, as it denotes that the regression model is the best fit. The model of the biogas yield was more robust than that of %COD removed and that of maximum current density, revealing a significantly high R
2 of 0.9994.
However, one of the disadvantages of an R
2 value is that, even if the process variable is substantial, the addition of a process variable to the regression model always increases the value of R
2. Thus, most statisticians prefer the adjusted R
2 over the traditional R
2 [
24]. One of the most important advantages of the adjusted R
2 is that it is not increased by the addition of a process variable. Similarly to the R
2, the adjusted R
2 of biogas was the greatest (0.9982), further proving that it is the most robust model.
The predicted R2 is another important parameter in fit statistics, which indicates the estimated coefficient of determination for the proposed regression model. The predicted R2 values for biogas yield, %COD removed, and current density were 0.9903, 0.9480, and 0.9658, while the adjusted R2 values were 0.9982, 0.9904, and 0.9912, respectively. The respective difference between the predicted R2 and adjusted R2 were 0.0079, 0.0424, and 0.0432. The differences were less than 0.2, which suggested reasonable agreement. In other words, there was no problem with either the experimental data or the regression models.
The statistical term adequate precision evaluates the limits of the estimated output to the predicted error; in other words, the ratio of signal/response-to-noise [
21]. A high adequate precision denotes an extremely high difference between the estimated response output and the accompanying error. Generally, an adequate precision that is above 4.0 denotes adequate model discrimination [
21]. In this investigation, the adequate precisions for biogas yield, %COD removed, and current density were 88.55, 41.67, and 41.29, respectively. These values were above 4.0, which indicated that the model discrimination was satisfactory. Therefore, the selected model expressions can be used to navigate the design spaces of the responses because the predicated response outputs are less influenced by error. The highest adequate precision of 88.55 was found on the model of biogas yield. This means that this model had the highest signal to noise ratio, which further proves that it is the most robust model.
The coefficient of variation denotes the standard deviation represented as a percentage of the mean of the response variable [
20]. Essentially, the coefficient of variation is used to measure the dispersion of experimental data points around the mean. A low value is recommended since it indicates a more authentic model equation. The coefficient of variation of the biogas yield, %COD removed and current density models were 0.3822%, 1.83%, and 2.37%, respectively. These values were low, with the biogas yield revealing the lowest value of 0.3822%, thus indicating a more robust model. Thus, the standard deviation of biogas yield (1.86) was extremely low compared to its associated mean of 485.37, which denotes that its regression model was more satisfactory than the other models.
Predicted residual error sum of squares (PRESS) is used to calculate the error variation. Specifically, this statistical coefficient illustrates how the variation in the process variable in a model equation cannot be described by the regression model [
20]. Generally, a very low PRESS indicates that the regression model is the best fit. On the other hand, a high PRESS is an indication that the model equation is not the best fit. The PRESS values for biogas yield, %COD removed, and current density were 2.18, 1.82, and 0.7.80, respectively. These values were low, which suggested that each of the models was a best fit.
2.4. Validation of the Models
Diagnostic tests are extremely important in statistics to confirm that the assumptions for the ANOVA test are met; the regression models must be validated [
21]. The predicted versus actual plot (
Figure 2) is one of the useful plots for model validation. The response values of the predicted versus actual plot should be scattered along the 45° line [
21]. As is evident from the figures, the points of biogas yield (
Figure 2a), %COD removal (
Figure 2b), and current density (
Figure 2c) were scattered along the 45° line, indicating that all regression models were able to reasonably predict the experimental data points.
The leverage plot is another useful graph in model validation (
Figure 3). The average leverage (
) may be defined by Equation (7):
where
denotes the number of model coefficients, and
is the total number of experimental runs.
In this investigation, the value of
was 9 and that of
was 15; therefore, according to Equation (7), the average leverage was 0.6. An experimental run with a leverage value greater than two times
or a leverage value of 1 is regarded as having a very high value of leverage [
21]. From the biogas yield (
Figure 3a), %COD removed (
Figure 3b), and current density (
Figure 3c) graphs, there is no leverage value above 1 or greater than 2 × 0.6 = 1.2, denoting that the leverage values were not too high.
The last diagnostic tool that was used to validate the models was the externally studentized residuals versus run number graph (
Figure 4). The externally studentized residuals versus run number plot portrays the number of t-values in an investigational run that fall off from the rest of the data points; in other words, “Outliers” [
20]. The graph has boundary lines that are dependent on the degree of freedom as well as the tail value. Equation (8) may be used to calculate the tail value:
The alpha value () is 0.05 while the total number of experimental runs () is 15. Therefore, according to Equation (8), the tail value is 0.0017.
The degree of freedom for residual (DOF) may be obtained from Equation (9):
where n is the number of process variables.
In this investigation, the value of n was 3. Therefore, according to Equation (9), the degree of freedom for residual is 5.
According to the one-tailed/two-tailed t-table [
21], at a degree of freedom of 5 and
of 0.0017, the residual limit (
) is
.
From the graphs of biogas yield (
Figure 4a), %COD removed (
Figure 4b), and current density (
Figure 4c), all externally studentized residuals were within the residual limits (
), suggesting that the responses need not be transformed. Therefore, all investigational data points fitted well by the models, which suggests that there was no problem with either the experimental data points or the regression models.
2.5. Surface Graphs
A three-dimensional surface (3D) plot is a projection of a contour plot. A 3D plot makes it simpler to see the actual representation of a model. The effect of the process variables on biogas yield, %COD removed, and current density is depicted in
Figure 5.
Figure 5a,b display 3D graphs for biogas yield of voltage supply and temperature (BA), and magnetite nanoparticle dosage and temperature (CA), respectively. The results show that biogas yield firstly increases with either an increase in temperature or voltage supply or magnetite nanoparticle dosage. Temperature is essential in anaerobic digestion since it enhances the metabolic rate, and thus biogas yield. According to Ohm’s law, voltage is directly proportional to current. Therefore, increasing voltage will increase current density and electrical conductivity, and hence biogas yield. On the other hand, magnetite nanoparticles can help to enhance the interspecies electron transfer between archaea and microorganisms, which improves biogas yield. However, biogas yield will reach the maximum value of 563.0 mL/g VS
fed when temperature, voltage supply, and magnetite nanoparticle dosage are at about 32 °C, 0.77 V, and 0.53 g, respectively, which is at a medium level. After passing the optimum conditions, biogas yield then decreased with an increase in any of the process variables. Essentially, the activity of microorganisms decreases as the temperature of the system becomes higher, which as a result slows down the microbial production of protons and electrons [
25]. Therefore, biogas yield decreased above 32 °C. On the other hand, high voltage results in water hydrolysis, which leads to hydroxides and oxygen, which hinder the digestion process [
26]. Therefore, biogas yield was reduced after 0.77 V. With regards to magnetite nanoparticle dosage, higher dosages generally result in toxicity that hinders the activity of the methanogenesis stage, which is why above 0.53 g, biogas yield decreased.
From
Figure 5c,d, the optimum %COD removed was approximately 97% and was achieved at a temperature of 32 °C, voltage supply of 0.77 V, and magnetite nanoparticle dosage of 0.53 g. Likewise, current density had the highest value of approximately 26 mA/m
2 (
Figure 5e,f), which occurred at the same optimum operating conditions as for biogas yield and %COD removal. Most researchers have found the optimum temperature value to be 35 °C for a bioelectrochemical system [
27,
28], which is higher than what was obtained, hence, their systems employed more energy. According to Xu et al. [
4], the optimum voltage supply is 0.8 V. The optimum value that was obtained from this investigation was closer to this value.
Even though optimum conditions have been found for biogas yield, %COD removed, and current density using a graphical method, the optimum conditions have to be confirmed by means of a numerical optimization method.
2.6. Numerical Optimization Ramps and Desirability
The last step in this investigation was to find the optimal solution using a numerical method. Numerical optimization ramps were used to obtain the optimum temperature, voltage supply, and magnetite nanoparticle dosage with the aim of optimizing the response variables. The possible goals were set to maximize the responses. Unlike a five-pluses method that requires one goal to be important, all goals were equally important in this investigation, and therefore, the goal importance was set to three pluses (+++) [
21]. According to
Figure 6, at optimal process conditions of temperature (32.2 °C), voltage supply (0.77 V), and magnetite nanoparticle dosage (0.53 g), the Design Expert software revealed that the optimum values for biogas yield, %COD removed, and maximum current density were 563.02 mL/g VS
fed, 97.52%, and 26.05 mA/m
2, respectively. At the optimum conditions, the results revealed a maximum combined desirability of 89.9%.
Our previous studies on the synergistic effect of the bioelectrochemical system and magnetite nanoparticles were conducted at 40 °C and at a magnetite dosage of 1 g [
2,
5,
19] as shown in
Table 9. Out of these studies, the best performing bioelectrochemical system was a MEC with cylindrical electrodes, which revealed the highest biogas yield of 548.0 mL/g VS
fed [
19]. However, the current investigation had a higher optimum biogas yield of 563.02 mL/g VS
fed (than our previous investigations), which was found at a lower temperature of 32.2 °C and lower magnetite dosage of 0.53 g. Thus, the current study has improved the performance of the bioelectrochemical system while reducing energy usage and magnetite consumption.
Figure 7 depicts the response surface of desirability performance in terms of A: temperature, B: voltage supply, and C: magnetite nanoparticle dosage.
Figure 7a,b prove that the desirability value increased with an increase in temperature for values between 25 and 32.2 °C. This is due to the fact that, by increasing the temperature from atmospheric values to mesophilic values, the electrical conductivity of a solution is improved, which then enhances the flow of electrons and protons, and hence, biogas production and contaminants removed. Thus, the desirability is increased by an increase in temperature. However, if the temperature is above 32.2 °C, certain other factors, such as instability and low microbial activity, should be taken into consideration, as they reduce the performance of a MFC [
29].
On the other hand, the vertical axis of
Figure 7a and the horizontal axis of
Figure 7c shows desirability as a function of voltage supply. It is evident from the graphs that, in general, if the amount of voltage supply is increased for values between 0.3 and 0.77 V, the desirability is increased. However, the graph approaches the optimum value of 0.77 V and undergoes a transformation; the desirability performance begins to drop with an increase in voltage supply. Generally, pH divergence is evident at a high voltage supply and the activity of the bioanode is usually inhibited due to the formation of toxic substances, thus reducing the overall performance of a MEC [
26].
The
y-axis of
Figure 7b,c depicts the desirability of the system for maximum magnetite nanoparticle dosage. As is evident from the graphs, for magnetite nanoparticles dosage between 0.1 and 0.50 g, an increase in dosage results in an increase in desirability. The figures confirm that after 0.50–0.53 g magnetite nanoparticle dosage, desirability begins to decrease, possible due to inhibition.
Experimental work was then executed at the optimum process conditions (32.2 °C, 0.77 V, and 0.53 g) with the aim of validating the results of the predicted models. A deviation of below 4% was obtained when comparing the two results, which indicated that the results of the models were more or less the same as the experimental data. Therefore, the estimated model equations can be used for the synergism of a microbial electrolysis cell and magnetite nanoparticles at any combinations of temperature, voltage supply, and magnetite nanoparticle dosage that lie within the investigated ranges.