1. Introduction
The rising demand for freshwater, stemming from burgeoning population numbers, expansive industrial endeavors, and rapid urban growth, has driven us into a global water sustainability crisis [1,2]. This crisis manifests in several ways: deteriorating freshwater resources due to over-exploitation of surface waters and groundwater reserves, protracted periods of drought exacerbated by climate shifts, the encroachment of seawater into freshwater reserves, accelerated depletion of groundwater levels, and increased salinization and contamination of our freshwater bodies. Such precarious circumstances jeopardize both national and global water security [3]. As a response to these pressing challenges, the world is increasingly turning to the desalination of brackish water and saltwater as primary sources of a dependable water supply [4]. This process involves the extraction of salt from water, making it suitable for diverse uses such as agricultural irrigation, industrial processes, and daily household needs [5,6]. Highlighting the immense promise of this technique, the United Nations has noted that saltwater desalination can augment our water reserves beyond what the natural hydrological cycle can offer.
Addressing the escalating need for freshwater, the primary strategies being adopted are desalination through thermal methods and membrane-based separation techniques [5]. Two prominent processes stand out among these methods: reverse osmosis (RO), a membrane-based technology, and multistage flash distillation (MSF), a heat-driven purification approach [7]. Thermal desalination, primarily using the MSF method, is frequently lauded for its efficiency, particularly when handling water with a high salt concentration, and the resulting water, or 'permeate', is of superior quality [4]. The global desalination landscape, however, is dominated by the RO system, with roughly 90% of facilities around the world harnessing this technique. The RO method leverages semi-permeable membranes to sieve out salts and other unwanted minerals from water [8]. Its popularity is not unwarranted: RO is recognized for its economical nature, impressive salt rejection rates, and the high quality of the resultant water. Its adaptability is another selling point; RO systems have found homes in diverse environments, ranging from standard water and sewage treatment plants to cutting-edge wastewater recovery and reuse initiatives. Nonetheless, the RO method is not without its challenges. Factors such as the requirement for high feed pressures; considerable energy and cost expenditures; the need for periodic membrane maintenance; and fluctuations in temperature, energy, and costs depending on the time of day or season continually push researchers and technologists to innovate and improve RO systems [7].
Membrane fouling has long been identified as one of the fundamental challenges associated with the RO desalination technique. However, recent studies suggest that this problem can be considerably mitigated, if not wholly resolved, by integrating the RO process with nanofiltration (NF), especially in the realm of seawater desalination. The incorporation of NF into RO systems substantially enhances the efficiency of the desalination process by actively reducing the presence of organics, pollutants, and ionic strength, and by softening the water. This combined approach also leads to a notable decrease in the overall expenditure tied to desalination [9,10]. The appeal of NF extends beyond its capacity to complement RO. Researchers laud its multiple benefits, such as cost-effective operation and maintenance, impressive water throughput (flux), operation at reduced pressures, and its relatively low installation expenses [10]. That said, a singular reliance on NF is not without its drawbacks. While it can treat various water impurities effectively, NF in isolation falls short of adequately reducing the high salinity inherent in seawater to produce potable water. This shortcoming is where the combination of NF and RO shines, achieving better desalination results [11]. Emerging research on membrane technologies, be it NF, RO, or innovative hybrid separation processes, consistently underscores that energy consumption stands as the most daunting challenge in their operational spectrum. It is evident that desalination processes involve multifaceted factors such as permeate rate, conductivity, recovery, rejection rate (RR), permeate quality (PQ), pH, energy consumption, and associated costs. In a pursuit to optimize these factors, various experimental undertakings have advocated for blended solutions, such as the NF–RO combination and the NF–SWRO–MSF hybrid process. In these innovative configurations, NF primarily functions as an efficient pre-treatment stage, preparing the ground for the more intensive RO desalination process [12,13].
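For reference, the rejection rate and recovery named among these factors are conventionally defined as follows (standard textbook definitions, stated here for completeness rather than taken from the cited studies):

```latex
% Standard definitions of salt rejection rate (RR) and recovery (R):
% C_f, C_p are the feed and permeate concentrations (or conductivities),
% and Q_f, Q_p are the feed and permeate flow rates.
\mathrm{RR} = \left(1 - \frac{C_p}{C_f}\right) \times 100\%,
\qquad
R = \frac{Q_p}{Q_f} \times 100\%
```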
Membrane processes in desalination have generally been modeled with linear or polynomial correlations, even though such methods oversimplify the complex dynamics of the desalination procedure [14,15,16]. In contrast to these traditional mathematical paradigms, which often settle for lower precision, the modern technological landscape, marked by the advent of Industry 4.0 and the growing Industrial Internet of Things (IoT), has seen artificial intelligence (AI) algorithms making substantial inroads into various industrial spheres [17]. Unlike conventional computing methodologies, which prioritize absolute accuracy and unerring truth, the world of soft computing embraces the nuances of imprecision, accepts degrees of truth, grapples with uncertainty, and is amenable to approximations in order to accomplish a specific objective [18,19,20]. This adaptive approach has been validated by recent studies, which reveal that sophisticated intelligent systems can assimilate experimental data with high accuracy in the realm of desalination [4,21,22,23,24,25]. Nevertheless, the journey of integrating data-driven algorithms into the ecosystem of desalination processes, including RO and NF, has not been devoid of hurdles. Artificial Neural Networks (ANNs), although revolutionary, have come under scrutiny for certain limitations: their propensity for overfitting, challenges with plant-specific performance optimization, the intricacies of their internal tuning parameters, and concerns over data homogeneity and inadequate neuron configurations [26].
Achite et al. [27] introduced a hybrid model known as the M5-Gorilla Troops Optimizer (GTO), built upon a blend of the M5 and GTO algorithms. This research employed nine diverse parameters, including raw water production (RWP), water turbidity, conductivity, TDS, salinity, pH, water temperature (WT), SM, and O2, as inputs for CD modeling. The comparative analysis highlighted in the study robustly positions the M5-GTO model as superior in CD modeling accuracy against numerous established models, such as multiple linear and nonlinear regression, an artificial neural network, multivariate adaptive regression splines, an M5 model tree, k-nearest neighbor, a least-squares support vector machine (LSSVM), a general regression neural network, and random forest (RF). This was exemplified by the recorded values of various error metrics and correlation coefficient criteria for the M5-GTO, all demonstrating significant improvements over the least effective algorithm, the LSSVM. The findings further ranked the M5 and RF algorithms second and third, respectively. Moreover, the research provided insight into the significant impact of RWP and WT on CD alterations, showcasing the inverse relationship of RWP with CD and the direct relation of WT with CD. In the study conducted by Firzin et al. [28], a novel BLSTM-HHO algorithm was introduced for improved groundwater table (GWT) and drought prediction. This innovative algorithm notably outperformed benchmark algorithms such as the standalone BLSTM, LSTM, ANN, SARIMA, and ARIMA in terms of prediction accuracy and performance criteria. Similarly, in the research conducted by Yan et al. [29], an innovative approach to predicting influent ammonia nitrogen (NH3-N) concentration in wastewater treatment plants was proposed, leveraging the synergistic capabilities of the rolling decomposition method and deep learning algorithms. The integrated model delineated in the study notably circumvents information leakage during the decomposition process, showcasing enhanced performance compared to standalone GRU models, as evidenced by reduced RMSE, MAE, and MAPE values. The study robustly underscores the superiority of the proposed model over other integrated models trained with information leakage, highlighting notable strides in prediction accuracy. Recognizing these challenges, the scientific community has proposed an advanced model, the long short-term memory (LSTM) model, which can be further strengthened by integrating a powerful metaheuristic optimization algorithm, the crow search algorithm (CSA), promising enhanced performance and adaptability in desalination processes.
In an effort to handle challenges faced in the field of desalination modeling, we sought to leverage the advanced capabilities of the LSTM network. The primary focus of this study was to refine the model's performance, particularly for the uncertainty analysis and prediction of the hybrid NF–RO desalination process. To realize this aim, we employed a state-of-the-art metaheuristic optimization technique, the CSA. A noteworthy novelty of our methodology was the inclusion of the Particle Swarm Optimization (PSO) algorithm, which was instrumental in recognizing the optimal combination of features, serving as a foundation for our in-depth modeling, statistical evaluations, and graphical analysis. Such a strategic approach was intended to maximize the accuracy of the hybrid algorithm (LSTM-CSA) as well as the LSTM in its single form. For modeling purposes, the six most important parameters were accurately sourced from seventy-three distinct data points. The chosen independent parameters covered a diverse range: temperature (°C), time duration (h), pressure (kg/cm²), feed flow rate (m³/h), and feed conductivity (μS/cm). The target, or dependent, parameter was the permeate conductivity (μS/cm). Through the feature selection insights collected from the PSO algorithm, two distinct datasets were crafted. This study provides a comprehensive evaluation of these models, with both training and testing datasets subjected to performance metrics such as the root mean square error (RMSE) and the mean absolute error (MAE). The models powered by the metaheuristic algorithms notably outperformed the singular LSTM model in terms of accuracy.
It is worth mentioning that the use of deep learning in the desalination process holds significant promise in revolutionizing the efficacy and sustainability of water desalination [6,30,31,32]. The application of LSTM networks integrated with an optimized metaheuristic CSA in the hybrid NF–RO desalination process enhances the overall performance, offering notable advantages. One of the primary benefits is the increased accuracy in predicting desalination process outcomes, with substantial reductions in RMSE and MAE demonstrating the reliability of AI models. This integration also optimizes energy usage, enabling the identification and exploitation of energy-saving opportunities for more sustainable operation. Additionally, AI contributes to the development of advanced brine treatment techniques, minimizing waste and maximizing resource utilization by facilitating the extraction of valuable resources from the brine. The models accurately evaluate and mitigate the uncertainty associated with predictions, ensuring more consistent and reliable desalination results. The comprehensive optimization of various desalination factors, including permeate rate and conductivity, underscores the significant enhancement AI brings to water treatment and desalination, addressing and overcoming many challenges inherent in traditional methods [33,34,35].
The goals and objectives of this research are primarily centered around tackling global water scarcity by enhancing desalination processes, specifically focusing on the hybrid NF–RO process. This research aims to develop a model for evaluating the performance of NF–RO based on permeate conductivity, utilizing a deep LSTM network and an LSTM integrated with the optimized metaheuristic CSA (LSTM-CSA). By adopting a Monte Carlo simulation for uncertainty analysis, this study evaluates the uncertainty attributed to prediction, aiming to establish the reliability of both the LSTM and the LSTM-CSA in terms of various statistical performance criteria. This research further addresses the key challenges faced in the RO desalination technique, aiming to enhance its efficiency by integrating it with NF and subsequently optimizing the various parameters involved in desalination processes. This study also seeks to leverage the advanced capabilities of the LSTM network and the CSA to refine the model's performance, particularly for the uncertainty analysis and prediction of the hybrid NF–RO desalination process. This paper is committed to maximizing the accuracy of the hybrid algorithms and providing a comprehensive evaluation, contributing to the optimization of energy usage, the identification of energy-saving opportunities, and the development of more sustainable operating strategies in desalination processes.
2. Experimental Methods
In this study, we aimed to model a complex hydro-environmental system using six parameters, which were sampled at seventy-three different points. The independent variables consisted of temperature (°C), time (h), pressure (kg/cm²), feed flow (m³/h), and feed conductivity (μS/cm), while the dependent variable was permeate conductivity (PC) (μS/cm). The data were obtained from the Saline Water Desalination Research Institute (SWDRI) of the Saline Water Conversion Corporation (SWCC), Saudi Arabia [36], and open-source data were collected from [8]. Refer to [36] for details of the experimental analysis and set-up; the diagram is presented in Figure 1a. An essential stage in developing deep learning models is the processing of the gathered data and the selection of the right model combination. The intricacy of the built models can be tackled by employing a metaheuristic approach such as the CSA and by understanding the capacity of AI systems to manage a complex, nonlinear system like desalination.
In this work, we employed a Monte Carlo simulation (Figure 1b) to investigate the uncertainty, and the PSO metaheuristic algorithm was used to select the optimal features; Figure 1c shows the objective function for feature selection using the PSO algorithm. Further pre-processing using normalization was carried out to improve the accuracy of the prediction. The model development also incorporates five-fold (k = 5) cross-validation to provide a more comprehensive insight into how the model performs across different subsets of the data. Five-fold cross-validation provides a more reliable assessment of how a model will perform on unseen data by using different data subsets for training and validation [37]. It is widely used in ML to tune hyperparameters and assess the generalization capability of models.
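As an illustration of this step, the following is a minimal sketch of min–max normalization combined with five-fold cross-validation, assuming the records are held in NumPy arrays; the data and variable names are placeholders, not the SWDRI dataset itself:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler

# X: 73 samples x 5 features (temperature, time, pressure, feed flow,
# feed conductivity); y: permeate conductivity. Placeholder data here.
rng = np.random.default_rng(42)
X = rng.random((73, 5))
y = rng.random(73)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # Fit the scaler on the training fold only, to avoid information leakage.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X[train_idx])
    X_val = scaler.transform(X[val_idx])
    y_train, y_val = y[train_idx], y[val_idx]
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
```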
Figure 2 presents the descriptive statistics, which summarize the main aspects of the data and offer a snapshot of their main characteristics [37]. Descriptive statistics are often the first step in data analysis, simplifying large amounts of data in a sensible way. In the scenario shown in Figure 2a, where the temperature in the NF–RO system exhibits a sharp drop, several factors could be responsible, and it is crucial to examine the experimental setup and conditions to determine the exact cause. The drop might be related to an experimental error, where an issue with the equipment, such as a malfunctioning temperature sensor, could give inaccurate readings. Another possibility is a sudden change in the system parameters or operating conditions, such as a sudden influx of feed water at a different temperature or a change in ambient conditions. An unexpected drop in temperature could also relate to the system's performance, indicating potential issues or inefficiencies within the NF–RO process itself, such as unexpected heat loss. It is crucial to thoroughly investigate the system, review the experimental procedure, and check the equipment to accurately diagnose the reason behind the sudden temperature drop.
In this paper, deep learning and the CSA are applied to improve the performance and efficiency of hybrid NF–RO desalination processes. Specifically, an LSTM network, a type of recurrent neural network, is used to model the desalination process. The LSTM network is adept at learning from sequences of data, which makes it suitable for modeling complex processes such as desalination, where numerous variables interact over time. The LSTM is integrated with the CSA, a metaheuristic optimization algorithm inspired by the behavior of crows, which is used to optimize the parameters of the LSTM network, ensuring that the model makes the most accurate predictions possible. Before developing this LSTM-CSA model, an uncertainty analysis is conducted using a Monte Carlo simulation to evaluate potential uncertainties related to the predictions made by the model. This approach, utilizing both LSTM and the CSA, is designed to enhance the prediction of the performance of the NF–RO desalination process based on permeate conductivity, a key parameter in assessing the effectiveness of the desalination process. This research aims to optimize energy usage and suggest more sustainable operating strategies for desalination, ultimately leading to enhanced water recovery rates, reduced energy consumption, and improved overall efficiency of the desalination process.

In more technical terms, the LSTM network is trained using historical data from the desalination process, learning to identify patterns and relationships that are not easily discernible. It then uses this learning to make predictions about the performance of the desalination process under various conditions. The CSA is applied to fine-tune the parameters of the LSTM network, ensuring that it operates as effectively as possible. The integration of deep learning with the CSA thus presents a robust and innovative approach to optimizing the performance of NF–RO desalination processes, contributing to advancements in addressing global water scarcity challenges. The performance criteria used to assess the models' accuracy are presented in the following equations [38,39]:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

where $\hat{y}_i$ and $y_i$ indicate the predicted and computed (observed) values, respectively, and $N$ is the number of data points.
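These three criteria are straightforward to compute directly; a brief illustration with hypothetical values follows:

```python
import numpy as np

def mse(y_obs, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_obs - y_pred) ** 2))

def rmse(y_obs, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return float(np.sqrt(mse(y_obs, y_pred)))

def mae(y_obs, y_pred):
    """Mean absolute error between observed and predicted values."""
    return float(np.mean(np.abs(y_obs - y_pred)))

# Hypothetical permeate conductivity values (uS/cm), for illustration only.
y_obs = np.array([410.0, 395.0, 402.0, 388.0])
y_pred = np.array([405.0, 399.0, 398.0, 392.0])
print(mse(y_obs, y_pred), rmse(y_obs, y_pred), mae(y_obs, y_pred))
```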
2.1. Long Short-Term Memory Neural Network (LSTM)
The LSTM neural network emerges as an advanced variation of the recurrent neural network (RNN). Its inception was specifically aimed at addressing the inherent shortcomings of traditional RNNs, especially their inability to retain memory over extended sequences (see Figure 3). A significant triumph of LSTM over its predecessor is its prowess in resolving the vanishing and exploding gradient issues that historically plagued the training of RNNs. This enhanced capability not only rectifies these problems but also augments the accuracy of models based on recurrent networks, a notable advancement reported in the literature [40]. Structurally, LSTM neural networks introduce a set of intricate memory cells. Unlike the transient memory of conventional RNNs, these memory cells exhibit a longer retention span. The configuration of these cells can be visualized as interconnected chains, echoing the modular repetition in conventional RNNs. In their research, Li et al. [41] emphasized the unique structure of these repeating units. LSTM neural networks boast a range of specialized components, each designed meticulously to process and hold onto extensive sequences. These components not only ensure that information is accurately captured but also allow the network to decide what to remember and what to discard, granting LSTM neural networks their signature ability to remember sequences over a protracted period.
At the heart of the LSTM structure lies the memory cell, an intricate assembly pivotal to the LSTM neural network's prowess. This memory cell is crafted as a network of sigmoid neurons interconnected in a manner that offers a unique self-feedback loop. Such a configuration allows the memory cell to fine-tune its operations, selectively choosing what information to retain and what to dispose of over prolonged durations. Aside from the memory cell and hidden state, LSTM neural networks have three specialized gates (input, forget, and output), each serving a distinct purpose. The research of Zaytar & El Amrani [42] emphasized the essence of these gating mechanisms. Their ability to judiciously regulate the influx and efflux of information gives the LSTM neural network its unique capability to maintain relevant data and discard the superfluous, ensuring optimum performance over extended sequences.
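For completeness, the three gates and the memory cell are conventionally written as follows (standard textbook equations, not specific to this study):

```latex
% Standard LSTM cell: forget gate f_t, input gate i_t, output gate o_t,
% candidate state \tilde{c}_t, cell state c_t, hidden state h_t.
% \sigma is the logistic sigmoid and \odot the element-wise product.
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```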
2.2. Crow Search Algorithm (CSA)
The CSA is an optimization algorithm inspired by the social behavior and memory capability of crows, specifically their ability to hide food and remember its location for future use. This nature-inspired heuristic approach is relatively new compared to other swarm intelligence techniques, yet it has been shown to have competitive performance for certain optimization problems [43,44]. The algorithm emulates the crows' behavior by dividing the search space into various sub-regions, each symbolizing a potential food source, and then modulates the positions of these sub-regions, drawing parallels to how crows hide and recover food. Within the CSA, a collective optimization method, agents are employed where each one represents a crow, facilitating the exploration of the search area. These agents collaborate, sharing insights about the most promising food sources identified so far. In this distance-centric communication method, agents in closer proximity exchange more extensive information than those positioned farther away [44].
In addition to its exploration prowess, the CSA incorporates an exploitation strategy that homes in on the optimal solution. This mechanism adjusts the positions of agents relative to the most promising food source identified thus far. The CSA has demonstrated its efficacy in solving optimization challenges in fields like engineering design, feature selection, and image processing. Furthermore, Figure 4 presents a comprehensive flowchart detailing the CSA's process, highlighting its primary stages. The swarm of crows is initialized randomly within a d-dimensional space. The fitness of each crow is assessed using a specific fitness function, and based on this evaluation, an initial memory value is assigned. Every crow saves its hiding location in its memory variable, denoted as mi. To update its position, a crow selects another random crow, represented as xj, and a random value is drawn; if this value surpasses the awareness probability threshold 'AP', the crow xi will track xj to pinpoint the location mj.
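To make this position-update logic concrete, the following is a compact sketch of the CSA in its commonly published form, applied to an illustrative fitness function; the flight length fl and awareness probability AP values are arbitrary choices, not the tuned settings of this study:

```python
import numpy as np

def crow_search(fitness, dim, n_crows=20, n_iter=100, AP=0.1, fl=2.0,
                lower=-10.0, upper=10.0, seed=0):
    """Minimize `fitness` using the crow search algorithm (common formulation)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_crows, dim))   # crow positions
    mem = x.copy()                                  # hiding places (memory m_i)
    mem_fit = np.apply_along_axis(fitness, 1, mem)
    for _ in range(n_iter):
        for i in range(n_crows):
            j = rng.integers(n_crows)               # crow i follows a random crow j
            if rng.random() >= AP:
                # Crow j is unaware: move toward j's remembered hiding place m_j.
                new = x[i] + rng.random() * fl * (mem[j] - x[i])
            else:
                # Crow j is aware: fly to a random position instead.
                new = rng.uniform(lower, upper, dim)
            x[i] = np.clip(new, lower, upper)
            f = fitness(x[i])
            if f < mem_fit[i]:                      # update memory on improvement
                mem[i], mem_fit[i] = x[i].copy(), f
    best = np.argmin(mem_fit)
    return mem[best], mem_fit[best]

# Example: minimize the sphere function in 5 dimensions.
sol, val = crow_search(lambda v: float(np.sum(v ** 2)), dim=5)
print(sol.round(3), round(val, 6))
```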
3. Results
In recent times, the convergence of Industry 4.0 with the adoption of emerging Artificial Intelligence (AI) and Internet of Things (IoT) technologies has marked a significant advancement in the fields of wastewater treatment and desalination plants. This revolution represents the peak of efforts aimed at effectively harnessing these advanced technologies to address crucial water management challenges. Within this context, the integration of ML techniques into the operational framework of desalination processes has gained substantial attention. This integration is seen as a pivotal approach to optimize and precisely control various aspects of the desalination process. Previous studies have already underscored the potential and feasibility of employing ML methodologies in this domain. In this section, we employed the novel deep learning-based LSTM algorithm and the CSA as evolutionary optimization data-driven techniques to simulate the PC (μS/cm) in a desalination plant. A stationarity analysis was carried out through the application of the Augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests. This approach was adopted to address the nonstationary nature of each individual input as well as of the independent and dependent variables. Although this sort of data mining and enhancement is not a conventional procedure in advanced engineering problems like desalination plants, its significance has been highlighted and recognized as a critical stage in the field of data analysis. The results for the training phase, based on several performance indicators, are presented in Table 1. Prior to model development, appropriate feature engineering, warranted by the complexity of the system, was conducted using the GA; hence, the GA served as a robust nonlinear input feature selection approach.
The GA was developed in MATLAB 2022b to select the optimal objective function for model development. GA-based feature selection can help researchers identify the most important input variables for modeling and predicting the PC of a complex environmental system.
Table 1 shows the indicators based on MAE, MSE, and RMSE, which help to evaluate the accuracy and reliability of the predicted models. It is worth noting that the deep learning LSTM model is a type of recurrent neural network (RNN) commonly used for sequence prediction tasks, such as the time series PC data in this case. For this purpose, modeling was performed using three algorithms: LSTM, LSTM-GA, and LSTM-CSA. The Python language was used for all implementations on a PC with a 3 GHz Core i7 processor and 64 GB of RAM. The lower the MSE, the closer the predicted values are to the actual values. In this case, LSTM-CSA-M2 has the lowest MSE (0.1447), indicating that it has the best overall predictive accuracy among the four models. Similar to the MSE, lower RMSE values suggest better predictive performance; here, LSTM-CSA-M2 again has the lowest RMSE, indicating that it makes predictions with the smallest average error magnitude. Metaheuristic optimization techniques are primarily used for solving optimization problems and are well-suited to problems where finding the optimal solution is challenging due to complex, non-linear, and multi-dimensional search spaces; hence, their superiority is not surprising. The choice between the CSA and the LSTM model depends on the specific problem domain, the nature of the data, and the modeling goals: if the goal is to optimize parameters or find optimal solutions in complex spaces, the CSA is more suitable, whereas if the task is to capture temporal patterns, the LSTM model is the better choice.
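As a rough illustration of this setup, a minimal LSTM regressor of the kind described can be assembled in Python (Keras) as follows; the layer sizes, hyperparameters, and data below are placeholders, not the configuration tuned in this study:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Shape: (samples, time steps, features); random data standing in for the
# normalized desalination records (5 input features per step, 73 samples).
X = np.random.rand(73, 1, 5).astype("float32")
y = np.random.rand(73, 1).astype("float32")

model = Sequential([
    LSTM(32, input_shape=(1, 5)),   # memory cells over the input sequence
    Dense(16, activation="relu"),
    Dense(1),                       # predicted permeate conductivity
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=50, batch_size=8, validation_split=0.2, verbose=0)
```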
Figure 5 presents the cumulative distribution function (CDF) between the observed and simulated variables. In the analysis of NF–RO desalination utilizing the LSTM and LSTM-CSA models, it is vital to precisely assess potential errors and experimental shortcomings. The uncertainty analysis, performed using a Monte Carlo simulation prior to model development, should be critically evaluated for comprehensive and accurate execution using various error criteria. The integration of LSTM with the CSA requires precise tuning; any integration issues could negatively affect the results, leading to inaccuracies in the permeate conductivity predictions. This research's reliance on diverse parameters for modeling introduces another potential source of error: accurate and consistent measurements are fundamental to ensuring the reliability of model predictions. Concerning the observed temperature drop (Figure 2a) in the NF–RO system, a thorough investigation is necessary to determine whether it is an experimental error, a system issue, or a valid result. Comprehensive analysis, considering all potential errors, is crucial for validating the research findings and their real-world applicability in desalination processes.
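A schematic of such a Monte Carlo uncertainty analysis is sketched below: the inputs are perturbed within an assumed measurement-error band and the spread of the resulting predictions is summarized. The predictor and the 2% noise level are illustrative stand-ins, not values from this study:

```python
import numpy as np

rng = np.random.default_rng(7)

def predict_pc(x):
    # Placeholder standing in for the trained LSTM/LSTM-CSA predictor.
    return 400.0 + 2.0 * x[0] - 0.5 * x[2] + 0.1 * x[4]

# Illustrative nominal inputs: temperature, time, pressure, feed flow,
# feed conductivity.
x_nominal = np.array([25.0, 4.0, 55.0, 6.0, 42000.0])
rel_noise = 0.02                      # assumed 2% relative input uncertainty

samples = []
for _ in range(10_000):
    x = x_nominal * (1 + rng.normal(0.0, rel_noise, x_nominal.size))
    samples.append(predict_pc(x))
samples = np.array(samples)

lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"mean = {samples.mean():.1f}, 95% band = [{lo:.1f}, {hi:.1f}]")
```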
The CDF is a statistical concept used to compare the distribution of observed (empirical) data with that of simulated (model-generated) data. It provides insight into how closely the simulated data match the observed data in terms of their distribution and the likelihood of specific values occurring. The graph shows that LSTM-CSA-M2 has the highest agreement between the observed and simulated PC. A good match between the observed and simulated CDFs suggests that the simulated PC data closely resemble the observed data in terms of their distribution. Since the CDFs align well, the model's predictions are consistent with the real-world desalination data. The comparison of CDFs is widely used in various fields, such as climate modeling, financial risk assessment, and quality control. A further numerical comparison of the results is presented using the MAE, which indicates the average absolute difference between the predicted and actual values, giving a sense of the average magnitude of the errors without squaring them. From the training results, the MAE of LSTM-M1 = 0.0945, the MAE of LSTM-M2 = 0.0644, the MAE of LSTM-CSA-M1 = 0.0399, and the MAE of LSTM-CSA-M2 = 0.0945. These results show that LSTM-CSA-M1 has the lowest MAE (0.0399), implying that it has the smallest average absolute error in predicting PC during the training phase (see Table 2). On the other hand, LSTM-CSA-M2 has the lowest RMSE (0.3804), suggesting that it is the most accurate model in predicting PC in the training phase (see Figure 6).
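For readers wishing to reproduce this kind of comparison, an empirical CDF and a simple closeness measure (the Kolmogorov–Smirnov distance between the two CDFs) can be computed as follows; the arrays are synthetic stand-ins for the observed and simulated PC series:

```python
import numpy as np

def ecdf(values):
    """Empirical CDF: sorted values and their cumulative probabilities."""
    x = np.sort(values)
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p

# Synthetic arrays standing in for observed and simulated PC values.
rng = np.random.default_rng(1)
observed = rng.normal(400.0, 15.0, 73)
simulated = observed + rng.normal(0.0, 5.0, 73)

x_obs, p_obs = ecdf(observed)
x_sim, p_sim = ecdf(simulated)

# Evaluate both step functions on a common grid and take the max gap.
grid = np.union1d(x_obs, x_sim)
F_obs = np.searchsorted(x_obs, grid, side="right") / len(x_obs)
F_sim = np.searchsorted(x_sim, grid, side="right") / len(x_sim)
print(f"KS distance = {np.max(np.abs(F_obs - F_sim)):.3f}")
```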
Overall, LSTM-CSA-M2 performs the best across all three metrics (MSE, RMSE, and MAE) among the four models during the training phase for PC. LSTM-CSA-M1 has the second-best performance, with the lowest MAE indicating the smallest average absolute error. These metrics collectively offer insight into the accuracy and precision of the models' predictions, helping to identify which model is most suitable for predicting PC in the desalination plant's training phase. In intricate processes like desalination, where uncertainties arise from both human activities and the characteristics of the source water, the connections among physicochemical factors are prone to being nonlinear due to these uncertain factors. The evidence in Table 2 indicates that the PC simulation was satisfactory and reliable in the testing phase. Regarding the quantitative comparison of the testing phase, the MSE measures the average of the squared differences between the predicted and actual values, and lower MSE values indicate that the model's predictions are closer to the actual values. In this case, LSTM-CSA-M1 and LSTM-CSA-M2 have the lowest MSE values, suggesting that these models have better overall prediction accuracy than the other models. The numerical accuracy for each model in the testing phase was as follows: LSTM-M1 (MSE = 0.3168, RMSE = 0.5628, MAE = 0.1468), LSTM-M2 (MSE = 0.4284, RMSE = 0.6545, MAE = 0.1884), LSTM-CSA-M1 (MSE = 0.1985, RMSE = 0.4456, MAE = 0.1985), and LSTM-CSA-M2 (MSE = 0.1992, RMSE = 0.4463, MAE = 0.1292). Similarly, lower RMSE values indicate better predictive performance: LSTM-CSA-M1 and LSTM-CSA-M2 have the lowest RMSE values, implying that these models are better at minimizing prediction errors. LSTM-CSA-M2 has the lowest MAE value, suggesting that it has the smallest average prediction error. Based on these metrics, it appears that the models with the prefix LSTM-CSA (LSTM with the CSA) tend to perform better than the models without the CSA across all three metrics. Additionally, between LSTM-CSA-M1 and LSTM-CSA-M2, the latter outperforms the former, with slightly lower values in most metrics (see Figure 7).