4.1. Experimental Setup
This study builds on the interval-constrained multi-objective optimization scheduling model for the integrated energy system of islands designed in Section 3; experiments were conducted using an interval multi-objective optimization algorithm based on improved PPO-CLIP and meta-learning. This section also simulates emergency scenarios in scheduling to verify the model's ability to handle various uncertainties. To demonstrate the effectiveness of the proposed algorithm, several interval multi-objective optimization algorithms were selected for performance comparison: IP-MOEA [43], ICMOABC [44], IMOMA-II [45], and CIMOEA [46].
The experiments were carried out on a high-performance computing system equipped with an Intel Core i7-13700K processor (13th generation) and an NVIDIA GeForce RTX 4070 graphics card. The programming environment was PyCharm (version 2023.1), with Python (version 3.11) used to implement the algorithms and for preliminary testing. For comparative analysis, all experiments were conducted in a MATLAB (version R2023b) environment to ensure uniformity and efficiency in algorithm execution [47].
In the training of the multi-objective optimization scheduling model for the integrated energy system of islands, the key experimental parameters are presented in Table 1 [48]. In the meta-learning settings, the meta-learning rate was set as specified in Table 1 to control the update speed of the meta-model during learning. The model was trained for 10,000 meta-iterations, with each sub-problem updated five times. To ensure stability during meta-learning, the meta-gradient clipping norm was set to 1.0, with a learning-rate decay cycle of 1000 iterations and a decay magnitude of one-tenth of the current learning rate. The initial learning rates for the actor and critic networks were set as specified in Table 1, with a reward discount factor of 0.97 and a regularization parameter of 0.01. The GAE parameter was 0.95, and the soft update rate of the target network was set to 0.03 to facilitate smooth transitions of the network parameters, with a random-action probability offset of 0.1 applied only during the training phase. The network architecture comprises actor and critic networks, each configured with six layers using ReLU activation functions, with weights updated through the Adam optimizer.
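The training settings above can be collected into a small configuration sketch. This is an illustrative assumption, not the authors' code: all names are ours, and we interpret the decay magnitude as removing one-tenth of the then-current learning rate each cycle.

```python
# Hyperparameter sketch for the improved PPO-CLIP / meta-learning setup
# described above. Names and the decay interpretation are illustrative
# assumptions, not the paper's implementation.

META_CONFIG = {
    "meta_iterations": 10_000,    # total meta-iterations
    "updates_per_subproblem": 5,  # sub-problem updates per meta-iteration
    "grad_clip_norm": 1.0,        # meta-gradient clipping norm
    "lr_decay_cycle": 1_000,      # decay every 1000 iterations
    "lr_decay_amount": 0.1,       # decay by one-tenth of the current rate
}

PPO_CONFIG = {
    "gamma": 0.97,       # reward discount factor
    "l2_reg": 0.01,      # regularization parameter
    "gae_lambda": 0.95,  # GAE parameter
    "tau": 0.03,         # target-network soft update rate
    "explore_eps": 0.1,  # random-action probability (training only)
    "hidden_layers": 6,  # actor/critic depth, ReLU activations, Adam updates
}

def decayed_lr(initial_lr: float, iteration: int) -> float:
    """Learning rate after step decay: each completed cycle removes
    one-tenth of the then-current rate (i.e., multiplies it by 0.9)."""
    cycles = iteration // META_CONFIG["lr_decay_cycle"]
    return initial_lr * (1.0 - META_CONFIG["lr_decay_amount"]) ** cycles

def soft_update(target, online, tau=PPO_CONFIG["tau"]):
    """Polyak averaging used for the target-network soft update."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

With these helpers, the learning rate stays constant within a 1000-iteration cycle and drops by 10% at each cycle boundary, while the target network tracks the online network at rate 0.03 per update.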
Table 2 lists the rated power and bounds for the experimental equipment, with some of the device information derived from Appendix B. The model encompasses a diverse array of energy devices, including both renewable energy collection units and energy conversion systems, each with specifically designated quantities and power-capacity ranges. The renewable energy devices comprise wind turbines, photovoltaic panels, and wave energy converters, with quantity ranges of 0 to 15 units for wind turbines, 0 to 400 units for photovoltaic panels, and 0 to 120 units for wave energy converters; the output of these devices is established from predictive data to accommodate varying environmental energy conditions. The energy conversion devices include EB, ER, AR, WSHP, SD, and HFC. The quantities of EB and electric chillers are capped at 20 units each, absorption chillers at 10 units, WSHP range from 10 to 20 units, seawater desalination units fall within the same range, and hydrogen fuel cells have a maximum limit of 300 units. The power parameters of each device follow its design standard: EB at 66 kW, electric chillers at 45 kW, absorption chillers at 180 kW, WSHP at 60 kW, and seawater desalination units at 70 kW, while the hydrogen fuel cell parameters are specified in terms of hydrogen consumption.
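The quantity and power bounds just described can be encoded as a simple lookup table. This is an illustrative sketch of Table 2's content; the field layout, helper function, and the acronym expansions in the comments are our assumptions.

```python
# Illustrative encoding of the equipment bounds described above; the
# acronym expansions in the comments are assumed, not confirmed by Table 2.

DEVICE_BOUNDS = {
    # name: (min_units, max_units, rated_kw or None)
    "wind_turbine":   (0, 15, None),   # output taken from forecast data
    "pv_panel":       (0, 400, None),
    "wave_converter": (0, 120, None),
    "EB":   (0, 20, 66.0),    # electric boiler (assumed expansion)
    "ER":   (0, 20, 45.0),    # electric chiller
    "AR":   (0, 10, 180.0),   # absorption chiller
    "WSHP": (10, 20, 60.0),   # water-source heat pump
    "SD":   (10, 20, 70.0),   # seawater desalination unit
    "HFC":  (0, 300, None),   # hydrogen fuel cell, rated by H2 consumption
}

def max_capacity_kw(name: str):
    """Upper bound on installed capacity: max units times rated power,
    or None for devices whose output is forecast- or fuel-driven."""
    _, max_units, rated = DEVICE_BOUNDS[name]
    return None if rated is None else max_units * rated
```

For example, the electric boilers contribute at most 20 × 66 kW = 1320 kW of installed capacity under these bounds.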
4.2. Results and Discussion
This study simulates the integrated energy system for the island depicted in Figure 1, employing a 24 h scheduling period with 1 h dispatch intervals. The Extreme Learning Machine (ELM) [49] method is used to generate forecasts from historical data for wind, solar, and wave energy outputs, as well as daily predictions for the electrical, cooling, heating, and water loads over the 24 h scheduling period. As shown in Figure 6, the wind energy output is relatively stable, while solar energy varies significantly during daylight hours, peaking at midday. Wave energy production is comparatively low, albeit with a slight increase in the evening. On the load side, electrical demand peaks during the daytime high-demand period and then gradually declines. Cooling and heating loads exhibit a distinctly counter-cyclical pattern, with cooling demand peaking during the day and heating demand more pronounced at night. The water load remains relatively stable, with minor fluctuations during certain periods.
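An ELM in the sense of [49] trains a single hidden layer with random, fixed input weights and solves only for the output weights by least squares. The sketch below is a minimal illustration of that idea; the feature construction, hidden size, and toy daily profile are our assumptions, not the paper's forecasting pipeline.

```python
import numpy as np

def elm_fit(X, y, n_hidden=32, seed=0):
    """Single-hidden-layer ELM: random input weights and biases, tanh
    hidden features, least-squares readout weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only beta is trained
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# toy usage: learn a smooth daily profile from hour-of-day features
hours = np.arange(48, dtype=float)
X = np.column_stack([np.sin(2 * np.pi * hours / 24),
                     np.cos(2 * np.pi * hours / 24)])
y = 10.0 + 5.0 * np.sin(2 * np.pi * hours / 24)   # synthetic periodic load
model = elm_fit(X, y)
pred = elm_predict(model, X)
```

Because only the readout is fitted, training reduces to one linear solve, which is what makes ELM attractive for generating many short-horizon forecasts quickly.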
We analyzed the wind power output predictions for the island's integrated energy system, comparing confidence intervals ranging from 70% to 98% (as shown in Figure 7). A 95% confidence interval was ultimately selected as the optimal confidence level because it balances forecast accuracy against the resource wastage of overly broad intervals while reducing the risks associated with prediction uncertainty [50]. At the 95% confidence level, interval forecasts for the various system metrics were produced, with the results displayed in Figure 8, which clearly marks the upper and lower bounds of the predicted intervals.
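One common way to obtain such upper and lower bounds is to shift the point forecast by empirical quantiles of historical forecast residuals. The sketch below illustrates that approach under our own assumptions; the paper's exact interval-construction method may differ.

```python
import numpy as np

def interval_forecast(point_pred, residuals, confidence=0.95):
    """Interval bounds from empirical residual quantiles: at the 95% level,
    shift the point forecast by the 2.5% and 97.5% residual quantiles."""
    alpha = 1.0 - confidence
    lo_q, hi_q = np.quantile(residuals, [alpha / 2.0, 1.0 - alpha / 2.0])
    p = np.asarray(point_pred, dtype=float)
    return p + lo_q, p + hi_q

# toy usage: symmetric residuals give symmetric bounds around the forecast
resid = np.linspace(-1.0, 1.0, 201)          # synthetic residual sample
lower, upper = interval_forecast([10.0, 12.0], resid, confidence=0.95)
```

Widening the confidence level from 70% toward 98% pushes the quantiles outward, which is exactly the accuracy-versus-interval-width trade-off the 95% choice balances.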
Based on the output data from renewable energy sources and the load forecasts, the interval-based multi-objective optimization model for the integrated energy system of islands was solved. The objective functions aimed to minimize daily economic costs and maximize the output from renewable energies. The resulting Pareto frontier is shown in Figure 9.
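The frontier in Figure 9 collects the non-dominated trade-offs between the two objectives. A minimal sketch of that selection, encoding the maximization objective as a negated minimization and using illustrative data, is:

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in every objective and
    strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# toy usage: objectives are (daily cost, -renewable output), both minimized;
# the numbers are illustrative, not experimental data
solutions = [(100.0, -50.0), (120.0, -70.0), (115.0, -45.0), (150.0, -90.0)]
front = pareto_front(solutions)
```

Here (115, −45) is dominated by (100, −50), which is both cheaper and produces more renewable output, so only the remaining three points lie on the frontier.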
We conducted a detailed analysis of the energy dispatch performance of the island's integrated energy system over a 24 h period. As shown in Figure 10, the system dynamically adjusts the output of the various energy devices in response to fluctuations in load demand throughout the day, ensuring continuity and diversity of energy supply. During daylight hours, the system fully utilizes the solar energy devices for power supply. At night, as the solar devices cease to produce energy, the system adjusts the operation of the other energy devices, maximizing the output of renewable energy and minimizing costs. This dispatch strategy not only optimizes energy utilization efficiency but also enhances the system's adaptability to changes in energy demand, further confirming its effectiveness and economic viability in practical applications.
We evaluated the performance of MOMAML-PPO against four other algorithms (IP-MOEA, ICMOABC, IMOMA-II, and CIMOEA) within the integrated energy optimization scheduling model for islands. All evolutionary algorithms were tested with the same population size and number of iterations. As shown in Table 3, MOMAML-PPO achieved a hypervolume (HV) [51] index of 0.6214, surpassing the other algorithms and indicating a superior capability to explore the solution space of ICMOPs more comprehensively. Moreover, the computation time of MOMAML-PPO was only 92.4 s, significantly lower than that of the other algorithms, demonstrating higher time efficiency. The uncertainty of each algorithm is quantified by the mean, over all individuals, of the sum of the widths of the objective-function intervals. Although MOMAML-PPO displayed slightly higher uncertainty than CIMOEA, its measure remained within 3.2141. This can be attributed to the fact that the DRL model was designed to address a class of problems rather than a specific experimental case; CIMOEA, in contrast, is tailored for direct computation on specific experimental data, yielding lower uncertainty values than MOMAML-PPO after approximately 80 generations of population evolution, as supported by statistical analysis. In a lateral comparison between the DRL method and the interval multi-objective evolutionary algorithms, MOMAML demonstrated outstanding generalization performance: after sufficient training, it can quickly devise scheduling solutions for different instance scenarios. MOMAML not only solves problems fast enough to meet the demands of real-time scheduling but also shows a considerable advantage in the HV index, indicating a higher-quality solution set.
We investigated the application of the enhanced PPO-CLIP model to the optimization scheduling of the integrated island energy system, assessing the training efficiency and performance of the model under various parameter initialization strategies. In the experimental design, the model's weight coefficients were uniformly set to 0.5, and all average reward values were normalized. The number of iterations was fixed at 10,000, with the same 20 data sets processed in each iteration to ensure consistency and accuracy of the evaluations.
We performed sensitivity analyses on two pivotal parameters affecting the performance of our model: the learning rate of the actor–critic network and the discount factor used in reward computation, as illustrated in Figures 11 and 12. The experiments assigned equal weights of 0.5 to the two objectives so that the model remained balanced in its pursuit of these distinct goals. A total of 5000 iterations were executed, sufficient to evaluate the model's long-term behavior and stability, and the rewards computed in each iteration were normalized.
At the selected learning rate, the model demonstrated both high and stable reward trajectories. When the learning rate was increased, the model tended to overshoot the optimal solution or oscillate around it, producing significant reward volatility and hindering convergence. Conversely, at a lower learning rate, the model exhibited greater stability in the final phases, albeit with slower convergence. The selected learning rate therefore proved optimal, effectively balancing accelerated learning with stability and a higher mean reward.
In Figure 12, we examined the impact of varying discount factors on model performance. A setting of 0.97 best balanced immediate and future rewards, enhancing the long-term average reward. Higher discount factors led the model to over-prioritize long-term rewards, potentially compromising its responsiveness to immediate changes. Conversely, lower values, specifically 0.9 and 0.8, caused an overemphasis on short-term rewards, neglecting long-term strategic development and diminishing overall performance. Selecting 0.97 thus balanced long-term strategic considerations against immediate responses, enhancing the model's adaptability and robustness across various task environments.
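The trade-off above can be seen directly in the discounted return G = Σₜ γᵗ rₜ, which a higher discount factor makes sensitive to rewards further in the future. A short sketch with toy constant rewards (our illustration, not experimental data):

```python
def discounted_return(rewards, gamma):
    """G = sum_t gamma^t * r_t, accumulated backwards for stability."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# with a constant reward of 1, the return approaches 1 / (1 - gamma),
# i.e., an effective lookahead of roughly 33 steps for gamma = 0.97
# versus roughly 5 steps for gamma = 0.8
steady = [1.0] * 200
return_097 = discounted_return(steady, 0.97)
return_080 = discounted_return(steady, 0.80)
```

The 1/(1 − γ) effective horizon makes the qualitative result plausible: 0.9 and 0.8 weight only the next few steps, while values much closer to 1 dilute the influence of immediate rewards.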
To validate the efficacy of the proposed meta-model parameter initialization strategy, we established one experimental group and two control groups. The control groups used the adjacent sub-model parameter transfer and random initialization strategies, each undergoing 10,000 training iterations, and the average reward values were normalized for comparison. As depicted in Figure 13, the meta-model parameter transfer strategy (red line) produced higher average reward values from the early stages of the experiment and reached a stable state after approximately 3500 iterations. This suggests that the meta-model parameter transfer strategy significantly enhances the model's initial performance and accelerates the learning process; it not only boosts rapid adaptability but also improves stability and efficiency over long-term iterations.
In comparison, the adjacent sub-model parameter transfer strategy (blue line) initially exhibited lower performance. However, as the iterations progressed, its performance gradually improved, showing comparable results to the meta-model parameter transfer in the mid to late stages of iteration. This indicates that the parameters of adjacent sub-models can gradually adapt and optimize the current model’s performance with sufficient iterations. Compared to the aforementioned strategies, random initialization (green line) showed overall poorer performance, particularly in the initial stages of iteration, highlighting the importance of appropriate parameter initialization in complex system optimization and the challenges that random initialization may pose at the start of model learning. The meta-model parameter transfer method not only improved the training efficiency but also optimized the model’s long-term stability and performance during iterations, confirming the effectiveness of using meta-learning for parameter initialization to enhance model training efficiency.
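The three initialization strategies compared above can be sketched as follows. The Reptile-style outer update shown for maintaining the meta-parameters is a simplification we assume for illustration, not the paper's exact meta-learner, and all function names are ours.

```python
import random

def reptile_meta_update(meta_params, adapted_params, meta_lr):
    """Move the meta-parameters toward parameters adapted on one
    sub-problem (Reptile-style outer step; an assumed simplification)."""
    return [m + meta_lr * (a - m) for m, a in zip(meta_params, adapted_params)]

def init_from_meta(meta_params):
    """Experimental group: start a new sub-model from the meta-parameters."""
    return list(meta_params)

def init_from_neighbor(neighbor_params):
    """Control group 1: copy parameters from an adjacent sub-model."""
    return list(neighbor_params)

def init_random(n, seed=None, scale=0.1):
    """Control group 2: small random initialization."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]
```

Because the meta-parameters average over many sub-problems, `init_from_meta` starts each new sub-model near a broadly useful region, which is consistent with the early-iteration advantage of the red curve in Figure 13.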
To assess the resilience of the model under emergent weather conditions, we consider two torrential-rainfall scenarios that differ in the timing of the rain. In both scenarios, the solar power devices are rendered inoperative by the weather, while all wind turbines operate at their maximum rated power of 8 kW. According to the experimental results shown in Figure 14, in extreme weather conditions devoid of solar input, wind energy emerges as the primary power source for the energy system, with a marked increase in output from the other devices as well. The power output exhibits significant variability, underscoring the model's capability to dynamically adjust the various devices to meet continuously changing electricity load demands.