1. Introduction
Cloud computing has received attention as an innovative model due to the fast growth of Internet technology [
1]. In the cloud computing system, distributed computing technology and various open service interfaces are used to obtain benefits by selling its redundant computing and storage capabilities to users [
2], i.e., the cloud computing concept is based on a pay-per-use paradigm that consists of a general populace set of the on-demand assignment of programmable computer resources, and users can easily access the network [
3]. Service providers can make profits by providing essential services during a short period of time through the advantages of their hardware resources, while users and certain companies that do not want to increase the cost of their internal data center construction can lease virtualized resources provided by service providers to save costs [
4].
The goal of cloud computing is to offer consumers cloud services using virtualized resources [
5]. Generally speaking, each service provided by cloud service providers can represent a task, and the process of service providers leasing services to different cloud users is the process of task allocation and execution [
6]. Due to the significantly increased demand for cloud services, cloud service providers are extending the range and quantity of services that they offer [
7]. In recent years, the emphasis has switched to how to correctly and appropriately plan these operations in order to enhance resource usage. For these scheduling problems, it is actually a kind of NP-hard problem [
8]. Researchers have put forth much effort on this subject. For example, Panda et al. [
9] devised the assessment criteria while keeping time and resource usage in mind and provided the scheduling method. In a diverse context, Hussain et al. [
10] suggested an energy-saving and effective job scheduling technique. In addition, due to the simple operation, scalability, and robustness, heuristic algorithms are effective at addressing issues such as task scheduling, load balancing, and virtual machine placement, and have gained a lot of attention from researchers.
The Phasmatodea Population Evolution (PPE) algorithm is a heuristic population evolution algorithm based on stick insect population development features [
11]. The algorithm treats the problem’s solutions as a population, and each population has two properties: the population size and pace of expansion [
12]. The movement of the solution of the problem in the solution space is regarded as the evolutionary trend of the population [
13]. In addition, the technique substitutes K-nearest optimal solutions for optimal global solutions. This broadens the range of population evolution patterns and the speed at which the best solutions are found is increased, whilst the population competition influences the evolutionary direction of the population [
14]. In the context of this research, we present an APPE algorithm that optimizes and balances the algorithm’s development and exploration skills to improve the algorithm’s optimization speed and general search capacity [
15]. The contributions of this paper are as follows:
It proposes an advanced Phasmatodea Population Evolution (APPE) algorithm. It combines the restart mechanism with a new evolutionary trend of stick insect populations to balance the algorithm’s exploration and development capabilities.
It builds an evaluation function using the makespan, cost, and load balancing degree as indicators.
Extensive simulation and comparison of the proposed approach with comparable algorithms utilizing CEC2014 benchmark suites for testing.
To assess the performance of the algorithm, it is compared to five similar algorithms in two heterogeneous environments.
The following is the architecture of this essay, the second part introduces the related work on cloud computing task scheduling, the third part introduces the proposed algorithm, the fourth section introduces the simulation strategy and the result analysis, and finally, the fifth section introduces the summary and analyses.
2. Related Work
Cloud computing has grown in popularity as a distributed system based on the Internet in recent years. It has attracted much attention because it can configure a shareable resource pool according to the request and can be immediately provided and unloaded [
16]. As the number of cloud users continues to grow, the amount of requests to be processed in the cloud is also increasing. If the job scheduling is unreasonable, its performance will be reduced [
17]. The scheduling algorithm is responsible for distributing the work requested by cloud users to cloud resources in order to minimize the make span, improve resource utilization, reduce use cost, and balance the cloud infrastructure and resource load [
18,
19,
20]. The scheduling question in the cloud environment can be separated into different fields, which are illustrated as follows.
Task scheduling algorithms can be divided into task scheduling based on the Quality of Service (Qos), task scheduling based on the Ant Colony Optimization (ACO) algorithm, Particle Swarm Optimization (PSO) algorithm-based task scheduling, Genetic Algorithm (GA) based on the task scheduling, and fuzzy-based task scheduling [
21].
Initially, individuals researched task scheduling focused on QoS. He et al. [
22] was a pioneer in the field of QoS-based scheduling algorithm research, taking QoS parameters as evaluation criteria to analyze and execute the next schedule. The suggested approach is a generic adjustable heuristic task scheduling algorithm based on QoS advice that significantly improves performance by incorporating QoS into the Min-Min heuristic algorithm. Following that, Wu et al. [
23] devised a cloud-based QoS-driven job scheduling algorithm which prioritizes the tasks according to the particularity of the tasks, allocates resources according to the sorting queue, and allocates resources according to the sorting queue. It was tested and found to achieve good load balancing and performance. Through the suggested optimization model and distributed queue based on dynamic load, the S et al. [
24] enhances resource usage efficiency, minimizes cost, and delays. HGEDH et al. [
25] categorized the tasks based on their attributes and assigned them to the available servers. However, the algorithm can only categorize tasks based on the supplied attributes, and tasks must be queued for categorization. Hanini et al. [
26] utilized the workload to determine the number of virtual machines to be used, combined with the control of incoming requests to virtual machines to control energy consumption.
ACO also plays an important role in addressing scheduling problems. Ants release pheromones in the process of searching for food, and share their travel experience through the trail of pheromones to connect with each other. ACO is effective for solving NP-complete problems, especially for dynamic task scheduling problems [
27]. Liu et al. [
28] applies the ACO algorithm to solve the problem of virtual machine placement in the cloud while also decreasing energy consumption by reducing the number of physical hosts. Xin et al. [
29] introduced the network and offered an ACO-based resource scheduling strategy for reducing the running time and energy consumption by improving the scheduling system. Delavar et al. [
30] studied the task scheduling problem of grid computing in terms of QoS. Researchers used a sub-heuristic ACO strategy in terms of make span and money. However, the task scheduling problem in a heterogeneous environment is not considered. Wu et al. researched the assessment model under multi-objective in order to monitor the change of energy consumption at all times, which considerably decreased the resource waste [
31]. Recently, Kumar et al. [
32] utilized the scheduler to arrange work and resources in a reasonable manner, coupled the ACO and GA algorithms, and devised a multi-objective scheduling system. Nevertheless, the load and security of virtual machines were not taken into account in the scheduling procedure in this study. To make it more efficient and balance the load in the cloud, Ragmani et al. [
33] considered the allocation of the execution time and resources, and optimized the efficiency of the algorithm by altering parameters. Sun et al. [
34] found the optimal solutions earlier by optimizing the updating method of the ant colony pheromone, and proposed a Period Ant Colony Optimization (PACO) algorithm.
Because of its simple principle and less parameters to be adjusted, using the PSO algorithm to address scheduling problems has garnered a lot of interest. In the scheduling process, Pandey et al. [
35] incorporated the transport and execution costs into the evaluation function and solved the scheduling issue using the PSO algorithm. The algorithm was then compared to the Best Resource Selection (BRS) algorithm, which saves a significant amount of money. Juan et al. [
36] improved the PSO algorithm which defined the cost vector to restrict the initial solutions and the search space of the solutions. Alsaidy et al. [
37] stood for an advanced optimization algorithm that utilized heuristics to initialize particle swarms in order to maximize cost-effectiveness and resource consumption. Wen et al. [
38] mixed the ACO and PSO algorithms to accelerate the exit from local optimization and enhanced the convergence speed. Kumar et al. [
39] offered a task selection strategy for improving the algorithm’s performance by picking the optimal VM.
Several methods for dealing with redundant task scheduling problems have been proposed based on studies of improved GA. Kumar et al. [
40] integrated the GA algorithm with the techniques typically used in work scheduling, proposed a new algorithm, and compared it with the standard genetic algorithm. Nevertheless, the research only considered the makespan under different resource quantities when evaluating. Nagar et al. [
41] used a previous workflow scheduling model that predicted the earliest completion time, and reduced the execution time by proposing a new Predict Earliest Finish Time (PEFT) genetic algorithm. However, the technique is best suited for workflows with a small number of tasks and does not consider the execution cost, the number of virtual machines, or the data center’s energy usage. Velliangiri et al. [
42] combined the hybrid electric search with the genetic algorithm while considering the execution time, maximum completion time, and load balancing. Similarly, the article did not consider the load of virtual machines. Manasrah et al. [
43] introduced the GA-PSO algorithm to handle the resource allocation problem in the cloud, substantially shortening the make span and cost. Farhadian et al. [
44] optimized the virtual machine allocation problem by using a hybrid algorithm based on IC and GA, thereby reducing the energy consumption of the data center, and the simulation experiments on CloudSim software obtained good results.
Fahmy [
45] explained the use of fuzzy algorithms to schedule aperiodic work in real-time systems for fuzzy-based scheduling. On a soft real-time single processor system, Fahmy developed a fuzzy method for aperiodic work scheduling. The total throughput time of the task is lowered by assessing the priority of the work that is now running and altering the priority of the job in the queue, and the technique is utilized in a multi-objective algorithm [
46]. Zhou et al. [
47] proposed a heterogeneous earliest completion time algorithm based on fuzzy dominance sorting in the Infrastructure as a Service (IaaS) workflow, which greatly improved the running speed. Revathi et al. [
48] combined the scheduling optimization with virtual allocation methods to design a VM based on security requirements, and proposed a scheduling heuristic optimization algorithm based on the Cost Prediction Matrix (CPM). Rezaeipanah et al. [
49] devised a mechanism for managing virtual machines in the cloud center.
Based on the meta heuristic algorithm and cloud computing task scheduling, combined with the above research results and problems, this research focused on using the APPE algorithm to solve the task scheduling problem in the cloud. The simulation results on MATLAB show that the APPE algorithm can get a better allocation scheme in a heterogeneous cloud environment.
3. Task Scheduling Based on the Advanced Phasmatodea Population Evolution Algorithms
In cloud computing, scheduling can be divided into task scheduling and job scheduling. Among them, Hadoop is the most often used scheduling method in task scheduling [
50,
51], which is mainly used to improve the system performance. Task scheduling is used to allocate resources to task applications by mapping tasks and resources, and now the critical issue in task scheduling is determining how to assign jobs to processors to achieve low cost, high efficiency, and minimize the makespan [
52]. A new method is proposed here to address the shortcomings of the current research on cloud computing task scheduling algorithms. First, a task scheduling system and evaluation model are designed to evaluate the pros and cons of task allocation based on the load level, cost, and makespan [
53]. The APPE-based task scheduling method is then implemented to the task scheduling model. Based on the evaluation framework, the optimal work scheduling method is obtained.
3.1. System Design
When a cloud user submits a job, the task is placed to the task queue via the task manager, and the task scheduling method assigns the task to the virtual machine. Each task is independent and non-preemptive.
Figure 1 depicts the cloud computing system’s task scheduling paradigm.
Tasks are serially processed on the virtual machine in this article via the task queue. The task has two attributes: m and , m represents the amount of tasks. represents the length of the task, and the unit is millions of machine language instructions (MI). A virtual machine has four properties: n, , , . n represents the amount of VMs, and represents the average processing rate of single-word fixed-point instructions. stands for memory, stands for bandwidth.
3.2. Evaluation Model
The indicators for evaluating cloud computing task scheduling performance mainly include makespan, Profit, Completion time, Cost, and Waiting time. This paper comprehensively considers the task scheduling performance from the perspectives of makespan, cost and degree of imbalance [
54]. These performance indicators are described in detail below.
.
represents the task queue submitted by cloud users, and
m stands for the number of tasks.
.
stands for the length of the
i-th task.
.
represents the
j-th VM.
n stands for the amount of VM.
,
represents the fact that the task
i is executed on the VM
j, otherwise
.
stands for the expected completion time, that is, the processing time of the task
i on the VM
j, which is computed by the following formula:
represents the executing speed of the VM j.
Makespan:
Makespan is a critical metric for assessing the effectiveness of task scheduling in the cloud. The makespan is the completion time of the task, which reflects the total operating duration of all VMs, and is computed using the following formula:
Cost:
The cost calculated according to the specification of VM is as follows;
,
,
,
,
, and
per hour [
55]. Calculate the cost of the virtual machine through the following formula.
represents the hourly cost of the jth virtual machine, and in a heterogeneous environment, its resource cost is related to , , and , stands for virtual machine memory, represents the bandwidth of VM.
represents the degree of imbalance of the system.
n represents the number of VM,
represents the degree of impact of each MIPS, RAM and bandwidth on the virtual machine, as shown by the following formula:
Here,
stands for the running time of the VM
i,
represents the average running time of the VM,
is related to MIPS, RAM and bandwidth,
,
,
are three weight values, respectively. Through the above performance indicators, the objective function formula is obtained as follows:
3.3. Scheduling Model Based on the APPE
3.3.1. Advanced PPE Algorithm
In the development process of Phasmatodea populations, the PPE algorithm guaranteed the development of continuous populations by mimicking the qualities of route dependency, convergence evolution, population expansion, and competitiveness. The evolution process can be thought of as an optimization process of populations in d-dimensional space; hence, we can think of the solution as a population of stick insects in d-dimensional space.
As a new heuristic algorithm, PPE has good optimization and exploration ability. The algorithm considers the top K solutions with the highest fitness value to be the closest to optimum solutions. It improves its global exploration ability through the K-nearest optimal solutions and perturbation. Through path dependence, convergent evolution, population growth, and competition, the solutions other than the nearest optimal solutions can reach the region with a better fitness value faster and improve their optimization ability.
In our work, the advanced PPE algorithm is divided into two aspects. On the one hand, the speed searching for the ideal solution is increased by refining the population evolution trend calculation formula. As we are all aware, the calculation of the population’s evolution trend
is the basis of the original PPE method, which affects the exploration ability of the population. The updating technique is divided into three sections. The first component is the closest ideal population, while the second component represents the continuation of the prior evolution trend, and the third part is mutation [
56]. The calculation formula of the evolution trend is as follows:
In the formula, stands for the evolutionary trend of the population, p represents the population quantity, A denotes the level of similarity to the nearest optimum, m stands for mutation, and k is the amount of iterations.
Through the formula, it was found that the first part of the convergent evolution has no effect on the update of the evolution trend of the nearest optimal population, so we propose a new evolution trend update formula, as shown below.
In the improved formula, is used to judge whether it is the nearest optimal solution, and B denotes the degree to which it resembles the global optimum.
Another aspect of improving the PPE algorithm is to add a restart mechanism to the algorithm, restarting the solutions with poor fitness values and the worst solutions among the nearest optimal solutions and avoiding the area centered on the initial position of the restart solutions during the restart process [
57]. Algorithm 1 depicts the pseudo-code for the advanced PPE algorithm.
Algorithm 1: Advanced PPE Algorithm |
Initialize N populations, T; Initialize , P, K; Calculate , set and ; whilet < Iter do Update T to ; Calculate , update and ; Restart ordinary solutions; Restart nearest optimal solutions; while < N do if then if < then T=; Update , ; end if Update use Equation ( 11); else T=; Update , ; Update use Equation ( 11); end if Solution for random choice , (j≠i); if <G then Update , ; end if end while end while
|
In Algorithm 1, N is the number of Phasmatodea populations, T stands for the set of Phasmatodea populations in the APPE algorithm, and each population represents a solutions. Furthermore, denotes the global optimum solutions, whereas denotes the nearest optimal solutions.
After the calculation of the fitness values of all solutions in each iteration process is completed, a restart mechanism is introduced to restart the ordinary solutions and the nearest optimal solutions, and the population evolution trend after restart is updated. The restart process of the two types of solutions is the same as in Algorithm 2, which is as follows.
Algorithm 2: Restart Strategy |
Initialize G, , t, K, ;//the value of iter is related to restart count ifG> || rem()==0 then Restart k solutions with poor fitness; Avoid the position where the solutions start; Modify population evolution trends; Calculate fitness after restarting the solutions update; end if
|
3.3.2. Task Scheduling Algorithm Based on the Advanced PPE
Since task scheduling seeks to assign n independent non-preemptive tasks to cloud resources (VMs). In this work, we map each solution to a population; each stick insect population symbolizes a task scheduling technique; the dimension reflects the amount of tasks; and the value of the dimension
i of the solutions is
j, which means that task
i is allocated to the VM
j. Taking the proposed evaluation function as the standard, we obtain the optimal resource allocation scheme based on the APPE algorithm [
58,
59]. The execution procedure is depicted in Algorithm 3 below.
Algorithm 3: APPE based task scheduling |
Step1: Initialize the parameter, generate initial population ; Step2: According to the proposed evaluation function, calculate fitness ; Step3: The initial global optimal and nearest optimal solutions are obtained; Step4: The evolutionary trend of the population is calculated through path dependence, convergent evolution, population growth and competition; Step5: Update and calculate new fitness ; Step6: Update , global optimal and nearest optimal solutions; Step7: Repeat steps 4, 5, and 6 until the iteration is complete; Step8: The global optimal solution is the optimal scheduling scheme;
|
In Algorithm 3, represents the position of solutions population and each solution represents the location of a stick insect population, that is, a task-scheduling scheme. represents the position of the updated solutions population.