Article

Optimal Asynchronous Dynamic Policies in Energy-Efficient Data Centers

1 Business School, Xuzhou University of Technology, Xuzhou 221018, China
2 School of Economics and Management, Beijing University of Technology, Beijing 100124, China
3 School of Business, Sun Yat-sen University, Guangzhou 510275, China
* Author to whom correspondence should be addressed.
Systems 2022, 10(2), 27; https://doi.org/10.3390/systems10020027
Submission received: 24 January 2022 / Revised: 26 February 2022 / Accepted: 28 February 2022 / Published: 2 March 2022
(This article belongs to the Special Issue Data Driven Decision-Making for Complex Production Systems)

Abstract

In this paper, we apply a Markov decision process to find the optimal asynchronous dynamic policy of an energy-efficient data center with two server groups. Servers in Group 1 always work, while servers in Group 2 may either work or sleep, and a fast setup process occurs whenever a server's state changes from sleep to work. The servers in Group 1 are faster and cheaper than those in Group 2, so Group 1 has a higher service priority. Putting a server in Group 2 to sleep reduces system costs and energy consumption, but incurs setup and transfer costs. For such a data center, an asynchronous dynamic policy is designed as two sub-policies: the setup policy and the sleep policy, both of which determine the switching rule between the work and sleep states for each server in Group 2. To find the optimal asynchronous dynamic policy, we apply the sensitivity-based optimization to establish a block-structured policy-based Markov process and use a block-structured policy-based Poisson equation to compute the unique solution of the performance potential by means of the RG-factorization. Based on this, we characterize the monotonicity and optimality of the long-run average profit of the data center with respect to the asynchronous dynamic policy under different service prices. Furthermore, we prove that a bang–bang control is always optimal for this optimization problem. We hope that the methodology and results developed in this paper can shed light on the study of more general energy-efficient data centers.

1. Introduction

Over the last two decades, considerable attention has been given to the study of energy-efficient data centers. On the one hand, as the number and size of data centers increase rapidly, energy consumption has become a major component of their operating costs. On the other hand, data centers have become a fundamental part of the IT infrastructure of today's Internet services, with a huge number of servers deployed in each data center to provide cloud computing environments. Therefore, finding optimal energy-efficient policies and designing optimal energy-efficient mechanisms are interesting, difficult, and challenging problems in the energy-efficient management of data centers. Readers may refer to recent excellent survey papers, such as Masanet et al. [1], Zhang et al. [2], Nadjahi et al. [3], Koot and Wijnhoven [4], Shirmarz and Ghaffari [5], Li et al. [6], and Harchol-Balter [7].
Barroso and Hölzle [8] demonstrated that many data centers were designed to handle peak loads effectively, which directly causes a significant fraction of servers (approximately 20%) to sit idle during off-peak periods. Although idle servers provide no service, they still consume a notable amount of energy, approximately 70% of what a working server consumes during on-peak periods. Therefore, it is necessary and useful to design an energy-efficient mechanism that deals specifically with idle servers. To this end, a key technique was introduced: the energy-efficient states "sleep" and "off", which allow an idle server to enter a state that consumes less energy; the sleep state consumes very little energy, while the off state consumes none. See Kuehn and Mashaly [9] for more interpretation. To analyze the energy-efficient states, a number of queueing systems, either with server energy-efficient states (e.g., work, idle, sleep, and off) or with server control policies (e.g., vacation, setup, and N-policy), have been developed in the study of energy-efficient data centers. Important examples include the survey papers by Gandhi [10] and Li et al. [6], and the research papers by Gandhi et al. [11,12,13], Maccio and Down [14,15], Phung-Duc [16] and Phung-Duc and Kawanishi [17].
It is always necessary to design an optimal energy-efficient mechanism for data centers. To do this, several static optimization methods have been developed using two basic approaches. The first is to construct a suitable utility function for a performance–energy trade-off, or a performance cost, with respect to the synchronous optimization of different performance measures, for example, reducing energy consumption, reducing system response time, and improving quality of service. The second is to minimize the performance cost by means of optimization methods such as linear programming, nonlinear programming, and integer programming. Gandhi et al. [12] provided two classes of performance–energy trade-offs: (a) ERWS, the weighted sum $\beta_1 E[R] + \beta_2 E[E]$ of the mean response time $E[R]$ and the mean power cost $E[E]$, where $\beta_1, \beta_2 \geq 0$ are weighting coefficients; (b) ERP, the product $E[R] \cdot E[E]$ of the mean response time and the mean power cost. See Gandhi et al. [12] and Gandhi [10] for systematic research. Maccio and Down [14] generalized the ERP to a more general performance cost function that considers the expected cycle rate. Gebrehiwot et al. [18] made some interesting generalizations of the ERP and ERWS by introducing multiple intermediate sleep states. Additionally, Gebrehiwot et al. [19,20] generalized the FCFS queue of the data center with multiple intermediate sleep states to the processor-sharing discipline and to the shortest remaining processing time (SRPT) discipline, respectively. In addition, Mitrani [21,22] provided another interesting method to discuss a data center of N identical servers that contain m reserves, where the servers are switched between idle and work by an up threshold U and a down threshold D. He established a performance cost $C = c_1 E[L] + c_2 E[S]$ and provided expressions for the average numbers $E[L]$ and $E[S]$, so that the performance cost $C$ can be optimized with respect to the three key parameters m, U, and D.
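To make the two trade-offs concrete, the following minimal sketch evaluates ERWS and ERP for a single M/M/1 server, for which $E[R] = 1/(\mu - \lambda)$; the busy/idle power split and all parameter values are our own illustrative assumptions, not taken from [12].

```python
# Illustrative only: the ERWS and ERP trade-offs of Gandhi et al. [12],
# evaluated for a single M/M/1 server. The power model (busy/idle split)
# and all numbers are assumptions for demonstration.

def erws_erp(lam, mu, p_busy, p_idle, beta1=1.0, beta2=1.0):
    if lam >= mu:
        raise ValueError("queue is unstable: need lam < mu")
    rho = lam / mu                       # server utilization
    mean_response = 1.0 / (mu - lam)     # E[R] for an M/M/1 queue
    mean_power = rho * p_busy + (1 - rho) * p_idle     # E[E]
    erws = beta1 * mean_response + beta2 * mean_power  # weighted sum
    erp = mean_response * mean_power                   # product
    return erws, erp

# Scanning the service rate shows the delay/power trade-off.
for mu in (1.5, 2.0, 3.0):
    print(mu, erws_erp(lam=1.0, mu=mu, p_busy=200.0, p_idle=140.0))
```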
To date, little work has been done on applications of Markov decision processes (MDPs) to find the optimal dynamic control policies of energy-efficient data centers. Readers may refer to recent publications for details, among which Kamitsos et al. [23] constructed a discrete-time MDP and proved that the optimal sleep energy-efficient policy is simply hysteretic, so that it has a double threshold structure. Note that such an optimal hysteretic policy follows Hipp and Holzbaur [24] and Lu and Serfozo [25]. For policy optimization and dynamic power management for electronic systems or equipment, the MDPs and stochastic network optimization were developed from five different perspectives: (a) the discrete-time MDPs by Yang et al. [26]; (b) the continuous-time MDPs by Ding et al. [27]; (c) stochastic network optimization by Liang et al. [28]; (d) the sensitivity-based optimization by Xia and Chen [29], Xia et al. [30], and Ma et al. [31]; and (e) the deep reinforcement learning by Chi et al. [32].
The purpose of this paper is to apply continuous-time MDPs to set up optimal dynamic energy-efficient policies for data centers, in which the sensitivity-based optimization is developed to find the optimal solution. Note that the sensitivity-based optimization refines the MDP framework by handling the Poisson equation through novel tools such as the performance potential and the performance difference. The sensitivity-based optimization is also closely related to Markov reward processes (see Li [33]); thus, it is an effective method for the performance optimization of many practical systems (see the excellent book by Cao [34] for more details). The sensitivity-based optimization theory has been applied to performance optimization in many practical areas: for example, energy-efficient data centers by Xia et al. [35], inventory rationing by Li et al. [36], blockchain selfish mining by Ma and Li [37], and finance by Xia [38].
The main contributions of this paper are threefold. The first contribution is to apply the sensitivity-based optimization (and the MDPs) to study a more general energy-efficient data center with key practical factors, for example, a finite buffer, a fast setup process, and the transfer of incomplete-service jobs to idle servers in Group 1 or to the finite buffer, if any. Although these practical factors do not complicate performance evaluation (e.g., modeling by means of queueing systems or Markov processes; see Gandhi [10] for more details), they cause substantial difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by means of the sensitivity-based optimization. For instance, the finite buffer turns the policy-based Markov process from the one-dimensional birth–death process of Ma et al. [31] and Xia et al. [35] into a two-dimensional block-structured Markov process.
Note that this paper has two closely related works, Ma et al. [31] and Xia et al. [35], so it is worth clarifying how this paper relates to each of them. Compared with Ma et al. [31], this paper considers more practical factors in the energy-efficient data center, so that the policy-based Markov process is block-structured, which makes solving the block-structured Poisson equation more complicated. Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes the analysis of the monotonicity and optimality of dynamic energy-efficient policies more difficult and challenging. Therefore, this paper is a necessary and valuable generalization of Ma et al. [31] and Xia et al. [35], obtained by establishing block-structured policy-based Markov processes, which are in fact a core part of the sensitivity-based optimization theory and its applications in various practical systems.
The second contribution of this paper is that it is the first to find an optimal asynchronous dynamic policy in the study of energy-efficient data centers. Note that the two groups of servers in the data center admit "setup actions from the sleep state to the work state" and "close actions from the work state to the sleep state"; thus, we follow these two action types to form an asynchronous dynamic policy, which is decomposed into two sub-policies: the setup policy (the setup action) and the sleep policy (the close action). Crucially, one of the successes of this paper is to find the optimal asynchronous dynamic policy among the many asynchronous dynamic policies by means of the sensitivity-based optimization, a task that remains difficult and challenging within the MDP framework.
The third contribution of this paper is to provide a unified framework for applying the sensitivity-based optimization to study the optimal asynchronous dynamic policy of the energy-efficient data center. For this more complicated energy-efficient data center, we first establish a policy-based block-structured Markov process together with a detailed cost and reward structure, and provide an expression for the unique solution to the block-structured Poisson equation by means of the RG-factorization. Then, we show the monotonicity of the long-run average profit with respect to the setup policy, the sleep policy, and the asynchronous policy, respectively. Based on this, we find the optimal asynchronous policy when the service price is higher (or lower) than a key threshold. Finally, we show that the optimal control is a bang–bang control. This structure of the optimal asynchronous energy-efficient policy shrinks the search space considerably, significantly reducing the optimization complexity and alleviating the curse of dimensionality in MDPs. Therefore, the optimal asynchronous dynamic policy in the energy-efficient data center is of threshold type, and the optimality of threshold-type policies is of great significance for solving the mechanism design problem of energy-efficient data centers. The methodology and results developed in this paper thus provide new insights for understanding dynamic energy-efficient policy optimization and mechanism design in the study of more general data centers.
The organization of this paper is as follows. In Section 2, we describe an energy-efficient data center with several practical factors. In Section 3, we establish a policy-based continuous-time block-structured Markov process and define a suitable reward function with respect to both the states and the policies of the Markov process. In Section 4, we set up a block-structured Poisson equation and provide an expression for its unique solution by means of the RG-factorization. In Section 5, we study a perturbation realization factor of the policy-based continuous-time block-structured Markov process for the asynchronous dynamic policy, and analyze how the service price impacts the perturbation realization factor. In Section 6, we discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the asynchronous policy; based on this, we give the optimal asynchronous dynamic policy of the energy-efficient data center. In Section 7, given that the optimal asynchronous dynamic policy is of threshold type, we compute the maximal long-run average profit of the energy-efficient data center. In Section 8, we give some concluding remarks. Finally, three appendices present the state-transition diagram of the policy-based block-structured continuous-time Markov process and the block entries of its infinitesimal generator.

2. Model Description

In this section, we provide a problem description for setting up and optimizing an asynchronous dynamic policy in an energy-efficient data center with two groups of different servers, a finite buffer, and a fast setup process. Additionally, we provide the system structure, operational mode, and mathematical notations in the energy-efficient data center.
Server groups: The data center contains two server groups: Groups 1 and 2, each of which is an interactive subsystem of the data center. Groups 1 and 2 have $m_1$ and $m_2$ servers, respectively. Servers in the same group are homogeneous, while those in different groups are heterogeneous. Note that Group 1 is viewed as a baseline group whose servers are always in the work state even if some of them are idle, the purpose of which is to guarantee a necessary service capacity in the data center. Hence, each server in Group 1 always works regardless of whether it has a job, and so consumes energy at all times. In contrast, Group 2 is regarded as a reserve group whose servers may either work or sleep, so that each of the $m_2$ servers can switch its state between work and sleep. If a server in Group 2 is in the sleep state, then it consumes a smaller amount of energy than in the work state, as maintaining the sleep state requires very little energy.
A finite buffer: The data center has a finite buffer of size $m_3$. Jobs must first enter the buffer, and then they are assigned to the groups (Group 1 has priority over Group 2) and subsequently to the servers. To guarantee that the reserve service capacity of Group 2 can be fully utilized when jobs are taken from the buffer into Group 2, we assume that $m_3 \geq m_2$; i.e., the capacity of the buffer must be no less than the number of servers in Group 2. Otherwise, jobs transferred from Group 2 back to the buffer could be lost whenever too many jobs are already waiting there.
Arrival processes: The arrivals of jobs at the data center form a Poisson process with arrival rate $\lambda$. If the buffer is full, then any arriving job is lost immediately, which incurs an opportunity cost $C_5$ for each job lost due to the full buffer.
Service processes: The service times provided by each server in Groups 1 and 2 are i.i.d. and exponential with service rates $\mu_1$ and $\mu_2$, respectively. We assume that $\mu_1 \geq \mu_2$, which supports the priority use of the servers in Group 1. The service discipline of each server in the data center is First Come First Serve (FCFS). If a job finishes its service at a server, then it immediately leaves the system; at the same time, the data center obtains a fixed service reward (or service price) $R$ from the served job.
Once a job enters the data center for service, it incurs holding costs per unit of time of $C_2^{(1)}$, $C_2^{(2)}$, and $C_2^{(3)}$ in Group 1, Group 2, and the buffer, respectively. The assumption $C_2^{(1)} \leq C_2^{(2)}$ also guarantees the priority use of the servers in Group 1. Therefore, to support the service priority, each server in Group 1 is not only faster but also cheaper than each server in Group 2.
Switching between work and sleep: To save energy, the servers in Group 2 can switch between the work and sleep states. On the one hand, if more jobs are waiting in the buffer, then Group 2 sets up and turns on some sleeping servers; this involves a setup cost $C_3^{(1)}$. The setup time is very short, since setup starts directly from the sleep state, and is therefore ignored. On the other hand, if the number of jobs in Group 2 becomes small, then some working servers are switched to the sleep state, while their incomplete-service jobs are transferred to the buffer and treated as newly arriving ones.
Transfer rules: (1) To Group 1. Based on the priority use of the servers in Group 1, if a server in Group 1 becomes idle and there is no job in the buffer, then an incomplete-service job (if it exists) in Group 2 is immediately transferred to the idle server in Group 1. The data center pays a transfer cost $C_4$ for the transferred job.
(2) To the buffer. If some servers in Group 2 are switched to the sleep state, then the jobs in those servers are transferred to the buffer, and a transfer cost $C_3^{(2)}$ is paid by the data center.
To ensure that such transferred jobs can always enter the buffer, we must control the newly arriving jobs: if the number of jobs in the buffer plus the number of jobs in Group 2 equals $m_3$, then newly arriving jobs are lost immediately.
Power consumption: The power consumption rates $P_{1,W}$ and $P_{2,W}$ apply to the work states of servers in Groups 1 and 2, respectively, while $P_{2,S}$ applies to the sleep state of a server in Group 2. Since a server in Group 1 has no sleep state, clearly $P_{1,S} = 0$. We assume that $0 < P_{2,S} < P_{2,W}$. There is no power consumption for keeping jobs in the buffer. $C_1$ is the power consumption price per unit of power consumption rate per unit of time.
Independence: We assume that all the random variables in the data center defined above are independent.
Finally, to aid reader understanding, the data center, together with its operational mode and mathematical notations, is depicted in Figure 1. Table 1 summarizes some notations involved in the model. This will be helpful in our later study.
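To fix notation for the computations that follow, here is a minimal sketch of a container for the model primitives; the field names are our own shorthand for the notation of Table 1, and the assertions encode the assumptions made above.

```python
# A minimal container for the model primitives of Section 2. Field names are
# our own shorthand for Table 1; they are not the paper's notation.
from dataclasses import dataclass

@dataclass
class DataCenterModel:
    m1: int        # number of servers in Group 1 (always working)
    m2: int        # number of servers in Group 2 (work or sleep)
    m3: int        # buffer size, m3 >= m2
    lam: float     # Poisson arrival rate (lambda)
    mu1: float     # service rate of each Group 1 server
    mu2: float     # service rate of each Group 2 server, mu1 >= mu2
    P1W: float     # power rate of a working Group 1 server
    P2W: float     # power rate of a working Group 2 server
    P2S: float     # power rate of a sleeping Group 2 server, 0 < P2S < P2W
    C1: float      # power price per unit of power rate per unit time
    C2: tuple      # holding costs (Group 1, Group 2, buffer) per job per time
    C31: float     # setup cost per activated Group 2 server
    C32: float     # transfer cost per job moved to the buffer
    C4: float      # transfer cost per job moved to an idle Group 1 server
    C5: float      # opportunity cost per job lost at a full buffer
    R: float       # service price earned per served job

    def __post_init__(self):
        assert self.m3 >= self.m2, "buffer must hold all Group 2 transfers"
        assert self.mu1 >= self.mu2 and 0 < self.P2S < self.P2W
```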
In the remainder of this section, it might be useful to provide some comparison of the above model assumptions with those in Ma et al. [31] and in Xia et al. [35].
Remark 1.
(1) Compared with our previous paper [31], this paper considers several new practical factors, such as a finite buffer, a fast setup process, and a job transfer rule. These new factors make our MDP modeling more practical and useful in the study of energy-efficient data centers. Although they do not add difficulty to performance evaluation through modeling by queueing systems or Markov processes, they cause substantially more difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by means of the sensitivity-based optimization. Note that the difficulties mainly stem from establishing the policy-based block-structured Markov process and solving the block-structured Poisson equation. Accordingly, we have simplified parts of the model description: for example, the setup is immediate, jobs can be transferred without delay either between the slow and fast servers or between the slow servers and the buffer, jobs must be transferred as soon as a fast server becomes free, the finite buffer space is reserved for jobs in progress, and so on.
(2) For the energy-efficient data center operating with a buffer, it is seen from Figure A3 in Appendix B that the main challenge of our work is to describe the policy-based block-structured Markov process. Obviously, (a) if there are more than two groups of servers, then the policy-based Markov process becomes multi-dimensional, so that its analysis is very difficult; (b) if the buffer is infinite, then we have to deal with a policy-based block-structured Markov process with infinitely many levels, for which the discussion and computation are very complicated.
Remark 2.
Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes analysis for the monotonicity and optimality of the dynamic energy-efficient policies more difficult and challenging. Therefore, many cost and reward factors make the MDP analysis and the sensitivity-based optimization more complicated.

3. Optimization Model Formulation

In this section, for the energy-efficient data center, we first establish a policy-based continuous-time Markov process with a finite block structure. Then, we define a suitable reward function with respect to both the states and the policies of the Markov process. This will be helpful for setting up an MDP to find the optimal asynchronous dynamic policy in the energy-efficient data center.

3.1. A Policy-Based Block-Structured Continuous-Time Markov Process

The data center in Figure 1 has Group 1 with $m_1$ servers, Group 2 with $m_2$ servers, and a buffer of size $m_3$. We need to introduce both "states" and "policies" to express the stochastic dynamics of this data center. Let $N_1(t)$, $N_2(t)$, and $N_3(t)$ be the numbers of jobs in Group 1, Group 2, and the buffer at time $t$, respectively. Therefore, $(N_1(t), N_2(t), N_3(t))$ is regarded as a state of the data center at time $t$. All the possible values of such a state form the set
$$\Omega = \Omega_0 \cup \Omega_1 \cup \Omega_2 \cup \cdots \cup \Omega_{m_2} \cup \Omega_{m_2+1} \cup \Omega_{m_2+2},$$
where
$$\begin{aligned}
\Omega_0 &= \left\{(0,0,0), (1,0,0), \ldots, (m_1,0,0)\right\},\\
\Omega_1 &= \left\{(m_1,0,1), (m_1,0,2), \ldots, (m_1,0,m_3)\right\},\\
\Omega_2 &= \left\{(m_1,1,0), (m_1,1,1), \ldots, (m_1,1,m_3-1)\right\},\\
&\;\;\vdots\\
\Omega_{m_2} &= \left\{(m_1,m_2-1,0), (m_1,m_2-1,1), \ldots, (m_1,m_2-1,m_3-m_2+1)\right\},\\
\Omega_{m_2+1} &= \left\{(m_1,m_2,0), (m_1,m_2,1), \ldots, (m_1,m_2,m_3-m_2)\right\},\\
\Omega_{m_2+2} &= \left\{(m_1,m_2,m_3-m_2+1), (m_1,m_2,m_3-m_2+2), \ldots, (m_1,m_2,m_3)\right\}.
\end{aligned}$$
For a state $(n_1, n_2, n_3)$, it is seen from the model description that there are four different cases: (a) by the transfer rules, if $n_1 \in \{0, 1, \ldots, m_1-1\}$, then $n_2 = n_3 = 0$; if either $n_2 \neq 0$ or $n_3 \neq 0$, then $n_1 = m_1$; (b) if $n_1 = m_1$ and $n_2 = 0$, then the jobs in the buffer can accumulate until the waiting room is full, i.e., $n_3 \in \{1, 2, \ldots, m_3\}$; (c) if $n_1 = m_1$ and $n_2 \in \{1, 2, \ldots, m_2-1\}$, then the total number of jobs in Group 2 and the buffer is no more than the buffer size, i.e., $n_2 + n_3 \leq m_3$; (d) if $n_1 = m_1$ and $n_2 = m_2$, then the jobs in the buffer can again accumulate until the waiting room is full, i.e., $n_3 \in \{1, 2, \ldots, m_3\}$.
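The four cases above translate directly into an enumeration of $\Omega$; a short sketch (helper name and parameter values are ours):

```python
# Enumerate the state space Omega of Section 3.1 from cases (a)-(d).

def build_states(m1, m2, m3):
    states = [(n1, 0, 0) for n1 in range(m1 + 1)]             # Omega_0
    states += [(m1, 0, n3) for n3 in range(1, m3 + 1)]        # Omega_1
    for n2 in range(1, m2 + 1):                               # Omega_2..Omega_{m2+1}
        states += [(m1, n2, n3) for n3 in range(0, m3 - n2 + 1)]
    states += [(m1, m2, n3) for n3 in range(m3 - m2 + 1, m3 + 1)]  # Omega_{m2+2}
    return states

states = build_states(m1=2, m2=3, m3=4)
# Check against the state count m1 + 3*m2/2 - m2^2/2 + m2*m3 + m3 + 1 of Section 4.
assert len(states) == 2 + (3 * 3 - 3 ** 2) // 2 + 3 * 4 + 4 + 1
```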
Now, for Group 2, we introduce an asynchronous dynamic policy, which is related to two dynamic actions (or sub-policies): from sleep to work (setup) and from work to sleep (close). Let $d^W_{(n_1,n_2,n_3)}$ and $d^S_{(n_1,n_2,n_3)}$ be the numbers of working servers and of sleeping servers in Group 2 at state $(n_1, n_2, n_3)$, respectively. By observing the state set $\Omega$, we call $d^W$ the setup policy (from sleep to work) and $d^S$ the sleep policy (from work to sleep).
Note that the servers in Group 2 can be set up only when all of them are asleep; once some servers in Group 2 are still working on jobs, they are governed by the sleep policy ($d^S$), so the setup policy ($d^W$) cannot be applied at the same time. This is why we call such policies asynchronous dynamic policies. Here, we consider the control optimization of the total system. For the two sub-policies, we provide an interpretation of four different cases as follows:
(1) In $\Omega_0$, if $n_1 \in \{0, 1, \ldots, m_1-1\}$, then $n_2 = n_3 = 0$ due to the transfer rule. Thus, there are no jobs in Group 2 or in the buffer, so no policy in Group 2 is used;
(2) In $\Omega_1$, the states determine how the setup policy is used. If $n_1 = m_1$, $n_2 = 0$, $n_3 \in \{1, 2, \ldots, m_3\}$, then $d^W_{(m_1,0,n_3)}$ is the number of working servers in Group 2 at state $(m_1, 0, n_3)$. Some of the slow servers must first be set up so that jobs in the buffer can enter the activated slow servers; thus $d^W_{(m_1,0,n_3)} \in \{0, 1, \ldots, m_2\}$, each value of which may occur under an optimal dynamic policy;
(3) From $\Omega_2$ to $\Omega_{m_2+1}$, the states determine how the sleep policy is used. If $n_1 = m_1$, $n_2 \in \{1, 2, \ldots, m_2\}$, $n_3 \in \{0, 1, \ldots, m_3 - n_2\}$, then $d^S_{(m_1,n_2,n_3)}$ is the number of sleeping servers in Group 2 at state $(m_1, n_2, n_3)$. We assume that the number of sleeping servers is no less than $m_2 - n_2$. Note that the sleep policy is independent of the setup policy. Once the sleep policy is applied, the servers without jobs must enter the sleep state; at the same time, some working servers with jobs may also be closed to the sleep state, and the jobs in those servers are transferred to the buffer. It is easy to see that $d^S_{(m_1,n_2,n_3)} \in \{m_2-n_2, m_2-n_2+1, \ldots, m_2\}$;
(4) In $\Omega_{m_2+2}$, if $n_1 = m_1$ and $n_2 = m_2$, then $n_3$ may be any element of the set $\{m_3-m_2+1, m_3-m_2+2, \ldots, m_3\}$; it is clear that $n_2 + n_3 > m_3$.
Our aim is to determine when, or under what conditions, an optimal number of servers in Group 2 should switch between the sleep state and the work state such that the long-run average profit of the data center is maximal. From the state space $\Omega$, we define an asynchronous dynamic energy-efficient policy $d$ as
$$d = d^W \boxtimes d^S,$$
where $d^W$ and $d^S$ are the setup and sleep policies, respectively; '$\boxtimes$' denotes that the policies $d^W$ and $d^S$ occur asynchronously; and
$$\begin{aligned}
d^W ={}& \left(d^W_{(m_1,0,1)}, d^W_{(m_1,0,2)}, \ldots, d^W_{(m_1,0,m_3)}\right),\\
d^S ={}& \left(d^S_{(m_1,1,0)}, d^S_{(m_1,1,1)}, \ldots, d^S_{(m_1,1,m_3-1)};\; d^S_{(m_1,2,0)}, d^S_{(m_1,2,1)}, \ldots, d^S_{(m_1,2,m_3-2)};\; \ldots;\right.\\
&\;\left. d^S_{(m_1,m_2,0)}, d^S_{(m_1,m_2,1)}, \ldots, d^S_{(m_1,m_2,m_3-m_2)}\right).
\end{aligned}$$
Note that $d^W$ reflects the fact that if there is no job in Group 2 at the initial time, then all the servers in Group 2 are in the sleep state; once there are jobs in the buffer, we quickly set up some servers in Group 2 so that they enter the work state to serve the jobs. The sleep policy $d^S$ can be understood similarly. In the state subset $\bigcup_{i=2}^{m_2+2} \Omega_i$, the setup policy $d^W$ is not needed because some servers are already kept in the work state.
For all the possible policies $d$ given in (1), we compose a policy space as follows:
$$\mathcal{D} := \left\{ d = d^W \boxtimes d^S : d^W_{(m_1,0,n_3)} \in \{0, 1, \ldots, m_2\} \text{ for } 1 \leq n_3 \leq m_3;\;\; d^S_{(m_1,n_2,n_3)} \in \{m_2-n_2, m_2-n_2+1, \ldots, m_2\} \text{ for } (m_1,n_2,n_3) \in \Omega_2 \cup \Omega_3 \cup \cdots \cup \Omega_{m_2+1} \right\}.$$
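The per-state action sets underlying $\mathcal{D}$ are easy to write down explicitly; a sketch with our own function names:

```python
# Action sets behind the policy space D in (1): the setup action d^W at
# states (m1, 0, n3), and the sleep action d^S at states (m1, n2, n3).

def setup_actions(m2):
    # d^W: any number of Group 2 servers may be switched on
    return range(0, m2 + 1)

def sleep_actions(m2, n2):
    # d^S: at least the m2 - n2 servers without jobs must sleep
    return range(m2 - n2, m2 + 1)

print(list(setup_actions(3)))     # [0, 1, 2, 3]
print(list(sleep_actions(3, 2)))  # [1, 2, 3]: close 0, 1, or 2 busy servers
```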
Let $X^{(d)}(t) = (N_1(t), N_2(t), N_3(t))^{(d)}$ for any given policy $d \in \mathcal{D}$. Then $\left\{X^{(d)}(t) : t \geq 0\right\}$ is a policy-based block-structured continuous-time Markov process on the state space $\Omega$, whose state-transition relations are given in Figure A3 in Appendix B (two simple special cases that help in understanding this process are given in Appendix A). Based on this, the infinitesimal generator of the Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$ is given by
$$Q^{(d)} = \begin{pmatrix}
Q_{0,0} & Q_{0,1} & & & & & &\\
Q_{1,0} & Q_{1,1} & Q_{1,2} & Q_{1,3} & Q_{1,4} & \cdots & Q_{1,m_2+1} &\\
Q_{2,0} & Q_{2,1} & Q_{2,2} & & & & &\\
& Q_{3,1} & Q_{3,2} & Q_{3,3} & & & &\\
& Q_{4,1} & Q_{4,2} & Q_{4,3} & Q_{4,4} & & &\\
& \vdots & \vdots & \vdots & \vdots & \ddots & &\\
& Q_{m_2+1,1} & Q_{m_2+1,2} & Q_{m_2+1,3} & Q_{m_2+1,4} & \cdots & Q_{m_2+1,m_2+1} & Q_{m_2+1,m_2+2}\\
& & & & & & Q_{m_2+2,m_2+1} & Q_{m_2+2,m_2+2}
\end{pmatrix},$$
where every block entry $Q_{i,j}$ depends on the policy $d$ (for notational simplicity, we omit the superscript "$(d)$"); the block entries are given in Appendix C.
The infinitesimal generator $Q^{(d)}$ has finitely many states and is irreducible with $Q^{(d)} e = 0$; thus, the Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$ is positive recurrent. In this case, we write its stationary probability vector as
$$\pi^{(d)} = \left(\pi_0^{(d)}, \pi_1^{(d)}, \ldots, \pi_{m_2+2}^{(d)}\right), \quad d \in \mathcal{D},$$
where
$$\begin{aligned}
\pi_0^{(d)} &= \left(\pi^{(d)}(0,0,0), \pi^{(d)}(1,0,0), \ldots, \pi^{(d)}(m_1,0,0)\right),\\
\pi_1^{(d)} &= \left(\pi^{(d)}(m_1,0,1), \pi^{(d)}(m_1,0,2), \ldots, \pi^{(d)}(m_1,0,m_3)\right),\\
\pi_2^{(d)} &= \left(\pi^{(d)}(m_1,1,0), \pi^{(d)}(m_1,1,1), \ldots, \pi^{(d)}(m_1,1,m_3-1)\right),\\
&\;\;\vdots\\
\pi_{m_2}^{(d)} &= \left(\pi^{(d)}(m_1,m_2-1,0), \pi^{(d)}(m_1,m_2-1,1), \ldots, \pi^{(d)}(m_1,m_2-1,m_3-m_2+1)\right),\\
\pi_{m_2+1}^{(d)} &= \left(\pi^{(d)}(m_1,m_2,0), \pi^{(d)}(m_1,m_2,1), \ldots, \pi^{(d)}(m_1,m_2,m_3-m_2)\right),\\
\pi_{m_2+2}^{(d)} &= \left(\pi^{(d)}(m_1,m_2,m_3-m_2+1), \pi^{(d)}(m_1,m_2,m_3-m_2+2), \ldots, \pi^{(d)}(m_1,m_2,m_3)\right).
\end{aligned}$$
Note that the stationary probability vector $\pi^{(d)}$ can be obtained by solving the system of linear equations $\pi^{(d)} Q^{(d)} = 0$ and $\pi^{(d)} e = 1$, where $e$ is a column vector of ones of suitable size. To this end, the RG-factorizations play an important role in our later computation; some computational details are given in Chapter 2 of Li [33].
Now, we use the UL-type RG-factorization to compute the stationary probability vector $\pi^{(d)}$. For $0 \leq i, j \leq k$ and $0 \leq k \leq m_2+2$, we write
$$Q_{i,j}^{[k]} = Q_{i,j} + \sum_{n=k+1}^{m_2+2} Q_{i,n}^{[n]} \left(-Q_{n,n}^{[n]}\right)^{-1} Q_{n,j}^{[n]}.$$
Clearly, $Q_{i,j}^{[m_2+2]} = Q_{i,j}$. Let
$$U_n^{(d)} = Q_{n,n}^{[n]}, \quad 0 \leq n \leq m_2+2,$$
$$R_{i,j}^{(d)} = Q_{i,j}^{[j]} \left(-U_j^{(d)}\right)^{-1}, \quad 0 \leq i < j \leq m_2+2,$$
and
$$G_{i,j}^{(d)} = \left(-U_i^{(d)}\right)^{-1} Q_{i,j}^{[i]}, \quad 0 \leq j < i \leq m_2+2.$$
Then the UL-type RG-factorization is given by
$$Q^{(d)} = \left(I - R_U^{(d)}\right) U_D^{(d)} \left(I - G_L^{(d)}\right),$$
where
$$R_U^{(d)} = \begin{pmatrix}
0 & R_{0,1}^{(d)} & R_{0,2}^{(d)} & \cdots & R_{0,m_2+1}^{(d)} & R_{0,m_2+2}^{(d)}\\
& 0 & R_{1,2}^{(d)} & \cdots & R_{1,m_2+1}^{(d)} & R_{1,m_2+2}^{(d)}\\
& & 0 & \ddots & \vdots & \vdots\\
& & & \ddots & R_{m_2,m_2+1}^{(d)} & R_{m_2,m_2+2}^{(d)}\\
& & & & 0 & R_{m_2+1,m_2+2}^{(d)}\\
& & & & & 0
\end{pmatrix},$$
$$U_D^{(d)} = \operatorname{diag}\left(U_0^{(d)}, U_1^{(d)}, \ldots, U_{m_2+1}^{(d)}, U_{m_2+2}^{(d)}\right),$$
and
$$G_L^{(d)} = \begin{pmatrix}
0 & & & & &\\
G_{1,0}^{(d)} & 0 & & & &\\
G_{2,0}^{(d)} & G_{2,1}^{(d)} & 0 & & &\\
G_{3,0}^{(d)} & G_{3,1}^{(d)} & G_{3,2}^{(d)} & 0 & &\\
\vdots & \vdots & \vdots & \ddots & \ddots &\\
G_{m_2+2,0}^{(d)} & G_{m_2+2,1}^{(d)} & G_{m_2+2,2}^{(d)} & \cdots & G_{m_2+2,m_2+1}^{(d)} & 0
\end{pmatrix}.$$
By using Theorem 2.9 of Chapter 2 in Li [33], the stationary probability vector of the Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$ is given by
$$\pi_0^{(d)} = \tau^{(d)} x_0^{(d)}, \qquad \pi_k^{(d)} = \sum_{i=0}^{k-1} \pi_i^{(d)} R_{i,k}^{(d)}, \quad 1 \leq k \leq m_2+2,$$
where $x_0^{(d)}$ is the stationary probability vector of the Markov chain censored to level 0 (with generator $U_0^{(d)}$), and the positive scalar $\tau^{(d)}$ is uniquely determined by $\sum_{k=0}^{m_2+2} \pi_k^{(d)} e = 1$.
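For small instances, the result of the RG-factorization can also be checked by brute force, solving $\pi^{(d)} Q^{(d)} = 0$ and $\pi^{(d)} e = 1$ directly; the sketch below uses a toy 3-state generator rather than the $Q^{(d)}$ of (2).

```python
# Brute-force check of the stationary vector: solve pi Q = 0, pi e = 1 by
# replacing one balance equation with the normalization. Toy generator only.
import numpy as np

def stationary_vector(Q):
    n = Q.shape[0]
    A = np.vstack([Q.T[:-1], np.ones(n)])   # n-1 balance equations + normalization
    b = np.zeros(n); b[-1] = 1.0
    return np.linalg.solve(A, b)

Q = np.array([[-2.0,  1.5,  0.5],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  4.0, -4.0]])
pi = stationary_vector(Q)
assert np.allclose(pi @ Q, 0.0) and np.isclose(pi.sum(), 1.0)
```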
Remark 3.
The RG-factorizations provide a unified, constructive, and algorithmic framework for the numerical computation of many practical stochastic systems. They yield effective solutions for block-structured Markov processes, and are also useful for the optimal design and dynamic decision-making of many practical systems. See Li [33] for more details.
The following theorem provides some useful observations on special policies $d^W \boxtimes d^S \in \mathcal{D}$ for which the particular choice of the setup policy has no effect on the infinitesimal generator $Q^{(d^W \boxtimes d^S)}$ or the stationary probability vector $\pi^{(d^W \boxtimes d^S)}$.
Theorem 1.
Suppose that two asynchronous energy-efficient policies $d^{W_1} \boxtimes d^S, d^{W_2} \boxtimes d^S \in \mathcal{D}$ satisfy the following two conditions: (a) for each $n_3 = 1, 2, \ldots, m_2$, if $d^{W_1}_{(m_1,0,n_3)} \in \{n_3, n_3+1, \ldots, m_2\}$, then we take $d^{W_2}_{(m_1,0,n_3)}$ as any element of the set $\{n_3, n_3+1, \ldots, m_2\}$; (b) for each $n_3 = m_2+1, m_2+2, \ldots, m_3$, if $d^{W_1}_{(m_1,0,n_3)} \in \{1, 2, \ldots, m_2\}$, then we take $d^{W_2}_{(m_1,0,n_3)} = d^{W_1}_{(m_1,0,n_3)}$. Under these two conditions, we have
$$Q^{\left(d^{W_1} \boxtimes d^S\right)} = Q^{\left(d^{W_2} \boxtimes d^S\right)}, \qquad \pi^{\left(d^{W_1} \boxtimes d^S\right)} = \pi^{\left(d^{W_2} \boxtimes d^S\right)}.$$
Proof of Theorem 1.
It is easy to see from (2) that all the levels of the matrix $Q^{(d^{W_1} \boxtimes d^S)}$ are the same as those of the matrix $Q^{(d^{W_2} \boxtimes d^S)}$, except level 1. Thus, we only need to compare level 1 of the matrix $Q^{(d^{W_1} \boxtimes d^S)}$ with that of the matrix $Q^{(d^{W_2} \boxtimes d^S)}$.
For the two asynchronous energy-efficient policies $d^{W_1} \boxtimes d^S, d^{W_2} \boxtimes d^S \in \mathcal{D}$ satisfying conditions (a) and (b), the level-1 blocks in (2) depend on the setup policy only through $\mathbf{1}\left\{d^W_{(m_1,0,n_3)} \geq n_3\right\}$; it is then clear that for $n_3 = 1, 2, \ldots, m_2$, if $d^{W_1}_{(m_1,0,n_3)}, d^{W_2}_{(m_1,0,n_3)} \in \{n_3, n_3+1, \ldots, m_2\}$, then
$$\mathbf{1}\left\{d^{W_1}_{(m_1,0,n_3)} \geq n_3\right\} = \mathbf{1}\left\{d^{W_2}_{(m_1,0,n_3)} \geq n_3\right\}.$$
Thus, it follows from (2) that $Q_{1,k}^{(d^{W_1} \boxtimes d^S)} = Q_{1,k}^{(d^{W_2} \boxtimes d^S)}$ for $k = 1, 2, \ldots, m_2+1$. This gives $Q^{(d^{W_1} \boxtimes d^S)} = Q^{(d^{W_2} \boxtimes d^S)}$, and thus $\pi^{(d^{W_1} \boxtimes d^S)} = \pi^{(d^{W_2} \boxtimes d^S)}$. This completes the proof. □
Note that Theorem 1 will be necessary and useful for analyzing policy monotonicity and optimality in our later study; see also the proof of Theorem 4.
Remark 4.
This paper is the first to introduce and study the asynchronous dynamic policy for energy-efficient data centers. We highlight the impact of the two asynchronous sub-policies, the setup and sleep policies, on the long-run average profit of the energy-efficient data center.

3.2. The Reward Function

For the Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$, we now define a suitable reward function for the energy-efficient data center.
Based on the cost and price notations in Table 1, a reward function with respect to both states and policies is defined as a profit rate (i.e., the total revenue minus the total cost per unit of time). According to the impact of the asynchronous dynamic policy on the profit rate, the reward function at state $(n_1, n_2, n_3)$ under policy $d$ is divided into four cases as follows:
Case (a): For $n_1 = 0, 1, \ldots, m_1$ and $n_2 = n_3 = 0$, the profit rate is not affected by any policy, and we have
$$f(n_1, 0, 0) = R\, n_1 \mu_1 - \left(m_1 P_{1,W} + m_2 P_{2,S}\right) C_1 - n_1 C_2^{(1)}.$$
Note that in Case (a) there is no job in Group 2 or in the buffer; thus, each server in Group 2 is in the sleep state.
In the following two cases (b) and (c), since there are jobs either in Group 2 or in the buffer, the policy $d$ plays a key role in opening (setup) or closing (sleep) some servers of Group 2 to save energy efficiently.
Case (b): For $n_1 = m_1$, $n_2 = 0$ and $n_3 = 1, 2, \ldots, m_3$, the profit rate is affected by the setup policy $d^W$, and we have
$$\begin{aligned}
f^{(d^W)}(m_1, 0, n_3) ={}& R\, m_1 \mu_1 - \left[m_1 P_{1,W} + d^W_{(m_1,0,n_3)} P_{2,W} + \left(m_2 - d^W_{(m_1,0,n_3)}\right) P_{2,S}\right] C_1\\
& - \left(m_1 C_2^{(1)} + n_3 C_2^{(3)}\right) - d^W_{(m_1,0,n_3)} C_3^{(1)} - \lambda\, \mathbf{1}\left\{n_1 = m_1, n_2 = 0, n_3 = m_3\right\} C_5,
\end{aligned}$$
where $\mathbf{1}\{\cdot\}$ is an indicator function that equals 1 when the event in $\{\cdot\}$ occurs, and 0 otherwise.
Case (c): For $n_1 = m_1$, $n_2 = 1, 2, \ldots, m_2$ and $n_3 = 0, 1, \ldots, m_3 - n_2$, the profit rate is affected by the sleep policy $d^S$, and we have
$$\begin{aligned}
f^{(d^S)}(m_1, n_2, n_3) ={}& R\left[m_1 \mu_1 + \left(m_2 - d^S_{(m_1,n_2,n_3)}\right) \mu_2\right] - \left[m_1 P_{1,W} + d^S_{(m_1,n_2,n_3)} P_{2,S} + \left(m_2 - d^S_{(m_1,n_2,n_3)}\right) P_{2,W}\right] C_1\\
& - \left(m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)}\right) - \left[n_2 - \left(m_2 - d^S_{(m_1,n_2,n_3)}\right)\right] C_3^{(2)}\\
& - m_1 \mu_1 \mathbf{1}\left\{n_2 > 0, n_3 = 0\right\} C_4 - \lambda\, \mathbf{1}\left\{n_1 = m_1, n_2 + n_3 = m_3\right\} C_5.
\end{aligned}$$
Note that the job transfer rate from Group 2 to Group 1 is $n_1 \mu_1 \mathbf{1}\{n_2 > 0, n_3 = 0\}$. If $0 \leq n_1 \leq m_1 - 1$, then $n_2 = 0$ and $n_1 \mu_1 \mathbf{1}\{n_2 > 0, n_3 = 0\} C_4 = 0$. If $n_1 = m_1$ and $n_2 = 0$, then $n_1 \mu_1 \mathbf{1}\{n_2 > 0, n_3 = 0\} C_4 = 0$. If $n_1 = m_1$, $1 \leq n_2 \leq m_2$ and $n_3 = 0$, then $n_1 \mu_1 \mathbf{1}\{n_2 > 0, n_3 = 0\} C_4 = m_1 \mu_1 C_4$.
Case (d): For $n_1 = m_1$, $n_2 = m_2$ and $n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3$, the profit rate is not affected by any policy, and we have
$$f(m_1, m_2, n_3) = R\left(m_1 \mu_1 + m_2 \mu_2\right) - \left(m_1 P_{1,W} + m_2 P_{2,W}\right) C_1 - \left(m_1 C_2^{(1)} + m_2 C_2^{(2)} + n_3 C_2^{(3)}\right) - \lambda\, \mathbf{1}\left\{n_1 = m_1, n_2 = m_2, n_3 = m_3\right\} C_5.$$
We define a column vector composed of the elements $f(n_1,n_2,n_3)$, $f^{(d^W)}(n_1,n_2,n_3)$ and $f^{(d^S)}(n_1,n_2,n_3)$ as
$$f^{(d)} = \left(f_0^T, \left(f_1^{(d^W)}\right)^T, \left(f_2^{(d^S)}\right)^T, \ldots, \left(f_{m_2+1}^{(d^S)}\right)^T, f_{m_2+2}^T\right)^T,$$
where
$$\begin{aligned}
f_0 &= \left(f(0,0,0), f(1,0,0), \ldots, f(m_1,0,0)\right)^T,\\
f_1^{(d^W)} &= \left(f^{(d^W)}(m_1,0,1), f^{(d^W)}(m_1,0,2), \ldots, f^{(d^W)}(m_1,0,m_3)\right)^T,\\
f_2^{(d^S)} &= \left(f^{(d^S)}(m_1,1,0), f^{(d^S)}(m_1,1,1), \ldots, f^{(d^S)}(m_1,1,m_3-1)\right)^T,\\
&\;\;\vdots\\
f_{m_2+1}^{(d^S)} &= \left(f^{(d^S)}(m_1,m_2,0), f^{(d^S)}(m_1,m_2,1), \ldots, f^{(d^S)}(m_1,m_2,m_3-m_2)\right)^T,\\
f_{m_2+2} &= \left(f(m_1,m_2,m_3-m_2+1), f(m_1,m_2,m_3-m_2+2), \ldots, f(m_1,m_2,m_3)\right)^T.
\end{aligned}$$
In the remainder of this section, the long-run average profit of the data center (or of the policy-based continuous-time Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$) under an asynchronous dynamic policy $d$ is defined as
$$\eta^{(d)} = \lim_{T \to +\infty} E\left\{\frac{1}{T} \int_0^T f^{(d)}\left(X^{(d)}(t)\right) \mathrm{d}t\right\} = \pi^{(d)} f^{(d)},$$
where $\pi^{(d)}$ and $f^{(d)}$ are given by (3) and (8), respectively.
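Equation (9) reduces policy evaluation to an inner product, which the optimization problem below scans over the policy space; in the following toy sketch the $\pi$/$f$ pairs are placeholders rather than vectors computed from (2) and (8).

```python
# Sketch of the objective: evaluate eta^(d) = pi^(d) f^(d) per policy and
# keep the best. The pi/f pairs are toy placeholders, not model output.
import numpy as np

policies = {
    "d_a": (np.array([0.2, 0.5, 0.3]), np.array([10.0, -4.0, 2.5])),
    "d_b": (np.array([0.4, 0.4, 0.2]), np.array([8.0, -1.0, 1.0])),
}

def average_profit(pi, f):
    return float(pi @ f)          # eta^(d) = pi^(d) f^(d), Equation (9)

d_star = max(policies, key=lambda d: average_profit(*policies[d]))
print(d_star)                     # "d_b": 3.2 - 0.4 + 0.2 = 3.0 > 0.75
```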
We observe that as the number of working servers in Group 2 decreases (equivalently, as the number of sleeping servers increases), the total revenue and the total cost of the data center decrease together, and vice versa. Thus, there is a trade-off between the total revenue and the total cost in choosing a suitable number of working and/or sleeping servers in Group 2 through the setup and sleep policies, respectively. This motivates us to study an optimal dynamic control mechanism for the energy-efficient data center. Our objective is to find an optimal asynchronous dynamic policy $d^*$ such that the long-run average profit $\eta^{(d)}$ is maximized, that is,
$$d^* = \arg\max_{d \in \mathcal{D}} \eta^{(d)}.$$
Since the setup and sleep policies $d^W$ and $d^S$ occur asynchronously, they never interact with each other at any time. Therefore, the optimal policy can be decomposed into
$$d^* = d^{W*} \boxtimes d^{S*} = \left(\arg\max_{d^W \in \mathcal{D}} \eta^{(d^W)}\right) \boxtimes \left(\arg\max_{d^S \in \mathcal{D}} \eta^{(d^S)}\right).$$
In fact, it is difficult and challenging to analyze the properties of the optimal asynchronous dynamic policy $d^* = d^{W*} \boxtimes d^{S*}$ and to provide an effective algorithm for computing it. To this end, in the following sections we introduce the sensitivity-based optimization theory to study this energy-efficient optimization problem.

4. The Block-Structured Poisson Equation

In this section, for the energy-efficient data center, we set up a block-structured Poisson equation which provides a useful relation between the sensitivity-based optimization and the MDP. Additionally, we use the RG-factorization, given in Li [33], to solve the block-structured Poisson equation and provide an expression for its unique solution.
For $d \in \mathcal{D}$, it follows from Chapter 2 of Cao [34] that for the policy-based continuous-time Markov process $\left\{X^{(d)}(t) : t \geq 0\right\}$, the performance potential is defined as
$$g^{(d)}(n_1, n_2, n_3) = E\left\{\int_0^{+\infty} \left[f^{(d)}\left(X^{(d)}(t)\right) - \eta^{(d)}\right] \mathrm{d}t \;\middle|\; X^{(d)}(0) = (n_1, n_2, n_3)\right\},$$
where $\eta^{(d)}$ is defined in (9). It is seen from Cao [34] that for any policy $d \in \mathcal{D}$, $g^{(d)}(n_1, n_2, n_3)$ quantifies the contribution of the initial state $(n_1, n_2, n_3)$ to the long-run average profit of the data center. Here, $g^{(d)}(n_1, n_2, n_3)$ is also called the relative value function or the bias in traditional Markov decision process theory; see, e.g., Puterman [39] for more details. We further define a column vector $g^{(d)}$ with elements $g^{(d)}(n_1, n_2, n_3)$ for $(n_1, n_2, n_3) \in \Omega$:
$$g^{(d)} = \left(g_0^{(d)}, g_1^{(d)}, g_2^{(d)}, \ldots, g_{m_2+1}^{(d)}, g_{m_2+2}^{(d)}\right)^T,$$
where
$$\begin{aligned}
g_0^{(d)} &= \left(g^{(d)}(0,0,0), g^{(d)}(1,0,0), \ldots, g^{(d)}(m_1,0,0)\right)^T,\\
g_1^{(d)} &= \left(g^{(d)}(m_1,0,1), g^{(d)}(m_1,0,2), \ldots, g^{(d)}(m_1,0,m_3)\right)^T,\\
g_2^{(d)} &= \left(g^{(d)}(m_1,1,0), g^{(d)}(m_1,1,1), \ldots, g^{(d)}(m_1,1,m_3-1)\right)^T,\\
&\;\;\vdots\\
g_{m_2+1}^{(d)} &= \left(g^{(d)}(m_1,m_2,0), g^{(d)}(m_1,m_2,1), \ldots, g^{(d)}(m_1,m_2,m_3-m_2)\right)^T,\\
g_{m_2+2}^{(d)} &= \left(g^{(d)}(m_1,m_2,m_3-m_2+1), g^{(d)}(m_1,m_2,m_3-m_2+2), \ldots, g^{(d)}(m_1,m_2,m_3)\right)^T.
\end{aligned}$$
By a computation similar to that in Ma et al. [31], the block-structured Poisson equation is given by
$$Q^{(d)} g^{(d)} = \eta^{(d)} e - f^{(d)},$$
where $\eta^{(d)}$ is defined in (9), $f^{(d)}$ is given in (8), and $Q^{(d)}$ is given in (2).
To solve the system of linear equations (13), note that $\operatorname{rank}\left(Q^{(d)}\right) = m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3$ and $\det\left(Q^{(d)}\right) = 0$, since the size of the matrix $Q^{(d)}$ is $m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 + 1$. Hence, the system (13) has infinitely many solutions, which differ by an additive constant. Let $\widetilde{Q}^{(d)}$ be the matrix obtained by omitting the first row and the first column of the matrix $Q^{(d)}$. Then,
$$\widetilde{Q}^{(d)} = \begin{pmatrix}
\widetilde{Q}_{0,0} & \widetilde{Q}_{0,1} & & & & &\\
\widetilde{Q}_{1,0} & Q_{1,1} & Q_{1,2} & Q_{1,3} & \cdots & Q_{1,m_2+1} &\\
\widetilde{Q}_{2,0} & Q_{2,1} & Q_{2,2} & & & &\\
& Q_{3,1} & Q_{3,2} & Q_{3,3} & & &\\
& \vdots & \vdots & \vdots & \ddots & &\\
& Q_{m_2+1,1} & Q_{m_2+1,2} & Q_{m_2+1,3} & \cdots & Q_{m_2+1,m_2+1} & Q_{m_2+1,m_2+2}\\
& & & & & Q_{m_2+2,m_2+1} & Q_{m_2+2,m_2+2}
\end{pmatrix},$$
where
$$\widetilde{Q}_{0,0} = \begin{pmatrix}
-(\lambda + \mu_1) & \lambda & & &\\
2\mu_1 & -(\lambda + 2\mu_1) & \lambda & &\\
& \ddots & \ddots & \ddots &\\
& & (m_1-1)\mu_1 & -(\lambda + (m_1-1)\mu_1) & \lambda\\
& & & m_1\mu_1 & -(\lambda + m_1\mu_1)
\end{pmatrix},$$
$\widetilde{Q}_{0,1}$ is obtained by omitting the first row of $Q_{0,1}$, and $\widetilde{Q}_{1,0}$ and $\widetilde{Q}_{2,0}$ are obtained by omitting the first column of $Q_{1,0}$ and $Q_{2,0}$, respectively. The other block entries of $\widetilde{Q}^{(d)}$ are the same as the corresponding block entries of $Q^{(d)}$.
Note that $\operatorname{rank}\left(\widetilde{Q}^{(d)}\right) = m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3$, which is exactly the size of the matrix $\widetilde{Q}^{(d)}$. Hence, the matrix $\widetilde{Q}^{(d)}$ is invertible.
Let $h^{(d)}$ and $\varphi^{(d)}$ be the two column vectors of size $m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3$ obtained by omitting the first element of the column vectors $f^{(d)} - \eta^{(d)} e$ and $g^{(d)}$ of size $m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 + 1$, respectively, and write
$$l_0 = m_1, \quad l_1 = m_1 + m_3, \quad l_2 = m_1 + 2m_3, \quad \ldots, \quad l_{m_2+1} = m_1 + \frac{m_2}{2} - \frac{m_2^2}{2} + m_2 m_3 + m_3, \quad L = m_1 + \frac{3m_2}{2} - \frac{m_2^2}{2} + m_2 m_3 + m_3.$$
Then,
$$h^{(d)} = \begin{pmatrix} \widetilde{f}_0 - \eta^{(d)} e\\ f_1^{(d^W)} - \eta^{(d)} e\\ f_2^{(d^S)} - \eta^{(d)} e\\ \vdots\\ f_{m_2+1}^{(d^S)} - \eta^{(d)} e\\ f_{m_2+2} - \eta^{(d)} e \end{pmatrix} \overset{\mathrm{def}}{=} \begin{pmatrix} h_0^{(d)}\\ h_1^{(d)}\\ h_2^{(d)}\\ \vdots\\ h_{m_2+1}^{(d)}\\ h_{m_2+2}^{(d)} \end{pmatrix}, \qquad \varphi^{(d)} \overset{\mathrm{def}}{=} \begin{pmatrix} \widetilde{g}_0^{(d)}\\ g_1^{(d)}\\ g_2^{(d)}\\ \vdots\\ g_{m_2+1}^{(d)}\\ g_{m_2+2}^{(d)} \end{pmatrix},$$
where $\widetilde{f}_0$ and $\widetilde{g}_0^{(d)}$ are the column vectors obtained by omitting the scalar entries $f(0,0,0)$ and $g^{(d)}(0,0,0)$ from $f_0$ and $g_0^{(d)}$, respectively, and
$$\begin{aligned}
h_0^{(d)} &= \left(h_1, h_2, \ldots, h_{l_0}\right)^T,\\
h_1^{(d)} &= \left(h_{l_0+1}, h_{l_0+2}, \ldots, h_{l_1}\right)^T,\\
h_2^{(d)} &= \left(h_{l_1+1}, h_{l_1+2}, \ldots, h_{l_2}\right)^T,\\
&\;\;\vdots\\
h_{m_2+1}^{(d)} &= \left(h_{l_{m_2}+1}, h_{l_{m_2}+2}, \ldots, h_{l_{m_2+1}}\right)^T,\\
h_{m_2+2}^{(d)} &= \left(h_{l_{m_2+1}+1}, h_{l_{m_2+1}+2}, \ldots, h_L\right)^T.
\end{aligned}$$
Therefore, it follows from (13) that
$$-\widetilde{Q}^{(d)} \varphi^{(d)} = h^{(d)} + \mu_1 e_1 g^{(d)}(0,0,0),$$
where $e_1$ is a column vector whose first element is one and whose other elements are zero. Note that the matrix $\widetilde{Q}^{(d)}$ is invertible and $\left(-\widetilde{Q}^{(d)}\right)^{-1} > 0$; thus, the system (16) of linear equations always has a unique solution:
$$\varphi^{(d)} = \left(-\widetilde{Q}^{(d)}\right)^{-1} h^{(d)} + \mu_1 \left(-\widetilde{Q}^{(d)}\right)^{-1} e_1 \cdot c,$$
where $g^{(d)}(0,0,0) = c$ is any given positive constant. For convenience of computation, we take $g^{(d)}(0,0,0) = c = 1$. In this case, we have
$$\varphi^{(d)} = \left(-\widetilde{Q}^{(d)}\right)^{-1} h^{(d)} + \mu_1 \left(-\widetilde{Q}^{(d)}\right)^{-1} e_1.$$
Note that an expression for the inverse matrix $\left(-\widetilde{Q}^{(d)}\right)^{-1}$ can be obtained by means of the RG-factorization, which is given in Li [33] for general Markov processes.
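Numerically, (18) is a single linear solve once $\widetilde{Q}^{(d)}$ and $h^{(d)}$ are at hand; the sketch below verifies the construction on a toy birth–death generator standing in for (2).

```python
# Sketch of (18): with g(0,0,0) = 1, solve phi = (-Qt)^{-1} (h + mu1 * e1),
# where Qt drops the first row and column of Q. Toy 4-state generator only.
import numpy as np

mu1 = 1.0
Q = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ mu1, -2.0,  1.0,  0.0],
              [ 0.0,  2.0, -3.0,  1.0],
              [ 0.0,  0.0,  3.0, -3.0]])
f = np.array([5.0, 3.0, -1.0, -2.0])      # toy reward vector

A = np.vstack([Q.T[:-1], np.ones(4)])     # stationary vector by direct solve
pi = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
eta = float(pi @ f)

h = (f - eta)[1:]                         # h^(d): drop the first entry
Qt = Q[1:, 1:]                            # Q-tilde: drop first row and column
e1 = np.array([1.0, 0.0, 0.0])
phi = np.linalg.solve(-Qt, h + mu1 * e1)  # unique solution of (16)
g = np.concatenate([[1.0], phi])          # performance potential with g_0 = 1
assert np.allclose(Q @ g, eta - f)        # the Poisson equation (13) holds
```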
For convenience of computation, we write
$$\left(-\widetilde{Q}^{(d)}\right)^{-1} = \left(\widetilde{Q}_0^T, \widetilde{Q}_1^T, \widetilde{Q}_2^T, \ldots, \widetilde{Q}_{m_2+1}^T, \widetilde{Q}_{m_2+2}^T\right)^T,$$
and write every element of the block $\widetilde{Q}_r$ as a scalar $q_{n,l}^{(r)}$, where $n$ indexes a state within the given block and $l$ indexes the column, for $r = 0, 1, \ldots, m_2+2$ and $l = 1, 2, \ldots, L$ with $L = m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3$, and
$$n = \begin{cases} 1, 2, \ldots, m_1, & \text{for } r = 0,\\ 1, 2, \ldots, m_3, & \text{for } r = 1,\\ 0, 1, \ldots, m_3 - r + 1, & \text{for } 2 \leq r \leq m_2+1,\\ m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3, & \text{for } r = m_2+2. \end{cases}$$
It is easy to check that
$$\widetilde{Q}_0 = \begin{pmatrix} q_{1,1}^{(0)} & q_{1,2}^{(0)} & \cdots & q_{1,L}^{(0)}\\ q_{2,1}^{(0)} & q_{2,2}^{(0)} & \cdots & q_{2,L}^{(0)}\\ \vdots & \vdots & & \vdots\\ q_{m_1,1}^{(0)} & q_{m_1,2}^{(0)} & \cdots & q_{m_1,L}^{(0)} \end{pmatrix}_{m_1 \times L}, \qquad \widetilde{Q}_1 = \begin{pmatrix} q_{1,1}^{(1)} & q_{1,2}^{(1)} & \cdots & q_{1,L}^{(1)}\\ q_{2,1}^{(1)} & q_{2,2}^{(1)} & \cdots & q_{2,L}^{(1)}\\ \vdots & \vdots & & \vdots\\ q_{m_3,1}^{(1)} & q_{m_3,2}^{(1)} & \cdots & q_{m_3,L}^{(1)} \end{pmatrix}_{m_3 \times L};$$
for $2 \leq r \leq m_2+1$,
$$\widetilde{Q}_r = \begin{pmatrix} q_{0,1}^{(r)} & q_{0,2}^{(r)} & \cdots & q_{0,L}^{(r)}\\ q_{1,1}^{(r)} & q_{1,2}^{(r)} & \cdots & q_{1,L}^{(r)}\\ \vdots & \vdots & & \vdots\\ q_{m_3-r+1,1}^{(r)} & q_{m_3-r+1,2}^{(r)} & \cdots & q_{m_3-r+1,L}^{(r)} \end{pmatrix}_{(m_3-r+2) \times L};$$
and
$$\widetilde{Q}_{m_2+2} = \begin{pmatrix} q_{m_3-m_2+1,1}^{(m_2+2)} & q_{m_3-m_2+1,2}^{(m_2+2)} & \cdots & q_{m_3-m_2+1,L}^{(m_2+2)}\\ q_{m_3-m_2+2,1}^{(m_2+2)} & q_{m_3-m_2+2,2}^{(m_2+2)} & \cdots & q_{m_3-m_2+2,L}^{(m_2+2)}\\ \vdots & \vdots & & \vdots\\ q_{m_3,1}^{(m_2+2)} & q_{m_3,2}^{(m_2+2)} & \cdots & q_{m_3,L}^{(m_2+2)} \end{pmatrix}_{m_2 \times L}.$$
The following theorem provides an expression for the vector $\varphi^{(d)}$ under the constraint $g^{(d)}(0,0,0) = c = 1$. This expression is very useful for applying the sensitivity-based optimization theory to the Markov decision processes studied later.
Theorem 2.
If $g^{(d)}(0,0,0) = 1$, then for $n_1 = 1, 2, \ldots, m_1$,
$$g^{(d)}(n_1, 0, 0) = \sum_{l=1}^{L} q_{n_1,l}^{(0)} h_l + \mu_1 q_{n_1,1}^{(0)};$$
for $n_3 = 1, 2, \ldots, m_3$,
$$g^{(d)}(m_1, 0, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(1)} h_l + \mu_1 q_{n_3,1}^{(1)};$$
for $n_2 = r - 1$, $n_3 = 0, 1, \ldots, m_3 - n_2$, and $2 \leq r \leq m_2+1$,
$$g^{(d)}(m_1, n_2, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(r)} h_l + \mu_1 q_{n_3,1}^{(r)};$$
for $n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3$,
$$g^{(d)}(m_1, m_2, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(m_2+2)} h_l + \mu_1 q_{n_3,1}^{(m_2+2)}.$$
Proof of Theorem 2.
It is seen from (18) that we need to compute two parts: $\left(-\widetilde{Q}^{(d)}\right)^{-1} h^{(d)}$ and $\mu_1 \left(-\widetilde{Q}^{(d)}\right)^{-1} e_1$. Note that
$$\left(-\widetilde{Q}^{(d)}\right)^{-1} h^{(d)} = \begin{pmatrix} \sum_{l=1}^{L} q_{1,l}^{(0)} h_l\\ \vdots\\ \sum_{l=1}^{L} q_{m_1,l}^{(0)} h_l\\ \vdots\\ \sum_{l=1}^{L} q_{0,l}^{(r)} h_l\\ \vdots\\ \sum_{l=1}^{L} q_{m_3-r+1,l}^{(r)} h_l\\ \vdots\\ \sum_{l=1}^{L} q_{m_3-m_2+1,l}^{(m_2+2)} h_l\\ \vdots\\ \sum_{l=1}^{L} q_{m_3,l}^{(m_2+2)} h_l \end{pmatrix}_{L \times 1} \quad \text{and} \quad \mu_1 \left(-\widetilde{Q}^{(d)}\right)^{-1} e_1 = \mu_1 \begin{pmatrix} q_{1,1}^{(0)}\\ \vdots\\ q_{m_1,1}^{(0)}\\ \vdots\\ q_{0,1}^{(r)}\\ \vdots\\ q_{m_3-r+1,1}^{(r)}\\ \vdots\\ q_{m_3-m_2+1,1}^{(m_2+2)}\\ \vdots\\ q_{m_3,1}^{(m_2+2)} \end{pmatrix}_{L \times 1};$$
thus, a simple computation of the vector $\varphi^{(d)} = \left(-\widetilde{Q}^{(d)}\right)^{-1} h^{(d)} + \mu_1 \left(-\widetilde{Q}^{(d)}\right)^{-1} e_1$ gives the desired results. This completes the proof. □

5. Impact of the Service Price

In this section, we study the perturbation realization factor of the policy-based continuous-time Markov process, both for the setup policy and for the sleep policy (together they form the asynchronous energy-efficient policy), and analyze how the service price impacts the perturbation realization factor. Our analysis covers two cases: the setup policy and the sleep policy. The results of this section will be useful for establishing the optimal asynchronous dynamic policy of the energy-efficient data center in later sections.
A key point of our analysis is that the setup policy and the sleep policy are asynchronous at any time; thus, we can discuss the perturbation realization factor under the asynchronous dynamic policy in two separate computational steps.

5.1. The Setup Policy

For the performance potential vector $\varphi^{(d)}$ under the constraint $g^{(d)}(0,0,0) = 1$, we define a perturbation realization factor as
$$G^{(d)}\left(\mathbf{n}', \mathbf{n}\right) \overset{\mathrm{def}}{=} g^{(d)}\left(\mathbf{n}'\right) - g^{(d)}\left(\mathbf{n}\right),$$
where $\mathbf{n} = (n_1, n_2, n_3)$ and $\mathbf{n}' = (n_1', n_2', n_3')$. We can see that $G^{(d)}(\mathbf{n}', \mathbf{n})$ quantifies the difference between the two performance potentials $g^{(d)}(n_1', n_2', n_3')$ and $g^{(d)}(n_1, n_2, n_3)$; it measures the long-run effect on the average profit of the data center when the system state changes from $\mathbf{n}$ to $\mathbf{n}'$. For our next discussion, by observing the state space, it is necessary to define some perturbation realization factors as follows:
$$\begin{aligned}
G_1^{(d^W)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, i_1, n_3 - i_1\right) - g^{(d)}\left(m_1, i_2, n_3 - i_2\right),\\
G_2^{(d^W)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, 0, n_3\right) - g^{(d)}\left(m_1, i_1, n_3 - i_1\right),\\
G_3^{(d^W)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, 0, n_3\right) - g^{(d)}\left(m_1, i_2, n_3 - i_2\right),
\end{aligned}$$
where $0 \leq i_2 < i_1 \leq m_2$ and $n_3 = 0, 1, \ldots, m_3$.
It follows from Theorem 2 that
$$\begin{aligned}
g^{(d)}\left(m_1, 0, n_3\right) &= \sum_{l=1}^{L} q_{n_3,l}^{(1)} h_l + \mu_1 q_{n_3,1}^{(1)},\\
g^{(d)}\left(m_1, i_1, n_3 - i_1\right) &= \sum_{l=1}^{L} q_{n_3-i_1,l}^{(i_1+1)} h_l + \mu_1 q_{n_3-i_1,1}^{(i_1+1)},\\
g^{(d)}\left(m_1, i_2, n_3 - i_2\right) &= \sum_{l=1}^{L} q_{n_3-i_2,l}^{(i_2+1)} h_l + \mu_1 q_{n_3-i_2,1}^{(i_2+1)}.
\end{aligned}$$
To express the perturbation realization factor in terms of the service price $R$, we write
$$A_0 = 0, \qquad B_0 = \left(m_1 P_{1,W} + m_2 P_{2,S}\right) C_1 > 0;$$
for $1 \leq l \leq l_0$ and $n_1 = 1, 2, \ldots, m_1$, $n_2 = n_3 = 0$,
$$A_l = n_1 \mu_1 > 0, \qquad B_l = \left(m_1 P_{1,W} + m_2 P_{2,S}\right) C_1 + n_1 C_2^{(1)} > 0;$$
for $l_0 + 1 \leq l \leq l_1$ and $n_1 = m_1$, $n_2 = 0$, $n_3 = 1, 2, \ldots, m_3$,
$$\begin{aligned}
A_l &= m_1 \mu_1 > 0,\\
B_l^{(d^W)} &= \left[m_1 P_{1,W} + d^W_{(m_1,0,n_3)} P_{2,W} + \left(m_2 - d^W_{(m_1,0,n_3)}\right) P_{2,S}\right] C_1 + m_1 C_2^{(1)} + n_3 C_2^{(3)}\\
&\quad + d^W_{(m_1,0,n_3)} C_3^{(1)} + \lambda\, \mathbf{1}\left\{n_1 = m_1, n_2 = 0, n_3 = m_3\right\} C_5 > 0;
\end{aligned}$$
for $l_1 + 1 \leq l \leq l_{m_2+1}$ and $n_1 = m_1$, $n_2 = 1, 2, \ldots, m_2$, $n_3 = 0, 1, \ldots, m_3 - n_2$,
$$\begin{aligned}
A_l^{(d^S)} &= m_1 \mu_1 + \left(m_2 - d^S_{(m_1,n_2,n_3)}\right) \mu_2 > 0,\\
B_l^{(d^S)} &= \left[m_1 P_{1,W} + d^S_{(m_1,n_2,n_3)} P_{2,S} + \left(m_2 - d^S_{(m_1,n_2,n_3)}\right) P_{2,W}\right] C_1 + m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)}\\
&\quad + \left[n_2 - \left(m_2 - d^S_{(m_1,n_2,n_3)}\right)\right] C_3^{(2)} + m_1 \mu_1 \mathbf{1}\left\{n_3 = 0\right\} C_4 + \lambda\, \mathbf{1}\left\{n_2 + n_3 = m_3\right\} C_5 > 0;
\end{aligned}$$
for $l_{m_2+1} + 1 \leq l \leq L$ and $n_1 = m_1$, $n_2 = m_2$, $n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3$,
$$A_l = m_1 \mu_1 + m_2 \mu_2 > 0,$$
$$B_l = \left(m_1 P_{1,W} + m_2 P_{2,W}\right) C_1 + m_1 C_2^{(1)} + m_2 C_2^{(2)} + n_3 C_2^{(3)} + \lambda\, \mathbf{1}\left\{n_1 = m_1, n_2 = m_2, n_3 = m_3\right\} C_5 > 0.$$
Then for $1 \leq l \leq l_0$ and $n_1 = 0, 1, \ldots, m_1$, $n_2 = n_3 = 0$,
$$f(n_1, 0, 0) = R A_l - B_l;$$
for $l_0 + 1 \leq l \leq l_1$ and $n_1 = m_1$, $n_2 = 0$, $n_3 = 1, 2, \ldots, m_3$,
$$f^{(d^W)}(m_1, 0, n_3) = R A_l - B_l^{(d^W)};$$
for $l_1 + 1 \leq l \leq l_{m_2+1}$ and $n_1 = m_1$, $n_2 = 1, 2, \ldots, m_2$, $n_3 = 0, 1, \ldots, m_3 - n_2$,
$$f^{(d^S)}(m_1, n_2, n_3) = R A_l^{(d^S)} - B_l^{(d^S)};$$
for $l_{m_2+1} + 1 \leq l \leq L$ and $n_1 = m_1$, $n_2 = m_2$, $n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3$,
$$f(m_1, m_2, n_3) = R A_l - B_l.$$
We rewrite $\pi^{(d)}$ as
$$\begin{aligned}
\pi_0^{(d)} &= \left(\pi_0; \pi_1, \pi_2, \ldots, \pi_{l_0}\right),\\
\pi_1^{(d)} &= \left(\pi_{l_0+1}, \pi_{l_0+2}, \ldots, \pi_{l_1}\right),\\
\pi_2^{(d)} &= \left(\pi_{l_1+1}, \pi_{l_1+2}, \ldots, \pi_{l_2}\right),\\
&\;\;\vdots\\
\pi_{m_2+1}^{(d)} &= \left(\pi_{l_{m_2}+1}, \pi_{l_{m_2}+2}, \ldots, \pi_{l_{m_2+1}}\right),\\
\pi_{m_2+2}^{(d)} &= \left(\pi_{l_{m_2+1}+1}, \pi_{l_{m_2+1}+2}, \ldots, \pi_L\right).
\end{aligned}$$
Then it is easy to check that
$$D^{(d)} = \pi_0 A_0 + \sum_{l=1}^{l_0} \pi_l A_l + \sum_{l=l_0+1}^{l_1} \pi_l A_l + \sum_{l=l_1+1}^{l_{m_2+1}} \pi_l A_l^{(d^S)} + \sum_{l=l_{m_2+1}+1}^{L} \pi_l A_l > 0$$
and
$$F^{(d)} = \pi_0 B_0 + \sum_{l=1}^{l_0} \pi_l B_l + \sum_{l=l_0+1}^{l_1} \pi_l B_l^{(d^W)} + \sum_{l=l_1+1}^{l_{m_2+1}} \pi_l B_l^{(d^S)} + \sum_{l=l_{m_2+1}+1}^{L} \pi_l B_l > 0.$$
Thus, we obtain
$$\begin{aligned}
\eta^{(d)} = \pi^{(d)} f^{(d)} ={}& \sum_{n_1=0}^{m_1} \pi^{(d)}(n_1,0,0)\, f(n_1,0,0) + \sum_{n_3=1}^{m_3} \pi^{(d)}(m_1,0,n_3)\, f^{(d^W)}(m_1,0,n_3)\\
&+ \sum_{n_2=1}^{m_2} \sum_{n_3=0}^{m_3-n_2} \pi^{(d)}(m_1,n_2,n_3)\, f^{(d^S)}(m_1,n_2,n_3) + \sum_{n_3=m_3-m_2+1}^{m_3} \pi^{(d)}(m_1,m_2,n_3)\, f(m_1,m_2,n_3)\\
={}& R\, D^{(d)} - F^{(d)}.
\end{aligned}$$
It follows from (15) that for $1 \leq l \leq l_0$,
$$h_l = R\left(A_l - D^{(d)}\right) - \left(B_l - F^{(d)}\right);$$
for $l_0 + 1 \leq l \leq l_1$,
$$h_l = R\left(A_l - D^{(d)}\right) - \left(B_l^{(d^W)} - F^{(d)}\right);$$
for $l_1 + 1 \leq l \leq l_{m_2+1}$,
$$h_l = R\left(A_l^{(d^S)} - D^{(d)}\right) - \left(B_l^{(d^S)} - F^{(d)}\right);$$
for $l_{m_2+1} + 1 \leq l \leq L$,
$$h_l = R\left(A_l - D^{(d)}\right) - \left(B_l - F^{(d)}\right).$$
If a job finishes its service at a server and leaves the system immediately, then the data center obtains a fixed revenue (i.e., the service price) $R$ from the served job. We now study the influence of the service price $R$ on the perturbation realization factor. Note that all the numbers $q_{n,l}^{(r)}$ are positive and independent of the service price $R$, while all the numbers $h_l$ are linear functions of $R$. We write
$$W_n^{(r)} = \sum_{l=1}^{l_0} q_{n,l}^{(r)} \left(A_l - D^{(d)}\right) + \sum_{l=l_0+1}^{l_1} q_{n,l}^{(r)} \left(A_l - D^{(d)}\right) + \sum_{l=l_1+1}^{l_{m_2+1}} q_{n,l}^{(r)} \left(A_l^{(d^S)} - D^{(d)}\right) + \sum_{l=l_{m_2+1}+1}^{L} q_{n,l}^{(r)} \left(A_l - D^{(d)}\right) + \mu_1 q_{n,1}^{(r)}$$
and
$$V_n^{(r)} = \sum_{l=1}^{l_0} q_{n,l}^{(r)} \left(B_l - F^{(d)}\right) + \sum_{l=l_0+1}^{l_1} q_{n,l}^{(r)} \left(B_l^{(d^W)} - F^{(d)}\right) + \sum_{l=l_1+1}^{l_{m_2+1}} q_{n,l}^{(r)} \left(B_l^{(d^S)} - F^{(d)}\right) + \sum_{l=l_{m_2+1}+1}^{L} q_{n,l}^{(r)} \left(B_l - F^{(d)}\right);$$
then for $i_1, i_2 = 0, 1, \ldots, m_2$, we obtain
$$\begin{aligned}
G_1^{(d^W)} &= R\left(W_{n_3-i_1}^{(i_1+1)} - W_{n_3-i_2}^{(i_2+1)}\right) - \left(V_{n_3-i_1}^{(i_1+1)} - V_{n_3-i_2}^{(i_2+1)}\right),\\
G_2^{(d^W)} &= R\left(W_{n_3}^{(1)} - W_{n_3-i_1}^{(i_1+1)}\right) - \left(V_{n_3}^{(1)} - V_{n_3-i_1}^{(i_1+1)}\right),\\
G_3^{(d^W)} &= R\left(W_{n_3}^{(1)} - W_{n_3-i_2}^{(i_2+1)}\right) - \left(V_{n_3}^{(1)} - V_{n_3-i_2}^{(i_2+1)}\right).
\end{aligned}$$
Now, we define
$$G^{(d^W)} = m_1 \mu_1 G_1^{(d^W)} - i_1 \mu_2 \left(G_2^{(d^W)} + \beta_1\right) + i_2 \mu_2 \left(G_3^{(d^W)} + \beta_1\right),$$
where $\beta_1$ is defined as
$$\beta_1 = \frac{\left(P_{2,W} - P_{2,S}\right) C_1 + C_3^{(1)}}{\mu_2}.$$
From the discussion in Section 6, we will see that $G^{(d^W)}$ plays a fundamental role in the performance optimization of data centers, since the sign of $G^{(d^W)}$ directly determines the selection of decision actions, as shown in (38) later. To this end, we analyze how the service price impacts $G^{(d^W)}$. Substituting (21) into the linear equation $G^{(d^W)} = 0$, we obtain
$$R = \frac{\phi_{i_1} V_{n_3-i_1}^{(i_1+1)} - \phi_{i_2} V_{n_3-i_2}^{(i_2+1)} - \psi_{i_1,i_2} \left(V_{n_3}^{(1)} - \beta_1\right)}{\phi_{i_1} W_{n_3-i_1}^{(i_1+1)} - \phi_{i_2} W_{n_3-i_2}^{(i_2+1)} - \psi_{i_1,i_2} W_{n_3}^{(1)}},$$
where $\phi_{i_1} = m_1\mu_1 + i_1\mu_2$, $\phi_{i_2} = m_1\mu_1 + i_2\mu_2$, and $\psi_{i_1,i_2} = (i_1 - i_2)\mu_2$.
Thus, the unique solution of the price $R$ in (22) is given by
$$\mathcal{R}^{(d^W)}(i_1, i_2) = \frac{\phi_{i_1} V_{n_3-i_1}^{(i_1+1)} - \phi_{i_2} V_{n_3-i_2}^{(i_2+1)} - \psi_{i_1,i_2} \left(V_{n_3}^{(1)} - \beta_1\right)}{\phi_{i_1} W_{n_3-i_1}^{(i_1+1)} - \phi_{i_2} W_{n_3-i_2}^{(i_2+1)} - \psi_{i_1,i_2} W_{n_3}^{(1)}}.$$
It is easy to see from (22) that (a) if $R \geq \mathcal{R}^{(d^W)}(i_1, i_2)$, then $G^{(d^W)} \geq 0$; and (b) if $R \leq \mathcal{R}^{(d^W)}(i_1, i_2)$, then $G^{(d^W)} \leq 0$.
In the energy-efficient data center, we define two critical values related to the service price as
$$R_H^W = \max_{d \in \mathcal{D}} \left\{0, \mathcal{R}^{(d^W)}(1,0), \mathcal{R}^{(d^W)}(2,0), \ldots, \mathcal{R}^{(d^W)}(m_2, m_2-1)\right\}$$
and
$$R_L^W = \min_{d \in \mathcal{D}} \left\{\mathcal{R}^{(d^W)}(1,0), \mathcal{R}^{(d^W)}(2,0), \ldots, \mathcal{R}^{(d^W)}(m_2, m_2-1)\right\}.$$
The following proposition uses the two critical values related to the service price to provide a key condition for establishing the sensitivity-based optimization framework of the energy-efficient data center. It will also be useful in the next section for studying the monotonicity of the asynchronous energy-efficient policies.
Proposition 1.
(1) If $R \geq R_H^W$, then for any $d \in \mathcal{D}$ and for each pair $(i_1, i_2)$ with $0 \leq i_2 < i_1 \leq m_2$, we have
$$G^{(d^W)} \geq 0.$$
(2) If $0 \leq R \leq R_L^W$, then for any $d \in \mathcal{D}$ and for each pair $(i_1, i_2)$ with $0 \leq i_2 < i_1 \leq m_2$, we have
$$G^{(d^W)} \leq 0.$$
Proof of Proposition 1.
(1) For any $d \in \mathcal{D}$ and each pair $(i_1, i_2)$ with $0 \leq i_2 < i_1 \leq m_2$, since $R \geq R_H^W$ and $R_H^W = \max_{d \in \mathcal{D}} \left\{0, \mathcal{R}^{(d^W)}(1,0), \mathcal{R}^{(d^W)}(2,0), \ldots, \mathcal{R}^{(d^W)}(m_2, m_2-1)\right\}$, this gives
$$R \geq \mathcal{R}^{(d^W)}(i_1, i_2).$$
Thus, for any pair $(i_1, i_2)$ with $0 \leq i_2 < i_1 \leq m_2$, we have $G^{(d^W)} \geq 0$.
(2) For any $d \in \mathcal{D}$ and each pair $(i_1, i_2)$ with $0 \leq i_2 < i_1 \leq m_2$, if $0 \leq R \leq R_L^W$, we get
$$R \leq \mathcal{R}^{(d^W)}(i_1, i_2),$$
which gives $G^{(d^W)} \leq 0$. This completes the proof. □

5.2. The Sleep Policy

The analysis of the sleep policy is similar to that of the setup policy in the previous subsection; here, we provide only a brief discussion.
We define the perturbation realization factors for the sleep policy as follows:
$$\begin{aligned}
G_1^{(d^S)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, j_2, n_3 + n_2 - j_2\right) - g^{(d)}\left(m_1, j_1, n_3 + n_2 - j_1\right),\\
G_2^{(d^S)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, n_2, n_3\right) - g^{(d)}\left(m_1, j_1, n_3 + n_2 - j_1\right),\\
G_3^{(d^S)} &\overset{\mathrm{def}}{=} g^{(d)}\left(m_1, n_2, n_3\right) - g^{(d)}\left(m_1, j_2, n_3 + n_2 - j_2\right),
\end{aligned}$$
where $0 \leq j_2 < j_1 \leq n_2$, $n_2 = 0, 1, \ldots, m_2$ and $n_3 = 0, 1, \ldots, m_3$.
It follows from Theorem 2 that
$$\begin{aligned}
g^{(d)}\left(m_1, n_2, n_3\right) &= \sum_{l=1}^{L} q_{n_3,l}^{(n_2+1)} h_l + \mu_1 q_{n_3,1}^{(n_2+1)},\\
g^{(d)}\left(m_1, j_1, n_3 + n_2 - j_1\right) &= \sum_{l=1}^{L} q_{n_3+n_2-j_1,l}^{(j_1+1)} h_l + \mu_1 q_{n_3+n_2-j_1,1}^{(j_1+1)},\\
g^{(d)}\left(m_1, j_2, n_3 + n_2 - j_2\right) &= \sum_{l=1}^{L} q_{n_3+n_2-j_2,l}^{(j_2+1)} h_l + \mu_1 q_{n_3+n_2-j_2,1}^{(j_2+1)}.
\end{aligned}$$
Similarly, to express the perturbation realization factor in terms of the service price $R$, we write
$$G^{(d^S)} = m_1 \mu_1 G_1^{(d^S)} + j_1 \mu_2 \left(G_2^{(d^S)} + \beta_2\right) - j_2 \mu_2 \left(G_3^{(d^S)} + \beta_2\right),$$
where
$$\begin{aligned}
G_1^{(d^S)} &= R\left(W_{n_3+n_2-j_2}^{(j_2+1)} - W_{n_3+n_2-j_1}^{(j_1+1)}\right) - \left(V_{n_3+n_2-j_2}^{(j_2+1)} - V_{n_3+n_2-j_1}^{(j_1+1)}\right),\\
G_2^{(d^S)} &= R\left(W_{n_3}^{(n_2+1)} - W_{n_3+n_2-j_1}^{(j_1+1)}\right) - \left(V_{n_3}^{(n_2+1)} - V_{n_3+n_2-j_1}^{(j_1+1)}\right),\\
G_3^{(d^S)} &= R\left(W_{n_3}^{(n_2+1)} - W_{n_3+n_2-j_2}^{(j_2+1)}\right) - \left(V_{n_3}^{(n_2+1)} - V_{n_3+n_2-j_2}^{(j_2+1)}\right),
\end{aligned}$$
and
$$\beta_2 = R + \frac{\left(P_{2,W} - P_{2,S}\right) C_1 + C_3^{(2)}}{\mu_2}.$$
Now, we analyze how the service price impacts $G^{(d^S)}$. Substituting (29) into the linear equation $G^{(d^S)} = 0$, we obtain
$$R = \frac{\phi_{j_1} V_{n_3+n_2-j_1}^{(j_1+1)} - \phi_{j_2} V_{n_3+n_2-j_2}^{(j_2+1)} - \psi_{j_1,j_2} \left(V_{n_3}^{(n_2+1)} - \beta_2\right)}{\phi_{j_1} W_{n_3+n_2-j_1}^{(j_1+1)} - \phi_{j_2} W_{n_3+n_2-j_2}^{(j_2+1)} - \psi_{j_1,j_2}\, \mu_2 W_{n_3}^{(n_2+1)}}.$$
Then, the unique solution of the price $R$ in (30) is given by
$$\mathcal{R}^{(d^S)}(j_1, j_2) = \frac{\phi_{j_1} V_{n_3+n_2-j_1}^{(j_1+1)} - \phi_{j_2} V_{n_3+n_2-j_2}^{(j_2+1)} - \psi_{j_1,j_2} \left(V_{n_3}^{(n_2+1)} - \beta_2\right)}{\phi_{j_1} W_{n_3+n_2-j_1}^{(j_1+1)} - \phi_{j_2} W_{n_3+n_2-j_2}^{(j_2+1)} - \psi_{j_1,j_2}\, \mu_2 W_{n_3}^{(n_2+1)}}.$$
It is easy to see from Equation (30) that (a) if $R \geq \mathcal{R}^{(d^S)}(j_1, j_2)$, then $G^{(d^S)} \geq 0$; and (b) if $R \leq \mathcal{R}^{(d^S)}(j_1, j_2)$, then $G^{(d^S)} \leq 0$.
In the energy-efficient data center, we define two further critical values related to the service price as
$$R_H^S = \max_{d \in \mathcal{D}} \left\{0, \mathcal{R}^{(d^S)}(1,0), \mathcal{R}^{(d^S)}(2,0), \ldots, \mathcal{R}^{(d^S)}(m_2, m_2-1)\right\}$$
and
$$R_L^S = \min_{d \in \mathcal{D}} \left\{\mathcal{R}^{(d^S)}(1,0), \mathcal{R}^{(d^S)}(2,0), \ldots, \mathcal{R}^{(d^S)}(m_2, m_2-1)\right\}.$$
The following proposition is similar to Proposition 1, thus its proof is omitted here.
Proposition 2.
(1) If R R H S , then for any d   D and for each couple j 1 , j 2 with 0 j 2 < j 1 n 2 , we have
G d S 0 .
(2) If 0 R R L S , then for any d   D and for each couple j 1 , j 2 with 0 j 2 < j 1 n 2 , we have
G d S 0 .
From Propositions 1 and 2, we relate to the service price and define two new critical values as
R H = max R H W , R H S   and   R L = min R L W , R L S .
The following theorem provides a simple summarization from Propositions 1 and 2, and it will be useful for studying the monotonicity and optimality of the asynchronous dynamic policy in our later sections.
Theorem 3.
(1) If R R H , then for any asynchronous dynamic policy d   D , we have
G d W 0   a n d   G d S 0 .
(2) If 0 R R L , then for any asynchronous policy d   D , we have
G d W 0   a n d   G d S 0 .

6. Monotonicity and Optimality

In this section, we use the block-structured Poisson equation to derive a useful performance difference equation, and discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the setup and sleep policies, respectively. Based on this, we can give the optimal asynchronous dynamic policy of the energy-efficient data center.
The standard Markov model-based formulation suffers from a number of drawbacks. First and foremost, the state space is usually too large for practical problems. That is, the number of potentials to be calculated or estimated is too large for most problems. Secondly, the generally applicable Markov model does not reflect any special structure of a particular problem. Thus, it is not clear whether and how potentials can be aggregated to save computation by exploring the special structure of the system. The sensitivity point of view and the flexible construction of the sensitivity formulas provide us a new perspective to explore alternative approaches for the performance optimization of systems with some special features.
For any given asynchronous energy-efficient policy d   D , the policy-based continuous-time Markov process { X d ( t ) : t 0 } with infinitesimal generator Q d given in (2) is an irreducible, aperiodic, and positive recurrent. Therefore, by using a similar analysis to Ma et al. [31], the long-run average profit of the data center is given by
η d = π d f d ,
and the Poisson equation is written as
Q d g ( d ) = η d e f d .
For State n 1 , n 2 , n 3 , it is seen from (2) that the asynchronous energy-efficient policy d directly affects not only the elements of the infinitesimal generator Q d but also the reward function f d . That is, if the asynchronous policy d changes, then the infinitesimal generator Q d and the reward function f d will have their corresponding changes. To express such a change mathematically, we take two different asynchronous energy-efficient policies d and d , both of which correspond to their infinitesimal generators Q d and Q d , and to their reward functions f d and f d .
The following lemma provides a useful equation for the difference η d η d of the long-run average performances η d and η d for any two asynchronous policies d , d   D . Here, we only restate it without proof, while readers may refer to Ma et al. [31] for more details.
Lemma 1.
For any two asynchronous energy-efficient policies d , d   D , we have
η d η d = π d Q d Q d g ( d ) + f d f d .
Now, we describe the first role played by the performance difference, in which we set up a partial order relation in the policy set D so that the optimal asynchronous energy-efficient policy in the finite set D can be found by means of finite comparisons. Based on the performance difference η d η d for any two asynchronous energy-efficient policies d , d   D , we can set up a partial order in the policy set D as follows. We write d d if η d > η d ; d d if η d = η d ; d d if η d < η d . Furthermore, we write d d if η d η d ; d d if η d η d . By using this partial order, our research target is to find an optimal asynchronous policy d *   D such that d * d for any asynchronous energy-efficient policy d   D , or
d * = arg max d   D η d .
Note that the policy set D and the state space Ω are both finite, thus an enumeration method is feasible for finding the optimal asynchronous energy-efficient policy d * in the policy set D . Since
D = d = d W d S : d m 1 , 0 , n 3 W   0 , 1 , , m 2   for   1 n 3 m 3 ; d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2   for   m 1 , n 2 , n 3   Ω 2 Ω 3 Ω m 2 + 1 .
It is seen that the policy set D contains m 2 + 1 m 3 × 2 m 3 × 3 m 3 1 × × m 2 + 1 m 2 + 1 elements so that the enumeration method used to find the optimal policy will require a huge enumeration workload. However, our following work will be able to greatly reduce the amount of searching for the optimal asynchronous policy d * by means of the sensitivity-based optimization theory.
Now, we discuss the monotonicity of the long-run average profit η d with respect to any asynchronous policy d under the different service prices. Since the setup and sleep policies d W and d S occur asynchronously and they will not interact with each other at any time, we can, respectively, study the impact of the policies d W and d S on the long-run average profit η d . To this end, in what follows we shall discuss three different cases: R R H , 0 R R L , and R L < R < R H .

6.1. The Service Price R R H

In the case of R R H , we discuss the monotonicity and optimality with respect to two different policies: the setup policy and the sleep policy, respectively.

6.1.1. The Setup Policy with R R H W

The following theorem analyzes the right-half part of the unimodal structure (see Figure 2) of the long-run average profit η d with respect to the setup policy—either d m 1 , 0 , n 3 W   n 3 , n 3 + 1 , , m 2 if n 3 m 2 or d m 1 , 0 , n 3 W = m 2 if n 3 > m 2 .
Theorem 4.
For any setup policy d W with d W d S   D and for each n 3 = 1 , 2 , , m 3 , the long-run average profit η d W d S is linearly increasing with respect to the setup policy either d m 1 , 0 , n 3 W   n 3 , n 3 + 1 , , m 2 if n 3 m 2 or d m 1 , 0 , n 3 W = m 2 if n 3 > m 2 .
Proof of Theorem 4.
For each n 3 = 1 , 2 , , m 3 , we consider two interrelated policies d W d S , d W d S   D as follows:
d W = d m 1 , 0 , 1 W , d m 1 , 0 , 2 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W ̲ , d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 3 W , d W = d m 1 , 0 , 1 W , d m 1 , 0 , 2 W , , d m 1 , 0 , n 3 1 W , n 3 m 2 ̲ , d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 3 W ,
where d m 1 , 0 , n 3 W n 3 m 2 . It is seen that the two policies d W , d W have one difference only between their corresponding decision elements d m 1 , 0 , n 3 W and n 3 m 2 . In this case, it is seen from Theorem 1 that Q d W d S = Q d W d S and π d W d S = π d W d S . Furthermore, it is easy to check from (4) to (7) that
f d f d = 0 , 0 , , 0 , d m 1 , 0 , n 3 W n 3 m 2 P 2 , W P 2 , S C 1 + C 3 1 , ̲ 0 , , 0 T .
Thus, it follows from Lemma 1 that
η d W d S η d W d S = π d W d S Q d W d S Q d W d S g ( d W d S ) + f d W f d W = π d W d S m 1 , 0 , n 3 d m 1 , 0 , n 3 W n 3 m 2 P 2 , W P 2 , S C 1 + C 3 1
or
  η d W d S = η d W d S π d W d S m 1 , 0 , n 3 d m 1 , 0 , n 3 W n 3 m 2 P 2 , W P 2 , S C 1 + C 3 1 .
Since π d W d S = π d W d S , it is easy to see that π d W d S m 1 , 0 , n 3 = π d W d S m 1 , 0 , n 3 can be determined by d m 1 , 0 , n 3 W = n 3 m 2 . This indicates that π d W d S m 1 , 0 , n 3 is irrelevant to the decision element d m 1 , 0 , n 3 W . Furthermore, note that η d W d S is irrelevant to the decision element d m 1 , 0 , n 3 W , and P 2 , W P 2 , S , C 1 and C 3 1 are all positive constants, thus it is easy to see from (35) that the long-run average profit η d W d S is linearly decreasing with respect to each decision element d m 1 , 0 , n 3 W for d m 1 , 0 , n 3 W   n 3 + 1 m 2 , n 3 + 2 m 2 , , m 2 . It is worth noting that if m 2 n 3 m 3 , then d m 1 , 0 , n 3 W   m 2 , m 2 , , m 2 . This completes the proof. □
In what follows, we discuss the left-half part of the unimodal structure (see Figure 2) of the long-run average profit η d with respect to each decision element d m 1 , 0 , n 3 W   0 , 1 , , n 3 if n 3 < m 2 . Compared to analysis of its right-half part, our discussion for the left-half part is a little bit complicated.
Let the optimal setup policy d W * = arg max d W d S   D η d W d S be
d W * = d m 1 , 0 , 1 W * , d m 1 , 0 , 2 W * , , d m 1 , 0 , m 3 W * .
Then, it is seen from Theorem 4 that
d m 1 , 0 , n 3 W * = 0 , 1 , , n 3 , 1 n 3 m 2 , m 2 , m 2 n 3 m 3 .
Hence, Theorem 4 takes the area of finding the optimal setup policy d W * from a large set 0 , 1 , , m 2 m 3 to a greatly shrunken area 0 , 1 × 0 , 1 , 2 × × 0 , 1 , , m 2 1 × 0 , 1 , , m 2 m 3 m 2 + 1 .
To find the optimal setup policy d W * , we consider two setup policies with an interrelated structure as follows:
d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W , d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W ,
where d m 1 , 0 , n 3 W = i 1 > d m 1 , 0 , n 3 W = i 2 , and d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   1 , 2 , , n 3 m 2 . It is easy to check from (2) that
Q d W d S Q d W d S = 0 0 0 0 0 0 i 1 i 2 μ 2 m 1 μ 1 + i 2 μ 2 m 1 μ 1 + i 1 μ 2 0 0 0 0 0 0 .
On the other hand, from the reward functions given in (8), it is seen that for n 3 = 1 , 2 , , m 2 , and d m 1 , 0 , n 3 W   0 , 1 , , n 3 ,
f d W m 1 , 0 , n 3 = P 2 , W P 2 , S C 1 + C 3 1 d m 1 , 0 , n 3 W + R m 1 μ 1 m 1 P 1 , W + m 2 P 2 , S C 1 m 1 C 2 1 + n 3 C 2 3 λ 1 n 3 = m 3 C 5
and
f d W m 1 , 0 , n 3 = P 2 , W P 2 , S C 1 + C 3 1 d m 1 , 0 , n 3 W + R m 1 μ 1 m 1 P 1 , W + m 2 P 2 , S C 1 m 1 C 2 1 + n 3 C 2 3 λ 1 n 3 = m 3 C 5 .
Hence, we have
f d W f d W = 0 , 0 , , 0 , i 2 i 1 μ 2 β 1 , 0 , , 0 T .
We write that
η d d m 1 , 0 , n 3 W = i = π d d m 1 , 0 , n 3 W = i · f d d m 1 , 0 , n 3 W = i .
The following theorem discusses the left-half part (see Figure 2) of the unimodal structure of the long-run average profit η d with respect to each decision element d m 1 , 0 , n 3 W   0 , 1 , , m 2 .
Theorem 5.
If R R H W , then for any setup policy d W with d W d S   D and for each n 3 = 1 , 2 , , m 2 , the long-run average profit η d W d S is strictly monotone and increasing with respect to each decision element d m 1 , 0 , n 3 W for d m 1 , 0 , n 3 W   0 , 1 , , n 3 .
Proof of Theorem 5.
For each n 3 = 1 , 2 , , m 2 , we consider two setup policies with an interrelated structure as follows:
d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W , d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W ,
where d m 1 , 0 , n 3 W = i 1 > d m 1 , 0 , n 3 W = i 2 , and d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   0 , 1 , , n 3 . Applying Lemma 1, it follows from (36)and (37) that
η d W d S η d W d S = π d W d S Q d W d S Q d W d S g ( d ) + f d W f d W = π d W d S m 1 , 0 , n 3 i 1 i 2 μ 2 g d m 1 , 0 , n 3 m 1 μ 1 + i 2 μ 2 g d m 1 , i 2 , n 3 i 2 + m 1 μ 1 + i 1 μ 2 g d m 1 , i 1 , n 3 i 1 i 1 i 2 β 1 = π d W d S m 1 , 0 , n 3 m 1 μ 1 G 1 d W i 1 μ 2 G 2 d W + β 1 + i 2 μ 2 G 3 d W + β 1 = π d W d S m 1 , 0 , n 3 G d W .
If R R H W , then it is seen from Proposition 1 that G d W 0 . Thus, we get that for the two policies d W d S , d W d S   D with d m 1 , 0 , n 3 W > d m 1 , 0 , n 3 W and d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   0 , 1 , , n 3 ,
η d W d S > η d W d S .
This shows that
η d d m 1 , 0 , n 3 W = 1 < η d d m 1 , 0 , n 3 W = 2 < < η d d m 1 , 0 , n 3 W = n 3 1 < η d d m 1 , 0 , n 3 W = n 3 .
This completes the proof. □
When R R H W , now we use Figure 2 to provide an intuitive summary for the main results given in Theorems 4 and 5. In the right-half part of Figure 2,
η d W d S = η d W d S π d W d S m 1 , 0 , n 3 d m 1 , 0 , n 3 W n 3 P 2 , W P 2 , S C 1 + C 3 1
shows that η d W d S is a linear function of the decision element d m 1 , 0 , n 3 W . By contrast, in the right-half part of Figure 2, we need to first introduce a restrictive condition: R R H W , under which
η d W d S η d W d S = π d W d S m 1 , 0 , n 3 G d W .
Since G d W also depends on the decision element d m 1 , 0 , n 3 W , it is clear that η d W d S is a nonlinear function of the decision element d m 1 , 0 , n 3 W .

6.1.2. The Sleep Policy with R R H S

It is different from the setup policy in that, for the sleep policy, each decision element is d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 . Hence, we just consider the structural properties of the long-run average profit η d W d S with respect to each decision element d m 1 , n 2 , n 3 S .
We write the optimal sleep policy as d S * = arg max d W d S   D η d W d S , where
d S * = d m 1 , 1 , 0 S * , d m 1 , 1 , 1 S * , , d m 1 , 1 , m 3 1 S * ; d m 1 , 2 , 0 S * , , d m 1 , 2 , m 3 2 S * ; ; d m 1 , m 2 , 0 S * , , d m 1 , m 2 , m 3 m 2 S * .
Then, it is seen that
d m 1 , 1 , 0 S *   m 2 1 , m 2 , , d m 1 , 1 , m 3 1 S *   m 2 1 , m 2 ; d m 1 , 2 , 0 S *   m 2 2 , m 2 1 , m 2 , , d m 1 , 2 , m 3 2 S *   m 2 2 , m 2 1 , m 2 ; d m 1 , m 2 , 0 S *   0 , 1 , , m 2 , , d m 1 , m 2 , m 3 m 2 S *   0 , 1 , , m 2 .
It is easy to see that the area of finding the optimal sleep policy d S * is m 2 1 , m 2 m 3 × m 2 2 , m 2 1 , m 2 m 3 1 × × 0 , 1 , , m 2 m 3 m 2 .
To find the optimal sleep policy d S * , we consider two sleep policies with an interrelated structure as follows:
d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S , d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S ,
where d m 1 , n 2 , n 3 S = m 2 j 2 > d m 1 , n 2 , n 3 S = m 2 j 1 , 0 j 2 < j 1 n 2 , it is easy to check from (2) that
Q d W d S Q d W d S = 0 0 0 0 0 0 m 1 μ 1 + j 2 μ 2 m 1 μ 1 + j 1 μ 2 j 2 j 1 μ 2 0 0 0 0 0 0 .
On the other hand, from the reward functions given in (6), d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S is in either m 2 n 2 , m 2 n 2 + 1 , , m 2 for 1 n 2 m 2 and 0 n 3 m 3 n 2 or 0 , 1 , , m 2 for n 2 = m 2 and 0 n 3 m 3 m 2 , we have
f d S m 1 , n 2 , n 3 = R μ 2 P 2 , W P 2 , S C 1 + C 3 2 d m 1 , n 2 , n 3 S + R m 1 μ 1 + m 2 μ 2 m 1 P 1 , W + m 2 P 2 , W C 1 m 1 C 2 1 + n 2 C 2 2 + n 3 C 2 3 n 2 m 2 C 3 2 m 1 μ 1 1 n 3 = 0 C 4
and
f d S m 1 , n 2 , n 3 = R μ 2 P 2 , W P 2 , S C 1 + C 3 2 d m 1 , n 2 , n 3 S + R m 1 μ 1 + m 2 μ 2 m 1 P 1 , W + m 2 P 2 , W C 1 m 1 C 2 1 + n 2 C 2 2 + n 3 C 2 3 n 2 m 2 C 3 2 m 1 μ 1 1 n 3 = 0 C 4 .
Hence, we have
f d S f d S = 0 , 0 , , 0 , j 2 j 1 μ 2 β 2 , 0 , , 0 T .
We write
η d d m 1 , n 2 , n 3 S = m 2 j = π d d m 1 , n 2 , n 3 S = m 2 j · f d d m 1 , n 2 , n 3 S = m 2 j .
The following theorem discusses the structure of the long-run average profit η d W d S with respect to each decision element d m 1 , n 2 , n 3 S .
Theorem 6.
If R R H S , then for any sleep policy d S with d W d S   D and for each n 2 = 1 , 2 , , m 2 , n 3 = 0 , 1 , , m 3 n 2 , the long-run average profit η d is strictly monotone decreasing with respect to each decision element d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 .
Proof of Theorem 6.
For each n 2 = 1 , 2 , , m 2 and n 3 = 1 , 2 , , m 3 n 2 , we consider two sleep policies with an interrelated structure as follows:
d S = d m 1 , 1 , 0 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S , d S = d m 1 , 1 , 0 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S ,
where d m 1 , n 2 , n 3 S = m 2 j 2 > d m 1 , n 2 , n 3 S = m 2 j 1 , 0 j 2 < j 1 n 2 and d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 . Applying Lemma 1, it follows from (39)and (40) that
η d W d S η d W d S = π d W d S Q d W d S Q d W d S g d + f d S f d S = π d W d S m 1 , n 2 , n 3 m 1 μ 1 + j 2 μ 2 g d m 1 , j 2 , n 3 + n 2 j 2 m 1 μ 1 + j 1 μ 2 g d m 1 , j 1 , n 3 + n 2 j 1 j 2 j 1 μ 2 g d m 1 , n 2 , n 3 j 2 j 1 μ 2 β 2 = π d W d S m 1 , n 2 , n 3 m 1 μ 1 G 1 d S + j 1 μ 2 G 2 d S + β 2 j 2 μ 2 G 3 d S + β 2 = π d W d S m 1 , n 2 , n 3 G d S .
It is worthwhile to note that (41) has the same form as (38), since the perturbation of the sleep policy, j 1 , and j 2 denote the number of the working servers. If R R H S , then it is seen from Proposition 1 that for j 2 < j 1 , we have G d S 0 . Thus, we get that for j 2 < j 1 and j 1 , j 2   0 , 1 , , n 2 ,
η d W d S > η d W d S ,
this shows that η d W d S is strictly monotone increasing with respect to m 2 d m 1 , n 2 , n 3 S . Thus, we get that for the two policies d W d S , d W d S   D with d m 1 , n 2 , n 3 S > d m 1 , n 2 , n 3 S and d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 ,
η d W d S < η d W d S .
It is easy to see that
η d d m 1 , n 2 , n 3 S = m 2 n 2 > η d d m 1 , n 2 , n 3 S = m 2 n 2 + 1 > > η d d m 1 , n 2 , n 3 S = m 2 .
This completes the proof. □
When R R H S , now we use Figure 3 to provide an intuitive summary for the main results given in Theorems 6. According to (41), G d S depends on d m 1 , n 2 , n 3 S , and it is clear that η d W d S is a nonlinear function of d m 1 , n 2 , n 3 S .
As a simple summarization of Theorems 5 and 6, we obtain the monotone structure of the long-run average profit η d with respect to the asynchronous energy-efficient policy, while its proof is easy only on the condition that R R H makes R R H W and R R H S .
Theorem 7.
If R R H , then for any policy d   D , the long-run average profit η d is strictly monotone with respect to each decision element of d W and of d S , respectively.

6.2. The Service Price 0 R R L

A similar analysis to that of the case R R H , we simply discuss the service price 0 R R L for the monotonicity and optimality for two different policies: the setup policy and the sleep policy.

6.2.1. The Setup Policy with 0 R R L W

Theorem 8.
If 0 R R L W , for any setup policy d W with d W d S   D and for each n 3 = 1 , 2 , , m 2 , then the long-run average profit η d W d S is strictly monotone decreasing with respect to each decision element d m 1 , 0 , n 3 W   0 , 1 , , n 3 .
Proof of Theorem 8.
This proof is similar to that of Theorem 5. For each n 3 = 1 , 2 , , m 2 , we consider two setup policies with an interrelated structure as follows:
d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W , d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W ,
where d m 1 , 0 , n 3 W = i 1 > d m 1 , 0 , n 3 W = i 2 , and d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   0 , 1 , , n 3 for 1 n 3 m 2 . It is clear that
η d W d S η d W d S = π d W d S m 1 , 0 , n 3 G d W .
If 0 R R L W , then it is seen from Proposition 1 that G d W 0 . Thus, we get that for the two setup policies with d m 1 , 0 , n 3 W < d m 1 , 0 , n 3 W and d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   0 , 1 , , n 3 ,
η d W d S < η d W d S .
This shows that for 1 n 3 m 2 ,
η d d m 1 , 0 , n 3 W = 1 > η d d m 1 , 0 , n 3 W = 2 > > η d d m 1 , 0 , n 3 W = n 3 1 > η d d m 1 , 0 , n 3 W = n 3 .
This completes the proof. □
When 0 R R L W , we also use Figure 4 to provide an intuitive summary for the main results given in Theorems 4 and 8.

6.2.2. The Sleep Policy with 0 R R L S

Theorem 9.
If 0 R R L S , then for any sleep policy d S with d W d S   D and for each n 2 = 1 , 2 , , m 2 , n 3 = 0 , 1 , , m 3 n 2 , the long-run average profit η d W d S is strictly monotone increasing with respect to each decision element d m 1 , n 2 , n 3 S , for d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 .
Proof of Theorem 9.
This proof is similar to that of Theorem 6. For each n 2 = 1 , 2 , , m 2 and n 3 = 0 , 1 , , m 3 n 2 , we consider two sleep policies with an interrelated structure as follows:
d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S , d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S ,
where d m 1 , n 2 , n 3 S = m 2 j 2 > d m 1 , n 2 , n 3 S = m 2 j 1 , 0 j 2 < j 1 n 2 and d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 . It is clear that
η d W d S η d W d S = π d W d S m 1 , n 2 , n 3 G d S .
By a similar analysis to that in Theorem 6, if 0 R R L S , then it is seen from Proposition 1 that for j 2 < j 1 , we have G d S 0 . Thus, we get that for j 2 < j 1 and j 1 , j 2   0 , 1 , , n 2 , η d W d S is strictly monotone decreasing with respect to m 2 d m 1 , n 2 , n 3 S , hence it is also strictly monotone increasing with respect to d m 1 , n 2 , n 3 S . Thus, we get that for the two sleep policies with d m 1 , n 2 , n 3 S < d m 1 , n 2 , n 3 S and d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 ,
η d W d S < η d W d S .
This shows that
η d d m 1 , n 2 , n 3 S = m 2 n 2 < η d d m 1 , n 2 , n 3 S = m 2 n 2 + 1 < < η d d m 1 , n 2 , n 3 S = m 2 .
This completes the proof. □
When 0 R R L S , we also use Figure 5 to provide an intuitive summary for the main results given in Theorem 9.
As a simple summarization of Theorems 8 and 9, the following theorem further describes monotone structure of the long-run average profit η d with respect to the asynchronous energy-efficient policy, while its proof is easy only through using the condition that 0 R R L makes 0 R R L W and 0 R R L S .
Theorem 10.
If 0 R R L , then for any asynchronous energy-efficient policy d   D , the long-run average profit η d is strictly monotone with respect to each decision element of d W and of d S , respectively.
In the remainder of this section, we discuss a more complicated case with the service price R L < R < R H . In this case, we use the bang–bang control and the asynchronous structure of d   D to prove that the optimal asynchronous energy-efficient polices d W * and   d S * both have bang–bang control forms.

6.3. The Service Price R L < R < R H

For the price R L < R < R H , we can further derive the following theorems about the monotonicity of η d with respect to the setup policy and the sleep policy, respectively.

6.3.1. The Setup Policy with R L W < R < R H W

For the service price R L W < R < R H W , the following theorem provides the monotonicity of η d with respect to the decision element d m 1 , 0 , n 3 W .
Theorem 11.
If R L W < R < R H W , then the long-run average profit η d W d S is monotone (either increasing or decreasing) with respect to the decision element d m 1 , 0 , n 3 W , where n 3 = 1 , 2 , , m 3 and d m 1 , 0 , n 3 W   0 , 1 , , n 3 .
Proof of Theorem 11.
Similarly to the first part of the proof for Theorem 5, we consider any two setup policies with an interrelated structure as follows:
d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W , d W = d m 1 , 0 , 1 W , , d m 1 , 0 , n 3 1 W , d m 1 , 0 , n 3 W , ̲ d m 1 , 0 , n 3 + 1 W , , d m 1 , 0 , m 2 W , , d m 1 , 0 , m 3 W ,
where d m 1 , 0 , n 3 W , d m 1 , 0 , n 3 W   0 , 1 , , n 3 . Applying Lemma 1, we obtain
η d W d S η d W d S = π d W d S m 1 , 0 , n 3 G d W .
On the other hand, we can similarly obtain the following difference equation
η d W d S η d W d S = π d W d S m 1 , 0 , n 3 G d W .
By summing (42) and (43), we have
π d W d S m 1 , 0 , n 3 G d W π d W d S m 1 , 0 , n 3 G d W = 0 .
Therefore, we have the sign conservation equation
G d W G d W = π d W d S m 1 , 0 , n 3 π d W d S m 1 , 0 , n 3 > 0 .
The above equation means that the sign of G d W and G d W are always identical when a particular decision element d m 1 , 0 , n 3 W is changed to any d m 1 , 0 , n 3 W . With the sign conservation Equation (44) and the performance difference Equation (43), we can directly derive that the long-run average profit η d W d S is monotone with respect to d m 1 , 0 , n 3 W . This completes the proof. □
Based on Theorem 11, the following corollary directly derives that the optimal decision element d m 1 , 0 , n 3 W * has a bang–bang control form (see more details in Cao [34] and Xia and Chen [35]).
Corollary 1.
For the setup policy, the optimal decision element d m 1 , 0 , n 3 W * is either 0 or n 3 , i.e., the bang–bang control is optimal.
With Corollary 1, we should either keep all servers in sleep mode or turn on the servers such that the number of working servers equals the number of waiting jobs in the buffer. In addition, we can see that the search space of d m 1 , 0 , n 3 W can be reduced from { 0 , 1 , , n 3 } to a two-element set { 0 , n 3 } , which is a significant reduction of search complexity.

6.3.2. The Sleep Policy with R L S < R < R H S

For the service price R L S < R < R H S , the following theorem provides the monotonicity of η d with respect to the decision element d m 1 , n 2 , n 3 S .
Theorem 12.
If R L S < R < R H S , then the long-run average profit η d W d S is monotone (either increasing or decreasing) with respect to the decision element d m 1 , n 2 , n 3 S , where n 2 = 1 , 2 , , m 2 , n 3 = 0 , 1 , , m 3 n 2 and d m 1 , n 2 , n 3 S   m 2 n 2 , m 2 n 2 + 1 , , m 2 .
Proof of Theorem 12.
Similar to the proof for Theorem 11, we consider any two sleep policies with an interrelated structure as follows:
d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S , d S = d m 1 , 1 , 0 S , d m 1 , 1 , 1 S , , d m 1 , n 2 , n 3 1 S , d m 1 , n 2 , n 3 S ̲ , d m 1 , n 2 , n 3 + 1 S , , d m 1 , m 2 , m 3 m 2 S ,
where d m 1 , n 2 , n 3 S , d m 1 , n 2 , n 3 S   0 , 1 , , m 2 . Applying Lemma 1, we obtain
η d W d S η d W d S = π d W d S m 1 , n 2 , n 3 G d S .
On the other hand, we can also obtain the following difference equation:
η d W d S η d W d S = π d W d S m 1 , n 2 , n 3 G d S .
Thus, the sign conservation equation is given by
G d S G d S = π d W d S m 1 , n 2 , n 3 π d W d S m 1 , n 2 , n 3 > 0 .
This means that the signs of G d S and G d S are always identical when a particular decision element d m 1 , n 2 , n 3 S is changed to any d m 1 , n 2 , n 3 S . We can directly derive that the long-run average profit η d W d S is monotone with respect to d m 1 , n 2 , n 3 S . This completes the proof. □
Corollary 2.
For the sleep policy, the optimal decision element d m 1 , n 2 , n 3 S * is either m 2 n 2 or m 2 , i.e., the bang–bang control is optimal.
With Corollary 2, we should either keep all in sleep or turn off the servers such that the number of sleeping servers equals the number of servers without jobs in Group 2. We can see that the search space of d m 1 , n 2 , n 3 S can be reduced from { m 2 n 2 , m 2 n 2 + 1 , , m 2 } to a two-element set { m 2 n 2 , m 2 } , hence this is also a significant reduction of search complexity.
It is seen from Corollaries 1 and 2 that the form of the bang–bang control is very simple and easy to adopt in practice, while the optimality of the bang–bang control guarantees the performance confidence of such simple forms of control. This makes up the threshold-type of the optimal asynchronous energy-efficient policy in the data center.

7. The Maximal Long-Run Average Profit

In this section, we provide the optimal asynchronous dynamic policy d * of the threshold-type in the energy-efficient data center and further compute the maximal long-run average profit.
We introduce some notation as follows:
c 0 = P 2 , W P 2 , S C 1 , c 1 = m 1 P 1 , W + m 2 P 2 , W C 1 , c 2 = m 1 P 1 , W + m 2 P 2 , W C 1 + m 1 C 2 1 , c 3 = m 1 P 1 , W + m 2 P 2 , W C 1 + m 1 C 2 1 + m 2 C 3 1 , c 4 = m 1 P 1 , W + m 2 P 2 , W C 1 + m 1 C 2 1 + m 1 μ 1 1 n 2 > 0 , n 3 = 0 C 4 , c 5 = m 1 P 1 , W + m 2 P 2 , W C 1 + m 1 C 2 1 + m 2 C 2 2 + m 1 μ 1 1 n 2 > 0 , n 3 = 0 C 4 + λ 1 n 2 + n 3 = m 3 C 5 .
Now, we express the optimal asynchronous energy-efficient policy d * of the threshold-type and compute the maximal long-run average profit η d * under three different service prices as follows:
Case 1.
The service price R R H
It follows from Theorem 7 that
d W * = 1 , 2 , , n 3 , , m 2 , , m 2 , d S * = m 2 1 , , m 2 1 ; ; m 2 n 2 , , m 2 n 2 ; ; 1 , , 1 ; 0 , , 0 ,
thus we have
η d * = n 1 = 0 m 1 π d n 1 , 0 , 0 R μ 1 C 2 1 n 1 c 1 + n 3 = 1 m 3 π d m 1 , 0 , n 3 R m 1 μ 1 c 2 c 0 + C 3 1 n 3 m 2 C 2 3 n 3 + n 3 = 0 m 3 n 2 n 2 = 0 m 2 π d m 1 , n 2 , n 3 R m 1 μ 1 c 4 + R μ 2 c 0 C 2 2 n 2 C 2 3 n 3 + n 3 = m 3 m 2 + 1 m 3 π d m 1 , m 2 , n 3 R m 1 μ 1 + m 2 μ 2 c 5 C 2 3 n 3 .
Case 2.
The service price 0 R R L
It follows from Theorem 10 that
d W * = 0 , 0 , , 0 , d S * = m 2 , m 2 , , m 2 ,
thus we have
η d * = n 1 = 0 m 1 π d n 1 , 0 , 0 R μ 1 C 2 1 n 1 c 1 + n 3 = 1 m 3 π d m 1 , 0 , n 3 R m 1 μ 1 c 2 C 2 3 n 3 + n 3 = 0 m 3 n 2 n 2 = 0 m 2 π d m 1 , n 2 , n 3 R m 1 μ 1 c 4 C 2 2 + C 3 2 n 2 C 2 3 n 3 + n 3 = m 3 m 2 + 1 m 3 π d m 1 , m 2 , n 3 R m 1 μ 1 + m 2 μ 2 c 5 C 2 3 n 3 .
Remark 5.
The above results are intuitive because when the service price is suitably high, the number of working servers is equal to a crucial number related to waiting jobs both in Group 2 and in the buffer; when the service price is lower, each server at the work state must pay a high energy consumption cost, but they receive only a low revenue. In this case, the profit of the data center cannot increase, so that all the servers in Group 2 would like to be closed at the sleep state.
Case 3.
The service price R L < R < R H
In Section 6.3, we have, respectively, proven the optimality of the bang–bang control for the setup and sleep policies, regardless of the service price R. However, if R L < R < R H , we cannot exactly determine the monotone form (i.e., increasing or decreasing) of the optimal asynchronous energy-efficient policy. This makes the threshold-type of the optimal asynchronous energy-efficient policy in the data center. In fact, such a threshold-type policy also provides us with a choice to compute the optimal setup and sleep policies, they not only have a very simple form but are also widely adopted in numerical applications.
In what follows, we focus our study on the threshold-type asynchronous policy, although its optimality is not yet proven in our next analysis.
We define a couple of threshold-type control parameters as follows:
θ 1 , θ 2 : θ 1 , θ 2 = 0 , 1 , , m 2 ,
where θ 1 and θ 2 are setup and sleep thresholds, respectively. Furthermore, we introduce two interesting subsets of the policy set D . We write d θ 1 W as a threshold-type setup policy and d θ 2 S as a threshold-type sleep policy. Let
d θ 1 W = def 0 , 0 , , 0 , θ 1 1   zeros θ 1 , θ 1 + 1 , , m 2 , , m 2 , d θ 2 S = def 0 , 0 , , 0 ; ; m 2 θ 2 + 1 , , m 2 θ 2 + 1 ; m 2 θ 2 , , m 2 θ 2 ; m 2 , , m 2 m 3 1 2 θ 2 θ 2 1   m 2 s .
Then
D = d θ 1 , θ 2 : d θ 1 , θ 2 = d θ 1 W d θ 2 S , θ 1 , θ 2 = 0 , 1 , , m 2 .
It is easy to see that D D .
For an asynchronous energy-efficient policy d θ 1 , θ 2 , it follows from (4) to (7) that for n 1 = 0 , 1 , , m 1 and n 2 = n 3 = 0 ,
f n 1 , 0 , 0 = R n 1 μ 1 m 1 P 1 , W + m 2 P 2 , S C 1 n 1 C 2 1 ;
for n 1 = m 1 , n 2 = 0 and n 3 = 1 , 2 , , θ 1 1 ,
f d θ 1 W m 1 , 0 , n 3 = R m 1 μ 1 m 1 P 1 , W + m 2 P 2 , S C 1 m 1 C 2 1 + n 3 C 2 3 ;
for n 1 = m 1 , n 2 = 0 and n 3 = θ 1 , θ 1 + 1 , , m 3 ,
f d θ 1 W m 1 , 0 , n 3 = R m 1 μ 1 m 1 P 1 , W + n 3 m 2 P 2 , W + m 2 n 3 m 2 P 2 , S C 1 m 1 C 2 1 + n 3 C 2 3 n 3 m 2 C 3 1 ;
for n 1 = m 1 , n 2 = 1 , 2 , , θ 2 1 and n 3 = 0 , 1 , , m 3 n 2 ,
f d θ 2 S m 1 , n 2 , n 3   = R m 1 μ 1 m 1 P 1 , W + m 2 P 2 , S C 1 m 1 C 2 1 + n 2 C 2 2 + n 3 C 2 3 n 2 C 3 2 m 1 μ 1 1 n 3 = 0 C 4 ;
for n 1 = m 1 , n 2 = θ 2 , θ 2 + 1 , , m 2 and n 3 = 0 , 1 , , m 3 n 2 ,
f d θ 2 S m 1 , n 2 , n 3   = R m 1 μ 1 + n 2 μ 2 m 1 P 1 , W + m 2 n 2 P 2 , S + n 2 P 2 , W C 1 m 1 C 2 1 + n 2 C 2 2 + n 3 C 2 3 m 1 μ 1 1 n 3 = 0 C 4 ;
for n 1 = m 1 , n 2 = m 2 and n 3 = m 3 m 2 , m 3 m 2 + 1 , , m 3 ,
f m 1 , m 2 , n 3 = R m 1 μ 1 + m 2 μ 2 m 1 P 1 , W + m 2 P 2 , W C 1 m 1 C 2 1 + m 2 C 2 2 + n 3 C 2 3 λ 1 n 3 = m 3 C 5 .
Note that
η d θ 1 , θ 2 = n 1 = 0 m 1 π d θ 1 , θ 2 n 1 , 0 , 0 f n 1 , 0 , 0 + n 3 = 1 m 3 π d θ 1 , θ 2 m 1 , 0 , n 3 f d θ 1 W m 1 , 0 , n 3 + n 3 = 0 m 3 n 2 n 2 = 0 m 2 π d θ 1 , θ 2 m 1 , n 2 , n 3 f d θ 2 S m 1 , n 2 , n 3 + n 3 = m 3 m 2 + 1 m 3 π d θ 1 , θ 2 m 1 , m 2 , n 3 f m 1 , m 2 , n 3 .
It follows from (48) that the long-run average profit under policy d θ 1 , θ 2 is given by
η d θ 1 , θ 2 = i = 0 n π d θ 1 , θ 2 n 1 , 0 , 0 R μ 1 C 2 1 n 1 c 1 + n 3 = 1 θ 1 1 π d θ 1 W m 1 , 0 , n 3 R m 1 μ 1 c 2 C 2 3 n 3 + n 3 = θ 1 m 3 π d θ 1 W m 1 , 0 , n 3 R m 1 μ 1 c 2 c 0 + C 3 1 n 3 m 2 n 3 C 2 3 + n 3 = 0 m 3 n 2 n 2 = 1 θ 2 π d θ 2 S m 1 , n 2 , n 3 R m 1 μ 1 c 4 + R μ 2 c 0 C 2 2 n 2 C 2 3 n 3 + n 3 = 0 m 3 n 2 n 2 = θ 2 + 1 m 2 π d θ 2 S m 1 , n 2 , n 3 R m 1 μ 1 c 4 C 2 2 + C 3 2 n 2 C 2 3 n 3 + n 3 = m 3 m 2 + 1 m 3 π d θ 1 , θ 2 m 1 , m 2 , n 3 R m 1 μ 1 + m 2 μ 2 c 5 C 2 3 n 3 .
Let
θ 1 * , θ 2 * = arg max θ 1 , θ 2   0 , 1 , , m 2 η d θ 1 , θ 2 .
Then, we call d θ 1 * , θ 2 * the optimal threshold-type asynchronous energy-efficient policy in the policy set D . Since D D , the partially ordered set D shows that D is also a partially ordered set. Based on this, it is easy to see from the two partially ordered sets D and D Δ that
η d θ 1 * , θ 2 * η d * .
For the energy-efficient data center, if η d θ 1 * , θ 2 * = η d * , then we call d θ 1 * , θ 2 * the optimal threshold-type asynchronous energy-efficient policy in the original policy set D ; if η d θ 1 * , θ 2 * < η d * , then we call d θ 1 * , θ 2 * the suboptimal threshold-type asynchronous energy-efficient policy in the original policy set D .
Remark 6.
This paper is a special case of the group-server queue (see Li et al. [6]), but it provides a new theoretical framework for the performance optimization of such queueing systems. It is also more applicable to large-scale service systems, such as data centers for efficiently allocating service resources.
Remark 7.
In this paper, we discuss the optimal asynchronous dynamic policy of the energy-efficient data center deeply, and such types of policies are widespread in practice. It would be interesting to extend our results to a more general situation. Although the sensitivity-based optimization theory can effectively overcome the drawbacks of MDPs, it still has some limitations. For example, it cannot discuss the optimization of two or more dynamic control policies synchronously, which is a very important research direction in dynamic optimization.

8. Conclusions

In this paper, we highlight the optimal asynchronous dynamic policy of an energy-efficient data center by applying sensitivity-based optimization theory and RG-factorization. Such an asynchronous policy is more important and necessary in the study of energy-efficient data centers, and it largely makes an optimal analysis of energy-efficient management more interesting, difficult, and challenging. To this end, we consider a more practical model with several basic factors, for example, a finite buffer, a fast setup process from sleep to work, and the necessary cost of transferring jobs from Group 2 either to Group 1 or to the buffer. To find the optimal asynchronous dynamic policy in the energy-efficient data center, we set up a policy-based block-structured Poisson equation and provide an expression for its solution by means of the RG-factorization. Based on this, we derive the monotonicity and optimality of the long-run average profit with respect to the asynchronous dynamic policy under different service prices. We prove the optimality of the bang–bang control, which significantly reduces the action search space, and study the optimal threshold-type asynchronous dynamic policy. Therefore, the results of this paper provide new insights to the discussion of the optimal dynamic control policies of more general energy-efficient data centers.
Along such a line, there are a number of interesting directions for potential future research, for example:
  • Analyzing non-Poisson inputs such as Markovian arrival processes (MAPs) and/or non-exponential service times, e.g., the PH distributions;
  • Developing effective algorithms for finding the optimal dynamic policies of the policy-based block-structured Markov process (i.e., block-structured MDPs);
  • Discussing the fact that the long-run performance is influenced by the concave or convex reward (or cost) function;
  • Studying individual optimization for the energy-efficient management of data centers from the perspective of game theory.

Author Contributions

Conceptualization, J.-Y.M. and Q.-L.L.; methodology, L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under grant No. 71932002, by the Beijing Social Science Foundation Research Base Project under grant No. 19JDGLA004, and in part by the National Natural Science Foundation of China under grant No. 61573206 and No. 11931018.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors are grateful to the editor and anonymous referees for their constructive comments and suggestions, which sufficiently help the authors to improve the presentation of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Special Cases

In this appendix, we provide two simple special cases to understand the state transition relations and the infinitesimal generator of the policy-based block-structured continuous-time Markov process X d t : t 0 .
Case A1.
m 1 = m 2 = 2 and m 3 = 3
To understand the block-structured continuous-time Markov process under an asynchronous energy-efficient policy, we first take the simplest case of m 1 = m 2 = 2 and m 3 = 3 as an example to illustrate the state transition relations and infinitesimal generator in detail.
Figure A1 shows the arrival and service rates without any policy for the Markov process X d t : t 0 . The transitions in which the transfer rates and the policies are simultaneously involved are a bit difficult, and such transition relations appear in a Markov chain with sync jumps of multi-events (either transition rates or policies (see Budhiraja and Friedlander [40])). For example, what begins a setup policy at State 2 , 0 , 1 is the entering Poisson process to State 2 , 0 , 1 , whose inter-entering times are i.i.d. and exponential with the entering rate λ + 2 μ 1 + μ 2 . Hence, if there exists one or two server set ups, then the transition rate from State 2 , 0 , 1 to State 2 , 1 , 0 is 1 d 2 , 0 , 1 W 1 λ + 2 μ 1 + μ 2 . Such an entering process is easy to see in Figure A1.
Figure A1. State transition relations for the case of m 1 = m 2 = 2 and m 3 = 3 .
Figure A1. State transition relations for the case of m 1 = m 2 = 2 and m 3 = 3 .
Systems 10 00027 g0a1
We denote a k 1 , k 2 i as the transitions with the setup policy
a 1 , 1 1 = 1 d 2 , 0 , 1 W 1 λ + 2 μ 1 + μ 2 , a 1 , 2 2 = 1 d 2 , 0 , 2 W = 1 λ + 2 μ 1 + μ 2 , a 2 , 2 2 = 1 d 2 , 0 , 2 W = 2 λ + 2 μ 1 + μ 2 , a 1 , 3 3 = 1 d 2 , 0 , 3 W = 1 λ ,   a 2 , 3 3 = 1 d 2 , 0 , 3 W = 2 λ .
Similarly, a k 3 , k 4 , k 5 i denotes the transitions with the sleep policy from
a 1 , 0 , 0 0 = 1 d 2 , 1 , 0 S = 2 2 μ 1 + 2 μ 2 , a 1 , 1 , 0 1 = 1 d 2 , 1 , 1 S = 2 λ + 2 μ 1 + 2 μ 2 , a 1 , 2 , 0 2 = 1 d 2 , 1 , 2 S = 2 λ , a 2 , 0 , 0 3 = 1 d 2 , 2 , 0 S = 2 2 μ 1 , a 2 , 0 , 1 3 = 1 d 2 , 2 , 0 S = 1 2 μ 1 , a 2 , 1 , 0 5 = 1 d 2 , 2 , 1 S = 2 λ + 2 μ 1 + 2 μ 2 , a 2 , 1 , 1 5 = 1 d 2 , 2 , 1 S = 1 λ + 2 μ 1 + 2 μ 2 .
Furthermore, we write the diagonal entries
b 1 1 = λ + 2 μ 1 + a 1 , 1 1 , b 2 2 = λ + 2 μ 1 + a 1 , 2 2 + a 2 , 2 2 ,   b 3 3 = 2 μ 1 + a 1 , 3 3 + a 2 , 3 3 ; b 1 , 0 0 = λ + μ 2 + a 1 , 0 , 0 0 ,   b 1 , 1 1 = λ + 2 μ 1 + μ 2 + a 1 , 1 , 0 1 ,   b 1 , 2 2 = 2 μ 1 + μ 2 + a 1 , 2 , 0 2 .
Therefore, its infinitesimal generator is given by
Q d = Q 0 , 0 Q 0 , 1 Q 1 , 0 Q 1 , 1 Q 1 , 2 Q 1 , 3 Q 2 , 0 Q 2 , 1 Q 2 , 2 Q 3 , 1 Q 3 , 2 Q 3 , 3 Q 3 , 4 Q 4 , 3 Q 4 , 4 ,
where for level 0, it is easy to see that
Q 0 , 0 = λ λ μ 1 λ + μ 1 λ 2 μ 1 λ + μ 1   a n d   Q 0 , 1 = 0 0 λ ;
for level 1, the setup policy affects the infinitesimal generator
Q 1 , 0 = 2 μ 1 0 0 ,   Q 1 , 1 = b 1 1 λ 2 μ 1 b 2 2 λ 2 μ 1 b 3 3 ,
Q 1 , 2 = a 1 , 1 1 a 1 , 2 2 a 1 , 3 3   a n d   Q 1 , 3 = a 2 , 2 2 a 2 , 3 3 ;
for level 2, the sleep policy affects the infinitesimal generator
Q 2 , 0 = μ 2 0 0 ,   Q 2 , 1 = a 1 , 0 , 0 1 μ 2 a 1 , 1 , 0 1 μ 2 a 1 , 2 , 0 1   a n d   Q 2 , 2 = b 1 , 0 0 λ 2 μ 1 b 1 , 1 1 λ 2 μ 1 b 1 , 2 2 ;
for level 3, we have
Q 3 , 1 = 0 a 2 , 0 , 0 3 0 0 0 a 2 , 1 , 0 5 , Q 3 , 2 = 2 μ 2 a 2 , 0 , 1 3 0 0 2 μ 2 a 2 , 1 , 1 5 ,
Q 3 , 3 = b 2 , 0 3 λ 2 μ 1 b 2 , 1 5   a n d   Q 3 , 4 = 0 0 λ 0 ;
for level 4, we have
Q 4 , 3 =     0 2 μ 1 + 2 μ 2 0 0       a n d   Q 4 , 4 =     λ + 2 μ 1 + 2 μ 2 λ 2 μ 1 + 2 μ 2 2 μ 1 + 2 μ 2     .
Case A2.
m 1 = 2 , m 2 = 3 and m 3 = 4
To further understand the policy-based block-structured continuous-time Markov process X d t : t 0 , we take another example to illustrate that if the parameters m 2 and m 3 increase slightly, the complexity of the state transition relations will increase considerably.
Figure A2. State transition relations for the case of m 1 = 2 , m 2 = 3 and m 3 = 4 .
Figure A2. State transition relations for the case of m 1 = 2 , m 2 = 3 and m 3 = 4 .
Systems 10 00027 g0a2
The number of servers in Group 2 and the buffer capacity both increase by one, which makes the number of state transitions affected by the setup and sleep policies increase from 5 to 9 and from 7 to 15, respectively. Compared with Figure A2, the state transition relations become more complicated. We divide the state transition rate of the Markov process into two parts, as shown in Figure A2: (a) the state transitions without any policy and (b) the state transitions by both setup and sleep policies. Similar to Case 1, for the state transitions in which the transfer rates and the policies are simultaneously involved in the Markov chain with sync jumps of multi-events, the transfer rate is equal to the total entering rate at a certain state.
Furthermore, its infinitesimal generator is given by
Q d = Q 0 , 0 Q 0 , 1 Q 1 , 0 Q 1 , 1 Q 1 , 2 Q 1 , 3 Q 1 , 4 Q 2 , 0 Q 2 , 1 Q 2 , 2 Q 3 , 1 Q 3 , 2 Q 3 , 3 Q 4 , 1 Q 4 , 2 Q 4 , 3 Q 4 , 4 Q 4 , 5 Q 5 , 4 Q 5 , 5 .
Similarly, we can obtain every element Q i , j , 0 i , j 5 ; here, we omit the computational details.

Appendix B. State Transition Relations

In this appendix, we provide a general expression for the state transition relations of the policy-based block-structured continuous-time Markov process X d t : t 0 . To express the state transition rates, in what follows, we need to introduce some notations.
Based on the state transition rates for both special cases that are related to either the arrival and service processes or the setup and sleep policies, Figure A3 provides the state transition relations of the Markov process X d t : t 0 in general. Note that the figure is so complicated that we have to decompose it into three different parts: (a) the arrival and service processes, (b) the state transitions by the setup policy, and (c) the state transitions by the sleep policy. However, the three parts must be integrated as a whole.
(a) The arrival and service rates: The first type is ordinary for the arrival and service rates without any policy (see Figure A3a).
(b) The setup policy: For k 2 = k 1 , we write
a k 1 , k 1 1 = 1 d m 1 , 0 , k 2 W k 1 λ + m 1 μ 1 + μ 2 ,
for k 2 = k 1 + 1 , k 1 + 2 , , m 3 1 ,
a k 1 , k 2 2 = 1 d m 1 , 0 , k 2 W = k 1 λ + m 1 μ 1 + μ 2 ,
for k 2 = m 3 ,
a k 1 , m 3 3 = 1 d m 1 , 0 , k 2 W = k 1 λ .
Observing Figure A3b, what begins a setup policy at State m 1 , 0 , k is the entering Poisson process to State m 1 , 0 , k , whose inter-entering times are i.i.d. and exponential with entering rates, namely either λ + m 1 μ 1 + μ 2 for 0 k m 3 1 or λ for k = m 3 . Such an entering process is easy to see from Figure A3a.
Since the setup and sleep policies are asynchronous, a k 1 , k 2 i will not contain any transition with the sleep policy because the sleep policy cannot be followed by the setup policy at the same time. To express the diagonal entries of Q 1 , 1 in Appendix C, we introduce
b k 2 1   =   λ + m 1 μ 1   +     a k 1 , k 1 1   +   k 1   =   1 k 2     1 a k 1 , k 2 2 ,   if   k 2   =   1 , 2 , , m 2     1 , b k 2 2   =   λ   +   m 1 μ 1   +   k 1   =   1 m 2 a k 1 , k 2 2 ,   if   k 2   =   m 2 , m 2   +   1 , , m 3     1 , b k 2 3 = m 1 μ 1 + k 1 = 1 m 2 a k 1 , m 3 3 ,   if   k 2 = m 3 .
(c) The sleep policy: For k 3 = 1 , 2 , , m 2 , k 4 = 0 , 1 , , m 3 k 3 , and k 5 = 0 , 1 , , m 2 , we write
a k 3 , k 4 , k 5 0 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 m 1 μ 1 + k 3 + 1 μ 2 , a k 3 , k 4 , k 5 1 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 λ + m 1 μ 1 + k 3 + 1 μ 2 , a k 3 , k 4 , k 5 2 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 λ , a k 3 , k 4 , k 5 3 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 m 1 μ 1 , a k 3 , k 4 , k 5 4 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 λ + m 1 μ 1 , a k 3 , k 4 , k 5 5 = 1 d m 1 , k 3 , k 4 S = m 2 k 5 λ + m 1 μ 1 + m 2 μ 2 .
From Figure A3c, it is seen that there is a difference between the sleep and setup policies: State transitions with the sleep policy exist at many states m 1 , i , k for 1 i m 2 . Clearly, the state transition with the sleep policy from State m 1 , i , k is the entering Poisson processes, with the rate being the total entering rate to State m 1 , i , k . Note that the sleep policy cannot be followed by the setup policy at the same time. Thus, it is easy to check these state transition rates given in the above ones.
To express the diagonal entries of Q i , i for 2 i m 2 + 1 , we introduce
b k 3 , k 4 0 = λ + k 3 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 0 , b k 3 , k 4 1 = λ + m 1 μ 1 + k 3 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 1 , b k 3 , k 4 2 = m 1 μ 1 + k 3 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 2 , b k 3 , k 4 3 = λ + m 2 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 3 ,   b k 3 , k 4 4 = λ + m 1 μ 1 + m 2 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 4 , b k 3 , k 4 5 = m 1 μ 1 + m 2 μ 2 + k 3 k 5 = 0 a k 3 , k 4 , k 5 5 .
Remark A1.
The first key step in applications of the sensitivity-based optimization to the study of energy-efficient data centers is to draw the state transition relation figure (e.g., see Figure A3) and to write the infinitesimal generator of the policy-based block-structured Markov process. Although this paper has largely simplified the model assumptions, Figure A3 is still slightly complicated by its three separate parts: (a), (b), and (c). Obviously, if we consider some more general assumptions (for example, (i) the faster servers are not cheaper, (ii) Group 2 is not slower, (iii) there is no transfer rule, and so on), then the state transition relation figure will become more complicated, so that it is more difficult to write the infinitesimal generator of the policy-based block-structured Markov process and to solve the block-structured Poisson equation.
Figure A3. State transition relations of the policy-based block-structured Markov process.
Figure A3. State transition relations of the policy-based block-structured Markov process.
Systems 10 00027 g0a3
Remark A2.
Figure A3 shows that Part (a) expresses the arrival and service rates, while Parts (b) and (c) express the state transition rates caused by the setup and sleep policies, respectively. Note that the setup policy is started by only the arrival and service process at State ( m 1 , 0 , k ) for 1 k m 3 (see Part (b)), in which there is no setup rate because the setup time is so short that it is ignored. Similarly, it is easy to understand Part (c) for the sleep policy. It is worthwhile to note that an idle server may be at the work state, as seen in the idle servers with the work state in Group 1.

Appendix C. Block Elements in Q(d)

In this appendix, we write each block element in the matrix Q d .
(a) For level 0, it is easy to see that
Q 0 , 0 = λ   λ μ 1 λ + μ 1 λ m 1 1 μ 1 λ + m 1 1 μ 1 λ m 1 μ 1 λ + m 1 μ 1 ,   Q 0 , 1 = λ .
(b) For level 1, the setup policy affects the infinitesimal generator, and Q 1 , 0 is given by
Q 1 , 0 = m 1 μ 1 , Q 1 , k 1 + 1 = 0 , 0 , , 0 , k 1 1   0 s A k 1 T ,
where
A k 1 = diag a k 1 , k 1 1 , a k 1 , k 1 + 1 2 , , a k 1 , m 3 1 2 , a k 1 , m 3 3 ,   if   1 k 1 m 2 1 , diag a k 1 , k 1 2 , a k 1 , k 1 + 1 2 , , a k 1 , m 3 1 2 , a k 1 , m 3 3 ,   if   k 1 = m 2 ,
and 0 is a block of zeros with suitable size. From (A1), we have
Q 1 , 1 = b 1 1 λ m 1 μ 1 b 2 1 λ m 1 μ 1 b m 2 1 1 λ m 1 μ 1 b m 2 2 λ m 1 μ 1 b m 3 1 2 λ m 1 μ 1 b m 3 3 .
(c) For level 2, i.e., k 3 = 1 ,
Q 2 , 0 = μ 2 , Q 2 , 1 = a 1 , 0 , 0 0 μ 2 a 1 , 1 , 0 1 μ 2 a 1 , m 3 2 , 0 1 μ 2 a 1 , m 3 1 , 0 2 ,
and
Q 2 , 2 = b 1 , 0 0 λ m 1 μ 1 b 1 , 1 1 λ m 1 μ 1 b 1 , m 3 2 1 λ m 1 μ 1 b 1 , m 3 1 2 .
(d) For level k 3 + 1 , k 3 = 2 , 3 , , m 2 2 , subdivided into three cases as follows:
For k 5 = 0 , 1 , , k 3 2 ,
Q k 3 + 1 , k 5 + 1 = 0 , 0 , , 0 , k 3 k 5   0 s A k 3 , k 5 ,
where
A k 3 , k 5 = diag a k 3 , 0 , k 5 0 , a k 3 , 1 , k 5 1 , , a k 3 , m 3 k 3 , k 5 1 , a k 3 , m 3 k 3 , k 5 2 .
For k 5 = k 3 1 ,
Q k 3 + 1 , k 3 = k 3 μ 2 a k 3 , 0 , k 3 1 0 k 3 μ 2 a k 3 , 1 , k 3 1 1 k 3 μ 2 a k 3 , m 3 k 3 1 , k 3 1 1 k 3 μ 2 a k 3 , m 3 k 3 , k 3 1 1 .
For k 5 = k 3 ,
Q k 3 + 1 , k 3 + 1 = b k 3 , 0 0 λ m 1 μ 1 b k 3 , 1 1 λ m 1 μ 1 b k 3 , m 3 k 3 1 1 λ m 1 μ 1 b k 3 , m 3 k 3 2 .
(e) For level m 2 + 1 , i.e., k 3 = m 2 ,
Q m 2 + 1 , k 5 + 1 = 0 , 0 , , 0 , m 2 k 5   0 s A m 2 , k 5 , k 5 = 0 , 1 , , m 2 2 ,
where
A m 2 , k 5 = diag a m 2 , 0 , k 5 3 , a m 2 , 1 , k 5 4 , , a m 2 , m 3 m 2 1 , k 5 4 , a m 2 , m 3 m 2 , k 5 5 .
Q m 2 + 1 , m 2 = m 2 μ 2 a m 2 , 0 , m 2 1 3 m 2 μ 2 a m 2 , 1 , m 2 1 4 m 2 μ 2 a m 2 , m 3 m 2 1 , m 2 1 4 m 2 μ 2 a m 2 , m 3 m 2 , m 2 1 5 ,
Q m 2 + 1 , m 2 + 2 = λ ,
and
Q m 2 + 1 , m 2 + 1 = b m 2 , 0 3 λ m 1 μ 1 b m 2 , 1 4 λ m 1 μ 1 b m 2 , m 3 m 2 1 4 λ m 1 μ 1 b m 2 , m 3 m 2 5 .
(f) For level m 2 + 2 ,
Q m 2 + 2 , m 2 + 1 = a   and   Q m 2 + 2 , m 2 + 2 = λ + a λ a λ + a λ a λ + a λ a a ,
where a = m 1 μ 1 + m 2 μ 2 .

References

  1. Masanet, E.; Shehabi, A.; Lei, N.; Smith, S.; Koomey, J. Recalibrating global data center energy-use estimates. Science 2020, 367, 984–986. [Google Scholar] [CrossRef] [PubMed]
  2. Zhang, Q.; Meng, Z.; Hong, X.; Zhan, Y.; Liu, J.; Dong, J.; Bai, T.; Niu, J.; Deen, M.J. A survey on data center cooling systems: Technology, power consumption modeling and control strategy optimization. J. Syst. Archit. 2021, 119, 102253. [Google Scholar] [CrossRef]
  3. Nadjahi, C.; Louahlia, H.; Lemasson, S. A review of thermal management and innovative cooling strategies for data center. Sustain. Comput. Inform. Syst. 2018, 19, 14–28. [Google Scholar] [CrossRef]
  4. Koot, M.; Wijnhoven, F. Usage impact on data center electricity needs: A system dynamic forecasting model. Appl. Energy 2021, 291, 116798. [Google Scholar] [CrossRef]
  5. Shirmarz, A.; Ghaffari, A. Performance issues and solutions in SDN-based data center: A survey. J. Supercomput. 2020, 76, 7545–7593. [Google Scholar] [CrossRef]
  6. Li, Q.L.; Ma, J.Y.; Xie, M.Z.; Xia, L. Group-server queues. In Proceedings of the International Conference on Queueing Theory and Network Applications, Qinhuangdao, China, 21–23 August 2017; pp. 49–72. [Google Scholar]
  7. Harchol-Balter, M. Open problems in queueing theory inspired by data center computing. Queueing Syst. 2021, 97, 3–37. [Google Scholar] [CrossRef]
  8. Barroso, L.A.; Hölzle, U. The case for energy-proportional computing. Computer 2007, 40, 33–37. [Google Scholar] [CrossRef]
  9. Kuehn, P.J.; Mashaly, M.E. Automatic energy efficiency management of data center resources by load-dependent server activation and sleep modes. Ad Hoc Netw. 2015, 25, 497–504. [Google Scholar] [CrossRef]
  10. Gandhi, A. Dynamic Server Provisioning for Data Center Power Management. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 2013. [Google Scholar]
  11. Gandhi, A.; Doroudi, S.; Harchol-Balter, M.; Scheller-Wolf, A. Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward. Queueing Syst. 2014, 77, 177–209. [Google Scholar] [CrossRef]
  12. Gandhi, A.; Gupta, V.; Harchol-Balter, M.; Kozuch, M.A. Optimality analysis of energy-performance trade-off for server farm management. Perform. Eval. 2010, 67, 1155–1171. [Google Scholar] [CrossRef] [Green Version]
  13. Gandhi, A.; Harchol-Balter, M.; Adan, I. Server farms with setup costs. Perform. Eval. 2010, 67, 1123–1138. [Google Scholar] [CrossRef]
  14. Maccio, V.J.; Down, D.G. On optimal policies for energy-aware servers. Perform. Eval. 2015, 90, 36–52. [Google Scholar] [CrossRef] [Green Version]
  15. Maccio, V.J.; Down, D.G. Exact analysis of energy-aware multiserver queueing systems with setup times. In Proceedings of the IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, London, UK, 19–21 September 2016; pp. 11–20. [Google Scholar]
  16. Phung-Duc, T. Exact solutions for M/M/c/setup queues. Telecommun. Syst. 2017, 64, 309–324. [Google Scholar] [CrossRef] [Green Version]
  17. Phung-Duc, T.; Kawanishi, K.I. Energy-aware data centers with s-staggered setup and abandonment. In Proceedings of the International Conference on Analytical and Stochastic Modeling Techniques and Applications, Cardiff, UK, 24–26 August 2016; pp. 269–283. [Google Scholar]
  18. Gebrehiwot, M.E.; Aalto, S.; Lassila, P. Optimal energy-aware control policies for FIFO servers. Perform. Eval. 2016, 103, 41–59. [Google Scholar] [CrossRef]
  19. Gebrehiwot, M.E.; Aalto, S.; Lassila, P. Energy-performance trade-off for processor sharing queues with setup delay. Oper. Res. Lett. 2016, 44, 101–106. [Google Scholar] [CrossRef]
  20. Gebrehiwot, M.E.; Aalto, S.; Lassila, P. Energy-aware SRPT server with batch arrivals: Analysis and optimization. Perform. Eval. 2017, 15, 92–107. [Google Scholar] [CrossRef]
21. Mitrani, I. Service center trade-offs between customer impatience and power consumption. Perform. Eval. 2011, 68, 1222–1231.
22. Mitrani, I. Managing performance and power consumption in a server farm. Ann. Oper. Res. 2013, 202, 121–134.
23. Kamitsos, I.; Ha, S.; Andrew, L.L.; Bawa, J.; Butnariu, D.; Kim, H.; Chiang, M. Optimal sleeping: Models and experiments for energy-delay tradeoff. Int. J. Syst. Sci. Oper. Logist. 2017, 4, 356–371.
24. Hipp, S.K.; Holzbaur, U.D. Decision processes with monotone hysteretic policies. Oper. Res. 1988, 36, 585–588.
25. Lu, F.V.; Serfozo, R.F. M/M/1 queueing decision processes with monotone hysteretic optimal policies. Oper. Res. 1984, 32, 1116–1132.
26. Yang, J.; Zhang, S.; Wu, X.; Ran, Y.; Xi, H. Online learning-based server provisioning for electricity cost reduction in data center. IEEE Trans. Control Syst. Technol. 2017, 25, 1044–1051.
27. Ding, D.; Fan, X.; Zhao, Y.; Kang, K.; Yin, Q.; Zeng, J. Q-learning based dynamic task scheduling for energy-efficient cloud computing. Future Gener. Comput. Syst. 2020, 108, 361–371.
28. Liang, Y.; Lu, M.; Shen, Z.J.M.; Tang, R. Data center network design for internet-related services and cloud computing. Prod. Oper. Manag. 2021, 30, 2077–2101.
29. Xia, L.; Chen, S. Dynamic pricing control for open queueing networks. IEEE Trans. Autom. Control 2018, 63, 3290–3300.
30. Xia, L.; Miller, D.; Zhou, Z.; Bambos, N. Service rate control of tandem queues with power constraints. IEEE Trans. Autom. Control 2017, 62, 5111–5123.
31. Ma, J.Y.; Xia, L.; Li, Q.L. Optimal energy-efficient policies for data centers through sensitivity-based optimization. Discrete Event Dyn. Syst. 2019, 29, 567–606.
32. Chi, C.; Ji, K.; Marahatta, A.; Song, P.; Zhang, F.; Liu, Z. Jointly optimizing the IT and cooling systems for data center energy efficiency based on multi-agent deep reinforcement learning. In Proceedings of the 11th ACM International Conference on Future Energy Systems, Virtual Event, Australia, 22–26 June 2020; pp. 489–495.
33. Li, Q.L. Constructive Computation in Stochastic Models with Applications: The RG-Factorizations; Springer: Beijing, China, 2010.
34. Cao, X.R. Stochastic Learning and Optimization—A Sensitivity-Based Approach; Springer: New York, NY, USA, 2007.
35. Xia, L.; Zhang, Z.G.; Li, Q.L. A c/μ-rule for job assignment in heterogeneous group-server queues. Prod. Oper. Manag. 2021, 1–18.
36. Li, Q.L.; Li, Y.M.; Ma, J.Y.; Liu, H.L. A complete algebraic transformational solution for the optimal dynamic policy in inventory rationing across two demand classes. arXiv 2019, arXiv:1908.09295v1.
37. Ma, J.Y.; Li, Q.L. Sensitivity-based optimization for blockchain selfish mining. In Proceedings of the International Conference on Algorithmic Aspects of Information and Management, Dallas, TX, USA, 20–22 December 2021; pp. 329–343.
38. Xia, L. Risk-sensitive Markov decision processes with combined metrics of mean and variance. Prod. Oper. Manag. 2020, 29, 2808–2827.
39. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: New York, NY, USA, 2014.
40. Budhiraja, A.; Friedlander, E. Diffusion approximations for controlled weakly interacting large finite state systems with simultaneous jumps. Ann. Appl. Probab. 2018, 28, 204–249.
Figure 1. Energy-efficient management of a data center.
Figure 2. The unimodal structure of the long-run average profit with respect to the setup policy.
Figure 3. The monotone structure of the long-run average profit with respect to the sleep policy.
Figure 4. The decreasing structure of the long-run average profit with respect to the setup policy.
Figure 5. The increasing structure of the long-run average profit with respect to the sleep policy.
Table 1. Cost notation in the data center.

Cost | Interpretation
C_1 | The power consumption price
C_2^(1) | The holding cost for a job in Group 1 per unit of sojourn time
C_2^(2) | The holding cost for a job in Group 2 per unit of sojourn time
C_2^(3) | The holding cost for a job in the buffer per unit of sojourn time
C_3^(1) | The setup cost for a server switching from the sleep state to the work state
C_3^(2) | The transfer cost for an incomplete-service job returning to the buffer
C_4 | The transfer cost for a job transferred from Group 2 to Group 1
C_5 | The opportunity cost for each lost job
R | The service price for each served job
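To make the interaction of these cost terms concrete, the following minimal sketch (our illustration; the state variables, rate arguments, and the linear revenue-minus-cost form are assumptions, not code from the paper) shows how they could combine into a state-dependent reward rate whose long-run average is the profit being optimized.

```python
from dataclasses import dataclass

@dataclass
class Costs:
    C1: float    # power consumption price
    C2_1: float  # holding cost per job in Group 1 per unit time
    C2_2: float  # holding cost per job in Group 2 per unit time
    C2_3: float  # holding cost per job in the buffer per unit time
    C3_1: float  # setup cost per sleep-to-work switch
    C3_2: float  # transfer cost per incomplete-service job returned to the buffer
    C4: float    # transfer cost per job moved from Group 2 to Group 1
    C5: float    # opportunity cost per lost job
    R: float     # service price per served job

def reward_rate(c: Costs, power: float, n1: int, n2: int, nb: int,
                setup_rate: float, return_rate: float,
                transfer_rate: float, loss_rate: float,
                throughput: float) -> float:
    """Revenue rate minus cost rate in one system state (illustrative)."""
    revenue = c.R * throughput
    cost = (c.C1 * power
            + c.C2_1 * n1 + c.C2_2 * n2 + c.C2_3 * nb
            + c.C3_1 * setup_rate + c.C3_2 * return_rate
            + c.C4 * transfer_rate + c.C5 * loss_rate)
    return revenue - cost
```

Under this form, a policy improves the long-run average profit whenever the revenue gained from faster service outweighs the power, holding, setup, transfer, and loss terms it induces, which is the trade-off the asynchronous dynamic policy balances.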