In this section, we will present an overview of edge computing architecture. Additionally, we will define the pertinent functions of local computation and edge computing separately. Lastly, this paper will introduce MOBA-CV-SARSA, as proposed herein.
2.1. System Architecture
In edge computing, computational resources move closer to smart devices, transitioning from the cloud to local proximity, as depicted in
Figure 1. Edge servers are strategically placed at the network periphery. This shift eliminates the need for devices to send data to the cloud for processing, thereby economizing transmission time and circumventing bandwidth limitations. The strategic deployment of compact base stations enhances connectivity for myriad smart devices. Although edge computing reduces latency by minimizing transmission distances, the finite energy of wireless devices and transmission lags between devices and edge servers necessitate careful consideration. Consequently, this study will delve into the intricacies of edge computing’s local architecture and extensively analyze these pertinent issues.
This study adopts the framework proposed in reference [
2], which includes mobile edge computing (MEC) servers and Macro eNodeBs (MeNBs). MeNBs connect to multiple Small eNodeBs (SeNBs), as illustrated in
Figure 2. SeNBs are divided and interconnected based on the location of Smart Mobile Devices (SMDs), facilitating task transmission. In this architecture, computational capabilities are exclusive to edge servers and smart devices. The configuration comprises one edge server, MeNB collectively, M SeNBs, and S SMDs.
In this architecture, wired connections enable continuous power and data transmission between Macro eNodeBs and Small eNodeBs, simplifying transmission time and energy consumption considerations. Wireless links connect Small eNodeBs with smart devices, which lack continuous power and necessitate attention to residual energy and limited computational capabilities. Task states remain until offloading decision optimization, allowing focused latency and energy consumption resolution. The edge computing server gathers optimization information from smart devices, including task sizes and computational capabilities. Considerations of network latency and energy consumption ensure balanced offloading decisions.
The offloading decision for the k-th task from the j-th smart device under the m-th MeNB is represented as
. A task is processed remotely if
equals 1; otherwise, if the task chooses local processing,
equals 0. Since a single smart device can generate multiple tasks, Equation (1) is applied to convert the task count into binary form [
2]. This method converts the number of tasks (
) from smart device (s), where s ∈ {1, 2, …, S}, and S denotes the total number of smart devices in the system. The bit length is
. By using the conversion formula, decimal values are transformed into binary format, enabling the assignment of offloading decisions to each task independently. This allows the determination of whether each task should be processed remotely or locally, as depicted in
Figure 3.
2.2. Problem Formulation
Given the inherent trade-off between energy consumption and latency, this study delves into the network’s energy consumption and latency dynamics by analyzing task-offloading strategies. It aims to discern whether individual tasks best suit local or remote computation. The ensuing discourse will explore the formulas quantifying energy consumption and latency for local and remote computation scenarios. These formulas will be tailored to the architecture of the j-th smart device within the m-th zone, as depicted in
Figure 4, defining pertinent parameters.
First, we define the delay time (
) and energy consumption (
) when the j-th smart device in the m-th zone selects local computation. The delay time is the total computational capacity required by all tasks generated by the device, divided by the device’s computational capacity (
) [
12].
represents the total number of tasks generated by the device. In contrast,
represents the computational capacity required to complete the k-th task, namely the CPU cycles needed to complete the task. Equation (2) describes the required computation delay to the j-th smart device in the m-th zone:
Through the computation of the CPU cycles necessary for processing a 1-bit task (
) and the data size associated with it (
), we can ascertain the total CPU cycles needed to accomplish the task [
12]:
The energy consumption (
) incurred by local computation results from the multiplication of the computational capacity needed for each task (
) by its respective offloading decision (
). Subsequently, this value is further multiplied by the energy consumption coefficient (
) and the square of the smart device’s computational capability (
) [
12].
In the upcoming discussion, we will explore the latency and energy consumption associated with remote computation. When the j-th smart device in the m-th zone engages in remote computation for the kth task, it is crucial to factor in the energy and time needed for task transmission from the device to the small base station and, subsequently, from the small base station to the edge server. Also, this study excludes consideration of the energy consumption and latency of the returned results, given their relatively small data volume and minimal impact on the overall network performance.
The forthcoming discussion will outline the transmission time (
) and energy consumption (
) associated with smart devices transferring tasks to small base stations. The transmission time hinges on the aggregate data size for transmission (
), regardless of the transmission rate from the smart device back to the small base station (
) [
2]. Within this context,
denotes the data size of the kth task generated by the jth smart device in the mth zone,
signifies the offloading decision for that specific task, and
represents the total number of tasks generated by the device.
In the above equation, the transmission rate (
) is determined by the Shannon Theorem, with B indicating the channel bandwidth,
representing the device’s transmission power, and
indicating the channel gain.
stands for noise, while
denotes the interference on this channel.
is defined by Equation (6).
The channel gain (
) from the device to the small base station equals the gain (
) divided by the square of the distance between the location (
) of the smart mobile device (SMD) and the location of the small base station (SeNB) (
) plus the square of the base station’s fixed height (H) [
13]. In this context,
denotes the channel gain at a reference distance of 1 m and a transmission power of 1 watt:
The product of the device’s transmission power (
) and the channel gain (
) from the connected small base station determines the total devices (
) in adjacent cells employing an identical transmission channel to the smart device (
), leading to the evaluation of channel interference (
). The channel decision (
) depends on whether devices (
) in neighboring cells utilize the same transmission channel as the device (
); a value of 1 signifies channel congruence, whereas 0 denotes disparate channels.
We multiply the aggregate data size of transmitted tasks (
) by the transmission power (
) of the device and subsequently divide by the transmission rate (
) from the device to the small base station [
2]. The energy consumption (
) follows the equation
The total computational capacity needed for all remote computing tasks is calculated by multiplying the computational capacity required for each task (
) by the offloading decision (
) and summing the results. Subsequently, this total is divided by the computational capacity of the edge server (F):
The total network latency (T) encompasses the duration of local computation (
) for all smart device tasks, the transmission time for remote computation (
), and the task computation time at the edge server (
):
The total energy consumption (E) comprises the energy used for local computation (
) and the transmission energy for remote computation (
) of all smart device tasks.
This study considers three constraint violations (CVs): the combined energy of local computation (
) and remote computation transmission (
) must not exceed the device’s remaining energy (
).
Smart devices must possess computational capabilities that surpass the requirements for local computations.
Local tasks have an offloading decision of 0, while remote tasks have a decision of 1:
Integration is performed using Equations (13)–(15). Optimized task-offloading decisions (
) are determined to minimize both overall network latency (T) and energy consumption (E) while adhering to the constraints:
2.3. Design of MOBA-CV-SARSA
Incorporating the multi-objective bat algorithm into the adaptive tuning of hyperparameters for the SARSA learning algorithm and considering the constraints of edge computing (CVs), we introduce MOBA-CV-SARSA, depicted in
Figure 5, where A, B, and C are non-dominated solutions in the multi-object optimization; D is a dominated solution.
This study introduces MOBA-CV-SARSA, aiming to optimize task-offloading decisions for achieving equilibrium between network latency and energy consumption, as depicted in
Figure 6. It preserves the bat algorithm’s framework while elevating it to a multi-objective bat algorithm using non-dominated sorting and crowding distance. Additionally, it incorporates SARSA with reinforcement learning to autonomously fine-tune the multi-objective bat algorithm’s hyperparameters and offers multi-objective decision making for thorough solution comparisons. This decision will evaluate the bat population’s rank first and then the crowding distance [
14] if the bats have the same rank in each iteration. Post-iteration sorting is conducted based on energy consumption to extend the network’s longevity. Bats with smaller energy expenditures will be prioritized. This is because this paper hopes to extend the life of the network and does not want the device to fail prematurely due to insufficient power. Therefore, this decision is included before outputting the final result. That is, if bats at the same level have the same crowding distance at the same time, they will be sorted according to energy consumption in the final optimized population.
Initially,
Figure 6 delineates the population’s initialization settings [
2]. Each row symbolizes a bat (
,
), with N indicating the bat’s dimensionality, equivalent to the overall count of smart devices (S). The total population consists of npop rows, signifying the aggregate number of bats. Each bat encapsulates task-offloading decisions across dimensions. The first bat performs local computation for all device tasks, whereas the second conducts remote computation.
reflects the outcome following the conversion of all functions of the sth device to remote computation, as defined by Equation (1).
In each iteration, the loudness (
) and pulse emission rate (
) of the bats undergo initial updates. According to Reference [
15], optimal performance occurs when the loudness value (α) and pulse emission rate value (γ) fall within the range of 0.9 to 0.98. This study conducts a brute force experiment with 8250 trials and statistical analyses to determine the optimal values for MOBA-CV by combining and examining α and γ. Multiple experiments are required to obtain suitable α and γ hyperparameters for loudness and pulse emission rate. Hence, this paper proposes a multi-objective bat algorithm (MOBA-CV-SARSA) that employs reinforcement learning SARSA to streamline the cumbersome parameter experimentation process. SARSA treats the α and γ values used for bat updates as actions, where actions taken under specific states yield corresponding rewards. The subsequent section will present the customized reward function.
The primary objective of this research is to minimize both the total delay time (13) (denoted by
) and energy consumption (14) (
). However, SARSA pursues the maximization of the reward value. To prevent infinite values as the objective function becomes zero, a small positive value (
) is inclusively added to each objective function. Subsequently, these reciprocated values undergo multiplication with corresponding weights (
). Considering the absence of units, the energy consumption value (
) generally exceeds the delay time value (
) threefold. Hence, the reciprocal value of delay time (
) will exceptionally be that of energy consumption (
). To uphold equal significance for both objectives,
is designated as 0.25 and
as 0.75.
Reward values (16) enable the utilization of the Q-value update formula to modify the Q-table within SARSA. The Q-table format employed in this investigation is depicted in
Table 1. Initially, bats exhibiting diverse states are integrated into the Q-table’s state table. Following this, α and γ are designated as numerous sets of values, initializing all Q-values within the Q-table to 0. The respective Q-value undergoes an update upon the bat colony’s state alignment with an entry in the Q-table. With each iteration, new bats emerge, prompting a comparison between these bats and the states archived in the Q-table. Should the new state be pre-existing, the associated state table is employed for Q-value (
) updates. Conversely, if the new state is absent from the Q-table, the new bat is appended, and
ε-greedy facilitates the initial action selection and Q-value update.
When dealing with a multi-objective problem, a singular objective assessment approach is inadequate. This study presents a specialized method for multi-objective decision making to evaluate multiple objectives simultaneously. Within this methodology, a comprehensive comparison between the new bat (
) and the optimal bat (
) is conducted. The procedural details are elucidated in Algorithm 1, where ‘M’ denotes the total number of objective functions. In this investigation, M is set at 2, representing overall network latency and energy consumption. Initially, each objective function undergoes a comparison between the new bat and the optimal bat. If the new bat demonstrates superior performance over the optimal bat in the assessed objective function (
), the count of dominated objective functions (
) increases. When the number of objective functions dominated by the new bat matches the total count of objective functions, it signifies complete superiority over the optimal bat, prompting the replacement of the original bat with the new one. In sum, the pseudo-code of the aforementioned MOBA-CV-SARS is shown in Algorithm 1.
Algorithm 1: Pseudo-code of MOBA-CV-SARSA algorithm |
1: | Initial bats’ solutions/locations |
2: | , |
3: | Calculate the multi-objective functions value for , M is number of multi-objective functions |
4: | Process non-dominated sorting (based on ranking and crowding distance) for |
5: | for t = 1 to T (number of iterations) |
6: | Update A_i^(t + 1) and r_i^(t + 1) with SARSA |
7: | for i = 1 to npop |
8: | Select the best solution/location () |
9: | Adjust frequencies and update velocities , and generate solutions/locations () |
10: | if rand > pulse rate ) then |
11: | Generate a local solution/location around the local best solution/location |
12: | end |
13: | Calculate the multi-objective functions value for |
14: | if rand < loudness () then |
15: | Process multi-objective decision to decide whether replaces |
16: | end |
17: | end |
18: | Concatenate and |
19: | Process non-dominated sorting of and |
20: | Select bats’ solutions/locations () for next iteration |
21: | end |
22: | if bats’ solutions/locations () have the same rank and crowding distance then |
23: | Sort in ascending order according to energy consumption |
24: | end |
25: | Display the final results |
The proposed SARSA reinforcement learning algorithm optimizes hyperparameters α and γ in a range between 0 and 1. However, to accelerate the convergence, we firstly find the optimal
and
obtained from MOBA-CV-SARSA with a broad search step. Then, we refine the best solutions using Formulas (17) and (18), creating narrower search intervals with finer steps.
Once the identified
and
values converge to the specific values which are most frequently found, the search range for α and γ values contracts, triggering automatic switching, as depicted in
Figure 7. By executing a fresh search and update within this narrowed range, the method not only hastens convergence but also determines more precise α and γ values. It balances network latency and energy consumption by optimizing task-offloading decisions. This research employs a multi-objective bat algorithm to negotiate solutions in multi-objective functions. It employs SARSA reinforcement learning to adjust hyperparameters dynamically, eliminating the necessity for intricate hyperparameter experiments. The enhanced MOBA-CV-SARSA-DA can accurately identify suitable parameter ranges.