1. Introduction
The ocean not only contains numerous biological and mineral resources but also plays a significant role in the study of biodiversity, the exploration of global climate change, the development of marine resources, and ocean transportation [1,2]. In recent years, the ocean has become a focal point for governments, academia, and industry, prompting a significant acceleration in their efforts in ocean exploration and in advancing maritime communication. The Internet of Things (IoT) allows a wide range of smart devices, from simple smart home devices to complex industrial machinery, to gather extensive data and communicate with each other and with other internet-connected devices. This seamless data exchange enables these devices to carry out a variety of tasks autonomously [3]. The success of the IoT on land has prompted extensive exploration into how to develop its maritime counterpart, the Maritime Internet of Things (MIoT) [4,5]. The MIoT can bring the advantages of the IoT to the maritime sector and provide ubiquitous internet services for marine environmental monitoring and emerging maritime applications, such as marine pollution monitoring, tidal recording, ocean current measurement, assisted navigation, maritime search and rescue (SAR) operations, and vessel traffic management.
Similar to IoT devices (IoTDs), MIoT devices (MIoTDs), such as various sensors, smart buoys, unmanned surface vessels (USVs), and floats, have stringent computation and energy resource limitations. To reduce the energy and computation burden of task execution at the terminals and to lower task processing latency, Mobile Edge Computing (MEC) is gradually replacing the cloud computing paradigm and is envisioned as a promising alternative that places lightweight servers in close proximity to the terminals [6]. The MEC architecture can be employed in a variety of IoT application scenarios, such as 6G communication, virtual reality, the Internet of Vehicles (IoV), smart cities, and smart factories. As one of the core techniques of MEC, task offloading has received great attention recently. Most work has focused on how computation offloading can be optimized, that is, ensuring rapid and reliable wireless transmission while providing sufficient communication and computation resources to meet task latency requirements [7]. Generally speaking, binary offloading is usually employed for simple, indivisible, or highly integrated tasks: a task is either processed locally or sent entirely to the MEC servers [8]. The authors in [9,10] studied the binary offloading problem with the goal of minimizing energy consumption and found that utilizing MEC servers for certain computing tasks can be more effective in some situations.
Beyond the computing resource constraints that impede the implementation of the MIoT, the shift in dominant traffic in wireless networks is a significant driving factor in the progression toward 6G. Since the spectrum available to wireless networks is finite, advanced channel coding and modulation techniques are required to reduce interference, which normally implies more power consumption, not only at the transceiver but also across the overall radio access network [11]. Wireless networks are anticipated to remain a major contributor to the global carbon footprint, which is predicted to double in the next decade [12]. Hence, a key challenge in deploying the MIoT is minimizing its conventional energy consumption to lower its carbon footprint and negative environmental impact. Furthermore, because the MIoT involves numerous MIoTDs, such as underwater sensors and smart buoys at sea, replacing or recharging the batteries of these devices is inconvenient and costly [13]. Therefore, adopting more energy-efficient, sustainable, and eco-friendly wireless communication systems is becoming increasingly necessary. Integrating the MIoT with green communication technology is anticipated to bring a multitude of benefits by jointly optimizing computing and resource allocation to reduce conventional energy consumption and the carbon footprint. For instance, implementing energy-efficient protocols and equipment in the MIoT can cut its power consumption, while deploying renewable energy sources can further help to offset the carbon emissions associated with the network's operation.
As an achievable way to cut down on conventional energy usage and carbon emissions, harnessing renewable energy [14,15] has attracted much attention in recent years. By incorporating energy harvesting (EH) modules into energy storage devices, base stations (BSs), or other equipment, renewable energy, such as solar, wind, and tidal energy, can be harvested and converted into electric power. This helps reduce carbon emissions and enables self-sustaining circuits [16]. A few recent works have considered the integration of EH and MEC and the optimization of task offloading [8,17,18]. These prior studies primarily emphasized enhancing computing efficiency and minimizing energy consumption in IoTDs, i.e., an "energy-saving approach". However, reducing carbon emissions through an "exploiting renewable energy approach" has not received adequate attention in these works. With the increasing popularity of the internet and broadband communication systems, it is essential to reduce their conventional energy consumption in order to decrease their carbon footprints and keep their environmental impact to a minimum. Furthermore, offloading decisions can be made more flexible. However, classical network optimization algorithms still rely on complex heuristic adjustments to find a sufficiently good solution, resulting in exponential computation time and limited effectiveness for larger networks. Fortunately, deep reinforcement learning (DRL) techniques can replace the laborious process of traditional optimization algorithms while reducing computational complexity [19,20]. Recently, increasing attention has been given to DQN-based strategies [21,22] applied to the MEC offloading problem. Nevertheless, value-based methods are not sufficient to tackle dynamic computation offloading and resource allocation with a continuous action space, since selecting the value-maximizing action over a continuous space is intractable without discretization. To overcome the inefficiency and high variance associated with evaluating a policy using pure policy gradient methods, the authors in [23,24] proposed joint optimization methods based on the DDPG structure, which simultaneously adapts both the actor and critic neural networks over a continuous action space. However, the work in [24] solely focused on minimizing the system's energy consumption, ignoring the fact that energy savings in a hybrid energy system do not necessarily imply lower carbon emissions.
Motivated by the above, in our previous work [25], we adapted the deep deterministic policy gradient (DDPG) for the joint optimization of computation offloading and resource allocation, aiming to maximize the total system execution efficiency (accounting for the total size of completed tasks, task execution latency, and the system's carbon emissions) under a hybrid energy supply. As an extension of the earlier work in [25], in this article, we consider in more detail the constraints of task latency, computational resource capacity, and transmission power to formulate the problem of maximizing system execution efficiency in a green MEC-enabled MIoT network, and we adapt the DDPG to effectively optimize the computation offloading and resource allocation scheme, aiming to significantly reduce the system's carbon emissions and task delay and to increase the total size of completed tasks. The differences between this paper and our previous work are listed in Section 2. The primary findings of this study are outlined below:
Network architecture and problem formulation: We first propose a green MEC-enabled MIoT network architecture with multiple MIoTDs and multiple BSs and then formulate the problem of joint task offloading and resource allocation in the green MEC-enabled MIoT network as a stochastic optimization problem to maximize system execution efficiency, which consists of system execution cost (including system task latency and carbon emissions) and the size of completed tasks. Our objective function is more general than those proposed by many related studies in that we include the system’s carbon emissions due to fossil energy usage, the size of completed tasks, and their latency.
Algorithm design: We propose a DDPG-based carbon-aware task offloading and resource allocation algorithm (DCTORA). We jointly consider maximizing the size of completed tasks and minimizing the system’s carbon emissions and the delay of tasks. DCTORA is a model-free DRL method that can efficiently handle continuous action space. Moreover, it can determine the joint optimization scheme for each task in an unpredictable environment with stochastic tasks and renewable energy sources.
Experimental simulation: The performance of our proposal is evaluated by extensive simulations. The simulation results demonstrate that, in various simulation scenarios, it outperforms the other four benchmark algorithms in terms of time-average task latency, time-average system execution efficiency, and time-average carbon emissions.
The remainder of this study is outlined as follows. Section 2 reviews related works. Section 3 describes the proposed green MEC-enabled MIoT network architecture, along with the communication, computing and energy consumption, and carbon emission models, followed by the problem formulation. Section 4 presents the proposed algorithm, while Section 5 discusses the results of the performance assessment. Finally, we conclude this paper in Section 6.
2. Related Works
Currently, existing maritime networks depend on narrowband radio transmissions, satellite links, and land-based cellular networks as the main means of internet access [26]. The bandwidth of maritime radio systems is limited. Although satellite systems can furnish vessels with global internet access, the prohibitive cost of satellite terminals and service significantly hinders their adoption. Users in coastal areas can benefit from the growth of land-based cellular networks (e.g., 5G). In this study, the focus is on offloading computational tasks and organizing them for offshore cellular communication. Computation offloading has been studied extensively, though rarely in maritime scenarios.
For terrestrial cellular networks, MEC can significantly reduce the backhaul traffic, transmission cost, and data leakage risk of the network by offloading computational tasks to nearby servers. The authors in [27] investigated the multi-user computation offloading problem in a single-server scenario using a Stackelberg game model under a software-defined network to achieve optimal task offloading. However, the model in [27] exhibits high computational complexity and does not apply to real-time task offloading scenarios. For real-time offloading, the problem was converted into a second-order cone programming problem in [28] and solved iteratively through an algorithm based on successive convex approximations, which simplifies the solution method. However, [28] did not consider the dynamic mobility of edge nodes, limiting its applicability. To investigate the fundamental trade-off between latency and energy consumption in MEC systems, an iterative heuristic-based online offloading algorithm was proposed in [29]. To minimize system latency and determine the optimal offloading decision, the authors in [30] modeled the computation offloading decision as a finite-time Markov decision process and employed a dynamic programming approach. However, the work in [30] does not consider energy consumption constraints, which can lead to high energy consumption for users. To meet the IoT's requirements for both delay and energy consumption, the authors in [31] proposed a framework based on decomposition and the continuous pseudo-convex method for the cooperative offloading problem between edge networks and central cloud computing in cellular networks, reducing the system cost via an iterative algorithm. In practice, MEC offloading scenarios are dynamic, stochastic, and time-varying, and the above task-offloading schemes for terrestrial networks mainly aim to obtain better offloading decisions without considering the high real-time requirements imposed on decision algorithms in actual scenarios.
As artificial intelligence technology leaps forward, more and more researchers have combined it with MEC technology, which can more effectively address the optimization of computing task offloading in dynamic, stochastic, and time-varying environments. The authors in [32] studied the task offloading and resource allocation problem in vehicular networks and proposed a Q-learning-based task offloading and resource allocation algorithm, but they considered only a single MEC system. To optimize the task processing delay while satisfying an energy constraint, the authors in [33] proposed a Q-learning-based joint communication and computational resource allocation mechanism for multiple MEC systems in cellular networks and verified through simulation that the proposed method has better environmental adaptability. In [34], a novel software-defined network (SDN) edge cloud-based Q-learning optimization framework was adopted to formulate offloading decisions and resource allocation for dynamic offloading scenarios, which can quickly adapt to a communication environment with gradient updates and a small number of samples. Although the studies in [32,33,34] achieved good offloading results in specific scenarios using Q-learning, when the state and action spaces of the optimization problem are large, high-dimensional, and continuous, the memory required to store Q-values grows exponentially, and searching for the optimal offloading decision incurs a large time overhead. Thus, deep learning techniques have been employed to address the high dimensionality of the state space faced by conventional RL. Considering the task offloading decision problem in a multi-user, multi-server MEC scenario, the authors in [35] proposed an online DRL-based offloading algorithm to address the task offloading optimization problem. In [36], a hybrid MEC platform including land-based vehicles and unmanned aerial vehicles (UAVs) was considered, and a hybrid online offloading algorithm based on deep learning was proposed to minimize the energy consumption of IoTDs. However, it is limited to one device's information input at a time, which does not apply to practical scenarios. The authors in [37] investigated the joint offloading problem between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) links and proposed a multi-agent DRL framework to meet the delay requirements of both V2I and V2V links. However, its convergence speed is slow.
In contrast to the above work on terrestrial networks, this study mainly focuses on the next-generation maritime information system, and a DRL-based offloading method is proposed for maritime MEC networks to support multiple maritime applications. Thus far, researchers have proposed several task offloading algorithms for maritime MEC, which to some extent satisfy the low-latency and high-reliability requirements of maritime application services. To optimize the allocation of computational and communication resources under energy-limited and delay-sensitive conditions, the authors in [38] analyzed the trade-off between the delay and energy consumption of maritime communication and proposed a phased joint optimization algorithm. Considering the difference in network node density between the nearshore and farshore regions of the ocean, [39] proposed task offloading models for the nearshore and farshore scenarios, solved using a genetic algorithm and a particle swarm optimization algorithm, respectively. The authors in [40] used mixed-integer nonlinear programming to separate the optimization objectives, efficiently allocate the transmission power, and formulate the offloading decision by improving the conventional artificial fish swarm algorithm. However, heuristic algorithms not only require a large number of iterations during the optimization search, but their computational performance also degrades significantly in complex offloading environments, and solution quality cannot be guaranteed. Discretizing continuous variables with DQN or DDQN disrupts the continuity of the action space, making it impossible to identify the optimal policy. To improve learning stability, DDPG leverages the experience replay buffer and target network strategies of DQN. Its ability to operate directly on continuous action spaces makes it a more efficient approach for resource allocation problems, and its training performance and reliability surpass those of the original actor-critic network. With joint consideration of energy consumption, task delay, and cache fetching cost while adhering to the limited storage and computational resources of MEC systems, the authors in [41] proposed a DDPG-based algorithm to optimize the long-term average system cost. Similarly, the authors in [42] proposed a temporal-attention deterministic policy gradient variant of DDPG. Although the proposals in [41,42] have been shown to achieve faster and more reliable convergence than DQN, they emphasize energy savings rather than carbon emission reductions.
In our initial work [25], we introduced a carbon-aware MEC framework for a hybrid renewable- and grid-energy MEC system. We studied the problem of joint task offloading and resource allocation and proposed a DDPG-based joint optimization strategy considering stochastic task and renewable energy arrivals. The main objective was to minimize the total carbon emissions and task queue length. The current work includes several added contributions. First, we propose a new framework for green MEC-enabled maritime IoT networks and aim to optimize total system execution efficiency in an unpredictable environment. Second, we formulate the joint task offloading and resource allocation problem as maximizing system execution efficiency, which accounts for the system execution cost (including system task latency and carbon emissions) and the size of completed tasks. Third, we propose a DDPG-based joint optimization strategy that obtains an effective solution through continuous action-space learning in an unpredictable environment. Fourth, we provide a detailed comparison of time-average system execution efficiency, time-average task latency, and time-average carbon emissions against baseline strategies. Additionally, beyond the related work reviewed in our earlier work [25], this article further summarizes and discusses the most pertinent studies on MEC offloading application scenarios, the algorithms employed, and the curse of dimensionality, with special attention to the optimization objectives, particularly carbon emissions.
Table 1 compares and summarizes the above literature in terms of application scenarios, offloading algorithms, optimization objectives, and the curse of dimensionality. As depicted in Table 1, the conventional MEC offloading algorithms [27,28,29,30,31,38,39,40] have high computational complexity and are generally less efficient than AI-based algorithms. Although RL-based offloading algorithms [32,33,34] can achieve high offloading efficiency, they suffer from the curse of dimensionality and are only applicable to small-scale scenarios. DRL-based offloading algorithms [35,36] are capable of overcoming the dimensionality problem, although they focus on centralized offloading strategies and adapt poorly to new environments. As the comparison reveals, existing research on MEC for maritime networks has the following shortcomings. Specifically, little research has comprehensively considered task offloading and resource allocation in maritime edge collaborative architectures, and little work has addressed reducing the carbon emissions of maritime networks by exploiting renewable energy. Furthermore, most existing works converge slowly in high-dimensional settings, cannot meet the low-latency and high-reliability task offloading requirements of maritime networks, and adapt poorly to rapidly changing maritime environments.
3. System Model
3.1. Network Architecture
In this study, we propose an offshore MEC architecture based on a renewable energy supply, as shown in Figure 1. We consider an uplink computation offloading scenario in an MEC system with multiple cells and multiple MIoTDs, which consists of a multi-antenna macro-BS k deployed with an MEC server; a set of micro-BSs M = {1, 2, …, M}, each equipped with a high-performance processor, an energy access point (EAP), and an energy storage unit (ESN); and a set of MIoTDs (such as various sensors, smart buoys, and USVs), denoted as N = {1, 2, …, N}. All MIoTDs are randomly distributed around the micro-BSs, and the MIoTDs associated with micro-BS m form the set Nm = {1, 2, …, Nm}, Nm ⊆ N. Each micro-BS covers a certain area for control information delivery and can sense the system channel state information (CSI) by requesting feedback through the control link. All micro-BSs can connect to the macro-BS through wireless links and generate control policies for computation offloading and resource allocation based on local CSI and computation task requirements. Each MIoTD can only communicate with an adjacent micro-BS and cannot communicate directly with the macro-BS, whereas every micro-BS can communicate with the macro-BS through wireless channels. Each MIoTD contains an EH module and can draw energy from its associated micro-BS's EAP as long as its energy queue is not full. In each TS, the data generated by an MIoTD can either be executed or dropped when its delay constraint cannot be satisfied. Depending on the available computation and energy resources, a task can be executed entirely locally or arbitrarily partially offloaded to the connected micro-BS or, through the micro-BS, to the macro-BS. We assume that MIoTDs and micro-BSs switch to conventional grid power when their ESNs' energy is insufficient, while the macro-BS always operates on grid energy. Table 2 lists the main notations (and their definitions) used in this paper.
Assuming that the macro-BS has sufficient computing resources, it can efficiently execute multiple parallel tasks without causing queuing delay. In addition, the execution results are significantly smaller than the data offloaded to the macro-BS and micro-BSs; thus, the transmission delay and energy consumption needed to send the output data back to the corresponding MIoTDs can be ignored.
3.2. Communication Model
The consecutive duration of the process is divided into separate time slots (TSs), where each TS τ has a fixed length of 2 ms. The OFDM access approach is adopted to reduce inter-MIoTD interference within the same micro-BS. The available bandwidth of micro-BS m, represented as Wm, is divided into Nm subchannels of equal width, where Nm is the number of MIoTDs connected to micro-BS m at TS t. The bandwidth of each MIoTD subchannel is therefore BWm = Wm/Nm. The achievable data rate between MIoTD n and its associated micro-BS m at TS t is then determined by pm,n(t), hm,n(t), and Nm,n, which denote the transmit power of MIoTD n, the channel gain between MIoTD n and micro-BS m, and the power spectral density of the additive white Gaussian noise (AWGN), respectively. The transmit power pm,n(t) may not exceed the maximum allowable transmission power of MIoTD n. Similarly, the available bandwidth Wk of the macro-BS is divided into M equal-width subchannels for the micro-BSs, so the bandwidth of each subchannel assigned to a micro-BS is BWk = Wk/M. Hence, the achievable upload data rate between micro-BS m and macro-BS k is determined by hk,m(t) and Nk,m, which express the channel gain between micro-BS m and macro-BS k and the power spectral density of the AWGN, respectively, and by pk,m(t), the transmit power of micro-BS m for offloading to macro-BS k. The transmit power pk,m(t) may not exceed the maximum allowable transmission power of micro-BS m.
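For illustration, assuming the standard Shannon-capacity form over each orthogonal subchannel, the two achievable rates can be written as follows; the rate symbols \(r_{m,n}(t)\) and \(r_{k,m}(t)\) are introduced here only as convenient labels rather than taken from the original formulation.
\[
r_{m,n}(t) = BW_m \log_2\!\Bigl(1 + \frac{p_{m,n}(t)\, h_{m,n}(t)}{BW_m\, N_{m,n}}\Bigr),
\qquad
r_{k,m}(t) = BW_k \log_2\!\Bigl(1 + \frac{p_{k,m}(t)\, h_{k,m}(t)}{BW_k\, N_{k,m}}\Bigr).
\]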
3.3. Computing and Energy Consumption Model
In this subsection, we present the computation offloading and energy consumption models. Each MIoTD is assumed to possess limited processing capacity, which can be used to handle the requested computation tasks locally. We consider a stochastic task arrival model in which the computation task of MIoTD n at TS t, um,n(t), is characterized by Im,n(t), the number of CPU cycles required to compute one bit of data (cycles per bit); Cm,n(t), the data size (kbits) of the task; and its maximum delay tolerance (in seconds). For task um,n(t), the required computation capability is Im,n(t)Cm,n(t) divided by the maximum delay tolerance (cycles per second). In each TS, each computation task can either be executed or dropped when its delay constraint cannot be satisfied. φm,n(t) denotes the drop mode indicator: φm,n(t) = 0 indicates that the computation task is dropped, while φm,n(t) = 1 means that the task is executed in the TS. Depending on its own capability, MIoTD n can opt for either local computing of the task or offloading it entirely to its connected micro-BS, or partially to the macro-BS through the micro-BS. For each TS t, let xm,n(t) be a binary indicator that equals 1 if MIoTD n has a task um,n(t) to be offloaded and 0 otherwise (i.e., the task is locally executed).
3.3.1. Local Computing
When task um,n(t) is chosen for local processing, the processing performance is determined by MIoTD n's computing capability, which can be characterized by its CPU-cycle frequency fm,n(t). The computation latency of MIoTD n therefore depends on fm,n(t) and on DEm,n(t), the number of task bits of the MIoTD allocated for local execution in TS t. As defined in [43], the corresponding local execution energy consumption depends on a nonnegative coefficient ka determined by the chip architecture; the coefficients kb and kc introduced later are defined analogously to ka.
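As a sketch under the commonly used dynamic voltage and frequency scaling model (the labels \(DE_{m,n}\), \(T^{loc}_{m,n}\), and \(E^{loc}_{m,n}\) and the quadratic dependence on \(f_{m,n}(t)\) are assumptions consistent with models of the type in [43]), the locally executed bits, the local computation latency, and the local energy consumption can be expressed as
\[
DE_{m,n}(t) = \bigl(1 - x_{m,n}(t)\bigr)\,\varphi_{m,n}(t)\,C_{m,n}(t),
\quad
T^{loc}_{m,n}(t) = \frac{I_{m,n}(t)\,DE_{m,n}(t)}{f_{m,n}(t)},
\quad
E^{loc}_{m,n}(t) = k_a\, f_{m,n}^{2}(t)\, I_{m,n}(t)\, DE_{m,n}(t).
\]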
When an MIoTD lacks the computing capacity to complete its tasks locally, the tasks are offloaded to the micro-BS for processing. Specifically, MIoTD n's computation task is first transmitted to the closest micro-BS m via a wireless connection. If micro-BS m lacks adequate computation resources or energy to complete the task, it can further offload the computation task to macro-BS k, which is equipped with more computation resources, and then retrieve the result once the task is complete.
According to the above, we denote by DTm,n(t) the number of computing task bits of MIoTD n transmitted to micro-BS m at TS t. When the offloaded data size exceeds the maximum number of task bits that can be transmitted between micro-BS m and MIoTD n within TS t, the MIoTD does not offload the task, i.e., φm,n(t) = 0. The transmission of DTm,n(t) incurs a corresponding transmission latency and energy consumption.
3.3.2. Edge Computing
As mentioned above, we suppose that each computing task can only be offloaded to one micro-BS, is fine-grained, and can be arbitrarily split into two parts at the micro-BS: one part is executed on its high-performance processor, while the other is offloaded to and processed on the MEC server at the macro-BS. Let ξm(t) denote the offloading data ratio of micro-BS m to macro-BS k. Micro-BS m executes ξm(t) × DTm,n(t) bits, and the number of computing bits executed by macro-BS k is (1 − ξm(t)) × DTm,n(t); the total number of computing bits executed at micro-BS m in a TS is the sum over all of its associated MIoTDs. The computing tasks transmitted by the MIoTDs rely on the CPU cycles provided by the high-performance processor of micro-BS m, and the CPU cycles required by the tasks from the MIoTDs connected to the m-th high-performance processor must not exceed the CPU cycles available at that processor within a TS of length τ, which are limited by its maximum available computation capability. The resulting computing latency and energy consumption of each micro-BS are determined by fm(t), the computation resources that micro-BS m provides for the tasks from the MIoTDs.
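A plausible rendering of the micro-BS workload, its CPU-cycle constraint, and the resulting computing latency and energy is given below, reusing the illustrative labels above and assuming the same quadratic power model with coefficient \(k_b\); here \(f^{\max}_m\) denotes the maximum available computation capability of the m-th high-performance processor.
\[
DE_m(t) = \sum_{n \in N_m} \xi_m(t)\, DT_{m,n}(t),
\qquad
\sum_{n \in N_m} I_{m,n}(t)\,\xi_m(t)\,DT_{m,n}(t) \le f^{\max}_m\,\tau,
\]
\[
T^{mec}_m(t) = \frac{\sum_{n \in N_m} I_{m,n}(t)\,\xi_m(t)\,DT_{m,n}(t)}{f_m(t)},
\qquad
E^{mec}_m(t) = k_b\, f_m^{2}(t) \sum_{n \in N_m} I_{m,n}(t)\,\xi_m(t)\,DT_{m,n}(t).
\]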
When computing tasks are partially offloaded to the macro-BS, the number of computing bits processed at the macro-BS in TS t, denoted DEk(t), is the sum of the portions forwarded by all micro-BSs. Similarly, the computation requirements of the MIoTDs cannot exceed the MEC server's available computation capability within a TS of length τ, which is bounded by the server's maximum available computation capability. Let fk(t) denote the computation resources provided by macro-BS k for the tasks from the MIoTDs; the computing latency and energy consumption of DEk(t) follow accordingly, as do the latency and energy consumption for uploading data from micro-BS m to macro-BS k.
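Analogously, a sketch of the macro-BS workload, its capacity constraint, and the associated computing and uploading latency and energy is as follows, with coefficient \(k_c\), the rate \(r_{k,m}(t)\) from above, and \(f^{\max}_k\) denoting the MEC server's maximum available computation capability (all superscripted labels are illustrative).
\[
DE_k(t) = \sum_{m \in M} \sum_{n \in N_m} \bigl(1-\xi_m(t)\bigr) DT_{m,n}(t),
\qquad
\sum_{m \in M}\sum_{n \in N_m} I_{m,n}(t)\bigl(1-\xi_m(t)\bigr)DT_{m,n}(t) \le f^{\max}_k\,\tau,
\]
\[
T^{mec}_k(t) = \frac{\sum_{m \in M}\sum_{n \in N_m} I_{m,n}(t)\bigl(1-\xi_m(t)\bigr)DT_{m,n}(t)}{f_k(t)},
\qquad
E^{mec}_k(t) = k_c\, f_k^{2}(t) \sum_{m \in M}\sum_{n \in N_m} I_{m,n}(t)\bigl(1-\xi_m(t)\bigr)DT_{m,n}(t),
\]
\[
T^{up}_{k,m}(t) = \frac{\bigl(1-\xi_m(t)\bigr)\sum_{n \in N_m} DT_{m,n}(t)}{r_{k,m}(t)},
\qquad
E^{up}_{k,m}(t) = p_{k,m}(t)\, T^{up}_{k,m}(t).
\]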
Based on the task execution under the various offloading modes described above, the total number of computing bits handled by the system in TS t is the sum of the bits executed locally, at the micro-BSs, and at the macro-BS. The overall system latency at TS t includes the latency of local task execution and the latency of remote transmission and execution, and the time-average task latency of the system is obtained by averaging the per-TS latency over time.
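Summing over the three execution modes, a plausible form of the completed bits, the per-TS system latency, and its time average over the set of TSs \(\mathcal{T}\) is (reusing the illustrative labels above, with \(D(t)\) and \(T(t)\) introduced for convenience):
\[
D(t) = \sum_{m \in M}\sum_{n \in N_m} DE_{m,n}(t) + \sum_{m \in M} DE_m(t) + DE_k(t),
\]
\[
T(t) = \sum_{m \in M}\sum_{n \in N_m}\Bigl(T^{loc}_{m,n}(t) + T^{tr}_{m,n}(t)\Bigr) + \sum_{m \in M}\Bigl(T^{mec}_m(t) + T^{up}_{k,m}(t)\Bigr) + T^{mec}_k(t),
\qquad
\bar{T} = \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}} T(t).
\]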
According to [16], the energy consumption of a micro- or macro-BS comprises two components: a static power offset and the energy consumed for task execution or data transmission. Since the static power consumption is identical across offloading schemes, it is omitted from the following model. The energy usage of micro-BS m encompasses the energy consumed for executing tasks at its own processor and for transmitting data to the macro-BS, as well as the power used to charge its associated MIoTDs. Let η denote the inverse of the power amplifier efficiency factor for the process of a micro-BS charging an MIoTD. Assuming an equal power amplifier efficiency factor (η) for all MIoTDs, the energy a micro-BS uses to charge an MIoTD is η times the energy the MIoTD consumes to execute its task locally and to transmit its offloaded task to the micro-BS. Hence, the energy consumption of micro-BS m is the sum of these components.
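Collecting the components described above, a plausible form for the energy consumption of micro-BS m is (component labels as in the earlier sketches):
\[
E_m(t) = E^{mec}_m(t) + E^{up}_{k,m}(t) + \eta \sum_{n \in N_m} \Bigl(E^{loc}_{m,n}(t) + E^{tr}_{m,n}(t)\Bigr).
\]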
3.4. Carbon Emission Model
Let Gm(t) denote the energy-harvesting capability of micro-BS m at TS t. For realistic considerations, the collected green energy is not the sole energy source of the micro-BSs and MIoTDs in the network; they can also use conventional grid power when the harvested energy cannot fulfill their energy requirements. Let gm(t) represent the ratio of green energy used by micro-BS m to its total consumed energy Em(t), so the green energy used by each micro-BS can be expressed as Em(t)gm(t), which is limited by the energy level of the EH module in micro-BS m at TS t. The renewable energy level Bm(t) of each EH module in micro-BS m evolves with the harvested energy and the green energy consumed. When Bm(t) > 0, micro-BS m operates on renewable energy sources; when Bm(t) ≤ 0, micro-BS m and its covered MIoTDs switch to the conventional energy supply due to insufficient renewable energy. Clearly, the second scenario results in higher system carbon emissions.
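A plausible form of the green-energy availability constraint and the battery dynamics, assuming the harvested energy \(G_m(t)\) is credited once per TS and any storage cap is left implicit, is
\[
g_m(t)\, E_m(t) \le B_m(t),
\qquad
B_m(t+1) = B_m(t) - g_m(t)\, E_m(t) + G_m(t).
\]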
Based on the above, for each TS t, the network's energy consumption may comprise both grid power and renewable energy. The system's grid power usage is the portion of the total consumption not covered by green energy. Hence, the system's total carbon emissions and time-average carbon emissions follow from the grid power usage, where ε is the carbon emission factor in kg/kWh, i.e., the amount of CO2 released when consuming 1 kWh of conventional energy, typically set to 0.998 kg/kWh [25].
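Letting \(E_k(t)\) denote the (grid-powered) energy consumption of the macro-BS, a plausible form of the system's grid usage, total carbon emissions, and time-average carbon emissions is (the labels \(E^{grid}\), \(CE\), and \(\overline{CE}\) are illustrative):
\[
E^{grid}(t) = \sum_{m \in M} \bigl(1 - g_m(t)\bigr) E_m(t) + E_k(t),
\qquad
CE(t) = \varepsilon\, E^{grid}(t),
\qquad
\overline{CE} = \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}} CE(t).
\]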
We define the total system execution cost COST(t) as a weighted trade-off between the task latency and the carbon emissions of the system at TS t, with weight coefficient w. If w is large, the MIoTDs are delay-sensitive and low latency is prioritized; otherwise, reducing conventional energy consumption and carbon emissions is emphasized.
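Consistent with the weighted trade-off described above and with the interpretation of w in Section 5, a plausible form is
\[
COST(t) = w\, T(t) + (1 - w)\, CE(t), \qquad 0 \le w \le 1.
\]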
3.5. Problem Formulation
In this subsection, the joint computation offloading and resource allocation optimization problem is formulated for the uplink MIoT network scenario. We define the system execution efficiency as the ratio between the system execution cost and the total size of completed tasks, and the time-average system execution efficiency as its long-term average over TSs.
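As a sketch, assuming the ratio is oriented so that completing more task bits at lower cost yields higher efficiency (which matches the trends reported in Section 5), the two quantities can be written as
\[
EE(t) = \frac{D(t)}{COST(t)},
\qquad
\overline{EE} = \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}} EE(t).
\]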
Our aim is to reduce the system's carbon emissions and task delay while increasing the total size of completed tasks, that is, to maximize the system execution efficiency subject to the task requirement constraints, formulated as problem (31). In (31), constraint (31b) restricts the ratio of green energy consumed; (31c) is the zero-one constraint on the drop and computation mode indicators of task um,n(t) at TS t; (31d) represents the transmission power constraints of the MIoTDs and micro-BSs; (31e) and (31f) represent the CPU-cycle frequency constraints of the MIoTDs, the high-performance processors, and the MEC server; (31g) and (31h) are the scheduling restrictions of each micro-BS and the macro-BS; (31i) guarantees that the green energy used by micro-BS m does not exceed its renewable energy level in TS t; (31j) constrains the offloading data ratio from micro-BS m to macro-BS k; and (31k) ensures the latency requirement of each task, where Tm,n(t) denotes the total latency of task um,n(t) at TS t. Task um,n(t) may be executed either fully locally or offloaded; in the latter case, the task may be completed entirely at the micro-BS, or partially at the micro-BS with the remainder executed at the macro-BS. Therefore, Tm,n(t) comprises either the local execution delay or the transmission and execution delays incurred during offloading.
Considering the uncertainty of the computational tasks and renewable energy in problem (31), we next employ DRL techniques to determine the optimal offloading policy.
5. Simulation Results and Analysis
In this section, we evaluate the performance of our proposal through extensive simulations using Matlab 2020a to illustrate how well it performs in the MEC-enabled MIoT system for computation offloading. Initially, the simulation parameters are outlined. The performance of the proposed algorithm (DCTORA) is then compared to four benchmark algorithms described below in terms of the time-average cumulative reward, time-average system execution efficiency, time-average task latency, and time-average carbon emissions across different scenarios. For convenience, we will abbreviate “time-average cumulative reward”, “time-average system execution efficiency”, “time-average task latency” and “time-average carbon emission” as “cumulative reward”, “execution efficiency”, “task latency” and “carbon emission”, respectively, in this section:
Full local execution (FL): All MIoTDs independently perform their tasks using their local computing resources.
Full offloading (FO): Each MIoTD offloads its tasks to its connected micro-BS with the allocated transmit power or transmits the tasks to the macro-BS through the micro-BS; that is, all computation tasks are executed by utilizing the computation resources of the micro-BS’s high-performance processors or the MEC server of the macro-BS.
Greedy policy (GP): The system utilizes its maximum power to complete computing tasks either locally or remotely, completing tasks as often as possible.
DQN-based offloading Scheme (DOS): Each agent adopts the DQN approach to train its model for joint optimization of task offloading and resource allocation, in which computing resource and power allocation are separated into 10 levels with values ranging from 0 to their maximum values.
5.1. Simulation Settings
In our simulations, we consider a system with multiple cells, each served by a micro-BS with a coverage radius of 600 m; the macro-BS is located at the center of the micro-BSs, and the MIoTDs are randomly distributed between [50, 200] m from their micro-BS and [200, 1000] m from the macro-BS. The length of each TS is set to 2 ms, and the number of participating MIoTDs is randomly chosen from [30, 54]. When an MIoTD is within a certain distance d of a micro-BS, it can communicate with the micro-BS directly. We assume that the path loss of both micro-BS and macro-BS links follows PL(dB) = 142.7 + 35.5 log10(d), where d is the distance in kilometers between a micro-BS and its connected MIoTDs, or between a micro-BS and the macro-BS. The communication bandwidths of each micro-BS and each MIoTD node are 6 MHz and 2 MHz, respectively, and the channels undergo Rayleigh fading. The noise power spectral density is N0 = −174 dBm/Hz. Moreover, the size of the tasks generated by each MIoTD obeys a uniform distribution over [200, 1200] KB, and the maximum computation capacities of an MIoTD and the macro-BS are 1 GHz and 25 GHz, respectively. The maximum computation capacity of each micro-BS lies between 3 GHz and 7 GHz. The inverse of the power amplifier efficiency factor for a micro-BS charging an MIoTD is η = 1. The energy-harvesting capability of each micro-BS is uniformly distributed between 0 and 80 J.
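As a concrete illustration of how these settings combine, the short Python sketch below computes the Shannon rate of a single subchannel under the stated path-loss and noise parameters; the 15 dBm transmit power and 0.2 km distance are arbitrary example values, and Rayleigh fading is omitted (average channel gain only).

import math

def subchannel_rate_bps(bw_hz, tx_power_dbm, dist_km, n0_dbm_hz=-174.0):
    # Path loss in dB: PL = 142.7 + 35.5 * log10(d [km]).
    pl_db = 142.7 + 35.5 * math.log10(dist_km)
    rx_power_dbm = tx_power_dbm - pl_db
    # Thermal noise integrated over the subchannel bandwidth.
    noise_dbm = n0_dbm_hz + 10.0 * math.log10(bw_hz)
    snr_linear = 10.0 ** ((rx_power_dbm - noise_dbm) / 10.0)
    return bw_hz * math.log2(1.0 + snr_linear)

# Example: a 2 MHz MIoTD subchannel, 15 dBm transmit power, 0.2 km link.
print(subchannel_rate_bps(2e6, 15.0, 0.2) / 1e6, "Mbit/s")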
In the simulations, the DRL-based frameworks (DCTORA and DOS) employ neural networks for each micro-BS agent. These networks comprise an input layer, two fully connected hidden layers, and an output layer. To obtain the optimal objective value via gradient descent, the parameters of the target network are updated periodically, and experience replay memory is employed to improve the algorithm's convergence. The actor and critic networks use different learning rates of 0.0001 and 0.001, respectively, whereas the DOS uses a learning rate of 0.001. The two hidden layers contain 200 and 100 neurons, respectively. Furthermore, the experience replay buffer size is set to 2000, the mini-batch size to 32, and the discount factor to 0.95. The maximum number of episodes is set to 1000, and the learning frequency is set to 5.
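For concreteness, the minimal PyTorch sketch below shows a generic DDPG actor-critic update consistent with the hyperparameters above (two hidden layers of 200 and 100 neurons, discount factor 0.95). It is an illustrative sketch rather than the DCTORA implementation; the sigmoid output activation, the soft-update rate tau, and the state/action dimensions are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, action_dim), nn.Sigmoid())  # continuous actions scaled to [0, 1]

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    # Slowly track the online network with the target network.
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_((1.0 - tau) * t_param.data + tau * s_param.data)

def ddpg_update(batch, actor, critic, actor_t, critic_t, actor_opt, critic_opt, gamma=0.95):
    state, action, reward, next_state = batch  # tensors sampled from the replay buffer
    # Critic step: minimize the TD error against the target networks.
    with torch.no_grad():
        target_q = reward + gamma * critic_t(next_state, actor_t(next_state))
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor step: ascend the critic's estimate of Q(s, pi(s)).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft-update the target networks.
    soft_update(actor_t, actor); soft_update(critic_t, critic)

In a full training loop, the actor and critic optimizers would be created with the learning rates above (0.0001 and 0.001, respectively), and the batches would be mini-batches of 32 transitions sampled from a replay buffer of size 2000.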
Table 3 provides a summary of the main parameter settings.
5.2. Performance Analysis
We first evaluate the convergence of the proposed DCTORA algorithm. Following this, we analyze the impact of coefficient values on the cumulative reward, task latency, and carbon emissions. Moreover, we discuss the system performance of our solution in different scenarios by comparing it with benchmarks.
5.2.1. Convergence Analysis
Figure 3 shows the convergence performance of the proposed DCTORA algorithm for different values of w used in Equation (28). In this simulation, the number of participating MIoTDs is 30, the number of micro-BSs is 6, and each micro-BS's CPU frequency is 3 GHz. The energy-harvesting capability Gm(t) of each micro-BS is set to 30 J. We then use the proposed Algorithm 1 to obtain the optimal task offloading strategy.
From Figure 3, we can observe that the cumulative reward converges to near-optimal values within 500 training episodes. The cumulative reward fluctuates strongly at the beginning, which indicates that each agent is still exploring the environment randomly. After a period of learning, the cumulative reward gradually stabilizes and the fluctuation range decreases, regardless of the value of w. We can therefore conclude that the proposed algorithm has good convergence performance. In addition, Figure 3 shows that the cumulative reward improves as w decreases: the agent attains the highest cumulative reward with w = 0.2, followed by w = 0.5 and then w = 0.9.
5.2.2. Effect of Coefficient Value on Performance Metrics
To focus on the impact of w, we fix the other parameters to be consistent with Figure 3. The task latency and carbon emissions for different values of w are shown in Figure 4. Overall, the task latency and carbon emission values increase at first and then converge to different stable values. As shown in Figure 4a,b, the lowest carbon emissions are attained with w = 0.2, followed by w = 0.5, with w = 0.9 performing worst. The statistical results show that, across all 1000 training rounds, the carbon emissions with w = 0.9 exceed those with w = 0.5 by 35% and those with w = 0.2 by 38.24%. The task latency with w = 0.2 is 27.14% higher than with w = 0.5 and 18.27% higher than with w = 0.9. From (28) and (29), we can directly observe that when w < 0.5, more emphasis is placed on carbon emissions, whereas when w > 0.5, more emphasis is placed on task latency. To balance the trade-off between task latency and carbon emissions, we choose w = 0.5 for the remaining simulations in which w is not the variable.
5.2.3. Performance Metrics Versus Varying Data Size of MIoTD
Figure 5 compares the performance of the different offloading schemes in terms of execution efficiency, task latency, and carbon emissions under different values of Cm,n(t), where w = 0.5, Gm(t) = 40 J, the maximum computation capacity of each micro-BS is 4 GHz, the maximum transmit power of each MIoTD is 15, and M = 6. In Figure 5a, as Cm,n(t) increases, the execution efficiency gradually decreases. The execution efficiency is composed of the task latency and carbon emissions; a larger Cm,n(t) implies larger tasks arriving in each TS, which usually leads to heavier computation and transmission burdens for task offloading and results in larger task latency and carbon emissions, subject to the limited computing capacity of the MIoTDs. Meanwhile, our proposal achieves better execution efficiency than the others in each case (at least 50% higher than FL), and the DDPG-based agents perform better than the DQN-based ones. The GP method seeks to complete as many tasks as possible and consumes too much conventional energy; it makes short-sighted choices based on task latency without considering carbon emissions. The FO scheme does not employ local computing, so there is a gap between it and GP. Furthermore, the FL algorithm overly relies on local execution capabilities, leading to its long latency. To gain a more insightful understanding, we plot the task latency and carbon emissions for different values of Cm,n(t) in Figure 5b,c.
In Figure 5b,c, both the task latency and the carbon emissions increase as Cm,n(t) grows. In Figure 5b, the difference in task latency among the five policies is slight when Cm,n(t) < 600. DCTORA exhibits the smallest delay, while FL has the largest. GP's task latency is lower than that of the FO and FL algorithms. This is because the FL algorithm has the lowest computing capacity, resulting in the longest task latency. The GP algorithm seeks to complete more tasks locally or remotely but ignores energy consumption. In contrast to GP, FO lacks local execution, so its task latency is slightly longer than GP's. In Figure 5c, we can observe a significant increase in carbon emissions as Cm,n(t) increases. DCTORA exhibits the smallest value, outperforming the GP algorithm by 18%, followed by DOS, FL, FO, and GP. The main reason is that, as Cm,n(t) rises, more task bits must be executed either locally or at the micro- or macro-BS. This leads to higher power consumption, potentially causing the renewable energy sources to fail to meet the system's energy demand; consequently, additional grid energy is consumed, increasing carbon emissions. The FL algorithm has lower local computing energy consumption at the MIoTDs than at the servers, resulting in lower carbon emissions than the FO and GP algorithms. Although the FL algorithm may reduce carbon emissions, it ignores task latency.
Furthermore, we can conclude from the above that two DRL-based methods (DOS and DDPG) outperform other benchmarks in terms of execution efficiency, task latency, and carbon emission performances. In each Cm,n(t) case, the DCTORA scheme exhibits a reduced task delay and carbon emissions compared to the DOS, providing additional evidence of its effectiveness in addressing high-dimensional complex problems involving continuous action-state spaces.
5.2.4. Performance Metrics Versus Varying Energy-Harvesting Capability of Micro-BS
Figure 6 shows the performance comparison in terms of execution efficiency, task latency, and carbon emissions under different energy-harvesting capabilities of the micro-BSs, where w = 0.5, Cm,n(t) = 600, the maximum computation capacity of each micro-BS is 4 GHz, the maximum transmit power of each MIoTD is 15, and M = 6. As shown in Figure 6a, the execution efficiency gradually increases with Gm(t). This is because, with a higher value of Gm(t), the system has more renewable energy and prefers to process more tasks locally or offload them to the edge servers. When Gm(t) > 60 J, the renewable energy harvested in real time is sufficient to support task execution, so the execution efficiency remains relatively constant regardless of any further increase in energy-harvesting capability. In Figure 6b,c, the task latency and carbon emissions gradually decrease and then also converge to stable values as Gm(t) increases. The reason is that, as Gm(t) increases, more green energy becomes available to the system, reducing grid energy consumption and thus carbon emissions. Additionally, more available green energy enables both local and remote task execution and powers more computation, reducing task latency and, as a result, improving execution efficiency.
Overall, the DRL-based algorithms make proper offloading decisions and achieve better performance, and the simulation results show that the DCTORA algorithm outperforms the other schemes. Among the traditional algorithms, GP performs best, but it makes short-sighted choices based on task latency without considering carbon emissions. As Gm(t) increases, the execution efficiency of all algorithms increases while their task latency and carbon emissions decrease; compared with the GP algorithm, the DCTORA algorithm achieves reductions of more than 26% in task latency and 18% in carbon emissions.
5.2.5. Performance Metrics Versus Maximum Transmit Power of MIoTD
The maximum transmit power of the MIoTDs has a significant impact on the data transmission delay, energy consumption, carbon emissions, and the number of data bits executed by the system. Hence, this section explores how the maximum MIoTD transmit power influences the system's performance. Figure 7 shows the relationship between system performance and the maximum MIoTD transmit power under the different offloading policies, where w = 0.5, Gm(t) = 50 J, Cm,n(t) = 600, the maximum computation capacity of each micro-BS is 4 GHz, and M = 6. Figure 7a illustrates the execution efficiency under different maximum transmit power constraints at the MIoTDs for the different offloading schemes. The curves show that the execution efficiency of all algorithms except FL increases as the maximum transmit power increases. A higher maximum transmit power implies a higher available transmission rate between the MIoTDs and the micro-BS, which facilitates task offloading and increases energy consumption to a certain extent, while also leading to higher carbon emissions when renewable energy is insufficient. The DCTORA scheme achieves better performance than the other offloading methods under the different maximum transmit power constraints, which indicates its effectiveness.
As shown in Figure 7b,c, the FL algorithm is almost unaffected by changes in the maximum transmit power, whereas the other schemes depend heavily on this constraint: the task latency decreases significantly and the carbon emissions rise as the maximum transmit power increases. This is because, in FL, each MIoTD executes its tasks locally without sending them to the micro-BSs. For the other schemes, a higher power allocation to the MIoTDs directly improves their transmission rates, and the MIoTDs prefer to offload tasks to the high-performance processors within their capability, which facilitates task offloading and increases carbon emissions to a certain extent. According to Figure 7b, the FL, FO, and GP algorithms are less effective at minimizing task delay, with increases of 35.9%, 25.5%, and 10.1%, respectively, compared to the proposed algorithm. Meanwhile, the simulation results show that the DOS scheme has a task latency 4.9% higher than that of the proposed algorithm.
5.2.6. Performance Metrics Versus Different Computation Capacity of Micro-BS
Figure 8 presents the influence of different micro-BS computation capacities (ranging from 3 GHz to 7 GHz) on system performance under the different schemes with a fixed number of micro-BSs, where we set Gm(t) = 20 J, Cm,n(t) = 600, the maximum MIoTD transmit power to 13, and M = 6. As shown in Figure 8a, the execution efficiency of all curves increases as the computation capacity of the micro-BSs increases. This is because a larger capacity provides more computing capability, which considerably decreases the task execution latency at the micro-BS. When the computation capacity of the micro-BS reaches 5 GHz, the execution efficiency of all algorithms increases only slowly.
The effect of the micro-BS computation capacity on task latency and carbon emissions is shown in Figure 8b,c, where we fix the other parameters and increase the micro-BS computation capacity. We can observe that the task latency and carbon emissions of the FL algorithm are barely affected by the micro-BS capacity. Under the different capacity values, the DOS and DCTORA schemes achieve better performance than the other three benchmarks, and the DCTORA paradigm performs slightly better than the DOS paradigm. When the capacity is below 5 GHz, the task latency of all algorithms except FL decreases markedly. This is because the processing speed of the micro-BS is faster than that of the MIoTDs, and each MIoTD chooses to offload more tasks to the micro-BS. Between 5 GHz and 7 GHz, the downward trend of the task latency slows down. Numerically, DCTORA achieves 18.4%, 38.9%, 23.5%, and 21.2% lower task latency than DOS, FL, FO, and GP, respectively. From Figure 8c, we observe that the carbon emissions of all algorithms except FL increase with the micro-BS capacity between 5 GHz and 6 GHz and gradually stabilize above 6 GHz. Beyond 6 GHz, the system's energy consumption has reached saturation, so further increasing the capacity has little influence on the system's carbon emissions. Furthermore, the performance of DCTORA is superior to that of the other algorithms. The GP algorithm seeks to accomplish more tasks, so it has the highest carbon emissions. The FO and FL algorithms take the carbon emission factor into account, but the FL algorithm does not rely on BS-aided computing with a conventional energy supply, so its carbon emissions are slightly lower. To summarize, DCTORA achieves 8.9%, 6.1%, 29.9%, and 31.9% lower carbon emissions than DOS, FL, FO, and GP, respectively.
5.2.7. Performance Metrics Versus Different Numbers of Micro-BSs
To gain further insight into the DCTORA agent, we plot the execution efficiency, task latency, and carbon emissions for different numbers of micro-BSs. We take a random number in [30, 54] as the total number of MIoTDs and set the maximum computation capacity of each micro-BS to 4 GHz, Gm(t) = 20 J, and Cm,n(t) = 600. Overall, as shown in Figure 9, for all algorithms except FL, the execution efficiency increases, the task latency decreases, and the carbon emissions increase as the number of micro-BSs grows. Our proposed DCTORA algorithm achieves the best results, with DOS following by a small margin. The reason is that more micro-BSs offer more computational resources, and the MIoTDs are closer to a micro-BS, which allows more MIoTDs to transfer their tasks to neighboring micro-BSs for processing; this lowers transmission and execution latency and improves system execution efficiency, but increases carbon emissions to some extent. The performance of FL remains unaffected by the growing number of micro-BSs, mainly because local computing does not use the computational resources offered by the micro-BSs' high-performance processors.