Article

Enhancing Rate-Splitting-Based Distributed Edge Computing via Multi-Group Information Recycling

The Electronic Information School, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(22), 4403; https://doi.org/10.3390/electronics13224403
Submission received: 28 September 2024 / Revised: 2 November 2024 / Accepted: 5 November 2024 / Published: 11 November 2024
(This article belongs to the Special Issue Distributed Intelligence Technologies for Smart Cities of the Future)

Abstract

To address the straggling effect in distributed edge computing, existing methods often introduce extra computation or communication costs. Recently, information recycling has emerged as an efficient solution that avoids such extra overhead. Nonetheless, the performance of the existing information recycling-assisted rate-splitting-based distributed edge computing scheme hinges significantly on the single common stream, whose rate is limited by the edge node (EN) with the worst channel quality. To this end, a multi-group information recycling scheme for rate-splitting-based distributed edge computing is proposed, where the ENs are clustered into multiple groups, each with its own common stream. In this way, the proposed scheme alleviates the communication limitation of a single common stream, thereby boosting the information recycling mechanism for stronger computation collaboration. A K-medoids-based grouping algorithm is designed for the general multi-antenna case, and a more efficient continuous partitioning-based grouping algorithm is proposed for the special single-antenna case. In addition, a convex–concave procedure-based algorithm is developed to solve the corresponding latency optimization problem. Simulations demonstrate that the proposed scheme offers significant and robust advantages across various communication and computation conditions. In the considered scenarios, the proposed scheme reduces processing latency by up to 51.0% compared to the conventional information recycling scheme.

1. Introduction

In recent years, edge computing has played an increasingly vital role in enhancing the quality of computational services [1,2]. However, the mounting scale of edge intelligence applications (e.g., big data, augmented reality, and artificial intelligence) is pushing the limits of edge computing. These large-scale and computationally intensive tasks have rendered traditional single-node edge computing insufficient, motivating distributed edge computing, which has emerged as a promising computation paradigm [3,4,5]. This approach partitions and distributes the original large-scale and intensive tasks among multiple edge nodes (ENs), exploiting their computational resources for faster processing.
Nonetheless, the straggling effect presents a notable challenge in distributed edge computing. Specifically, some ENs, known as stragglers, may experience substantial delays or even fail to complete their tasks due to various factors, including hardware heterogeneity, imbalanced workload allocation, I/O contentions, and hardware failures [6,7,8]. This significantly degrades the overall performance, as the system needs to wait for feedback from all the ENs in distributed edge computing [9]. To alleviate the straggling issues, several strategies have been considered in the literature. One typical approach is task replication [10,11,12], where redundant copies of the original task are distributed across multiple ENs, and the execution delay is determined by the fastest EN despite some stragglers. Besides task replication, coded computing [8] has recently been advocated, where the partitioned subtasks are encoded into a set of coded subtasks via coding theory [13]. Thanks to the well-designed coding redundancy, the mobile user can reconstruct the original task result using the feedback from only a subset of ENs, thus avoiding the need to wait for stragglers. In addition, workload reallocation [14,15] is another commonly adopted method, which allows slower ENs to re-transmit part of their remaining workloads to other faster ENs to minimize the overall processing latency. Nevertheless, despite the effectiveness of these pioneering works in mitigating the straggling effect, they often introduce redundant computation or require additional data transmission, which may still limit the overall performance.
To handle straggling without incurring extra computation and communication costs, a physical-layer mechanism known as information recycling has been recently proposed [16], in view of the widespread employment of the spectrum-efficient non-orthogonal transmission techniques (e.g., rate-splitting multiple access (RSMA) [17,18] and power-domain non-orthogonal multiple access (NOMA) [19]) in distributed edge computing [20,21,22,23,24]. The key insight of information recycling is that, under many non-orthogonal transmission techniques involving successive interference cancellation (SIC), each EN has to decode part of the information intended for other ENs [25,26]. Instead of directly discarding this part of the information after interference cancellation, the information recycling mechanism proactively recycles this information (for free), by exploiting the SIC procedure. With the recycled information, faster ENs that complete their own task earlier can help execute the corresponding part of the task for the straggling ENs, instead of waiting for the straggling ENs, thereby mitigating the straggling effect without extra overhead. A concrete example of the information recycling mechanism has been proposed for rate-splitting-based distributed edge computing systems [16]. The performance of this conventional scheme heavily hinges on the single common stream in rate-splitting. Specifically, a larger common stream can increase the portion of tasks available for information recycling and thus promote computation collaboration. However, the rate of the common stream is fundamentally limited by the EN with the worst channel quality. This constraint undermines the benefits of information recycling, since enforcing a large common stream may aggravate the impact of the rate limitation on transmission, leading to sub-optimal communication performance. 
To the best of our knowledge, developing a new distributed edge computing scheme that can overcome the communication limitation of the single common stream remains an open challenge.
To this end, a novel scheme, coined multi-group information recycling for rate-splitting-based distributed edge computing, is proposed in this work. Specifically, it clusters the ENs into multiple groups and offers one common stream for each group, within which collaborative computation can still be enabled through information recycling. Since each common stream is decoded by a smaller number of ENs, the communication bottleneck of the single common stream can be alleviated, thereby facilitating the common stream transmission. As a result, the information recycling mechanism can be further boosted, promoting computation collaboration among the ENs for processing latency reduction. The major contributions of this work are summarized as follows:
  • A multi-group information recycling scheme for rate-splitting-based distributed edge computing is proposed in this work for processing latency reduction;
  • To obtain a good EN grouping strategy, a K-medoids-based grouping algorithm is designed by jointly considering channel alignment and channel strength for the general multi-antenna case, and a more efficient continuous partitioning-based grouping algorithm is proposed for the special single-antenna case;
  • By exploiting the inherent structure of the corresponding latency optimization problem and bounding via difference of convex (DC) programming, a convex–concave procedure (CCCP)-based algorithm is developed to find a reasonably good solution.
The remainder of this paper is organized as follows. Related works are briefly reviewed in Section 2, and the system model and the conventional information recycling mechanism are presented in Section 3. In Section 4, the proposed scheme is presented, and the corresponding latency optimization problem is formulated. The latency optimization algorithms are developed in Section 5. Simulation results are presented in Section 6. Finally, conclusions and future works are discussed in Section 7.
Notations: $(\cdot)^H$ denotes the Hermitian transpose. $\mathbb{C}^{a \times b}$ represents a complex matrix with dimension $a \times b$. $\mathcal{CN}(\mathbf{a}, \mathbf{b})$ represents the distribution of a circularly symmetric complex Gaussian random vector with mean vector $\mathbf{a}$ and covariance matrix $\mathbf{b}$. $\mathbb{E}[\cdot]$ represents expectation. $\Re\{\cdot\}$ obtains the real part of a complex number. $|\cdot|$ is the absolute value. $\mathrm{tr}(\cdot)$ is the trace.

2. Related Work

Several methods have been considered in the literature to mitigate the straggling effect in distributed edge computing, including task replication [10,11,12], coded computing [27,28,29], and workload reallocation [14,15]. For example, in [10], the original computation task is duplicated and assigned to multiple ENs so that the execution latency is determined by the fastest EN rather than the stragglers. The tradeoff between expected completion time and task replication ratio is explored in [11] to identify the optimal replication strategy for various computation straggling models. In [12], a balanced grouping replication strategy is further proposed, focusing on optimizing the redundancy level to minimize average job compute time while preserving task completion time predictability. In addition to task replication, coded computing has gained traction for its ability to mitigate straggling via coding theory [13]. Specifically, maximum distance separable (MDS) codes are used in [27] for distributed task offloading. It is shown that MDS-coded offloading can effectively mitigate the straggling effect, as the mobile user can reconstruct the original task result using feedback from only a subset of fast ENs. Besides, entangled polynomial codes are proposed in [28] to achieve the optimal tradeoff between the recovery threshold (i.e., the minimum number of ENs required to recover the original task) and the per-EN computation workload through different matrix partitioning. In [29], the impact of coding parameter selection under different task execution models is investigated to alleviate the straggling effect and minimize the processing latency. Moreover, workload reallocation is another widely employed method for straggler mitigation. For instance, in [14], when a faster EN completes its task and becomes idle, part of the straggling ENs' remaining workloads can be re-transmitted to this faster EN, thereby rebalancing the workload and addressing straggling.
In [15], by detecting straggling ENs continuously in the computation process, the mobile user can promptly reassign tasks to faster ENs, thereby minimizing idle time and improving overall system performance. Nevertheless, while these methods effectively mitigate the straggling effect, their performance remains constrained. Specifically, task replication and coded computing incur increased communication and computational costs, because replicated or coded subtasks are often larger than the subtasks in traditional distributed computing due to the introduced task redundancy. Besides, workload reallocation necessitates task data retransmission, which brings extra communication costs and limits the overall performance.
On the other hand, significant research has been conducted on strengthening communication in distributed edge computing using non-orthogonal transmission techniques. For instance, in [22], the computation subtasks are offloaded to the ENs via NOMA, and the processing latency is minimized by jointly optimizing the transmission duration and workload allocation. In [23], the benefits of NOMA are investigated for both uplink and downlink transmissions in mobile edge computing. In [24], the total multi-user processing latency of an RSMA-based distributed edge computing system is optimized, and RSMA is shown to outperform NOMA in serving distributed edge computing. However, most of these pioneering works consider only how non-orthogonal transmission strengthens communication in distributed edge computing, while the straggling issue is largely ignored.
To handle straggling without incurring extra computation and communication costs, the information recycling mechanism is proposed in [16], which can be integrated into non-orthogonal transmission techniques. In particular, by recycling the task information intended for other ENs in the decoded common stream, faster ENs can help execute the corresponding part of the task for slower ENs, thereby mitigating the straggling effect. However, the effectiveness of the existing information recycling-assisted distributed edge computing scheme hinges heavily on the single common stream in rate-splitting, and the rate of the common stream is inherently limited by the EN with the poorest channel quality. This constraint diminishes the advantages of information recycling, as dedicating resources to a single common stream can promote collaborative computation but may also degrade overall communication performance. In contrast to these pioneering works, by clustering the ENs into multiple groups and transmitting several common streams, the proposed multi-group information recycling scheme alleviates the single common stream's communication limitation to facilitate the transmission of the common streams, thus promoting the information recycling mechanism for enhanced computation collaboration among the ENs.

3. System Model

In this section, the system model of rate-splitting-based distributed edge computing with stragglers will be presented, together with the conventional information recycling mechanism.

3.1. System Model

Consider a distributed edge computing system consisting of a mobile user and K ENs indexed by $\mathcal{K} = \{1, 2, \ldots, K\}$. The mobile user wirelessly offloads its computation-intensive task $W$ to these K ENs for faster processing. Specifically, the computation task $W$ can be divided into K independent subtasks $\{W_1, W_2, \ldots, W_K\}$. (Take (large-scale) matrix multiplication, a fundamental operation in many machine learning algorithms [30], as an example of a practical application: the original matrix multiplication task $\mathbf{F}\mathbf{b}$ is partitioned into K subtasks $\{\mathbf{F}_1\mathbf{b}, \mathbf{F}_2\mathbf{b}, \ldots, \mathbf{F}_K\mathbf{b}\}$ with $\mathbf{F}^T = [\mathbf{F}_1^T, \mathbf{F}_2^T, \ldots, \mathbf{F}_K^T]$.) Each subtask $W_k$ will be offloaded to EN k for processing. (As in the existing literature [1,23,31,32], it is assumed that the size of the computing results is much smaller than that of the offloaded subtasks, making the downlink transmission latency negligible.)

3.1.1. Communication Phase

For better spectrum efficiency in communications, it is assumed that the mobile user employs RSMA to offload the subtasks to the K ENs during the communication phase. Specifically, each subtask $W_k$ is split into a common part $W_k^c$ and a private part $W_k^p$. (In the matrix multiplication example, this corresponds to partitioning $\mathbf{F}_k$ along the rows into two parts, $\mathbf{F}_k^c$ and $\mathbf{F}_k^p$, where $\mathbf{F}_k^T = [\mathbf{F}_k^{c,T}, \mathbf{F}_k^{p,T}]$.) In the wireless physical layer, all the common parts $\{W_1^c, W_2^c, \ldots, W_K^c\}$ are encoded into a single common stream $s^c$, while each private part $W_k^p$ is encoded into a private stream $s_k^p$. By employing superposition coding, the transmitted signal $\mathbf{x}$ from the mobile user is given by
$$\mathbf{x} = \mathbf{w}^c s^c + \sum_{k \in \mathcal{K}} \mathbf{w}_k^p s_k^p, \qquad (1)$$
where $\mathbf{w}^c \in \mathbb{C}^{N_t \times 1}$ and $\mathbf{w}_k^p \in \mathbb{C}^{N_t \times 1}$ are the precoding vectors for the common stream $s^c$ and the private stream $s_k^p$, respectively, with $N_t$ denoting the number of antennas of the mobile user. The received wireless signal $y_k$ at EN k can be expressed as
$$y_k = \mathbf{h}_k^H \mathbf{x} + n_k, \qquad (2)$$
where $\mathbf{h}_k \in \mathbb{C}^{N_t \times 1}$ represents the gain of the wireless channel between the mobile user and EN k, and $n_k \sim \mathcal{CN}(0, \sigma^2)$ is the corresponding additive white Gaussian noise. According to the decoding rules of rate-splitting [18], each EN k first decodes the common stream $s^c$ by treating the interference from all private streams as noise and then subtracts it from the received signal using SIC. Thereafter, EN k decodes its private stream $s_k^p$ by treating the remaining interference from the other private streams as noise. Hence, by plugging (1) into (2), the signal-to-interference-plus-noise ratios (SINRs) of the common stream $s^c$ and the private stream $s_k^p$ at EN k are, respectively, given by
$$\gamma_k^c = \frac{|\mathbf{h}_k^H \mathbf{w}^c|^2}{\sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}, \quad k \in \mathcal{K}, \qquad (3)$$
$$\gamma_k^p = \frac{|\mathbf{h}_k^H \mathbf{w}_k^p|^2}{\sum_{j \in \mathcal{K} \setminus \{k\}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}, \quad k \in \mathcal{K}. \qquad (4)$$
With the above SINRs, the rates $R_k^p$ of the private streams should satisfy
$$R_k^p \leq B \log_2(1 + \gamma_k^p), \quad k \in \mathcal{K}, \qquad (5)$$
where $B$ is the system bandwidth. To ensure that the common stream $s^c$ can be successfully decoded by all ENs, the rate $R^c$ of the common stream should satisfy
$$R^c \leq \min_{k \in \mathcal{K}} B \log_2(1 + \gamma_k^c). \qquad (6)$$
As the rate $R^c$ of the common stream is shared among all ENs to offload $\{W_1^c, W_2^c, \ldots, W_K^c\}$, we have $\sum_{k=1}^K R_k^c = R^c$, where $R_k^c$ is the uplink transmission rate of the common part $W_k^c$ of subtask $W_k$. Hence, the overall uplink transmission rate $R_k$ to EN k is given by
$$R_k = R_k^c + R_k^p. \qquad (7)$$
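As an illustration, the SINRs and rate bounds above can be evaluated numerically. The following Python sketch (function and variable names are illustrative, not from the paper) computes the common- and private-stream SINRs for given channels and precoders, and returns the resulting rate bounds:

```python
import numpy as np

def rsma_rates(H, w_c, W_p, sigma2, B):
    """Single-common-stream RSMA rate bounds.

    H   : (K, Nt) channel matrix; row k is h_k^H
    w_c : (Nt,)   common-stream precoder w^c
    W_p : (Nt, K) private precoders; column k is w_k^p
    Returns (R_c, R_p): the common-stream rate ceiling and the
    per-EN private-stream rate ceilings.
    """
    P = np.abs(H @ W_p) ** 2        # (K, K): |h_k^H w_j^p|^2 for all k, j
    p_c = np.abs(H @ w_c) ** 2      # (K,):   |h_k^H w^c|^2
    # Common-stream SINR: all private streams are interference
    gamma_c = p_c / (P.sum(axis=1) + sigma2)
    # Private-stream SINR after SIC: other private streams are interference
    gamma_p = np.diag(P) / (P.sum(axis=1) - np.diag(P) + sigma2)
    R_p = B * np.log2(1 + gamma_p)          # private-stream ceilings
    R_c = B * np.log2(1 + gamma_c).min()    # limited by the worst-channel EN
    return R_c, R_p
```

Note that `R_c` is capped by the worst-channel EN, which is precisely the bottleneck the multi-group scheme later relaxes.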

3.1.2. Computation Phase

In the computation phase, each EN k processes its received offloaded subtask $W_k$. To account for the straggling effect [8,9], the computation rate $C_k$ of EN k is modeled as follows [29]
$$C_k = \bar{C} - \Delta_k. \qquad (8)$$
Here, $\bar{C} = f_e / \beta$ denotes the maximum computation rate of the ENs, where $f_e$ is the maximum processor frequency and $\beta$ represents the number of CPU cycles required to process one bit of data. $\Delta_k$ is a non-negative random variable representing the degradation in computation capability due to the straggling effect. The variables $\{\Delta_k\}_{k=1}^K$ are assumed to be independent and identically distributed (i.i.d.).
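This straggling model is easy to simulate. Since the paper leaves the distribution of $\Delta_k$ unspecified, the sketch below assumes an i.i.d. exponential degradation (and clips rates at zero) purely for illustration:

```python
import numpy as np

def computation_rates(K, f_e, beta, rng, mean_delta):
    """Sample straggling-degraded computation rates C_k = C_bar - Delta_k.

    f_e        : maximum processor frequency (cycles/s)
    beta       : CPU cycles required per bit
    mean_delta : mean of the assumed exponential straggling degradation
    """
    C_bar = f_e / beta                           # maximum rate in bits/s
    Delta = rng.exponential(mean_delta, size=K)  # i.i.d. degradation (assumption)
    return np.maximum(C_bar - Delta, 0.0)        # clipping at zero is an assumption
```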

3.2. Conventional Information Recycling

The idea of conventional information recycling for rate-splitting-based distributed edge computing [16] is introduced in the sequel. According to the rate-splitting transmission principle [18], each EN k decodes the task messages $W_1^c, \ldots, W_{k-1}^c, W_{k+1}^c, \ldots, W_K^c$ intended for other ENs when decoding the common stream $s^c$ to obtain its own message $W_k^c$. In conventional distributed computing, these messages are discarded directly after decoding. As a result, each EN can only compute its own subtask $W_k$, and a faster EN that completes its own task earlier can do nothing but wait for the straggling ENs. As shown in Figure 1a, both EN 1 and EN 2 can do nothing but wait for the straggling EN 3 and remain idle, which wastes the computation capability of the system. In contrast, the information recycling mechanism enables the ENs to proactively recycle the messages $W_1^c, \ldots, W_{k-1}^c, W_{k+1}^c, \ldots, W_K^c$, as shown in Figure 1b. In this way, each EN has full knowledge of the common parts $\{W_1^c, W_2^c, \ldots, W_K^c\}$ of all the subtasks $\{W_k\}_{k=1}^K$. When a certain EN i is straggling, another faster EN j can help compute (the common part of) its subtask based on the recycled information $W_i^c$.
The specific process of collaborative computing for the common parts is introduced as follows and summarized in Algorithm 1. Each common-part task $W_k^c$ is divided into $A_k$ mini-tasks $\{W_{k,u}^c\}_{u=1}^{A_k}$, where $A_k = D_k^c / D_m$. (In the matrix multiplication example, each common part $\mathbf{F}_k^c$ is further partitioned along the rows into $A_k$ smaller parts $\mathbf{F}_{k,1}^c, \mathbf{F}_{k,2}^c, \ldots, \mathbf{F}_{k,A_k}^c$ to form mini-tasks $\{\mathbf{F}_{k,u}^c \mathbf{b}\}_{u=1}^{A_k}$, where $\mathbf{F}_k^{c,T} = [\mathbf{F}_{k,1}^{c,T}, \mathbf{F}_{k,2}^{c,T}, \ldots, \mathbf{F}_{k,A_k}^{c,T}]$.) Here, $D_k^c$ and $D_m$ are the sizes of the task $W_k^c$ and each mini-task, respectively. (Note that $D_m$ is a parameter determined during the engineering implementation process; its specifics are therefore beyond the scope of this paper.) The system also maintains an ordered task queue for each EN k, initialized as $\{W_{k,u}^c\}_{u=1}^{A_k}$. Each EN sequentially processes the mini-tasks according to their queuing order. Upon completing mini-task $W_{k,u}^c$, EN j sends a finishing signal $S_{k,u}^{\mathrm{fin}}$ to inform the mobile user. Besides, whenever a certain EN j completes all its assigned mini-tasks while the overall computation task has not yet been finished, it can assist other straggling ENs in computing their mini-tasks via information recycling. Specifically, if straggling EN i has the longest unfinished task queue at the current time, with task $W_{k,v}^c$ at the end of its queue, the mobile user sends a recycling signal $S_{k,v}^{\mathrm{rec}}$ to EN j and a canceling signal $S_{k,v}^{\mathrm{can}}$ to EN i. These signals direct EN j to begin computing $W_{k,v}^c$ using information recycling and EN i to remove $W_{k,v}^c$ from its task queue, respectively. This collaboration continues until the overall computation task is completed, allowing fast and slow ENs to collaboratively compute the common parts $\{W_1^c, W_2^c, \ldots, W_K^c\}$. An illustrative example is depicted in Figure 2.
With the information recycling mechanism enabling computation collaboration, the computation capabilities of all ENs can, in theory, be harnessed for the computation of $\{W_1^c, W_2^c, \ldots, W_K^c\}$, and the overall computation rate $C^{\mathrm{rec}}$ can reach
$$C^{\mathrm{rec}} = \sum_{k=1}^K C_k. \qquad (9)$$
Algorithm 1 The collaborative computing process of information recycling
1: Initialize: For each EN k, divide the common part $W_k^c$ into mini-tasks and initialize the ordered task queue $\{W_{k,u}^c\}_{u=1}^{A_k}$;
2: Each EN starts to sequentially process the mini-tasks according to their queuing order;
3: while the overall computation task is not completed do
4:   for $j = 1, \ldots, K$ do
5:     if EN j completes a mini-task $W_{k,u}^c$ then
6:       EN j sends a finishing signal $S_{k,u}^{\mathrm{fin}}$ to the mobile user;
7:     end if
8:     if EN j receives a recycling signal $S_{k,v}^{\mathrm{rec}}$ then
9:       EN j adds $W_{k,v}^c$ to its task queue and begins computing $W_{k,v}^c$ via information recycling;
10:    end if
11:    if EN j receives a canceling signal $S_{k,v}^{\mathrm{can}}$ then
12:      EN j removes $W_{k,v}^c$ from its task queue;
13:    end if
14:  end for
15:  if the mobile user receives a finishing signal $S_{k,u}^{\mathrm{fin}}$ from EN j and determines that EN j has completed all assigned mini-tasks then
16:    Find the EN i with the longest unfinished task queue, with task $W_{k,v}^c$ at the end of its queue; the mobile user sends a recycling signal $S_{k,v}^{\mathrm{rec}}$ to EN j and a canceling signal $S_{k,v}^{\mathrm{can}}$ to EN i;
17:  end if
18: end while
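The queue dynamics of Algorithm 1 can be mimicked with a small discrete-event simulation. The sketch below is a deliberate simplification (computation rates are held constant, and an idle EN is only re-engaged at completion events); it is meant only to show how recycling shortens the common-part completion time, not to reproduce the paper's exact signaling:

```python
import heapq

def simulate_recycling(queues, rates, d_m):
    """Simplified discrete-event sketch of the collaborative computing process.

    queues : initial mini-task counts A_k per EN
    rates  : per-EN computation rates C_k (bits/s), assumed constant
    d_m    : mini-task size in bits
    Returns the time at which all common-part mini-tasks are finished.
    """
    remaining = list(queues)  # unfinished mini-tasks per EN (incl. in-progress)
    events = [(d_m / rates[k], k) for k in range(len(queues)) if remaining[k] > 0]
    heapq.heapify(events)     # (next completion time, EN index)
    t = 0.0
    while events:
        t, k = heapq.heappop(events)
        remaining[k] -= 1                     # EN k finishes one mini-task
        if remaining[k] == 0:
            # EN k is idle: recycle a queued mini-task from the longest queue,
            # leaving the straggler's in-progress mini-task untouched
            i = max(range(len(remaining)), key=lambda j: remaining[j])
            if remaining[i] > 1:
                remaining[i] -= 1             # canceling signal to EN i
                remaining[k] += 1             # recycling signal to EN k
        if remaining[k] > 0:
            heapq.heappush(events, (t + d_m / rates[k], k))
    return t
```

For example, with mini-task counts [4, 4, 8] and equal unit rates, the straggler alone would need 8 time units, while recycling finishes all 16 mini-tasks in 6.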

4. Proposed Multi-Group Information Recycling

4.1. Multi-Group Information Recycling Scheme

The performance of conventional information recycling-assisted distributed edge computing highly depends on the single common stream [16]. Specifically, when more resources are allocated to the common stream, the common parts $\{W_1^c, W_2^c, \ldots, W_K^c\}$ occupy a larger portion of the task, thus facilitating the information recycling mechanism for straggler mitigation. Nonetheless, enforcing a too-strong common stream is not necessarily beneficial for communication, as the rate of the common stream $s^c$ is restricted by the EN with the worst channel quality. This may offset the performance gain brought by the information recycling mechanism. For this reason, a new information recycling scheme that can overcome the communication limitation of the single common stream is required. To this end, a multi-group information recycling scheme is proposed for rate-splitting-based distributed edge computing below.
Assume that the K ENs are divided into G mutually exclusive groups, where each group g contains $K_g$ ENs and $\sum_{g \in \mathcal{G}} K_g = K$. For the communication phase, as shown in Figure 3a, the common parts $\{W_k^c \,|\, k \in \mathcal{K}_g\}$ of the ENs in each group g are combined into $W_{\mathcal{K}_g}^c$ and encoded into the common stream $s_{\mathcal{K}_g}^c$ in the wireless layer. Meanwhile, each private part $W_k^p$ is still encoded into $s_k^p$. By employing superposition coding, the transmitted signal $\mathbf{x}$ from the mobile user is given by
$$\mathbf{x} = \sum_{g \in \mathcal{G}} \mathbf{w}_{\mathcal{K}_g}^c s_{\mathcal{K}_g}^c + \sum_{k \in \mathcal{K}} \mathbf{w}_k^p s_k^p, \qquad (10)$$
where $\mathbf{w}_{\mathcal{K}_g}^c \in \mathbb{C}^{N_t \times 1}$ is the precoding vector for the common stream $s_{\mathcal{K}_g}^c$. By plugging (10) into (2), the received signal $y_k$ at EN k is given by
$$y_k = \sum_{g \in \mathcal{G}} \mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_g}^c s_{\mathcal{K}_g}^c + \sum_{j \in \mathcal{K}} \mathbf{h}_k^H \mathbf{w}_j^p s_j^p + n_k. \qquad (11)$$
The decoding procedure at EN k in group g (i.e., $k \in \mathcal{K}_g$) is performed as follows: EN k first decodes its common stream $s_{\mathcal{K}_g}^c$ by treating the other common streams and all private streams as interference, and removes it from the received signal using SIC. Thereafter, EN k decodes its private stream $s_k^p$ by treating the other common streams and the other private streams as interference. Accordingly, the SINRs of the common stream $s_{\mathcal{K}_g}^c$ and the private stream $s_k^p$ at EN k in group g are given by
$$\gamma_{\mathcal{K}_g,k}^c = \frac{|\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_g}^c|^2}{\sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}, \quad g \in \mathcal{G}, \; k \in \mathcal{K}_g, \qquad (12)$$
$$\gamma_k^p = \frac{|\mathbf{h}_k^H \mathbf{w}_k^p|^2}{\sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}, j \neq k} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}, \quad k \in \mathcal{K}. \qquad (13)$$
With the above SINRs, the achievable rate $R_k^p$ of the private stream $s_k^p$ satisfies
$$R_k^p \leq B \log_2(1 + \gamma_k^p), \quad k \in \mathcal{K}. \qquad (14)$$
To ensure that each common stream $s_{\mathcal{K}_g}^c$ can be successfully decoded by the ENs in group g, the achievable rate $R_{\mathcal{K}_g}^c$ of the common stream $s_{\mathcal{K}_g}^c$ should satisfy
$$R_{\mathcal{K}_g}^c \leq \min_{k \in \mathcal{K}_g} B \log_2(1 + \gamma_{\mathcal{K}_g,k}^c), \quad g \in \mathcal{G}. \qquad (15)$$
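Analogously to the single-stream case, the per-group common-stream bounds and the private-stream bounds can be evaluated for a given grouping. The sketch below assumes given precoders and channels (all names are illustrative, not from the paper):

```python
import numpy as np

def multigroup_rates(H, Wc, Wp, groups, sigma2, B):
    """Rate bounds for G group common streams and K private streams.

    H      : (K, Nt) rows are h_k^H
    Wc     : (Nt, G) column g is the precoder of common stream s_{K_g}^c
    Wp     : (Nt, K) private precoders
    groups : list of lists; groups[g] holds the EN indices in K_g (a partition)
    """
    Pc = np.abs(H @ Wc) ** 2       # (K, G): |h_k^H w_{K_g}^c|^2
    Pp = np.abs(H @ Wp) ** 2       # (K, K): |h_k^H w_j^p|^2
    priv_int = Pp.sum(axis=1)      # all private streams seen at each EN
    Rc = np.empty(len(groups))
    gamma_p = np.empty(H.shape[0])
    for g, members in enumerate(groups):
        other_c = Pc.sum(axis=1) - Pc[:, g]   # other groups' common streams
        # common-stream SINR within group g, limited by its worst member
        gamma_c = Pc[members, g] / (other_c[members] + priv_int[members] + sigma2)
        Rc[g] = B * np.min(np.log2(1 + gamma_c))
        for k in members:
            # private-stream SINR after removing the own group's common stream
            gamma_p[k] = Pp[k, k] / (other_c[k] + priv_int[k] - Pp[k, k] + sigma2)
    Rp = B * np.log2(1 + gamma_p)
    return Rc, Rp
```

With `G = 1` this reduces to the single-common-stream bounds of the conventional scheme.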
For the computation phase, as shown in Figure 3b, the ENs in each group g will first collaboratively compute the common parts $\{W_k^c \,|\, k \in \mathcal{K}_g\}$ with information recycling. Next, each EN k will compute its private part $W_k^p$ independently.
Remark 1. 
In the proposed scheme, each common stream $s_{\mathcal{K}_g}^c$ is decoded by a smaller number of ENs instead of all ENs. As a result, the communication bottleneck of the single common stream can be alleviated, which allows more resources to be allocated to form stronger common streams. Consequently, the size of the common parts $\{W_1^c, W_2^c, \ldots, W_K^c\}$ can be increased, thereby boosting the advantage of information recycling.

4.2. Problem Formulation

As presented before, the original task $W$ with size D is finally converted into G combined common parts $\{W_{\mathcal{K}_1}^c, W_{\mathcal{K}_2}^c, \ldots, W_{\mathcal{K}_G}^c\}$ and K private parts $\{W_1^p, W_2^p, \ldots, W_K^p\}$ for transmission. The uplink transmission time $T_{u,\mathcal{K}_g}^c$ of the combined common part $W_{\mathcal{K}_g}^c$ is given by
$$T_{u,\mathcal{K}_g}^c = \frac{D_{\mathcal{K}_g}^c}{R_{\mathcal{K}_g}^c}, \qquad (16)$$
where $D_{\mathcal{K}_g}^c$ is the size of $W_{\mathcal{K}_g}^c$. The uplink transmission time $T_{u,k}^p$ of the private part $W_k^p$ of EN k is given by
$$T_{u,k}^p = \frac{D_k^p}{R_k^p}, \qquad (17)$$
where $D_k^p$ is the size of $W_k^p$. As the common streams and the private streams are transmitted concurrently, the overall uplink transmission time $T_u$ should satisfy
$$T_u \geq \max\left\{ \max_{g \in \mathcal{G}} T_{u,\mathcal{K}_g}^c, \; \max_{k \in \mathcal{K}} T_{u,k}^p \right\} = \max\left\{ \max_{g \in \mathcal{G}} \frac{D_{\mathcal{K}_g}^c}{R_{\mathcal{K}_g}^c}, \; \max_{k \in \mathcal{K}} \frac{D_k^p}{R_k^p} \right\}. \qquad (18)$$
In the computation phase, the ENs in each group g first collaboratively compute the combined common part $W_{\mathcal{K}_g}^c = \{W_k^c \,|\, k \in \mathcal{K}_g\}$. By (9), the corresponding computation latency $T_{e,\mathcal{K}_g}^c$ in group g is given by
$$T_{e,\mathcal{K}_g}^c = \frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k} + \eta, \qquad (19)$$
where $\eta$ is the cost of the signaling transmission (cf. Section 3.2). Next, the ENs in group g start to compute their own private parts, and the computation latency $T_{e,k}^p$ of the private part $W_k^p$ is given by
$$T_{e,k}^p = \frac{D_k^p}{C_k}. \qquad (20)$$
Accordingly, the computation latency $T_{e,\mathcal{K}_g}^p$ for the private parts in group g is given by
$$T_{e,\mathcal{K}_g}^p = \max_{k \in \mathcal{K}_g} \frac{D_k^p}{C_k}. \qquad (21)$$
Hence, the overall latency $T_e$ for the total computation phase is given by
$$T_e = \max_{g \in \mathcal{G}} \left\{ T_{e,\mathcal{K}_g}^c + T_{e,\mathcal{K}_g}^p \right\} = \max_{g \in \mathcal{G}} \left\{ \frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k} + \eta + \max_{k \in \mathcal{K}_g} \frac{D_k^p}{C_k} \right\}. \qquad (22)$$
Based on the above, the expected latency minimization problem can be formulated as follows
$$\mathcal{P}_1: \; \min_{\mathbf{N}, \mathbf{w}, \mathbf{D}, \mathbf{R}} \; T_u + \mathbb{E}[T_e] \quad \text{s.t.} \; (14), (15), (18), (22),$$
$$\sum_{g \in \mathcal{G}} D_{\mathcal{K}_g}^c + \sum_{k \in \mathcal{K}} D_k^p = D, \qquad (23)$$
$$\mathrm{tr}(\mathbf{w} \mathbf{w}^H) \leq P_t, \qquad (24)$$
$$\bigcup_{g \in \mathcal{G}} \mathcal{K}_g = \mathcal{K}, \qquad (25)$$
$$\mathcal{K}_i \cap \mathcal{K}_j = \emptyset, \quad \forall i \neq j \in \mathcal{G}, \qquad (26)$$
where $\mathbf{N} = \{\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_G\}$, $\mathbf{w} = \{\mathbf{w}_{\mathcal{K}_1}^c, \ldots, \mathbf{w}_{\mathcal{K}_G}^c, \mathbf{w}_1^p, \ldots, \mathbf{w}_K^p\}$, $\mathbf{D} = \{D_{\mathcal{K}_1}^c, \ldots, D_{\mathcal{K}_G}^c, D_1^p, \ldots, D_K^p\}$, and $\mathbf{R} = \{R_{\mathcal{K}_1}^c, \ldots, R_{\mathcal{K}_G}^c, R_1^p, \ldots, R_K^p\}$ are compact representations of the corresponding variables. The expectation in $\mathcal{P}_1$ is over the randomness of the computation rates $\{C_k\}_{k=1}^K$ of the ENs (cf. (8)). Constraint (24) restricts the total transmit power of the mobile user to $P_t$.

5. Optimization Algorithm

Since the original problem $\mathcal{P}_1$ is a non-convex combinatorial problem involving random variables, it is quite challenging to solve directly. To this end, the optimization of problem $\mathcal{P}_1$ is suboptimally decomposed into two steps: deriving an effective grouping strategy, and then optimizing the remaining variables under the given grouping strategy.

5.1. EN Grouping

The optimal grouping strategy can be identified by exhaustively searching through all potential grouping strategies $\mathbf{N}$. However, this brute-force approach is computationally infeasible due to its prohibitively high complexity, which scales approximately as $O(G^K)$. To this end, alternative methods can be employed to derive a sub-optimal grouping strategy with significantly lower complexity. Intuitively, if EN i and EN j exhibit sufficiently different channel conditions, clustering them together and transmitting a shared common stream will likely harm the communication rate due to the mismatch in their channel characteristics. Hence, it is essential to design an algorithm that clusters ENs based on channel similarity to obtain a good grouping strategy. With this consideration, a K-medoids [33] based grouping algorithm is designed by jointly considering channel alignment and channel strength for the general multi-antenna case. (It is worth noting that other heuristic clustering algorithms, such as K-means [34], can also be used for EN grouping in the proposed scheme.) Besides, a more efficient continuous partitioning-based grouping algorithm is proposed for the special single-antenna case.
Remark 2. 
As corroborated by the simulation results in Section 6, both the K-medoids-based grouping algorithm and the continuous partitioning-based grouping algorithm lead to a reasonably good grouping strategy.
For the K-medoids-based grouping algorithm, both channel strength and channel alignment are jointly considered to assess the similarity between the channel characteristics of different ENs. Accordingly, the distance between EN i and EN j is defined as
$$d(i,j) = l \cdot \frac{\left| \|\mathbf{h}_i\|^2 - \|\mathbf{h}_j\|^2 \right|}{\max\{\|\mathbf{h}_i\|^2, \|\mathbf{h}_j\|^2\}} + (1 - l) \left( 1 - \frac{|\mathbf{h}_i^H \mathbf{h}_j|^2}{\|\mathbf{h}_i\|^2 \|\mathbf{h}_j\|^2} \right), \qquad (27)$$
where $l \in [0, 1]$ is a weighting parameter; the first and second terms account for the channel similarity in magnitude and direction, respectively.
Remark 3. 
The intuition behind the distance metric is the following: as shown in (12) and (15), decreasing the variation in channel strengths within each group benefits the transmission of the common streams. Similarly, reducing differences in channel alignment yields comparable benefits.
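The distance metric can be implemented directly; a minimal sketch (the parameter name `l` follows the weighting parameter in the text, and the default value is an arbitrary illustration):

```python
import numpy as np

def en_distance(h_i, h_j, l=0.5):
    """Channel-similarity distance: weighted strength and alignment mismatch."""
    ni2 = np.linalg.norm(h_i) ** 2
    nj2 = np.linalg.norm(h_j) ** 2
    strength = abs(ni2 - nj2) / max(ni2, nj2)                  # magnitude term
    alignment = 1 - abs(np.vdot(h_i, h_j)) ** 2 / (ni2 * nj2)  # direction term
    return l * strength + (1 - l) * alignment
```

The distance is zero for identical channels, and each term lies in [0, 1], so `d(i, j)` itself lies in [0, 1] for any weighting.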
With this distance metric, the standard K-medoids clustering algorithm can be applied, yielding a grouping strategy with complexity $O(K^2 G)$ [33]. The K-medoids-based grouping algorithm has significantly lower complexity than the naive brute-force search while achieving reasonable performance for problem $\mathcal{P}_1$. The specific process is summarized in Algorithm 2.
Algorithm 2 K-medoids-based grouping algorithm
Input: Channel state information $\{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_K\}$, number of groups $G$, distance metric $d(i, j)$;
Output: Groups $\mathcal{K}_1, \mathcal{K}_2, \ldots, \mathcal{K}_G$;
1: Randomly select $G$ ENs $\mathcal{M} = \{M_1, M_2, \ldots, M_G\}$ as the initial medoids and set $\mathcal{M}' = \mathcal{M}$;
2: repeat
3:   Set $\mathcal{M}' = \mathcal{M}$ and $\mathcal{K}_g = \emptyset$, $\forall g \in \mathcal{G}$;
4:   for $i = 1, \ldots, K$ do
5:     Compute $j^\star = \arg\min_{j \in \mathcal{G}} d(i, M_j)$ and set $\mathcal{K}_{j^\star} = \mathcal{K}_{j^\star} \cup \{i\}$;
6:   end for
7:   for $g = 1, \ldots, G$ do
8:     Select a new medoid $M_g$ that minimizes the total intra-group distance: $M_g = \arg\min_{j \in \mathcal{K}_g} \sum_{i \in \mathcal{K}_g} d(i, j)$;
9:   end for
10: until $\mathcal{M} = \mathcal{M}'$
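A minimal Python sketch of Algorithm 2 (the function name, the tie-breaking rule, and the empty-group guard are our implementation choices):

```python
import numpy as np

def k_medoids_grouping(H, G, l=0.5, max_iter=100, seed=0):
    """Cluster K ENs (channel vectors in H) into G groups via K-medoids
    on the channel-similarity distance defined above."""
    K = len(H)
    norms = [float(np.linalg.norm(h)) ** 2 for h in H]

    def dist(i, j):
        strength = abs(norms[i] - norms[j]) / max(norms[i], norms[j])
        align = 1.0 - abs(np.vdot(H[i], H[j])) ** 2 / (norms[i] * norms[j])
        return l * strength + (1.0 - l) * align

    D = np.array([[dist(i, j) for j in range(K)] for i in range(K)])
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(K, size=G, replace=False))
    for _ in range(max_iter):
        # assignment step: each EN joins the group of its nearest medoid
        groups = [[] for _ in range(G)]
        for i in range(K):
            groups[int(np.argmin(D[i, medoids]))].append(i)
        # update step: the new medoid minimizes the total intra-group distance
        new_medoids = [min(g, key=lambda j: D[g, j].sum()) if g else medoids[gi]
                       for gi, g in enumerate(groups)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return groups
```

With two well-separated channel clusters, the sketch recovers them regardless of the random initialization.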
In the single-antenna case, besides the K-medoids-based grouping algorithm, a more efficient continuous partitioning-based grouping algorithm is further proposed. Specifically, in this case, the channel alignment term in the distance metric becomes irrelevant, so only the channel strength matters. Without loss of generality, assume that the ENs are indexed in order of channel strength, i.e., $\|h_1\| \geq \|h_2\| \geq \cdots \geq \|h_K\|$. The continuous partitioning-based grouping algorithm defines each group $g$ as $\mathcal{K}_g = \{c_{g-1}+1, c_{g-1}+2, \ldots, c_g\}$, where $c_0 = 0 < c_1 < \cdots < c_{G-1} < c_G = K$. By selecting different values for $c_1, c_2, \ldots, c_{G-1}$ from the set $\{1, 2, \ldots, K-1\}$, this algorithm generates $\binom{K-1}{G-1}$ candidate strategies, denoted as $\mathcal{C}$. By searching through all grouping strategies in $\mathcal{C}$, the continuous partitioning-based grouping algorithm also achieves good performance with much lower complexity than the brute-force search approach. In particular, this algorithm achieves the same performance as the naive brute-force search in a special case of problem $\mathcal{P}_1$, where $D_k^p = D_1^p$, $\forall k \in \mathcal{K}$. The latency optimization problem of this special case can be formulated as follows
$$ \mathcal{P}_2: \min_{\mathcal{N}, \mathbf{w}, \mathbf{D}, \mathbf{R}} \; T^u + \mathbb{E}[T^e] \quad \text{s.t.}\; (14),\ (15),\ (18),\ (22),\ (23),\ (24),\ (25),\ (26), $$
$$ D_k^p = D_1^p, \; \forall k \in \mathcal{K}. $$
Proposition 1. 
For problem P 2 , the optimal grouping strategy belongs to C .
Proof. 
Please see Appendix A.    □
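The enumeration of the candidate set $\mathcal{C}$ described above can be sketched as follows (the generator name is illustrative; ENs are 1-indexed as in the text, and each choice of cut points $c_1 < \cdots < c_{G-1}$ yields one contiguous partition):

```python
from itertools import combinations

def contiguous_partitions(K: int, G: int):
    """Yield the C(K-1, G-1) candidate grouping strategies in the set C:
    ENs 1..K are pre-sorted by channel strength, and each group g is the
    contiguous index range {c_{g-1}+1, ..., c_g}."""
    for cuts in combinations(range(1, K), G - 1):
        bounds = (0,) + cuts + (K,)
        yield [list(range(bounds[g] + 1, bounds[g + 1] + 1))
               for g in range(G)]
```

For example, with K = 4 and G = 2 the generator yields the three partitions {1 | 2,3,4}, {1,2 | 3,4}, and {1,2,3 | 4}.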

5.2. Optimization Algorithm with Given Grouping

In this subsection, the optimization algorithm with a given grouping strategy N will be introduced. Specifically, consider the following problem
$$ \mathcal{P}_3: \min_{\mathbf{w}, \mathbf{D}, \mathbf{R}} \; T^u + \mathbb{E}[T^e] \quad \text{s.t.}\; (14),\ (15),\ (18),\ (22),\ (23),\ (24). $$
In the objective function of $\mathcal{P}_3$, due to the randomness of the computation capabilities $C_k$, it is difficult to express $\mathbb{E}[T^e]$ in closed form. To this end, by (22), a lower bound of the objective function can be obtained as follows:
$$ T^u + \mathbb{E}[T^e] \geq T^u + \max_{g \in \mathcal{G}} \mathbb{E}\!\left[\frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k} + \eta + \max_{k \in \mathcal{K}_g} \frac{D_k^p}{C_k}\right] \geq T^u + \max_{g \in \mathcal{G}} \left\{\mathbb{E}\!\left[\frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k}\right] + \eta + \max_{k \in \mathcal{K}_g} \mathbb{E}\!\left[\frac{D_k^p}{C_k}\right]\right\}. $$
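The bound above relies on moving the expectation inside the maximum, which only decreases the value since $\mathbb{E}[\max_k Y_k] \geq \max_k \mathbb{E}[Y_k]$. A quick Monte-Carlo illustration of this gap (the parameter values are illustrative, loosely following the Bernoulli computation-rate model of Section 6):

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 4, 100_000
C_bar = 2.5e9 / 400            # nominal computation rate f_e / beta (bits/s), illustrative
S = (19 / 20) * C_bar          # maximum disturbance
delta = S * (rng.random((n, K)) < 0.3)   # Bernoulli(0.3) straggling events
C = C_bar - delta              # realized computation rates per EN
lat = 1e5 / C                  # per-EN latencies for 1e5 assigned bits

e_max = lat.max(axis=1).mean() # Monte-Carlo estimate of E[max_k D/C_k]
max_e = lat.mean(axis=0).max() # max_k E[D/C_k]
assert e_max >= max_e          # expectation-of-max dominates max-of-expectations
```

Under these illustrative values the gap is substantial (roughly a factor of two), which is why the bound is only used as a tractable surrogate objective.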
Accordingly, the following optimization problem will be considered in the sequel
$$ \mathcal{P}_4: \min_{\mathbf{w}, \mathbf{D}, \mathbf{R}} \; T^u + \max_{g \in \mathcal{G}} \left\{\mathbb{E}\!\left[\frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k}\right] + \eta + \max_{k \in \mathcal{K}_g} \mathbb{E}\!\left[\frac{D_k^p}{C_k}\right]\right\} \quad \text{s.t.}\; (14),\ (15),\ (18),\ (23),\ (24). $$
Remark 4. 
As corroborated by the simulation results, the solution obtained by optimizing this lower bound achieves a quite good expected processing latency for $\mathcal{P}_1$.
By introducing the auxiliary variables $\mathbf{t} = (t_e, t_{e,\mathcal{K}_1}, \ldots, t_{e,\mathcal{K}_G})$, problem $\mathcal{P}_4$ can be readily transformed into the following equivalent problem:
$$ \mathcal{P}_5: \min_{\mathbf{w}, \mathbf{D}, \mathbf{R}, \mathbf{t}} \; T^u + t_e \quad \text{s.t.}\; (14),\ (15),\ (18),\ (23),\ (24), $$
$$ t_e \geq t_{e,\mathcal{K}_g}, \; \forall g \in \mathcal{G}, \quad (32) $$
$$ t_{e,\mathcal{K}_g} \geq \mathbb{E}\!\left[\frac{D_{\mathcal{K}_g}^c}{\sum_{k \in \mathcal{K}_g} C_k}\right] + \eta + \mathbb{E}\!\left[\frac{D_k^p}{C_k}\right], \; \forall g \in \mathcal{G}, k \in \mathcal{K}_g. \quad (33) $$
In the sequel, the non-convex problem $\mathcal{P}_5$ will first be transformed into a DC programming problem. In particular, by introducing the auxiliary variables $\boldsymbol{\nu} = (\nu_{\mathcal{K}_g,k}^c, \nu_k^p \,|\, g \in \mathcal{G}, k \in \mathcal{K}_g)$ to $\mathcal{P}_5$, constraints (14) and (15) can be equivalently written as
$$ R_{\mathcal{K}_g}^c \leq B \log_2 \nu_{\mathcal{K}_g,k}^c, \; \forall g \in \mathcal{G}, k \in \mathcal{K}_g, \quad (34) $$
$$ R_k^p \leq B \log_2 \nu_k^p, \; \forall k \in \mathcal{K}, \quad (35) $$
$$ \nu_{\mathcal{K}_g,k}^c \leq 1 + \gamma_{\mathcal{K}_g,k}^c, \; \forall g \in \mathcal{G}, k \in \mathcal{K}_g, \quad (36) $$
$$ \nu_k^p \leq 1 + \gamma_k^p, \; \forall k \in \mathcal{K}. \quad (37) $$
By invoking (12) and (13), (36) and (37) can be further transformed into the following DC forms:
$$ \sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2 - \frac{\sum_{g' \in \mathcal{G}} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}{\nu_{\mathcal{K}_g,k}^c} \leq 0, \; \forall g \in \mathcal{G}, k \in \mathcal{K}_g, \quad (38) $$
$$ \sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}, j \neq k} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2 - \frac{\sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2}{\nu_k^p} \leq 0, \; \forall g \in \mathcal{G}, k \in \mathcal{K}_g, \quad (39) $$
where the subtracted term in each constraint is jointly convex (quadratic-over-linear) in $(\mathbf{w}, \boldsymbol{\nu})$, so each constraint is indeed a difference of two convex functions.
Besides, constraint (18) can be rewritten in the following DC form:
$$ D_{\mathcal{K}_g}^c + \Omega(T^u, R_{\mathcal{K}_g}^c) \leq 0, \; \forall g \in \mathcal{G}, \quad (40) $$
$$ D_k^p + \Omega(T^u, R_k^p) \leq 0, \; \forall k \in \mathcal{K}, \quad (41) $$
where $\Omega(a, b) \triangleq \frac{1}{4}(a - b)^2 - \frac{1}{4}(a + b)^2$. Note that $\Omega(a, b) = -ab$, so (40) and (41) simply require each data part to be deliverable within the upload time, i.e., $D \leq T^u R$. Besides, it is not difficult to verify that constraint (32) is linear in $t_e$ and $t_{e,\mathcal{K}_g}$, and constraint (33) is linear in $t_{e,\mathcal{K}_g}$, $D_{\mathcal{K}_g}^c$ and $D_k^p$. Consequently, $\mathcal{P}_5$ can be equivalently transformed into the following DC programming problem
$$ \mathcal{P}_6: \min_{\mathbf{w}, \mathbf{D}, \mathbf{R}, \mathbf{t}, \boldsymbol{\nu}} \; T^u + t_e \quad \text{s.t.}\; (23),\ (24),\ (32),\ (33),\ (34),\ (35),\ (38),\ (39),\ (40),\ (41). $$
The above DC programming problem $\mathcal{P}_6$ can be efficiently solved by the CCCP technique [35], which is briefly introduced in the following. Consider DC programming problems of the form
$$ \min_{\mathbf{x}} \; f_0(\mathbf{x}) - g_0(\mathbf{x}) \quad \text{s.t.}\; f_a(\mathbf{x}) - g_a(\mathbf{x}) \leq 0, \; a = 1, \ldots, m, $$
where $\mathbf{x}$ is the optimization variable and $f_a$ and $g_a$ are convex for $a = 0, 1, \ldots, m$. The fundamental idea of CCCP is to iteratively solve a sequence of convex approximations. At the $(i+1)$-th iteration, the approximated problem is given by
$$ \min_{\mathbf{x}} \; f_0(\mathbf{x}) - \hat{g}_0(\mathbf{x}; \mathbf{x}^i) \quad \text{s.t.}\; f_a(\mathbf{x}) - \hat{g}_a(\mathbf{x}; \mathbf{x}^i) \leq 0, \; a = 1, \ldots, m, $$
where $\hat{g}_a(\mathbf{x}; \mathbf{x}^i) = g_a(\mathbf{x}^i) + \nabla g_a(\mathbf{x}^i)^T (\mathbf{x} - \mathbf{x}^i)$ is the first-order lower bound of $g_a$ at $\mathbf{x}^i$, and $\mathbf{x}^i$ is the optimal solution of the approximated problem at the $i$-th iteration. As the number of iterations increases, $\mathbf{x}^i$ converges to a Karush–Kuhn–Tucker (KKT) point of the original problem [35].
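As a self-contained toy illustration of the CCCP iteration (not the paper's problem $\mathcal{P}_6$; the function name is ours), consider minimizing the DC function $x^4 - x^2$, whose convexified subproblem at each iterate has a closed-form solution:

```python
def cccp_double_well(x0=1.5, tol=1e-10, max_iter=200):
    """CCCP on min_x f0(x) - g0(x) with f0(x) = x**4 and g0(x) = x**2
    (both convex). Replacing g0 by its tangent at x_i gives the convex
    subproblem min_x x**4 - 2*x_i*x, whose stationarity condition
    4*x**3 = 2*x_i yields x = (x_i / 2)**(1/3)."""
    x = x0
    for _ in range(max_iter):
        x_new = (x / 2) ** (1 / 3) if x >= 0 else -((-x / 2) ** (1 / 3))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Starting from either side, the iterates converge to the nearest stationary point $\pm 1/\sqrt{2}$ of $x^4 - x^2$, illustrating the KKT-point (rather than global-optimum) guarantee of CCCP.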
For the considered problem $\mathcal{P}_6$, at the $(i+1)$-th iteration of the CCCP procedure, the concave parts in both the objective function and the constraints are linearized at the point $(\mathbf{w}^i, \mathbf{D}^i, \mathbf{R}^i, \mathbf{t}^i, \boldsymbol{\nu}^i)$, where $\mathbf{w}^i = (\mathbf{w}_{\mathcal{K}_1}^{c,i}, \ldots, \mathbf{w}_{\mathcal{K}_G}^{c,i}, \mathbf{w}_1^{p,i}, \ldots, \mathbf{w}_K^{p,i})$, $\mathbf{D}^i = (D_{\mathcal{K}_1}^{c,i}, \ldots, D_{\mathcal{K}_G}^{c,i}, D_1^{p,i}, \ldots, D_K^{p,i})$, $\mathbf{R}^i = (R_{\mathcal{K}_1}^{c,i}, \ldots, R_{\mathcal{K}_G}^{c,i}, R_1^{p,i}, \ldots, R_K^{p,i})$, $\mathbf{t}^i = (t_e^i, t_{e,\mathcal{K}_1}^i, \ldots, t_{e,\mathcal{K}_G}^i)$ and $\boldsymbol{\nu}^i = (\nu_{\mathcal{K}_g,k}^{c,i}, \nu_k^{p,i} \,|\, k \in \mathcal{K}, g \in \mathcal{G})$ are compact representations of the corresponding optimization variables obtained in the $i$-th iteration. Specifically, as the objective function (42) and the constraints (24), (32), (33), (34) and (35) are convex, they do not require linearization. For the constraints (38) and (39), by linearizing their concave parts, their approximations for the $(i+1)$-th iteration are, respectively, given by
$$ \sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2 + \left(\sum_{g' \in \mathcal{G}} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^{c,i}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^{p,i}|^2 + \sigma^2\right) \frac{\nu_{\mathcal{K}_g,k}^c}{(\nu_{\mathcal{K}_g,k}^{c,i})^2} - \frac{2}{\nu_{\mathcal{K}_g,k}^{c,i}} \mathrm{Re}\!\left\{\sum_{g' \in \mathcal{G}} (\mathbf{w}_{\mathcal{K}_{g'}}^{c,i})^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c + \sum_{j \in \mathcal{K}} (\mathbf{w}_j^{p,i})^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w}_j^p + \sigma^2\right\} \leq 0, $$
$$ \forall g \in \mathcal{G}, k \in \mathcal{K}_g, \quad (45) $$
$$ \sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c|^2 + \sum_{j \in \mathcal{K}, j \neq k} |\mathbf{h}_k^H \mathbf{w}_j^p|^2 + \sigma^2 + \left(\sum_{g' \in \mathcal{G}, g' \neq g} |\mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^{c,i}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{w}_j^{p,i}|^2 + \sigma^2\right) \frac{\nu_k^p}{(\nu_k^{p,i})^2} - \frac{2}{\nu_k^{p,i}} \mathrm{Re}\!\left\{\sum_{g' \in \mathcal{G}, g' \neq g} (\mathbf{w}_{\mathcal{K}_{g'}}^{c,i})^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w}_{\mathcal{K}_{g'}}^c + \sum_{j \in \mathcal{K}} (\mathbf{w}_j^{p,i})^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w}_j^p + \sigma^2\right\} \leq 0, $$
$$ \forall g \in \mathcal{G}, k \in \mathcal{K}_g. \quad (46) $$
Similarly, for the constraints (40) and (41), their approximations for the $(i+1)$-th iteration are, respectively, given by
$$ D_{\mathcal{K}_g}^c + \Phi(T^u, R_{\mathcal{K}_g}^c; T^{u,i}, R_{\mathcal{K}_g}^{c,i}) \leq 0, \; \forall g \in \mathcal{G}, \quad (47) $$
$$ D_k^p + \Phi(T^u, R_k^p; T^{u,i}, R_k^{p,i}) \leq 0, \; \forall k \in \mathcal{K}, \quad (48) $$
where $\Phi(a, b; a^i, b^i) \triangleq \frac{1}{4}(a - b)^2 + \frac{1}{4}(a^i + b^i)^2 - \frac{1}{2}(a^i + b^i)(a + b)$ and $T^{u,i}$ can be computed by using (18) with $\mathbf{D}^i$ and $\mathbf{R}^i$.
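The surrogate $\Phi$ is the CCCP majorizer of $\Omega$ obtained by linearizing its concave term $-\frac{1}{4}(a+b)^2$ at $(a^i, b^i)$; indeed, $\Phi - \Omega = \frac{1}{4}\big((a+b) - (a^i+b^i)\big)^2 \geq 0$, with equality at the iterate. A quick numeric sanity check (the values are illustrative):

```python
import numpy as np

def Omega(a, b):
    # Omega(a, b) = (a - b)^2/4 - (a + b)^2/4, which simplifies to -a*b
    return 0.25 * (a - b) ** 2 - 0.25 * (a + b) ** 2

def Phi(a, b, ai, bi):
    # convex majorizer: the concave -(a+b)^2/4 term is linearized at (ai, bi)
    return 0.25 * (a - b) ** 2 + 0.25 * (ai + bi) ** 2 - 0.5 * (ai + bi) * (a + b)

rng = np.random.default_rng(0)
ai, bi = 1.3, 0.7
for a, b in rng.random((1000, 2)) * 4:
    assert Phi(a, b, ai, bi) >= Omega(a, b) - 1e-12      # global upper bound
assert abs(Phi(ai, bi, ai, bi) - Omega(ai, bi)) < 1e-12  # tight at the iterate
```

This is exactly the property CCCP needs: replacing $\Omega$ by $\Phi$ gives a convex restriction of (40) and (41) that is tight at the current iterate.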
Based on the above, given the optimal solution $(\mathbf{w}^i, \mathbf{D}^i, \mathbf{R}^i, \mathbf{t}^i, \boldsymbol{\nu}^i)$ at the $i$-th iteration, the approximate convex problem of $\mathcal{P}_6$ for the $(i+1)$-th iteration is given by
$$ \mathcal{P}_7: \min_{\mathbf{w}, \mathbf{D}, \mathbf{R}, \mathbf{t}, \boldsymbol{\nu}} \; T^u + t_e \quad \text{s.t.}\; (23),\ (24),\ (32),\ (33),\ (34),\ (35),\ (45),\ (46),\ (47),\ (48). $$
The convex problem $\mathcal{P}_7$ can be solved by standard methods [36]. Specifically, with the interior-point method, the computational complexity of solving problem $\mathcal{P}_7$ is approximately $O\big(((K + G) N_t)^{3.5}\big)$ [36]. The above procedure is repeated until convergence, as summarized in Algorithm 3. Additionally, since the problem size of $\mathcal{P}_7$ does not scale with the number of iterations, the computational complexity of Algorithm 3 is $O\big(((K + G) N_t)^{3.5} I\big)$, where $I$ is the total number of iterations required for convergence.
Algorithm 3 CCCP-based optimization for problem $\mathcal{P}_3$
Input: Tolerance $\epsilon$, power constraint $P_t$, distributions of the computation rates of all ENs $\{C_k\}_{k=1}^K$, task size $D$;
Output: Precoder $\mathbf{w}^*$, task partitioning vector $\mathbf{D}^*$, rate vector $\mathbf{R}^*$;
1: Set iteration index $i = 0$;
2: Find an initial feasible point $(\mathbf{w}^0, \mathbf{D}^0, \mathbf{R}^0, \mathbf{t}^0, \boldsymbol{\nu}^0)$ of $\mathcal{P}_5$;
3: repeat
4:   Using $(\mathbf{w}^i, \mathbf{D}^i, \mathbf{R}^i, \mathbf{t}^i, \boldsymbol{\nu}^i)$ obtained from the last iteration, solve problem $\mathcal{P}_7$ and obtain an optimal solution $(\mathbf{w}^{i+1}, \mathbf{D}^{i+1}, \mathbf{R}^{i+1}, \mathbf{t}^{i+1}, \boldsymbol{\nu}^{i+1})$;
5:   Compute the corresponding objective value $OE^{i+1}$;
6:   Update the iteration index: $i = i + 1$;
7: until $|OE^{i} - OE^{i-1}| \leq \epsilon$;
8: return $\mathbf{w}^* = \mathbf{w}^i$, $\mathbf{D}^* = \mathbf{D}^i$, $\mathbf{R}^* = \mathbf{R}^i$.

6. Simulation Results

In this section, simulation results are presented to corroborate the effectiveness of the proposed scheme. In the simulations, the number $K$ of ENs is set to 8. The size of the original computation task is set to $1 \times 10^6$ bits. For communication, the one-ring model [37] is adopted to characterize the channel between the mobile user and each EN, assuming that the mobile user's antennas form a uniform linear array [25,38,39]. Besides, the location angle of EN $k$ with respect to the mobile user's antennas is set to $2\pi k / K$, and the angular spread of the scattering associated with EN $k$ follows a uniform distribution on $[\pi/36, \pi/18]$. The average channel power gain $\rho$ is set to $-85$ dB. The bandwidth $B$ is set to 1 MHz; the noise power $\sigma^2$ is set to $1 \times 10^{-10}$ W; the maximum transmit power $P_t$ of the mobile user is set to 1 W. The number of antennas of the mobile user $N_t$ is set to 6. For computation, processing each bit of the computation task is assumed to take $\beta = 400$ CPU cycles. The maximum processor frequency $f_e$ is set to 2.5 GHz. The disturbed part $\Delta_k$ of the computation rate is assumed to follow a Bernoulli distribution [29,40,41] with parameter $p_k = 0.3$; that is, $P\{\Delta_k = S\} = p_k$ and $P\{\Delta_k = 0\} = 1 - p_k$. Here, $S$ is the maximum value of the disturbed part and is set to $\frac{19}{20}\bar{C}$. The cost $\eta$ for the signaling transmission of the collaboration is set to $1 \times 10^{-2}$ s. The weighting parameter $l$ in (27) is set to 0.5. To demonstrate the advantage of the proposed scheme, three baseline schemes are considered. The first baseline is the naive (uncoded) distributed edge computing scheme with neither information recycling nor coded computing. The second baseline is conventional coded computing with the optimal coding parameter. The third baseline is conventional information recycling for distributed edge computing (i.e., the proposed scheme with $G = 1$ and $\mathcal{K}_1 = \mathcal{K}$). Besides, both the K-medoids algorithm and the continuous partitioning algorithm are demonstrated in the single-antenna case for the proposed scheme.
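For reproducibility, the parameter set above can be collected in a small configuration object (a sketch; the class name and the relation $\bar{C} = f_e / \beta$ between the nominal computation rate, processor frequency, and cycles-per-bit are our assumptions, as the paper does not state them in code form):

```python
from dataclasses import dataclass

@dataclass
class SimConfig:
    """Illustrative simulation configuration mirroring Section 6."""
    K: int = 8              # number of ENs
    D: float = 1e6          # task size (bits)
    B: float = 1e6          # bandwidth (Hz)
    sigma2: float = 1e-10   # noise power (W)
    P_t: float = 1.0        # maximum transmit power (W)
    N_t: int = 6            # number of user antennas
    beta: int = 400         # CPU cycles per bit
    f_e: float = 2.5e9      # maximum processor frequency (Hz)
    p_k: float = 0.3        # Bernoulli straggling probability
    eta: float = 1e-2       # collaboration signaling cost (s)
    l: float = 0.5          # weighting parameter in the distance metric

    @property
    def C_bar(self) -> float:
        # nominal computation rate in bits/s (assumed relation C_bar = f_e / beta)
        return self.f_e / self.beta

    @property
    def S(self) -> float:
        # maximum disturbance, 19/20 of the nominal rate
        return (19 / 20) * self.C_bar
```

Under these values, the nominal computation rate works out to 6.25 Mbit/s per EN, and a straggling EN drops to one twentieth of that.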
The task processing latency of the proposed scheme and those of the three baselines under different $B$ with $N_t = 1$ and $N_t = 6$ are compared in Figure 4 and Figure 5, respectively. It can be seen that the performance gain brought by the proposed scheme is significant. For instance, when $N_t = 1$ and the bandwidth is 1 MHz, the proposed scheme with the continuous partitioning algorithm substantially reduces the processing latency to about 0.29 s, achieving about 59.8% and 51.0% performance gains over Baseline 2 and Baseline 3, respectively. Intuitively, the reason is that the proposed scheme does not introduce extra task coding redundancy as compared to Baseline 2 and alleviates the communication limitation by transmitting multiple common streams as compared to Baseline 3. Besides, it can be observed from Figure 4 that the proposed scheme with the continuous partitioning algorithm achieves better performance than that with the heuristic K-medoids algorithm. To better demonstrate the advantage of the proposed scheme over Baseline 3, the common streams' power ratio $\sum_{g=1}^G \|\mathbf{w}_{\mathcal{K}_g}^c\|^2 / P_t$ and task ratio $\sum_{g=1}^G D_{\mathcal{K}_g}^c / D$ of the proposed scheme and Baseline 3 are compared in Figure 6. It can be observed that the proposed scheme has a higher power ratio and task ratio than Baseline 3. For example, when the bandwidth is 0.5 MHz, the power ratio and task ratio of the proposed scheme are 0.91 and 0.75, respectively, while those of Baseline 3 are 0.62 and 0.23. The reason is that, by alleviating the rate limitation of the single common stream through the transmission of multiple common streams, the proposed scheme can allocate more power to the common streams, thereby increasing the task ratio of the common parts and promoting computation collaboration for processing latency reduction.
Besides, the task processing latency of the proposed scheme and those of the three baselines under different $\rho$ with $N_t = 1$ and $N_t = 6$ are compared in Figure 7 and Figure 8, respectively. It can be seen that the advantages of the proposed scheme remain robust. For instance, when the average power gain $\rho$ is $-90$ dB, the proposed scheme substantially reduces the processing latency to about 0.39 s and 0.30 s for $N_t = 1$ and $N_t = 6$, respectively, which further demonstrates the effectiveness of the proposed scheme.
Moreover, the communication phase latency and the computation phase latency under different f e with N t = 6 are compared in Figure 9 and Figure 10, respectively. It can be seen that the proposed scheme achieves both lower communication latency and computation latency as compared to Baseline 3, further demonstrating its superiority in overcoming the communication limitation of the single common stream to facilitate communication and promote information recycling for enhanced computation collaboration. Meanwhile, it can be seen from Figure 9 that, the proposed scheme has much lower communication latency than Baseline 2. The reason is that, due to the introduction of task redundancy for straggler mitigation, Baseline 2 (i.e., coded computing) incurs additional communication overhead, resulting in relatively high communication latency.
Furthermore, the task processing latency of the proposed scheme and those of the three baselines with K = 12 under different B and ρ are compared in Figure 11 and Figure 12, respectively.
It can be observed from Figure 11 and Figure 12 that the performance gain brought by the proposed scheme is still significant. For example, when the bandwidth is 0.5 MHz, the proposed scheme substantially reduces the processing latency to about 0.33 s and achieves about 28.5% and 31.3% performance gains as compared to Baseline 2 and Baseline 3, respectively.

7. Conclusions and Further Works

In this work, a novel multi-group information recycling scheme is proposed for rate-splitting-based distributed edge computing, which better utilizes the information recycling mechanism for processing latency reduction. In particular, by clustering ENs into different groups and transmitting multiple common streams, the proposed scheme overcomes the communication limitation of the single common stream and allocates more resources to the common streams, fostering information recycling for straggler mitigation. To achieve an effective EN grouping strategy, a K-medoids-based algorithm is developed by jointly considering the channel alignment and channel strength for the general multi-antenna case, and a more efficient continuous partitioning-based algorithm is proposed for the special case of a single antenna. Besides, by bounding the corresponding latency optimization problem and reformulating it as a DC program, a CCCP-based algorithm is developed to find a good solution. Simulation results show that the proposed scheme substantially outperforms the baselines and greatly enhances the system's performance.
Several directions are worthwhile for future work: optimization of the number of groups; exploration of more advanced grouping strategies; practical implementation and compatibility with existing infrastructures; robustness to imperfect channel state information; security and vulnerability issues related to the transmission of common streams; energy consumption optimization and congestion in distributed networks; and the integration of information recycling into other non-orthogonal transmission techniques for distributed edge computing.

Author Contributions

Conceptualization, W.L. and X.H.; methodology, W.L. and X.H.; software, W.L.; validation, W.L.; formal analysis, W.L.; writing—original draft preparation, W.L. and X.H.; writing—review and editing, W.L. and X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Proposition 1

Proof. 
Consider a grouping strategy $\mathcal{N} = \{\mathcal{K}_g\}_{g=1}^G \notin \mathcal{C}$, which admits
$$ \exists\, a, b \in \mathcal{G}: \; [\inf \mathcal{K}_a, \sup \mathcal{K}_a] \cap [\inf \mathcal{K}_b, \sup \mathcal{K}_b] \neq \emptyset. $$
Without loss of generality, assume that $\inf \mathcal{K}_b < \inf \mathcal{K}_a$ and that the optimal point of $\mathcal{P}_2$ is $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$. By contradiction, it can be verified that $\mathcal{K}_a \cap [\inf \mathcal{K}_b, \sup \mathcal{K}_b] \neq \emptyset$. In the following, it will be proved that the performance of $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$ can be achieved by another point $(\mathcal{N}', \mathbf{w}', \mathbf{D}', \mathbf{R}')$ constructed as follows
$$ \mathcal{K}'_g = \mathcal{K}_g, \; \forall g \in \mathcal{G}, g \neq a, b, $$
$$ \mathcal{K}'_a = (\mathcal{K}_a \setminus \{o\}) \cup \{\sup \mathcal{K}_b\}, $$
$$ \mathcal{K}'_b = (\mathcal{K}_b \setminus \{\sup \mathcal{K}_b\}) \cup \{o\}, $$
$$ w'^{c}_{\mathcal{K}_g} = w^{c,\star}_{\mathcal{K}_g}, \; \forall g \in \mathcal{G}, $$
$$ w'^{p}_k = w^{p,\star}_k, \; \forall k \in \mathcal{K}, k \notin \{o, \sup \mathcal{K}_b\}, $$
$$ w'^{p}_o = w^{p,\star}_o \left(\frac{\sum_{g \in \mathcal{G}, g \neq b} |w^{c,\star}_{\mathcal{K}_g}|^2 + \sum_{j \in \mathcal{K}} |w^{p,\star}_j|^2 + \sigma^2/|h_o|^2}{\sum_{g \in \mathcal{G}, g \neq a} |w^{c,\star}_{\mathcal{K}_g}|^2 + \sum_{j \in \mathcal{K}} |w^{p,\star}_j|^2 + \sigma^2/|h_o|^2}\right)^{1/2}, $$
$$ w'^{p}_{\sup \mathcal{K}_b} = w^{p,\star}_{\sup \mathcal{K}_b} \left(\frac{\sum_{g \in \mathcal{G}, g \neq a} |w^{c,\star}_{\mathcal{K}_g}|^2 + \sum_{j \in \mathcal{K}} |w^{p,\star}_j|^2 + \sigma^2/|h_{\sup \mathcal{K}_b}|^2}{\sum_{g \in \mathcal{G}, g \neq b} |w^{c,\star}_{\mathcal{K}_g}|^2 + \sum_{j \in \mathcal{K}} |w^{p,\star}_j|^2 + \sigma^2/|h_{\sup \mathcal{K}_b}|^2}\right)^{1/2}, $$
$$ \mathbf{D}' = \mathbf{D}^\star, \quad \mathbf{R}' = \mathbf{R}^\star, $$
where $o \in \mathcal{K}_a \cap [\inf \mathcal{K}_b, \sup \mathcal{K}_b]$. The feasibility of the point $(\mathcal{N}', \mathbf{w}', \mathbf{D}', \mathbf{R}')$ is verified in the following. First, it is shown that the optimal point $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$ satisfies $\min_{k \in \mathcal{K}} R_k^{p,\star} \leq \min_{k \in \mathcal{K}} \log(1 + \gamma_k^{p,\star})$. Assume that $\min_{k \in \mathcal{K}} R_k^{p,\star} > \min_{k \in \mathcal{K}} \log(1 + \gamma_k^{p,\star})$ and let $j = \arg\min_{k \in \mathcal{K}} \log(1 + \gamma_k^{p,\star})$. As $R_j^{p,\star} \leq \log(1 + \gamma_j^{p,\star})$ and $\min_{k \in \mathcal{K}} R_k^{p,\star} \leq R_j^{p,\star}$, a contradiction arises, and it follows that $\min_{k \in \mathcal{K}} R_k^{p,\star} \leq \min_{k \in \mathcal{K}} \log(1 + \gamma_k^{p,\star})$. Next, it is shown that the optimal point $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$ admits
$$ \gamma_i^{p,\star} = \gamma_j^{p,\star}, \; \forall i, j \in \mathcal{K}. $$
Assume that there exists $k \in \mathcal{K}$ with $\gamma_k^{p,\star} > \min_{k' \in \mathcal{K}} \gamma_{k'}^{p,\star}$, where $\gamma_k^{p,\star}$ admits (13). Multiplying $w_k^{p,\star}$ by a factor $\alpha$, where $0 < \alpha < 1$, a new precoder $\tilde{\mathbf{w}} = (w_{\mathcal{K}_1}^{c,\star}, \ldots, w_{\mathcal{K}_G}^{c,\star}, w_1^{p,\star}, \ldots, \alpha w_k^{p,\star}, \ldots, w_K^{p,\star})$ can be constructed such that
$$ \tilde{\gamma}_k^{p} = \min_{k' \in \mathcal{K}} \gamma_{k'}^{p,\star}, \quad \tilde{\gamma}_{k'}^{p} > \gamma_{k'}^{p,\star}, \; \forall k' \in \mathcal{K} \setminus \{k\}, \quad \tilde{\gamma}_{\mathcal{K}_g,k'}^{c} > \gamma_{\mathcal{K}_g,k'}^{c,\star}. $$
By (14) and (15), a feasible point $(\mathcal{N}, \tilde{\mathbf{w}}, \mathbf{D}^\star, \tilde{\mathbf{R}})$ can be further constructed, satisfying
$$ \tilde{R}_k^{p} = \min_{k' \in \mathcal{K}} \log\big(1 + \tilde{\gamma}_{k'}^{p}\big) = \min_{k' \in \mathcal{K}} \log\big(1 + \gamma_{k'}^{p,\star}\big) \geq \min_{k' \in \mathcal{K}} R_{k'}^{p,\star}, \; \forall k \in \mathcal{K}, \quad \tilde{R}_{\mathcal{K}_g}^{c} = R_{\mathcal{K}_g}^{c,\star}, \; \forall g \in \mathcal{G}. $$
By (18) and (22), it can be readily obtained that the point $(\mathcal{N}, \tilde{\mathbf{w}}, \mathbf{D}^\star, \tilde{\mathbf{R}})$ achieves the same performance as the point $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$. Further using the spare power $(1 - \alpha^2) |w_k^{p,\star}|^2$, a point $(\mathcal{N}, \sqrt{P_t / \mathrm{tr}(\tilde{\mathbf{w}} \tilde{\mathbf{w}}^H)}\, \tilde{\mathbf{w}}, \mathbf{D}^\star, \hat{\mathbf{R}})$ with better performance, where $\hat{\mathbf{R}} \geq \tilde{\mathbf{R}}$, can be constructed, thus contradicting the optimality assumption. Therefore, it follows that the point $(\mathcal{N}, \mathbf{w}^\star, \mathbf{D}^\star, \mathbf{R}^\star)$ admits $\gamma_k^{p,\star} = \min_{k' \in \mathcal{K}} \gamma_{k'}^{p,\star}$, $\forall k \in \mathcal{K}$ (i.e., $\gamma_i^{p,\star} = \gamma_j^{p,\star}$, $\forall i, j \in \mathcal{K}$), which yields
$$ \frac{|\mathbf{h}_o^H \mathbf{w}_o^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq a} |\mathbf{h}_o^H \mathbf{w}_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}, j \neq o} |\mathbf{h}_o^H \mathbf{w}_j^{p,\star}|^2 + \sigma^2} = \frac{|\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_{\sup \mathcal{K}_b}^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq b} |\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}, j \neq \sup \mathcal{K}_b} |\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_j^{p,\star}|^2 + \sigma^2} $$
$$ \Longrightarrow \frac{|\mathbf{h}_o^H \mathbf{w}_o^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq a} |\mathbf{h}_o^H \mathbf{w}_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_o^H \mathbf{w}_j^{p,\star}|^2 + \sigma^2} = \frac{|\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_{\sup \mathcal{K}_b}^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq b} |\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_{\sup \mathcal{K}_b}^H \mathbf{w}_j^{p,\star}|^2 + \sigma^2}. $$
Accordingly, in the special case of a single antenna, it can be computed that
$$ |w_o'^{p}|^2 + |w_{\sup \mathcal{K}_b}'^{p}|^2 - |w_o^{p,\star}|^2 - |w_{\sup \mathcal{K}_b}^{p,\star}|^2 = \left(|w_{\mathcal{K}_a}^{c,\star}|^2 - |w_{\mathcal{K}_b}^{c,\star}|^2\right) \left(\frac{|w_o^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq a} |w_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}} |w_j^{p,\star}|^2 + \sigma^2/|h_o|^2} - \frac{|w_{\sup \mathcal{K}_b}^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq b} |w_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}} |w_j^{p,\star}|^2 + \sigma^2/|h_{\sup \mathcal{K}_b}|^2}\right) = 0. $$
By (A5), (A6), and (A15), it can be obtained that $\mathbf{w}'$ is feasible for $\mathcal{P}_2$. Moreover, by (13), (A7) and (A15), it can be computed that
$$ \gamma_o'^{p} = \frac{|w_o'^{p}|^2}{\sum_{g \in \mathcal{G}, g \neq b} |w_{\mathcal{K}_g}'^{c}|^2 + \sum_{j \in \mathcal{K}} |w_j'^{p}|^2 - |w_o'^{p}|^2 + \sigma^2/|h_o|^2} = \frac{1}{\dfrac{\sum_{g \in \mathcal{G}, g \neq b} |w_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}} |w_j^{p,\star}|^2 + \sigma^2/|h_o|^2}{|w_o'^{p}|^2} - 1} = \frac{|\mathbf{h}_o^H \mathbf{w}_o^{p,\star}|^2}{\sum_{g \in \mathcal{G}, g \neq a} |\mathbf{h}_o^H \mathbf{w}_{\mathcal{K}_g}^{c,\star}|^2 + \sum_{j \in \mathcal{K}, j \neq o} |\mathbf{h}_o^H \mathbf{w}_j^{p,\star}|^2 + \sigma^2} = \gamma_o^{p,\star}. $$
Similarly, it can be derived that $\gamma_{\sup \mathcal{K}_b}'^{p} = \gamma_{\sup \mathcal{K}_b}^{p,\star}$. By (12), (13), (A5), (A6), and (A15), it can be readily obtained that
$$ \min_{k \in \mathcal{K}'_g} \gamma_{\mathcal{K}'_g,k}'^{c} = \min_{k \in \mathcal{K}_g} \gamma_{\mathcal{K}_g,k}^{c,\star}, \; \forall g \in \mathcal{G}, \quad \gamma_k'^{p} = \gamma_k^{p,\star}, \; \forall k \in \mathcal{K}, k \notin \{o, \sup \mathcal{K}_b\}. $$
Further, by (14) and (15), it can be obtained that the point $(\mathcal{N}', \mathbf{w}', \mathbf{D}', \mathbf{R}')$ is feasible for $\mathcal{P}_2$ and achieves the same performance as the optimal point under the previous grouping strategy $\mathcal{N}$. Note that $\mathcal{N}$ and $\mathcal{N}'$ admit the following relationship:
$$ \mathcal{K}'_a \cap [\inf \mathcal{K}'_b, \sup \mathcal{K}'_b] = \left(\mathcal{K}_a \cap [\inf \mathcal{K}_b, \sup \mathcal{K}_b]\right) \setminus \{o\}. $$
Hence, by recursively adjusting the grouping strategy as in (A2)–(A4), a grouping strategy $\mathcal{N}^* = \{\mathcal{K}_g^*\}_{g=1}^G$ achieving the same performance as $\mathcal{N}$ can be obtained, where
$$ \mathcal{K}_g^* = \mathcal{K}_g, \; \forall g \in \mathcal{G}, g \neq a, b, \quad \mathcal{K}_a^* \cap [\inf \mathcal{K}_b^*, \sup \mathcal{K}_b^*] = \emptyset. $$
By further adjusting the grouping strategy $\mathcal{N}^*$ in the same manner, a grouping strategy $\mathcal{N}^{**} = \{\mathcal{K}_g^{**}\}_{g=1}^G$ achieving the same performance as $\mathcal{N}$ can be obtained, where
$$ \mathcal{K}_i^{**} \cap [\inf \mathcal{K}_j^{**}, \sup \mathcal{K}_j^{**}] = \emptyset, \; \forall i, j \in \mathcal{G}, i \neq j. $$
It can be readily verified that $\mathcal{N}^{**}$ belongs to $\mathcal{C}$, which yields that any grouping strategy $\mathcal{N} \notin \mathcal{C}$ can be matched in performance by a strategy in $\mathcal{C}$. Thus, the proposition is proved. □

References

  1. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tuts. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  2. Wang, X.; Han, Y.; Leung, V.C.M.; Niyato, D.; Yan, X.; Chen, X. Convergence of Edge Computing and Deep Learning: A Comprehensive Survey. IEEE Commun. Surv. Tuts. 2020, 22, 869–904. [Google Scholar] [CrossRef]
  3. Gong, X. Delay-Optimal Distributed Edge Computing in Wireless Edge Networks. In Proceedings of the IEEE INFOCOM, Toronto, ON, Canada, 6–9 July 2020. [Google Scholar]
  4. Chen, M.; Gunduz, D.; Huang, K.; Saad, W.; Bennis, M.; Feljan, A.V.; Poor, H.V. Distributed Learning in Wireless Networks: Recent Progress and Future Challenges. IEEE J. Sel. Areas Commun. 2021, 39, 3579–3605. [Google Scholar] [CrossRef]
  5. Nguyen, C.T.; Nguyen, D.N.; Hoang, D.T.; Phan, K.T.; Niyato, D.; Pham, H.A.; Dutkiewicz, E. Elastic Resource Allocation for Coded Distributed Computing over Heterogeneous Wireless Edge Networks. IEEE Trans. Wirel. Commun. 2023, 22, 2636–2649. [Google Scholar] [CrossRef]
  6. Dutta, S.; Jeong, H.; Yang, Y.; Cadambe, V.; Low, T.M.; Grover, P. Addressing unreliability in emerging devices and non-von Neumann architectures using coded computing. Proc. IEEE 2020, 108, 1219–1234. [Google Scholar] [CrossRef]
  7. Zhu, J.; Li, S. Generalized Lagrange Coded Computing: A Flexible Computation-Communication Tradeoff. In Proceedings of the IEEE ISIT, Espoo, Finland, 26 June–1 July 2022. [Google Scholar]
  8. Ng, J.S.; Lim, W.Y.B.; Luong, N.C.; Xiong, Z.; Asheralieva, A.; Niyato, D.; Leung, C.; Miao, C. A Comprehensive Survey on Coded Distributed Computing: Fundamentals, Challenges, and Networking Applications. IEEE Commun. Surv. Tuts. 2021, 23, 1800–1837. [Google Scholar] [CrossRef]
  9. Lee, K.; Lam, M.; Pedarsani, R.; Papailiopoulos, D.; Ramchandran, K. Speeding Up Distributed Machine Learning Using Codes. IEEE Trans. Inf. Theory 2018, 64, 1514–1529. [Google Scholar] [CrossRef]
  10. Shah, N.B.; Lee, K.; Ramchandran, K. When Do Redundant Requests Reduce Latency? IEEE Trans. Commun. 2016, 64, 715–722. [Google Scholar] [CrossRef]
  11. Wang, D.; Joshi, G.; Wornell, G.W. Efficient Straggler Replication in Large-Scale Parallel Computing. ACM Trans. Model. Perform. Eval. Comput. Syst. 2019, 4, 1–23. [Google Scholar] [CrossRef]
  12. Behrouzi-Far, A.; Soljanin, E. Efficient Replication for Fast and Predictable Performance in Distributed Computing. IEEE/ACM Trans. Netw. 2021, 29, 1467–1476. [Google Scholar] [CrossRef]
  13. Ramamoorthy, A.; Das, A.B.; Tang, L. Straggler-Resistant Distributed Matrix Computation via Coding Theory: Removing a Bottleneck in Large-Scale Data Processing. IEEE Signal Process. Mag. 2020, 37, 136–145. [Google Scholar] [CrossRef]
  14. Blumofe, R.D.; Leiserson, C.E. Scheduling multithreaded computations by work stealing. J. ACM 1999, 46, 720–748. [Google Scholar] [CrossRef]
  15. Harlap, A.; Cui, H.; Dai, W.; Wei, J.; Ganger, G.R.; Gibbons, P.B.; Gibson, G.A.; Xing, E.P. Addressing the straggler problem for iterative convergent parallel ML. In Proceedings of the ACM Symposium on Cloud Computing, Santa Clara, CA, USA, 5–7 October 2016. [Google Scholar]
  16. Liang, W.; Li, T.; He, X. Information Recycling Assisted Collaborative Edge Computing for Distributed Learning. In Proceedings of the IEEE INFOCOM WKSHPS, Hoboken, NJ, USA, 17–20 May 2023. [Google Scholar]
  17. Clerckx, B.; Mao, Y.; Jorswieck, E.A.; Yuan, J.; Love, D.J.; Erkip, E.; Niyato, D. A Primer on Rate-Splitting Multiple Access: Tutorial, Myths, and Frequently Asked Questions. IEEE J. Sel. Areas Commun. 2023, 41, 1265–1308. [Google Scholar] [CrossRef]
  18. Mao, Y.; Dizdar, O.; Clerckx, B.; Schober, R.; Popovski, P.; Poor, H.V. Rate-Splitting Multiple Access: Fundamentals, Survey, and Future Research Trends. IEEE Commun. Surv. Tuts. 2022, 24, 2073–2126. [Google Scholar] [CrossRef]
  19. Dai, L.; Wang, B.; Ding, Z.; Wang, Z.; Chen, S.; Hanzo, L. A Survey of Non-Orthogonal Multiple Access for 5G. IEEE Commun. Surv. Tuts. 2018, 20, 2294–2323. [Google Scholar] [CrossRef]
  20. Park, S.H.; Lee, H. Completion Time Minimization of Fog-RAN-Assisted Federated Learning with Rate-Splitting Transmission. IEEE Trans. Veh. Technol. 2022, 71, 10209–10214. [Google Scholar] [CrossRef]
  21. Nguyen, T.H.; Park, H.; Kim, M.; Park, L. DRL-Enabled RSMA-Assisted Task Offloading in Multi-Server Edge Computing. In Proceedings of the ICOIN, Ho Chi Minh City, Vietnam, 17–19 January 2024. [Google Scholar]
  22. Zhu, B.; Chi, K.; Liu, J.; Yu, K.; Mumtaz, S. Efficient Offloading for Minimizing Task Computation Delay of NOMA-Based Multiaccess Edge Computing. IEEE Trans. Commun. 2022, 70, 3186–3203. [Google Scholar] [CrossRef]
  23. Ding, Z.; Fan, P.; Poor, H.V. Impact of Non-Orthogonal Multiple Access on the Offloading of Mobile Edge Computing. IEEE Trans. Wirel. Commun. 2019, 67, 375–390. [Google Scholar] [CrossRef]
  24. Diamanti, M.; Pelekis, C.; Tsiropoulou, E.E.; Papavassiliou, S. Delay Minimization for Rate-Splitting Multiple Access-Based Multi-Server MEC Offloading. IEEE/ACM Trans. Netw. 2024, 32, 1035–1047. [Google Scholar] [CrossRef]
  25. Dai, M.; Clerckx, B.; Gesbert, D.; Caire, G. A Rate Splitting Strategy for Massive MIMO with Imperfect CSIT. IEEE Trans. Wirel. Commun. 2016, 15, 4611–4624. [Google Scholar] [CrossRef]
  26. Wang, Y.; Wong, V.W.S.; Wang, J. Flexible Rate-Splitting Multiple Access with Finite Blocklength. IEEE J. Sel. Areas Commun. 2023, 41, 1398–1412. [Google Scholar] [CrossRef]
  27. Ko, D.; Chae, S.H.; Choi, W. MDS Coded Task Offloading in Stochastic Wireless Edge Computing Networks. IEEE Trans. Wirel. Commun. 2022, 21, 2107–2121. [Google Scholar] [CrossRef]
  28. Yu, Q.; Maddah-Ali, M.A.; Avestimehr, A.S. Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding. IEEE Trans. Inf. Theory 2020, 66, 1920–1933. [Google Scholar] [CrossRef]
  29. Peng, P.; Soljanin, E.; Whiting, P. Diversity/Parallelism Trade-Off in Distributed Systems with Redundancy. IEEE Trans. Inf. Theory 2022, 68, 1279–1295. [Google Scholar] [CrossRef]
  30. Bekkerman, R.; Bilenko, M.; Langford, J. Scaling up Machine Learning: Parallel and Distributed Approaches; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  31. Ding, Z.; Xu, J.; Dobre, O.A.; Poor, H.V. Joint Power and Time Allocation for NOMA–MEC Offloading. IEEE Trans. Veh. Technol. 2019, 68, 6207–6211. [Google Scholar] [CrossRef]
  32. Kiani, A.; Ansari, N. Edge Computing Aware NOMA for 5G Networks. IEEE Internet Things J. 2018, 5, 1299–1306. [Google Scholar] [CrossRef]
  33. Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
  34. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295. [Google Scholar] [CrossRef]
  35. Sun, Y.; Babu, P.; Palomar, D.P. Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning. IEEE Trans. Signal Process. 2017, 65, 794–816. [Google Scholar] [CrossRef]
  36. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  37. Goldsmith, A. Wireless Communications; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
  38. Li, Z.; Ye, C.; Cui, Y.; Yang, S.; Shamai, S. Rate Splitting for Multi-Antenna Downlink: Precoder Design and Practical Implementation. IEEE J. Sel. Areas Commun. 2020, 38, 1910–1924. [Google Scholar] [CrossRef]
  39. Sadeghabadi, E.; Blostein, S. Low Complexity Rate Splitting Using Hierarchical User Grouping. In Proceedings of the IEEE ICC WKSHPS, Montreal, QC, Canada, 14–23 June 2021. [Google Scholar]
  40. Behrouzi-Far, A.; Soljanin, E. Redundancy Scheduling in Systems with Bi-Modal Job Service Time Distributions. In Proceedings of the Allerton, Monticello, IL, USA, 24–27 September 2019. [Google Scholar]
  41. Tandon, R.; Lei, Q.; Dimakis, A.G.; Karampatziakis, N. Gradient Coding: Avoiding Stragglers in Distributed Learning. In Proceedings of the ICML, Sydney, NSW, Australia, 6–11 August 2017. [Google Scholar]
Figure 1. Comparison between the conventional distributed computing scheme and the conventional information recycling scheme.
Figure 2. An illustrative example of the collaborative computing among ENs under information recycling. Suppose that the task queues of the three ENs are initialized as { W 1 , k c } k = 1 4 , { W 2 , k c } k = 1 8 , and { W 3 , k c } k = 1 6 , respectively. (a) After it finishes the mini-task W 1 , 2 c , EN 1 sends the finishing signal S 1 , 2 fin to the mobile user to indicate the completion. Meanwhile, the mobile user has already received three finishing signals { S 1 , 1 fin , S 2 , 1 fin , S 3 , 1 fin } from the three ENs. (b) EN 1 finishes its last assigned task W 1 , 4 c and sends a finishing signal S 1 , 4 fin to the mobile user. (c) Upon receiving the finishing signal S 1 , 4 fin , the mobile user is notified that EN 1 has completed all its assigned mini-tasks. As EN 2 has the most remaining mini-tasks, the mobile user sends a recycling signal S 2 , 8 rec and a canceling signal S 2 , 8 can to EN 1 and EN 2, respectively. (d) EN 1 receives the recycling signal S 2 , 8 rec and starts the computation of W 2 , 8 c . EN 2 receives the canceling signal S 2 , 8 can and removes the mini-task W 2 , 8 c from its task queue.
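The signaling walkthrough in the Figure 2 caption can be sketched as a minimal simulation. This is an illustrative assumption, not the paper's exact scheduling policy: the helper `on_en_idle` is hypothetical, and the donor rule (recycle the tail mini-task of the EN with the longest remaining queue) is inferred from the example's queue sizes.

```python
from collections import deque

# Task queues of the three ENs, initialized as in the Figure 2 caption.
queues = {1: deque(f"W1,{k}" for k in range(1, 5)),
          2: deque(f"W2,{k}" for k in range(1, 9)),
          3: deque(f"W3,{k}" for k in range(1, 7))}

def on_en_idle(idle_en, queues):
    """Mobile user's reaction to the last finishing signal of an EN:
    pick the EN with the longest remaining queue as the donor, send it a
    canceling signal, and send the idle EN a recycling signal."""
    donor = max((en for en in queues if en != idle_en and queues[en]),
                key=lambda en: len(queues[en]), default=None)
    if donor is None:
        return None
    task = queues[donor].pop()        # canceling signal: drop tail mini-task
    queues[idle_en].append(task)      # recycling signal: hand it to idle EN
    return donor, task

# EN 1 finishes all four assigned mini-tasks and reports S_{1,4}^fin.
queues[1].clear()
donor, task = on_en_idle(1, queues)
print(donor, task)  # EN 2 donates its last mini-task W2,8
```

Running the sketch reproduces panels (c) and (d): EN 2's tail mini-task W 2 , 8 c is removed from its queue and recycled to the now-idle EN 1.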
Figure 3. An illustration of the proposed multi-group information recycling scheme.
Figure 4. Comparison of processing latency under different B ( N t = 1 ).
Figure 5. Comparison of the processing latency under different B ( N t = 6 ).
Figure 6. Comparison of the power ratio $\sum_{g=1}^{G} \| \mathbf{w}_{K_g}^{c} \|^{2} / P_t$ and the task ratio $\sum_{g=1}^{G} D_{K_g} / D$ under different B ( N t = 6 ).
Figure 7. Comparison of processing latency under different ρ ( N t = 1 ).
Figure 8. Comparison of the processing latency under different ρ ( N t = 6 ).
Figure 9. Comparison of communication phase latency under different f e ( N t = 6 ).
Figure 10. Comparison of computation phase latency under different f e ( N t = 6 ).
Figure 11. Comparison of processing latency under different B ( K = 12 ).
Figure 12. Comparison of processing latency under different ρ ( K = 12 ).

Share and Cite

MDPI and ACS Style

Liang, W.; He, X. Enhancing Rate-Splitting-Based Distributed Edge Computing via Multi-Group Information Recycling. Electronics 2024, 13, 4403. https://doi.org/10.3390/electronics13224403
