1. Introduction
With recent progress in the requirement definitions and potential technologies of sixth-generation (6G) wireless communication, the capacity of 6G networks is expected to increase by nearly 1000 times compared with the fifth generation (5G) to support ubiquitous Internet of Things (IoT) devices [1,2,3]. Full-duplex (FD) technology (i.e., in-band co-time co-frequency transmission) can at most double the spectral efficiency (SE) compared with traditional time division duplex (TDD) or frequency division duplex (FDD). Thus, it is regarded as one of the Beyond 5G/6G candidate technologies [1,4,5,6]. On the other hand, a reconfigurable intelligent surface (RIS) consists of reflection elements with a programmable meta-atomic structure and ultra-low power consumption, which can manipulate signal reflection and scattering to improve coverage and quality of service (QoS). Moreover, it consumes less energy than conventional relays [7]. As a revolutionary technology that endows network entities with reconfigurable properties, RIS has become a promising technology for 6G networks [8,9] and could be widely applied in coverage-challenged environments [10,11]. Overall, combining FD and RIS can theoretically yield a two-fold performance gain, through time-frequency domain multiplexing and signal enhancement. Therefore, RIS-aided FD technology is expected to be applied to a wide range of IoT devices in smart cities that require high data rates and extensive coverage [3,12].
It is well known that FD technology is limited by strong self-interference (SI). Although existing SI cancellation techniques (i.e., active and passive cancellation) [5] can suppress the SI to near the noise floor [13], residual SI remains due to hardware impairments [14]. Introducing an RIS can mitigate the residual SI to a certain degree. However, the FD system additionally suffers from loop interference (LI), caused by the transmit signal unexpectedly bouncing back via the RIS. Particularly when IoT devices are located in complicated urban outdoor environments with severe attenuation, reflections, and blockages [15], these two types of interference tend to occur frequently and prevent RIS-aided FD systems from realizing their full performance potential.
Scholars often formulate SE or energy efficiency (EE) optimization problems to overcome interference-related issues. To tackle the residual SI in the FD system, the authors of [16] constructed an SE maximization problem over sub-carrier and power allocation and transformed the non-convex problem into a convex one through a first-order Taylor approximation. The authors of [17,18] mainly studied FD antenna allocation and beamforming design to formulate sum rate maximization problems. Specifically, in [17], maximum ratio transmission and a genetic algorithm were used to solve the beamforming and antenna selection subproblems, respectively. The authors of [18] devised a block coordinate descent method to solve the multi-variable optimization problem alternately. Refs. [16,17,18] showed that the optimized FD system with residual SI still outperformed half-duplex (HD) systems, such as TDD, in terms of SE. Meanwhile, RIS is a disruptive technology that has attracted great attention. Accordingly, the authors of [19] introduced RIS and formulated an optimization problem with respect to the cascaded channel. They obtained the global EE optimum by alternately optimizing one variable while fixing the others. Other scholars applied RIS to FD/HD relay scenarios [20]. They formulated a problem of minimizing the required power of the base station (BS) and relay to enhance EE, subject to QoS constraints. To solve the problems in the two scenarios, they proposed semi-definite programming and a method that maximizes the weakest hop signal-to-noise ratio. Simulation results showed that combining RIS with an FD relay was superior to the other cases. Building on these findings, the authors of [21,22,23,24,25,26] considered the combination of RIS and FD technology and demonstrated its performance gains. To name a few, the authors of [21] proposed an SI mitigation method for an RIS-assisted FD system, thereby reducing the strong SI fed into analog-to-digital converters with finite bit resolution. The authors of [22] derived expressions for the outage probability and ergodic capacity and concluded that RIS could indirectly diminish the adverse effect of the residual SI and improve the performance of the FD system. Other scholars minimized the transmit power through active and passive beamforming to improve EE by suppressing interference [23]. The work of [24] discussed the influence of the numbers of reflection elements and receive antennas on FD system performance.
Deep reinforcement learning (DRL) allows an agent in a wireless communication system to seek the optimal policy by observing rewards without a priori knowledge. Accordingly, mathematically intractable problems can be settled through the agent interacting with the environment [27,28,29]. The authors of [30] formulated an objective that considers SE and energy harvesting in an FD system. They then devised a hybrid approach combining deep deterministic policy gradient and double deep Q-network to train the networks for the different variables, respectively. Based on DRL, the authors of [31] proposed a maximum entropy scheme to solve the active and passive beamforming problem. In [32], neural epsilon-greedy, deep Q-network, upper confidence bound, and other DRL approaches were considered; taking the sum rate of the RIS system as the objective, the authors showed that the trained artificial neural network (NN) could outperform some traditional non-convex algorithms in certain cases. The authors of [25] employed a DRL method to address single and distributed RIS problems and demonstrated that DRL could reduce the performance loss relative to the optimum because no prior relaxation is required.
Although several previous studies have focused on improving the performance of FD RIS-aided systems, they did not discuss the influence of LI under different user wireless environments and RIS locations. More importantly, to the best of our knowledge, no previous work has developed a two-step algorithm that combines a closed-form solution with a learning method for multi-user RIS-aided FD systems. Furthermore, the discrete phase shifts model is consistent with the physical realization of RIS and is therefore more practical than the continuous phase shifts model. Motivated by this, we study an RIS-aided FD system with discrete phase shifts, considering the effects of residual SI, LI, and RIS location in different urban outdoor scenarios, and devise a two-step solution to the formulated non-convex problem. The main contributions are as follows:
We introduce a discrete phase shifts RIS model for outdoor smart-city environments in which the line-of-sight (LoS) link is blocked. The objective and constraints involving the residual SI and LI are formulated for two typical scenarios, which allows us to focus on the influence of the primary interference in each. Next, a two-step algorithm based on the variable types is proposed, and its advantages are further highlighted through the scenario handoff.
Specifically, the original optimization problem is decomposed into two subproblems according to the variable types. In subproblem one, we optimize the transmit and receive beamforming matrices with fixed phase shifts. Since this problem is non-convex, we design a novel successive convex approximation (SCA) method to obtain approximate convex lower bounds of the objective function and constraints. The non-convex problem is thereby transformed into a convex one that can be solved directly.
Then, in subproblem two, we optimize the phase shifts vector with given beamforming matrices. Since this is a non-convex optimization over a discrete variable, we develop a discrete soft actor–critic (SAC) algorithm based on DRL. By defining the corresponding action, state, and reward, we seek to maximize the reward and thereby obtain the optimal sum SE. Remarkably, our DRL-based method involves only the discrete phase shifts, which dramatically reduces the dimension of the action space. Additionally, the environment state is a vector consisting of the signal-to-interference-plus-noise ratio (SINR) of each IoT device, so the method applies efficiently to the multi-user case.
Finally, after iteratively optimizing the beamforming matrices and the phase shifts vector, we evaluate the performance of the proposed algorithm from various perspectives. Extensive simulation results show that the performance of the low-complexity proposal is second only to the exhaustive search method and exceeds that of the fixed-point iteration baseline. In particular, the proposed algorithm performs outstandingly in scenario two, demonstrating its superiority in mitigating interference in complicated urban outdoor environments.
The rest of this paper is organized as follows. The system model and problem formulation are described in Section 2. Section 3 presents our joint beamforming and phase shifts design together with its computational complexity. Section 4 discusses numerical results to evaluate the performance of the proposed algorithm. Conclusions and future issues are given in Section 5.
Notations: For a general matrix A, A^H and ‖A‖_F denote the Hermitian transpose and the Frobenius norm of A, respectively. The subscripts/superscripts t, r, d, and u indicate quantities related to transmitting, receiving, downlink (DL), and uplink (UL), respectively. E{·} represents the expectation. We denote the conjugate and the real part of a complex number by (·)^* and Re{·}, respectively. For a general vector x, diag{x} denotes a diagonal matrix with the elements of x on its diagonal. The modulus of x is denoted by |x|. ℂ^{M×N} represents the space of M × N complex matrices. 1 and 0 denote a vector or matrix whose elements are all one and all zero, respectively. An identity matrix is denoted by I. CN(·,·) denotes a complex Gaussian distribution. Other calligraphic upper-case letters stand for sets. A letter with a superscript apostrophe, such as k', indicates an element of the complement of {k} within the corresponding set. The partial derivative of a function f with respect to x is denoted by ∂f/∂x.
The acronyms used in this paper are given in Table 1.
3. Solution to SE Maximization Problem in RIS-Aided FD System
Since the variables of the original problem can be classified into continuous variables (W and U, with continuous weights) and a discrete variable (the phase shifts vector, with discrete phase shifts), we adopt different schemes according to these variable characteristics. Consequently, we decompose the problem into two subproblems and obtain a two-step solution by iterating between them until convergence. The rest of this section describes how subproblems one and two are solved to design the beamforming and the phase shifts, respectively.
3.1. Beamforming Design
Given the phase shifts vector, the phase-shift-dependent term in the objective function is constant, and the unit-modulus constraint (13d) need not be considered. Thus, the design of the beamforming matrices can be transformed into the following problem.
(13b), (13c), where
Clearly, the DL SE depends on the transmit beamforming matrix. However, the UL SE depends on both the receive beamforming and, through the residual SI, the transmit beamforming, which shows the strong coupling between W and U. Together with the non-convex objective function (14a) and constraints (14b), (14c), this coupling makes the problem challenging to solve.
Given the two coupled variables, the core idea is to combine alternating optimization with SCA to convert the non-convex problem into a convex one. Motivated by [39], we approximate the lower bounds through the following inequality transformations to obtain convexity.
where
,
.
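As an illustration (an assumption on our part; the exact forms of (16a) and (16b) are not reproduced here), a standard inequality used in SCA-based SINR relaxations is the first-order lower bound of the quadratic-over-linear function, valid for complex x, real y > 0, and any expansion point (x_0, y_0) with y_0 > 0:

```latex
% f(x, y) = |x|^2 / y is jointly convex for x \in \mathbb{C}, y > 0,
% so it lies above its first-order expansion at (x_0, y_0):
\frac{|x|^{2}}{y}
  \;\ge\; \frac{|x_0|^{2}}{y_0}
  + \frac{2}{y_0}\,\Re\{x_0^{*}(x - x_0)\}
  - \frac{|x_0|^{2}}{y_0^{2}}\,(y - y_0)
  \;=\; \frac{2\,\Re\{x_0^{*}x\}}{y_0} - \frac{|x_0|^{2}}{y_0^{2}}\,y .
```

The bound is tight at (x_0, y_0) and affine in (x, y), so substituting it for a signal-over-interference term gives constraints that are linear in the optimization variables, and maximizing the bound at each iteration never decreases the original value.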
First, we introduce an auxiliary variable set to help find lower bounds on the SINRs of the DL IoT devices, which satisfies
where each auxiliary variable indicates the lower bound of the intended signal of IoT device k.
Given the feasible point at the n-th iteration, the bound on this auxiliary variable can be obtained from (16b) and (17a) as follows.
From (18), it is easy to see that the slack variable is convex with respect to the beamforming variables.
Similarly, with the feasible point at the n-th iteration, the DL SE term of IoT device k is lower bounded by (16a) as follows,
where the relaxed SINR of IoT device k replaces the exact SINR.
The right-hand side of (19) is convex with respect to the auxiliary variable owing to its quadratic form. Substituting (18) into (19), we obtain the convexity of (19) with respect to the beamforming variables.
Next, we relax the SINR expressions of the UL IoT devices. Similar to (17a), we introduce several auxiliary variable sets for the approximation, which satisfy the following supplementary constraints.
where the auxiliary variable in (21a) relaxes the BS power budget; the two auxiliary variables in (21b) jointly restrict the modulus of the receive beamforming vector for IoT device j and account for the residual SI; and the auxiliary variables in (21d) and (21c) bound the desired received signal and the interference caused by IoT device j', respectively.
Although (21a) is a convex second-order cone (SOC) constraint, constraints (21b)–(21d) are non-convex. As in (18), given the corresponding feasible points, (21b)–(21d) are transformed as
The resulting constraints (22a)–(22c) are linear. Substituting (21a) and (22a)–(22c) into (9), we obtain the following lower bound on the UL SINR of IoT device j.
where
is obtained by
Thus, similarly to (19), with an additional feasible point we obtain the relaxed UL SE as follows,
where the approximate SINR of IoT device j replaces the exact SINR.
The right-hand side of (25) is convex with respect to each of its two auxiliary variables. Since these variables are in turn convex in the beamforming variables (see (22c) and (26)), (25) is convex with respect to the beamforming variables.
In summary, using (19) and (25), the SCA method yields a convex lower bound of (14a), which is reformulated as follows.
We now turn our attention to the non-convex constraints (14b), (14c).
Similar to (18), constraint (14b) can be rewritten as
Meanwhile, by applying (21a), (22a)–(22c), constraint (14c) can be expressed as
Therefore, we have obtained the linear constraint (28) and the SOC constraint (29), which approximate (14b) and (14c), respectively.
Finally, the problem is reconstructed as maximizing the lower bound obtained above subject to (13b), (13c), (28), and (29). This problem is convex and can be solved optimally in each SCA iteration with a convex optimization tool such as CVX.
The proposed SCA algorithm for solving subproblem one is summarized in Algorithm 1.
Algorithm 1: Proposed SCA Algorithm for Problem
3: Repeat:
7: Until: The value of sum SE converges.
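The minorize-then-maximize logic behind Algorithm 1 can be illustrated on a deliberately small, hypothetical two-link power-control problem (not the paper's problem (14)). The sum rate is written as a difference of two concave functions, and the subtracted concave part is replaced by its first-order Taylor expansion, giving a concave surrogate that is tight at the current point. Instead of CVX, each surrogate is maximized by projected gradient ascent over a power box, an assumption made only to keep the sketch dependency-free; the names `G`, `A`, `P_MAX`, and `sca` are illustrative.

```python
import math

# Hypothetical toy model (not the paper's system): two links, direct gains G,
# symmetric cross gain A, unit noise power, per-link power box [0, P_MAX].
G = (2.0, 1.5)
A = 0.5
P_MAX = 4.0


def sum_rate(p):
    """Exact sum rate in nats/s/Hz; non-concave in p because of interference."""
    s1 = G[0] * p[0] / (1.0 + A * p[1])
    s2 = G[1] * p[1] / (1.0 + A * p[0])
    return math.log1p(s1) + math.log1p(s2)


def _h_terms(p0):
    """Value and gradient at p0 of the concave part h(p) = log(1+A*p2) + log(1+A*p1)."""
    h0 = math.log1p(A * p0[1]) + math.log1p(A * p0[0])
    grad = (A / (1.0 + A * p0[0]), A / (1.0 + A * p0[1]))
    return h0, grad


def surrogate(p, p0):
    """Concave lower bound of sum_rate that is tight at p0 (h linearized at p0)."""
    g = math.log1p(G[0] * p[0] + A * p[1]) + math.log1p(G[1] * p[1] + A * p[0])
    h0, gh = _h_terms(p0)
    return g - (h0 + gh[0] * (p[0] - p0[0]) + gh[1] * (p[1] - p0[1]))


def surrogate_grad(p, p0):
    d1 = 1.0 + G[0] * p[0] + A * p[1]
    d2 = 1.0 + G[1] * p[1] + A * p[0]
    _, gh = _h_terms(p0)
    return (G[0] / d1 + A / d2 - gh[0], A / d1 + G[1] / d2 - gh[1])


def solve_surrogate(p0, step=0.1, iters=500):
    """Projected gradient ascent on the concave surrogate (stand-in for CVX).

    step=0.1 is below 1/L for this toy (L <= 6.75), so the surrogate never decreases.
    """
    p = list(p0)
    for _ in range(iters):
        gr = surrogate_grad(p, p0)
        p = [min(P_MAX, max(0.0, p[i] + step * gr[i])) for i in range(2)]
    return tuple(p)


def sca(p_init=(1.0, 1.0), tol=1e-6, max_outer=50):
    """Outer SCA loop: re-linearize at the latest point until the sum rate converges."""
    p = tuple(p_init)
    history = [sum_rate(p)]
    for _ in range(max_outer):
        p = solve_surrogate(p)
        history.append(sum_rate(p))
        if abs(history[-1] - history[-2]) < tol:
            break
    return p, history
```

Calling `sca()` returns the final powers and the sum-rate history; because each surrogate is a tight lower bound, the history is non-decreasing, which mirrors the stopping rule "until the value of sum SE converges".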
3.2. RIS Phase Shifts Design
We now focus on optimizing the phase shifts vector when the beamforming matrices are fixed. The problem is then simplified as
(13b), (13d).
The phase shifts vector in the non-convex fractional objective function (31a) is associated with both DL and UL IoT devices. Moreover, it is subject to an additional unit-modulus constraint. Especially in scenario two, the LI term also depends on the phase shifts (see (15)). All these properties make subproblem two more challenging than subproblem one.
Since the DRL method requires no relaxation, it can reduce both the loss relative to the optimal performance and the computational complexity. Being model-free, DRL can be applied directly to mathematically intractable problems, and scholars have therefore resorted to it as a powerful tool for tackling wireless network issues [40,41]. In addition, it works with unlabeled data, which yields high storage efficiency [42].
SAC is a model-free DRL algorithm based on maximum entropy, which avoids converging to local optima through entropy regularization. Meanwhile, it uses two independent Q networks and an off-policy scheme to improve stability and learning efficiency [43], and it is also suitable for discrete action spaces [44].
3.2.1. Markov Decision Process
Considering that the Markov decision process (MDP) is the theoretical cornerstone of reinforcement learning (RL) [45], we let the RIS controller act as an agent that pursues the maximum reward through sequential decision making, thereby maximizing the sum SE. Hence, we map the RIS-aided FD system onto the key elements of an MDP as follows.
Action: A phase shifts vector is treated as a one-dimensional discrete action; for example, the action at time step t is defined as
We normalize the actions via the policy network to satisfy constraint (13d). It is worth mentioning that the action involves only the phase shifts of the L elements, which improves learning efficiency in the multi-user scenario through a reduced action space.
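A minimal sketch of how a discrete action could be mapped to unit-modulus reflection coefficients that satisfy (13d), assuming b-bit phase resolution with the uniformly spaced alphabet {2πm/2^b}. The bit width `bits`, the helper names, and the per-element index encoding are hypothetical; the exact action encoding of (32) is not reproduced here.

```python
import cmath
import math


def phase_set(bits):
    """Uniform b-bit phase alphabet {2*pi*m / 2**b : m = 0, ..., 2**b - 1} (assumed)."""
    levels = 2 ** bits
    return [2.0 * math.pi * m / levels for m in range(levels)]


def action_to_reflection(action_indices, bits):
    """Map one discrete index per RIS element to a unit-modulus coefficient e^{j*theta}."""
    alphabet = phase_set(bits)
    return [cmath.exp(1j * alphabet[m]) for m in action_indices]


# Example: L = 4 elements with 2-bit phase shifts (phases 0, pi/2, pi, 3*pi/2).
coeffs = action_to_reflection([0, 1, 2, 3], bits=2)
```

Because every coefficient has the form e^{jθ}, the unit-modulus constraint holds by construction, so no separate projection step is needed.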
State: The state vector consists of the SINR of each IoT device and is denoted as
at time step t. It is obtained through the interaction between the agent (the RIS controller) and the environment, based on the given phase shifts and the state of the previous time step. Given that the feature scale of the state can degrade RL performance [46], the state vector is whitened each time the SINR is observed. Since the state is associated with the wireless communication condition of each IoT device, it efficiently covers the multi-user scenario. In general, the phase shifts vector to be executed at the current step depends only on the real-time conditions of each IoT device, regardless of the previous action, which simplifies the interaction.
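A sketch of one possible whitening step, assuming per-observation standardization of the SINR vector to zero mean and unit variance; the paper does not specify the exact scheme, and running statistics across time steps are an equally common alternative. The example SINR values are hypothetical.

```python
import math


def whiten_state(sinr, eps=1e-8):
    """Zero-mean, unit-variance normalization of one observed SINR vector (assumed scheme)."""
    n = len(sinr)
    mean = sum(sinr) / n
    var = sum((x - mean) ** 2 for x in sinr) / n
    std = math.sqrt(var) + eps  # eps guards against a constant SINR vector
    return [(x - mean) / std for x in sinr]


# Hypothetical linear SINRs of K + J = 6 IoT devices.
state = whiten_state([3.2, 0.8, 5.1, 1.9, 2.4, 0.6])
```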
Reward: Since the state reflects each individual device rather than the system as a whole, we use the reward to guide the RIS controller toward a global policy. Aiming at the maximum sum SE, we choose a modified objective as the reward:
where the two sums collect the SE returns at time step t of each IoT device k and IoT device j, respectively. Notably, we introduce a penalty to encourage the agent to find a policy that satisfies the QoS constraints (31b), (31c).
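A sketch of a penalized sum-SE reward, assuming a per-device SE of log2(1 + SINR), the 1 bps/Hz QoS threshold used in Section 4.1, and a hypothetical linear penalty weight `mu` applied to each QoS shortfall. The modified objective in (34) may weight its terms differently.

```python
import math


def reward(sinr_dl, sinr_ul, qos=1.0, mu=5.0):
    """Sum SE (bps/Hz) of all DL and UL devices minus a penalty for QoS violations.

    mu is a hypothetical penalty weight; every device whose SE falls below `qos`
    adds mu * (qos - SE) to the penalty, so feasible policies are preferred.
    """
    se = [math.log2(1.0 + s) for s in list(sinr_dl) + list(sinr_ul)]
    shortfall = sum(max(0.0, qos - r) for r in se)
    return sum(se) - mu * shortfall
```

For example, `reward([3.0], [1.0])` gives 2 + 1 = 3 bps/Hz with no penalty, while `reward([3.0], [0.0])` is pushed negative because the UL device violates its QoS.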
In summary, when the RIS controller chooses a series of phase shifts with certain probabilities, the SINR of each IoT device is updated. The environment then transitions to the next state, and the corresponding reward is obtained. The interaction of the RIS controller with the defined wireless environment model (i.e., urban outdoor scenario one or two) can thus be regarded as an MDP, as visualized in Figure 2. Note that Scenario One and Scenario Two in Figure 2 differ only in the wireless environments where the IoT devices are located. Because the model-free DRL method learns through agent–environment interaction, this distinction does not influence its workflow.
In practice, decision making at the RIS controller introduces a delay, which may take seconds depending on the computational power of the RIS controller and the magnitude of environmental changes. Nevertheless, once the learning of the RIS controller has saturated, it can immediately make an optimal decision when ordinary channel changes occur. More importantly, the RIS controller can also cope rapidly with large environmental changes over time, because in the long term it learns more about the potential fluctuations of the environment, which enables it to handle extreme cases. Overall, the advantage of the DRL-based method becomes more evident in the long run.
3.2.2. Mechanism of Soft Actor–Critic Learning
The SAC architecture contains actor and critic networks for action selection and evaluation, respectively. Specifically, the actor network is a stochastic policy network. The critic network consists of online and target subnetworks, which comprise two online and two target Q networks, respectively. The online and target Q networks have the same structure but differ in their update method and frequency. The critic evaluates the action provided by the actor network and takes the minimum of the two computed soft Q values, thus mitigating overestimation of the Q value [47]. Our goal is to train the SAC network to output the best phase shifts vector over extended interactions with the environment. The training of the SAC network, including the implementation and learning processes, is described in detail below.
Implementation: Feeding the current state into the policy network yields an action, and executing it yields the corresponding reward and next state. The resulting transition tuple is stored in a replay buffer that is gradually enriched through repeated interactions. Notably, to let the agent explore comprehensively, the replay buffer is filled off-policy.
Learning: In each time step, a mini-batch of transition tuples is randomly sampled from the replay buffer, and the learning process on this mini-batch proceeds as follows.
In particular, unlike conventional DRL methods, the SAC algorithm defines a soft state value function that incorporates an entropy term, given as
where each Q function is parameterized by the network parameters of the corresponding (i-th) online Q network, and the policy, computed through the policy network, assigns a probability to each action in the action space. The actor network and the temperature each have their own parameters, and the temperature also determines the importance of the entropy relative to the Q value. In addition, the Q value is an action-state value function written as
where the discount factor is used to calculate the cumulative return, and the transition probability gives the probability that executing a specific action moves the environment from the current state to the next state. Overall, for a given temperature, the SAC algorithm seeks to explore actions as diversely as possible while maximizing the entropy-regularized return. This process is accompanied, in sequence, by loss calculation and parameter updating for the three modules below.
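For a discrete action space, the expectation over actions in the soft value can be computed exactly from the policy's probabilities. The sketch below assumes the standard discrete-SAC form V(s) = Σ_a π(a|s)[min(Q1(s,a), Q2(s,a)) − α log π(a|s)], using the minimum of the two Q values as described for the critic; the paper's exact expression is not reproduced, and the function name is illustrative.

```python
import math


def soft_state_value(probs, q1, q2, alpha):
    """Discrete-action soft value: sum_a pi(a|s) * (min(Q1, Q2) - alpha * log pi(a|s))."""
    v = 0.0
    for p, a, b in zip(probs, q1, q2):
        if p > 0.0:  # actions with pi(a|s) = 0 contribute nothing (p * log p -> 0)
            v += p * (min(a, b) - alpha * math.log(p))
    return v
```

With a uniform policy over two actions, the entropy bonus α·ln 2 is added on top of the average clipped Q value; with a deterministic policy the value reduces to the clipped Q value of the chosen action.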
First, the online Q networks are trained by minimizing the Bellman residual as follows.
where the learning rate of the Q networks scales the gradient of their loss function, the target value is computed with the parameters of the i-th target Q network, and D denotes the replay buffer.
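A sketch of the critic step on a sampled mini-batch, assuming the standard SAC soft Bellman target y = r + γ V_target(s'), with V_target evaluated from the two target Q networks and the current policy at s', and a mean-squared residual with a 1/2 factor. No terminal flag is used (a continuing-task assumption), and the dictionary keys are hypothetical.

```python
import math


def soft_value(probs, q1, q2, alpha):
    """sum_a pi(a|s) * (min(Q1, Q2) - alpha * log pi(a|s)) for discrete actions."""
    return sum(p * (min(a, b) - alpha * math.log(p))
               for p, a, b in zip(probs, q1, q2) if p > 0.0)


def critic_targets(batch, gamma, alpha):
    """Soft Bellman targets y = r + gamma * V_target(s') for a mini-batch.

    Each sample holds the reward 'r', the policy probabilities at s' ('next_probs'),
    and the two target Q networks' outputs at s' ('next_q1', 'next_q2').
    """
    return [s["r"] + gamma * soft_value(s["next_probs"], s["next_q1"], s["next_q2"], alpha)
            for s in batch]


def bellman_residual_loss(q_taken, targets):
    """Mean of 0.5 * (Q_i(s, a) - y)^2 over the mini-batch, for one online Q network."""
    return sum(0.5 * (q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)
```

In a full implementation, the gradient of this loss with respect to each online Q network's parameters would be applied with the Q networks' learning rate; the targets are treated as constants.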
Similar to the critic network, the actor network is then trained as
where the learning rate of the actor network scales the gradient of the loss function of the policy network.
Meanwhile, the temperature is also trained by minimizing its own loss function, rather than being fixed as a hyper-parameter, which reduces the estimation error:
where the learning rate of the temperature scales the gradient of the temperature loss, and the target entropy is a constant vector that acts as a hyper-parameter.
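A sketch of the actor and temperature objectives in their standard discrete-SAC forms (assumed here; (38a) to (39b) may differ in detail): the actor minimizes Σ_a π(a|s)[α log π(a|s) − min Q(s,a)], and the temperature loss Σ_a π(a|s)[−α(log π(a|s) + H̄)] equals α(H(π) − H̄). The target entropy is treated as a scalar, and the direct gradient step on α with a positive floor is a simplification; many implementations learn log α instead.

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(probs, q1, q2, alpha):
    """sum_a pi(a|s) * (alpha * log pi(a|s) - min(Q1, Q2)) for one state."""
    return sum(p * (alpha * math.log(p) - min(a, b))
               for p, a, b in zip(probs, q1, q2) if p > 0.0)


def temperature_loss(probs, alpha, target_entropy):
    """sum_a pi(a|s) * (-alpha * (log pi(a|s) + target)) = alpha * (H(pi) - target)."""
    return alpha * (entropy(probs) - target_entropy)


def update_alpha(alpha, probs, target_entropy, lr):
    """One gradient step on the temperature loss; d(loss)/d(alpha) = H(pi) - target."""
    grad = entropy(probs) - target_entropy
    return max(1e-6, alpha - lr * grad)  # keep the temperature positive
```

The update lowers α when the policy is more random than the target entropy and raises it when the policy becomes too deterministic, which is how the temperature avoids being hand-tuned.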
Finally, the parameters of the two target Q networks are updated as
where the soft update factor controls how quickly the target parameters track the online parameters.
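A minimal sketch of the soft (Polyak) target update, with parameters flattened into plain lists for illustration; in the paper's setup the soft update factor τ comes from Table 2, and the values below are only examples.

```python
def soft_update(target, online, tau):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Repeated small updates make the target slowly track a fixed online parameter.
target = [0.0]
for _ in range(2000):
    target = soft_update(target, [1.0], 0.01)
```

A small τ keeps the targets of the Bellman residual slowly varying, which is the stabilizing effect that motivates separate target networks.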
Steps (37a)–(40) constitute the agent's learning in one time step. After ET time steps over E episodes, the agent's performance saturates and the optimized phase shifts vector is obtained. Figure 3 shows the parameter updating process for the actor and critic networks, and the proposed SAC algorithm is summarized in Algorithm 2.
Algorithm 2: Proposed SAC Algorithm for Problem
4: Repeat:
7:   Repeat:
8:     Obtain the action, state, and reward by (32)–(34).
9:     Store the transition tuple in the replay buffer D.
10:    Randomly sample a mini-batch of N transition tuples from D.
11:    Update the parameters of the online Q networks by (37a), (37b).
12:    Update the parameters of the actor network by (38a), (38b).
13:    Update the temperature parameter by (39a), (39b).
14:    Update the parameters of the target Q networks by (40).
3.2.3. Proposed Deep Neural Network Design
Each NN contains an input layer, an output layer, and two hidden layers. Each hidden layer of the actor and critic networks contains the same number of neurons (256 in our simulations). The input and output layers of the actor network have as many neurons as the state size and L neurons, respectively, matching the state and action sizes. Since the input layer of the critic network additionally concatenates the action selected by the actor network, it contains as many neurons as the state and action sizes combined. Given the action and state, the critic network evaluates the action and outputs a corresponding Q value, so its output layer has a single neuron. ReLU activation functions are applied after the hidden layers of both the actor and critic networks. The output layer of the critic network uses a linear activation function, whereas the output layer of the actor network uses a Softmax activation function so that its outputs form a valid probability distribution over the discrete actions. All networks use the Adam optimizer for parameter updating.
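The actor architecture described above can be sketched as a forward pass in plain Python: input of the state size, two ReLU hidden layers of 256 neurons, and a Softmax output of size L. The state size of 6 and L = 8 in the usage lines, the uniform weight initialization, and the helper names are hypothetical; training with Adam is omitted, and a framework such as PyTorch would normally be used instead.

```python
import math
import random


def dense(x, w, b):
    """Fully connected layer: y_j = sum_i x_i * w[i][j] + b_j."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, t) for t in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def init_layer(n_in, n_out, rng):
    """Uniform init in [-1/sqrt(n_in), 1/sqrt(n_in)] (an illustrative choice)."""
    bound = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-bound, bound) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def make_actor(state_dim, num_outputs, hidden=256, seed=0):
    rng = random.Random(seed)
    dims = [state_dim, hidden, hidden, num_outputs]
    return [init_layer(dims[k], dims[k + 1], rng) for k in range(3)]


def actor_forward(layers, state):
    """Input -> ReLU hidden -> ReLU hidden -> Softmax output (action probabilities)."""
    h = state
    for w, b in layers[:-1]:
        h = relu(dense(h, w, b))
    w, b = layers[-1]
    return softmax(dense(h, w, b))


layers = make_actor(state_dim=6, num_outputs=8)
probs = actor_forward(layers, [0.1, -0.2, 0.3, 0.0, 0.5, -0.4])
```

A critic could reuse the same helpers with an input size of state size plus action size, no Softmax, and a single linear output neuron.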
3.3. Algorithm Development and Computational Complexity
The proposed two-step algorithm, presented in Algorithm 3, merges the solutions of the two subproblems. In particular, subproblem one is convex, which ensures convergence in each iterative sub-solution, while the convergence of subproblem two is ensured by tuning the hyper-parameters. Since each sub-solution outputs its optimal value, the sum SE is non-decreasing over the iterations m. Moreover, the objective SE is upper-bounded owing to the transmit power constraint and the interference level, so Algorithm 3 converges.
Algorithm 3: Proposed Two-step Algorithm for Problem
3: Repeat:
4:   Obtain the beamforming matrices W and U by Algorithm 1.
6:   Obtain the phase shifts vector by Algorithm 2.
9: Until: The value of sum SE converges.
Since the problem solved in Algorithm 1 contains only linear and SOC constraints, the computational complexity of Algorithm 1 is relatively low and polynomial in the problem size [48]. The complexity of Algorithm 2 is determined by the dimensions of the NN described above. Compared with the exhaustive search method, whose complexity is exponential in this combinatorial problem, the complexity of Algorithm 2 is significantly reduced.
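The exponential cost of exhaustive search can be made concrete: with b-bit phase shifts and L reflection elements, it enumerates 2^{bL} configurations (for instance, 2^{100} ≈ 1.3 × 10^{30} for a hypothetical L = 50 and b = 2). The toy below brute-forces a hypothetical single-user cascaded channel, not the paper's multi-user FD objective, to show the enumeration and its count.

```python
import cmath
import itertools
import math


def exhaustive_search(h_direct, h_cascade, bits):
    """Brute-force the b-bit phase vector maximizing |h_d + sum_l c_l e^{j*theta_l}|^2.

    h_cascade[l] is a hypothetical cascaded BS-RIS-user coefficient of element l.
    Returns (best_gain, best_phases, number_of_configurations_evaluated).
    """
    levels = [2.0 * math.pi * m / 2 ** bits for m in range(2 ** bits)]
    best_gain, best_phases, count = -1.0, None, 0
    for phases in itertools.product(levels, repeat=len(h_cascade)):
        count += 1
        y = h_direct + sum(c * cmath.exp(1j * t) for c, t in zip(h_cascade, phases))
        gain = abs(y) ** 2
        if gain > best_gain:
            best_gain, best_phases = gain, phases
    return best_gain, best_phases, count


gain, phases, n_evaluated = exhaustive_search(1.0 + 0j, [1.0, 1j, -1.0], bits=2)
```

Here all three reflected paths can be aligned with the direct path, so the best gain is |1 + 1 + 1 + 1|^2 = 16 after evaluating 4^3 = 64 configurations; each additional element multiplies the count by 2^b.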
4. Performance Evaluation
This section provides comprehensive numerical results to validate the effectiveness of our proposal. The two-step solution is implemented with MATLAB and Python: subproblem one, which optimizes W and U, is solved in MATLAB R2020b, and subproblem two, which optimizes the phase shifts vector, is solved in Python 3.8.
4.1. Simulation Setup and Parameters Setting
As shown in the simplified system model of Figure 4, we consider a three-dimensional coordinate system in which the BS antennas and the RIS are located at (0 m, 20 m, 0 m) and (x m, 40 m, 0 m), respectively. The IoT devices are uniformly distributed in a horizontal square region with a side length of 40 m centered at (100 m, 1 m, 0 m), where 1 m is the working height of each IoT device.
, we set the coordinates of the DL IoT devices as (90 m, 1 m, 10 m), (100 m, 1 m, −10 m), and (110 m, 1 m, 10 m), while UL IoT devices as (90 m, 1 m, −10 m), (100 m, 1 m, 10m), and (110 m, 1 m, −10 m). The QoS of each IoT device is considered 1 bps/Hz to guarantee all devices, especially the devices with relatively poor channel conditions, in normal communications. The channel path loss of large-scale fading is defined as
where
d represents the distance between two nodes. The value of −22 indicates that the path loss exponent is set at 2.2. Meanwhile, the value of −35.6 denotes the path loss at the reference distance of 1 m, which depends on the average channel attenuation and antenna characteristics [
49].
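Reading the stated constants as PL(d) = −35.6 − 22 log10(d) dB (our reading of the path-loss model from the exponent and the 1 m reference loss), the large-scale losses for the geometry of Figure 4 can be computed as below. The RIS x-coordinate of 50 m is a hypothetical choice for illustration.

```python
import math

BS = (0.0, 20.0, 0.0)  # BS antenna position from Section 4.1
DL_DEVICES = [(90.0, 1.0, 10.0), (100.0, 1.0, -10.0), (110.0, 1.0, 10.0)]
UL_DEVICES = [(90.0, 1.0, -10.0), (100.0, 1.0, 10.0), (110.0, 1.0, -10.0)]


def path_loss_db(d):
    """Large-scale path loss in dB: -35.6 dB at 1 m with path loss exponent 2.2."""
    return -35.6 - 22.0 * math.log10(d)


def ris_position(x):
    """RIS located at (x m, 40 m, 0 m)."""
    return (x, 40.0, 0.0)


ris = ris_position(50.0)  # hypothetical RIS x-coordinate
pl_bs_ris = path_loss_db(math.dist(BS, ris))
pl_ris_dl = [path_loss_db(math.dist(ris, u)) for u in DL_DEVICES]
pl_ris_ul = [path_loss_db(math.dist(ris, u)) for u in UL_DEVICES]
```

Sweeping `ris_position(x)` over x reproduces the kind of RIS-location study in Section 4.4, since moving the RIS trades BS–RIS loss against RIS–device loss.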
The small-scale fading is modeled by
where the LoS terms represent the deterministic LoS components of the BS–RIS and RIS–IoT device channels for UL/DL, respectively, and the NLoS terms denote the corresponding stochastic NLoS components [50]. The Rician factor is set to 10. We assume the AWGN
dBm and the equivalent AWGN
dBm,
dBm (Since we assume LI in scenario one is an approximated AWGN, we set the value of
−96 dBm at
m, according to the average distance between the RIS and the IoT devices, to reflect its subordinate influence compared with the UDI in this case). Considering the different channel characteristics and the up-to-date capability of SI cancellation, we set the Rician factor a and the power elimination level of the residual SI at the BS to 1 and −100 dB, respectively [17]. Other required simulation parameters are listed in the captions of the corresponding figures. Each hidden layer of our proposed SAC algorithm contains 256 neurons. The main DRL-related hyper-parameters are listed in Table 2.
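A sketch of Rician small-scale fading, assuming the common form h = sqrt(κ/(1+κ)) h_LoS + sqrt(1/(1+κ)) h_NLoS with i.i.d. CN(0, 1) NLoS entries and κ = 10 as set above; this form is an assumption rather than the paper's confirmed expression. The LoS response `ula_los` with a constant phase step is a hypothetical placeholder for the deterministic array response.

```python
import cmath
import math
import random


def rician_sample(h_los, kappa, rng):
    """One Rician draw per entry: sqrt(k/(1+k)) * LoS + sqrt(1/(1+k)) * CN(0, 1) NLoS."""
    a = math.sqrt(kappa / (1.0 + kappa))
    b = math.sqrt(1.0 / (1.0 + kappa))
    out = []
    for g in h_los:
        # CN(0, 1): independent real and imaginary parts, each with variance 1/2
        nlos = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        out.append(a * g + b * nlos)
    return out


def ula_los(n, phase_step):
    """Hypothetical unit-modulus LoS response with a constant phase progression."""
    return [cmath.exp(1j * phase_step * i) for i in range(n)]


rng = random.Random(1)
los = ula_los(4, 0.3)
draws = [rician_sample(los, 10.0, rng) for _ in range(5000)]
```

With unit-modulus LoS entries, each coefficient has unit average power, and a large κ such as 10 keeps the channel close to its deterministic LoS part.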
Furthermore, we introduce a fully exhaustive search method (i.e., an upper bound for discrete phase shifts) as a benchmark for solving subproblem two. However, because its complexity grows exponentially with the number of reflection elements, the exhaustive search approach is impractical in real-world applications. We additionally adopt a relatively low-complexity local fixed-point iteration method [51] as a baseline, whose complexity differs between scenarios one and two. Meanwhile, to highlight the gain from phase shifts optimization, we also include a random phase shifts method and take the case without RIS as a reference. Finally, the Riemannian manifold method with continuous phase shifts is introduced as an ideal case to reveal the effectiveness of the discrete phase shifts methods [37]. The simulation results are averaged over 300 channel realizations.
4.2. Optimizer Performance
AdaGrad, RMSprop, and Adam each have their own merits. However, empirical results show that Adam works well in practice and compares favorably with other stochastic optimization methods [52]. Therefore, we adopt the Adam optimizer in the SAC network and compare the performance of AdaGrad, RMSprop, and Adam within our framework. As shown in Figure 5, the learning process with Adam saturates faster than with RMSprop, whereas AdaGrad fails entirely in our framework.
4.3. Convergence of Algorithm 3
Figure 6 shows the convergence behavior of the proposed two-step algorithm for different parameter settings in both scenarios. Increasing the numbers of BS antennas and IoT devices with a fixed RIS configuration slows down convergence. For instance, the two smaller parameter settings converge within 10–15 iterations, while the largest setting requires nearly 20 iterations. Moreover, there is no evident difference in convergence between the two scenarios, which confirms that the proposed DRL method is only weakly affected by the scenario.
4.4. Impact of the RIS Location
Figure 7a illustrates the performance impact of the RIS location in scenario one. Evidently, deploying the RIS close to either the BS or the IoT devices improves the sum SE. With the RIS deployed on the BS side and on the IoT devices side, the proposed algorithm obtains performance gains of 135% and 62%, respectively, compared with an intermediate deployment. The reason is that the RIS can reconstruct the signal most effectively at these positions, thus significantly enhancing the desired signal and reducing interference by adjusting the transmission directions when the LoS link is severely blocked. As x increases from the BS side, the sum SE decreases, because the reflected signal weakens and the capability of suppressing the residual SI at the BS degrades. When the RIS moves away from the BS toward the IoT devices, the sum SE increases again, which implies that an RIS next to the IoT devices can further alleviate the UDI. In addition, the continuous phase shifts method outperforms the other algorithms owing to the infinite resolution of the RIS. Although the proposal performs slightly worse than the exhaustive search method, it outperforms the sub-optimal local fixed-point iteration method, because the fixed-point iteration method easily falls into local optima when the RIS has many reflection elements. Owing to its aimless signal reconstruction, the random phase shifts method obtains only a trivial benefit from the RIS and is slightly better than the case without RIS. For the same reason, the random phase shifts method is only slightly affected (2.84 bps/Hz) by the RIS location, and the case without RIS is not affected at all.
Figure 7b shows the impact of the RIS location in scenario two. Since LI on the IoT devices side is emphasized in this scenario, the benefit of deploying the RIS on the BS side is further highlighted. This is because we assume that the BS can completely eliminate the LI on the BS side, whereas the IoT devices bear the major performance impact of LI, given the trivial UDI. When x is less than 70 m, the curves follow the same trend as in Figure 7a, for the reason explained there. Beyond this range, the sum SE decreases as x increases. This is caused by the enhanced LI when the RIS comes too close to the IoT devices, so that the LI loss partially offsets the benefit of signal reconstruction.
Furthermore, the performance gap between the proposal and the fixed-point iteration method is larger in scenario two: at a given RIS location, the gap is 3.22 bps/Hz, compared with 1.54 bps/Hz at the same location in scenario one. This is related to the different objective functions with respect to the phase shifts in each scenario. Specifically, the objective in scenario two is more complicated than that in scenario one, which leads to a larger error in the fixed-point iteration method when it computes the update of each phase shift for the next iteration. In contrast, the proposed algorithm is model-free and does not depend on the concrete structure of the objective. Similarly to the fixed-point iteration method, the continuous phase shifts scheme based on the Riemannian manifold tends to deviate from the manifold when updating along the tangent space in scenario two. Thus, the gap between the continuous phase shifts method and the proposal is smaller in scenario two than in scenario one; for example, at a given RIS location the gap is 3.90 bps/Hz in Figure 7b, versus 4.33 bps/Hz in Figure 7a. Even under the heavy LI of scenario two, the proposed algorithm achieves relative gains of 1.03 bps/Hz and 0.68 bps/Hz over the continuous phase shifts and fixed-point iteration methods, respectively, with respect to scenario one. This suggests that the proposed algorithm mitigates LI more effectively than the other algorithms when switching between scenarios.
Notably, scenario two, in which LI dominates, is more likely to occur in smart cities owing to the high density of population and buildings. Thus, our proposal is better suited to practical applications in urban outdoor environments.
Combining Figure 7a,b, we conclude that the SE performance of the RIS-aided system mainly depends on the RIS location and the wireless environment of the IoT devices. In either case, the performance gain from signal reconstruction with an optimized RIS exceeds that of the non-RIS and random RIS methods. Since the performance differences between scenarios are mainly governed by these two factors, in the following subsections we only present numerical results for the other influencing factors at a fixed RIS location in scenario two. This setting concisely highlights our proposal through the handoff from scenario one to scenario two and the trade-off of interference mitigation between the residual SI and LI at that location.
4.5. Impact of the Number of RIS Reflection Elements
Figure 8 shows the influence of the number of RIS reflection elements. We observe that, as L increases, the sum SE increases and the gap between the continuous phase shifts method and the others widens. This result is expected: the more reflection elements there are, the larger the cumulative gain from the reconstructed signal. For large L, the fixed-point iteration and proposed algorithms are clearly weaker than the continuous phase shifts method. Relative to the continuous phase shifts method, the performance loss of the proposed algorithm rises from 19% to 29% as L grows, and that of the fixed-point iteration method rises from 42% to 51%. The reason is that our proposal reduces computational complexity at the cost of some performance when the action dimension is large. Moreover, the fixed-point iteration method easily becomes trapped in local optima when L is large, which is consistent with the conclusion drawn from Figure 7a.
4.6. Impact of the Number of Bits
In Figure 9, we evaluate the performance trend under different b. The performance improves as b increases, except for the resolution-independent methods, namely the continuous phase shifts, random phase shifts, and case without RIS methods. In particular, for the random phase shifts method, the gain of the RIS depends only on the number of reflection elements rather than on the resolution, because random phases on each element cannot reconstruct the signal well when the resulting transmission direction is arbitrary. Additionally, the performance of the exhaustive search method is approximately equal to that of the continuous phase shifts method when b is set to 6 bits, with a loss of only 0.12 bps/Hz. This shows that discrete phase shifts can closely approach continuous ones at a sufficient resolution, which supports the rationality of adopting discrete phase shifts in our proposal.
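Why 6 bits is already close to continuous can be seen from the quantization error alone: a uniform b-bit codebook leaves a phase error of at most π/2^b per element. The sketch below is a toy single-link model, not the paper's system model: it compares coherent combining over a hypothetical number of elements with ideal versus quantized phases. For uniformly distributed quantization error and many elements, the power ratio approaches the standard approximation (sin(π/2^b)/(π/2^b))^2, which is about 0.999 for b = 6.

```python
import numpy as np

def quantize_phase(theta, bits):
    """Map continuous phases (rad) to the nearest point of a uniform b-bit
    codebook {0, 2*pi/2**b, ..., 2*pi*(2**b - 1)/2**b}."""
    step = 2 * np.pi / 2**bits
    return np.mod(np.round(theta / step) * step, 2 * np.pi)

def coherent_power_ratio(bits, num_elements=64, seed=0):
    """Received power with quantized phases divided by that with ideal
    continuous phases, when all reflected paths should add coherently.
    num_elements is a hypothetical RIS size for illustration."""
    rng = np.random.default_rng(seed)
    ideal = rng.uniform(0, 2 * np.pi, num_elements)  # ideal phase per element
    err = quantize_phase(ideal, bits) - ideal        # residual phase error
    return np.abs(np.exp(1j * err).sum()) ** 2 / num_elements**2

def theoretical_ratio(bits):
    """Large-array approximation (sin(x)/x)^2 with x = pi / 2**bits."""
    x = np.pi / 2**bits
    return (np.sin(x) / x) ** 2

for bits in (1, 2, 3, 6):
    print(f"b = {bits}: simulated {coherent_power_ratio(bits):.4f}, "
          f"approx. {theoretical_ratio(bits):.4f}")
```

The sketch captures only the resolution loss of a single coherent link; it says nothing about interference, which is why the measured SE gap in Figure 9 is not expected to match these ratios directly.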
Nevertheless, the performance of the proposed algorithm decreases at 6 bits. Compared with the fixed-point iteration method, the proposal has an extra 7% gain deficit at 6 bits relative to a lower resolution. This is because a higher resolution causes the action space to grow exponentially, which reduces the learning efficiency. It suggests that our proposal should strike a trade-off between performance and resolution.
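The exponential growth of the action space can be made concrete by counting joint phase configurations: with b bits per element and L elements there are (2^b)^L = 2^(bL) candidate phase vectors, which is also the number of evaluations an exhaustive search requires. The sketch assumes a joint encoding in which one action selects a full phase vector and uses a hypothetical L = 16; the actual action design of the proposed algorithm may differ.

```python
import math

def joint_configurations(bits: int, num_elements: int) -> int:
    """Number of distinct RIS phase vectors: (2**bits) ** num_elements."""
    return (2 ** bits) ** num_elements

num_elements = 16  # hypothetical RIS size for illustration
for bits in range(1, 7):
    n = joint_configurations(bits, num_elements)
    # Each extra bit multiplies the space by 2**num_elements.
    print(f"b = {bits}: 2^{bits * num_elements} ≈ 10^{math.log10(n):.1f} configurations")
```

Going from 5 to 6 bits multiplies the space by 2^16 in this example, which illustrates why learning efficiency can drop at high resolution.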
4.7. Impact of the Residual Self-Interference
Figure 10 presents the sum SE trend under different residual SI levels. As expected, the performance degrades as the residual SI increases, and our proposal suppresses the residual SI better at low (−120 dB to −105 dB) and normal (−105 dB to −90 dB) levels. Specifically, the maximum loss of our proposal caused by increasing residual SI is 0.91 bps/Hz and 3.01 bps/Hz at the low and normal levels, respectively, whereas that of the case without RIS is 1.79 bps/Hz and 4.77 bps/Hz. For the ideal case, the corresponding loss of the continuous phase shifts method is 0.22 bps/Hz and 2.71 bps/Hz. This demonstrates that the proposed algorithm can practically reduce the FD performance loss caused by residual SI.
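The sensitivity to the residual SI level can be illustrated with a commonly used simplified model in which the residual SI power at the BS equals the BS transmit power scaled by the suppression level. The sketch below is a single-link toy with hypothetical powers (30 dBm BS power, −60 dBm received uplink signal, −80 dBm noise); it omits the RIS, LI, and UDI, and shows only why the uplink SINR collapses once the residual SI rises well above the noise floor.

```python
import math

def db_to_linear(db):
    """Convert dB (or dBm) to a linear ratio (or mW)."""
    return 10 ** (db / 10)

def uplink_se(si_db, p_bs_dbm=30.0, rx_signal_dbm=-60.0, noise_dbm=-80.0):
    """Toy FD uplink spectral efficiency (bps/Hz) at the BS.

    Assumption (not the paper's model): residual SI power = beta * P_BS,
    with beta = 10**(si_db / 10). All powers are hypothetical values.
    """
    p_bs = db_to_linear(p_bs_dbm)               # BS transmit power, mW
    signal = db_to_linear(rx_signal_dbm)        # received uplink power, mW
    noise = db_to_linear(noise_dbm)             # noise power, mW
    residual_si = db_to_linear(si_db) * p_bs    # residual SI power, mW
    return math.log2(1 + signal / (noise + residual_si))

for si_db in (-120, -105, -90):
    print(f"residual SI {si_db} dB: uplink SE = {uplink_se(si_db):.2f} bps/Hz")
```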
The sum SE drops distinctly when the residual SI level exceeds −90 dB. At a representative level in this regime, our proposal outperforms the fixed-point iteration and random phase shifts methods by 3.15 bps/Hz and 8.71 bps/Hz, respectively. Setting aside the benefit of RIS optimization itself, the comparison with the random phase shifts method suggests that the proposed algorithm also effectively restrains the LI to improve performance.
4.8. Impact of the Transmit Power
Figure 11a,b show the performance impact of the transmit power. In Figure 11a, as the BS power budget increases, the sum SE improves and saturates at 30 dBm. This is mainly due to the effects of residual SI and DL to DL interference. For instance, excess transmit power at the BS severely enhances its SI, thus degrading the performance. Accordingly, when the transmit power budget is oversaturated, the power actually allocated by the BS will be lower than the budget in order to reach the optimum sum SE. At the saturation point, the proposed algorithm outperforms the fixed-point iteration method by 3.22 bps/Hz, while it is only 1.98 bps/Hz lower than the exhaustive search method.
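The saturation and the below-budget power allocation can be reproduced qualitatively with a toy model: the DL SINR saturates because of DL to DL interference, while the UL SINR falls as the residual SI grows with the BS power, so the sum SE has an interior maximum. All quantities below are hypothetical values normalized to the noise power, there is no RIS, and the grid search merely stands in for whatever power allocation the paper uses.

```python
import math

def toy_sum_se(p_bs, ul_snr=100.0, dl_gain=1.0, dl_cross=0.01, si_level=0.01):
    """Toy sum SE (bps/Hz): one FD BS, two DL users, one UL device.

    Hypothetical normalized model (noise power = 1):
    - each DL user gets p_bs / 2 and suffers DL-to-DL interference
      from the other user's stream, so the DL SINR saturates;
    - the UL signal is degraded by residual SI equal to si_level * p_bs.
    """
    dl_sinr = (p_bs / 2) * dl_gain / (1 + (p_bs / 2) * dl_cross)
    ul_sinr = ul_snr / (1 + si_level * p_bs)
    return 2 * math.log2(1 + dl_sinr) + math.log2(1 + ul_sinr)

def best_power(budget, steps=10_000):
    """Grid search for the BS power in [0, budget] maximizing toy_sum_se."""
    grid = [budget * k / steps for k in range(steps + 1)]
    p_opt = max(grid, key=toy_sum_se)
    return p_opt, toy_sum_se(p_opt)

for budget in (10.0, 100.0, 1_000.0, 10_000.0):
    p_opt, se = best_power(budget)
    print(f"budget {budget:8.1f}: allocated {p_opt:8.1f}, sum SE {se:.2f} bps/Hz")
```

Once the budget exceeds the interior optimum, the allocated power and the sum SE stop changing, mirroring the saturation described for Figure 11a.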
We further compare the influence of the IoT device transmit power. The simulation result in Figure 11b shows that the performance starts to decline when the transmit power of IoT devices exceeds 20 dBm, except for the case without RIS. For example, the proposal and fixed-point iteration methods degrade by 1.16 bps/Hz and 1.47 bps/Hz, respectively, when p increases from 20 dBm to 25 dBm. We infer that the combined strong LI and UL to UL interference caused by the excess transmit power mainly determines the trend of these curves.
Because there is no LI in the case without RIS, its peak occurs at a higher IoT transmit power. However, if p keeps rising, the performance decreases because the BS has difficulty decoding signals mixed with intense UL to UL interference. This indicates that the IoT device transmit power in real-world applications should be upper-bounded. Moreover, we can draw another interesting conclusion: the optimum transmit power of IoT devices with RIS is lower than that without RIS, which illustrates that RIS-aided systems not only improve SE but also save energy. We also find that if the phase shifts are not well tuned in the intense LI situation, the random phase shifts method can even be 0.84 bps/Hz lower than the case without RIS. This underscores the importance of RIS optimization in the urban outdoor environment. Meanwhile, at a high IoT transmit power, our proposal outperforms the unoptimized RIS approach by 6.23 bps/Hz, which demonstrates that the proposed algorithm still performs well even with an excessively high transmit power.
To further illustrate the merit of the proposal in terms of complexity and power consumption, Table 3 lists the performance of the discrete phase shift methods at fixed BS and IoT device transmit powers.
It can be seen that the energy efficiency (EE) of the proposal is only 7.4% lower than that of the exhaustive search method, while the complexity is greatly reduced.
4.9. Impact of the Rician Factor
Finally, we evaluate the SE performance under different Rician factors in Figure 12. As the Rician factor increases, the sum SE decreases, especially for the optimized RIS cases. Moreover, as the Rician factor grows, the gain loss of the proposed algorithm is more evident than that of the fixed-point iteration method. For instance, over the considered range of Rician factors, the gap between the proposal and the case without RIS shrinks from 18.58 bps/Hz to 8.47 bps/Hz, whereas that of the fixed-point iteration method decreases from 13.49 bps/Hz to 5.25 bps/Hz under the same settings. The reason is that the RIS improves performance by reconstructing signals to increase multi-path diversity. A larger Rician factor is not conducive to multi-path propagation and causes severe co-channel interference due to the enhanced main path. Therefore, the gain brought by multi-path diversity decreases as the Rician factor increases, and an algorithm that derives greater profit from the diversity gain suffers a larger performance decline for the same increase in the Rician factor. We conclude that the RIS should be deployed in a relatively rich scattering environment to achieve optimum performance. Thus, our proposal is well suited to the urban outdoor environment with rich scattering.
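The Rician factor K sets the split between the deterministic LoS component and the scattered component of each link. The sketch below uses the standard unit-power Rician model h = sqrt(K/(K+1))·h_LoS + sqrt(1/(K+1))·h_NLoS, assuming for simplicity a fixed LoS term of unit modulus and zero phase. It checks empirically that the fluctuation of |h|^2, used here only as a rough proxy for the scattering richness that the RIS exploits, shrinks as K grows; the theoretical value is Var(|h|^2) = (1 + 2K)/(1 + K)^2.

```python
import numpy as np

def rician_channel(k_factor, size, rng):
    """Unit-power Rician samples: sqrt(K/(K+1)) * LoS + sqrt(1/(K+1)) * CN(0, 1).

    Assumption: the LoS component is a fixed unit-modulus term (phase 0).
    """
    los = np.ones(size, dtype=complex)
    nlos = (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)
    return np.sqrt(k_factor / (k_factor + 1)) * los + np.sqrt(1 / (k_factor + 1)) * nlos

rng = np.random.default_rng(0)
for k in (0.0, 1.0, 10.0):
    power = np.abs(rician_channel(k, 200_000, rng)) ** 2
    theory_var = (1 + 2 * k) / (1 + k) ** 2
    # Mean power stays close to 1; the fluctuation shrinks as K grows.
    print(f"K = {k:4.1f}: mean |h|^2 = {power.mean():.3f}, "
          f"var |h|^2 = {power.var():.3f} (theory {theory_var:.3f})")
```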