1. Introduction
Mine Internet of Things (MIoT) devices are widely applied in intelligent mines to improve safety and mineral production [
1]. In the mining industry, IoT networks play a crucial role in controlling mining equipment and gathering essential environmental data, including temperature, humidity, and wind speed, which are instrumental in safeguarding the personal safety of mine workers [
2,
3]. The accurate, reliable, and durable operation of MIoT devices is essential for the stable and long-term service of intelligent mines. MIoT devices must therefore provide high-speed transmission with low energy consumption. However, electromagnetic waves in MIoT environments often experience severe scattering, substantial interference, and Non-Line-of-Sight (NLoS) propagation, which calls for new transmission technology [
4].
Furthermore, despite significant advancements in wireless communication technology in recent decades, most MIoT networks, particularly those deployed in open pit mines, remain susceptible to physical layer threats. The open nature of wireless channels exposes these MIoT devices to vulnerabilities such as jamming and eavesdropping, highlighting the need for enhanced security measures [
5]. Malicious devices connected to the system can wiretap confidential information, leaking data such as mineral production schedules, the distribution of mineral resources, and safety aspects of the operations. An intruder can use the stolen data to commit fraud and extortion for illegal profit or to pose security threats, such as disrupting production or deliberately causing catastrophes. MIoT devices must therefore be able to withstand smart attacks, particularly active eavesdropping, in which the attacker eavesdrops and jams simultaneously to force the MIoT device to raise its transmit power and thereby intercept more data [
6].
As an emerging technology, intelligent reflecting surfaces (IRS) have attracted extensive research interest. The low cost of IRS makes them a highly suitable technology for wide adoption in MIoT communication. IRS contain metamaterial designed to reflect the incident waves from the source towards the destination [
7,
8]. With properly adjusted elements, IRS can construct an artificial Line-of-Sight (LoS) link and significantly improve transmission performance in NLoS scenarios. Moreover, the direct signal and the IRS-reflected signal can be made to combine destructively at the eavesdropper, effectively suppressing eavesdropping activity [
9]. In this paper, IRS establish a favorable propagation environment, increasing the access point (AP)’s received signal power and decreasing the eavesdropper’s received signal power, thus increasing the secrecy rate of the MIoT system in the presence of active eavesdropping.
Due to the complex and random time-varying channel characteristics in MIoT, acquiring the optimal transmission scheme using traditional techniques is typically not feasible [
10]. The wiretap policy is also challenging to estimate, making it harder to find the optimal secure transmission policy. Motivated by the advances in model-free deep reinforcement learning (DRL), we model the secure transmission procedure as a Markov Decision Process (MDP). The increasing computational capability of IoT devices, such as the Qualcomm Snapdragon 800 [
11], makes it possible to apply DRL techniques in practical mining IoT communication systems.
In this paper, we propose a secure transmission scheme that leverages IRS and the deep deterministic policy gradient (DDPG) algorithm to enhance the secrecy rate of the system in the presence of an active eavesdropper in a dynamic MIoT environment. In the proposed scheme, RL is used to adapt to the time-varying channel characteristics and choose the transmission policy without knowing the specific transmission model and attack model. The DDPG-based scheme selects policies in a continuous action space, avoiding discretization errors. This enables the MIoT device to jointly optimize the IRS phase shifts and its own transmit power in a mine environment. Strategically adjusting the IRS phase shifts and the MIoT device’s transmit power, and exploiting the reflected signals, helps enhance the effectiveness of legitimate transmission and ensure a safe mine environment.
According to our simulation results, the proposed DDPG-based IRS-assisted secure transmission (DIST) scheme achieves higher utility than the IRS randomly configured (IRC) scheme and the IRS-free (IF) scheme. By changing the number of IRS elements, we also assess the system utility of both the proposed DIST scheme and the IRC scheme. The main contributions of this paper can be outlined as follows:
We construct a joint optimization problem of the MIoT device’s transmit power and IRS reflecting beamforming to maximize the system’s utility. We present an IRS-assisted secure transmission scheme against active eavesdropping in MIoT.
A DRL-based intelligent beamforming and power control framework is presented to achieve the optimal IRS phase shifts and MIoT device’s transmit power. We formulate the control of the IRS elements as an MDP and employ the DDPG algorithm to achieve real-time and continuous phase control based on the dynamic MIoT environment.
Simulation results demonstrate that our proposed DRL-based IRS-assisted secure transmission scheme suppresses eavesdropping and enhances legitimate transmission compared with the IRC and IF schemes.
The subsequent sections of this paper are organized as follows:
Section 2 discusses the related works.
Section 3 introduces the proposed system model, channel model, and problem formulation. In
Section 4, we introduce our proposed DIST scheme.
Section 5 provides simulation results, and
Section 6 concludes the paper.
Notations: In this paper, we present matrices and vectors in boldface. $(\cdot)^{T}$ and $(\cdot)^{H}$ denote the transpose and conjugate transpose operations, respectively. $\mathrm{diag}(\cdot)$ denotes a diagonal matrix, and $j$ is the imaginary unit. $\mathbb{E}[\cdot]$ denotes the expectation operation. $|\cdot|$ is the absolute value of a scalar. $\mathbb{C}^{x \times y}$ denotes a complex-valued matrix with a size of $x \times y$.
2. Related Works
Numerous methods have been proposed to improve physical layer security (PLS) performance, including artificial noise (AN) [
12], physical layer authentication (PLA) [
13], and beamforming [
14]. However, these methods have limitations, such as extra power consumption for AN, computing resource requirements for PLA, and limited security guarantees for beamforming. For mining scenarios, the authors in [
15] investigate PLS in an underground mine environment using an amplify-and-forward relay-aided system with multiple eavesdroppers. The authors employ a block coordinate descent algorithm to design the precoding and jamming matrices at both the source and the relay; as with other traditional PLS techniques, this design acts at the transceivers rather than during the propagation process. Recently, the use of IRS has gained significant attention as a way to address PLS issues in the propagation process. Several studies have explored the use of IRS in secure communication systems [
16,
17,
18,
19]. A genetic algorithm (GA) is introduced in [
16] to optimize the phase shift of an IRS in a multiple-input multiple-output (MIMO) system, with the goal of improving security performance in the presence of an eavesdropper. To reduce the overhead of computing resources, a low-complexity algorithm is studied in [
17] based on fractional programming (FP) and manifold optimization (MO) to circumvent the nonconvex optimization problem and obtain near-optimal IRS phase shifts. However, the optimization techniques in both [
16,
17] rely on a specific transmission model and lack robustness. Moreover, more practical system models comprising multiple eavesdroppers and imperfect channel state information (CSI) are studied in [
18,
19]. The interuser interference (IUI) among each mobile user (MU) is studied in [
19]. Additionally, none of the existing IRS-assisted PLS approaches consider an active eavesdropper scenario where jamming attacks interfere with the legitimate transmission and raise the transmit power.
Artificial Intelligence (AI) has introduced a new way to solve PLS problems through RL. Recent studies in [
20,
21] have considered PLS problems concerning smart attackers conducting jamming, eavesdropping, and spoofing attacks. For instance, prospect theory (PT) in an unmanned aerial vehicle (UAV) transmission system is investigated in [
20], where the attacker is considered to be selfish and subjective. To enhance the secrecy performance and the utility of the legitimate UAV, a power allocation approach utilizing deep Q-networks (DQN) is put forth to determine the optimal policy, in cases where the attack and channel models are unknown. RL techniques are studied in [
21] to configure IRS beamforming design. The authors first establish the interaction between the base station (BS) and the smart attacker as a non-cooperative game and derive the Nash equilibrium of the game. Then, a DQN-based anti-smart-attacker strategy is proposed to make the BS and IRS intelligent and restrain the attack, thus improving the system’s security. However, since the study assumes a static channel, the proposed strategy may be less adaptable to varying channel conditions, despite its focus on the game-theoretic interaction between the BS and the attacker. To address these limitations, a novel DRL framework is proposed in [
22] to enable the prediction of IRS reflection matrices without the need for extensive channel estimation or beamforming training overhead. Additionally, an integrated DRL and extremum-seeking control (ESC) approach is studied in [
23] to control the IRS and make the system more adaptive to the dynamic channel state without subchannel CSI.
The implementation of IRS and RL in the mining industry is a relatively unexplored research area. Machine learning is applied in a mining system to remove the operator from hazardous environments without compromising task execution [
24]. Ref. [
25] is the first work implementing IRS in a coal mine. In this study, IRS are placed at the inflection points of the nonlinear routes (i.e., zigzag tunnels) to improve wireless communication quality. Although an approximation-based algorithm is utilized to address the optimization problem, the complex and dynamic nature of the channel state is ignored. Thus, the proposed method in [
25] may not be practical in most mining scenarios. Furthermore, neither [
24] nor [
25] uses RL or IRS to solve PLS problems and enhance secure transmission in a mine environment. The most closely related work is summarized in
Table 1.
3. System Model and Problem Formulation
3.1. System Model
We consider a single-input single-output (SISO) uplink system, as shown in
Figure 1, in which one MIoT device, equipped with a single antenna, communicates with a single-antenna AP. A single-antenna active eavesdropper attempts to intercept the transmission. The MIoT device collects data, such as temperature and gas density, and transmits the data to the AP, which is located
meters away. To establish a dependable communication environment, a passive IRS is deployed at a distance of
meters from the MIoT device, with
$N$ reflecting elements. All elements are configured through a wireless IRS controller that receives the control signal from the MIoT device. The IRS reflect the signal to enhance the transmission from the MIoT device to the AP and suppress the wiretap signal at the eavesdropper, thereby maximizing the secrecy rate. The data are then uploaded to a cloud server and used by remote control on the ground for digital management in mine IoT applications.
Upon receiving the control signal, the micro IRS controller sets the bias voltage to apply the phase shift on each IRS reflecting element. The phase-shift configuration can be modeled as $\boldsymbol{\Theta} = \mathrm{diag}(\beta_1 e^{j\theta_1}, \ldots, \beta_N e^{j\theta_N})$, where $\beta_n \in [0, 1]$ and $\theta_n \in [0, 2\pi)$ are the amplitude reflection coefficient and phase shift of the n-th IRS element, respectively. For simplicity, we set $\beta_n = 1$ for all N reflecting elements.
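As a rough illustration of the phase-shift configuration described above, the following Python sketch builds the diagonal entries of the IRS reflection matrix and applies them to an incident signal vector. The function names and the list-based representation are assumptions made for illustration only; they are not taken from the original implementation.

```python
import cmath
import math


def irs_reflection_coefficients(phase_shifts, amplitude=1.0):
    """Return the diagonal entries amplitude * exp(j * theta_n) of the IRS matrix.

    `amplitude=1.0` mirrors the unit-amplitude simplification in the text.
    """
    return [amplitude * cmath.exp(1j * (theta % (2 * math.pi)))
            for theta in phase_shifts]


def reflect(coeffs, incident):
    """Apply the diagonal reflection matrix to an incident signal vector."""
    return [c * x for c, x in zip(coeffs, incident)]


# Hypothetical three-element surface with phase shifts 0, pi/2, and pi.
coeffs = irs_reflection_coefficients([0.0, math.pi / 2, math.pi])
reflected = reflect(coeffs, [1.0, 1.0, 2.0])
```

Because the matrix is diagonal, storing only its diagonal is sufficient: each element rotates the phase of its own incident component and leaves its magnitude unchanged.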
The transmission policy in an IRS-assisted secure transmission system relies on precisely acquiring CSI. In our proposed system model, the legitimate channel state is obtained by the pilot-based channel estimation [
26]. We also assume the CSI of the wiretap channel to be perfectly known to the MIoT device. This is because the eavesdropper is considered an active user in the system but is not trusted by the legitimate receiver [
9].
3.2. Channel Model
The channels from the MIoT device to the AP, from the MIoT device to the eavesdropper, and from the jamming device to the AP are denoted by
,
, and
. These channels are all modeled as Rayleigh fading, which means that the Line-of-Sight path between the transmitter and receiver is blocked, and each can be expressed as [
27]:
where
is the path loss.
contains independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian entries with zero mean and unit variance,
.
The distance-dependent path loss
is modeled as
where
dB is the reference channel path loss for the reference distance
m,
is the path loss exponent, and
d is the distance from the transmitter to the receiver.
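The Rayleigh-faded direct links described above can be sketched as follows. This is a minimal sketch under stated assumptions: the log-distance form `L(d) = L0 - 10 * alpha * log10(d / d0)` is the usual reading of the path loss model in the text, and the default values (`-30` dB, `1` m, exponent `3.5`) are placeholders, not the values used in the paper.

```python
import math
import random


def path_loss_db(distance_m, ref_loss_db=-30.0, ref_distance_m=1.0, exponent=3.5):
    """Log-distance path loss (channel gain) in dB; the defaults are placeholders."""
    return ref_loss_db - 10.0 * exponent * math.log10(distance_m / ref_distance_m)


def rayleigh_channel(distance_m, rng, **path_loss_kwargs):
    """Draw one Rayleigh-faded coefficient: sqrt(linear gain) * CN(0, 1)."""
    gain = 10.0 ** (path_loss_db(distance_m, **path_loss_kwargs) / 10.0)
    # CN(0, 1): real and imaginary parts are independent N(0, 1/2).
    sigma = math.sqrt(0.5)
    small_scale = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return math.sqrt(gain) * small_scale


rng = random.Random(0)          # fixed seed for reproducibility
h_direct = rayleigh_channel(50.0, rng)
```

Averaged over many draws, the power `abs(h) ** 2` of such a coefficient approaches the linear path gain, which is a convenient sanity check for the scaling.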
The channels from the MIoT device to the IRS, from the IRS to the AP, from the IRS to the eavesdropper, and from the jamming device to the IRS are denoted by
,
,
, and
. These channels are all assumed to follow small-scale Rician fading, in which the LoS link coexists with NLoS links, and each channel can be expressed as [
7,
23]
where
K is the Rician factor, which denotes the ratio of the power of the LoS link to that of the NLoS link.
is the random component caused by the multipath effect, with i.i.d. and
distributed elements. The deterministic component
is position-dependent and can be expressed as [
28]
where the superscripts “A” and “D” stand for “Arrival” and “Departure”, respectively.
Without loss of generality, we place the IRS on the
plane. So, the component
in Equation (
4) can be expressed as
where
where
is the carrier wavelength, and
d is the distance between two adjacent IRS elements. Furthermore,
represents the azimuth angle and
represents the elevation angle. The LoS component
is solely dependent on
and
, meaning that once the locations of each unit are obtained,
is fully determined.
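The Rician model above can be sketched in Python as a weighted sum of a deterministic LoS vector and random scattering. Several details here are assumptions rather than the paper's definitions: the angle convention for the planar array on the y-z plane, half-wavelength element spacing, and the reduction of the arrival/departure product to a single array response (which holds when the other end of the link has one antenna).

```python
import cmath
import math
import random


def upa_response(n_y, n_z, azimuth, elevation, spacing_over_wavelength=0.5):
    """Array response of an n_y x n_z planar array lying on the y-z plane."""
    response = []
    for iy in range(n_y):
        for iz in range(n_z):
            phase = 2 * math.pi * spacing_over_wavelength * (
                iy * math.sin(azimuth) * math.cos(elevation)
                + iz * math.sin(elevation)
            )
            response.append(cmath.exp(1j * phase))
    return response


def rician_channel(los, k_factor, gain, rng):
    """Mix a deterministic LoS vector with CN(0, 1) scattering, scaled by sqrt(gain)."""
    w_los = math.sqrt(k_factor / (k_factor + 1.0))
    w_nlos = math.sqrt(1.0 / (k_factor + 1.0))
    sigma = math.sqrt(0.5)
    return [
        math.sqrt(gain)
        * (w_los * a + w_nlos * complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
        for a in los
    ]


# Hypothetical 2 x 3 surface; the angles and linear gain are illustrative values.
los = upa_response(2, 3, azimuth=0.3, elevation=0.2)
h_irs = rician_channel(los, k_factor=10.0, gain=1e-6, rng=random.Random(0))
```

A large K makes the channel nearly deterministic and position-dependent, while K close to zero recovers Rayleigh fading.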
For the proposed system model, the MIoT device sends a message m with zero mean and unit variance to the AP with transmission power p, where $p_{\min} \le p \le p_{\max}$, and $p_{\min}$ and $p_{\max}$ are the minimum and maximum values of the MIoT device’s transmit power, respectively.
3.3. Problem Formulation
The received signal
at the AP and
[
9] at the eavesdropper can be denoted as
where
denotes the complex additive white Gaussian noise (AWGN).
The active eavesdropper aims to wiretap more data by increasing the jamming power
. We assume that the active eavesdropper has no self-interference; that is, we ignore the LoS channel between the eavesdropper and the jamming device and only consider the IRS-reflected jamming signals. Thus, the received jamming signal
at the AP and
at the eavesdropper can be expressed as
Then, we can calculate the signal-to-interference-and-noise ratio (SINR)
at the AP and
at the eavesdropper [
29], and they can be expressed as
We evaluate the eavesdropping policy according to the AP’s received jamming power
, which can be denoted as
The achievable rates at AP
and eavesdropper
in bps/Hz can be denoted as [
6,
19]
Thus, the achievable secrecy rate
[
19,
30] can be denoted as
To achieve the maximum secrecy rate, there is a trade-off in configuring the IRS reflecting coefficient matrix. On the one hand, we align the phase of the reflected channel with that of the direct channel to strengthen the AP’s received signal and thus maximize the achievable rate at the AP. On the other hand, we set the phase of the reflected channel opposite to that of the direct channel to weaken the eavesdropper’s received signal and decrease the achievable rate at the eavesdropper.
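The chain from received signals to SINRs, achievable rates, and secrecy rate can be sketched as below. This is a sketch under assumptions, not the paper's exact equations: the dictionary keys are hypothetical names, any conjugation is assumed to be already folded into the channel entries, the jamming at the AP is assumed to arrive over both the direct and the IRS-reflected path, and the secrecy rate is taken as the non-negative difference of the two rates.

```python
import math


def cascaded(h_out, coeffs, h_in):
    """Effective IRS-reflected channel: sum_n h_out[n] * coeffs[n] * h_in[n]."""
    return sum(o * c * i for o, c, i in zip(h_out, coeffs, h_in))


def secrecy_rate(p, p_jam, coeffs, ch, noise):
    """Return (rate at AP, rate at eavesdropper, secrecy rate) in bps/Hz."""
    sig_ap = p * abs(ch["d_ap"] + cascaded(ch["irs_ap"], coeffs, ch["d_irs"])) ** 2
    sig_eve = p * abs(ch["d_eve"] + cascaded(ch["irs_eve"], coeffs, ch["d_irs"])) ** 2
    jam_ap = p_jam * abs(ch["j_ap"] + cascaded(ch["irs_ap"], coeffs, ch["j_irs"])) ** 2
    # Only IRS-reflected jamming reaches the eavesdropper (no self-interference).
    jam_eve = p_jam * abs(cascaded(ch["irs_eve"], coeffs, ch["j_irs"])) ** 2
    r_ap = math.log2(1.0 + sig_ap / (jam_ap + noise))
    r_eve = math.log2(1.0 + sig_eve / (jam_eve + noise))
    return r_ap, r_eve, max(r_ap - r_eve, 0.0)


# Toy single-element example with the reflected paths switched off.
toy = {"d_ap": 1.0, "d_eve": 0.5, "d_irs": [0.0], "irs_ap": [0.0],
       "irs_eve": [0.0], "j_ap": 0.0, "j_irs": [0.0]}
r_ap, r_eve, r_sec = secrecy_rate(p=3.0, p_jam=1.0, coeffs=[1.0], ch=toy, noise=1.0)
```

In the toy case the AP rate is `log2(1 + 3) = 2` bps/Hz, and the secrecy rate is whatever remains after subtracting the eavesdropper's rate.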
Then, the MIoT system’s utility function [
6] is defined as follows:
where
,
; weights
and
denote the coefficients. The coefficients
and
represent the weight of the achievable secrecy rate and the transmit power, which are set for balancing the influence factors of the utility function.
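A weighted utility of this kind can be sketched as a linear trade-off between the secrecy rate and the transmit power. The linear form, the choice of watts as the power unit, and the function and parameter names are assumptions for illustration; the default weights 1 and 500 are the values reported in the simulation setup.

```python
def utility(secrecy_rate_bps_hz, transmit_power_w, w_rate=1.0, w_power=500.0):
    """Weighted utility: reward secrecy rate, penalize transmit power (assumed linear)."""
    return w_rate * secrecy_rate_bps_hz - w_power * transmit_power_w


# The same secrecy rate is worth less when it costs more power.
u_low_power = utility(3.0, 0.002)    # 2 mW
u_high_power = utility(3.0, 0.009)   # 9 mW
```

With these weights, every additional milliwatt costs 0.5 utility units, so a power increase only pays off if it raises the secrecy rate by more than 0.5 bps/Hz per milliwatt.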
We aim to optimize the IRS phase shifts
and the MIoT device’s transmit power
p to maximize the utility. The following formulation represents the optimization problem:
However, it is difficult to solve the formulated problem, as its objective function is nonconvex with respect to either the IRS phase shifts or p. Additionally, the complex time-varying channel fading makes it infeasible to obtain an optimal solution for long-term system utility using traditional optimization techniques.
5. Simulation Setup and Results
In this section, we comprehensively illustrate the performance of our proposed DIST scheme under the presence of an active eavesdropper in mining scenarios. The system topology and coordinate of each unit are shown in
Figure 3. The red, blue, and black lines represent the eavesdropping channel, jamming channel, and legitimate transmission channel, respectively. In real-world mining operations, the positions of devices may vary. Changing positions may affect the absolute performance values, but they will not change the advantage of the proposed DIST scheme over the benchmarks. Simulations are implemented using PyTorch 1.13.1 with Python 3.9. The numbers of MIoT devices, jamming devices, and active eavesdroppers are each set to 1, and each is equipped with a single antenna. The MIoT device observes and estimates the CSI at each time slot. The IRS is composed of a total of
[
6] reflecting elements, specifically
elements aligned parallel to the y-axis and
elements aligned parallel to the z-axis. The background noise power
is set to
dBm [
27]. The MIoT device is specifically configured to operate within a transmit power range, with a minimum power
setting of 1 mW and a maximum power
setting of 9 mW. The jamming power is randomly generated between 1 mW and 5 mW. The Rician factors
,
,
, and
are assumed to be equal and set to 10 [
33].
and
[
34] are the path loss exponents of the LoS link and NLoS link, respectively.
The learning model in the proposed DIST framework consists of a three-layer deep neural network (DNN). The hidden layer contains 32 neurons. The actor and critic learning rates are set to
and
, respectively. Moreover, the discount factor is determined to be
, whereas the soft update parameter is configured to be
. We set the max buffer size to 10,240 and the batch size to 16. Moreover, we set the time slot number in each episode to
and the episode number to
. The parameters
and
in Equation (
18) are set to 1 and 500, respectively, to balance the secrecy rate gain and power consumption loss. The parameter settings above were determined empirically through repeated experiments.
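Two of the DDPG ingredients configured above, the replay buffer and the soft target update, can be sketched in plain Python. The buffer capacity (10,240) and batch size (16) come from the text; the class and function names, the transition layout, and the value of `tau` in the usage lines are hypothetical, and the sketch operates on plain lists of weights rather than on PyTorch modules.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity=10_240, seed=None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self._storage.append((state, action, reward, next_state))

    def sample(self, batch_size=16):
        """Draw a uniform random mini-batch without replacement."""
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


buffer = ReplayBuffer(seed=0)
for step in range(32):
    buffer.push(state=step, action=0.0, reward=0.0, next_state=step + 1)
batch = buffer.sample()                                             # 16 transitions
target_weights = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.005)     # tau is a placeholder
```

A small `tau` makes the target networks trail the online networks slowly, which is the usual way DDPG stabilizes its bootstrapped critic targets.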
Two benchmark schemes are considered, shown as follows:
IRS randomly configured (IRC): In this case, the reflection coefficients of each IRS element are generated randomly. We only use the DDPG algorithm to optimize the transmit power [
35].
IRS-free (IF): We consider a classical communication system in MIoT without introducing the IRS. In this case, the MIoT device only chooses the transmit power based on the DDPG algorithm [
36].
Figure 4 provides a comprehensive evaluation of the system utility across all schemes. Our proposed DIST scheme converges after 400 episodes and raises the utility from −1.8 to 1.3. Specifically, at episode 600, our proposed DIST scheme achieves 5.5 and 4.25 times higher utility than the IF and IRC schemes, respectively. This demonstrates the remarkable utility gain from applying the IRS in MIoT wireless communication. It also emphasizes the significance of applying the RL mechanism to solve the IRS beamforming design problem in a secure transmission scenario.
Figure 5 investigates the
,
,
, and
p of all schemes. For the secrecy rate shown in
Figure 5a, our proposed DIST scheme outperforms the IF scheme and the IRC scheme by 70.6% and 141.7%, respectively. We next examine the detailed performance. In particular, in
Figure 5d, in our proposed DIST scheme, we observe that the eavesdropping rate increases from
bps/Hz to
bps/Hz from episode 80 to 160, and then falls to
bps/Hz. The reason is that the MIoT device explores the environment and chooses the policy that aims to maximize the utility. In this process, the eavesdropping rate may rise slightly, but in
Figure 5a,c, the signal transmission rate at AP and the secrecy rate are still rising. Several factors contribute to the continuous rise in system utility in this interval. Among these are the factors mentioned above and the declining transmit power, as shown in
Figure 5b.
Additionally, in
Figure 5c, the signal transmission rate at the AP degrades slightly from
bps/Hz to
bps/Hz after 190 episodes. The reason is that the MIoT device’s transmit power converges more slowly than the IRS phase shifts. After 200 episodes, the transmit power is still declining. According to Equations (
12) and (
15), lower transmit power leads to a lower AP signal transmission rate once the reflecting coefficients have converged to the optimal value.
As shown in
Figure 6, we investigate the performance of our proposed DIST scheme and the IRC scheme by varying the number of IRS elements. The significant improvement of our proposed scheme demonstrated in
Figure 6 results from more IRS elements bringing more reflected signals. When the IRS are well adjusted, the reflected signals can be combined constructively at the AP to provide higher signal strength and deliberately manipulated at the eavesdropper to attenuate its received signal power, thereby diminishing its ability to intercept the transmission.
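The effect of well-adjusted phase shifts at the AP can be illustrated with a small co-phasing sketch. It covers only the constructive side of the argument (the AP), not the suppression at the eavesdropper, and it uses a closed-form alignment rule rather than the learned DDPG policy; all names and channel draws are hypothetical.

```python
import cmath
import math
import random


def aligned_phases(direct, h_in, h_out):
    """Phase shifts that co-phase every reflected path with the direct path."""
    target = cmath.phase(direct)
    return [target - cmath.phase(o * i) for i, o in zip(h_in, h_out)]


def combined_gain(direct, h_in, h_out, phases):
    """Power gain of the direct path plus all IRS-reflected paths."""
    reflected = sum(o * cmath.exp(1j * t) * i for i, o, t in zip(h_in, h_out, phases))
    return abs(direct + reflected) ** 2


rng = random.Random(0)


def draw():
    """One illustrative complex Gaussian channel coefficient."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


n = 8
direct = draw()
h_in = [draw() for _ in range(n)]
h_out = [draw() for _ in range(n)]
gain_aligned = combined_gain(direct, h_in, h_out, aligned_phases(direct, h_in, h_out))
gain_random = combined_gain(direct, h_in, h_out,
                            [rng.uniform(0.0, 2 * math.pi) for _ in range(n)])
```

With aligned phases, all path magnitudes add coherently, so the gain equals the square of the sum of magnitudes; random phases can never exceed that value, and adding elements widens the gap.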
Moreover, the system utility of the IRC scheme decreases slightly as the number of IRS elements increases. This is because, when the IRS is not properly adjusted, the reflected signals with random phases add constructively or destructively, generating a stronger or weaker signal. Thus, the more IRS elements used, the larger the range of the SINR. According to Equations (
15) and (
16), the average
and
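will decrease because the rate function $\log_2(1 + x)$ is concave, so its slope flattens at high SINR: as the SINR range widens, the losses at low SINR outweigh the gains at high SINR, eventually resulting in performance degradation.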
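The argument above can be checked numerically with a toy example: two sets of SINR values with the same mean but different spreads yield different average rates. The SINR values below are made up for illustration, and the rate is assumed to be `log2(1 + SINR)`.

```python
import math


def mean_rate(sinr_values):
    """Average achievable rate in bps/Hz over a set of linear SINR values."""
    return sum(math.log2(1.0 + s) for s in sinr_values) / len(sinr_values)


narrow = [9.0, 11.0]    # mean SINR 10, small spread
wide = [1.0, 19.0]      # mean SINR 10, large spread

rate_narrow = mean_rate(narrow)
rate_wide = mean_rate(wide)
rate_at_mean = math.log2(1.0 + 10.0)
```

Both averages fall below the rate at the mean SINR, and the wider spread falls further, which is the behavior expected from a concave rate function.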