1. Introduction
A wireless sensor network (WSN) enables the automation of industrial sites through the collection, exchange, and analysis of data between interconnected sensor devices. One of the promising WSN applications is a real-time monitoring system in which sensor nodes collect data and forward them to a sink node through a multi-hop transmission. For the freshness of data, the monitoring system has a strong demand for real-time communication; hence, it is important to design a low-latency and highly reliable routing protocol. Specifically, a routing protocol for monitoring applications has the following requirements:
(a) High reliability: Considering the limited transmission range of sensors, sampled data should be forwarded to the destination through a multi-hop transmission after self-configuration of a network. However, if data are frequently dropped owing to poor link quality between sensors, the service requirements, especially for mission-critical applications, cannot be satisfied.
(b) Low latency: The data sampled from sensors should be delivered to the destination quickly; otherwise, it will not be possible to respond to emergencies. In particular, stale data are fatal for patients with serious illness in a health monitoring system. Hence, it is necessary to build the shortest path or reduce the queuing delay via load balancing to forward the data to the destination in a timely manner.
(c) Energy efficiency: Sensors for monitoring applications are typically installed in a wild field where battery replacement is difficult. If the energy of a specific node is quickly exhausted, it leads to a decrease in the network lifetime. Thus, it is necessary to reduce the additional energy consumption owing to network maintenance or to balance energy consumption between sensors.
Conventional ad hoc routing protocols [
1,
2] can ensure reliable transmission and satisfy service requirements of source nodes in WSNs; however, they involve a high computational cost (i.e., control-message overhead) for sensors to maintain routing paths. To reduce the energy consumption from the control overhead, tree-based routing can be a feasible solution in WSNs. The parent selection is a key role of a tree-based routing protocol because the links between parent and child nodes are used as routing paths. Conventional tree routing [
3] uses the number of hops to the sink as a decision criterion for parent selection. Each source node can build the shortest path to the sink by choosing a node with the least number of hops as a parent node. This approach can provide a low latency; however, it cannot ensure reliable transmission because the link quality between the parent node and the child node is not considered in the parent selection process.
To build a stable tree topology, some studies [
4,
5] adopt the link quality as a decision criterion. However, they do not consider the load balancing during the parent selection; hence, an excessive load can be placed on a specific node, resulting in significant packet loss. To prevent congestion, some studies [
6,
7,
8] propose a load-aware parent selection algorithm; however, they do not jointly consider link quality and energy efficiency. Although energy-aware parent selection schemes are proposed [
9,
10,
11,
12,
13], they also cannot jointly achieve the above service requirements. These tree-based routing protocols address diverse parent selection problems independently; hence, they cannot jointly improve multiple performance metrics, such as the transmission reliability, end-to-end delay, and energy consumption.
To address this problem, some studies [
14,
15,
16,
17,
18] attempt to select the best parent node taking into account multiple cognitive metrics. To jointly achieve multiple objectives, they employ a linear weighted-sum method in which each node calculates the weighted cost by integrating multiple metrics. However, the linear weighted-sum method adopts subjective weights, which are often less objective and reduce their acceptance within the scientific community. As a result, they cannot provide a flexible trade-off between performance metrics. To overcome this limitation, in our previous work [
19], we proposed a multi-criteria decision making (MCDM)-based parent selection scheme. The proposed scheme finds the best parent node by determining the relative importance of multiple decision criteria using the analytical hierarchy process (AHP) [
20] and then derives the weighted cost using the simple additive weighting (SAW) method [
21]. However, the decision maker needs to calculate the weights of each metric based on its preference or background knowledge of the target network. In addition, it is challenging to find an optimal parent node in a large-scale WSN because the scale of WSNs is growing and the deployment is more complex.
To eliminate the bias of a decision maker and make adaptive decisions in a complex deployment scenario, in this work, we extend our previous work with the specific goal of achieving multiple objectives, such as high reliability, low latency, and energy efficiency. To find the best parent node, we use empirical data about the target network obtained through Q-learning. Specifically, we define a state space, action set, and reward function using multiple cognitive metrics, and then find the best parent node through trial and error.
The main contributions of this work are summarized as follows:
To jointly achieve multiple objectives in WSNs, in this work, we revisit the classic problem of finding an optimal parent node in the tree-based routing. Specifically, we propose an enhanced tree-based routing protocol based on reinforcement learning (RL);
To cope with various network scenarios, e.g., link breakage and congestion, we propose multiple cognitive metrics. Specifically, in addition to hop count, three types of cognitive metrics are formulated: weighted average of received signal strength, buffer occupancy ratio, and power consumption ratio. These metrics affect the decision rules in tree routing;
We present the system model for applying RL algorithm in WSNs and specify the basic operations of the proposed tree-based routing protocol. Specifically, we specify concrete algorithms for loop detection and parent update, as well as tree construction using build and hello messages;
To make adaptive decisions in a complex deployment scenario, we define a state space, action set, and reward function. The agent recognizes the network state with the proposed cognitive metrics and finds the best parent node with the highest reward through trial and error;
Through a comparative study using diverse simulations, we verify that the proposed parent selection scheme supports a reasonable trade-off between the performance metrics, i.e., end-to-end delay, packet delivery ratio, and energy consumption.
2. Related Work
The routing protocol has an important role in a WSN, ensuring a reliable transmission and satisfying the QoS of individual nodes. The node batteries of a network in the wild are difficult to replace; hence, the routing protocols should create a routing path at low computational cost. In reactive routing protocols, e.g., ad hoc on-demand distance vector (AODV) routing [
1], a source node sends a route request message whenever it needs a routing path to the destination node. To maintain a reliable routing path, the node periodically broadcasts control messages and checks the status of the links with its neighboring nodes. Reactive routing can guarantee a reliable transmission in a network in which the nodes frequently move, such as mobile ad hoc networks; however, the control overhead is significantly increased to maintain the routing paths.
To reduce the number of control messages, proactive routing protocols, e.g., optimized link state routing protocol [
2], select nodes with high connectivity between neighboring nodes as the parent nodes. To maintain the topology information, the parent nodes are obliged to broadcast control messages periodically. This table-driven routing can minimize the number of control messages because only the parent nodes broadcast the topology information; however, the broadcast overhead and memory usage are increased when there is a large amount of node information to be transmitted. Moreover, in a network in which the nodes frequently move, the control overhead significantly increases because topology information must be frequently broadcast. To reduce the energy consumption from the broadcast overhead, tree-based routing is the best solution for a small-scale WSN. A tree-based routing protocol consists of one root node and multiple sensor nodes, which have parent–child relationships and are interconnected based on a tree structure. Each node chooses the optimal parent node and forwards the data to the parent until it reaches the root node. Tree-based routing can eliminate a path search and avoid extensive broadcast messages. The communication links between parent and child nodes are used as routing paths, and parent selection is a key aspect of the tree-based routing protocol.
For example, conventional tree-based routing protocols [
22,
23,
24] select as the parent node the neighbor node with the lowest number of hops to the sink node. By using the hop count as a decision metric, they can minimize the end-to-end delay; however, the PDR is reduced because frequent node movements degrade the link quality between the child and parent nodes. To build a stable tree structure, the authors of [
4] propose a route stability framework in which each node calculates an overall score for each neighboring node. The scores are weighted based on the link quality and relative distance with the neighboring node. The neighbor node with the best overall score is selected as a parent node. In [
5], each node chooses the neighboring node with the best link quality metric as a parent node. The link quality metric is calculated using cognitive metrics, i.e., the link packet delivery and link stability. However, this approach does not consider the load balancing during the parent selection process. In other words, an excessive load may be placed on a particular node; thus, congestion occurs at certain nodes, resulting in significant packet loss. In addition, the network lifetime is shortened due to unbalanced energy consumption between nodes.
To prevent congestion, the authors of [
6] propose a congestion-aware tree routing protocol. With this scheme, each node evaluates the link cost of each neighboring node, and the node with the best link cost is selected. A weighted-sum equation is used to determine the link cost, and the average queue size and remaining energy of the nodes are considered in the proposed formula. The authors of [
7] propose a load-aware dynamic cost function to weight the links between the parent and child nodes. The cost of the links is updated based on the congestion metric. If a node is congested, the value of congestion metric will increase. In this case, it is not preferable for neighboring nodes to select this link as a parent node. To improve the network lifetime, the authors of [
9] propose an energy-aware parent node selection algorithm. With this scheme, each node builds a hierarchical structure based on the hop count to the sink node. If the energy level of a node is lower than half of the original battery capacity, the node is moved one level lower in the tree model. By lowering the hierarchical level, a node with a low battery level cannot be selected as a parent node. The authors of [
11] construct a binary tree based on the number of hops to the sink node. To balance the energy consumption, the parent node chooses the link with the smaller number of packets it receives from the child on the left or right. In [
10,
12], each node calculates the link cost using the distance vector and remaining energy. In [
13], each node selects a parent node from the neighboring nodes with predefined cognitive parameters, which is called a fitness function using the flower pollination algorithm. The fitness function considers the energy consumption and distance ratios of the average distance.
Several tree-based routing protocols take multiple cognitive metrics into consideration concurrently and aim to jointly achieve multiple objectives by considering several decision criteria. For example, the authors of [
14] take three cognitive metrics, i.e., the channel error rate, residual energy, and buffer capacity, and integrate the proposed metrics using heuristic coefficients. These coefficients denote the significance of each metric, and the sum of the coefficients is equal to 1. Each node then selects the neighboring node with the highest weighted sum as a parent node. Similarly, the authors of [
15] take the link quality, residual energy, and relay node frequency and integrate these metrics using the weighted-sum method. The authors of [
16] integrate multiple cognitive metrics, namely, the distance, number of associated nodes, and residual energy using the weighted-sum method. The authors of [
17] integrate three cognitive metrics, such as the hop count, residual energy, and link quality using the heuristic coefficients. The authors of [
18] consider four attributes, i.e., the link expiration time, trip time, node speed, and lifetime, and integrate them using an “if-else” statement. However, these approaches adopt subjective weights, which are often less objective and reduce their acceptance by the scientific community. To provide a flexible trade-off between the conflicting objectives, in our previous work [
19], we propose a MCDM-based parent selection scheme in which each node finds the best parent node by determining the relative importance of multiple decision criteria using the AHP and then derives the weighted cost using the SAW method.
5. Performance Evaluation
5.1. Simulation Setup
In this work, we simulate the proposed scheme using the OPNET modeler version 18.7. The simulation parameters are given in
Table 1. We conducted the simulation 100 times with a 95% confidence interval. Specifically, we run the simulations for 3600 s and set the network size to 5000 m × 5000 m. To build a large-scale network, we set the number of sensor nodes to 100 and the number of sink nodes to 5. To build a tree topology, the sink node periodically broadcasts a build message every 20 s. After receiving the build message, each node updates the number of hops to the sink node in the neighbor table and start to choose a parent node. In addition, all nodes periodically send a hello message to its neighbors every 5 s. We set the sizes of the build and hello messages to 192 and 128 bits, respectively. To verify that the proposed scheme provides a reasonable trade-off between multiple objectives by adaptively changing the parent node according to changes in network conditions, we observe the performance metrics by varying the traffic bit rate and bit error rate. To simulate congestion, we change the traffic bit rate from 1000 to 5000 bits/s. In addition, we change the bit error rate from
to
in order to make the link condition dynamic.
We compare the performance of the proposed scheme with the linear weighted sum-based parent selection algorithm [
17] and the MCMD-based parent selection algorithm. The linear weighted sum-based scheme considers the number of hops, buffer occupancy ratio, link quality, and residual energy as decision metrics. Here, the weight of each metric is fixed at the same ratio. On the other hand, the MCDM-based parent selection scheme logically determines the relative importance between the decision metrics based on the preferences or background knowledges of the decision maker. Specifically, in our previous work, we (i.e., the decision maker) set the relative importance of each metric based on the following rules. When the network is unstable, we increase the weight of the received signal strength, leading each node to choose a parent node with good link conditions. If the link conditions of the candidate nodes are not significantly different, each node changes another node with a low buffer occupancy and high residual energy to a parent node.
5.2. Reliability
The main factors affecting the packet delivery ratio are buffer overflows caused by congestion and bit errors according to the link conditions. As shown in
Figure 5, all schemes cause packet loss owing to congestion as the traffic bit rate increases. The linear weighted sum-based scheme considers link quality in the decision-making process. However, since the weights among the multiple metrics are equal, the parent node is rarely changed even if a buffer overflow occurs. On the other hand, since the MCMD-based scheme has a higher weight of the buffer occupancy ratio, packet loss owing to congestion is less than that of the linear weighted sum-based scheme. In addition, it adaptively selects a node with good link quality as the bit error rate increases, thus showing a better packet forwarding rate.
However, in the MCDM-based scheme, the weights between metrics cannot be changed at runtime; thus, it has a clear limitation in that it is difficult to respond to changes in network scenarios. In contrast, the proposed scheme shows the best performance because it learns how to cope with changes in network conditions at runtime through trial and error.
5.3. Latency
The main factors affecting the end-to-end delay in tree routing are the number of hops to the sink node and the queuing delay. The traditional tree routing chooses a parent node based on the number of hops to the sink node. This shows the best performance when the traffic bit rate is low. However, as the traffic bit rate increases, congestion occurs at a node with a small number of hops to the sink node, resulting in a sharp increase in end-to-end delay. To solve this problem, the linear weighted sum-based scheme considers the congestion metric with the number of hops in the decision-making process. As shown in
Figure 6, since the weights between the metrics are the same in the linear weighted sum-based scheme, load distribution is not effectively performed as the traffic bit rate increases. This increases the queuing delay and thus it shows the worst performance. In addition, when the bit error rate increases, a node having a better link quality rather than having a lower number of hops can be selected as a parent node; hence, the end-to-end delay is slightly increased. In the MCMD-based scheme, load balancing is performed properly because the decision maker pre-determines the weight of the congestion metric by considering the network environment. However, when the bit error rate increases, a node with a better link quality rather than with a lower number of hops is selected as a parent node; hence, the end-to-end delay is slightly increased.
As described above, in the MCMD-based scheme, the decision maker pre-determines the weight of each metric based on prior knowledge of the network environment. Thus, it performs well for pre-configured network scenarios, but cannot cope with changes in network conditions that occur at runtime. In contrast, in the proposed scheme, each node learns the optimal action according to the network environment through Q-learning. The proposed scheme shows the best performance because it can adaptively cope with changes in network conditions.
5.4. Energy Efficiency
The main factor that increases average power consumption in tree routing is frame retransmissions owing to packet loss caused by congestion and link breakage. In addition to energy efficiency, in tree routing, it is necessary to balance the energy consumption between sensor nodes to ensure the diversity of routing paths.
The traditional tree routing selects a parent node based on the number of hops to the sink node; thus, the packet loss significantly occurs owing to congestion as the traffic bit rate increases. As a result, additional energy consumption increases owing to frame retransmissions. In addition, when the link quality is bad, the number of retransmissions also increases. To address this problem, the linear weighted sum-based scheme considers the link quality and buffer occupancy ratio together in the decision-making process, but it is difficult to respond to changes in network conditions because the weights between multiple metrics are not properly adjusted. As a result, as shown in
Figure 7, the energy consumption per unit time is the highest, and the number of dead nodes is the highest.
The MCMD-based scheme shows better performance than the linear weighted sum-based scheme because the decision maker knows the network environment in advance and determines appropriate weights accordingly. However, as previously analyzed, it is not possible to adjust the weights of metrics at runtime. By contrast, the proposed scheme recognizes the network environment through various cognitive metrics and finds the best policy to optimize the performance metrics through trial and error. As a result, the proposed scheme shows better performance than the MCDM-based parent selection scheme.
5.5. Discussion
As the traditional tree routing selects a parent node based on the number of hops to the destination, it cannot cope with various problems, such as link breakage and congestion. To address this problem, the linear weighted sum-based scheme jointly considers multiple cognitive metrics, such as the hop count, link quality, buffer capacity, and residual energy in the decision-making process. However, in the process of deriving the weighted sum, the weights between the multiple metrics are determined to be the same; thus, it cannot cope with changes in network conditions.
To solve this problem, the MCDM-based scheme proposed in our previous study aims to respond to changes in network environment by logically determining the weights between multiple metrics. However, as the decision maker pre-determines the weights of each metric based on prior knowledge, there is a clear limitation in that the weights cannot be changed at runtime. To overcome this limitation, the proposed scheme recognizes the network environment using multiple cognitive metrics and finds the optimal policy to jointly optimize the performance metrics through Q-learning. In addition, we use WMA to prevent sudden changes in cognitive metrics. As a result, the proposed scheme provides a flexible trade-off between conflicting performance metrics by adaptively changing the parent node according to changes in network conditions.