1. Introduction
Wearable sensor systems consist of devices with one or many sensor nodes. Each sensor node is usually equipped with a low-speed microprocessor, limited memory, and a radio transceiver and receiver [
1,
2,
3]. In real world, these wearable sensor systems usually form a wireless sensor network to transfer collected data. With the development of technologies, wearable sensor systems are used extensively in civilian and military fields, such as monitoring the environment and habitats, tracking objects, transferring data within a tactical unit in battlefield, detecting fires, and controlling traffic. Since the data generated in thus system is very large and dense, it is difficult to determine how to collect data in different environments. In many cases, people have only focused on a part of observed objects (for instance, the maximum or minimum values). Therefore, how to process a query and collect needed data in wearable sensor systems has become the subject of extensive and in-depth research and learning [
4,
5,
6,
7,
8,
9,
10]. In general, the capabilities of communication, computing power, storage capacities, and battery lifetime of wearable devices are very limited. So, power consumption is still a major issue in wearable sensor systems and wireless sensor networks [
11,
12,
13,
14,
15,
16].
Unlike the traditional sense of sensor systems, wearable sensor systems usually form networks with dynamic topology and generate data over a large range. Clearly, the existing data-window filter and tree-based routing path methods can’t achieve good performance in these networks. First, tree-based routing algorithms, such as the directed diffusion algorithm, use sensor node’s level or distance from base station as the metric to construct data transmission path. Although the shortest path tree can be achieved, a relatively fixed path still leads to imbalances in energy consumption and is not suitable for networks with dynamic topology. Second, window-based algorithms also result in additional communication overhead when the filter is updating, especially when the measured data of sensor nodes change frequently over a large range. This is just the general circumstance in wearable sensor systems.
Regarding these issues, we introduce a novel top-
k query paradigm. Unlike directed diffusion scheme, the proposed method utilizes the ‘happened-before’ relation of received messages, which is a concept used in distributed system and means one event happening before another [
17], to determine which paths should be used, and constructs, on demand, a dynamic routing tree that considers the remaining energy, processing load, and the time drift of sensor nodes to avoid overconsumption of energy. This is specially designed for dynamic topology of wearable sensor systems. In order to reduce in-network traffic, each sensor node executes a data aggregation algorithm to merge the data measured by itself and transmitted from its children. Then, the base station extracts the top-
k result after collecting all the messages. The data aggregation algorithm is a distributed sorting and query scheme that restricts the data-merging process inside sensor nodes and makes it possible for every node to send only one message during data-reporting procedure. Though wearable sensor systems are the considered network environment, the proposed methods could also be used in other wireless sensor networks.
We implement the proposed methods and algorithms over data link layer directly in the network simulation environment and test the performance from various aspects. We also compare the proposed method with filter and tree-based top-k monitoring methods with respect to the lifetime of network and the balance of energy consumption. The obtained results indicate that the proposed method produces a better balance of energy consumption and longer lifetime of network when there are large changes in the range of data.
The remainder of this paper is organized as follows:
Section 2 discusses the related work on data aggregation and top-
k query methods in wireless sensors networks.
Section 3 presents our data aggregation and top-
k query method in details. Then, we propose an approach for constructing routing tree. In
Section 4, we evaluate the proposed method through extensive of simulations. Our conclusions and recommendations for further work are presented in
Section 5.
2. Related Work
Generally, most of the energy is consumed by communication between nodes in wireless sensor networks, such as wearable sensor systems [
11,
14,
18]. Therefore, minimizing the amount of traffic in processing queries can improve the efficiency of search and save a significant amount of energy, thereby prolonging the life of networks. Obviously, energy can be saved if the amount of data requested during a query could be minimized. Such query is called top-
k query [
5], which is considered as a kind of data aggregation methods. Top-
k query is designed to determine the
k highest (or lowest) observed values and is a very important task in data acquisition.
The pioneering work related to top-
k query is deemed to be TAG (Tiny AGgregation) [
6], despite the facts that the concept of top-
k is not mentioned distinctly in it and the data grouping method proposed by TAG is different from top-
k. TAG uses a tree-based routing scheme to construct its data aggregation data path, which is similar to directed diffusion [
4]. In TAG, an in-network aggregation technique called ‘grouping’ was presented to reduce network traffic. When a query is pushing down through an established routing tree, an expression over one or more attributes is brought to every sensor node. According to the expression, the reading of each sensor is placed into one group, the groups are partitioned, and the values are aggregated in the appropriate groups when answers to the query are flowing back. The result is that the base station can collect aggregated data directly, and the data that each sensor node sends are reduced. However, data collecting and aggregated data controlling in TAG depend on time epoch, which covers all the levels of the tree. So, the depth of TAG tree is limited. Otherwise, the time epoch of one query round would be too long. In addition, sensor nodes near the tree’s root (usually base station) will consume more energy than other nodes [
19], which leads to an imbalance in energy consumption.
Basing on TAG routing tree, Wu et al. proposed a data query method called the Filter-Based Monitoring Approach (FILA) [
5]. Differing from the way of grouping data in TAG, FILA introduced a data filter to each sensor node to decrease unnecessary updates. The basic assumption is that readings of all sensor nodes will fall into a certain range, and, in most cases, the reading of a sensor node will change only in a small interval. Thus, the range of values could be divided into several intervals by a carefully planned schema, namely windows. For the nodes whose readings belong to top-
k result set, non-overlapping, distinct windows should be assigned to each of them. Conversely, other nodes could share the same windows [
5]. During the initial phase of data collection, the base station queries all readings from sensor nodes and sorts them to determine the top-
k result set. Then, the base station calculates a filter window for each sensor node and sends it all nodes. At the following data report, only the nodes whose readings change beyond the filtering window should transmit their data to the base station. Then, the base station will re-calculate the filter settings. Apparently, the mechanism of filtering window will reduce the amount of data transmitted markedly when the readings of most sensor nodes in top-
k result set do not go beyond their windows. Otherwise, sensor nodes will consume additional energy to update their filters.
Myungho Yeo et al. proposed a priority-based approach called PRIM [
20]. Their basic idea was self-sorting by the value of sensor readings for collecting readings in an orderly manner. That is, sensor nodes must determine their sequence without any centralized controls. So, the authors proposed a data-aware, priority-assignment algorithm (DPA), which took advantage of the time slots in TDMA frame. In PRIM, a routing tree is established using hierarchical clustering and routing algorithms, such as LEACH [
21] and HEED [
22]. Based on similar thought, the authors also introduced a sequence-aware, top-
k (SAT) monitoring scheme that also makes sensor nodes determine their order for the data-gathering phase [
23].
Hang et al. studied filter-based, top-
k queries and proposed three improvements, i.e., distributed top-
k queries, setting filter values, and predicting the available interval for each node by ARMA [
24]. EXTOK [
3] is another top-
k query solution, and it considers both tree topology and filtering-based query. Unlike other top-
k query methods, EXTOK addresses the case in which the exact set of top-
k values must be retrieved, regardless of how many nodes reported it. Zhang et al. [
25] studied multi-dimensional, top-
k query and used data-prediction method to establish the bi-boundary filter rule to refine data. Silberstein et al. [
26] proposed to use samples of past sensor node readings to formulate the problem of optimizing approximate top-
k queries under an energy constraint as a linear program. Threshold Join Algorithm (TJA) [
27] is an efficient, top-
k query algorithm that uses a non-uniform threshold on queried reading to minimize the redundancy of data. Babcock et al. studied the distributed, top-
k monitoring problem with a user-specified error tolerance [
28]. In their approach, first, an initial top-
k set was computed. Then, the monitored nodes were installed with arithmetic constraints, which were used to ensure the continuing accuracy of the initial top-
k set to within the user-supplied error tolerance. When a constraint is violated, it is re-computed by the coordinator (root node). Obviously, it is just an earlier version of filter-based, top-
k query method.
Inevitably, top-k query must be based on the route of wireless sensor network. So, another key point is to determine how to construct an efficient route that will satisfy energy conservation demand of top-k query.
TAG uses a tree-based routing approach. The base station first broadcasts a message to build the tree of path. Nodes that receive this message will become the children of sender and will forward the message to their neighbors. This process is repeated until every node has received the message. After message broadcast, a tree of path will be rooted at the base station. Normally, it is a shortest-path-tree. As mentioned above, in such a structure, nodes close to tree’s root will consume more energy and shorten network’s lifetime. Further, in most cases when a tree is constructed, it will have the same structure as the preceding trees.
Directed diffusion (DD) is a data-centric dissemination protocol, i.e., all communication in DD is for named data. This makes it possible to save energy by selecting good paths [
4]. The schematic for DD has three steps, i.e., (a) interest propagation; (b) setting up of the initial gradients, and (c) delivery of data along a reinforced path. These steps indicate that it is possible to deliver the named data through high-quality paths, but this requires an initial query flooded to explore paths [
29]. MADD [
29] applies DD in a multi-hop environment and uses the gradient of DD to dispatch the mobile agent (MA) [
30]. The performance of MADD in the proposed gradient-based routing scheme was better than DD in terms of energy consumption.
Sensor Protocols for Information via Negotiation (SPIN) [
31] is a family of data dissemination protocols for wireless sensor networks, including SPIN-PP, SPIN-BC, SPIN-RL, and SPIN-EC. The protocols use meta-data negotiation to eliminate the transmission of redundant data throughout a network. These negotiations ensure that nodes transmit data only when it is necessary, so they never waste energy on useless transmissions. Also, the protocols use resource-adaptation to distribute data efficiently when energy supply is limited. Because nodes are resource-aware, they reduce their activities when their resources are low to increase their longevity. SPIN-PP is used in networks for point-to-point communication media, and SPIN-BC is used in broadcast communication media. SPIN-EC and SPIN-RL are modified versions of the first two protocols.
The Low Energy Adaptive Clustering Hierarchy (LEACH) [
21] is representative of the clustering structure, which partitions the sensor nodes into several clusters according to their positions in network. During operation, LEACH reconstructs the clusters in a circle. Each process of reconstructing the clusters is described as a ‘round’, and each round begins with a set-up phase during which the clusters are organized, and this is followed by a steady-state phase in which the data are transferred from the nodes to the head of cluster and to the base station. The cluster-building process has four stages, i.e., selecting the head of cluster, broadcasting the selection result establishing the cluster, and generating the scheduling mechanism. After clustering, member nodes select and join a suitable cluster head. In the steady-state phase, each member node sends the data it senses to its cluster head. The cluster head collects and aggregates the incoming data from its member node and sends them to the base station. Apparently, LEACH can be used to generate the path of top-
k query. It can also be combined with data aggregation algorithm to reduce the amount of data to be transmitted.
Except the above common methods, there are some new ones in literature. Mo et al. combined network clustering with top-
k query and introduced cluster-based routing for top-
k querying (CRTQ) [
19]. Ref. [
32] proposed a tree structure named Partial Ordered Tree (POT) to maintain the clusters with highest readings. In POT, each node chooses a node with highest reading as its parent, and the highest node in a local area will act as the root of POT of this area. Then, a network is partitioned into many POTs, each of which has a root node. Essentially, POT is a routing tree construction method based on network clustering. It builds routing tree dynamically. The weakness of POT is that when a network has uniform readings, its performance declines evidently.
EXTOK [
3] employed DST (Dominating Set Tree) [
33] to construct routing tree and disseminate the query to all nodes. Differing from dissemination by flooding in SPT (Shortest Path Tree), the idea of DST is to ensure that ‘only the smallest possible subset of non-leaf nodes transmit the query’. DST connects nodes named dominating nodes [
34] to a tree rooted at base station. Obviously, once a DST is constructed, the number of disseminated queries will reduce evidently. But finding dominating nodes is an NP-hard problem, though ref. [
34] proposed a distributed method. Due to the complexity of DST construction, frequent reconstruction should be avoided. This results in a fixed routing tree used by EXTOK.
DAS (Data Aggregation Scheduling) [
35] introduced a data aggregation scheduling based on maximal independent sets. It consists of two phases: constructing a distributed aggregation tree and performing distributed aggregation scheduling. In the first phase, connected dominating set is employed, which is also the basis of DST. So, DAS faces the same problems as DST.
Many researchers have shown that the method of route used in top-k query is closely related to the performance of query [
3,
5,
19,
23,
24]. The same applies to the method of collecting data [
7,
15,
35].
3. Local Data Aggregation Basing on Routing Tree
3.1. Problem
Consider a wireless sensor network composed of densely deployed wearable sensor devices, each of which monitors surrounding environment and measures physical phenomena (e.g., humidity, temperature, and residual energy) at a fixed rate determined by data gatherer. Suppose there is also a base station set up for serving as the bridge between the network and data gatherer and energy is supplied continuously to the base station. But, the resources (e.g., energy, communication band) of wearable devices are strictly limited. We then focus on how sensor nodes in the network respond to a top-k query of base station in an energy-efficient way.
Let
be the sensor node set and
be the corresponding reading value set of all nodes. For convenience, here we discuss maximum problem only. The task of top-
k query can be expressed in the form,
As discussed in
Section 2, the aim of top-
k query is to reduce energy consumption used in data transmission. However, it is challenging to determine how to find
k highest readings from all sensor nodes distributed in a network. In addition to the filter-based method, distributed local data aggregation is a feasible idea. So, the issue is to make a way to achieve this kind of aggregation. In order to reach this goal, we propose a top-
query approach basing on routing tree, which includes two parts, local data aggregation and routing tree construction, as presented below.
3.2. Local Data Aggregation
Suppose
is a sub-network of sensor network
. We call
the kernel node of
if
where
is the sink node or a node in another sub-network,
is the physical distance between node
and node
, and
is the distance threshold of communication between two sensor nodes. From another perspective, if a sub-network
satisfies Equation (2),
will be considered as the kernel node of
, and all other nodes in
, named member node, can communicate with it. For example, in
Figure 1a,b, Node A is a kernel node with 4 member nodes. In
Figure 1b Node C is also a kernel node with 3 member nodes. If Node A and Node B are considered a sub-network, Node B is the kernel and Node A is the member node.
Kernel node is charged with collecting all readings from its member nodes and itself and getting
defined by Equation (1), and then send
to the sink node or a node in another sub-network, as shown in
Figure 1b. We name this way Local Data Aggregation (LDA), in which kernel node employs a variant merge sort algorithm to obtain top
k readings. The basic merge sort algorithm is as illustrated in
Figure 2a.
Figure 2b shows an example of LDA. In the example, Node A is the kernel node of a sub-network illustrated by red dashes circle. When Node A obtains all readings of the sub-network, it will calculate local top
k by merge sort algorithm. And then Node A sends these
k readings to its parent Node B which is the kernel node of another sub-network. Comparing with the whole network,
has relatively lesser nodes obviously. In addition, a kernel node, such as Node C in
Figure 2b, will receive no more than
k readings from any of other kernel nodes. So, less data need to be sorted. Because kernel node uploads at most
sorted readings rather than all that it received, the traffic in network and sorting computations can be significantly reduced, e.g., Node C in
Figure 2b. Further, unlike uploading message individually, up to
k readings can be uploaded within one message, which means that every node must send only one message to the kernel node it connects to in a query round. It is important to balance the energy consumption of all sensor nodes in a network.
The complexities of space and calculation of LDA algorithm can be reduced further when messages are processed one by one as opposed to processing them all at the same time. In the worst case, k comparisons are made to aggregate a new reading. Because the stored list of values is in an ordered state at all times, the worst case is rare.
Obviously, the aggregation process is similar to the last merge step of the merge-sort algorithm [
36]. So, its time complexity and space complexity are all O(
k). If a kernel node has
m child nodes, the time complexity will be O(
mk) in the worst case performance, and the space complexity will still be O(
k). LDA has the following characteristics, (1) distribute the sorting tasks into multiple nodes, which is beneficial for the balance of energy consumption; (2) restrict the maximal number of data in a message to
k; (3) simplify the calculation of each node.
In most applications of wireless sensor networks, the parameter k is small, and kernel node has only a few child nodes. So, the values of the readings can be aggregated quickly and efficiently through the LDA approach proposed in this section. Further, local aggregation of LDA allows most data to be sent only once and installs multi-data into one message, which would cause a considerable reduction of energy for sending data.
3.3. Top-k Query Based on a Tree
As shown in
Figure 3, it is assumed that all of the sensor nodes in a network communicate with the base station through a routing tree. The construction of routing tree will be discussed in the next section. After the routing tree constructed, each node records its children nodes and the kernel node it connects to. Note that each sensor node must connect to one and only one kernel node. The number of children nodes in a sub-network depends on the topology of the network, and, if a sensor node has no children, it is considered as leaf-node. Correspondingly, a node that has children is kernel node.
Initially, the base station broadcasts a query message into the network, and all of the sensor nodes must respond to the base station by their local readings. Every leaf-node, i.e., the border node, embeds its readings into a message and uploads the message to the kernel node it connects to in the routing tree.
The responsibilities of kernel node are slightly more complex than other node. When receiving a message from a child , first, the kernel node stores the message and then deletes from the list of its children. If the list of its children is not empty, the kernel node must store the message in memory and wait until another message is received. If the list is empty, which means all of the children have uploaded their readings, the kernel node begins to aggregate all of the messages that have been received by using LDA. However, if a kernel node waits for a time beyond a preset threshold, it also begins to aggregate the messages. Note that the aggregation process could be conducted along with the arrival of the message. We call this Routing Tree based Local Data Aggregate process LDA-RT.
3.4. Routing Tree Construction
To avoid unbalanced energy consumption resulting from fixed route, we introduce a method to construct dynamic route on demand, which is composed of two procedures, i.e., Interest Forwarding and Path Establishment. The two procedures are both distributed processes that occur at every sensor node. Therefore, they are inseparably intertwined from the view of the entire network.
Assume that each of the sensor nodes in a network has a unique, assigned ID that is used to distinguish them from each other. At the beginning of a query task, first, the base station injects an interest message into the network. For the purpose of constructing routing tree, the base station uses limited power to broadcast the interest message. So, only the sensor nodes, that coverage the base station, can receive the interest message. The message is a query or an interrogation, which specifies what a user wants.
The message is a quintuple composed of qid, sid, lts, rpw and k. Among them, qid is the unique identifier of this interest message, which is generated by base station and does not change in the current query round. Term sid represents the ID of the node that sends this message. At the moment of message generating, base station ID is put into sid. Term lts is the local timestamp of the message sender, and rpw the remaining energy, which is useless when the message sender is base station. Term k is the number of readings expected to retrieve in this query round.
Consider a sensor node . When receiving an interest message M, goes into the Interest Forwarding procedure and decides whether to forward the message to its neighbors according to M.qid. If a received message is with the qid that has been received and stored in cache, the message will not be re-sent. Otherwise, it should be sent to all neighbors of after updated, i.e., .ID ⇒ M.sid. The rule indicates that a sensor node must forward the interest message only once in a query round.
In addition to disposing of interest message, the sensor node
begins to run the procedure of
Path Establishment. In most routing solutions of wireless sensor network, hops between nodes are used as the metric of path length. And meanwhile, the node will be chosen as the next point in the shortest path if the message it forwarded arrives first. This will result in the same path’s being constructed at all times, especially when there are only a few sensor nodes in a network. Assuming that all sensor nodes in a network can keep time synchronization [
37], the timestamp of the message can be used as the basis for selecting the next point. It reflects the current load of a sensor node, and a node with a light load should be chosen. But in a practical sense, each sensor node may stay in different status at a given time, i.e., different local time, different time drift, and different amount of remaining energy. All of these will affect the synchronization of time and sending messages. For utilizing the timestamp to balance energy assumption, we use the idea of ‘happened-before’ to select the node that should appear in the routing path.
The relationship of ‘happened-before’ with the set of events of a system (denoted: →) must satisfy three conditions [
17]:
- (1)
If a and b are events in the same process and a comes before b, then a→b.
- (2)
If a is the event of sending a message by one process and b is the event of receiving the same message by another process, then a→b.
- (3)
If a→b and b→c, then a→c. Two distinct events a and b are said to be concurrent events if a↛b and b↛a.
Making use of the ‘happened-before’ principle, we propose the following method for constructing the routing path. The basic idea is to choose a node with more residual energy and shorter path to base station as the kernel node. Here the shorter path is not in ‘hop’ sense but in ‘time’ sense, which would carry more information favoring the balance of energy consumption, such as the current load status of nodes.
Consider that node has a local time .lts generated by its local clock. When receiving an interest message , node first checks whether a piece of interest message with the same qid as M.qid has been stored in the interest message cache. If there is no matching interest message in the cache, node should write message to its cache and choose the node denoted by M.sid as its kernel node. It also should inform the kernel node after making the choice. Then, it updates its local time .lts by . In this case, node should forward message M with updated fields, i.e., .ID ⇒ M.sid, .lts ⇒ M.lts and .rpw ⇒ M.rpw.
However, if a matching interest message M’ exists in the interest cache, the new received interest message M will not be broadcast to other sensor nodes. But, the sensor node should also compare the message time M.lts with M’.lts. If M’.lts > M.lts, or M’.lts ≤ M.lts and M’.rpw < M.rpw, sensor node changes its kernel node to the sensor node with M.sid and notes the change to both its previous kernel node and its new kernel. It should also update its local time .lts according to the rule .
After an interest message is forwarded to the entire network, a routing tree can be built. Note the following properties associated with the procedure of building routing tree:
In a query round, each sensor node in the network forwards the interest message only once, no matter how many time it receives the message.
During the process of constructing routing tree, the kernel node that a sensor node connects to can be updated. But the change is limited to the node itself and does not affect its children nodes.
Each sensor node forwards interest message immediately after receiving it. This benefits the fast contribution of the routing tree.
The interest messages that are forwarded by different nodes are allowed to arrive at a sensor node asynchronously. And the ‘happened-before’ mechanism makes it possible to build a dynamic routing path, which is an approximate shortest path and is helpful for accommodating dynamic network topology, balancing the energy consumption and synchronizing the time clocks of the sensor nodes.
Ideally, after some query rounds, all nodes will have similar local time. And then, remaining energy of nodes will become the deciding factor of routing path.
In general, reconstructing routing tree will consume more energy. But for a sensor network, interest messages should be sent to each node inevitably. Even in filter-based top- query approach, the configuration about new window size still need to be sent, whenever the measured data exceed the range of the given window. On the other hand, without reconstruction, fixed routing would make some node loss their effectiveness early. The proposed method focuses on build a dynamic routing tree for data collection. And the above properties would reduce the energy consumption of routing tree construction as far as possible.