1. Introduction
As the perceptual layer of the Internet of Things (IoT) [
1,
2], wireless sensor networks (WSNs) [
3] are widely deployed for purposes such as environment monitoring [
4], industry automation [
5] and military reconnaissance [
6]. WSNs consist of many sensors and play a key role in sensing and gathering data from the surrounding environment. Because of harsh environments and energy-limited nodes, there are two key considerations in WSNs design: reliability and energy efficiency. In addition, nodes that are closer to the sink require more forwarding tasks than others, resulting in higher energy consumption as well as a reduction in the lifetime of the entire network.
Compressive sensing (CS) theory [
7,
8] provides a new method for reducing communication energy consumption. CS points out that, for the compressible signals in WSNs, a small collection of linear projections is sufficient to achieve near-perfect reconstruction, which reduces energy consumption and prolongs network lifetime. Thus, a considerable amount of research has been conducted concerning ways to utilize CS to gather data in WSNs. The CS-based data-gathering schemes in [
9,
10,
11] obtained the member node readings utilizing fixed routing, in which ordinary nodes forward compressed data to the static sink node through multi-hops. Lou et al. [
9] and Lou et al. [
10] combined CS and routing protocols to reduce the number of transmissions. In [
11,
12,
13,
14,
15], the use of sparse measurement matrix is investigated to reduce the number of nodes involved in data gathering. Introducing CS effectively reduces the energy required for communication and distributes energy consumption loads more evenly. However, if a parent node (which holds a combination of child node readings) loses its packet, then all the information from the child nodes is also lost. Hence, unreliable links have a serious impact on data gathering and make it difficult to reliable gather data reliably through a centralized sink node. Additionally, Kong et al. [
16] reported that unreliable links are widespread in WSNs, where the average packet loss rate is 40–50%. Thus, assuming completely reliable links is unfeasible and oversimplifies the problem.
To resolve this problem, distributed data storage (DDS) [
17,
18,
19] is proposed to enable reliable data gathering by employing redundancy. In contrast to a centralized sink, a mobile sink collects data from a small subset of the total nodes to recover all the data. It is worth mentioning that DDS effectively reduces the impact of packet loss on data gathering because there is no static routing, although few researchers have focused on this advantage. However, DDS requires a large number of transmission tasks to ensure sufficient redundancy, which is potentially catastrophic for nodes with energy limitations. Thus, it is imperative to investigate effective ways to apply DDS for data gathering with the dual purposes of resisting packet loss and reducing the number of transmissions.
To address this problem, many studies have been carried out on this topic. In [
20,
21,
22], CS is combined with DDS to exploit the advantages of both technologies. The goal of Talari et al. [
20] was to reduce the number of transmissions by exploiting the spatial correlations of nodes based on CS with the broadcast properties of wireless channels. In this scheme, the nodes store received data and broadcast the data with a given probability. The performance of data reconstruction was further improved in [
21]. Yang et al. [
21] found that the number of receptions was higher than the number of transmissions. Hence, Yang et al. [
21] focused on reducing the total number of both transmissions and receptions simultaneously. In [
22], both the spatial and temporal correlations of nodes are exploited to reduce the number of transmissions. All the above studies take advantage of broadcast routing and consider how to reduce the transmission energy cost. However, compared with fixed routing, such as tree routing and cluster routing, broadcasting data consumes more reception energy because neighboring nodes receive broadcast data whether they need it or not. For example, in [
20,
21,
22], the neighboring nodes first receive the broadcasting data and then determine whether to merge the data based on certain conditions. Consequently, broadcasting data consumes large amount of reception energy, although the received data are rarely merged. Furthermore, none of these studies consider the problem of packet loss; instead, they make the unrealistic assumption that the wireless links are completely reliable.
Tackling the abovementioned consideration, two challenges must be resolved. The first involves how to effectively reduce the quantity of data disseminated (transmissions and receptions), especially the number of receptions rather than the number of fusions. The second problem is related to reducing the impact of lossy links (namely, the packet loss rate) on data reconstruction. To solve these two challenges, a distributed data storage and gathering algorithm based on compressive sensing (CS-DDSG) is proposed utilizing CS and DDS. Relying on collected data, the mobile sink generates a sparse measurement matrix aimed at reducing communication energy consumption. Furthermore, it is proven that the measurement matrix satisfies the restricted isometry property (RIP) [
23]. Based on random geometric graph theory, an expression of the total number of transmissions and receptions is formulated to analyze the energy consumption of CS-DDSG.
The reminder of this paper is organized as follows. In
Section 2, we commence by reviewing the CS theory and introduce the network model. In
Section 3, we present the proposed CS-DDSG algorithm, describe the formulation of the measurement matrix and provide a proof that this matrix can satisfy RIP. Based on the proposed scheme, we formulate the expression of the total number of transmissions and receptions in
Section 4. We present our simulations and their results and investigate the performance of CS-DDSC in
Section 5. Finally, concluding remarks are provided in
Section 6.
4. Formulating the Expression of the Total Number of Transmissions and Receptions
Compared with the mainstream algorithms [
15,
20,
21], the proposed scheme CS-DDSG reduces the number of transmissions and receptions rather than the number of fusions. In this section, we formulate the total number of transmissions
and receptions
based on the random geometric graph (RGG) mode [
26] and the torus convention [
27] to investigate the efficiency in reducing
and
.
According to
Section 3,
and
can be expressed as follows:
where
and
denote the number of transmitting and reception nodes in Stage 2, respectively.
and
denote the number of transmitting and receiving nodes in Stage 3, respectively. Similarly,
and
represent the number of transmitting and receiving nodes in the
forwarding of Stage 3, respectively.
denotes the number of forwarding iterations. In Stage 2,
nodes are selected to broadcast, thus
. Because the receiving nodes in Stage 3 forward their packet with the probability
,
and
satisfy the following:
When , no node forwards packets and the forwarding process is completed. Thus, , . Additionally, and . Next, we formulate the expression of , and .
4.1. Formulating
Proposition 1. The number of receptions in Stage 2 is: Proof. According to the procedures of Stage 2,
equals the number of neighboring nodes around all the source nodes
minus the number of nodes
located in the overlapping communication region of the two sources nodes. This relation occurs because each node receives just one packet and the number of receptions for those nodes is counted twice, thus
can be represented as follows:
The average number of neighboring nodes for all source nodes
is expressed as follows:
In
Figure 5, the red circle denotes the communication region and
represents the shaded area jointly covered by the two source nodes. A and B are two intersections. When the distance between two source nodes
satisfies
,
exists. Thus, the probability
of an existing communication between the two nodes is expressed as follows:
In the
source nodes, an average of
nodes pairs satisfy the condition in Equation (26) (i.e.,
source nodes pairs can communicate with each other). The expressions for
and
are, respectively, as follows:
Because the nodes are uniformly distributed and , the probability is equal to
Thus, the probability density function (PDF) is
In this case, the area
equals the area of sector OAB minus the area of triangle OAB:
Thus, the expected area of
is calculated as follows:
Combining Equations (27), (28) and (32),
can be formulated as:
Finally, we substitute Equations (25) and (33) into Equation (24), to obtain the representation of
:
4.2. Formulating
Figure 6 shows the forwarding procedure of Stage 3, where
denotes the transmitting node in the
forwarding; its communication range is represented by the black circle. Node
broadcasts its packet in the
forwarding process. Because of the reception conditions, the nodes located in area
can receive the forwarded packet broadcast by
. Let
denotes the number of receiving nodes in area
.
Besides, there are two situations should be considered:
Case 1: As presented in
Figure 7, there are two broadcasting nodes
and
in the
forwarding, while their communication ranges are represented by the two red circles. This case can be divided into two situations via the distance
: (a)
; and (b)
. Taking the first situation as an example, if node
is located in the black area
, the nodes in the shadow area
can receive packets from nodes
or
, thus the number of receptions of those nodes is counted twice, and that value should be subtracted from
. Suppose that the number of receiving nodes in areas such as
is
and that the number in areas such as
is
.
Case 2: Similarly, there are two transmitting nodes
and
in the
forwarding, whose communication ranges are represented by two black circles in
Figure 8. This case can be divided into two situations via the distance
: (a)
; and (b)
. Taking the first situation as an example, when node
is distributed in the black area
, the nodes located in shadow area
receive one of the packets broadcasted by node
or
. Thus, the number of receptions for those nodes is counted twice, which should be subtracted from
. Suppose that the number of reception nodes in areas such as
is
and the number in areas such as
is
.
In conclusion, the number of receptions
for Stage 3 in the
forwarding can be expressed as follows:
Next, we formulate the expression of and .
4.2.1. Calculating
As shown in
Figure 6, the nodes in shadow area
would receive the packet. Thus,
is calculated as follows:
In the above formula,
can be expressed as follows:
Because the nodes are uniformly distributed and , the probability equals
Thus, the PDF is
Combining Equations (31), (37) and (39), we obtain
Thus, we obtain
4.2.2. Calculating
Next, we formulate the expression of
. As presented in
Figure 7a, the value of
is the number of receive node in area
, thus we have
where
denotes the expected area of the black region,
denotes the expected area of the shadow region, and
denotes the probability that the distance between two nodes satisfies
. Thus, we have the following:
As shown in
Figure 7a, the area
equals twice the area of region ACD minus the half intersection area of circle A and B, i.e.,
, plus the half intersection area of circle
and
, i.e.,
. The area of region ACD equals to the area of sector ACD plus the area of sector OAC minus the area of triangle OCA; thus,
Combining Equations (39) and (46), we obtain
where
. According to the method in [
28], we can get the approximate value of
, i.e.,
. Finally, combining Equations (43), (44) and (47), we obtain
4.2.3. Calculating
The expression of
is similar to that of
:
As shown in
Figure 7b, because
,
is calculated as follows:
where
, and
is expressed as
where
is the intersection area of circle
and circle
when
, thus
and
Combining Equations (49), (50) and (53), we obtain
4.2.4. Calculating and
As illustrated in
Figure 8, the two black circles denote the communication range of two transmitting nodes,
and
, in the
forwarding. The red circle denotes the communication range of transmission node
in the
forwarding. The calculation of
and
is similar to that of
and
:
where
and
denote the area of the black region and
and
denote the area of the shadow region. Compared with
Figure 7 and
Figure 8, we have the following:
Thus, the expressions of
and
are:
In conclusion, by combining Equations (35), (42), (48), (54), (58) and (59), we can obtain the expression of
:
4.3. The Formulation of and
Theorem 1. Assume that all sensor nodes are deployed randomly and uniformly in a distributed WSNs with a boundary length of 1, and each node has a transmission range of. If we gather data based on CS-DDSG scheme, thenandare, respectively, expressed as follows:where,and, wheresatisfies. The expression foris given in Equation (34).
Proof. As presented in the above derivation, we can obviously obtain Equation (61) based on Equation (21) and the correlative description in Stage 2 of CS-DDSG. Furthermore, by combining Equations (21), (34) and (60), we can obtain the expression of Equation (62). □
5. Performance Evaluation and Analysis
To evaluate the effectiveness of CS-DDSG, we ran simulations in MATLAB 2012b. The simulation parameters were set as shown in
Table 1. Furthermore, we adopted the FFT orthonormal basis and the orthogonal matching pursuit (OMP) method for the reconstruction algorithm. We used the real sensor readings extracted from the GreenOrbs [
29] system.
In this paper, we present the performance comparations of CS-DDSG, Compressive Sensing Data storage (CStorage) [
20], Improved CStorage (ICStorage) [
21], Compressed Network Coding based Distributed data Storage (CNCDS) [
21] and Direct Cluster-Based Compressive Sensing Data Collection (DCCS) [
15] on unreliable links. These first four schemes all combine DDS and CS to gather data. CStorage, ICStorage and CNCDS are concerned with reducing the number of transmission and fusions. In CStorage, intermediate nodes receive the broadcasting packets when they first receive, and then, they forward the received packet with a given probability. The intermediate nodes in ICStorage forward their own readings rather than the received source nodes readings. In the CNCDS scheme, the intermediate nodes receive broadcast packets only if the receiving node does not share any node IDs with the corresponding transmitting node. We also analyze the numbers of transmissions, receptions and fusions involved in the first four algorithms. DCCS combines CS and cluster topology to reduce the total power consumption with no consideration of packet loss rate. All member nodes gather data and transmit to cluster heads, where the CS measurements and measurement matrices are generated and send to sink directly. Additionally, we discuss the impact of packet loss rate, the number of measurements and the proportion of source nodes on the performance of CS-DDSG. The simulation results shown are the average values from 1000 runs.
First, we evaluate the performance on unreliable links when
, as shown in
Figure 9. It can be seen that: (1) As
increases, the reconstruction accuracy of all the algorithms decreases in
Figure 9a. When
, the NMAEs of the four algorithms are stable and increase gradually, which indicates that CS-DDSG is effective at resisting the packet loss. Although the packet loss rate impacts the nodes receiving broadcasting packets, the sink still gathers enough packets to recover the data. In addition, the sink constructs the measurement matrix based on received packets, which avoids the need to measure the lost nodes and reduces the impact of unreliable links on measurement vector
. However, the performance of DCCS is poor with an increase in
. Sink cannot find the lossy nodes and still reconstructs data based on the original measurement matrices. Thus, DCCS is sensitive to
. (2) CS-DDSG outperforms the other algorithms. This improved performance occurs because in CS-DDSG, nodes receive only one packet which is broadcasted by its neighbor nodes in CS-DDSG. Thus, the measurement vectors have the characteristic of strong spatial correlation, which is utilized by CS to recover the data. However, in the other algorithms, nodes would fuse packets from distant nodes as long as the receipt condition is satisfied, which leads to a weak spatial correlation of measurement vectors. Thus, CS-DDSG outperforms the other algorithms.
We present the total number of transmissions, receptions and fusions of the four algorithms in
Figure 10 when
and
. CS-DDSG requires fewer transmissions, receptions and fusions than do the CNCDS, CStorage and ICStorage schemes. This is because nodes in CS-DDSG receive packets only the first time and broadcast their packets with the probability
, after which then they do not receive any data. However, for CNCDS, CStorage and ICStorage, nodes continue to receive packets as long as the reception condition is satisfied. CStorage and ICStorage in particular focus on reducing the number of transmissions. Moreover, compared with CNCDS, CStorage and ICStorage, CS-DDSG scheme reduces
by up to 23.9%, 42.5% and 67.8%, respectively, and reduces
by up to 73.8%, 80.2% and 89.9%, respectively.
Furthermore, we investigate the fusion proportion of the total number of receptions. As presented in
Figure 11, only 41% of the receiving nodes in CNCDS merge the received packets; the authors consider that only 41% of nodes lose energy. In fact, 59% of the receiving nodes also consume energy because they would receive the broadcast packet first and then determine whether the condition of CNCDS are satisfied; the received packets will be merged only if they satisfy the condition. Thus, energy is consumed even when the received packets are not fused. However, the number of receptions in [
21] is the same as the number of fusions, which is less counted. Similarly, 46% and 48% of the receiving reception nodes in CStorage and ICStorage merge the packets, respectively. In CS-DDSG, all received nodes are fused and no redundancy occurs because the nodes receive packets only once. Thus, the energy consumption of CS-DDSG receiving nodes is much smaller than that of the other algorithms. In conclusion, CS-DDSG effectively reduces both the number of transmissions and receptions.
Figure 12 presents the number of fusions and receiving nodes during each forwarding round when
. The forwarding process of CS-DDSG repeats five times, until no node remains to accept the broadcast packets, while CNCDS, CStorage and ICStorage repeat six, nine and twelve times, respectively. The network employing CS-DDSG has the fastest convergence and characteristics of efficiency due to the strictest reception conditions. Moreover, most of the data fusion occurs during Stage 2, and subsequently the number of fusions rapidly decreases in Stage 3 except in ICStorage.
In
Figure 13, we investigate the recovery performance of the algorithms when
and the number of measurements
, which is queried by the mobile sink, ranges from 15 to 150. It can be observed that, with an increase in
, the recovery accuracy of ICStorage, CStorage, CNCDS and CS-DDSG are improved and equivalent, while the performance of CS-DDSG becomes slightly better when
. This improvement occurs because the more information that is gathered, the better is the reconstruction accuracy. According to Equation (12), the sink constructs measurement matrix
based on the packets fused by the forwarding nodes. The forwarding nodes of CS-DDSG receive only one packet, and the
is sparser than that in the others algorithms. Consequently, less information is gathered and fewer nodes contribute to data recovery for CS-DDSG. However, with an increase in
, more information is gathered and the gaps separating the four algorithms decrease. When
, CS-DDSG outperforms the four DDS-based algorithms due to the strong spatial correlation of the measurement vector. Moreover, the reconstruction accuracy of DCCS is the best when
is large enough and there is no packet loss. All nodes in DCCS participate in gathering data and DCCS adopts dense measurement matrix in clusters. Thus, more information is gathered. In addition, performance tends to be stable as
increases.
Figure 14 shows the performance of CS-DDSG under different packet loss ratios
and probabilities
when
. As
increases, the value of NMAE remains stable, i.e.,
. This result indicates that CS-DDSG effectively resists the packet loss and maintains high reconstruction accuracy even when unreliable links exist. Additionally, its accuracy is not influenced by
due to the very sparse measurement matrix.
Finally, we investigate how the proportion of source nodes
impact the recovery accuracy in
Figure 15. The simulation results show the following: (1) When
, the value of NMAE decreases as
increases because more nodes participate in data reconstruction as
increases. (2) When
is fixed, CS-DDSG performance is improved and the trend of NMAE values is very close to the value of
, varying from 0 to 0.6. This effect occurs because, when there are more source nodes, more nodes will receive broadcast packets before the sink obtains data. Hence, the amount of information used for reconstruction increases. However, due to the reception condition, the measurement matrix
is sparse. Thus, information is increasingly limited. As a result, the trends of the NMAE values are close to the different value of
.