1. Introduction
As we all know, wireless sensor networks (WSNs) are currently widely employed in various fields, especially in military reconnaissance [
1], drone monitoring [
2], industrial control [
3], transportation [
4], etc., to collect various physical or environmental data. Sometimes, WSN is deployed in places that humans cannot reach or in enemy areas. Distributed detection is a common decision-making method in WSN, which means that multiple sensors are deployed in a target area to measure the physical state of a target in a decentralized manner. Each sensor makes a local decision based on its observations, and then it sends the decision to the Ally Fusion Center (AFC) through a wireless channel. The AFC utilizes all the received sensor data to make the final decision with regard to the target. However, as the sensor data are transmitted in a broadcast method, the Enemy Fusion Center (EFC) in the network can easily eavesdrop on the data, which is a serious threat to the confidentiality of the data transmissions in WSN. In addition, WSN is usually battery-powered to maintain its work. Therefore, the traditional encryption methods, which rely on complex calculations, are unsuitable for WSN. To extend the duration of sensor nodes and ensure the confidentiality of the data transmission, it is important to find a lightweight and safe data transmission scheme for WSN.
Faced with the above challenges, several lightweight encryption schemes have been proposed by researchers. Khashan et al. [
5] proposed an automated and lightweight encryption scheme using a dynamic clustering technique which supports mobility in WSN, and it dynamically controls the complexity of the encryption process based on the currently limited resources. Cao et al. [
6] proposed an improved identity-based encryption algorithm that lies between the traditional public-key encryption and identity-based public tweezers’ encryption, effectively simplifying the key generation process. In [
7], the scheme first forwards all traffic packets in a disordered manner, using different network paths and protocols, and then distributes the traffic packets from one stream to another based on different encryption schemes. Usually, these schemes that rely on traditional encryptions to secure data confidentiality still face the computational burden of encryption and decryption.
On the other hand, several novel lightweight efficient schemes have been proposed. A physical layer-based security framework was proposed by Bashar et al. [
8]. Assuming that the transmission channel of WSN is exposed to generalized K-fading, the framework uses cumulative distribution, optimal sensor, and round-robin scheduling schemes to reduce the probability of being eavesdropped. In [
9], the authors designed a secure transmission scheme based on artificial noise. The transmitter designs the noise precoding matrix based on the known channel state information of the legitimate channel. The receiver can calculate the transmitted data from the probability distribution of channel state information, but the eavesdropper cannot. In [
10], Jeon et al. proposed to explore the randomness of the wireless channel gains to encrypt the sensors’ local measurements with random flipping of the measured data. However, the scheme is only suitable for ideal situations, such that the instantaneous channel gains of sensors to the AFC all conform to the expected Rayleigh distribution, which is difficult to achieve in the real time-varying physical channels. In [
11], Zhang et al. improved Jeon’s work. They optimized the flipping thresholds and let some sensors stay inactive to save transmission energy according to the sensors’ local detection confidence levels. In [
12], Chen et al. designed an immune-based differential evolution algorithm to find the optimal rollover rate to minimize the fusion error of AFC while maintaining the fusion error of EFC at a high level. In [
13], Yaacoub et al. proposed a clustering algorithm for grouping sensor nodes into cooperative clusters, and the sensor sends data to the parent node with minimal energy consumption and randomly flips bits based on channel state information to reduce the energy consumption and transmission time of the sensor data. In [
14], the security transmission scheme is extended to a scenario with multiple quantization scales and candidate states. Currently, most of the schemes are designed based on the instantaneous channel state conformed to an expected ideal distribution. If the statistic of the channel state changes from time to time, it is very difficult to meet the required conditions, which seriously restricts the applications of these security models.
In this paper, we propose a secure data transmission scheme based on information hiding and simple data flipping. The basic idea is to first hide the local measurements into a customized binary string based on the principle of encoding hiding [
15]. Then, the sensors and the AFC generate a random number sequence synchronously based on the output of a pre-deployed pseudo-random function, which determines the data flipping. Finally, the flipped binary strings along with error correction codes are sent to the AFC, which can recover the flipped strings and then extract the sensor observations from the strings. Our scheme has a simpler calculation process than traditional encryption and decryption algorithms, and it is more suitable for resource-constrained WSN.
The contributions of this paper can be summarized as follows:
(1) The proposed scheme has the anti-interference ability, and it can resist the change of channel environment and malicious attacks of EFC.
(2) The scheme is based on information hiding, which increases the data confidentiality, such that as long as one bit of the binary string is not correctly recovered, the EFC fails to withdraw the true measurements coded in the string.
(3) The energy consumption is lower than that of the traditional encryption method. Traditional encryption algorithms are relayed on complex key distribution, S-box confusion, and multiple rounds of iterative calculations. On the contrary, our scheme only needs simple data flipping, which is much more efficient.
(4) Groups of comparisons are carried to test the confidentiality and efficiency of our method in both simulations and real environments.
The rest of this article is organized as follows.
Section 2 gives the system model.
Section 3 gives a safety analysis. The computational complexity is discussed in
Section 4.
Section 5 is simulation experiments and physical experiments.
Section 6 is the power consumption test. Conclusions are drawn in
Section 7.
2. The Description of the Security Model
2.1. System Model
N sensor nodes are deployed in a target area to monitor a common physical state in a distributed manner. The measurement of each sensor is quantized into binaries (local decision ) based on the measurement and a decision threshold. If the measurement is less than the threshold, then , otherwise . Let and denote the local detection rate and false alarm rate of the -th sensor, respectively. Usually, and can be known in advance both by the AFC and EFC.
The framework of the system discussed in this paper is shown in
Figure 1. The
sensors are divided into
groups and each group comprises
sensors. The
decision bits from the corresponding
k sensors are converted into one decimal number,
. Then,
is embedded in a customized binary string
as hidden information through the F5 algorithm [
16]. After that,
is flipped according to the
-bit output sequence
of a pseudo-random function
and threshold
.
is assigned to the flipping group if
is in the flipping domain, otherwise, it is assigned to the non-flipping group. Generally, the output sequence of
is decided by its initial
. The flipped result
of each group is sent to AFC along with its error correction code (ECC). In this paper, forward error correction (FEC) is employed to ensure that AFC can correct limited errors in the received data
X. Finally, AFC recovers the flipped binary string, then extracts the embedded information
from the string, and obtains the real state of the target state by data fusion. The main process of the proposed method is shown in
Figure 2.
2.2. Information Hiding and Recovery
Before data transmission, the measurements of each sensor are hidden into a binary string, which is called carrier in traditional digital steganography [
16]. The steps for embedding the message are as follows:
Step 1 sensors simultaneously observe the common target state and generate binary observations and is converted to a decimal number .
Step 2 Randomly generate an -bits binary carrier .
Step 3 Construct a hash function
using successive XOR ‘
’ operations on
,
, which is shown in Equation (1).
Step 4 Calculate the position
of the binary bit to be flipped in
,
, as shown in Equation (2).
where
x is the secret to be hidden in
a.
Step 5 Modify
according to
, as shown in Equation (3).
where
denotes the inversion of the
-th bit in
.
Suppose the carrier embedding before and after is
and
respectively, and the Hamming distance between
and
is employed to represent the change caused by the modification of the carrier, which is shown in Equation (4).
The triple
denoted that there are
modifiable positions in
to embed
-bit secret messages. Usually, F5 [
16] requires that at most one bit of
a is modified to get
a’, such that
. When we withdraw the encoded secret
from
a’, the hash function
is calculated, and
.
Equation (5) is a derivation to prove that the hash function satisfies
.
Usually, and satisfy . It is well known that a -bit binary string has different permutations and can represent different numbers from 0 to . Since the F5 algorithm requires the carrier to be modified by no more than 1 bit, the unmodified state of the carrier represents 0. Therefore, the carrier can represent the remaining numbers when its length is .
Accordingly, the carrier modification density
and the embedding rate
can be calculated in Equations (6) and (7).
represents how much information is hidden in a unit carrier. The lower the
, the lower the carrier utilization and the higher the security. With the same embedding capacity, a low embedding rate will use more carriers than a high embedding rate, and therefore the computational effort and power consumption of the sensor will increase.
According to
and
, the embedding efficiency
can be defined in Equation (8).
indicates how many bits can be embedded for each change of
, which is approximately equal to
. With the same embedding capacity, the higher the
, the less the modification to the carrier will be, but the computational load will increase instead. Moreover, whenever one bit of the carrier intercepted by the adversary is incorrect, a completely different k-bit data will be recovered. Therefore, the security decreases as
increases.
Table 1 shows that the carrier data embedding rate
decreases with the increase of embedding efficiency
. Therefore, to realize a balance between embedding rate and embedding efficiency, it is better to choose a shorter ciphertext length
.
2.3. Flipping Encryption
In the previous section, the outputs of
sensors were embedded in the
-bit string
. A pseudo-random function
is previously deployed at both the sensor nodes and AFC. The output of the function determines which bits of
will be flipped. If the
-th output
of
falls within the flipping interval, then
is flipped, otherwise, it is not flipped. As
is a pseudo-random function, if the AFC and the sensor have the same initial
, the same random sequence can be generated synchronously at both sides. The
can be determined by channel state information or received signal strength indicator (RSSI). Due to the channel independence, the EFC can only know the channel state from itself to the sensors, but nothing about the channel state from the AFC to the sensors because of the physical independence. Before the data transmission at each duplex time, the AFC first sends pilot signals
to the sensor, where
. The sensor sends a local decision to the AFC after receiving the pilot channel. In this way, the AFC and the sensor initialize the same seed with RSSI. The sensor generates a new
each time it receives the pilot signal, and the AFC generates the same φ after receiving the sensor signal. The pseudo-random function generates the results in the way shown in Equation (9).
refers to the
-th transmission of sensor data in a duplex time. The first
is generated with RSSI as the
, after which the previous
is used as the
to generate
.
The encrypted data of are denoted by , where . If , will be put into the non-flipping group, that is, . If , will be put into the flipping group, that is, , that is .
is manually set to obey a uniform distribution. For instance, we use the
function in MATLAB to generate random numbers that are uniformly distributed in the interval (0,1) in the simulation. Its probability density function is
, therefore, its cumulative distribution function is shown in Equation (10).
We use
to denote the probability that
is flipped. If the flipping probability is
, then the probability of not flipping is
. The definition of
is shown in Equation (11).
Then, we encode with the BCH error correction code (ECC) and send to the AFC. In the proposed scheme, the AFC might not recover correctly if there is channel noise. As long as one bit is wrong, completely incorrect sensor data will be recovered. Error correction codes can increase the robustness and improve the probability of correct data recovery. BCH code is a classical and effective forward error correction code that corrects limited errors immediately as the data are received. For example, the (7,4,3) BCH code used in the experiment, which encodes 4 bits of data into 7 bits, can correct 1 bit of error.
Due to the broadcast nature of WSN, the EFC can eavesdrop on messages that AFC can receive. However, it cannot distinguish between flipped data and unflipped data, let alone extract the original sensor outputs embedded in it. Therefore, the EFC is completely unable to use the information it eavesdrops on.
2.4. The Fusion Result of the AFC
In this section, the fusion result of the AFC based on the log-likelihood ratio (LLR) is analyzed. Assume that the output vectors of
sensors received by AFC are
, and according to [
17] the LLR-based fusion rule at the AFC is given by Equation (12):
The main purpose of multi-sensor fusion decision is to obtain detection performance that is not achievable by any single sensor by fusing the detection results of multiple sensors. The fusion results are compared with the decision threshold, which affects the accuracy of the fused sensor data. How to find the optimal threshold is a problem that many researchers are exploring, but it is not the focus of this paper. Therefore, in the experiments of this paper, the decision threshold is set to 0. If is greater than the decision threshold, the fusion decision result is , otherwise, it is . is the conditional probability density function of the sensor. The sensors’ measurements are assumed to be independent and identically distributed (i.i.d.) under .
In
Section 2.2 and
Section 2.3,
sensors have been divided into
groups. The
sensor outputs of each group have been evenly embedded in the binary string
, which are then flipped and encrypted into the binary string
. After receiving the data, the AFC first corrects errors in the transmission by FEC and then recovers
via Equation (13).
AFC can easily recover
and calculate
.
can be converted to the binary string
. In the same way, we can recover the sensor output of each group and get the output vector of
sensors
. Let
and
then the fusion rule can be rewritten as Equation (14):
where
Therefore, the final fusion rule is shown in Equation (17).
If is greater than the decision threshold, the fusion decision result is , otherwise, it is . Note that this approximation can be regarded as a modified version of the Chair–Varshney fusion rule, which utilizes only the local detection rate and false alarm rate.
3. Security Analysis
In this section, we analyze the security of the scheme by discussing whether the EFC can obtain useful information from the eavesdropped data. The data eavesdropped by EFC from the sensor nodes are
. Since the EFC does not know the channel status of the AFC and the common seed of the sensor and the AFC, it can only use the RSSI of the eavesdropped data as the seed. It recovers
as follows:
Next, we intend to prove
. It may be assumed that there are
bits in
that cannot be recovered correctly and
. If
, then
. Thus
It can be inferred that , unless .
can be converted to a binary string
. Next, we can prove that the probabilities of
and
are equal.
From a statistical point of view, . Obviously, when m > 1, i.e., , there is .
Similarly, we can recover the sensor output of each group to obtain the output vector of N sensors
. Then, the fusion rule of EFC can be expressed as:
If
is greater than the decision threshold, the fusion decision result is
, otherwise, it is
. It can be seen from Equation (23) that data confidentiality can be achieved by deriving
equal to
, which makes the LLR value at the EFC always equal to zero. The EFC will ignore the data since it cannot make a final decision for the binary hypothesis testing problem when
.
As can be deduced from Equation (26), the LLR at EFC is always equal to 0, which makes it impossible to decide the target state. Therefore, it is almost impossible for EFC to make correct decisions based on the captured sensor data.
4. Time Complexity Evaluation
In this section, the time complexity of this scheme in a duplex time is analyzed. To compare with traditional encryption algorithms, we define the stage from the sensors observing target state to sending data as the encryption stage, and the stage from the AFC receiving data to taking out the sensor outputs as the decryption stage. Then, we use Big O notation to describe the time complexity of the encryption stage and the decryption stage. The proposed scheme in this paper is referred to as the information hiding and data flipping (IHDF) scheme.
In step 1 of
Section 2.2, the time complexity of
sensors to observe the target state is
. Since
,
. In step 2, the time complexity of generating a random binary string
of length
is
. In step 3, the time complexity of calculating the hash function
is also
. In step 4, the time complexity of calculating the position
of the bits in
to be changed is
. In step 5, the time complexity of the operation of modifying
according to
at most one bit is
. In
Section 2.3, the time complexity of the sensor generating
that determines whether
’ is flipped is
. The time complexity of calculating the error correction code of the encrypted
is
.Therefore, the time complexity of the encryption stage of IHDF is
.
Similarly, the time complexity of AFC for error correction of the received messages is . The time complexity of generating the flipping sequence is . The time complexity of calculating is . The time complexity of calculating is . The time complexity of restoring to binary output is . Therefore, the time complexity of the decryption stage of IHDF is .
The computation load of the sensor nodes is and the computation load of the AFC is also . The time complexity decreases as increases.
RC6 [
18], Simon [
19], and Speck [
19] are typical lightweight encryption algorithms, while Rijndael [
20] and Twofish [
21] are traditional symmetric encryption algorithms.
Table 2 shows the computational complexity of each algorithm. Compared with Rijndael and Twofish, the proposed scheme has no complicated S-box confusion, and the time and space costs are significantly reduced. For symmetric encryption algorithms, if the information to be encrypted is too short, such as a few bits, it must be filled to at least the shortest encryption length. At the same time, multiple iterative calculations are required, which will increase the computational effort.
6. Power Consumption Evaluation
In this section, we measured the power consumption of the algorithm in
Section 5.2 after running on the CC2530 development board for a while.
We put the fully charged 14,500 lithium battery into the CC2530 development board and send the encrypted sensor output at a rate of 5800 times per hour. After the development board continues to run the algorithm for several hours, the remaining battery power is measured. We continue to discharge the battery with a current of . The discharge ends when the battery voltage drops to the cutoff voltage. The total amount of discharge during this time is the remaining charge of the battery. Similarly, the fully charged battery power can be measured. The hourly power consumption is calculated by dividing the difference between the fully charged battery power and the remaining power by the operating time.
To avoid errors caused by different development boards and batteries, we ran each algorithm once on each of the four development boards, which corresponded to the batteries one by one. To measure the pure power consumption of the algorithm, all sensors were removed from the development board before starting the measurement. The average of the four measurements was then taken as the final hourly power consumption. The measurement order and measurement values for each algorithm are shown in
Table 5. It can be seen that the proposed scheme consumes significantly less power compared with the traditional lightweight symmetric encryption algorithm. Combined with
Table 2, the proposed scheme has no iterative computation, only 5 bits of data are encrypted, and the carrier bit string used to hide the information is only 31 bits. The amount of running code is much smaller than that of the traditional symbolic measure encryption algorithm. Therefore, the proposed scheme is more suitable than the traditional method for wireless sensor network applications that require low energy consumption.
7. Conclusions
In this paper, a lightweight scheme based on information hiding is proposed for data confidentiality in distributed WSNs. Due to the openness of WSNs, sensor data are transmitted through insecure channels. The EFC can eavesdrop on all transmitted data from sensors and calculate the state of the observed target through data fusion. To prevent the eavesdropping of EFC, we designed a transmission scheme based on information hiding and data flipping. The main idea is to hide the local measurements into a customized binary string based on the principle of encoding hiding. Then, the sensors and the AFC generate a sequence of random numbers synchronously based on the output of a pre-deployed pseudo-random function, which determines the data flipping. Finally, the flipped binary strings are sent to the AFC along with error correction codes. The AFC can recover the flipped string and then extract the sensor’s observations from the string.
Through simulations and experiments in real physical equipment, it is demonstrated that the proposed scheme enables the AFC to correctly recover the sensor output, while the EFC is completely unable to distinguish between flipped and unflipped data. The proposed scheme has good immunity to interference and can still restore the original data more accurately in the changing physical environment. Moreover, the proposed method is more efficient through real power consumption tests and consumes less power than the traditional symmetric encryption.
We believe that the proposed scheme can be deployed in many WSN applications with limited resources and unpredictable operating environments, including natural disaster monitoring and remote control of UAVs, due to its anti-interference, low complexity, and low power characteristics.