1. Introduction
The Internet of Things (IoT) is widely used in various fields of daily life, such as agriculture, industry, transportation, healthcare, education, smart furniture, and so on. As the IoT changes people's lifestyles, user security in its various applications has gradually become a problem that cannot be ignored [1,2]. In October 2016, the Mirai botnet [3] controlled a large number of IoT devices to launch DDoS attacks with network traffic of up to 620 Gb/s, which led to disconnections in most states of the United States. In 2014, more than 750,000 devices were found to have been compromised and spied on [4]. In 2019, numerous privacy problems occurred in smart-home devices, and some users' private videos were exposed on the internet by attackers. These examples show that, due to the entity link mode of the IoT, attackers may easily obtain network data for illegal use or dissemination. More attention should be paid to this, especially in the application of IoT-based health monitoring [5], since mobile health has become one of the IoT applications most closely related to consumers [6]. The data privacy of users has thus become a hot topic for researchers in the academic field of IoT.
An IoT infrastructure usually integrates various sensors, memories, computing units, and gateway modules to complete the tasks of data acquisition, storage, and forwarding to a cloud platform. Compared with traditional networks, IoT device nodes are based on physical connections with smaller storage space, wider distribution, and diversified transmission protocols [7,8]. This makes common privacy protection methods difficult to transplant directly to IoT devices, and undoubtedly increases the risk of disclosure of private data, especially the large amount of physiological data that can be used to evaluate human health status [9].
Generally, there are three basic privacy protection methods: encryption, disruption, and anonymity. Encryption is a useful method of data protection; commonly used encryption algorithms include AES [10] and RSA [11]. However, the complexity of encryption is also the largest of the three methods. Disruption refers to adding noise that follows a specific distribution to the data, which makes it difficult for attackers to obtain the data accurately; differential privacy [12] is a representative example. Anonymity hides part of the data to ensure the privacy of users; k-anonymity [13] is a representative example. The complexities of these two methods are small, but they incur different degrees of information loss. The applicable method differs according to the actual needs of each scenario.
This paper uses the intelligent wearable device (e.g., a smart bracelet) as an example to study the data privacy protection mechanism of IoT applications. Intelligent wearable devices are often attached to the skin surface to collect and analyze the physical and physiological signals of the human body. They can acquire human body status in real time and communicate with other IoT devices. Intelligent wearable devices are widely used in sports, healthcare, and other fields. This kind of equipment usually contains a variety of sensors, which can collect heartbeat, blood oxygen, acceleration, etc. These signals are usually transmitted to a specific gateway device (e.g., a smartphone) via the Bluetooth protocol, and then further transmitted to a server database deployed on a cloud platform. Data publishing is the process of making data openly accessible to provide necessary information to researchers and the public. For example, publishing infectious disease data can assist disease control centers in analyzing diffusion trends and alert the public to take measures to prevent disease spread. Because the usability of published data and a low data-processing cost are basic requirements of data publishing, anonymity and disruption are the common privacy protection methods in this process.
Figure 1 shows the information transmission process of wearable devices, in which each stage has a certain degree of privacy leakage risk:
- (1) In the signal acquisition stage, the physical structure of the equipment is at risk of being damaged.
- (2) In the wireless transmission stage, the signals face the risk of being intercepted by special equipment. Moreover, most IoT devices have limited computing and storage space, which makes it difficult to run complex privacy protection algorithms. For example, in a 2014 marathon race, researchers used Bluetooth sniffers to easily obtain health information from 563 different devices, since the data collected by the devices were not protected [14].
- (3) In the data publishing stage, attackers could infer users' real information in different ways. Common methods include the linkage attack [15,16] and the background knowledge attack [15,17].
In order to avoid these risks, we need to develop privacy protection technology for specific circumstances. First, it is necessary to classify the data and data types acquired by IoT devices. According to how long the data retain their value, the collected data attributes can be divided into static data, long-term data, and real-time data. Considering the differences of data attributes in IoT applications, it is essential to establish a different privacy protection mechanism for each data attribute. Moreover, we must ensure the consistency of the privacy-preserving mechanism, meaning that it must not apply only to a certain type of device, nor weaken the effect of privacy protection when the data are updated.
Based on the above considerations, we propose in this paper a practical privacy-preserving mechanism for wearable IoT applications. The framework is presented in Figure 2. In the signal acquisition stage on the device, the smoothness priors approach (SPA) [18] and median filtering (MF) [19] are applied to preprocess the photoplethysmography (PPG) signals collected by wearable IoT devices. Data with physical significance, such as heartbeat, blood oxygen, and acceleration, are then estimated in numerical form from the preprocessed PPG signals. In the stage of wireless transmission from devices to the cloud, the original data are encrypted by the PRESENT algorithm [20], a lightweight encryption algorithm for IoT devices with limited space, and then transmitted to the server. Moreover, data are encrypted by the Paillier algorithm [21] in the cloud to support homomorphic updates. In the data publishing stage, we divide the data into three parts: static data, long-term data, and real-time data. For the first two types, we design a personalized k-anonymity algorithm to protect privacy. For the third, we propose the temporal differential privacy mechanism to suppress information leakage during data updates. The results of the two methods are combined and eventually released.
The contributions of this paper are mainly summarized as follows.
- (1) We design a privacy-preserving framework for IoT devices, which covers the transmission and data publishing processes.
- (2) We propose a personalized k-anonymity algorithm based on the entropy of attributes to increase the usability of anonymized data, in which category and numeric attributes are treated as different types.
- (3) We propose the temporal differential privacy mechanism to reduce temporal privacy disclosure, and put forward an implementation algorithm for Laplace mechanism scenarios.
- (4) We propose a practical data-publishing model for IoT devices, including the processing of static, long-term, and real-time data, and we prove that this model provides sufficient safety.
The remainder of our work is organized as follows. Section 2 introduces the related privacy-preserving work. Section 3 presents the complete theory and method of our work, including the collection and preprocessing of data in IoT devices, the encryption methods in the transmission process, and the publishing model containing the proposed personalized k-anonymity and temporal differential privacy. In Section 4, experiments are presented for performance evaluation and comparison. Finally, we draw conclusions in Section 5.
3. Methodology
In this section, we demonstrate the details of the proposed privacy-preserving mechanism. For the application scenario of smart wearable devices, we first describe the specific ways of collecting and processing information, converting electrical signals into data. Secondly, we use two existing encryption methods to ensure the security of information in the transmission process and in the background. Finally, we propose the data publishing method to reduce privacy leakage: the temporal data tables are first split into two parts; personalized k-anonymity is applied to static and long-term data, and temporal differential privacy to real-time data; the data are published after merging. In the end, we demonstrate the rationality of the proposed method.
3.1. Signal Collecting and Preprocessing
In this section, we introduce the collecting and preprocessing steps for signals in smart bracelets, which are the basis of our privacy-preserving mechanism. Firstly, two widely used technologies for obtaining heart rate and blood oxygen signals are demonstrated. Secondly, we introduce two preprocessing methods for reducing noise in the signals.
3.1.1. Signal Collecting
(1) Heart rate. When light passes through the skin tissue and then reflects to the photosensitive sensor, the absorption of light by tissues other than blood is basically unchanged, while blood flow in the artery changes with every heartbeat. When the light is converted into an electrical signal in the device, the signal can be taken as the summation of DC and AC components, which represent the unchanged absorption of the other tissues and the changing absorption of the blood flow, respectively. According to the method described in [34], we use the Discrete Fourier Transform (DFT) to transform the time-domain waveform of the PPG signal into the frequency domain, and then extract the frequency component of the human heart rate from the spectrum to obtain the heart rate data.
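The spectral heart-rate estimation described above can be sketched as follows. This is an illustrative reconstruction, not the implementation of [34]; the sampling rate, search band, and synthetic test signal are assumptions:

```python
import numpy as np

def estimate_heart_rate(ppg, fs):
    """Estimate heart rate (bpm) from a PPG segment via the DFT.

    Searches the spectrum only in a plausible heart-rate band
    (0.8-3.0 Hz, i.e. 48-180 bpm) and returns the dominant frequency.
    """
    ppg = ppg - np.mean(ppg)                 # remove the DC component
    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    band = (freqs >= 0.8) & (freqs <= 3.0)   # plausible heart-rate band
    peak = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak                       # Hz -> beats per minute

# Synthetic 75-bpm PPG (1.25 Hz) sampled at 50 Hz for 20 s
fs, hr_hz = 50, 1.25
t = np.arange(0, 20, 1.0 / fs)
ppg = 1.0 + 0.1 * np.sin(2 * np.pi * hr_hz * t)
print(round(estimate_heart_rate(ppg, fs)))  # 75
```

A longer window gives finer frequency resolution (fs / N), which bounds the bpm quantization error of this approach.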
(2) Blood oxygen. There is a certain proportion of oxygenated hemoglobin ($HbO_2$) and reduced hemoglobin ($Hb$) in blood. The absorption coefficient of $Hb$ is high in the range of 600–800 nm in the spectrum, while the coefficient of $HbO_2$ is high in the range of 800–1000 nm [28]. In [28], researchers use red light (600–800 nm) and infrared light (800–1000 nm) to detect the PPG signals of $Hb$ and $HbO_2$ to reflect the $SpO_2$ value:
$$SpO_2 = \frac{C_{HbO_2}}{C_{HbO_2} + C_{Hb}} \times 100\% \quad (2)$$
In Equation (2), $C_{HbO_2}$ is the oxygenated hemoglobin concentration, and $C_{Hb}$ is the reduced hemoglobin concentration.
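In practice, pulse oximeters approximate Equation (2) from the red and infrared PPG components with the empirical ratio-of-ratios method. The following sketch uses illustrative calibration constants, not the coefficients of [28]; real devices fit these against a reference oximeter:

```python
def spo2_from_ppg(red_ac, red_dc, ir_ac, ir_dc):
    """Estimate SpO2 (%) via the empirical ratio-of-ratios method.

    The linear calibration SpO2 = 110 - 25 * R is an illustrative
    placeholder commonly seen in the literature, not a device-specific fit.
    """
    r = (red_ac / red_dc) / (ir_ac / ir_dc)  # ratio of ratios
    return 110.0 - 25.0 * r

print(spo2_from_ppg(0.02, 1.0, 0.04, 1.0))  # R = 0.5 -> 97.5
```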
3.1.2. Signal Preprocessing
There are other noises in the physiological signals collected by smart wearable devices, which are produced during the signal collecting process. In order to obtain the accurate physiological data of users, the signal-preprocessing module is needed in the devices.
The noises are produced because of the following two major reasons. The first is the electromyography (EMG) interference [
35]. The movements of human body cause the muscle tremor, which makes the surface potential change and causes the interference of collected signals. The EMG noise is similar to the white noise, because they both present narrow in time domain and wide in frequency domain. In our work of smart bracelets, we first apply the smoothness prior approach (SPA) [
18] to remove EMG noise.
The first step of SPA is dividing the signal $z$ into two components: the stationary part $z_{stat}$ and the low-frequency aperiodic trend part $z_{trend}$:
$$z = z_{stat} + z_{trend} \quad (3)$$
The information is contained in the stationary component, so the trend component needs to be removed. The trend component can be described by the linear model
$$z_{trend} = H\theta + e \quad (4)$$
where $H$ is the observation matrix, and $e$ represents the observation error. The estimation of the parameter $\theta$ is expressed as the following regularized least-squares problem:
$$\hat{\theta}_{\lambda} = \arg\min_{\theta} \left\{ \lVert H\theta - z \rVert^{2} + \lambda^{2} \lVert D_{d} H\theta \rVert^{2} \right\} \quad (5)$$
where $\lambda$ is the regularization parameter, and $D_{d}$ is the discrete approximation to the $d$-th derivative operator. Suppose the observation matrix $H = I$ (the identity matrix) and $d = 2$; then the solution is
$$\hat{z}_{trend} = H\hat{\theta}_{\lambda} = \left( I + \lambda^{2} D_{2}^{T} D_{2} \right)^{-1} z \quad (6)$$
and $D_{2}$ is the second-order difference matrix:
$$D_{2} = \begin{pmatrix} 1 & -2 & 1 & & \\ & 1 & -2 & 1 & \\ & & \ddots & \ddots & \ddots \\ & & & 1 & -2 \;\; 1 \end{pmatrix} \quad (7)$$
The second source of noise is baseline drift, which comes from intermittent contact between the device and the surface of the human body. In this paper, we use the median filter (MF) to suppress this kind of noise. In this filter, we set a window of fixed length and let the signal stream through the window successively. The output of the filter at every time point is the median value of all samples in the window. Isolated noisy points can be removed by the MF method.
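A plain-Python sketch of this sliding-window median filter (the window length and test signal are illustrative):

```python
def median_filter(signal, window=5):
    """Sliding-window median filter: each output sample is the median of
    the window centred on it (edges use a truncated window). Isolated
    spikes narrower than half the window are removed."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        w = sorted(signal[max(0, i - half): i + half + 1])
        out.append(w[len(w) // 2])
    return out

noisy = [1, 1, 9, 1, 1, 1, 1]   # one isolated spike
print(median_filter(noisy))      # [1, 1, 1, 1, 1, 1, 1]
```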
3.2. Privacy-Preserving in Data Transmission
Encryption is the most widely used privacy-preserving method in the process of data transmission; it ensures safety and causes no information loss. For transmitting data from devices to the cloud, we apply two different encryption algorithms in this paper. First, to ensure the security of data during transmission while fitting the limited storage and computing space of the devices, we use the lightweight PRESENT algorithm to encrypt the data inside the device. Second, to keep private information from being exposed in the background while supporting normal data update operations, we use the Paillier homomorphic encryption algorithm to encrypt the data in the cloud.
3.2.1. Encryption in Devices
In IoT devices, the block cipher is a widely used kind of encryption method. A block cipher divides the plaintext into several vectors, encrypts the vectors separately, and finally combines the ciphertexts. The block cipher decreases the size of the plaintext to be encrypted at one time, which fits the limited space of IoT devices.
In the process of transmitting information from smart wearable devices to the gateway devices, we apply the PRESENT algorithm [20], which is one of the lightweight block cipher algorithms. In the PRESENT algorithm, the length of every vector is 64 bits, and the length of the secret key is 80 or 128 bits. The sizes of the vectors and secret keys of PRESENT are much shorter than those of traditional block cipher methods, which makes it feasible in smart wearable devices. The PRESENT algorithm is described in Figure 3.
In the PRESENT algorithm, each round of the key update process is divided into three steps [29]. Suppose that the current round key (for the 80-bit version) is $K = k_{79}k_{78}\cdots k_{0}$. First, the key register is rotated to the left by 61 bits, after which it becomes $[k_{18}k_{17}\cdots k_{20}k_{19}]$. Then, the highest four bits $k_{79}k_{78}k_{77}k_{76}$ are replaced through the S-box. Finally, the bits $k_{19}k_{18}k_{17}k_{16}k_{15}$ are exclusive-ORed with the round counter.
The S-box of PRESENT is shown in Table 1 [20], and is used in the substitution layer to replace every 4-bit nibble of the text. The function of the permutation layer [20] is calculated as in Equation (8), which means that bit $i$ is moved to bit position $P(i)$:
$$P(i) = \begin{cases} 16\,i \bmod 63, & 0 \le i \le 62 \\ 63, & i = 63 \end{cases} \quad (8)$$
Every round of the encryption process contains the substitution layer and the permutation layer.
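The two layers of one round can be sketched as follows. This shows only the round function (addRoundKey, sBoxLayer, pLayer); the key schedule and the 31-round loop of the full cipher are omitted:

```python
# PRESENT 4-bit S-box (Table 1)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def sbox_layer(state):
    """Apply the S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def p_layer(state):
    """Bit i moves to position 16*i mod 63; bit 63 stays (Equation (8))."""
    out = 0
    for i in range(64):
        j = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << j
    return out

def round_function(state, round_key):
    """One PRESENT encryption round: addRoundKey, sBoxLayer, pLayer."""
    return p_layer(sbox_layer(state ^ round_key))
```

Since gcd(16, 63) = 1, the map in Equation (8) is a bijection on the bit positions, so the permutation layer loses no information.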
3.2.2. Encryption in Cloud
When the information of users is transmitted to the cloud, in order to avoid information disclosure to the background and to support different computing operations, we employ homomorphic encryption. Homomorphic encryption is a kind of encryption method that makes the text before and after encryption homomorphic under some operations. In this paper, we use the Paillier homomorphic encryption [21] to preserve privacy in the cloud for IoT data.
The Paillier homomorphic encryption algorithm is homomorphic in both addition [21] and scalar multiplication [21]. Suppose the Paillier encryption function is $E(\cdot)$ with modulus $n$, and two plaintexts are $m_1$ and $m_2$; the Paillier homomorphic properties can be described as in Equations (9) and (10):
$$E(m_1) \cdot E(m_2) \bmod n^2 = E(m_1 + m_2) \quad (9)$$
$$E(m_1)^{k} \bmod n^2 = E(k \cdot m_1) \quad (10)$$
The above two equations indicate that when data are required to be updated or computed in the cloud, we can finish the addition and scalar multiplication without decrypting, and the real data will not be exposed to untrusted platforms.
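The homomorphic properties in Equations (9) and (10) can be verified with a toy Paillier instance. The primes here are tiny and for illustration only; real deployments use moduli of 2048 bits or more:

```python
import random
from math import gcd

# Toy Paillier key (tiny primes, insecure, for illustration only)
p, q = 61, 53
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)            # requires Python >= 3.8

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

m1, m2, k = 42, 17, 3
# Additive homomorphism: E(m1) * E(m2) decrypts to m1 + m2
assert decrypt(encrypt(m1) * encrypt(m2) % n2) == m1 + m2
# Scalar multiplication: E(m1)^k decrypts to k * m1
assert decrypt(pow(encrypt(m1), k, n2)) == k * m1
print("homomorphic properties verified")
```

Note that each encryption draws a fresh random $r$, so two ciphertexts of the same plaintext differ, yet both decrypt correctly.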
3.3. Privacy-Preserving in Data Publishing
In this part, we introduce a practical data-publishing model: for static and long-term data, personalized k-anonymity is used, and for real-time data, temporal differential privacy is used. We first demonstrate data publishing in the IoT in Section 3.3.1, and introduce the above two innovative algorithms in detail in Section 3.3.2 and Section 3.3.3. In Section 3.3.4, we formally prove that the data-publishing model is reasonable.
3.3.1. Data Publishing of IoT
In IoT applications, analysts of relevant organizations collect users' data for comprehensive analysis. For example, IoT developers analyze large amounts of user data for behavior analysis and personalized services, and some useful data are made public to support researchers' analysis in certain fields. However, in the publishing process, the users' real data are published, which poses a threat to the privacy of users. If the users' private data are not properly protected during data publishing, intentional attackers may use the information to cheat target users or sell their data, which damages the users' interests and reduces the credibility of the IoT platforms. In addition, if the leaked information is sensitive to users, the leakage causes psychological harm. Data that could be published in the scenario of smart wearable devices are listed in Table 2.
3.3.2. Personalized k-Anonymity
As we explain in Section 2, traditional k-anonymity methods lack consideration of the differences between attributes. In this part, we introduce personalized k-anonymity.
K-partition is the process of dividing an original dataset into several clusters [13], and is the most important step of k-anonymity. Common k-partition methods are mostly based on the distance between every two records, putting near records into the same cluster; examples include MDAV [23] and V-MDAV [24]. Suppose the QI attribute set is $A = \{a_1, a_2, \ldots, a_m\}$; the distance between two records $r_1$ and $r_2$ can be calculated as:
$$d(r_1, r_2) = \sum_{j=1}^{m} w_j \, d_j\!\left(r_1[a_j], r_2[a_j]\right) \quad (11)$$
where the $w_j$ are the weights of the attributes. In k-partition, we notice that there are two important points: one is the distance $d_j$ between values of one attribute $a_j$; the other is the weight $w_j$ assigned to every attribute.
Firstly, we define the distance between two values of an attribute. For numeric attributes, we use the absolute value of their difference. For category attributes, we use the number of steps to the common parent node in the generalization tree to measure the distance between two child nodes.
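The two attribute-level distances can be sketched as follows. The generalization tree here is a hypothetical example for illustration, not one used in the paper:

```python
def numeric_distance(v1, v2):
    """Distance between numeric values: absolute difference."""
    return abs(v1 - v2)

# A hypothetical generalization tree for an 'occupation' attribute,
# given as child -> parent links up to the root '*'.
PARENT = {"nurse": "medical", "doctor": "medical",
          "teacher": "education", "medical": "*", "education": "*"}

def ancestors(node):
    """Chain from a node up to the root of the generalization tree."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def category_distance(v1, v2):
    """Total steps for both values to reach their first common ancestor."""
    a1, a2 = ancestors(v1), ancestors(v2)
    common = next(x for x in a1 if x in a2)
    return a1.index(common) + a2.index(common)

print(category_distance("nurse", "doctor"))   # siblings: 1 + 1 = 2
print(category_distance("nurse", "teacher"))  # via the root: 2 + 2 = 4
```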
To assign the weights, the entropy weight method [36] is used in this paper. The reason for using this method is simple: attributes of high entropy have complicated value occurrences and should play more important roles in dividing the data into clusters. As discussed in the previous sections, the entropy weight method also treats category and numeric attributes separately. For category attributes, the method is to find the frequency $p_v$ of each possible value $v$; usually $p_v$ is the ratio of the number of occurrences of the value $v$ to the number of records. The normalized entropy of an attribute $a_j$ with $n_j$ distinct values is:
$$H(a_j) = -\frac{1}{\ln n_j} \sum_{v} p_v \ln p_v \quad (12)$$
For numeric data, the scope of each attribute value is different, which affects the measured extent of dispersion of the attribute. Data should be normalized before weights are assigned, so that we can judge the dispersion degree of each attribute from a unified perspective. In our study, we use the membership degree [37] method to normalize the values.
The data collected by the devices at one time can be described as the matrix $X = (x_{ij})_{n \times m}$, where $n$ presents the number of users, and $m$ presents the number of QIs. The membership function can be expressed as:
$$r_{ij} = \begin{cases} 1, & x_{ij} \le s_j \\ \dfrac{p_j - x_{ij}}{p_j - s_j}, & s_j < x_{ij} \le p_j \\ 0, & x_{ij} > p_j \end{cases} \quad (13)$$
where $s_j$ and $p_j$ are the upper limits of the satisfying value and the permissible value of attribute $a_j$. The normalized matrix can be expressed as $R = (r_{ij})_{n \times m}$, and each element in the matrix ranges from 0 to 1. According to the idea we put forward, the greater the dispersion of the data, the bigger the entropy of the data, and the less security the data possess. The entropy of an attribute $a_j$ is calculated as in Equation (12), in which
$$p_{ij} = \frac{r_{ij}}{\sum_{i=1}^{n} r_{ij}}$$
The weight assigned to each attribute is:
$$w_j = \frac{H(a_j)}{\sum_{l=1}^{m} H(a_l)} \quad (14)$$
However, the time cost of computing the weight of every attribute is almost equal to the time cost of k-partition itself, so for large datasets the weight assignment would be expensive. When we design the personalized k-anonymity algorithm, this procedure is therefore based on a small amount of randomly sampled data to reduce the time cost. It can be proved that the resulting weight is an unbiased estimate of the complete-data result, since the calculation of entropy is based on frequencies.
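The entropy-based weight assignment for category attributes can be sketched as follows, following Equation (12) and the paper's rationale that higher-entropy attributes receive larger weights; the column data are illustrative:

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Normalized entropy of a category attribute, Equation (12):
    H = -sum(p * ln p) / ln(n_distinct), which lies in [0, 1]."""
    counts = Counter(values)
    total = len(values)
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))

def entropy_weights(columns):
    """Weight of each attribute proportional to its entropy, so that
    more dispersed attributes influence the k-partition distance more."""
    hs = [normalized_entropy(col) for col in columns]
    s = sum(hs)
    return [h / s for h in hs] if s else [1.0 / len(hs)] * len(hs)

# Two QI columns: 'city' is highly dispersed, 'gender' much less so
city = ["A", "B", "C", "D", "A", "B", "C", "D"]
gender = ["M", "M", "M", "M", "M", "M", "F", "F"]
w = entropy_weights([city, gender])
print(w)  # the 'city' column gets the larger weight
```

Computing the same weights on a random sample instead of the full columns gives the time saving described above.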
We use the V-MDAV algorithm [24] for k-anonymity grouping, which is presented in Algorithm 1. The personalized k-anonymity algorithm is described in Algorithm 2.
Algorithm 1. V-MDAV
Input: distance matrix $D$, parameter $k$
Output: micro-aggregated set $G$
1: $G \leftarrow \emptyset$
2: while (more than $k$ records wait to be assigned) do
3:   select the unassigned record $r$ furthest from the centroid of the unassigned records
4:   form a group $g$ from $r$ and its $k-1$ nearest unassigned records, extending $g$ with close records when the gain criterion is met
5:   $G \leftarrow G \cup \{g\}$
6: end while
7: assign each remaining record to its nearest group in $G$
8: replace the records of each group with the group centroid
9: return $G$
Algorithm 2. Personalized k-anonymity
Input: original dataset $T$, parameter $k$, sampling ratio $s$
Output: published dataset $T^{*}$
1: $S \leftarrow$ a random sample of $s \cdot |T|$ records from $T$
2: while (there is an unprocessed QI attribute $a_j$) do
3:   if ($a_j$ is a category attribute) then
4:     compute the frequency of each value of $a_j$ on $S$
5:     compute the normalized entropy $H(a_j)$ by Equation (12)
6:   else if ($a_j$ is a numeric attribute) then
7:     normalize $a_j$ on $S$ by Equation (13) and compute $H(a_j)$ by Equation (12)
8:   end if
9:   $j \leftarrow j + 1$
10: end while
11: assign the weight $w_j$ to each attribute by Equation (14)
12: compute the distance matrix of $T$ by Equation (11)
13: $G \leftarrow$ V-MDAV(distance matrix, $k$)
14: generalize each group of $G$ and return the published dataset $T^{*}$
3.3.3. Temporal Differential Privacy
Apart from the common insecurity that a static dataset brings, the danger of data leaking in a temporal process still exists. For example, Table 3 shows the information that a device collects at different moments from one user. This user's heart rate increases suddenly, while the other two attributes do not change much. Attackers with background knowledge can conclude that the sudden increase in heart rate is unlikely to be related to sports, and probably results from an illness, such as a sudden palpitation. If this user really has the illness, this health information is exposed to the attackers.
In a dataset, some health attributes may have common correlations, which can be positive, negative, or more complex. Therefore, unbalanced changes of attributes indicate the occurrence of abnormal conditions and can lead to information leakage.
In this paper, we propose the temporal differential privacy mechanism to solve the above problem. We first define an important variable to indicate the sensitivity ratio between two time points. Suppose the sensitivities at time 0 and time $t$ are $\Delta f_0$ and $\Delta f_t$; then the sensitivity ratio is $k_t = \Delta f_t / \Delta f_0$.
Definition 4. Temporal differential privacy. Suppose the range of the ratio $P(\tilde{f}_0 = x)/P(\tilde{f}_t = x)$ over all possible results $x$ is $[L, U]$; the published results satisfy $\delta$-temporal differential privacy when $e^{-\delta} \le L$ and $U \le e^{\delta}$.
In this paper, we put forward an implementation method of temporal differential privacy in the Laplace mechanism scenarios.
Suppose the original query result is $f_0$, and the result at time $t$ is $f_t$. According to the Laplace mechanism [11], the result perturbed by Laplace noise $Lap(\Delta f_0/\epsilon)$ satisfies differential privacy. From the definition of the Laplace distribution, the distribution of the published result $\tilde{f}_0$ at time 0 is
$$P(\tilde{f}_0 = x) = \frac{\epsilon}{2\Delta f_0} \exp\!\left( -\frac{\epsilon \lvert x - f_0 \rvert}{\Delta f_0} \right) \quad (15)$$
At time $t$, the published result is also perturbed by Laplace noise $Lap(\Delta f_t/\epsilon)$, and a similar distribution exists. We study a possible result $x$; in the same way as above, the probability is
$$P(\tilde{f}_t = x) = \frac{\epsilon}{2\Delta f_t} \exp\!\left( -\frac{\epsilon \lvert x - f_t \rvert}{\Delta f_t} \right) \quad (16)$$
In order to meet the requirement of temporal differential privacy, we first compute the upper bound of the ratio of the two distributions, using the triangle inequality at the position of the less-than-or-equal sign:
$$\frac{P(\tilde{f}_0 = x)}{P(\tilde{f}_t = x)} = k_t \exp\!\left( \epsilon \left( \frac{\lvert x - f_t \rvert}{\Delta f_t} - \frac{\lvert x - f_0 \rvert}{\Delta f_0} \right) \right) \le k_t \exp\!\left( \frac{\epsilon \lvert f_t - f_0 \rvert}{\Delta f_t} \right) \quad (17, 18)$$
The computed upper bound does not depend on the possible result $x$, which means the inequality is true for any value of $x$. Moreover, from the symmetry of $\tilde{f}_0$ and $\tilde{f}_t$, the two bounds of the ratio can be computed:
$$k_t \exp\!\left( -\frac{\epsilon \lvert f_t - f_0 \rvert}{\Delta f_0} \right) \le \frac{P(\tilde{f}_0 = x)}{P(\tilde{f}_t = x)} \le k_t \exp\!\left( \frac{\epsilon \lvert f_t - f_0 \rvert}{\Delta f_t} \right) \quad (19, 20)$$
According to Definition 4, temporal differential privacy is satisfied if
$$e^{-\delta} \le k_t \exp\!\left( -\frac{\epsilon \lvert f_t - f_0 \rvert}{\Delta f_0} \right) \quad \text{and} \quad k_t \exp\!\left( \frac{\epsilon \lvert f_t - f_0 \rvert}{\Delta f_t} \right) \le e^{\delta}$$
We take the logarithm of both sides, and then merge the two intermediate results together to obtain the final condition:
$$\epsilon \lvert f_t - f_0 \rvert \le \min\!\left\{ \Delta f_0 \left( \delta + \ln k_t \right),\; \Delta f_t \left( \delta - \ln k_t \right) \right\} \quad (21)$$
We take Inequality (21) as the condition of our algorithm, which represents the bound determining whether the data can be published, ensuring that the data will not leak important information through their variations. The procedure is described in Algorithm 3, which by construction satisfies temporal differential privacy.
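The publish-or-withhold decision can be sketched as follows, under the assumption that the release condition bounds the log-ratio of the two Laplace densities by δ, as in Inequality (21); the function names and example values are illustrative:

```python
import math
import random

def lap_noise(scale):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def publishable(f0, ft, s0, st, eps, delta):
    """Check the temporal-differential-privacy release condition: the
    log-ratio of the two Laplace output densities must stay within
    [-delta, delta] for every possible output. k = st/s0 is the
    sensitivity ratio between the two time points."""
    k = st / s0
    upper = math.log(k) + eps * abs(ft - f0) / st
    lower = -math.log(k) + eps * abs(ft - f0) / s0
    return max(upper, lower) <= delta

def release(f0, ft, s0, st, eps, delta):
    """Publish the Laplace-perturbed result only when the condition holds."""
    if publishable(f0, ft, s0, st, eps, delta):
        return ft + lap_noise(st / eps)
    return None   # withhold: the variation itself would leak information

# A small heart-rate change passes; a sudden large jump is withheld
print(publishable(72, 74, 2.0, 2.0, eps=0.5, delta=1.0))   # True
print(publishable(72, 120, 2.0, 2.0, eps=0.5, delta=1.0))  # False
```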
Algorithm 3. Temporal differential privacy
Input: original results $f_0, f_t$; parameters $\epsilon, \delta$
Output: processed results $\tilde{f}_0, \tilde{f}_t$
1: $\tilde{f}_0 \leftarrow f_0 + Lap(\Delta f_0/\epsilon)$
2: while (a new result $f_t$ arrives) do
3:   $k_t \leftarrow \Delta f_t / \Delta f_0$
4:   $\tilde{f}_t \leftarrow f_t + Lap(\Delta f_t/\epsilon)$
5:   if (Inequality (21) holds) then
6:     return $\tilde{f}_t$
7:   end if
8: end while
3.3.4. Rationality Demonstration
In Section 3.3.2 and Section 3.3.3, we proposed a method that combines improvements of the two traditional algorithms, k-anonymity and differential privacy, for publishing different kinds of attributes. In this section, we prove that this combination is rational and that the final result satisfies both k-anonymity and the differential privacy mechanism.
The real-time attributes can be seen as sensitive attributes (SAs), while the set of static and long-term attributes includes both QIs and SAs. Suppose the static and long-term attribute set is $A_s$, and the real-time attribute set is $A_r$.
(1) K-anonymity. According to the k-anonymity mechanism, the values of the QIs are the same within an equivalence class. For the combined attribute set $A_s \cup A_r$, the QI values obviously remain the same within each equivalence class, and the number of records in a class is still at least k. Therefore, our combined mechanism satisfies k-anonymity.
(2) Differential privacy. According to the differential privacy mechanism, the relationship between neighbor datasets $D$ and $D'$ satisfies the expression:
$$P\big(M(D) \in S\big) \le e^{\epsilon}\, P\big(M(D') \in S\big) \quad (22)$$
where $D$ and $D'$ are a pair of neighbor datasets, which means that they differ in only one record, and $M$ is an $\epsilon$-differentially private mechanism. We add the same anonymized data $T_a$ to the two datasets. It is obvious that the new datasets $D \cup T_a$ and $D' \cup T_a$ are neighbor datasets of each other.
Denote by $M^{*}$ the publishing mechanism in this paper, which applies $M$ to the real-time part and releases the anonymized part unchanged; we can conclude from the above that
$$P\big(M^{*}(D \cup T_a) \in S\big) \le e^{\epsilon}\, P\big(M^{*}(D' \cup T_a) \in S\big) \quad (23)$$
This proves that the proposed data publishing mechanism satisfies differential privacy.
5. Conclusions and Future Work
In this paper, we propose a practical privacy-preserving mechanism to ensure data security in the different stages of a wearable IoT framework, taking the application of smart wearable devices as an example. First, we employ the lightweight PRESENT algorithm to encrypt information in IoT devices, and utilize Paillier homomorphic encryption to manage data on the cloud platform. In the data publishing stage, we optimize the traditional k-anonymity algorithm for static data and the differential privacy algorithm for real-time data, and then demonstrate the rationality of combining the two optimized algorithms. Specifically, we propose the personalized k-anonymity algorithm, in which we assign weights to different attributes based on entropy and treat numeric data and category data separately. The experimental results show that personalized k-anonymity is equivalent to traditional k-anonymity in usability, but its safety index is about 0–6.25% higher than traditional algorithms, and its grouping results are more concentrated. Furthermore, we propose the temporal differential privacy mechanism to ensure privacy security in temporal datasets, and put forward an implementation method based on the Laplace mechanism. The experimental results show that temporal differential privacy decreases disclosure during temporal variations. Taken together, our results provide evidence of the feasibility and effectiveness of our mechanism in protecting the privacy of IoT users.
In future work, we will improve the proposed mechanism in the following aspects:
- (1) There is some research on attacks against the PRESENT algorithm, such as [40]. We will improve the algorithm in future work to enhance security.
- (2) Some existing smart bracelet systems use learning algorithms for classification and prediction tasks; for example, the health status of users can be evaluated from the data collected by smart bracelets. In the training process, users' privacy may also be exposed. We intend to adopt federated learning methods in future work.
- (3) We will improve our mechanism to adapt to other kinds of IoT devices, and evaluate its effectiveness in current network and device environments.