1. Introduction
With the development of IoT technology, in order to achieve industrial informatization, more and more IoT devices are being connected, and the amount of data is becoming larger and larger. In order to save on storage and computing costs, enterprises choose to outsource massive amounts of data to cloud servers. However, while enjoying the convenience brought by the cloud, the security of the data has become crucial. To protect data privacy, sensitive private data need to be encrypted before being outsourced to the IoT cloud [
1]. Song et al. [
2] proposed searchable symmetric encryption (SSE). SSE is an encryption scheme that allows users to store encrypted data on a third-party cloud server and search the encrypted cloud data through the trapdoor generated by the keywords. However, most SSE schemes only consider keyword search operations on statically encrypted cloud data, which is inconsistent with the real-time and dynamic update requirements for enterprise data. Moreover, some studies have shown that conventional SSE schemes are vulnerable to leakage abuse attacks [
3], file injection attacks [
4], and technical attacks [
5]. In order to realize the dynamic update (add, delete, or modify) operations of encrypted data stored on cloud servers, some SSE schemes supporting the dynamic update operations of private data have been proposed [
6,
7,
8,
9]. Kamara et al. [
6] proposed an SSE scheme that supports dynamic data updating. The scheme realizes a sublinear search via extending an inverted index and uses a search array and delete array combined with other storage space to realize the dynamic update of the data. Subsequently, they proposed another method based on the keyword red–black tree index structure [
7] to support parallel keyword search and parallel data insertion and deletion. Guo et al. [
8] proposed a dynamic SSE scheme based on an inverted index. The scheme records the keyword position by the inverted index and realizes the data dynamics through updating the index. Xia et al. [
9] proposed a dynamic keyword search scheme for encrypted cloud data based on the tree index structure that supports multi-keyword sorting.
The above DSSE schemes do not consider the correctness and integrity verification of the returned matching result of the cloud server. In practice, the cloud server may return un-updated or incorrect matching results to the user in order to save computing resources. Therefore, users need to verify the results returned by the cloud server to ensure the correctness and integrity of the returned results. Some schemes [
10,
11,
12] use the timestamp function of the RSA accumulator to verify the search results, which generates accumulator bits for all files and indexes, which can be saved by the data owner. If the cloud server returns an un-updated result, the user can check it with the latest accumulator. The RSA accumulator [
13] can synthesize a large number of data into a fixed-size value to achieve member authentication, which can effectively reduce communication overhead. The RSA accumulator is applied to the compressed prefix tree structure to realize both efficient retrieval and result verification. However, the RSA accumulator is based on asymmetric cryptosystems, and the computational costs and verification costs are high. Some research teams have proposed verifiable schemes based on message authentication code (MAC) [
14,
15,
16], but in DSSE application scenarios, MAC cannot verify whether the results returned by the cloud server are the latest, that is, it cannot resist replay attacks [
17]. Ge et al. [
18] proposed a verifiable DSSE scheme based on cumulative authentication tags (AATs), which generates authentication tags for keywords and verifies the returned results through recording the number of global updates and the number of updates of a single file containing those keywords. Each update operation consumes only one label, which is highly efficient. However, the pseudo-random permutation and pseudo-random function are used in this scheme to replace and encrypt the keywords and global update times, which leads to key management problems. Relational authentication tags (RALs) [
19] are used to verify the relationship of query keywords in documents, and audit certificates can be generated without exposing sensitive information. However, the program requires third-party auditors to be involved in the search process. Therefore, how to effectively verify the correctness and integrity of the returned results is an urgent problem to be solved.
According to the research on the above schemes, the verification of search results in most schemes is not comprehensive and also involves key management problems. Therefore, how to effectively verify the correctness and integrity of search results as well as the security management of encryption keys are the problems we should focus on and solve.
In this paper, we explore how to use the public key cryptosystem in the DSSE scheme to verify the correctness and integrity of the result returned by the cloud server and to manage the encryption key effectively and securely.
The contributions of this paper can be summarized as follows:
(1) In order to efficiently realize the index update and index lookup, we constructed a bitmap index to store the relationship between the keywords and encrypted files. A verification list () was used to store the latest verification information of files containing keywords so that we can quickly obtain the latest verification information from the to perform the secure index update.
(2) In order to support the effective verification of dynamic data, we designed public-key-based cumulative verification information (), which is stored in the bitmap index. When the encrypted cloud data are dynamically updated, the verification information can be easily updated. In addition, the verification information contains the corresponding keywords’ information, which makes the verification information of various keywords different. Moreover, replay attacks can be resisted by the , that is, through verifying whether the returned result is up to date.
(3) In order to achieve forward security, the scheme places the node information in the bitmap index to avoid statistical attacks. When we need to search or update private data, the cloud server will return or change the whole column’s data so that malicious cloud software cannot obtain the relationship of the keywords and index.
(4) Based on the above description, we design a verifiable DSSE scheme based on the public key cryptosystem. The security, verification efficiency, and updating efficiency of the scheme are analyzed and explored. The results show that the scheme is safe and effective.
Organization: The rest of the paper is organized as follows. We summarize the related work in
Section 2. In
Section 3, the formulas and algorithms involved in the scheme are defined, including model construction, design objectives, etc. In
Section 4, we describe the construction of the scheme and the execution of the algorithm. The security analysis of the v-PADSSE scheme is given in
Section 5. In
Section 6, the implementation efficiency and updating efficiency of the program are analyzed and evaluated.
2. Related Work
With the development and application of the Internet of Things and cloud computing technology, many industries have chosen to outsource data to third-party clouds for storage. While cloud storage brings convenience to enterprises, it also brings new security challenges. Users cannot directly control the data stored in the cloud, so it is impossible to determine whether the data stored are complete and correct. To solve the problem of data verification, the research community has proposed some cloud storage verification schemes [
20,
21,
22] to audit and verify data in the cloud. In addition, before uploading private data to the cloud for storage, users need to encrypt it to prevent it from being accessed directly by cloud providers. However, in this case, how users perform search operations on encrypted cloud data is also an important problem to be solved. To solve the above problems, the research community proposes searchable symmetric encryption (SSE), which allows users to perform search operations directly on the ciphertext. Compared with the searchable encryption scheme of the public key encryption system [
23,
24], the efficiency of the SSE scheme has received more attention from the industry.
Dynamic SSE. Searchable encryption can be divided into two categories: symmetric key encryption [
25] and public key encryption [
26]. Song et al. [
2] first proposed a searchable encryption scheme that encrypts each keyword through constructing a special two-layer encryption structure. Some static SSE schemes, such as semantic search schemes [
27] and ranked keyword search schemes [
28,
29], are also proposed. However, in practice, industrial data are dynamically updated in real time, and the static SSE scheme does not support the dynamic update of encrypted cloud data, so it cannot meet the requirements of cloud storage data encryption at this stage. In order to support the dynamic update of encrypted data, Kamara et al. [
6] proposed a dynamic SSE scheme through constructing an extended inverted index to achieve sublinear search efficiency and CKA-2 security. Scheme [
30] proposed a dynamic SSE scheme which allows data owners to store privacy files in a way that the cloud server does not know the number of files through constructing a blind storage system on the cloud server. Guo et al. [
9] proposed a DSSE scheme based on the inverted index, which allows data users to search multiple phrases in a query request, and the scheme supports the ordering of search results. In recent years, a number of cloud-assisted schemes have been proposed for searchable encryption [
31]. Scheme [
32] utilized the searchable encryption technologies of keyword range search and multi-keyword search. Since the cloud is untrustworthy, scheme [
32] used Bloom filters and message verification codes to classify health information, filter out fake data, and check data integrity. In order to verify whether the cloud faithfully performs the search operation, a multi-user verifiable searchable symmetric encryption is proposed in scheme [
25]. Authorized users can search the data, verify the authenticity of the search results, and improve the accuracy of the search results. Since the access rights of authorized users are always valid, it is not secure. In order to automatically revoke a user’s access, the time key was introduced in [
33]. At the beginning of encryption, the key is encapsulated in ciphertext, which means that all users, including the data owner, are bound by the time period. Later, Yang et al. [
34] proposed a conjunctional keyword search with the function of specifying testers and enabling timed proxy re-encryption. It utilizes a time server to generate time tokens for users. In addition, it implements time-controlled access revocation to prevent authorized users from accessing future EHRs. Scheme [
35] proposed timed-release computational secret sharing and threshold encryption which used a time-release function instead of a time server to reduce overhead. Scheme [
36] proposed 0-encoding and 1-encoding to generate the time key. However, the retrieval efficiency of this work is low. In order to improve search efficiency, scheme [
37] with hidden data structures was proposed in the literature. The user expected to find more ciphertexts in one step. However, scheme [
37] reduced the number of computation-intensive operations without searching for at least two matching ciphertexts in just one step. This work cannot meet the need for a quick search and prevent authorized users from accessing future data. While all of the above work enables cloud-based search, there is still a challenge: the cloud is not a fully trusted entity and can collude with other entities to gain access to users’ private information.
Verifiable SSE. In practice, cloud servers are semi-trusted entities [
38] that may return incorrect or un-updated results to the data user in order to save on computing overhead. Miao et al. [
39] constructed the verifiable SE framework (VSEF), which can withstand internal KGA and achieve verifiable searchability. Wu et al. [
40] proposed a new authentication data structure based on homomorphic encryption and showed how to apply it to verify the correctness and integrity of search results. However, the verification proof in their scheme is generated by the cloud server, which can forge the proof to pass the verification when the cloud server is seen as an adversary. To avoid this, Chai et al. [
41] first proposed a verifiable keyword search scheme for encrypting cloud data, using hash functions to generate proof of document identity. Jiang et al. [
14] proposed a verifiable multi-keyword ranked search scheme based on encrypted cloud data, which realized an efficient keyword search through constructing the special data structure QSet. Yang et al. [
42] designed a forward-privacy VDSSE scheme with Bloom filters and message authentication codes to allow verification and support dynamic updates of outsourced data. Zhang et al. [
43] proposed a verifiable data structure based on a multi-set hash function, which guarantees forward security and realizes effective verifiable data updates. Gao et al. [
19] used relational authentication tags (RALs) to verify the relationship of the query keywords in the document, which can generate audit certificates without exposing sensitive information. However, the program requires third-party auditors to be involved in the search process. Merkle hash trees [
44] are used to validate data elements in large databases. Through adding data elements to the leaf node of the tree, the tree structure is constructed layer by layer from the leaf node to the root node, and finally the unique root node is obtained. A change to any element in the data set will make the root node change. A Merkle Patricia tree is proposed in GSSE [
45] to reduce the storage overhead of index structures in schemes based on Merkle hash trees. It reduces storage space through reducing the depth of the tree. However, in the above two scenarios, the proof provided by the cloud server to the DU is larger in scale, which brings more communication overhead. Chen et al. [
46] extend the Merkle hash tree [
47] to a searchable index tree to achieve efficient result verification, where search time grows sublinearly with the size of the data set, and verification is more efficient than the accumulator structure. In addition, verifiable DSSE has been implemented by schemes [
45,
48], but they either support a single keyword match search or use two rounds of communication in a single-user setup to achieve result verification. The RSA accumulator [
13] can aggregate a large amount of data into a fixed value to achieve member verification, which can effectively reduce communication overhead. The RSA accumulator is applied to the compressed prefix tree structure to realize the combination of efficient retrieval and result verification. Schemes [
10,
12,
13] all use an RSA accumulator to realize result verification for dynamic data. Most of the above VDSSE schemes are based on asymmetric key cryptography, and the results returned by the cloud server are verified using the public key signature.
Forward secure SSE. Forward privacy protection requires that update operations (insert or delete) performed by the data owner cannot be associated with previously performed search operations. Because the secret key is used for the deterministic encryption of private data in the DSSE scheme, it is easy for untrusted servers to obtain repeated queries and other information, which leads to information leakage (such as the number of keyword queries, etc.). If ORAM is introduced into the scheme, such problems will be avoided, but the communication cost and calculation cost are high, which causes the calculation and execution efficiency of DSSE to be exchanged through allowing some information to be leaked in actual use. However, such leaks are often attacked in different ways [
49,
50]. Bost et al. [
51] proposed to use a one-way trapdoor replacement to eliminate the correlation between the latest trapdoor and the previous trapdoor, that is, the latest trapdoor can search all encrypted documents, but the previous trapdoor cannot match the latest encrypted document. Cao et al. [
52] used the KNN method to construct the security index and trapdoor. This method is used to encode indexes and trapdoors so that even if the keyword is the same, the encoding is different. In this way, the cloud server can avoid obtaining the number of keyword queries and the association between keywords and encrypted data based on data user query operations, thereby protecting forward privacy. Li et al. [
53] used partitioning and pointer hiding technology to partition the secure index and extracted sub-keywords according to the original keywords as the keywords of partition search, then encrypted and hid the index block, which only needed to save the index table header identification and encryption key locally. Since the search token information is calculated using subkeywords, it is difficult for subsequent query keywords to be directly associated with the newly added encrypted document.
4. v-PADSSE Scheme Construction and Algorithm Description
We have summarized some common symbols used in the design of the v-PADSSE scheme, as shown in
Table 1.
4.1. Overview of the v-PADSSE Scheme
In order to solve the problem of correctness and integrity verification of the results returned by the cloud server, this paper designs a DSSE scheme based on public key verification (v-PADSSE). In this scheme, verification information is added to the security index and encrypted using the public key of the data user, so that the user can verify the returned results. Below, the construction of the v-PADSSE scheme is described in detail.
When constructing , the v-PADSSE scheme needs to include , , and . The needs to be encrypted with the user’s public key. Assume that the encryption function is . To prevent the cloud server from collecting statistics on the correlation between keywords and updated files, all index nodes in the column of the document representation of the updated file must be updated so that the update operation can hide the correlation between the ciphertext and the keywords. Therefore, the verification information . In addition, the data owner creates a verification list () locally, which stores the latest verification information of each keyword, including the number of files containing keyword N, the total number of updates of files containing keyword V, the document identification of files containing keyword , and its single set of file update times so that the latest update token can be generated directly when the update operation is performed. For different update operations (such as modify, add, and delete), needs to be performed in different operations. is calculated as follows:
When the data owner obtains the latest and performs the add operation, the number of updates of a single file is initialized to . If the newly added file contains the keyword w in the , the corresponding node in the needs to execute , , then add the document of the new file F and the number of updates to the node. If it does not, a new node needs to be added to the , where , , , and
When file F needs to be updated to a new file (both F and contain the keyword w), the data owner updates with the latest verification information for the keyword w in the corresponding node and executes and . .
File F contains keyword w, and the data owner updates , , and in the in the node where keyword w resides. In this case, the latest verification information . Additionally, the document and of file F are removed from the .
After the data owner performs different update operations, the corresponding update token is generated and sent to the cloud server, which updates the security index and ciphertext according to the update token information.
4.2. Secure Index Structure
In v-PADSSE, the data owner constructs the security index through using the bitmap index and constructs locally. The data owner generates the symmetric encryption key through executing the algorithm KeyGen, and the data user’s key pair is generated via KGC executing the PSKeyGen algorithm.
First, the data user publicly releases the public key, and the data owner, after obtaining the user’s public key, uses the symmetric private key
K to encrypt the privacy files and keywords and uses
to encrypt the verification information corresponding to the keywords. When the security index is firstly constructed, the data owner needs to initialize
through initializing
N to the number of files containing the keyword
w,
,
(1 ≤ i ≤ N). When the security index is constructed, the column header contains the keyword
, and the row header contains the document ID
. Middle node information includes
(the number of updates to the file containing the keyword
),
(indicating whether the document contains the keyword), and
(the result of the hash operation of the plaintext file). The security index structure is shown in
Figure 2.
The number of rows in the secure index is determined by the number of keywords, and each row is associated with a keyword. The number of column nodes is determined by the number of privacy files. When the data owner needs to perform an update operation, PK(), , and in all nodes in the whole column are modified according to the updated document identifier.
The data owner needs to obtain the latest verification information of the corresponding keyword when generating the update token. Therefore,
is designed in the scheme and is owned by the data owner. The latest
of keywords contained in each privacy file must be recorded in
. When updating, the data owner modifies the
in
to ensure that the
and
in the security index are updated simultaneously. The structure of the
is shown in
Figure 3, where
N indicates the number of files containing the keyword,
V indicates the total number of updates to files containing the keyword, and
indicates the set composed of the document identification of the file containing the keyword and the number of updates to the file.
4.3. Algorithm Description
In this section, we give the execution steps of the core algorithm of the v-PADSSE scheme and explain the related functions in detail.
In v-PADSSE, the core algorithms involved are IndexBuild (Algorithm 1), Building VL (Algorithm 2), GenToken, Search (Algorithm 3), Verify (Algorithm 4), UpdateToken (Algorithm 5), and Update (Algorithm 6). The IndexBuild Algorithm 1 is used by the data owners to construct the secure index with bitmaps. Among them, the header node stores keyword and document identification, and the middle node stores keyword-related verification information. After constructing the secure index, data users upload the secure index and encrypted files to the cloud server. Data users use the GenToken algorithm to generate a trapdoor and send it to the cloud server. The cloud server executes the Search Algorithm 3, performs the matching query on the security index according to the trapdoor, and returns the matching results and verification information to the data user. After receiving the results, the user executes the Verify Algorithm 4 to verify the correctness and integrity of the results. If not verified, refuse. To update the privacy file, the data owner obtains the verification information related to the keyword contained in the file from , runs the UpdateToken Algorithm 5 to generate the corresponding update token, and sends it to the cloud server. The cloud server executes the Update Algorithm 6 based on the updated token information to update the security index and related ciphertext. Below, we will give a detailed explanation of the above core algorithm execution process.
Initialization parameters:
- (1)
Obtain the keyword set contained in the plaintext file and save it in the keyword set , where n is the number of keywords.
- (2)
Obtain the number of files containing keyword w, , the total number of updates to files containing keyword w, , the set consisting of the document ID and the number of updates per file, and whether the file contains the of the keyword. through using the public key.
- (3)
Encrypt keyword w and plaintext file F using the symmetric key algorithm, compute , and hash file F to obtain .
Building secure index:
For each privacy file, document identification and keyword set are used to construct the bitmap index. The functions and parameters required to construct the security index are described as follows:
- (1)
Create the header node , where is the keyword set after symmetric key K encryption, and is the private file identification set.
- (2)
Create an intermediate node ; and indicate where verification information is stored in the bitmap index, ) is the number of file updates containing the keyword w, and indicates whether the privacy file corresponding to contains keyword w. If yes, ; if no, . is the result of hashing the privacy file F.
The algorithm process is as follows:
Algorithm 1 |
- 1:
DO: - 2:
N=F.num();V=F.num();v[N]=; - 3:
- 4:
; - 5:
; - 6:
Assuming a total of n keywords and n privacy files, the following procedure is executed times. - 7:
for ; ; i++ do - 8:
for ; ; j++ do - 9:
- 10:
- 11:
end for - 12:
end for - 13:
The above process of creating head nodes and middle nodes together forms the bitmap index, and assigns the bit according to whether the keyword is contained in the privacy file, and generates the security index I. - 14:
The data owner sends the generated security index I and ciphertext C to the cloud server. - 15:
Send to CS(I,C);
|
Building verification list:
The verification list is constructed and stored locally by the data owner. The is a single linked list, and the linked list node needs to contain the verification information of each keyword. The creation process is as follows.
The header node does not store any information, only the address of the first keyword.
: indicates the address of the first keyword.
. The middle node of the linked list stores the keyword
, the number of files
containing
, and the total number of updates to files
containing
. A set of the document identification of the privacy file containing
and the number of updates
to the file, and a pointer to the next keyword address
are also included.
Algorithm 2 Building VL |
- 1:
Because there are n keywords, the creation of the intermediate index node needs to be executed n times. - 2:
for ; ; i++ do - 3:
; - 4:
end for
|
The Search algorithm process is as follows:
Algorithm 3 |
- 1:
DO: - 2:
Data users execute trapdoor generation algorithm GenToken, use symmetric key K to encrypt keyword information, generate trapdoor , and send it to the cloud server. It is also sent to the data owner through the channel to obtain the latest verification information of the keyword. - 3:
- 4:
Send () to CS and DO; - 5:
CS: - 6:
if IndexSearch() = null then - 7:
Then return null; - 8:
else - 9:
Obtain keyword verification information. - 10:
verifyInfo = GetVerifyInfo(PK(),PK(flag),H(F)); - 11:
C(w) = search(docId); - 12:
end if - 13:
Return the search results and VI to the data user. - 14:
Return (verifyInfo,C(w));
|
The Verity algorithm process is as follows:
Algorithm 4 |
- 1:
DU: - 2:
The user uses the private key to decrypt the verification information. - 3:
- 4:
Decrypt the latest verification information returned by the data owner. - 5:
- 6:
- 7:
Check whether the keyword is contained in the privacy file according to the . If yes, proceed with the execution. If not, the privacy file will not be decrypted. - 8:
if flag% 2=0 then - 9:
Delete; - 10:
The verification information returned by the data owner compares with that returned by the cloud server. If the verification information is correct, the ciphertext is accepted and decrypted. If not, the ciphertext is rejected. - 11:
else - 12:
if . num and and H(Decrypt(K,C(w)))=H(F) then - 13:
Return Y; - 14:
else - 15:
return N; - 16:
end if - 17:
end if
|
The UpdateToken algorithm process is as follows:
Algorithm 5 |
- 1:
DO: - 2:
Add: - 3:
The data owner needs to add the privacy file F, he firstly obtains the keyword set and document identification in F and then encrypts F and keyword with K, is the number of updates corresponding to the privacy file. - 4:
- 5:
The is matched according to the keywords contained in F. If contained in F already exists, add (, 1) to the in the matched node, and do , in this node. If not, add a new node () to . - 6:
Assume there are k keywords in file F, it needs to be executed k times. - 7:
for ; ; i++ do - 8:
if search() is true then - 9:
N = N + 1;V = V + 1; - 10:
Add(docIdF,1); - 11:
else - 12:
- 13:
end if - 14:
end for - 15:
The updated verification information is encrypted using public key , and generates the added token . If w is contained by the file F, the value of is in node , If not, the ’s value is . - 16:
- 17:
- 18:
- 19:
- 20:
Delete: - 21:
If the data owner needs to delete file F, he obtains the set of keyword in file F and uses K to encrypt the keywords and the deleted file F. The document identification of the file F is . - 22:
- 23:
Search for in , and update N, V and the set in the corresponding node according to the keyword . This procedure takes k times. - 24:
for ; ; i++ do - 25:
if search(w) is true then - 26:
N=N-1;V=V-v[docIdF]; - 27:
Removes from the document identification set in matched node; - 28:
Delete - 29:
else - 30:
Return error; - 31:
end if - 32:
end for - 33:
The updated verification information is encrypted using the public key, and the deleted token is generated and sent to the cloud server. - 34:
- 35:
- 36:
- 37:
Send() to CS; - 38:
Modify: - 39:
If the data owner needs to modify the privacy file, he uses K to encrypt the modified file F and keywords contained in F. - 40:
- 41:
Update in according to the keywords contained in file F. After updating, the verification information is encrypted using the public key, and the modified token is generated and sent to the cloud server. - 42:
Since the modified file F contains k keywords , the following procedure needs to be performed k times. - 43:
for ; ; i++ do - 44:
if search(w) is true then - 45:
V=V+1;v[docIdF]=v[docIdF]+1; - 46:
end if - 47:
end for - 48:
- 49:
- 50:
During the modification, the remains unchanged. - 51:
- 52:
Send() to CS;
|
Algorithm 6 |
- 1:
CS: - 2:
After receiving the updated token from the data owner, the cloud server performs operations on the security index I and ciphertext C according to the token. - 3:
if .operate = “add” then - 4:
Add a new column to the bitmap index, or add a new row if the keyword contained in F does not exist in the bitmap index. - 5:
BuildHeadNode(docIdF); - 6:
Assume there are n keywords in the bitmap index, that is, n rows, which need to be executed n times. - 7:
for ; ; i++ do - 8:
- 9:
end for - 10:
if does not exist in the bitmap index then - 11:
- 12:
Assume there are n document identifications in bitmap, the following procedure needs to be performed n times. - 13:
for ; ; i++ do - 14:
- 15:
end for - 16:
end if - 17:
addFile(C’); - 18:
else if .operate = “delete” then - 19:
Delete All nodes in the bitmap index if the value of column head node is . - 20:
deleteColumn(deleteId); - 21:
deleteFile(C’); - 22:
else if .operate = “modify” then - 23:
Assume there are n keywords in the bitmap index, all nodes need to be changed if the value of column head node is . - 24:
for ; ; i++ do - 25:
- 26:
end for - 27:
changeFile(C,C’); - 28:
end if
|
In conclusion, when the cloud server performs the update operation, the updated column needs to be modified. At this time, the existence of the and the correct update of the will not affect the verification result, even if the keyword w which is not contained in F changes. Moreover, since the update operation involves the change of an entire column in the secure index, it is also a good way to hide the correlation between the keywords and the updated file.
4.4. Comparison
In this section, we compare our scheme with
[
51], Ge’s scheme [
18], Gao’s scheme [
19], and Zhang’s scheme [
43]. All of those schemes can ensure the verifiability of search results. Assume there are n files and m keywords in total. For simplification, we assume that each search returns n files. We neglect the communication costs and only compare the computation overhead in different phases of these schemes.
Table 2 shows the results of the comparison.
As can be seen from
Table 2, the efficiency of our scheme is close to Zhang’s scheme [
43], Gao’s scheme [
19], and Ge’s scheme [
18] in terms of search and verify operations. However, in the update process, our scheme and Ge’s scheme [
18] are better than others; the time complexity of the two schemes is O(n).
5. Security Analysis
In this section, we will analyze the security of the v-PADSSE scheme in two aspects: update reliability and verifiability.
5.1. Update Reliability Analysis
Due to the cloud server being unreliable in v-PADSSE, it may not update the security index and ciphertext after receiving the update request from the data owner in order to save computing or storage resources. We are going to prove that the Verify algorithm outputs “N” when the cloud server returns un-updated results.
Assume that the result returned by the cloud server is , and the correct result and verification information is . The number of files containing the keyword w is , and the total number of updates to files containing the keyword w is . In addition, the scheme proposes that when the data user performs a query, it will apply to the data owner for the latest verification information , for the keyword. At this time, we will consider the following three scenarios to prove the reliability of the v-PADSSE.
(1) VI = VI’, C(w) ≠ C’(w)
If the cloud server updates only the security index but not the ciphertext, the returned result is .
;
;
But the return verification information contains . At this time, we decrypt the return ciphertext to get . If you want to pass the verification, then ; that is, . If the plaintext is the same, the result after encryption with the same key, which is inconsistent with the assumption that . The above calculation shows that if only the security index is updated without the ciphertext, the Verify Algorithm 4 cannot output ‘Y’ when the data user performs verification.
(2) VI ≠ VI’, C(w) = C’(w)
In this case, the cloud server only updates the ciphertext but not the verification information in the security index.
If we want the Verify Algorithm 4 to output ‘Y’, that means
.
Since the verification information in the security index is not updated, if and the rest are equal, the total update times will output ‘N’ when verifying the returned results. If and the rest are the same, the number of returned results is not equal to N, and the Verify Algorithm 4 will output ‘N’. If the number of is the same as the number of in the returned , it is also necessary to ensure that the corresponding to the two are the same, which indicates that the attacker needs to obtain the verification information from the data owner, but is private to the data owner, and the probability of information leakage can be almost ignored. If and the rest are the same, in this case , the Verify Algorithm 4 will output ‘N’.
(3) VI ≠ VI’, C(w) ≠ C’(w)
Assume that the cloud server does not update the security index and ciphertext after receiving the update token from the data owner. If the data user performs a Search operation (Algorithm 3), the cloud server returns the un-updated results to the user. In the system model architecture (
Figure 1), before performing the Search operation, the data user needs to send the latest
request for the searched keyword to the data owner. The data owner searches
and returns the latest
of the keyword to the data user. After receiving the
, the data user uses the Verify Algorithm 4 to compare the un-updated
with the latest. If any inconsistency is found, the Verify Algorithm 4 directly outputs ‘N’.
The above shows that if the cloud server does not update the security index and ciphertext to save computing or storage resources, our scheme can verify the verification information and return results through the verification algorithm to find the un-updated situation in time. Therefore, the v-PADSSE scheme proposed by us meets the updated reliability.
5.2. Verifiability Analysis
In this section, we assume that the attacker can forge so that the returned results pass the Verify Algorithm 4. Assuming that the correct results and verification information are , we will compare the forged information with the real information to prove that the probability of the attacker passing the Verify Algorithm 4 through forging verification information is negligible.
We will consider the following three scenarios to demonstrate the verifiability of the v-PADSSE.
(1) VI = VI’, C(w) ≠ C’(w)
Attackers forge , while proper verification information . This makes:
;
In this case, because N, V, , and are encrypted using the user’s public key, the cost of forgery is relatively small. At this point, we can consider:
;
According to the properties of the hash function, is certain. In this case, , which is not consistent with the assumption. Therefore, the probability of an attacker passing the Verify Algorithm 4 in this way is almost negligible.
(2) VI ≠ VI’, C(w) = C’(w)
The attacker forges the ciphertext to be consistent with the correct ciphertext. According to the design of the v-PADSSE, the verification information contains . At this point, we can consider:
;
According to the above situation, ; when , if the number of is not equal to the number of , the known probability of information leakage of can be ignored. In this case, the probability of the number of returned results being N is negligible, and the Verify Algorithm 4 will output ‘N’. Therefore, the probability of the above situation passing the Verify Algorithm 4 can also be ignored.
(3) VI ≠ VI’, C(w) ≠ C’(w)
In this case, the data user will spend a communication after sending the trapdoor to the data owner to request the latest verification information N, V of the keyword. Therefore, under this assumption, as long as any of the verification information is different, or the encrypted files returned are different, is inconsistent, which will make the Verify Algorithm 4 output ‘N’. Therefore, the probability of the attacker passing the verification can be ignored under this condition.
The above three scenarios show that if a malicious attacker forges ciphertext or verification information, our scheme can also determine which information is forged and give feedback. Therefore, our scheme satisfies verifiability.
6. Performance and Experiments
In this section, we will analyze the performance of the proposed v-PADSSE scheme. The basic logic of the experiment was written in C++, and the running environment was Windows 10 equipped with a 2.40 GHz 12th Gen Intel(R) Core(TM) i7 CPU and 4.0 GB RAM.
Index construction efficiency. We evaluated the bitmap index proposed in the scheme and the verification list construction efficiency.
Figure 4 shows the time spent to build the security index and verification list when the number of keywords is set to 10,000 and the number of privacy files changes from 1000 to 10,000. In the scheme, the security index adopts the form of a bitmap index, the number of rows is the number of keywords, and the number of columns is the number of document identifiers of privacy files. When the number of rows in the bitmap index is fixed and the number of private files increases, the number of columns in the bitmap also needs to increase, and the time cost of building a secure index also increases.
Figure 5 shows the time spent to construct the security index and verification list when the number of privacy files is 10,000 and the number of keywords contained in the privacy files changes from 1000 to 10,000. When the number of secure index columns is fixed, the increase in the number of keywords leads to an increase in the number of rows in the bitmap index, and the time cost of building the secure index Algorithm 1 also increases. During secure index construction, the number of nodes is related to the number of keywords and privacy files. Therefore, when the number of privacy files or keywords increases, the number of columns or rows of the bitmap index will also increase, and the time cost of building a security index will also increase. Since the number of nodes in
is only related to the number of keywords contained in the privacy file, when the number of keywords increases, the number of nodes in
increases, and the time cost of building
(Algorithm 2) increases at the same time.
Update token generation efficiency.
Figure 6 shows the time cost of generating update tokens (Algorithm 5) (modify token, delete token, and add token) in the scheme. Since the generation of the update token involves the document identification and the number of keywords contained in the privacy file after the document identification of the modified file is determined, the latest update times of the file need to be obtained from
. Therefore, the generation efficiency of update tokens is linearly related to the number of nodes in
, and the higher the number, the longer the token generation time. However, since the added token may involve increasing the number of
nodes, the generation time will be slightly longer.
Search efficiency analysis. The
obtains the latest verification information and generates an update token. After receiving the update token, the cloud server searches for it in the security index.
Figure 7 shows the time cost of performing a search operation in the security index when the number of keywords is 10,000 and the number of private files changes from 1000 to 10,000. It can be seen that when the number of rows in the security index is fixed, that is, the number of keywords is fixed, the time cost of searching the index increases linearly with the increase in the number of columns, that is, the number of privacy files.
Figure 8 shows the time cost of searching (Algorithm 3) the
and the secure index when the number of privacy files is 10,000 and the number of keywords changes from 1000 to 10,000. Since the number of nodes in
is equal to the number of keywords, the search time also increases linearly when the number of keywords increases. When the number of columns in the security index is fixed and the number of rows in the bitmap index increases as the number of keywords increases, the search time cost increases.
In this section, we compare this scheme with what is generally regarded as the most typical verifiable SSE scheme [
54] in terms of verification efficiency and update efficiency.
Verify efficiency analysis. We made a comparative analysis of the verification (Algorithm 4) time cost of our scheme and scheme [
54]. As can be seen from
Figure 9, the verification time cost of our scheme is lower than scheme [
54]. Scheme [
54] used a bilinear mapping accumulator to verify search results, which is based on asymmetric key encryption. Our scheme is based on a
; the number of file updates is stored in the
, which is more efficient than the accumulator. As shown in
Figure 9, when the number of privacy files containing search keywords is 200, the verification time cost of scheme [
54] is roughly 5 ms, and the verification time cost of our scheme is 0.3725 ms. When the number of search keywords is 2000, the verification time cost of scheme [
54] is about 48 ms, and the verification time cost of our scheme is about 9.2437 ms. As can be seen from
Figure 10, the number of CPU clock cycles of our scheme is lower than that of scheme [
54]. Therefore, the verification efficiency of our scheme is higher than that of scheme [
54].
Update efficiency analysis. After receiving the update token, the cloud server needs to perform the corresponding Update operation (Algorithm 6) on the security index. As can be seen from
Figure 11,
Figure 12 and
Figure 13, the number of columns or rows in the security index needs to be increased due to the add and modify operation, and the time cost is slightly larger than the delete operation. The cloud server deletes the corresponding column in the security index when it performs the delete operation, and the time cost is lower. As can be seen from
Figure 11,
Figure 12 and
Figure 13, the update efficiency of our scheme is better than scheme [
54].
7. Discussion
In this section, we analyze the advantages and disadvantages of schemes [
18,
19,
43,
51], as shown in
Table 3.
[
51], Zhang’s scheme [
43], and Gao’s scheme [
19] realize the dynamic update and searchability of data through constructing an inverted index. If the update file contains many keywords, the update efficiency is relatively low. None of the above four schemes involve key management securely.
[
51] uses a one-way trap gate to realize forward security, but the calculation cost is high. Zhang’s scheme [
43] is improved on the basis of
[
51], using random states to achieve forward safety and improve efficiency. However, the correctness of the returned results is not verified; that is, the update reliability proposed in this scheme is not satisfied. Gao’s scheme [
19] requires third-party TPA to verify the integrity of search results, which requires TPA to honestly implement the verification algorithm, which requires a trap gate, data block number, RAL, and authenticator to perform related calculations, which is relatively complex. Both Ge’s scheme [
18] and our scheme used a bitmap to construct the security index, which has high updating efficiency. However, the accumulated authentication tags (AATs) in Ge’s scheme [
18] contain ciphertext data blocks, which consume additional storage resources. In our scheme, the verification process does not involve complex operations, and the verification information is simple, which makes the verification efficiency high, but the scheme also needs to consume communication resources once more. In conclusion, compared with the above schemes, our scheme is relatively efficient in the process of searching, updating, and verifying.
8. Conclusions
In this paper, we first studied the research status of the DSSE scheme and analyzed the advantages and disadvantages of different schemes. Since most schemes do not involve key management, we proposed a verifiable DSSE scheme based on the public key cryptosystem which can realize secure key management. In
Section 3, we defined the security model, design goals, core algorithm, and security analysis of the V-PDSSE scheme. In
Section 4, we described the bitmap index and verification list construction of the scheme in detail and explained the core algorithm steps of the scheme. Finally, we compared the time complexity with schemes [
18,
19,
43,
51] to prove that the implementation efficiency of our scheme is high. In
Section 5, a security analysis was carried out on the reliability and verifiability of our scheme to prove that our scheme meets the security requirements. In
Section 6, we tested the efficiency of the core algorithms and compared the efficiency of scheme [
54] in security index construction, verification list construction, searching, search result verification, and updating. The results show that our scheme has high efficiency and strong feasibility. In
Section 7, we analyzed the advantages and disadvantages of schemes [
18,
19,
43,
51].
Compared with previous schemes, the functional design of this scheme is more comprehensive. The verification process does not involve complex operations, the verification information structure is simple, and the execution efficiency is high. Our scheme can solve the problems of dynamic searchability, forward security, integrity, and correct verification of search results and key management well. In future work, given the rapid development of quantum computers and verifiable DSSE schemes, how to deal with quantum attacks effectively will become a key direction of our research.