Next Article in Journal
Synchronizing Many Filesystems in Near Linear Time
Previous Article in Journal
Task Automation Intelligent Agents: A Review
Previous Article in Special Issue
A Game Theory-Based Model for the Dissemination of Privacy Information in Online Social Networks
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

How Can We Achieve Query Keyword Frequency Analysis in Privacy-Preserving Situations?

College of Information Science and Technology/College of Cyber Security, Jinan University, Guangzhou 510632, China
*
Author to whom correspondence should be addressed.
Future Internet 2023, 15(6), 197; https://doi.org/10.3390/fi15060197
Submission received: 16 April 2023 / Revised: 26 May 2023 / Accepted: 26 May 2023 / Published: 29 May 2023
(This article belongs to the Special Issue Cryptography in Digital Networks)

Abstract

:
Recently, significant progress has been made in the field of public key encryption with keyword search (PEKS), with a focus on optimizing search methods and improving the security and efficiency of schemes. Keyword frequency analysis is a powerful tool for enhancing retrieval services in explicit databases. However, designing a PEKS scheme that integrates keyword frequency analysis while preserving privacy and security has remained challenging, as it may conflict with some of the security principles of PEKS. In this paper, we propose an innovative scheme that introduces a security deadline to query trapdoors through the use of timestamps. This means that the keywords in the query trapdoor can only be recovered after the security deadline has passed. This approach allows for keyword frequency analysis of query keywords without compromising data privacy and user privacy, while also providing protection against keyword-guessing attacks through the dual-server architecture of our scheme. Moreover, our scheme supports multi-keyword queries in multi-user scenarios and is highly scalable. Finally, we evaluate the computational and communication efficiency of our scheme, demonstrating its feasibility in practical applications.

1. Introduction

With the exponential growth in massive data generated by the internet of things [1,2], traditional storage systems are struggling to meet the storage demands. As a result, cloud storage technology [3] based on cloud computing has gained popularity among businesses and individuals for its scalability and flexibility.
Despite the rapid and recent development of cloud storage technology, most businesses and individuals prefer to encrypt their important data before uploading it to a cloud server, due to concerns about data security. This does effectively guarantee the security of data; however, it also makes it difficult to retrieve and analyze the encrypted data.
Searchable encryption (SE) is a robust solution that addresses the challenge of searching encrypted data while preserving data confidentiality. The concept of public key encryption with keyword search (PEKS) was introduced by Boneh [4] et al. in their seminal work. A PEKS scheme typically involves three entities: a data owner, a data user, and a cloud server. To share data with data users, the data owner extracts keywords from files and encrypts them with the public key of the designated data user, thereby generating searchable ciphertexts. The data owner then sends the searchable ciphertext, together with the corresponding files, to the cloud server. Subsequently, the designated data user is the only entity capable of generating a query trapdoor, by using their private key and specifying a keyword of interest. The generated query trapdoor is then sent to the cloud server to retrieve the associated file. Finally, the cloud server will match the searchable ciphertext with the query trapdoor one-by-one, and send the matching result to the data user. Since SE was proposed, it has been applied in many fields, such as online medical care [5,6], cloud manufacturing [7], etc.
In many instances of cloud storage applications, counting the frequency of keyword queries is important to improve the search experience of users. For example, in a cloud manufacturing system [7], an upstream supplying plant (the data owner) will encrypt some production data from its workshop and store the ciphertext on a cloud server. Subsequently, the service platform providing the ciphertext retrieval service can respond to the search request of the purchasing company (the data users), and provide the supply factory that meets their needs. If another service platform that provides keyword frequency analysis services can analyze the frequency of the query keywords while ensuring the privacy of the purchasing company, the user’s search experience can be significantly optimized (e.g., by placing the data of the most frequent keywords in the best position in the database). Meanwhile, the cloud server shares analysis results with upstream supply factories, which can also help factories to make targeted production adjustments, thus greatly improving production efficiency. In our solution, the query token used by the purchasing company (the data users) to query is embedded with a timestamp specified by the user themselves, which is similar to the user setting a validity period for their query records, during which the query operation will not compromise the user’s privacy. After this validity period, a third party can recover the keyword information of the query token, but cannot associate this keyword information with the user’s identity. In this way, we propose a PEKS scheme that perfectly combines security with keyword frequency analysis. Meanwhile, the scheme has many advantages: it is multi-user, has the capacity for multi-keyword queries, and can resist KGA (keyword-guessing attack).

1.1. Our Contribution

In this work, we propose a PEKS scheme with keyword frequency analysis (KFA-PEKS), which takes into account privacy protection and frequency analysis of keywords. Furthermore, KFA-PEKS supports multi-user participation and multi-keyword search; therefore, it is suitable for distributed IoT scenarios such as cloud manufacturing [7]. Overall, our contributions in this paper are as follows:
  • We propose a novel KFA-PEKS scheme that combines searchable encryption with keyword frequency analysis, while maintaining high levels of security and scalability. Our solution allows the server to periodically analyze the frequency of queried keywords without linking this information to the user’s identity, thereby preserving user privacy.
  • We conduct a comprehensive functional comparison of our KFA-PEKS scheme with existing SE schemes. The comparison results demonstrate that our scheme is unique in achieving multi-keyword search, resistance to KGA, and frequency analysis of query keywords within a multi-writer/multi-reader model. Furthermore, the complexity analysis of our scheme indicates that it is highly practical and applicable.
  • We implement our KFA-PEKS scheme using Python, and perform extensive security evaluations, including computational and communication overhead analysis, to demonstrate its feasibility and effectiveness. Our experimental results highlight the advantages of our KFA-PEKS scheme compared to other PEKS schemes, showcasing its superior performance and suitability for real-world applications.

1.2. Paper Organization

In Section 2, we will review the relevant literature and related work. In Section 3, some preliminaries required to understand our proposed KFA-PEKS will be given. In Section 4, we will formally describe the KFA-PEKS scheme. Next, in Section 5, we will present a concrete construction of KFA-PEKS and provide a proof of its security. The efficiency of our construction will be analyzed theoretically in Section 6, and we will showcase its practicality through experimental results. Finally, in Section 7, we will conclude the paper as a whole.

2. Related Work

Since the concept of SE was first introduced in [8], SE has been divided into two categories: symmetric searchable encryption (SSE) [9,10,11] and PEKS [4,12,13,14,15]. The two have different application scenarios and construction methods. SSE generally considers the use of a single user, which is equivalent to establishing a personal encrypted cloud disk, and relies on symmetric encryption algorithms for scheme construction; PEKS, which mainly relies on public key cryptography algorithms, usually considers multi-user scenarios, such as mail systems or multi-person file-sharing systems.
The first PEKS scheme was proposed by Boneh et al. [4], wherein the data owner could use the public key of a specified data user, so that the user could use their key to generate a trapdoor for query keywords to search encrypted data. Since then, many PEKS schemes have been proposed. Among these schemes, examples in the literature, such as [16,17], support multi-keyword search; however, these schemes cannot support multiple data owners and data users at the same time. Sun et al. propose a PEKS scheme [18] supporting multiple keyword queries and multiple users, which combines a data structure representing a keyword/identity representation (T-set) in [11], and ciphertext-policy attribute-based encryption (CP-ABE) in [19]. In their proposed scheme, the data owners grant keys to the data users, requiring each data user to maintain a set of keys in order to access data that are outsourced by different data owners. In addition, because this scheme does not deal with the relationship between the ciphertext and keywords, it brings a huge storage overhead to each entity in the scheme. In 2019, Xu et al. [20] proposed a lattice-based PEKS scheme that was derived from a blind identity-based encryption (blind IBE) [21] scheme, by substituting identities with keywords. Their construction involves an identity-based PEKS that maps the identity of the data owner to a matrix, allowing the data owner to encrypt data with the identity of the data user. In 2021, Liu [22] et al. combined a subset decision mechanism on a distributed two-trapdoor public key cryptosystem [23] to construct a PEKS scheme to be applied in distributed systems. In this scheme, both the data owner and data user have their public and private key pairs, and can generate corresponding searchable ciphertexts and trapdoors with their respective public keys. However, this scheme, similarly to all previously proposed PEKS schemes, loses the availability of data; in addition, it cannot expand the potential application scenarios of public key searchable encryption, because it does not consider keyword frequency analysis of query keywords.
In 2019, Xu et al. [24] proposed the first PEKS scheme based on blind IBE [21], which can accomplish keyword frequency analysis under the premise of protecting user privacy. Unfortunately, due to its single-server setup, this solution is not immune to KGA [25,26]. In addition, it is difficult to extend the query method from a single keyword to multiple keywords. KGA is a common type of attack facing PEKS schemes. Within KGA, an attacker can generate searchable ciphertexts for any desired keyword using the public key of the data user, and can then test these ciphertexts against the search trapdoor. To mitigate KGA, various cryptographic primitives have been proposed, including public key authenticated encryption with keyword search (PAEKS) [27,28,29] and public key encryption with fuzzy keyword search (PEFKS) [30]. Based on [22], we propose a PEKS that can achieve keyword frequency analysis while protecting data privacy and user privacy. In addition, this PEKS is suitable for multi-user distributed scenarios and is resistant to KGA.

3. Preliminaries

For ease of reading, we refer to [22] for some symbols and related terms, and present the following in Table 1, where W is a universal keyword set with keyword number η , W = { ω η 1 ,   ,   ω 0 } . After that, we have a decimal number T { 0 ,   ,   2 η 1 } to represent the relationship between each keyword set W and a document D. This is expressed as follows:
T i = 1 , i f ω i i s c o n t a i n e d i n D , 0 , o t h e r w i s e
W T = { ω i | ω i W , T i = 1 } denotes the keyword set corresponding to T. When a data user (DU) initiates a query request, we use a decimal number t { 0 ,   ,   2 η 1 } to represent the relationship between each keyword set W and DU’s interest. This is expressed as follows:
t i = 1 , i f D U i s i n t e r e t s e d i n ω i , 0 , o t h e r w i s e
W t = { ω i | ω i W , t i = 1 } denotes the keyword set corresponding to t. In the description that follows, t matches T equivalently to W t W T .

3.1. Pseudorandom Function

A function F : { 0 , 1 } κ × { 0 , 1 } l { 0 , 1 } l is called a pseudorandom function (PRF) [31] if for all PPT adversary A , its advantage | P r [ A F ( k , · ) ( 1 κ ) = 1 ] P r [ A R ( · ) ( 1 κ ) = 1 ] | υ ( κ ) , where k $ { 0 , 1 } κ , R is a random function denoted as: { 0 , 1 } l { 0 , 1 } l , and υ ( κ ) is negligible.

3.2. Subset Decision Mechanism (SDM)

Given a universal set W, and two subsets and W T and W t , and these sets being represented in decimal integers and binary representations, the role of the SDM [22] is to perform some calculations on these decimal integers and binary representations to determine whether W t W T . In this paper, the SDM is used as an important component of the matching process. In the next description, we let W denote the set of all keywords whose binary and decimal forms are denoted as ( 1 ,   ,   1 ) and S U M , and W T denote the set of keywords corresponding to the document uploaded by the data owner (DO), which is represented in binary and decimal form as ( T η 1 ,   ,   T 0 ) and T, respectively. Furthermore, we let W t denote the set of keywords corresponding to the data user’s (DU’s) query request in binary and decimal form as ( t η 1 ,   ,   t 0 ) and t, respectively.
The key for the SDM to determine W t W T is to ensure that there is no i that satisfies t i = 1 and T i = 0 . In the SDM, the inputs are a universal keyword set W = { ω η 1 ,   ,   ω 0 } with two subsets WT and Wt such that W T , W t W . The outputs are a result value R , which represents whether W t W T . The specific process can be described as follows. We first compute the complement ( ¬ T η 1 ,   ,   ¬ T 0 ) of the binary of T and the corresponding integer ¬ T . Then, perform a bitwise addition of ¬ T i and t i to obtain C i . Finally, if there is no C i = 2 , then, W t W T ; otherwise, W t W T . The working mechanism of the SDM is described formally in Algorithm 1.
Algorithm 1 Subset Decision Mechanism Source: [22], Algorithm 1 Subset Decision
Input:
  A universal set W = { ω η 1 ,   ,   ω 0 } , two subsets W t , W T W .
Output:
  Whether W t W T .
  1:
Compute the binary representations ( T η 1 ,   ,   T 0 ) , ( t η 1 ,   ,   t 0 ) of W T , W t .
  2:
Compute the complement ( ¬ T η 1 ,   ,   ¬ T 0 ) of ( T η 1 ,   ,   T 0 ) .
  3:
Set i = 0 , R = 1 .
  4:
while  i < η  do
  5:
    C i = ¬ t i + T i ;
  6:
    D i = 2 C i ;
  7:
    R = R · D i ;
  8:
end while
  9:
if  R = 0  then
10:
   return  W t W T
11:
else
12:
   return  W t W T
13:
end if
In addition, we are able to accelerate the execution of the SDM when W t W T . That is, when W t W T , there is always s u m = t + ¬ T S U M = 2 η 1 . We describe this process in Algorithm 2.
Algorithm 2 Subset Decision Mechanism With Modification. Source: [22], Algorithm 1 Subset Decision with Modification
Input:
  A universal set W = { ω η 1 ,   ,   ω 0 } , two subsets W t , W T W .
Output:
  Whether W t W T .
1:
We additionally compute the decimal integers T , t of W T , W t , S U M = 2 η 1 of W, ¬ T = S U M T and s u m = ¬ T + t after executing steps 1 and 2 in Algorithm 1.
2:
if  s u m > S U M  then
3:
   return  W t W T
4:
else
5:
   Go to Step 3 of Algorithm 1.
6:
end if

3.3. Secure Bit-Decomposition (SBD)

For a ciphertext [ c ] p k encrypted with p k , SBD [32] can encrypt each bit of c, and will not leak information about c to the two parties involved in the calculation in SBD. A more standardized expression of the SBD protocol is as follows.
SBD ( [ c ] p k ) ( [ c η 1 ] p k ,   ,   [ c 0 ] p k ) .

3.4. DT-PKC

A distributed two-trapdoor public key cryptosystem (DT-PKC) [23] is a toolkit that can securely handle common integer operations across different cryptographic domains, meaning it will become an important part of the KFA-PEKS.

3.4.1. Basic Structure

Our scheme takes a similar introduction to DT-PKC as that of [22]. Among them, since this paper needs to use the joint decryption of dual servers, the P S D e c 1 algorithm and P S D e c 2 algorithm in [23] are retained and rewritten as P M D e c 1 algorithm and P M D e c 2 algorithm in this paper. Furthermore, this paper does not involve the joint decryption between users, so the PWDec1 and PWDec2 algorithms are omitted. It is important to emphasize that this change will not affect the content that follows. The DT-PKC infrastructure consists of eight algorithms: K e y G e n , E n c , U d e c , M D e c , M k e y S , P M D e c 1 , P M D e c 2 , and C R . The definition of DT-PKC accepted in [23], with slight modifications, is as follows:
  • K e y G e n : Given the security parameter κ , KGC finds two large primes p, q such that L ( p ) = L ( q ) = κ , where L ( x ) refers to the length of the parameter x, then computes N = p q , p = ( p 1 ) 2 , q = ( q 1 ) 2 , of which p and q are two strong primes. Simultaneously, KGC also chooses a generator of the order ( p 1 ) ( q 1 ) 2 , and chooses a random number θ such that θ [ 1 , N 4 ] . Finally, KGC obtains the public key p k i = ( N , g , h i = g θ ) and the private key s k i = θ for user i, and computes the master key M S K = λ = l c m ( p 1 , q 1 ) 2 , where l c m ( a , b ) refers to finding the least common multiple of a and b.
  • E n c r y p t i o n ( E n c ) : Enter the plaintext m and the public key p k i and choose a random number r ( r [ 1 , N 4 ] ) to generate the ciphertext [ m ] p k i = { C i , 1 , C i , 2 } , where C i , 1 = g r θ ( 1 + m N ) m o d N 2 ; T i , 2 = g r m o d N 2 .
  • Decryption With User’s Private Key (UDec): Enter the ciphertext [ m ] p k i and the private key s k i to generate the corresponding plaintext m as follows:
    m = C i , 1 C i , 2 m o d N 2 ,
    m = m 1 N .
  • Decryption With Master Key (MDec): Enter the ciphertext [ m ] p k i and the master key M S K to generate the corresponding plaintext m by first calculating:
    C i , 1 λ m o d N 2 = g λ · θ · r ( 1 + m N λ ) m o d N 2 = ( 1 + m N λ ) .
    Then, since g c d ( λ , N ) = 1 , we are able to obtain m by the following expression:
    m * = C i , 1 λ m o d N 2 ,
    m = m * 1 N · λ 1 m o d N .
  • Master Key Splitting (MkeyS): Enter the master key M S K to generate two partial master keys M S K 1 = λ 1 and M S K 2 = λ 2 such that λ 1 + λ 2 0 m o d λ and λ 1 + λ 2 1 m o d N 2 .
  • Partial Decryption With Partial Master Key Step One (PMDec1): Enter the ciphertext [ m ] p k i and the partial master key M S K 1 to generate the partial ciphertext C T i 1 as follows:
    C T i 1 = ( C i , 1 ) λ 1 = g r θ λ 1 ( 1 + m N λ 1 ) m o d N 2 .
  • Partial Decryption With Partial Master Key Step Two (PMDec2): Enter the partial ciphertext C T i 1 , the ciphertext [ m ] p k i and the partial master key M S K 2 to generate the corresponding plaintext m by first calculating:
    C T i 2 = ( C i , 1 ) λ 2 = g r θ λ 2 ( 1 + m N λ 2 ) m o d N 2 .
    Then, we can obtain the plaintext m by computing:
    C T = C T i 1 · C T i 2 ,
    m = C T 1 N .
  • Ciphertext Refresh (CR): Enter the ciphertext [ m ] p k θ and a random number r Z N to generate another ciphertext [ m ] p k θ = { C i , 1 , C i , 2 } , where
    C i , 1 = T i , 1 · h i r ;   C i , 2 = C i , 2 · g r .

3.4.2. Sub-Protocols

DT-PKC is capable of deriving several sub-protocols, which can be found in [23]. Since the sub-protocols S A D and S M D of DT-PKC introduced in [22] are involved in our scheme, only the sub-protocols S A D and S M D are briefly described later. The following is a brief description of the sub-protocols S A D and S M D :
  • Secure Addition Protocol across Domains (SAD): Enter the two ciphertexts [ m 1 ] p k θ 1 and [ m 2 ] p k θ 2 , the partial master keys M S K 1 and M S K 2 , and the public keys p k θ 1 , p k θ 2 and p k θ 3 to generate the addition of two ciphertexts [ m 1 + m 2 ] p k θ 3 in different encryption domains.
  • Secure Multiplication Protocol across Domains (SMD): Enter the same inputs as the SAD to generate the multiplication of two ciphertexts [ m 1 · m 2 ] p k θ 3 in different encryption domains.

4. KFA-PEKS

4.1. System Model

The following entities are included in our scheme: a key generation center (KGC), a matching server (MS), a keyword frequency server (KFS), a time server (TS), data owners (DOs), and data users (DUs). A brief description of our system model is given in Figure 1.
  • KGC: The role of the KGC is to generate public parameters and distribute the corresponding keys to the various entities involved in the scheme.
  • MS: The MS mainly provides secure file storage services for the DOs and secure file search services for the DUs.
  • KFS: The KFS collaborates with the MS in processing the DUs’ file query requests, and conducts keyword frequency analysis on the DUs’ queries, while ensuring the privacy of the DUs, in order to provide enhanced file storage and lookup services to both the DOs and DUs.
  • TS: The TS periodically sends time tokens embedded with the current timestamp to the MS, which are then used to recover keyword information from the DUs’ queries in order to conduct keyword frequency analysis.
  • DO: Each DO generates their own public–private key pair based on the public parameters. They then extract keywords from a file to generate searchable ciphertext, which is sent to the MS along with the file for secure storage and search services.
  • DU: Each DU generates their own public–private key pair based on the public parameters and generates a query trapdoor for a set of keywords of interest. Only the DU has the ability to decrypt the queried results received from the MS in order to obtain the corresponding file index, thereby ensuring privacy and security.

4.2. Threat Model

We assume that the KGC is an honest party, assigning corresponding keys to other members in KFA-PEKS.
The MS and the KFS are assumed to be a pair of non-colluding semi-honest adversaries (specific references in [33]). In KFA-PEKS, the MS will try to obtain information about trapdoors (e.g., keyword information and corresponding documentation information); the KFS will infer query keywords while analyzing the frequency of query keywords corresponding to the user information. Non-collusion means that there will be no unnecessary interaction between the MS and the KFS, except for query calculation and keyword frequency analysis.
We assume that the TS, DOs, and DUs are semi-honest, that the TS will strictly generate the correct time token based on the current time and send it to the KFS, and that the DOs and DUs will also honestly execute individual protocols in KFA-PEKS. However, they will all try to gain access to the private data of other members of the scheme.
Based on the definition of individual entities given above, we introduce an adversary A * into our model that is capable of compromising some of the entities in the model. For example, A * can conspire with the MS or KFS (but not both) to guess the information in the ciphertext. Furthermore, A * can conspire with several DOs or DUs to obtain the decryption capability of the corresponding ciphertext or trapdoor. Such a security model has strong practical implications, and we recommend that interested readers consult [23], upon which we will not expand in this paper for brevity reasons.

4.3. Security Goals

Due to the need for user privacy, even if these keywords are restored by the KFS after T S O , the DUs’ identities should not be linked to the keyword they queried. Furthermore, these keywords should also not have been disclosed prior to the T S O . Specifically, the security objectives of our program should achieve the following five points:
  • Keyword privacy of DUs Before TSO: The DUs’ query operations will not leak keyword information until the current time reaches the T S O specified by the DU.
  • Indistinguishability of DO’s searchable ciphertext: The searchable ciphertext uploaded by the DO does not reveal any information related to the file.
  • Keyword privacy for DUs after TSO: After reaching the T S O specified by the DU, only the KFS can recover the queried keywords, and for others, the trapdoor still retains its previous security.
  • Unlinkability of recovered keywords and DUs’ identities: The KFS is unable to locate the relevant DU through recovering the keyword.
  • Resist KGA: The focus of this paper is on offline keyword-guessing attacks by internal attackers, that is, on preventing internal attackers, such as untrusted cloud servers, from performing exhaustive attacks on keyword information in encrypted searchable ciphertext or trapdoors.

4.4. Syntax

Public key encryption with keyword search supporting keyword frequency analysis is a protocol shared among a KGC, an MS, a KFS, a TS, multiple DPs, and multiple DUs, as follows:
  • S e t u p ( κ ) → ( P P , r , r 2 ) : Given the security parameter κ , the KGC generates the public parameter P P , a value shared by the KFS and DUs, and a key r 2 , shared by the TS and DUs.
  • K e y G e n S e r ( P P ) → ( M S K M S , M S K K F S ): Given the public parameter P P , the KGC generates two keys for the MS and KFS, M S K M S and M S K K F S .
  • K e y G e n D O ( P P ) → ( p k D O , s k D O ) : Given the public parameter P P , the KGC generates the public–private key pair ( p k D O , s k D O ) for the DOs.
  • K e y G e n D U ( P P ) → ( p k D U , s k D U ): Given the public parameter P P , the KGC generates the public–private key pair ( p k D U , s k D U ) for the DUs.
  • K e y G e n T S ( P P ) → ( p k T S , s k T S ): Given the public parameter P P , the KGC generates the public–private key pair ( p k T S , s k T S ) for the TS.
  • P E K S (W, W T , p k D O ) → [ T ] p k D O : Given the universal keyword set W, a keyword subset W T , and the DO’s public key, the DO computes the searchable ciphertext [ T ] p k D O .
  • T r a p d o o r (W, r, r 2 , s k D U , p k T S , T S O ) → [ t ] P K : Given the universal keyword set W, the value r, the key r 2 , the DU’s secret key s k D U , the TS’s public key p k T S , and a timestamp, the DU computes an encryption key P K and the trapdoor [ t ] P K .
  • T e s t ( [ T ] p k D O , [ t ] P K , M S K M S , M S K K F S , p k D O , P K ) → [ 1 ] P K or [ 0 ] P K : Given the searchable ciphertext [ T ] p k D O , trapdoor [ t ] P K , the MS’s secret key M S K M S , the KFS’s secret key M S K K F S , the DO’s public key p k D O , and the encryption key P K , the MS and the KFS compute a test result. The DU can compute the test result and outputs 1 if W t W T , or otherwise, 0.
  • T i m e T o k e n G e n ( s k T S , p k D U , r 2 , T S N ) → T T : Given the TS’s secret key s k T S , the DU’s public key p k D U , the key r 2 , and a timestamp, the TS generates a time token T T .
  • O p e n K e y w o r d ( [ t ] P K , r, T T ) → t: Given the encryption key P K , the value r, and a time token T T , the KFS can recover to obtain t.

4.5. Correctness

The correctness of the solution KFA-PEKS is mainly reflected in two aspects: the correctness of the search results and the correctness of the keyword frequency analysis. Specifically, we express the correctness as follows:
  • Correctness of search results:
    -
    T e s t ( [ T ] p k D O , [ t ] P K , M S K M S , M S K K F S , p k D O , P K ) → 1 if, and only if, W t W T .
    -
    T e s t ( [ T ] p k D O , [ t ] P K , M S K M S , M S K K F S , p k D O , P K ) → 0 if, and only if, W t W T .
  • Correctness of keyword frequency analysis:
    -
    O p e n K e y w o r d ( [ t ] P K , r, T T ) → t if, and only if, T S N T S O .
    -
    O p e n K e y w o r d ( [ t ] P K , r, T T ) → ¬ t . if, and only if, T S N T S O .

5. Construction

5.1. The Concrete Construction

For clarity and visibility, we denote the subset decision mechanism as the SDM in the description that follows. The KFA-PEKS is constructed as follows:
  • S e t u p ( κ ) → P P : Given the security parameter κ , the KGC finds two large primes p, q such that L ( p ) = L ( q ) = κ , where L ( x ) refers to the length of the parameter x, then computes N = p q , p = ( p 1 ) / 2 , q = ( q 1 ) / 2 , of which p and q are two strong primes. Simultaneously, the KGC also chooses a generator of the order ( p 1 ) ( q 1 ) / 2 and initializes a set of keywords W, the total number of which is η . Later the KGC sends a secret value r to the KFS and the DUs, and sends P P = (W, η , N, g) to others.
  • K e y G e n S e r ( P P ) → ( M S K M S , M S K K F S ): The KGC first executes K e y G e n in DT-PKC to obtain the master key M S K = λ , and then executes M k e y S in DT-PKC to generate two partial master keys M S K 1 = λ 1 , M S K 2 = λ 2 . Finally, the KGC sends M S K M S = M S K 1 = λ 1 to the MS, the partial master key M S K K F S = M S K 2 = λ 2 to the KFS, and keeps M S K = λ secret.
  • K e y G e n D O ( P P ) → ( p k D O , s k D O ) : Each DO chooses a random number s k D O = θ 1 and calculates h D O = g θ 1 . They then publish p k D O = ( N , h D O , g ) as their public key, and keep s k D O as their private key.
  • K e y G e n D U ( P P ) → ( p k D U , s k D U ): Each DU chooses a random number s k D U = θ 2 and calculates h D U = g θ 2 . They then publish p k D U = ( N , h D U , g ) as their public key, and keep s k D U as their private key.
  • K e y G e n T S ( P P ) → ( p k T S , s k T S ): Each TS chooses a random number s k T S = θ 3 and calculates h T S = g θ 3 . They then publish p k T S = ( N , h T S , g ) as their public key, and keep s k T S as their private key.
  • P E K S (W, W T , p k D O ) → [ T ] p k D O : Firstly, DO chooses a random number r 1 , and generates W T based on W such that W T W and computes the corresponding decimal number T according to its binary representation; then, encrypts T with their public key p k D O to obtain the encrypted searchable ciphertext [ T ] p k D O = ( T D O , 1 , T D O , 2 ), which specifically is as follows:
    T D O , 1 = g r 1 θ 1 ( 1 + T N ) m o d N 2
    T D O , 2 = g r 1 m o d N 2
    Finally, the DO sends [ T ] p k D O to the MS.
  • T r a p d o o r (W, r, r 2 , s k D U , p k T S , T S O ) → [ t ] P K : the DU uses their own private key s k D U and TS’s public key p k T S to generate an encryption key P K = g θ 2 θ 3 . They then generate the keyword set W t of interest based on T such that W t W and compute the corresponding decimal number t according to its binary representation. At the same time, the DU runs a PRF F for the specified point in order to obtain a timestamp, and shares the key r 2 with the TS. Subsequently, the DU encrypts t with P K , a secret value r (shared with the KFS) and their specified timestamp, and obtains the trapdoor [ t ] P K = t D U , 1 , t D U , 2 , which specifically is as follows:
    t D U , 1 = g θ 2 · θ 3 · r · F ( r 2 , T S O ) ( 1 + t N ) m o d N 2
    t D U , 2 = g θ 3 m o d N 2 .
    Finally, the DU sends [ t ] P K to the MS.
  • T e s t ( [ T ] p k D O , [ t ] P K , M S K M S , M S K K F S , p k D O , P K ) → [ 1 ] P K or [ 0 ] P K : After receiving [ T ] p k D O and [ t ] P K , the MS can perform the following four steps with the KFS to obtain an encrypted query result and send it to the DU.
    -
    step 1: After the MS receives [ T ] p k D O and [ t ] P K , it will execute the S B D protocol with the KFS to calculate the ciphertext of each bit of ¬ T and t, i.e., [ ¬ T i ] p k D O and [ t i ] P K for i { 0 , . . . , η 1 } .
    -
    step 2: After obtaining [ ¬ T i ] p k D O and [ t j ] P K , the MS can calculate the ciphertext of each C i and D i mentioned in the SDM, i.e., [ C i ] P K and [ D i ] P K together with the KFS, and finally the MS sends the randomized matching result to the DU.
    -
    step 3: After obtaining [ D i ] P K , the MS and KFS together calculate the matching result and the randomized value, i.e., [ R ] P K and [ F ] P K .
    -
    step 4: After receiving [ F ] P K , the DU decrypts the final matching result. A result of 0 means that the file does not match. Any other result means that the current file does match, in which case the DU will request this file from the MS.
  • T i m e T o k e n G e n ( s k T S , p k D U , r 2 , T S N ) → T T : The TS uses s k T S and p k D U to deal with the current timestamp T S N and obtain a time token T T = g θ 2 · θ 3 · F ( r 2 , T S N ) m o d N 2 , this is sent periodically to the KFS.
  • O p e n K e y w o r d ( [ t ] P K , r, T T ) → t: The KFS receives the T T , and when the timestamp in the T T is consistent with the timestamp specified by the DU, then, the KFS can recover the keyword information in the trapdoor with r and T T , so as to analyze the frequency of the queried keywords later. The recovery process is as follows:
    t = t D U , 1 ( T T ) r 1 N m o d N 2

5.2. Process of KFA-PEKS

Figure 2 shows specifically the workflow of the 6 members and 10 algorithms in KFA-PEKS.
Notably, since T S O in the query trapdoor is specified by the DUs, the DUs can better grasp the conditions under which the query keywords are recovered, and, therefore, can better protect the privacy of users. A DU can generate multiple trapdoors with different T S O s , so that the T S O of this trapdoor is a time interval, i.e., the time interval is between T 1 = “10:00:00 AM January 1, 2022 GMT” and T 2 = “10:00:00 AM December 1, 202 GMT”. In this way, only when the current time T satisfies T 1 T T 2 , can the KFS open the queried trapdoors using the time token TT of the TS.

5.3. Security Proof

On the premise of being correct, our scheme should also satisfy the five security definitions we mentioned earlier: keyword privacy of DUs before T S O , indistinguishability of DO’s searchable ciphertext, keyword privacy of DUs after T S O , unlinkability of recovered keywords and DUs’ identities, resistance to internal and external keyword-guessing attacks. Below, we will prove each of these five security requirements by means of the following Theorems 1–5.
Theorem 1. 
The keyword privacy of the DU is protected until the T S O specified by the DU is reached.
Proof. 
The DU generates a trapdoor which is sent to the MS to perform a query. Then, the MS executes the Test algorithm together with the KFS, and returns the test result to the DU. It is required that the test result can only be read by the DU, and the trapdoor must not reveal any information about the keywords before reaching the T S O specified by the DU. It is worth noting that in our scheme, the encryption key for the trapdoor must be generated separately by the TS or the DU, while the encryption key for the searchable ciphertext is the public key of the DO; therefore, for the same keyword, the trapdoor is uncorrelated with the searchable ciphertext, meaning that common chosen keyword attacks (CKA) can be avoided in PEKS. We use A to denote the adversary in the indistinguishability game, and C to denote the challenger; this game can be represented as follows:
First, A chooses a time point T * to attack, and selects two different sets of keywords of interest, W 1 and W 2 , which will be sent to C ; then, C randomly selects one of the two sets to generate a trapdoor which is then sent to A . Finally, A tries to distinguish which set of keywords corresponds to the trapdoor returned by C .
In the following, if our scheme passes the indistinguishability game of trapdoor privacy (IND-TP), then the scheme can be deemed IND-TP secure.
  • Setup: First, A chooses a point in time T at which to make the attack. Then, C executes algorithms S e t u p , K e y G e n S e r , K e y G e n D U , K e y G e n D O , and K e y G e n T S to obtain P P , ( M S K M S , M S K K F S ) , ( p k D U , s k D U ) , ( p k D O , s k D O ) , and ( p k T S , s k T S ) . Furthermore, it sends ( M S K M S , p k D U , p k D O , p k T S ) to A .
  • Phase 1:  A interacts with C , which executes the T i m e T o k e n G e n algorithm in an imitation of the TS, and sends the resulting time token to A .
  • Challenge:  A picks two different keyword sets W 0 , W 1 W, and sends t 0 and t 1 , representing the two sets, to C . Then, C picks b R { 0 , 1 } , runs T r a p d o o r (W, r, s k D U , p k T S , T S O ) → [ t b ] P K , and sends [ t b ] P K to A .
  • Phase 2:  A can continue with the first phase of queries while also executing the algorithm P E K S in an imitation of the MS interacting with the KFS.
  • Guess:  A starts guessing b. If the given guess b satisfies b = b , then A wins this game.
If A can win the game described above with a non-negligible probability, then the KFA-PEKS is deemed IND-TP secure.
Suppose there exists an adversary A who can win the game described above with a non-negligible probability. Let C be the challenger of the trapdoor privacy experiment before TSO in KFA-PEKS, and here construct a simulator B to exploit adversary A to defeat the DDH hardness problem based on Z N 2 * .
Let ( g , g x , g y , g x y , g z ) be a set of examples of DDH hardness problems based on Z N 2 * . Challenger C first uses the generating element g to construct the public parameters of the scheme P P = ( W , η , N , g ) . Then, challenger C uses P P by itself to run S e t u p , K e y G e n S e r , K e y G e n D O , K e y G e n D U , and K e y G e n T S , and obtains the secret value r shared by the KFS and the DU, the secret value r 2 shared by the TS and the DU, and some of the MS and KFS’s partial master key ( M S K M S , M S K K F S ) , the DO’s public–private key pairs ( p k D O , s k D O ) , the DU’s public–private key pairs ( p k D U , s k D U ) , and the TS’s public–private key pairs ( p k T S , s k T S ) . Then, challenger C sends ( P P , M S K M S , p k D O , p k D U , p k T S ) to adversary A and ( P P , M S K K F S , p k D O , p k D U , p k T S , s k T S , r , r 2 ) to simulator B . Adversary A chooses two keyword sets W 0 and W 1 and corresponding trapdoor [ t ] P K b , where b { 0 , 1 } . Next, A sends ( W 0 , W 1 , [ t ] P K b ) to B . Finally, B gives a value b to guess the set of keywords W b corresponding to the trapdoor [ t ] P K b , where b { 0 , 1 } . Throughout the process, simulator B allows adversary A to perform the following oracle queries with the following responses:
  • Query Trap ( W b , pk DU ) : Input a keyword set W b and the DU’s public key p k D U , because simulator B has the TS’s private key s k T S , so it can obtain the encryption key P K = g θ 2 θ 3 ; then, simulator B runs the T r a p d o o r algorithm to generate the query trapdoor [ t ] P K b corresponding to keyword set W b , and then, simulator B sends this query trapdoor to adversary A .
  • Query TS ( TSN ) : The current timestamp T S N is input, and since the simulator B has the secret value r shared by the KFS and the DU and the secret value r 2 shared by the TS and the DU, it can obtain the processed timestamp g r · F ( r 2 , T S N ) and give this value to the adversary A .
  • Query Test ( [ T ] pk DO , [ t ] PK b , MSK MS , MSK KFS , pk DO , PK ) : In this stage of interrogation, adversary A has the partial master key M S K M S of the MS, and simulator B uses the partial master key M S K K F S of the KFS to cooperate with adversary A to execute the T e s t algorithm and send the matching result to adversary A . For trapdoor [ t ] P K b = ( t D U , 1 , t D U , 2 ) , it satisfies:
    t D U , 1 = g θ 2 · θ 3 · r · F ( r 2 , T S O ) ( 1 + t b N ) m o d N 2 ,
    t D U , 2 = g θ 2 θ 3 m o d N 2
At this point, the simulator B gives the guessed value b by a non-negligible advantage such that b = b . This contradicts the DDH difficulty problem based on Z N 2 * , so the query trapdoor satisfies indistinguishability in this scheme until it reaches the T S O specified by the DU. □
Theorem 2. 
The searchable ciphertext does not reveal information about its own keywords.
Proof. 
When the document is uploaded to the MS, the DO should also attach the corresponding encrypted searchable ciphertext. It is required that the encrypted searchable ciphertext should not reveal any information about its underlying keywords. Similar to the previous security requirements, trapdoor queries and ciphertext queries are not considered. The indistinguishability of the searchable ciphertext IND-SC is defined by an interaction experiment between challenger C and adversary A , which is represented as follows:
  • Setup: First, A chooses a point in time T at which to make the attack. Then, C executes algorithms S e t u p , K e y G e n S e r , K e y G e n D U , K e y G e n D O , and K e y G e n T S to obtain P P , ( M S K M S , M S K K F S ) , ( p k D U , s k D U ) , ( p k D O , s k D O ) , and ( p k T S , s k T S ) . Furthermore, it sends ( M S K M S , p k D U , p k D O , p k T S ) to A .
  • Challenge:  A picks two different keyword sets W 0 , W 1 W, and sends T 0 , T 1 , representing W 0 and W 1 , to C . Then, C picks b R { 0 , 1 } , runs P E K S (W, W T , p k D O ) → [ T ] p k D O , and sends [ T b ] P K to A .
  • Guess:  A starts guessing b. If the given guess b satisfies b = b, then A wins this game.
If A can win the game described above with a non-negligible probability, then the KFA-PEKS is deemed IND-SC secure.
Suppose there exists an adversary A who can win the game described above with a non-negligible probability. Let C be the challenger of the searchable ciphertext indistinguishability experiment in KFA-PEKS, and here construct a simulator B to exploit adversary A to defeat the DDH hardness problem based on Z N 2 * .
Let ( g , g x , g y , g x y , g z ) be a set of examples of DDH hardness problems based on Z N 2 * . Challenger C first uses the generating element g to construct the public parameters of the scheme P P = ( W , η , N , g ) . Then, challenger C uses P P by itself to run S e t u p , K e y G e n S e r , K e y G e n D O , K e y G e n D U , and K e y G e n T S , and obtains the secret value r shared by the KFS and the DU, the secret value r 2 shared by the TS and the DU, and some of the MS and KFS’s partial master key ( M S K M S , M S K K F S ) , the DO’s public–private key pairs ( p k D O , s k D O ) , the DU’s public–private key pairs ( p k D U , s k D U ) , and the TS’s public–private key pairs ( p k T S , s k T S ) . Then, challenger C sends ( P P , M S K M S , p k D O , p k D U , p k T S ) to adversary A and ( P P , M S K K F S , p k D O , p k D U , p k T S , s k T S , r , r 2 ) to simulator B . Adversary A chooses two keyword sets W 0 and W 1 and corresponding trapdoor [ t ] P K b , where b { 0 , 1 } . Next, A sends ( W 0 , W 1 , [ t ] P K b ) to B . Finally, B gives a value b to guess the set of keywords W b corresponding to the trapdoor [ t ] P K b , where b { 0 , 1 } . Throughout the process, simulator B allows adversary A to perform the following oracle queries with the following responses:
  • Query Cipher ( W b , pk DO ) : Input the keyword set W b and the DO’s public key p k D O , and simulator B runs the P E K S algorithm to generate the searchable ciphertext [ T ] p k D O b = ( T D O , 1 , T D O , 2 ) corresponding to the keyword set W b , where the ciphertext structure is as follows:
    T D O , 1 = g r 1 θ 1 ( 1 + T b N ) m o d N 2 ,
    T D O , 2 = g r 1 m o d N 2
    Then, simulator B sends this searchable ciphertext to adversary A .
At this point the simulator B gives the guess value b by a non-negligible advantage such that b = b . This contradicts the DDH difficulty problem based on Z N 2 * , so the searchable ciphertext in this scheme satisfies indistinguishability. □
Theorem 3. 
Upon receipt of a time token from TS, the keyword information in trapdoor cannot be recovered by anyone except the KFS and the corresponding DU, based on the intractability of DDH’s assumption over Z N 2 * [34] (more specific information about the hardness of DDH’s assumption over Z N 2 * can be found in [34]).
Proof. 
After reaching the timestamp T S O specified by the DU, only the KFS can obtain the keyword information in the trapdoor. Similar to the previous security requirements, trapdoor queries and ciphertext queries are not considered. The trapdoor privacy after TSO is defined by an interaction experiment IND-sT-TP (the indistinguishability trapdoor privacy with a specific TSO) between challenger C and adversary A , which is expressed as follows:
First, A chooses a time point T * to attack, and selects two different sets of keywords of interest, W 1 and W 2 , which will be sent to C ; then, C randomly selects one of the two sets to generate a trapdoor which is then sent to A . Finally, A tries to distinguish which set of keywords corresponds to the trapdoor returned by C .
In the following, if our scheme passes the indistinguishability game of trapdoor privacy with a specific TSO (IND-sT-TP), then the scheme can be deemed IND-sT-TP secure.
  • Setup: First, A chooses a point in time T at which to make the attack. Then, C executes algorithms S e t u p , K e y G e n S e r , K e y G e n D U , K e y G e n D O , and K e y G e n T S to obtain P P , ( M S K M S , M S K K F S ) , ( p k D U , s k D U ) , ( p k D O , s k D O ) , and ( p k T S , s k T S ) . Furthermore, it sends ( M S K M S , p k D U , p k D O , p k T S ) to A .
  • Phase 1:  A interacts with C , which executes the T i m e T o k e n G e n algorithm in an imitation of the TS, and sends the resulting time token to A .
  • Challenge: A picks two different keyword sets W 0 , W 1 W, and sends t 0 and t 1 , representing the two sets, to C . Then, C picks b R { 0 , 1 } , runs T r a p d o o r (W, r, s k D U , p k T S , T S O ) → [ t b ] P K , and sends [ t b ] P K to A .
  • Phase 2:  A can continue with the first phase of queries while also executing the algorithm P E K S in an imitation of the MS interacting with the KFS.
  • Guess:  A starts guessing b. If the given guess b satisfies b = b , then A wins this game.
If A can win the game described above with a non-negligible probability, then the KFA-PEKS is deemed IND-sT-TP secure.
Suppose there exists an adversary A who can win the game described above with a non-negligible probability. Let C be the challenger of the trapdoor privacy experiment after T S O in KFA-PEKS, and here construct a simulator B to exploit adversary A to defeat the DDH hardness problem based on Z N 2 * .
Let ( g , g x , g y , g x y , g z ) be a set of examples of DDH hardness problems based on Z N 2 * . Challenger C first uses the generating element g to construct the public parameters of the scheme P P = ( W , η , N , g ) . Then, challenger C uses P P by itself to run S e t u p , K e y G e n S e r , K e y G e n D O , K e y G e n D U , and K e y G e n T S , and obtains the secret value r shared by the KFS and the DU, the secret value r 2 shared by the TS and the DU, and some of the MS and KFS’s partial master key ( M S K M S , M S K K F S ) , the DO’s public–private key pairs ( p k D O , s k D O ) , the DU’s public–private key pairs ( p k D U , s k D U ) and the TS’s public–private key pairs ( p k T S , s k T S ) . Then, challenger C sends ( P P , M S K M S , p k D O , p k D U , p k T S ) to adversary A and ( P P , M S K K F S , p k D O , p k D U , p k T S , s k T S , r , r 2 ) to simulator B . Adversary A chooses two keyword sets W 0 and W 1 and corresponding trapdoor [ t ] P K b , where b { 0 , 1 } . Next, A sends ( W 0 , W 1 , [ t ] P K b ) to B . Finally, B gives a value b to guess the set of keywords W b corresponding to the trapdoor [ t ] P K b , where b { 0 , 1 } . Throughout the process, simulator B allows adversary A to perform the following oracle queries with the following responses:
  • Query Trap ( W b , pk DU ) : Input a keyword set W b and DU’s public key p k D U , because simulator B has the TS’s private key s k T S , so it can obtain the encryption key P K = g θ 2 θ 3 ; then simulator B runs T r a p d o o r algorithm to generate the query trapdoor [ t ] P K b corresponding to keyword set W b , and then simulator B sends this query trapdoor to adversary A .
  • Query TS ( TSN ) : The current timestamp T S N is input, and since the simulator B has the secret value r shared by the KFS and the DU and the secret value r 2 shared by the TS and the DU, it can obtain the processed timestamp g r · F ( r 2 , T S N ) and give this value to the adversary A .
  • Query Test ( [ T ] pk DO , [ t ] PK b , MSK MS , MSK KFS , pk DO , PK ) : In this stage of interrogation, adversary A has the partial master key M S K M S of the MS, and simulator B uses the partial master key M S K K F S of the KFS to cooperate with adversary A to execute the T e s t algorithm and send the matching result to adversary A . For trapdoor [ t ] P K b = ( t D U , 1 , t D U , 2 ) , it satisfies:
    t D U , 1 = g θ 2 · θ 3 · r · F ( r 2 , T S O ) ( 1 + t b N ) m o d N 2 ,
    t D U , 2 = g θ 2 θ 3 m o d N 2
At this point the simulator B gives the guessed value b by a non-negligible advantage such that b = b . This contradicts the DDH difficulty problem based on Z N 2 * , so the query trapdoor satisfies indistinguishability in this scheme until it reaches the T S O specified by the DU. □
Theorem 4. 
The KFS is unable to locate the relevant DU through the keyword being recovered.
Proof. 
In Theorem 3, we prove that after reaching the time point T S O , specified by the DU, only the KFS can open the keyword information in the corresponding trapdoor, for the purpose of performing keyword frequency analysis. In order to ensure the privacy of DUs while performing keyword frequency analysis, our solution needs to prevent the KFS from finding the corresponding DUs through the recovered keywords. In the specific structure of our scheme, the MS is responsible for providing file storage services to the DO, and for responding to query requests initiated by the DU at the same time. When performing keyword frequency analysis, the MS sends the query trapdoor of the DU to the KFS, and then the KFS uses the time token of the TS to recover the queried keywords of the DUs. During the entire process, the KFS does not interact with the DUs. On the premise that the MS does not collude with the KFS, the KFS can only recover the query keywords, but cannot link them to the corresponding DUs through the recovery results. In this way, the security required by the scheme is achieved. □
Theorem 5. 
Internal or external attackers cannot obtain any information about users’ keywords from encrypted searchable ciphertext or query trapdoors.
Proof. 
In the specific structure of the scheme, the result of the joint MS and KFS calculation is a secret value which is encrypted with the DU’s public key. Therefore, only the DU can know the query result. In this way, even if an external attacker can construct enough encrypted searchable ciphertext to perform exhaustive attacks on query trapdoors, they have no way of knowing the results of each retrieval; thus, keyword-guessing attacks by external attackers can be avoided. Intuitively, because our solution uses a dual-server setup, the T e s t algorithm is completed by the MS and KFS in succession. Therefore, on the premise that the two servers MS and KFS do not collude, the MS or KFS cannot execute the test algorithm alone. This means that keyword-guessing attacks by internal attackers can also be avoided. □

6. Performance Evalution

We conducted experiments by using the Python language, based on the charm-crypto library [35], on a PC with an AMD Ryzen 5 3600 3.6 GHz processor, 4 GB of RAM, and the Ubuntu 18.04 LTS operating system. In the experiment, the security parameter was set to 1024, so that N was a positive integer of 1024 bits, and p and q were large prime numbers of 512 bits. We tested the operating efficiency of KFA-PEKS when the maximum number of keywords was 5, 10, 15, and 20, and carried out a corresponding comparative analysis with [24].

6.1. Experimental Results

Figure 3a–d depict comparisons of the average execution times of the K e y G e n D , K e y G e n U , K e y G e n T S , and T i m e T o k e n G e n algorithms, respectively, between our scheme and [24]. The average execution times of these algorithms in [24] are 9.107 ms, 6.274 ms, 3.989 ms, and 8.735 ms, respectively. As a comparison, the average execution times of these algorithms in our scheme are 1.409 ms, 1.384 ms, 1.367 ms, and 1.496 m, respectively.
Figure 4 shows a comparison of the time cost of P E K S algorithms. Since [24] is a single-keyword scheme, the time to generate ciphertexts is linearly related to the number of keywords. When the number of keywords is 20, the time taken reaches 243.65 ms. In contrast, the time cost of our ciphertext generation is only 2.739 ms, and is a figure independent of the number of keywords.
Figure 5 represents the performance comparison of the T r a p d o o r algorithm. The time cost in [24] is linearly related to the number of keywords and is approximately 681.503 ms when the number of keywords is 20. In contrast, our time cost is constant, and measures only about 4 ms.
Figure 6 shows a comparison of the T e s t algorithm. It can be seen that the time cost of [24] is superior to ours when the number of keywords is relatively small, which is due to the fact that our t e s t algorithm requires multiple rounds of interaction between the MS and the KFS, and the network latency affects the time consumption to a great extent. However, the performance of [24] increases with the number of keywords, and its time cost is 2234.265 ms at a keyword count of 20. In comparison, ours is about 1200 ms, and again is independent of the keyword count.
Figure 7 shows the performance comparison of the O p e n K e y w o r d algorithm. The time costs of [24] are 14.173 ms, 28.47 ms, 42.119 ms, and 57.199 ms for keyword counts of 5, 10, 15, and 20, respectively. As a comparison, our time cost is a constant value of about 0.03 ms.

6.2. Theoretical Analysis

We show some feature comparisons between KFA-PEKS and the existing schemes in Table 2. In past research, SE schemes have been classified into four types as follows. The single-writer/single-reader (SW/SR) setting [16] is often SSE, wherein the writer and the reader act as one actor, i.e., only the writer is allowed to initiate the query. The multi-writer/single-reader (MW/SR) setting [4,24,36] is a scheme in which multiple writers can generate searchable ciphertexts for querying by a specific reader, as opposed to the single-writer/multi-reader (SW/MR) setting [18]. The multi-writer/multi-reader (MW/MR) setting [22] allows multiple users to encrypt and upload data while being able to search all users’ stored encrypted data.
In the single-keyword schemes [4,24,36], the reader needs to generate multiple different query trapdoors for multiple different query keywords, i.e., the number of trapdoors is linearly related to the number of keywords. Similarly, the number of searchable ciphertexts is influenced by the number of keywords in the relevant document. The size of the trapdoor and searchable ciphertext are significant factors affecting the efficiency of keyword-searchable encryption schemes, compared to single-keyword schemes; this is related to the | W i d | in [16]. In [18], the writer establishes the corresponding indexes for the documents to be uploaded (and the keywords contained in them) in advance, so the size of ciphertext is linearly related to | W i d | . Similarly, the search trapdoors generated by the reader when making a query are all possible indexing relationships between the query keywords and associated documents, so the number of trapdoors is linearly related to both | D S | and | S Q | . Both the trapdoors and ciphertexts of our study and of [22] are homomorphic encryptions of a decimal integer of constant size.
In PEKS, the searchable ciphertext is processed by the writer using the public key of the target reader for the query keyword; therefore, the attacker can perform KGA on the query trapdoor by forging the ciphertext. Ref. [16] is an SSE scheme in which the generation of the ciphertext and trapdoor require the user’s key, so the attacker cannot generate the test ciphertext used to perform KGA. The reader in [36] shares a secret value with the writer, and the attacker cannot generate a test ciphertext to execute the KGA since they do not know the secret value. Ref. [18] is an SW/MR setting, which means that only the writer can generate the ciphertext. Our scheme and that of [22] use the partial private keys of the MS and KFS as the input for the T e s t algorithm, which means that the attacker is unable to execute the T e s t algorithm freely. Furthermore, among the above schemes, only ours and that of [24] can complete a keyword frequency analysis of the query keywords. In conclusion, the performance of our solution exceeds that of related works.

7. Conclusions

In this paper, we propose a PEKS scheme with secure keyword frequency analysis. Within our scheme, multiple DOs can use their public keys to encrypt their data and upload it to the MS. Then, multiple DUs can independently generate trapdoors that support a multi-keyword search for data retrieval, without interacting with the DO. The entire solution adopts a dual-server architecture, which can effectively resist KGA by internal attackers. After reaching the time node T S O , as specified by the DU, the KFS is able to recover the keyword information in the trapdoor for keyword frequency analysis, and cannot link the recovered keywords with their corresponding data queriers. Although the generation and utilization of a single time token are efficient for a time server, generating a common time token for all data queriers and reducing the traffic between the MS and KFS remains the direction of our future research.

Author Contributions

Conceptualization, Y.Z. and Y.L.; methodology, Y.Z.; software, Y.Z.; validation, B.S., D.Z. and Y.Z.; formal analysis, Y.Z., Y.L. and C.W.; investigation, Y.Z. and Y.L.; resources, Y.Z. and D.Z.; data curation, Y.Z. and B.S.; writing—original draft preparation, Y.Z.; writing—review and editing, D.Z., C.W. and B.S.; visualization, Y.Z. and Y.L.; supervision, D.Z. and C.W.; project administration, D.Z. and C.W.; funding acquisition, D.Z. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was funded by the National Natural Science Foundation of China (Nos. U2001205, 61732021, 61932010), Guangdong Basic and Applied Basic Research Foundation (Nos. 2019B030302008, 2023B1515040020), Guangdong Provincial Key Laboratory of Power System Network Security (No. GPKLPSNS-2022-KF-05), and TESTBED2 (No. H2020-MSCA-RISE-2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All data are presented in the main text.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lv, Z.; Singh, A.K. Big Data Analysis of Internet of Things System. ACM Trans. Internet Technol. 2021, 21, 28:1–28:15. [Google Scholar] [CrossRef]
  2. Li, X.; Liu, H.; Wang, W.; Zheng, Y.; Lv, H.; Lv, Z. Big data analysis of the Internet of Things in the digital twins of smart city based on deep learning. Future Gener. Comput. Syst. 2022, 128, 167–177. [Google Scholar] [CrossRef]
  3. Kamara, S.; Lauter, K.E. Cryptographic Cloud Storage. In Proceedings of the Financial Cryptography and Data Security, FC 2010 Workshops, RLCPS, WECSR, and WLC 2010, Tenerife, Canary Islands, Spain, 25–28 January 2010; Revised Selected Papers. Sion, R., Curtmola, R., Dietrich, S., Kiayias, A., Miret, J.M., Sako, K., Sebé, F., Eds.; Springer: Cham, Switzerland, 2010; Volume 6054, pp. 136–149. [Google Scholar] [CrossRef]
  4. Boneh, D.; Crescenzo, G.D.; Ostrovsky, R.; Persiano, G. Public Key Encryption with Keyword Search. In Proceedings of the Advances in Cryptology-EUROCRYPT 2004, International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; Springer: Cham, Switzerland, 2004; Volume 3027, pp. 506–522. [Google Scholar] [CrossRef]
  5. Liu, J.; Wu, M.; Sun, R.; Du, X.; Guizani, M. BMDS: A Blockchain-based Medical Data Sharing Scheme with Attribute-Based Searchable Encryption. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6. [Google Scholar] [CrossRef]
  6. Li, H.; Yang, Y.; Dai, Y.; Yu, S.; Xiang, Y. Achieving Secure and Efficient Dynamic Searchable Symmetric Encryption over Medical Cloud Data. IEEE Trans. Cloud Comput. 2020, 8, 484–494. [Google Scholar] [CrossRef]
  7. Liu, P.; Liu, K.; Fu, T.; Zhang, Y.; Hu, J. A privacy-preserving resource trading scheme for Cloud Manufacturing with edge-PLCs in IIoT. J. Syst. Archit. 2021, 117, 102104. [Google Scholar] [CrossRef]
  8. Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the Proceeding 2000 IEEE Symposium on Security and Privacy, S&P 2000, IEEE, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55. [Google Scholar]
  9. Curtmola, R.; Garay, J.A.; Kamara, S.; Ostrovsky, R. Searchable symmetric encryption: Improved definitions and efficient constructions. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS 2006, Alexandria, VA, USA, 30 October–3 November 2006; Juels, A., Wright, R.N., di Vimercati, S.D.C., Eds.; ACM: New York, NY, USA, 2006; pp. 79–88. [Google Scholar] [CrossRef]
  10. Moataz, T.; Shikfa, A. Boolean symmetric searchable encryption. In Proceedings of the 8th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’13, Hangzhou, China, 8–10 May 2013; Chen, K., Xie, Q., Qiu, W., Li, N., Tzeng, W., Eds.; ACM: New York, NY, USA, 2013; pp. 265–276. [Google Scholar] [CrossRef]
  11. Cash, D.; Jarecki, S.; Jutla, C.; Krawczyk, H.; Roşu, M.C.; Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries. In Proceedings of the Annual Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2013; Springer: Cham, Switzerland, 2013; pp. 353–373. [Google Scholar]
  12. Rhee, H.S.; Park, J.H.; Susilo, W.; Lee, D.H. Improved searchable public key encryption with designated tester. In Proceedings of the 2009 ACM Symposium on Information, Computer and Communications Security, ASIACCS 2009, Sydney, Australia, 10–12 March 2009; Li, W., Susilo, W., Tupakula, U.K., Safavi-Naini, R., Varadharajan, V., Eds.; ACM: New York, NY, USA, 2009; pp. 376–379. [Google Scholar] [CrossRef]
  13. Lu, Y.; Li, J. Efficient searchable public key encryption against keyword guessing attacks for cloud-based EMR systems. Clust. Comput. 2019, 22, 285–299. [Google Scholar] [CrossRef]
  14. Senouci, M.R.; Benkhaddra, I.; Senouci, A.; Li, F. An efficient and secure certificateless searchable encryption scheme against keyword guessing attacks. J. Syst. Archit. 2021, 119, 102271. [Google Scholar] [CrossRef]
  15. Gu, X.; Wang, Z.; Fu, M.; Ren, P. A Certificateless Searchable Public Key Encryption Scheme for Multiple Receivers. In Proceedings of the 2021 IEEE International Conference on Web Services, ICWS 2021, Chicago, IL, USA, 5–10 September 2021; Chang, C.K., Daminai, E., Fan, J., Ghodous, P., Maximilien, M., Wang, Z., Ward, R., Zhang, J., Eds.; IEEE: Piscataway, NJ, USA, 2021; pp. 635–641. [Google Scholar] [CrossRef]
  16. Wang, P.; Wang, H.; Pieprzyk, J. Keyword field-free conjunctive keyword searches on encrypted data and extension for dynamic groups. In Proceedings of the International Conference on Cryptology and Network Security, Hong Kong, China, 2–4 December 2008; Springer: Cham, Switzerland, 2008; pp. 178–195. [Google Scholar]
  17. Zhang, B.; Zhang, F. An efficient public key encryption with conjunctive-subset keywords search. J. Netw. Comput. Appl. 2011, 34, 262–267. [Google Scholar] [CrossRef]
  18. Sun, S.; Liu, J.K.; Sakzad, A.; Steinfeld, R.; Yuen, T.H. An Efficient Non-interactive Multi-client Searchable Encryption with Support for Boolean Queries. In Proceedings of the Computer Security-ESORICS 2016 - 21st European Symposium on Research in Computer Security, Heraklion, Greece, 26–30 September 2016; Proceedings, Part I. Askoxylakis, I.G., Ioannidis, S., Katsikas, S.K., Meadows, C.A., Eds.; Springer: Cham, Switzerland, 2016; Volume 9878, pp. 154–172. [Google Scholar] [CrossRef]
  19. Bethencourt, J.; Sahai, A.; Waters, B. Ciphertext-policy attribute-based encryption. In Proceedings of the 2007 IEEE Symposium on Security and Privacy (SP’07), IEEE, Berkeley, CA, USA, 20–23 May 2007; pp. 321–334. [Google Scholar]
  20. Xu, L.; Yuan, X.; Steinfeld, R.; Wang, C.; Xu, C. Multi-writer searchable encryption: An LWE-based realization and implementation. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, Auckland, New Zealand, 9–12 July 2019; pp. 122–133. [Google Scholar]
  21. Camenisch, J.; Kohlweiss, M.; Rial, A.; Sheedy, C. Blind and anonymous identity-based encryption and authorised private searches on public key encrypted data. In Proceedings of the International Workshop on Public Key Cryptography, Irvine, CA, USA, 18–20 March 2009; pp. 196–214. [Google Scholar]
  22. Liu, X.; Yang, G.; Susilo, W.; Tonien, J.; Liu, X.; Shen, J. Privacy-preserving multi-keyword searchable encryption for distributed systems. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 561–574. [Google Scholar] [CrossRef]
  23. Liu, X.; Deng, R.H.; Choo, K.K.R.; Weng, J. An efficient privacy-preserving outsourced calculation toolkit with multiple keys. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2401–2414. [Google Scholar] [CrossRef]
  24. Xu, L.; Li, W.; Zhang, F.; Cheng, R.; Tang, S. Authorized keyword searches on public key encrypted data with time controlled keyword privacy. IEEE Trans. Inf. Forensics Secur. 2019, 15, 2096–2109. [Google Scholar] [CrossRef]
  25. Byun, J.W.; Rhee, H.S.; Park, H.A.; Lee, D.H. Off-line keyword guessing attacks on recent keyword search schemes over encrypted data. In Proceedings of the Workshop on Secure Data Management, Seoul, Korea, 10–11 September 2006; pp. 75–83. [Google Scholar]
  26. Yau, W.C.; Heng, S.H.; Goi, B.M. Off-line keyword guessing attacks on recent public key encryption with keyword search schemes. In Proceedings of the International Conference on Autonomic and Trusted Computing, Oslo, Norway, 23–25 June 2008; pp. 100–105. [Google Scholar]
  27. Huang, Q.; Li, H. An efficient public-key searchable encryption scheme secure against inside keyword guessing attacks. Inf. Sci. 2017, 403, 1–14. [Google Scholar] [CrossRef]
  28. Qin, B.; Chen, Y.; Huang, Q.; Liu, X.; Zheng, D. Public-key authenticated encryption with keyword search revisited: Security model and constructions. Inf. Sci. 2020, 516, 515–528. [Google Scholar] [CrossRef]
  29. He, D.; Ma, M.; Zeadally, S.; Kumar, N.; Liang, K. Certificateless Public Key Authenticated Encryption With Keyword Search for Industrial Internet of Things. IEEE Trans. Ind. Inf. 2018, 14, 3618–3627. [Google Scholar] [CrossRef]
  30. Xu, P.; Jin, H.; Wu, Q.; Wang, W. Public-key encryption with fuzzy keyword search: A provably secure scheme under keyword guessing attack. IEEE Trans. Comput. 2012, 62, 2266–2277. [Google Scholar] [CrossRef]
  31. Ghareh Chamani, J.; Papadopoulos, D.; Papamanthou, C.; Jalili, R. New constructions for forward and backward private symmetric searchable encryption. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1038–1055. [Google Scholar]
  32. Samanthula, B.K.; Chun, H.; Jiang, W. An efficient and probabilistic secure bit-decomposition. In Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, Hangzhou, China, 8–10 May 2013; pp. 541–546. [Google Scholar]
  33. Kamara, S.; Mohassel, P.; Raykova, M. Outsourcing Multi-Party Computation. Available online: https://eprint.iacr.org/2011/272 (accessed on 25 May 2023).
  34. Bresson, E.; Catalano, D.; Pointcheval, D. A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. In Proceedings of the Advances in Cryptology-ASIACRYPT 2003: 9th International Conference on the Theory and Application of Cryptology and Information Security, Taipei, Taiwan, 30 November–4 December 2003; Proceedings 9. Springer: Cham, Switzerland, 2003; pp. 37–54. [Google Scholar]
  35. Akinyele, J.; Green, M.; Rubin, A. Charm-Crypto Framework. Available online: https://eprint.iacr.org/2011/617 (accessed on 25 May 2023).
  36. Zheng, Y.; Xu, P.; Wang, W.; Chen, T.; Susilo, W.; Liang, K.; Jin, H. DEKS: A Secure Cloud-Based Searchable Service Can Make Attackers Pay. In Proceedings of the Computer Security-ESORICS 2022-27th European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022; Proceedings, Part II. Atluri, V., Pietro, R.D., Jensen, C.D., Meng, W., Eds.; Springer: Cham, Switzerland, 2022; Volume 13555, pp. 86–104. [Google Scholar] [CrossRef]
Figure 1. Our scheme consists of four main steps. In the first step, the DOs encrypt their data using their public key to generate searchable ciphertexts, which are then uploaded to an MS. In the second step, the DUs generate query trapdoors for the set of keywords of interest and send them to the MS. In the third step, the MS and the KFS cooperate to calculate the query results, which are then sent back to the corresponding DU. Finally, in the fourth step, the TS periodically sends time tokens to the KFS. The KFS can use time tokens to recover the keywords that satisfy the query trapdoor for the current timestamp, and then analyze these keywords. The analysis results are sent back to the DOs.
Figure 1. Our scheme consists of four main steps. In the first step, the DOs encrypt their data using their public key to generate searchable ciphertexts, which are then uploaded to an MS. In the second step, the DUs generate query trapdoors for the set of keywords of interest and send them to the MS. In the third step, the MS and the KFS cooperate to calculate the query results, which are then sent back to the corresponding DU. Finally, in the fourth step, the TS periodically sends time tokens to the KFS. The KFS can use time tokens to recover the keywords that satisfy the query trapdoor for the current timestamp, and then analyze these keywords. The analysis results are sent back to the DOs.
Futureinternet 15 00197 g001
Figure 2. Workflow of our system.
Figure 2. Workflow of our system.
Futureinternet 15 00197 g002
Figure 3. The average execution times of ours and [24].
Figure 3. The average execution times of ours and [24].
Futureinternet 15 00197 g003
Figure 4. The encryption times of ours and [24].
Figure 4. The encryption times of ours and [24].
Futureinternet 15 00197 g004
Figure 5. The trapdoor generation times of ours and [24].
Figure 5. The trapdoor generation times of ours and [24].
Futureinternet 15 00197 g005
Figure 6. The search operation times of ours and [24].
Figure 6. The search operation times of ours and [24].
Futureinternet 15 00197 g006
Figure 7. The keyword recovery times of ours and [24].
Figure 7. The keyword recovery times of ours and [24].
Futureinternet 15 00197 g007
Table 1. Summary of notations.
Table 1. Summary of notations.
NotationsDescriptions
WThe universal keyword set
DA document
D S All documents stored on the server
D S [ ω ] All documents that contain the keyword ω
S Q Keywords included in a query
W i d All keywords contained in document with identifier i d
W c The keyword set represented by c
ω A keyword
( c η 1 ,   ,   c 0 ) The unsigned binary representation of a positive decimal c
c η 1 The highest bit
c 0 The least bit
η The bit length of c, the number of keywords in W
¬ c The complement of c
TA positive decimal integer is used as the plaintext representation of searchable ciphertext, and its binary representation can represent the relationship between the set W T and the universal set W
tA positive decimal integer is used as the plaintext representation of searchable ciphertext, and its binary representation can represent the relationship between the set W t and the universal set W
s k i Secret key of participant i
p k i Public key of participant i
M S K i Partial master key of participant i
T S O Timestamp specified by DUs
T S N Timestamp generated periodically by the time server
[ · ] p k The encryption of · under the public key p k
S U M The decimal integer 2 η 1
L ( · ) The bit length of ·
Table 2. The feature and theoretical comparisons between our scheme and the existing schemes.
Table 2. The feature and theoretical comparisons between our scheme and the existing schemes.
Multi-UserKGA ResilienceKeyword RecoveryCipNum 1 TrapNum 2 CipSize 3 TrapSize 4
[4]××× ω W | D S [ ω ] | | S Q | O | W i d | O | S Q |
[22]× | D S | 1 O ( 1 ) O ( 1 )
[24]×× ω W | D S [ ω ] | | S Q | O | W i d | O | S Q |
[36]×× ω W | D S [ ω ] | | S Q | O | W i d | O | S Q |
[16]×× ( | D S | ) 1 O | W i d | O | W i d |
[18]×× ω W | D S [ ω ] | | S Q | O | W i d | O | D S | * | S Q |
ours | D S | 1 O ( 1 ) O ( 1 )
1 The number of searchable ciphertexts. 2 The number of trapdoors generated when a data owner executes a query SQ. 3 The size of the searchable ciphertext corresponding to a document. 4 The trapdoor size corresponding to the SQ operation.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhu, Y.; Zhou, D.; Li, Y.; Song, B.; Wang, C. How Can We Achieve Query Keyword Frequency Analysis in Privacy-Preserving Situations? Future Internet 2023, 15, 197. https://doi.org/10.3390/fi15060197

AMA Style

Zhu Y, Zhou D, Li Y, Song B, Wang C. How Can We Achieve Query Keyword Frequency Analysis in Privacy-Preserving Situations? Future Internet. 2023; 15(6):197. https://doi.org/10.3390/fi15060197

Chicago/Turabian Style

Zhu, Yiming, Dehua Zhou, Yuan Li, Beibei Song, and Chuansheng Wang. 2023. "How Can We Achieve Query Keyword Frequency Analysis in Privacy-Preserving Situations?" Future Internet 15, no. 6: 197. https://doi.org/10.3390/fi15060197

APA Style

Zhu, Y., Zhou, D., Li, Y., Song, B., & Wang, C. (2023). How Can We Achieve Query Keyword Frequency Analysis in Privacy-Preserving Situations? Future Internet, 15(6), 197. https://doi.org/10.3390/fi15060197

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop