1. Introduction
With the development of the information revolution, human society has entered a highly informationized era. Informatization has fundamentally changed the way people live and brought great advances to industrial society. As an important source of power for social informatization, data are now regarded as a strategic resource, like oil and coal, by many countries. Although the emergence of massive data has greatly promoted the progress of society, it has brought problems as well, for example, the extra cost of storing massive data. Because data owners (DO) would have to spend a great deal of time and energy on local data storage, they generally choose to store these data with cloud service providers (CSP), which have greater storage and computing capabilities. Although this move reduces the DO's storage burden to a certain extent, it causes the DO to lose direct control over the data; that is, the owner may not be able to obtain the latest version of their data at any time. Data stored in the cloud can be corrupted for a variety of reasons. For example, in order to save on storage costs, a CSP may intentionally delete data that are not frequently accessed by the DO. In addition, malicious attacks from third parties or hardware failures on the CSP's servers can cause data corruption. Furthermore, the CSP may choose to hide such incidents from the DO in order to preserve its reputation. Such concealment by the CSP is likely to cause economic loss or other harm to the DO. In order to solve this problem, remote data integrity auditing schemes are increasingly being applied.
The first data integrity auditing scheme was proposed by Ateniese et al. [1] in 2007, in which the concept of provable data possession (PDP) was introduced. During the same period, Juels et al. [2] proposed another type of auditing scheme, proof of retrievability (PoR), for the first time. Following these early developments, research on data integrity auditing schemes has made great progress, and many schemes with their own characteristics have been proposed one after the other [3,4,5,6,7].
In addition to ensuring the accuracy of data auditing, the designers of auditing schemes pay attention to realizing other properties, among which the most important is privacy protection. The privacy of data has become a topic of great concern in today's society, mainly because data often contain private information pertaining to the DO, which may cause serious harm if leaked. The existing solution is for the DO to encrypt or blind the data locally before generating tags and uploading them to the cloud [8,9]. Although this approach can effectively protect the privacy of the data, encryption and blinding operations impose a huge computational burden on the DO, hindering the realization of the data's value.
In addition to privacy, data access control is a concern. Data can only be of maximum value when circulated. The most direct manifestation of this is the emergence of a large number of base stations in the edge environment. In order to realize the value of their data, the CSP or DO chooses to set up base stations in a fixed area and provide services to nearby users. Accordingly, there are many data integrity auditing schemes designed for edge environments [10,11,12]. In the field of cloud storage, however, data access control cannot directly adopt the model used in the edge environment. This is because the cloud storage environment does not consider users in a fixed area, only authorized users who hold legal warrants to acquire data from the CSP. Therefore, data access control under cloud storage conditions must consider the correctness and revocability of warrants.
Finally, the core of existing data integrity auditing schemes is digital signature technology. The DO guarantees the implementation of the scheme by generating a corresponding tag for each data block. This approach can effectively check the integrity of data stored in the cloud; however, the tags mean that the DO incurs extra overhead to store them in the cloud. In most existing schemes, the storage overhead generated by the tags is close to or even greater than the storage overhead of the data blocks themselves [13,14,15]. The storage overhead incurred by these tags has become a major factor hindering the application of data auditing schemes.
1.1. Contribution
In order to solve the main problems with current auditing schemes, we design a certificateless remote data integrity auditing scheme with an access control function and sensitive information hiding. The main contributions of this paper are as follows:
(1) We propose a certificateless remote data integrity auditing scheme that avoids the additional overhead brought by certificates. At the same time, in order to reduce the storage overhead caused by tags, our scheme does not generate a separate tag for each data block; instead, it aggregates multiple data blocks into one signature to reduce the number of tags.
(2) In order to make better use of the value of data, our scheme implements an access control function. That is, the DO generates warrants for authorized users, allowing them to acquire data from the cloud. The warrants generated in our scheme are revocable, which means that the DO can revoke a user's access at any time. At the same time, the proposed scheme takes privacy into account. During implementation, only authorized users, the DO, and the CSP can access the original master data, while unauthorized users and the third-party auditor (TPA) cannot obtain the data by any means.
(3) We provide a detailed proof of the security of the proposed scheme and compare it with two other schemes through both theoretical analysis and experimental verification. The results show that the proposed scheme outperforms the schemes in [8,15], mainly in terms of shorter tag generation time and lower storage overhead on the part of the CSP.
1.2. Related Works
In 2012, Yang et al. [16] proposed a data integrity auditing scheme with privacy protection and support for dynamic updates, which can also support batch auditing for multiple DOs and CSPs. However, their audit protocol is designed on traditional public key infrastructure (PKI), and as such suffers from the huge cost caused by the use of certificates. In 2013, Wang et al. [17] first proposed a certificateless public auditing scheme to verify the integrity of data in the cloud in order to avoid the security risks brought on by PKI; however, they did not consider privacy or dynamic updating of data. In the same year, Kai et al. [18] proposed a data integrity auditing scheme for multi-cloud environments with support for simultaneous verification of multiple audit requests for different data files stored on different CSPs by different DOs. The authors introduced a homomorphic encryption mechanism to ensure the privacy of the auditing process; however, the extra overhead caused by certificates was ignored. In 2014, Yuan et al. [19] proposed a remote data integrity auditing scheme with support for multi-user modification and collusion resistance based on full consideration of realistic scenarios. However, their approach did not address the risk of data leakage and was not able to guarantee privacy. In 2015, Jiang et al. [20] proposed a data integrity auditing scheme based on vector commitment and verifier-local revocation group signatures; however, the privacy of the data cannot be guaranteed by this approach. In 2016, Zhang et al. [21] proposed a lightweight auditing scheme that entrusts most of the computation in the auditing process to the cloud in order to reduce the burden of auditing. Under this scheme, third-party auditors can perform multiple audit tasks simultaneously; however, the risk of data leakage is increased. In 2017, Li et al. [22] designed a scheme to solve the complex key management problem in traditional methods by employing biometrics as a fuzzy identity; however, new computational overhead is generated in the process of key matching. In 2019, Shen et al. [5] designed an auditing scheme that does not require a conventional private key, using biometric data such as fingerprints as a fuzzy private key to sign data blocks and thereby eliminate the burden of storing the private key. However, the fuzzy private key requires extra computational overhead to implement. In 2018, in order to better realize data sharing, Shen et al. [23] implemented data auditing and sharing by introducing a sanitizer to sanitize the data blocks containing sensitive information and their corresponding tags; however, the inclusion of the sanitizer increases the computational cost of tag generation. In 2020, Zhao et al. [24] embedded blockchain technology into the cloud auditing system and designed a data integrity auditing scheme with a privacy protection function by combining blockchain and digital signature technology. However, the encryption algorithm they used to ensure data security adds an additional computational burden. In the same year, Lan et al. [25] pointed out that the RDIC protocol proposed by C. Sasikala et al. was not secure, and provided a rigorous proof. In 2021, in order to avoid the waste of resources caused by repeatedly challenging specific data blocks over a short period of time, Wang et al. [26] determined the times at which such data blocks are checked by predicting user behavior and selecting the challenged data blocks on this basis. In 2022, Yang et al. [27] implemented a data integrity auditing scheme with support for dynamic insertion and provable deletion through a number-rank-based Merkle hash tree. In another 2022 paper, S. Dhelim et al. [28] proposed and designed a large-scale IoT trust management system able to effectively manage the trust relationships of devices in large-scale IoT contexts.
Table 1 compares the characteristics of the works referenced above.
1.3. Organization
The rest of this paper is organized as follows. In Section 2, we describe several of the technologies that support our work and then provide the scheme outline and security model. In Section 3, we elaborate our proposed remote data integrity auditing scheme. In Section 4, we analyze the security of our scheme in detail. Section 5 presents the results of our performance evaluation. Finally, we present the conclusions of this paper in Section 6.
2. Preliminaries
In this section, we provide the relevant background knowledge and describe the security model of the proposed scheme. Important mathematical notations used in this paper are listed in Table 2.
2.1. System Model
There are five entities in our scheme: DO, CSP, KGC, TPA, and Users. The relationship between them is shown in Figure 1.
(1) KGC: The key generation center (KGC) is responsible for generating the relevant parameters and a partial private key for the DO.
(2) DO: The DO is the actual owner of the data and is responsible for splitting the file into appropriate sizes, generating corresponding tags, and generating warrants for authorized users.
(3) CSP: The CSP is an institution with sufficient storage and computing power, and is responsible for properly storing data and making it available to authorized users.
(4) TPA: The TPA is an institution with sufficient computing power, and is responsible for randomly issuing data integrity challenges to the CSP. In our scheme, the TPA is considered semi-honest, i.e., while it honestly reports auditing results, it is curious about the data.
(5) Users: Users are the groups who want to use the data stored by the CSP; they can obtain data from the CSP using legitimate warrants.
2.2. Bilinear Maps
Let q be a large prime and let $G_1$ and $G_2$ be two cyclic multiplicative groups of the same order q, where g is a generator of $G_1$ and $e: G_1 \times G_1 \rightarrow G_2$ is a bilinear map with the following properties:
(1) Bilinearity: for all $u, v \in G_1$ and $a, b \in \mathbb{Z}_q^*$, $e(u^a, v^b) = e(u, v)^{ab}$.
(2) Non-degeneracy: $e(g, g) \neq 1_{G_2}$.
(3) Computability: for all $u, v \in G_1$, there is an efficient algorithm to calculate $e(u, v)$.
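To make these properties concrete, the following minimal sketch checks bilinearity numerically. It assumes the charm-crypto library and its symmetric 'SS512' pairing, neither of which is prescribed by the paper; any pairing-friendly group would do.

```python
# Minimal bilinearity check (assumption: charm-crypto with a symmetric pairing).
from charm.toolbox.pairinggroup import PairingGroup, ZR, G1, pair

group = PairingGroup('SS512')                # e : G1 x G1 -> GT, order q
g = group.random(G1)                         # a random element standing in for g
a, b = group.random(ZR), group.random(ZR)

# (1) Bilinearity: e(u^a, v^b) = e(u, v)^{ab}
assert pair(g ** a, g ** b) == pair(g, g) ** (a * b)

# (2) Non-degeneracy holds because g is not the identity of G1;
# (3) Computability: pair() itself is the efficient algorithm of property (3).
print("bilinearity check passed")
```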
2.3. Security Assumptions
2.3.1. Computational Diffie–Hellman (CDH) Problem
Let $G_1$ be a multiplicative cyclic group of prime order q, where g is a generator of $G_1$. Given the tuple $(g, g^a, g^b)$, where $a, b \in \mathbb{Z}_q^*$ are unknown, the CDH problem is to calculate $g^{ab}$.
2.3.2. CDH Assumption
For a probabilistic polynomial time (PPT) adversary A, the advantage of A in solving the CDH problem in $G_1$ is negligible. Assuming that $\epsilon$ is a negligible value, this can be defined as
$$\Pr\left[ A(g, g^a, g^b) = g^{ab} \,:\, a, b \in_R \mathbb{Z}_q^* \right] \leq \epsilon .$$
2.3.3. Discrete Logarithm (DL) Problem
Let $G_1$ be a multiplicative cyclic group of prime order q, where g is a generator of $G_1$. Given the tuple $(g, g^x)$, where $x \in \mathbb{Z}_q^*$ is unknown, the DL problem is to calculate x.
2.3.4. DL Assumption
For a PPT adversary A, the advantage of A in solving the DL problem in $G_1$ is negligible. Assuming that $\epsilon$ is a negligible value, this can be defined as
$$\Pr\left[ A(g, g^x) = x \,:\, x \in_R \mathbb{Z}_q^* \right] \leq \epsilon .$$
2.4. Outline of Our Scheme
Our designed certificateless data integrity auditing scheme with sensitive information hiding involves eight algorithms with the following functions:
: this algorithm is performed by the KGC to generate the relevant parameters and master key.
: this algorithm is performed by the KGC, mainly to generate the partial private key for the DO.
: this algorithm is run by the DO, and its main function is to generate a complete private key.
: this algorithm is performed by the DO; its main function is to generate tags for each data block.
: this algorithm is executed by the TPA, and mainly generates relevant parameters used to challenge the integrity of the data blocks stored in the cloud.
: this algorithm is executed by the CSP to generate relevant proofs in response to TPA challenges.
: this algorithm is executed by the TPA, mainly to judge whether the proof returned by the CSP is valid and thereby determine the integrity of the data.
: this algorithm is executed by the DO, mainly to generate warrants for authorized users for access control.
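The outline above can be condensed into the following interface sketch. The algorithm names (Setup, PartialKeyGen, KeyGen, TagGen, Challenge, ProofGen, ProofVerify, WarrantGen) and signatures are placeholders of our own, since the paper's exact identifiers are not preserved in this text.

```python
# Placeholder interface for the eight algorithms; names and types are assumed.
from typing import Any, List, Tuple

def Setup(k: int) -> Tuple[Any, Any]:
    """KGC: generate the public parameters and the master key."""
    ...

def PartialKeyGen(msk: Any, identity: str) -> Any:
    """KGC: generate the partial private key for the DO."""
    ...

def KeyGen(partial_key: Any) -> Tuple[Any, Any]:
    """DO: combine the partial key with a secret value into the full key pair."""
    ...

def TagGen(sk: Any, blocks: List[List[int]]) -> List[Any]:
    """DO: generate one tag per row block of t sectors."""
    ...

def Challenge(c: int) -> Any:
    """TPA: build the challenge for c randomly chosen blocks."""
    ...

def ProofGen(challenge: Any, blocks: List[List[int]], tags: List[Any]) -> Any:
    """CSP: compute the integrity proof for the challenged blocks."""
    ...

def ProofVerify(challenge: Any, proof: Any, pk: Any) -> bool:
    """TPA: return True (1) if the proof is valid, False (0) otherwise."""
    ...

def WarrantGen(do_secret: Any, user_public: Any) -> Any:
    """DO: issue a revocable warrant for an authorized user."""
    ...
```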
2.5. Security Model
The security model of our scheme is based on a game played between a challenger β and an adversary A, where the challenger β represents the DO and TPA and the adversary A represents the malicious cloud CSP. The specific details of the game are as follows:
Setup: The challenger β executes the setup algorithm to generate the master key and the public parameters, then sends the public parameters to the adversary A and stores the master key properly.
Hash Queries: The adversary A can query any of the hash functions; the challenger β then performs the related calculations and sends the results to A.
Private/Public Key Queries: The adversary A can query the private key of a user with a chosen identity. When receiving a private key query request from adversary A, the challenger β runs the partial key generation and key generation algorithms, then feeds back the public key and the combined private key to A.
Tag Queries: The adversary A can randomly select a data block and query its corresponding tag; the challenger β then executes the tag generation algorithm to generate the signature of the data block and sends the result to A.
Integrity Proof Queries: The adversary A can randomly select any number of data blocks and query their corresponding integrity proof; the challenger β then executes the proof generation algorithm and returns the result to A.
Challenge: The challenger β selects a certain number of data blocks and sends a challenge to the adversary A in order to obtain a data integrity proof.
Forging: In response to the challenge message sent by the challenger β, the adversary A forges an integrity proof and returns it to β, which verifies the validity of the proof. If this proof can always pass the challenger's verification, the adversary A is considered to have won the game.
Based on the above game, we have the following definitions.
Definition 1. A certificateless remote data integrity auditing scheme is considered to be secure if the probability of adversary A winning the game against challenger β is negligible.
Furthermore, we propose a formal definition to satisfy the security requirement of sensitive information hiding.
Definition 2. A certificateless remote data integrity auditing scheme is regarded as having sensitive information hiding if only the DO and authorized users can obtain the original data from the CSP during the execution of the scheme.
3. Our Proposed Scheme
In this section, we elaborate the details of our designed scheme.
: This algorithm is performed by the KGC. Given a security parameter k, the KGC randomly generates a large prime q. Then, the KGC generates two multiplicative cyclic groups $G_1$ and $G_2$, both of order q, where g is a generator of $G_1$, together with t random elements of $G_1$. In addition, the KGC selects three secure cryptographic hash functions, a pseudo-random permutation (PRP), a pseudo-random function (PRF), and a bilinear map e acting on $G_1$. Then, the KGC generates the master secret key, where s is randomly selected from $\mathbb{Z}_q^*$, and calculates the master public key from s. After carrying out the above operations, the KGC publishes the public parameters and keeps the master secret key private.
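A minimal Setup sketch under these definitions is shown below, assuming the charm-crypto library and the common form $mpk = g^s$ for the master public key; the paper's exact expression is not preserved in this text.

```python
# Hedged Setup sketch; mpk = g^s is an assumption made for illustration.
from charm.toolbox.pairinggroup import PairingGroup, ZR, G1

def setup(t: int):
    group = PairingGroup('SS512')
    g = group.random(G1)                          # generator of G1
    u = [group.random(G1) for _ in range(t)]      # t random elements, one per sector
    s = group.random(ZR)                          # master secret key
    mpk = g ** s                                  # master public key (assumed form)
    params = {'group': group, 'g': g, 'u': u, 'mpk': mpk}
    return params, s                              # publish params, keep s private

params, msk = setup(t=4)
```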
: this algorithm is responsible for generating the partial private key for the DO. After receiving the identity of the DO, the KGC computes the partial private key and feeds the result back to the DO.
: when the DO receives the partial private key sent by the KGC, he or she first verifies whether the corresponding verification equation holds. If the equation does not hold, the DO refuses to recognize the legitimacy of the partial private key and initiates a new round of application to the KGC. Otherwise, the DO randomly selects a secret element and computes the full private key. After completing the above operation, the DO publishes the public key and keeps the private key secret.
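The following sketch mirrors this workflow using one common certificateless construction (partial key $H_1(ID)^s$, checked by $e(d_{ID}, g) = e(H_1(ID), mpk)$). It is illustrative only and not necessarily the paper's exact equations, which are not preserved in this text.

```python
# Hedged PartialKeyGen / KeyGen sketch under a common certificateless construction.
from charm.toolbox.pairinggroup import PairingGroup, ZR, G1, pair

group = PairingGroup('SS512')
g = group.random(G1)
s = group.random(ZR)                  # KGC master secret key
mpk = g ** s                          # assumed master public key form

def partial_key_gen(identity: str):
    return group.hash(identity, G1) ** s          # KGC -> DO

def key_gen(identity: str, d_id):
    # The DO first checks the partial key, then adds its own secret value x.
    if pair(d_id, g) != pair(group.hash(identity, G1), mpk):
        raise ValueError("partial private key rejected; re-apply to the KGC")
    x = group.random(ZR)
    return (d_id, x), g ** x                      # full private key, published public key

sk, pk = key_gen("DO-identity", partial_key_gen("DO-identity"))
```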
: here, we assume that the file F has been divided into n blocks before being uploaded to the cloud and that each block contains t small sectors of the same length. The file name serves as the unique identifier of the file F. The DO computes the tag for each row block by computing Equation (1):
The set of all tags is then obtained. After generating tags for all data row blocks, the DO transmits the data blocks and their tags to the CSP. When receiving the tag set, the CSP can verify each single tag.
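Equation (1) itself is not preserved in this text. As an illustration of the "one tag per row block of t sectors" idea, the sketch below uses the standard sector-aggregated homomorphic tag $\sigma_i = (H_2(\mathrm{name} \Vert i) \cdot \prod_j u_j^{m_{ij}})^x$, which is an assumption rather than the paper's exact formula.

```python
# Illustrative TagGen: one signature covers all t sectors of a row block.
from functools import reduce
from charm.toolbox.pairinggroup import PairingGroup, ZR, G1

group = PairingGroup('SS512')
t = 4
u = [group.random(G1) for _ in range(t)]
x = group.random(ZR)                              # DO signing key (illustrative)

def tag_gen(name: str, blocks):
    """blocks[i][j] is the j-th sector of row block i, given as an integer."""
    tags = []
    for i, row in enumerate(blocks):
        h = group.hash(f"{name}||{i}", G1)
        prod = reduce(lambda acc, el: acc * el,
                      (u[j] ** group.init(ZR, row[j]) for j in range(t)))
        tags.append((h * prod) ** x)              # one tag per row block, not per sector
    return tags

tags = tag_gen("file-001", [[7, 1, 2, 3], [4, 5, 6, 8]])
```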
: in order to reduce overhead, the TPA generally selects only part of the data blocks at random when checking the integrity of the data stored in the cloud. Suppose the TPA chooses to check c data blocks; the TPA then randomly selects two elements, which serve as the keys of the PRP and PRF. Finally, the TPA sends the challenge message to the CSP.
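Because the challenge only carries c and two short elements, both the TPA and the CSP can expand it locally into the same set of (index, coefficient) pairs. The sketch below shows one way to do this; HMAC-SHA256 stands in for the PRP and PRF, whose concrete instantiations are not specified in this text.

```python
# Expanding a compact challenge (c, k1, k2) into c index/coefficient pairs.
import hmac, hashlib

def prf(key: bytes, msg: bytes, q: int) -> int:
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), 'big') % q

def expand_challenge(c: int, k1: bytes, k2: bytes, n: int, q: int):
    # PRP role: pseudo-randomly choose c distinct block indices out of n (keyed shuffle)
    indices = sorted(range(n), key=lambda i: prf(k1, i.to_bytes(8, 'big'), 1 << 64))[:c]
    # PRF role: derive a coefficient v_i for every challenged index
    return [(i, prf(k2, i.to_bytes(8, 'big'), q)) for i in indices]

# Example: challenge 3 of 10 blocks
print(expand_challenge(3, b'k1', b'k2', n=10, q=2**61 - 1))
```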
: after receiving the challenge request initiated by the TPA, the CSP first generates the relevant parameter set using the PRP and PRF. Then, the CSP selects a random value and computes the corresponding proof. After completing the above calculation, the CSP sends the proof to the TPA.
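As a data-only illustration of the aggregation step (the tag part and the paper's exact blinding construction are omitted, since they are not preserved in this text), the CSP's response can be sketched as a blinded linear combination of the challenged sectors:

```python
# Blinded per-sector aggregation; the group-element part of the proof is omitted.
import secrets

def proof_gen(challenge, blocks, q):
    """challenge: list of (i, v_i); blocks[i][j]: the j-th sector of block i (ints < q)."""
    t = len(blocks[0])
    r = secrets.randbelow(q)                      # blinding factor chosen by the CSP
    # per-sector linear aggregation of the challenged blocks, blinded by r
    mu = [(r + sum(v * blocks[i][j] for i, v in challenge)) % q for j in range(t)]
    # the aggregated tag (e.g., a product of the sigma_i raised to v_i) and the
    # commitment binding r would accompany mu in the real proof
    return r, mu

print(proof_gen([(0, 3), (2, 5)], [[1, 2], [3, 4], [5, 6]], q=2**61 - 1))
```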
: The TPA uses the PRF and PRP locally to calculate the same relevant parameter set. In this way, the TPA can conduct verification using Equation (2):
If Equation (2) holds, the algorithm outputs 1, indicating that the checked data are securely stored in the cloud. Otherwise, the data have been corrupted, and the algorithm outputs 0.
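For orientation, the kind of check that Equation (2) performs can be illustrated by the standard sector-aggregated verification equation, shown here without the blinding factor and the certificateless key structure that the scheme's actual Equation (2) additionally handles:
$$
e\!\Big(\prod_{(i,\,v_i)\in Q} \sigma_i^{\,v_i},\; g\Big)
\;\stackrel{?}{=}\;
e\!\Big(\prod_{(i,\,v_i)\in Q} H_2(\mathrm{name}\,\Vert\, i)^{\,v_i}\cdot\prod_{j=1}^{t} u_j^{\,\mu_j},\; pk\Big),
\qquad
\mu_j=\sum_{(i,\,v_i)\in Q} v_i\, m_{ij},
$$
where Q is the challenged set and $pk = g^x$ corresponds to the illustrative tags given earlier.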
: To generate a warrant that can be used to access the data stored in the cloud, the DO and the user each randomly select a secret value. After completing the above operation, the DO and the user keep their secret values private and publish the corresponding public values. To authorize the user for sensitive data access, the DO computes a warrant W and sends it to the user. After the user receives W, he or she calculates the corresponding access request and sends it to the CSP to access the data. When the CSP receives the request, it checks whether the warrant verification equation holds. If the equation holds, the CSP transmits the data to the user; otherwise, the CSP rejects the application. When the DO wants to revoke the user's warrant, he or she simply re-selects a new secret value and updates the published value, meaning that the user can no longer obtain data from the CSP.
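The warrant workflow above resembles a Diffie–Hellman-style capability. The following hedged sketch mirrors it (publish $g^a$ and $g^b$, warrant $g^{ab}$, pairing-based check, revocation by re-selecting a); the paper's actual warrant formula and CSP check are not preserved in this text.

```python
# Hedged DH-style warrant sketch (charm-crypto assumed).
from charm.toolbox.pairinggroup import PairingGroup, ZR, G1, pair

group = PairingGroup('SS512')
g = group.random(G1)

a, b = group.random(ZR), group.random(ZR)    # DO secret, user secret
A, B = g ** a, g ** b                        # published values

W = B ** a                                   # warrant handed by the DO to the authorized user

# CSP-side check: a presented warrant is accepted iff e(W, g) == e(A, B),
# i.e. W really equals g^{ab}; computing W without a (or b) would solve CDH.
assert pair(W, g) == pair(A, B)

# Revocation: the DO re-selects a' and republishes A' = g^{a'}; the old W then
# fails the check pair(W, g) == pair(A', B), so the user must be re-authorized.
```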
4. Security Proof
In this section, we provide a theoretical analysis of our designed scheme from four aspects: correctness, soundness, sensitive information hiding, and detectability.
4.1. Correctness Proof
The correctness of our proposed scheme can be summarized in four points: (1) if the partial private key sent by the KGC to the DO is correct, the DO always accepts it; (2) a single tag always passes validation if the DO generates tags correctly for the data blocks; (3) if the CSP stores the data properly in the cloud, every generated proof can be verified by the TPA; (4) if the user holds a valid warrant from the DO for authorization, the user can obtain sensitive data from the CSP.
Proof. (1) The correctness of the partial private key generated by the KGC for the DO is as follows:
(2) A single tag can be generated by Equation (1), and its correctness is proven as follows:
(3) The correctness of auditing relies on checking Equation (2), which is proven as follows:
(4) If the user holds the correct warrant, then it can be verified by the CSP as follows:
□
4.2. Soundness Proof
Theorem 1 (Auditing soundness). If the CDH assumption holds on the multiplicative cyclic groups $G_1$ and $G_2$, then a malicious cloud cannot forge a verifiable proof.
Proof. We prove, through a series of games, that if the CSP forges a verifiable proof without storing the complete data, then there exists an extractor that can solve the CDH problem by using the interaction information between the challenger and adversary A. □
The specific process of the first game has been elaborated in Section 2; therefore, we omit it here.
The second game is largely the same as the first, except with the addition of a new rule. The challenger β maintains a list locally to record the Tag Queries made by adversary A. If adversary A generates a proof P that is successfully verified by the challenger and the proof involves at least one tag that is not recorded in the list, then the challenger β terminates the game and declares defeat.
Assuming that adversary A wins the above game with non-negligible probability, there is an extractor that can solve the CDH problem through the following steps. Without loss of generality, we fix the identity of the adversary, the challenged file, and the file name.
Suppose that the honest prover produces the correct proof; then, the proof satisfies the verification equation, which we denote as Equation (3). When adversary A successfully forges a verifiable proof, the forged proof satisfies the corresponding verification equation, which we denote as Equation (4). Obviously, the two proofs cannot be identical; otherwise, the forgery would coincide with the honest proof. Given a CDH instance $(g, g^a, g^b)$, the goal of the extractor is to compute $g^{ab}$. To this end, the extractor embeds the instance into the simulation: it randomly selects a value for each block index i in the challenge and sets the corresponding simulation parameters, randomly selects an element as the secret value and runs the key generation algorithms to generate the private key, and randomly selects a further element to fix the remaining public parameters. Finally, for each i, the extractor sets the simulated tag accordingly. Hence, the extractor can compute every value needed to answer the adversary's queries. Thus, by dividing Equations (4) and (3), we can obtain Equation (5). From Equation (5), we can obtain $g^{ab}$.
According to the above equation, the CDH problem cannot be solved only in the case where the relevant exponent in Equation (5) equals zero. Obviously, in the finite field generated by the large prime q, the probability of this event is negligibly small. This means that if adversary A successfully forges a verifiable proof, the extractor can solve the CDH problem with overwhelming probability, which contradicts the assumed difficulty of the CDH problem.
The third game introduces a new rule on the basis of the previous game: if the challenger β finds that the aggregated data value contained in the proof generated by adversary A is not equal to the expected value M, then the challenger terminates the game and declares defeat.
Given a DL instance, we build an extractor whose purpose is to compute a value satisfying the instance's discrete logarithm relation. We set the relevant public elements in terms of this instance.
Suppose that P is a correct proof generated by the honest cloud; then, P satisfies the verification equation. The adversary then generates a forged proof, for which the verification equation also holds. Comparing the two equations, the extractor obtains the following relation:
Therefore, the extractor can solve the DL problem as follows:
According to the above equation, the DL problem cannot be solved only in the case where the relevant exponent equals zero. Obviously, in the finite field generated by the large prime q, the probability of this event is negligibly small. This means that if adversary A successfully forges a verifiable proof, the extractor can solve the DL problem with overwhelming probability, which contradicts the assumed difficulty of the DL problem. From the above analysis, it can be seen that the proposed scheme is secure as long as the CDH and DL assumptions hold.
4.3. Sensitive Information Hiding Proof
Theorem 2 (Sensitive information hiding). In our designed scheme, only the CSP, DO, and legitimate users authorized by the DO can access the original data.
Proof. We demonstrate this proof in two parts: (1) the TPA cannot obtain the original data during the auditing process and (2) the DO-generated warrants for authorized users are hard to forge. □
In the process of auditing, the TPA can select several data blocks for multiple challenges; its purpose is to try to recover the original data blocks by accumulating a sufficient number of equations. In our scheme, the proof generated by the CSP adds a blinding factor r to the linearly aggregated data blocks, meaning that unless the TPA can recover r from the proof, it cannot recover the data. Owing to the difficulty of solving the DL problem, it is very unlikely that the TPA will be able to recover the data through multiple challenges. In addition, unauthorized users and users whose warrants have been revoked have difficulty forging legitimate warrants in order to continue obtaining data. The reason for this is simple: successfully forging a warrant would mean that the secret value can be recovered from the published value, which contradicts the difficulty of solving the DL problem. Therefore, as long as the DL assumption holds, the scheme designed in this paper can guarantee the privacy of the data.
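A toy illustration of why the blinding factor defeats this strategy: every audit response introduces one fresh unknown, so the TPA's system of equations never becomes solvable (plain modular arithmetic, illustrative only).

```python
# Each response adds a fresh blinding factor, so unknowns always outnumber equations.
import secrets

q = 2**61 - 1
m = [secrets.randbelow(q) for _ in range(4)]         # the DO's secret data blocks

def audit_response(coeffs):
    r = secrets.randbelow(q)                          # fresh blinding factor per challenge
    return (r + sum(v * mi for v, mi in zip(coeffs, m))) % q

# After k challenges the TPA holds k equations in 4 + k unknowns
# (m_1..m_4 plus one blinding factor per response), so the data never leak.
responses = [audit_response([secrets.randbelow(q) for _ in m]) for _ in range(10)]
print(len(responses), "responses collected;", len(m) + len(responses), "unknowns remain")
```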