Article

Public Authentic-Replica Sampling Mechanism in Distributed Storage Environments

1 School of Computer Science, School of Cyber Science and Engineering, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 College of Computer Science and Technology, Shanghai University of Finance and Economics, Shanghai 200433, China
3 School of Software, Shandong University, Jinan 250100, China
4 State Grid Zaozhuang Power Supply Company, Zaozhuang 277899, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(21), 4167; https://doi.org/10.3390/electronics13214167
Submission received: 19 September 2024 / Revised: 17 October 2024 / Accepted: 19 October 2024 / Published: 23 October 2024
(This article belongs to the Special Issue Novel Methods Applied to Security and Privacy Problems, Volume II)

Abstract
With the rapid development of wireless communication and big data analysis technologies, the storage of massive amounts of data relies on third-party trusted storage, such as cloud storage. However, once data are stored on third-party servers, data owners lose physical control over their data, making it challenging to ensure data integrity and security. To address this issue, researchers have proposed integrity auditing mechanisms that allow for the auditing of data integrity on cloud servers without retrieving all the data. To further enhance the availability of data stored on cloud servers, multiple replicas of the original data are stored on the server. However, existing multi-replica auditing schemes suffer from server fraud, where the server does not actually store the corresponding data replicas. To tackle this issue, this paper presents a formal definition of authentic replicas along with a security model for the authentic-replica sampling mechanism. Based on time-lock puzzles, identity-based encryption (IBE) mechanisms, and succinct proof techniques, we design an authentic-replica auditing mechanism. This mechanism ensures the authenticity of replicas and can resist outsourcing attacks and generation attacks. Additionally, our scheme replaces the combination of random numbers and replica correspondence tables with Linear Feedback Shift Registers (LFSRs), optimizing the client-side generation and uploading of replica parameters from being linearly related to the number of replicas to a constant level. Furthermore, our scheme allows for the public recovery of replica parameters, enabling any third party to verify the replicas through these parameters. As a result, the scheme achieves public verifiability and meets the efficiency requirements for authentic-replica sampling in multi-cloud environments. This makes our scheme more suitable for distributed storage environments.
The experiments show that our scheme makes the time for generating copy parameters negligible while also greatly optimizing the time required for replica generation. As the amount of replica data increases, the time spent does not grow linearly. Due to the multi-party aggregation design, the verification time is also optimal. Compared to the latest schemes, the verification time is reduced by approximately 30%.

1. Introduction

Storage as a service [1,2,3] allows for consumption-based storage facilities to store and process massive amounts of data through cloud storage. Cloud storage relieves clients of the burden of storage management [4], simplifying data maintenance and management and reducing reliance on specialized IT personnel. Additionally, cloud storage facilitates universal data access across independent geographical locations, enabling users to easily access and share data regardless of their location. Moreover, cloud storage eliminates the substantial costs associated with purchasing hardware, software, and personnel maintenance [5], providing a cost-effective solution that significantly reduces the total cost of ownership, enhancing operational efficiency and competitiveness for enterprises. However, cloud service providers are not entirely trustworthy, and clients need to ensure that their data stored on the servers are not subject to integrity or other security threats [6,7,8]. Therefore, effective mechanisms are needed to ensure data integrity in cloud storage. Integrity auditing technology is considered an effective means for clients to verify the correctness of data stored in the cloud. In 2007, Ateniese et al. [9] proposed Provable Data Possession (PDP), and Juels and Kaliski [10] introduced Proofs of Retrievability (PORs). Both methods provide clients with guarantees of the integrity and availability of outsourced data. Subsequently, more and more experts began researching how to ensure data integrity for clients in cloud storage. In schemes where clients interact with cloud storage servers [11,12], only the original data are stored, making the data difficult to recover in case of damage. However, in multi-replica storage, any damaged data can be correctly recovered using other replicas [13,14,15,16,17,18], which has been widely applied in industries such as finance, education, and healthcare, where data availability is critical.
However, in multi-replica schemes, servers may still engage in fraudulent behavior, i.e., not actually storing all data replicas. This lack of assurance regarding the genuine storage of data replicas not only leads to the inability to recover damaged data but also significantly undermines the integrity and availability of the stored data, which is especially critical in industries involving key business operations. Therefore, there is an urgent need for a genuine replica sampling mechanism that can resist server fraud to ensure data security and availability, protecting user interests and trust. Moreover, we specifically define server fraud as outsourcing attacks and generation attacks [19].
In an outsourcing attack, a malicious server attempts to deceive the verifier into passing the audit. During the integrity audit challenge, the malicious server, upon receiving a challenge from the verifier, quickly retrieves the corresponding data from another storage provider and generates the proof. The malicious server appears to be storing the data continuously but, in reality, it does not bear the responsibility of storing it. This behavior allows the malicious server to deceive the client not only about the physical space allocated to the data but also about its storage capacity, even claiming to store data beyond its limit. This behavior not only affects data reliability but also imposes significant economic losses and risks of data leakage on the client. Furthermore, since the data retrieved from external storage lack security guarantees, the overall security of the data is greatly compromised, increasing the risk of data leakage and loss and further exacerbating the trust crisis.
Secondly, in a generation attack, the malicious server uses certain algorithms or predefined methods to quickly and dynamically generate replicas and produce proofs during a challenge without actually storing real data. This allows the malicious server to deceive the client, reducing the actual storage space required, thus saving costs, but posing a serious threat to data security. Therefore, to enhance data integrity and availability, effective mechanisms must be developed to defend against these fraudulent behaviors and maintain user trust and interests.
In data integrity auditing schemes, the client and the server, as two interacting entities, have different evaluation criteria to consider from their respective perspectives [15,20,21]. Existing schemes often focus on different evaluation criteria, attempting to make the scheme more favorable to either the server or the client. However, considering only one party’s evaluation criteria leads to an imbalanced scheme. Therefore, it is necessary to trade off between the client and the server, meaning that the scheme must consider how to meet the evaluation criteria of both parties. Armknecht et al. proposed the Mirror scheme [22], which shifts the replica generation process to the server, reducing the client’s computational burden and significantly saving bandwidth resources. At the same time, the server also benefits from reduced communication transmission, saving bandwidth resources. Additionally, although the server regains the initiative in replication, it is almost impossible for it to engage in improper behavior. This scheme is client-friendly while imposing as little burden as possible on the server. However, this trade-off still faces a storage cost that grows linearly with the number of replicas when the client generates and uploads copy parameters. Moreover, it does not meet the criteria for public verifiability compared to other schemes. In 2023, a user-friendly scheme [23] was proposed that satisfies public verifiability for client generation and uploading of copy parameters under a single cloud server, where the cost of generating and uploading copy parameters remains constant as the number of copies increases and the copy parameters are recoverable.
Although the aforementioned schemes are excellent, they have some obvious limitations. First, these schemes typically rely on a single-cloud storage environment, making the system vulnerable to single points of failure. If the cloud provider experiences an outage, data may become inaccessible or be permanently lost. Second, in a single-cloud environment, the tasks of replica generation [13,19] and storage are all handled by a single server, which not only increases the computational and storage burden but can also lead to performance bottlenecks when dealing with large-scale data. Furthermore, single-cloud schemes are less capable of defending against outsourcing attacks and generation attacks, where a malicious server can deceive the verifier by using external resources or dynamically generating replicas. In contrast, a multi-cloud environment distributes storage and processing tasks, improving the system’s security and stability while also reducing verification overhead. Lastly, the public verifiability of single-cloud schemes is limited; the verifier must rely on a single cloud service provider, lacking the transparency of multi-party collaboration. Multi-cloud storage, however, can enhance the reliability of the audit process through collaboration among different cloud providers. Unfortunately, the above schemes have not been implemented in a multi-cloud environment, and both public verifiability and constant-level copy parameter construction will be called into question.
To address the aforementioned challenges, we propose a public authentic-replica sampling mechanism. This mechanism is a security sampling scheme designed to tackle data integrity and availability issues in a multi-cloud storage environment. Our primary goal is to ensure that the data replicas stored on cloud servers genuinely exist and have not been tampered with, thus maintaining the integrity and availability of the data. To achieve this goal, we designed an auditing method based on time-lock puzzles, succinct proofs, and identity-based encryption to ensure the authenticity of replicas. Additionally, recoverable copy parameters are employed, allowing any third party to verify the data, achieving public verifiability. Based on this sampling mechanism, we have instantiated a public authentic-replica sampling scheme. During the designated replica generation process, by utilizing a Ref-Table and random seeds, we achieve the distributed storage of data replicas across multiple clouds with negligible overhead while also reducing the burden on the client. The overall design cleverly allows for proof aggregation, further reducing verification overhead, resulting in a scheme that is both efficient and secure. The main contributions can be summarized as follows:
  • To address the issue of servers falsely storing replica data, this paper designs an authentic-replica sampling mechanism that periodically checks the server’s replica storage. We formally define the security model for the authentic-replica sampling mechanism and design a publicly verifiable authentic-replica sampling scheme for distributed storage environments. Our scheme distributes data replicas across multiple clouds, ensuring low verification overhead while distributing the pressure of replica generation. Through security analysis, we demonstrate the security of this scheme, ensuring that the replicas stored on the server occupy actual storage space.
  • This scheme is based on time-lock puzzles and employs identity-based encryption mechanisms and succinct proof techniques to ensure the authentic replica. It is designed to resist generation attacks and outsourcing attacks, avoiding the associated security threats.
  • To address the issue of linear storage cost increase when the client generates and uploads copy parameters as the number of replicas increases, this paper combines random seeds and reference tables. This approach ensures that even as the number of stored replicas grows, the cost for the client to generate and upload copy parameters remains constant, which greatly reduces the computational overhead on the client side. Additionally, by using publicly recoverable copy parameters, the scheme achieves public verifiability. Security analysis demonstrates the scheme’s security.
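The seed-expansion idea behind constant-size copy parameters can be illustrated with a small sketch. The register width, taps, and derivation rule below are hypothetical placeholders, not the construction used in our scheme; the point is only that the client transmits one fixed-size seed (plus the small reference table), no matter how many replicas are stored.

```python
def lfsr_stream(seed, taps=0xB400, width=16):
    """Yield successive states of a Galois LFSR starting from `seed`."""
    state = seed & ((1 << width) - 1)
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        yield state

def replica_params(seed, num_replicas):
    """Derive one parameter per replica from a single seed. The client only
    ever sends `seed`, regardless of `num_replicas`, and any third party
    holding the seed can re-derive the same parameters."""
    gen = lfsr_stream(seed)
    return [next(gen) for _ in range(num_replicas)]
```

Because the expansion is deterministic, a verifier given the public seed recovers exactly the parameters each cloud was assigned, which is the property that enables public verification.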
We organize our paper as follows: We review the preliminaries required in our paper in Section 2. The system model and security model of the public authentic-replica sampling mechanism are described in Section 3. In Section 4, we give the detailed construction of the public authentic-replica sampling scheme and the security analysis. We give the performance of our scheme in Section 5, and finally the conclusion is in Section 6.

Related Work

Distributed storage systems combine network and storage technologies to allow clients to store data remotely and provide various services such as archiving, publishing, federation, and anonymity. As networking technology has evolved, new types of distributed storage systems have emerged [24]. These systems can be categorized into two main types of architectures: client–server and peer-to-peer. In a client–server architecture [25], each entity is either a client or a server with the server responsible for authentication, data replication, backup, and handling client requests. This architecture is commonly used in distributed storage systems. In contrast, in a peer-to-peer architecture [25,26], each participant can be both a client and a server.
In distributed storage systems [27,28,29,30], storing data is essentially a data outsourcing problem for the client. Consequently, ensuring the integrity of outsourced data on cloud servers is of critical importance for security. To this end, Ateniese et al. [9] proposed the Provable Data Possession (PDP) concept and constructed the PDP scheme, which verifies the integrity of remote data by checking whether the server holds certain blocks through Homomorphic Verifiable Tags (HVTs) in the form of probability-based sampling. Secure PDP (S-PDP) is the first sampling-mechanism-based PDP; it strongly guarantees data possession with its added robustness, but at the cost of heavy computation on the client side. For this reason, Efficient PDP (E-PDP) [31] was subsequently proposed to reduce the computation on the client side, but both suffer from high communication cost and a linear relationship between the client’s computation and the number of replicas. In 2013, Hanser and Slamanig provided a PDP based on the elliptic curve cryptosystem [32], which enables the same tag to be verified both publicly and privately by identifying the vectors of each data block, thereby achieving public verifiability and enabling the client and a third party to audit the data remotely at the same time. In 2013, Chen proposed algebraic signatures to effectively check data possession in cloud storage [33]. Algebraic signatures have less computational overhead than homomorphic cryptosystems, which allows the scheme to guarantee data possession while reducing overhead, but it suffers from an upper limit on the number of verifications and relies on probabilistic guarantees.
Compared to PDP, Juels and Kaliski [10] proposed Proofs of Retrievability (PoRs), a form of encrypted Proof of Knowledge (PoK) that ensures the integrity of outsourced data in untrusted cloud storage while also enabling the recovery of slightly damaged data through the use of Forward Error-correcting Codes (FECs), but it has the limitation of an upper bound on the number of verifications. Subsequently, Shacham and Waters achieved data retrievability through coding techniques in 2008 [34], and through homomorphic authenticators, they achieved an unlimited number of integrity verifications. In 2013, Yuan and Yu designed a new PoR scheme that generates a constant-size polynomial commitment, represented as a short string, and uses it as the proof, giving the scheme a constant-size communication cost. Subsequently, more and more researchers have devoted themselves to this topic and produced a large number of research results.
However, even PoR schemes with weak recovery capability can only repair minor damage. If the source data are damaged to a greater extent, they cannot be recovered, and data retrievability is lost. To make up for this shortcoming, many researchers have begun to focus on replica storage. Replicas [35] are used to ensure data availability: the server stores several distinct replicas, so that even if the data and more than one replica are corrupted, the remaining complete replicas can restore the damaged data, ensuring the availability of the data to a large extent. In concrete multi-replica schemes, the first approach researchers considered was to make the replicas direct additional copies of the data. This approach is simpler to implement, but it does not provide strong proof that the server is actually storing multiple copies; the server can pass a challenge without actually holding a sufficient number of copies, creating them only when challenged and thereby spoofing the audit. Thus, the copies that the client expects the server to store become difficult to prove. These schemes also do not guarantee the authenticity, integrity, and availability of the data copies on the server. In 2008, Curtmola et al. made the first attempt at Multiple-Replica Provable Data Possession (MR-PDP) [36], in which the client is able to securely confirm that the server stores multiple unique replicas. In addition, because multiple replicas can be checked in a batch, the overhead of verifying n replicas is less than that of verifying one replica n times. It succeeds in securely networking [37] and storing multiple replicas to create uniquely distinguishable copies of data, but MR-PDP only supports private verification. In 2009, Bowers et al. proposed a new cloud storage scheme based on PoR that utilizes both within-server redundancy and cross-server redundancy, allowing for the testing and reallocation of storage resources when failures are detected. In 2010, Barsoum and Hasan proposed another scheme [36] with two versions: one incurs higher storage overhead on the client and server, and the other uses probabilistic detection to reduce client and server overhead. This scheme is able to delegate the auditing task to a trusted third party using a BLS homomorphic linear authenticator, but the client must re-encrypt and upload the entire copy to the server. Subsequently, multi-copy integrity auditing over dynamic data updates was also proposed. In 2014, Barsoum and Hasan proposed a scheme [38] that supports dynamic operations on the outsourced data in addition to multi-copy storage.
In 2016, Armknecht et al. proposed the Mirror scheme [22], which is based on the hard problem of Linear Feedback Shift Registers (LFSRs) due to Rivest; Mirror shifts the overhead of building replicas to the cloud servers, which reduces the bandwidth overhead. Subsequently, Guo et al. improved on Mirror [39] in order to resist the replacement forgery attacks caused by the extensibility of data labels. These schemes reduce the computation of the client and let the server carry out replica generation; however, the client’s copy parameters still grow linearly with the number of replicas, and the client still bears a relatively large amount of computation. At the same time, compared with previous schemes, they cannot satisfy public verifiability. In 2023, a user-friendly scheme [23] satisfied public verifiability for client generation and uploading of copy parameters under a single cloud server, where the cost of generating and uploading copy parameters remains constant as the number of copies increases and the copy parameters are recoverable. However, if every cloud server in a multi-cloud environment must additionally store a layer of copy parameter generation, constant-size copy parameter generation can no longer be satisfied, and the public verifiability characteristic is called into question. None of the existing integrity auditing schemes can resist the two attacks: the generation attack and the outsourcing attack.
Although existing schemes have made significant efforts in ensuring data integrity and availability, most are confined to single-cloud environments. In a multi-cloud environment, these schemes fail to guarantee client-friendliness and cannot achieve public verifiability. Verification is limited to a single cloud, and whether efficient verification can be achieved in a multi-cloud setting remains uncertain. In contrast, the scheme proposed in this paper overcomes these limitations. By adopting techniques such as time-lock puzzles, succinct proofs, and identity-based encryption, it ensures the authenticity and security of replicas, effectively resisting outsourcing attacks and generation attacks. In a multi-cloud environment, our scheme significantly reduces the overhead of replica generation by combining Ref-Table and random seeds. Additionally, it introduces public verifiability in the replica verification process, greatly enhancing the security and efficiency of the system.

2. Preliminaries

In this section, we will outline the key techniques used in the solution, including time-lock puzzles and succinct proofs for hidden order groups.

2.1. Time-Lock Puzzles

A time-lock puzzle is an encryption technique used to unlock data only after a specific time. The basic concept is to encrypt a message so that it can be decrypted only after a set time delay. Rivest et al. introduced an RSA-based time-lock puzzle to encrypt a message [40]; the puzzle can be solved to reveal the key or message only after a certain amount of time. Subsequently, Freitag et al. introduced non-malleable time-lock puzzles, which not only require the solver to decrypt after a specified time but also ensure that the puzzle’s solution cannot be forged. These puzzles have a wide range of applications in areas such as electronic auctions and corporate contract signing [41,42,43]. Rivest’s time-lock puzzle is related to the RSA cryptosystem: Alice wants to encrypt the message m so that it cannot be decrypted for T seconds, which works as follows.
  • Setup(λ, k, m) → (N, u, t)
    • Generate two large prime numbers p, q at random.
    • Compute the Euler function φ(N) = (p − 1)(q − 1) for the modulus N = pq.
    • Determine the number S of modulo-N squaring operations that the solver Bob can perform per second and compute t = T · S.
    • Encrypt m with secret key K.
    • Randomly select u ∈ Z_N* and encrypt K as C_K = K + u^(2^t) mod N.
    • For C_K, output the time-lock puzzle (N, u, t).
  • Solve(N, u, t) → y′. When Bob receives the puzzle, without knowing p, q, he can only obtain y′ = u^(2^t) mod N by t successive modular squarings.
  • Verify(u, t, y′, φ(N)) → {0, 1}. Alice, who knows φ(N), computes r = 2^t mod φ(N) to obtain the result y = u^r mod N quickly and validates the solver Bob’s result y′. If y = y′, output 1; otherwise, output 0.

2.2. Succinct Proofs for Hidden Order Groups

In succinct proofs for hidden order groups, there are two types of succinct proofs: one is the succinct proof of exponentiation (POE), and the other is the succinct proof of knowledge of a discrete-log (POKD).
In POE, given a group G of unknown order (e.g., Z_N* for an RSA modulus N) and challenge primes Primes(λ) ⊂ [0, 2^λ), both the prover and the verifier are given the triple (u, w, x), u, w ∈ G, x ∈ Z, and the prover must convince the verifier that u^x = w holds in the group G. Define this relationship as R_POE = {((u, w ∈ G, x ∈ Z); ⊥) : w = u^x ∈ G}, where x can be much larger than the order of the group G [44]. We write ℓ for the verifier’s challenge prime.
  • R_POE.Prove(u, w, x, ℓ) → P. The prover receives the verifier’s prime ℓ ∈ Primes(λ), computes q = ⌊x/ℓ⌋ and r = x mod ℓ (so that x = qℓ + r), and sends P = u^q to the verifier.
  • R_POE.Verify(P, w, x, ℓ) → {0, 1}. The verifier computes r = x mod ℓ and verifies that P^ℓ · u^r = w holds in the group, returning 1 if it does and 0 otherwise.
POE can be adapted to provide POKD. Given a group G of unknown order with generator g, and challenge primes Primes(λ) ⊂ [0, 2^λ), define the relationship R_POKD = {(w ∈ G; x ∈ Z) : w = g^x ∈ G}.
  • R_POKD.Prove(w, x, ℓ) → (P, r). The prover receives the verifier’s prime ℓ ∈ Primes(λ), computes q ∈ Z and r such that x = qℓ + r, and sends the pair (P = g^q, r) to the verifier.
  • R_POKD.Verify((P, r), w, ℓ) → {0, 1}. The verifier checks that P^ℓ · g^r = w and r ∈ [ℓ] hold in the group, returning 1 if both do and 0 otherwise.
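Both proofs can be sketched in a few lines. Here the hidden-order group is stood in by Z_N* for a toy RSA modulus, and `ell` is the verifier's challenge prime; this illustrates the quotient-and-remainder structure only and is not a secure parameterization.

```python
def poe_prove(u, x, ell, N):
    """Prover: split x = q*ell + r and send P = u^q (r is recomputable)."""
    q = x // ell
    return pow(u, q, N)

def poe_verify(P, u, w, x, ell, N):
    """Verifier: check P^ell * u^(x mod ell) == w in the group."""
    r = x % ell
    return (pow(P, ell, N) * pow(u, r, N)) % N == w

def pokd_prove(g, x, ell, N):
    """Prover: as in POE, but also send r, since the verifier lacks x."""
    q, r = divmod(x, ell)
    return pow(g, q, N), r

def pokd_verify(P, r, g, w, ell, N):
    """Verifier: check r is in [0, ell) and P^ell * g^r == w."""
    return 0 <= r < ell and (pow(P, ell, N) * pow(g, r, N)) % N == w
```

The checks pass because u^x = u^(qℓ+r) = (u^q)^ℓ · u^r = P^ℓ · u^r, so the verifier's small computation over r agrees with the prover's large exponentiation over q.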

3. System Model and Security Model of Public Authentic-Replica Sampling Mechanism

In this section, we will introduce the specific content of the public authentic-replica sampling mechanism, including the formal definition, system model, and security model.

3.1. Notations

In Table 1, we introduce the main notations of our scheme.

3.2. Overview of Our System Model

The system model of our scheme is represented in Figure 1. It contains four kinds of entities: several cloud servers (CSs), a client (C), a cloud organizer (CO), and a third-party auditor (TPA).
  • Client: The client, as the data owner, has a significant volume of data that needs to be stored on a cloud server.
  • CS: CS is a cloud server that provides enough storage space for the client and is always ready to take on challenges.
  • CO: The CO is used to aggregate all the proofs generated in the CS and sends the results of this aggregated proof to the TPA.
  • TPA: The TPA accepts the client’s delegation to check the integrity of the data.
The system works as follows.
  • Request Store: The client initiates the data outsourcing process by sending a formal request to the cloud server (CS). The client sends the already encrypted and encoded data to the CS, requesting the storage of these data.
  • Data Access: Upon receiving the request, the cloud storage evaluates it to ensure it can meet the client’s needs. Once it confirms its ability to fulfill the request, the cloud storage accepts the request and asks the client to provide the necessary parameters for storing their data.
  • Ref-Table: The client prepares a Ref-Table that details the requirements for the distribution of data replicas. This table specifies which specific data (or data blocks) should be stored by which cloud storage providers to ensure redundancy and fault tolerance. The client sends this table to the cloud organizer (CO). Upon receiving the Ref-Table, the CO must relay the corresponding information to each cloud provider. Once all clouds are aware of their necessary information based on the Ref-Table, they will respond to the messages, which will ultimately be verified by the client. For the client, data storage can only be considered distributed across multiple clouds after a successful verification process.
  • Auditing Delegation: After the client ensures that its data are stored in a multi-cloud environment, it delegates the responsibility of data verification to a third-party auditor (TPA). The TPA acts as an independent entity to verify the integrity of the client’s data in the multi-cloud storage.
  • Challenges: The TPA randomly selects challenges and sends a specific data integrity challenge to each cloud server. These challenges are designed to verify whether the data are stored correctly and remain intact.
  • Proof: Each cloud storage, upon receiving the challenge, needs to generate an audit proof for its data and its replicas by utilizing the necessary parameters from the Ref-Table in conjunction with the received challenge. Each cloud storage submits its proof to the CO, which then uses an aggregation algorithm to combine all proofs into a single proof. This proof represents the collective data integrity status of all relevant cloud storages.
  • Auditing Result: After receiving the proofs, the TPA uses its verification algorithm to validate the responses from the cloud storage. Once the verification is complete, the TPA returns the results to the client, indicating whether the data integrity has been maintained across the multi-cloud environment.
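To illustrate the aggregation step in the Proof and Auditing Result phases, the sketch below treats each cloud's proof as a single value in Z_N built from its challenged tags, and lets the CO fold all per-cloud proofs into one value multiplicatively. The tag form, challenge coefficients, and modulus are hypothetical stand-ins; the scheme's actual proof objects are defined in Section 4.

```python
from functools import reduce

def cloud_proof(tags, challenge, N):
    """One cloud server: for challenged pairs (i, v_i), combine its block
    tags as prod tags[i]^v_i mod N into a single proof value."""
    proof = 1
    for i, v in challenge:
        proof = proof * pow(tags[i], v, N) % N
    return proof

def co_aggregate(proofs, N):
    """Cloud organizer: fold the per-cloud proofs into one aggregated
    value, so the TPA verifies a single element instead of one per cloud."""
    return reduce(lambda acc, p: acc * p % N, proofs, 1)
```

Because the fold is a product in the same group as the individual proofs, the TPA's verification cost stays constant in the number of clouds, which is the efficiency property the multi-party aggregation design targets.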

3.3. The Formal Definition of Public Authentic-Replica Sampling Mechanism

Definition 1.
The public authentic-replica sampling mechanism contains the following seven algorithms: Setup, KeyGen, Store, Table, Replicate, Challenge and Verify.
1. 
S e t u p λ , k p p : Input the security parameter λ , k to generate the public parameter p p .
2. 
K e y G e n p p { s k c , p k c , ( s k S l , p k S l ) } : Input the public parameter p p to generate the public–private key pair ( s k c , p k c ) for the client, and for each server S l , generate the public–private key pair ( s k S l , p k S l ) ; thus, generate all the server public–private key pair records as ( ( s k S l , p k S l ) ) .
3. 
S t o r e F , n a m e , s k c , p k c , p p ( { σ i } , Γ , T ) : Input source coded file F , source coded file unique identifier n a m e , client private–public key pair s k c , p k c and public parameter p p , preprocessing for the source coded file F , output the tag σ i corresponding to the data block m i , the file tag Γ and the server Ref-Table T .
4. 
T a b l e { S l } , { I D S l } , c , I D c , p p , T { 0 , 1 } : The client decides on the replica allocation, inputs the Ref-Table T , encrypts and transmits the number of replicas required to be stored by the corresponding cloud server, and the corresponding server decrypts them, encrypts them again, and hands them over to the client for verification. Only after the client has verified is the Ref-Table made public.
5. 
R e p l i c a t e F , N , ξ , s k S l , p p , T , N S l { m i S l , j } : Each server S l is required to execute this R e p l i c a t e with input source encoding file F , replica number N random seed ξ sent by the client, Ref-Table T , and the server’s own secret key s k S l , the public parameter pp, the server’s own unique identifier N S l . Generate the replicas that the server should store by itself { m i S l , j } .
6. 
$Challenge(\{Chal_{S_l}\}, m_i, \sigma_i, \{m_i^{Rep}\}, pp) \rightarrow P$: Each server $S_l$ is challenged with its corresponding challenge $Chal_{S_l}$; the inputs are the challenged data blocks, the tags corresponding to those blocks, the replicas stored by that server, and the public parameters. Each server outputs proofs for the challenge; the proofs from the servers are handed to the TPA for aggregation, and the TPA hands the aggregated proof $P$ to the client.
7. 
V e r i f y C h a l , P , p k c , p k s l , p p { 0 , 1 } : After receiving the final proof P, the client verifies it using its own public key. If the verification is successful, it outputs 1; otherwise, it outputs 0.

3.4. Security Model

To achieve the security goal, consider a game between an adversary A and a simulation environment E where the environment E acts as an honest client, an honest verifier, and a partially honest server:
  • Environment E runs KeyGen to generate the key pair ( s k c , p k c ) and gives p k c to the adversary.
  • Adversary A selects the encoded file F while interacting with environment E to store the encoded file with the store random oracle. Environment E performs the S t o r e algorithm and returns the { σ i } to adversary A .
  • Environment E executes the Table algorithm and returns the Ref-Table T A to the adversary A .
  • The adversary A runs Replicate to generate and store the $T_A$ replicas $\{m_i^{A,j}\}$ generated from file $F$, while the environment E itself honestly generates and stores the $N - T_A$ replicas $\{m_i^{E,j}\}$.
  • For any encoded file $F$ block for which the adversary A has performed a store query, the adversary A executes the Challenge algorithm to generate proofs for its own stored replicas, while the environment E executes the Challenge algorithm to generate honest proofs for all remaining replicas. Both the proof of the adversary A and the honest proof generated by environment E are handed over to environment E, which executes the Verify algorithm for verification. After each execution of the protocol, environment E hands the result of each verification to adversary A.
  • Adversary A selects the file obtained from the return of some store random oracle and outputs a description of the storage service provider.
For a cheating server S c , we say that a cheating server is ϵ -admissible if the cheating server is able to complete integrity verification convincingly with probability no less than ϵ , where
$$\Pr\left[\mathrm{Verify}(P_{S_c} \cup P_{trust}) = 1 \;\middle|\; \begin{array}{c} \{m_i^{S_c,j}\} \leftarrow S_c.\mathrm{Replicate}(T_{S_c}) \\ P_{S_c} \leftarrow S_c.\mathrm{Challenge}(T_{S_c}, F_i) \\ \mathrm{Condition} \end{array}\right] \geq \epsilon$$
where the condition is
$$\mathrm{Condition} = \left\{ \begin{array}{l} (sk_c, pk_c) \leftarrow \mathrm{KeyGen}(pp), \; \mathrm{oracle} \leftarrow S_c(F), \; T_{S_c} \leftarrow \mathrm{Table}, \\ \{m_i^{Trust,j}\} \leftarrow \mathrm{Replicate}(N - T_{S_c}), \; F_i \leftarrow \mathrm{oracle}, \; P_{trust} \leftarrow \mathrm{Challenge}(N - T_{S_c}, F_i) \end{array} \right\}$$
Definition 2
(Authentic Replica). We introduce the authentic-replica property, which allows the prover P to commit to storing n distinct replicas of the data D, all of which actually occupy storage space, and to produce a proof that convinces the verifier.
We formalize two challenges: outsourcing attacks and generation attacks. An authentic-replica sampling mechanism design scheme that can overcome both attacks will distinguish itself from other integrity design schemes.
Given a multi-copy integrity auditing algorithm  P o S , an honest server provider SP, a malicious adversary  A , an honest verifier  V , and specific data D, assume that the adversary  A  has the following data storage capacity: 
  • The adversary A is only able to truthfully and physically store $n_1$ copies.
  • The adversary A is able to interact with an honest server provider S P where S P has unlimited storage space.
1. 
The honest verifier V generates n copies ($n > n_1$) by means of the PoS algorithm, which are handed over to the adversary A for storage. (Here, to ensure that these n copies are real, they are generated by an honest verifier.)
2. 
The honest verifier V generates a series of challenges $c_i$ and hands them over to the adversary A.
3. 
The adversary A generates a series of proofs $\pi_i$ for these challenges.
4. 
The honest verifier V verifies the series of proofs generated by the adversary A and gives the verification results.
5. 
If $\Pr[V.\mathrm{PosVerify}(\pi_i) = 1 \mid \pi_i = A.\mathrm{PosProve}(c_i)] > 1 - \xi$, then adversary A wins the game; otherwise, adversary A loses.
If no adversary can win the game with non-negligible probability, the scheme is characterized by the authentic-replica property.
Definition 3
(Extractability). If any probabilistic polynomial-time (PPT) adversary plays the aforementioned game and manages to output an $S_c$ that is $\epsilon$-admissible, then an extraction algorithm exists which, with overwhelming probability, can recover the challenged data blocks.
Definition 4
(Storage allocation). If any rational adversary that can successfully output, in the above game, a service provider that is $\epsilon$-admissible and satisfies authentic-replica($\eta$), specifically, one that allocates replica storage for only a small fraction $\eta$ of the source data, where $0 < \eta < 1$, is accepted by the verifier with only negligible probability, then we say that the protocol satisfies $(\eta, \epsilon)$-storage allocation.
Definition 5
(Detectability). If the probability of detecting the corruption of all stored data is no less than ψ when the corrupted fraction of the source data is ζ 0 and the corrupted fraction of each replica is ζ j , then we say that the protocol is ( ζ 0 , ζ 1 , , ζ t , ψ ) detectable.
Definition 6.
If the q-DABDHE problem is hard, then the Ref-Table transmission process is secure under our scheme.
Definition 7
(Outsourcing attack). Upon receiving the challenge c, adversary A quickly obtains the corresponding data D from another storage provider P * and produces a proof, thus giving the illusion that adversary A itself actually stores the data D.
Definition 8
(Generation attack). If adversary A is able to identify the data D that it needs to store, A can lock that data D, allowing A to regenerate D on demand. On receiving a challenge c, A generates data D on demand and then generates a proof from that newly generated data D, thus giving the illusion that adversary A itself actually stores the data D.

4. Our Proposed Scheme

In this section, we provide the concrete construction of the public authentic-replica sampling scheme and demonstrate its security through a security analysis.

4.1. Construction of the Public Authentic-Replica Sampling Scheme

The scheme is as follows:
  • $Setup(\lambda, k) \rightarrow pp$
    KGC generates an RSA modulus $N = q_0 \cdot q_1$, where $q_0 = 2q_0' + 1$ and $q_1 = 2q_1' + 1$ are safe primes whose factors $q_0', q_1'$ are large primes known only to C. Let $QR_N$ be the unique cyclic subgroup of $Z_N^*$ of order $q_0' q_1'$; obtain $u_0$ from $Z_N^*$, where $\gcd(u_0 \pm 1, N) = 1$, and obtain the generating element $u = u_0^2$ of the group. Choose multiplicative cyclic groups $G_1, G_T$ of prime order $p$, a generator $g$ of $G_1$, and a bilinear map $e: G_1 \times G_1 \rightarrow G_T$. Choose the following hash functions, two pseudorandom sequences, and a pseudorandom function; randomly select $\bar\alpha, \bar\beta_1, \bar\beta_2, \bar\beta_3 \in Z_p$ and compute $mpk = (g_1, g_2, g_3, g_4) = (g^{\bar\alpha}, g^{\bar\beta_1}, g^{\bar\beta_2}, g^{\bar\beta_3})$, $msk = (\bar\alpha, \bar\beta_1, \bar\beta_2, \bar\beta_3)$.
    $H_0: \{0,1\}^* \rightarrow Z_p$, $H_1: \{0,1\}^* \rightarrow QR_N$, $H_2: \{1,2,\ldots,n\} \rightarrow \{0,1\}^{\kappa}$, $H_3: \{0,1\}^* \rightarrow \mathrm{Prm}(2\lambda)$, $H_4: \{0,1\}^* \rightarrow \mathrm{Prm}(2\lambda)$, $f: Z_p \times \{1,2,\ldots,n\} \rightarrow Z_p$, $\varphi: Z_p \times Z_p \rightarrow Z_p$, $\pi: Z_p \times \{1,2,\ldots,n\} \rightarrow Z_p$
    Obtain public parameters p p , where p p = { N , u , G 1 , G T , g , { H i } , f , φ , π , m p k } , i [ 0 , 4 ] .
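    As a concrete illustration of the modulus and group chosen in Setup, the following toy sketch (our own, with tiny hard-coded safe primes; a real deployment would use safe primes of roughly 1536 bits) instantiates $N$, $QR_N$, and the generator $u$:

```python
from math import gcd

# Toy safe primes q0 = 2*q0' + 1, q1 = 2*q1' + 1 (illustration only)
q0p, q1p = 11, 23
q0, q1 = 2 * q0p + 1, 2 * q1p + 1          # 23 and 47, both prime
N = q0 * q1                                 # RSA modulus

# Pick u0 in Z_N^* with gcd(u0 - 1, N) = gcd(u0 + 1, N) = 1,
# then u = u0^2 generates the cyclic group QR_N of order q0' * q1'.
u0 = 2
assert gcd(u0, N) == 1 and gcd(u0 - 1, N) == 1 and gcd(u0 + 1, N) == 1
u = pow(u0, 2, N)

# Sanity check: the order of u divides q0' * q1'.
assert pow(u, q0p * q1p, N) == 1
print(N, u)  # 1081 4
```

The secret order $q_0' q_1'$ is what later lets the client shortcut the time-lock computation in Replicate, while servers only see $N$ and $u$.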
  • $KeyGen(pp, msk, mpk) \rightarrow ((sk_c, pk_c), \{sk_{S_l}, pk_{S_l}\})$
    C submits his identity $ID_c \in \{0,1\}^*$ to KGC; KGC takes C's identity, master key, and public parameters as input and randomly selects three random numbers $r_1, r_2, r_3 \in Z_p$ to compute C's partial private key $skp_c = \left(r_1, g^{\frac{\bar\beta_1 - r_1}{\bar\alpha - ID_c}}, r_2, g^{\frac{\bar\beta_2 - r_2}{\bar\alpha - ID_c}}, r_3, g^{\frac{\bar\beta_3 - r_3}{\bar\alpha - ID_c}}\right)$. Similarly, for each server $S_l$, its partial private key is calculated from its identity $ID_{S_l}$: $skp_{S_l} = \left(r_{1S_l}, g^{\frac{\bar\beta_1 - r_{1S_l}}{\bar\alpha - ID_{S_l}}}, r_{2S_l}, g^{\frac{\bar\beta_2 - r_{2S_l}}{\bar\alpha - ID_{S_l}}}, r_{3S_l}, g^{\frac{\bar\beta_3 - r_{3S_l}}{\bar\alpha - ID_{S_l}}}\right)$. C picks a large prime number $\alpha$ and calculates $\alpha' = \alpha^{-1} \bmod q_0' q_1'$; $sk_c = (sk_{c1} = \alpha', skp_c)$, $pk_c = (pk_{c1} = \alpha, ID_c)$. For each server $S_l$, choose $\beta_{S_l 1}, \beta_{S_l 2} \in Z_p$ as the private key $(sk_{S_l 1}, sk_{S_l 2})$ and compute the public key $(g^{\beta_{S_l 1}}, g^{\beta_{S_l 2}})$. $S_l$ has $sk_{S_l} = (sk_{S_l 1}, sk_{S_l 2}, skp_{S_l})$, $pk_{S_l} = (pk_{S_l 1}, pk_{S_l 2}, ID_{S_l})$.
  • $Store(F, name, sk_c, pk_c, pp) \rightarrow (\{\sigma_i\}, \Gamma, T)$
    $\gamma = H_4(sk_{c1} \| name), \quad \gamma' = \gamma^{-1} \bmod q_0' q_1', \quad \hat\gamma = \gamma' \cdot pk_{c1} \bmod q_0' q_1'$
    Calculate the file tag $\Gamma = (\gamma', \hat\gamma)$, and calculate the tag corresponding to data block $m_i$: $\sigma_i = (H_1(name \| i))^{\gamma} \cdot m_i^{sk_{c1}} \bmod N$. Given a Ref-Table $T$, $T$ denotes the delegated replica servers and their delegated replicas:
    $T = (S_1^{Rep}, S_2^{Rep}, S_3^{Rep}, \ldots, S_k^{Rep}), \quad \text{e.g.,} \quad T = (\{1, 2, 4\}, \{3, 5\})$
    for a total of $N$ replicas and $k$ servers, so that
    $\sum_{i=1}^{k} |S_i^{Rep}| = N$
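    The Ref-Table is simply a partition of the $N$ replica indices across the $k$ servers; a minimal sketch of the example table above, with the invariant $\sum_{i=1}^{k}|S_i^{Rep}| = N$ checked explicitly (the dictionary layout is our own choice):

```python
# Ref-Table T: server -> list of delegated replica indices, as in T = ({1,2,4},{3,5})
T = {"S1": [1, 2, 4], "S2": [3, 5]}
N = 5  # total number of replicas

# Invariant from Store: the delegations partition the N replicas exactly once
all_reps = sorted(r for reps in T.values() for r in reps)
assert all_reps == list(range(1, N + 1)), "Ref-Table must cover every replica exactly once"
print(sum(len(v) for v in T.values()))  # 5
```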
  • $Table(\{S_l\}, \{ID_{S_l}\}, c, ID_c, pp, T) \rightarrow \{0,1\}$. For each server $S_l$, $l \in [1, k]$, the number of replicas delegated to it, as well as the original encoded file, must be stored according to $T$. Using IBE, client C ensures that the Ref-Table is known to the individual service providers.
    $C.Enr(T_{S_l}, ID_{S_l}, s, pp) \rightarrow CT_{S_l}$. Client C chooses a random number $s \in Z_p$ and a hash function $H_{sig}: \{0,1\}^* \rightarrow Z_p$. Calculate $CT_{S_l} = (C_1, C_2, C_3, C_4)$, where $(C_1, C_2, C_3) = \left((g_1 g^{-ID_{S_l}})^s, \; e(g,g)^s, \; e(g_4, g)^s \cdot T_{S_l}\right)$ and $C_4 = e(g_2, g)^s \cdot e(g_3, g)^{s w}$, where $w = H_{sig}(C_1, C_2, C_3)$.
    $S.Dec(CT_{S_l}, skp_{S_l}, pp) \rightarrow T_{S_l}$. When the server receives the ciphertext of the Ref-Table, it decrypts it using part of its own partial private key and obtains the portion of the Ref-Table it is required to store.
    $T_{S_l} = \dfrac{C_3}{e\left(C_1, g^{\frac{\bar\beta_3 - r_{3S_l}}{\bar\alpha - ID_{S_l}}}\right) \cdot C_2^{r_{3S_l}}} = \dfrac{e(g_4, g)^s \cdot T_{S_l}}{e\left(g^{(\bar\alpha - ID_{S_l})s}, g^{\frac{\bar\beta_3 - r_{3S_l}}{\bar\alpha - ID_{S_l}}}\right) \cdot e(g, g)^{s \cdot r_{3S_l}}}$
    The server then needs to convince the client that it knows which replicas it is required to store. It likewise encrypts its result for transmission back:
    $S.Enr(T_{S_l}, ID_c, s, pp) \rightarrow CT_{S_l}$;
    $C.Dec(CT_{S_l}, skp_c, pp) \rightarrow T_{S_l}$.
    After both the client and the server have confirmed completion, if the Ref-Table was received correctly, output 1 and publish the Ref-Table $T$; otherwise, output 0.
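    The confirmation round-trip in Table can be pictured as follows. The paper encrypts with IBE; in this sketch a hash-derived one-time pad stands in for Enr/Dec, so only the message flow (client to server, server back to client, then publish) is illustrated, and all names are ours:

```python
import hashlib

def otp(key: bytes, msg: bytes) -> bytes:
    # Hash-derived one-time pad; XOR twice with the same key recovers the message
    pad = hashlib.sha256(key).digest()[: len(msg)]
    return bytes(a ^ b for a, b in zip(pad, msg))

k_cs = b"shared-toy-key"            # stands in for the IBE key material
T_Sl = b"{1,2,4}"                   # server S_l's slice of the Ref-Table

ct = otp(k_cs, T_Sl)                # C.Enr: client encrypts T_Sl for S_l
server_view = otp(k_cs, ct)         # S.Dec: server recovers its table
ct_back = otp(k_cs, server_view)    # S.Enr: server re-encrypts its result
client_check = otp(k_cs, ct_back)   # C.Dec: client verifies the echo

# Output 1 (publish T) only if the round-trip matches
assert client_check == T_Sl
print(1)
```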
  • $Replicate(F, N, \xi, sk_{S_l}, pp, T, S_l, pk_c) \rightarrow \{m_i^{(S_l,j)}\}$
    For each replica $j \in [1, |S_l^{Rep}|]$, obtain the random seed $\xi$ and compute
    $r_{S_l,j} = f_\xi(H_0(name \| j \| N_{S_l})), \quad \dot r_{S_l,j} = (sk_{S_l 1} + r_{S_l,j}) \cdot H_0(name \| r_{S_l,j}) - sk_{S_l 2}, \quad R_{S_l,j} = g^{r_{S_l,j}}$
    Then, each data block m i is encrypted to generate a replica m i S l , j based on an RSA time-lock puzzle.
    $\mu_{S_l,i,j} = \mu_{S_l,i-1,j}^{\prod_{\theta \in \vartheta_i} H_2(\theta \| R_{S_l,j})}, \quad \mu_{S_l,0,j} = \mu, \quad m_i^{(S_l,j)} = m_i \cdot \mu_{S_l,i,j}$
    where
    $\vartheta_i = \begin{cases} [i, i+\delta] & i \in [1, n-\delta] \\ [i, n] & i \in [n-\delta+1, n] \end{cases}$
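    The core of Replicate is that the masks $\mu_{S_l,i,j}$ form a chain of exponentiations: without the factorization of $N$, they must be computed sequentially (this is the time-lock), while the client, knowing the group order $q_0' q_1'$, can collapse the chain into a single exponentiation. A simplified sketch of this asymmetry, with one $H_2$ factor per step and the toy parameters from the Setup sketch (all names ours):

```python
import hashlib

q0, q1 = 23, 47                  # toy safe primes; real ones are ~1536 bits
N = q0 * q1
order = 11 * 23                  # |QR_N| = q0' * q1', the client's trapdoor
mu0 = 4                          # the generator u of QR_N from Setup

def h2(data: bytes) -> int:
    # Stand-in for H2: any function into the integers works for this sketch
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def mask_sequential(i: int, tag: bytes) -> int:
    """Server path: i chained exponentiations, inherently sequential."""
    mu = mu0
    for k in range(1, i + 1):
        mu = pow(mu, h2(tag + k.to_bytes(4, "big")), N)
    return mu

def mask_trapdoor(i: int, tag: bytes) -> int:
    """Client path: collapse the chain to one exponentiation mod the order."""
    e = 1
    for k in range(1, i + 1):
        e = e * h2(tag + k.to_bytes(4, "big")) % order
    return pow(mu0, e, N)

tag = b"R_Sl_j"                  # stands in for R_{S_l,j}
assert mask_sequential(64, tag) == mask_trapdoor(64, tag)
```

The gap between the two paths is what Theorem 3 later exploits: the enforced sequential cost $T_{cp}(\delta)$ is tuned to exceed the time an adversary has to regenerate missing replica blocks.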
  • $Challenge(Chal, m_i, \sigma_i, \{m_i^{(S_l,j)}\}, pp) \rightarrow P$, executed per server as $Challenge(Chal_{S_l}, m_i, \sigma_i, \{m_i^{(S_l,j)}\}, pp) \rightarrow P_{S_l}$
    TPA randomly selects two values $r_{s1}, r_{s2} \in Z_q$, where $r_{s1}$ is the random seed of $\pi$ and $r_{s2}$ is the random seed of $\phi$. Together with the $c$ challenge block numbers to be checked, the TPA sends $Chal = (\{c, c_p\}, r_{s1}, r_{s2})$; C searches the Ref-Table $T$ and sends the challenge $Chal_{S_l}$ to the corresponding $S_l$.
    Upon receiving C h a l S l , the server S l responds to the query block m i and its corresponding tag σ i as well as the copy m i S l , j as follows:
    • $S_l$ generates its own challenge set: after accepting the challenge $Chal_{S_l}$, it generates $C_{S_l} = (C_1, C_{S_l 2}) = (\{(i_{\dot\beta}, v_{\dot\beta})\}, \{j_\varepsilon\})$, where $i_{\dot\beta} = \pi(r_{s1}, \dot\beta)$, $v_{\dot\beta} = \phi(r_{s2}, \dot\beta)$, $j_\varepsilon = \pi(r_{s1}, \varepsilon)$, $\dot\beta \in [1, c]$, $\varepsilon \in [1, c_p] \cap S_l^{Rep}$, and $C = (C_1, C_2 = \bigcup_{l=1}^{k} C_{S_l 2})$.
    • $S_l$ computes the data proof $\tilde M_{S_l}$, the tag proof $\tilde\sigma_{S_l}$, and the replica proof $\tilde R_{S_l}$, where $j_\varepsilon \in [1, c_p] \cap S_l^{Rep}$:
      $\tilde M_{S_l} = \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} m_{i_{\dot\beta}}^{v_{\dot\beta}}, \quad \tilde\sigma_{S_l} = \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \sigma_{i_{\dot\beta}}^{v_{\dot\beta}}, \quad \tilde R_{S_l} = \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon} \left(m_{i_{\dot\beta}}^{(S_l, j_\varepsilon)}\right)^{v_{\dot\beta}}$
    • $S_l$ honestly generates the proof $U_{S_l}$ that its replicas were correctly generated:
      $U_{S_l} = \dfrac{\tilde R_{S_l}}{\tilde M_{S_l}^{|[1, c_p] \cap S_l^{Rep}|}}, \quad \tilde v_{S_l} = \sum_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} v_{\dot\beta} \sum_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \prod_{\dot k \in [1, i_{\dot\beta}]} \prod_{\theta \in \vartheta_{\dot k}} H_2(\theta \| R_{S_l, j_\varepsilon})$
    $Aggregate(\{P_{S_l}\}, T, pp) \rightarrow P_{agg}$
    • The TPA verifies all the $\tilde\sigma_{S_l}$, and if successful, it aggregates all the proofs from the $S_l$. Since all the $\tilde\sigma_{S_l}$ are computed over the same challenged data blocks, they should all have the same value:
      $\tilde\sigma = \tilde\sigma_{S_l} \; (l \in [1, k]), \quad U = \prod_{l=1}^{k} U_{S_l}, \quad \tilde v = \sum_{l=1}^{k} \tilde v_{S_l}, \quad \tau = H_3(\{v_i\}_{i \in c}, u), \quad \omega = \lfloor \tilde v / \tau \rfloor, \quad \Omega = u^{\omega}, \quad \tilde M = \prod_{l=1}^{k} \tilde M_{S_l}, \quad \tilde R = \prod_{l=1}^{k} \tilde R_{S_l}$
    • Hand the aggregated proof $P = (\tilde M, \tilde\sigma, \tilde R, U, \Omega)$ to V, and let V verify the integrity of the file $F$ and the replicas.
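    Because the challenge set is derived deterministically from the two seeds, any party holding $(r_{s1}, r_{s2})$ can regenerate it, which is what makes the sampling publicly verifiable. A hedged sketch of this derivation, instantiating the PRFs $\pi$ and $\phi$ with HMAC-SHA256 (our assumption; the scheme only requires pseudorandom functions):

```python
import hmac, hashlib

n = 1000                          # number of data blocks
p = 2**127 - 1                    # toy prime standing in for the field order
c = 5                             # number of challenged blocks

def prf(seed: bytes, beta: int, mod: int) -> int:
    # HMAC-SHA256 as a PRF into [0, mod)
    mac = hmac.new(seed, beta.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % mod

rs1, rs2 = b"seed-for-pi", b"seed-for-phi"
# C_1 = {(i_beta, v_beta)}: challenged indices and their random coefficients
C1 = [(prf(rs1, b, n) + 1, prf(rs2, b, p)) for b in range(1, c + 1)]

# Any third party holding (rs1, rs2) re-derives exactly the same set
assert C1 == [(prf(rs1, b, n) + 1, prf(rs2, b, p)) for b in range(1, c + 1)]
print(len(C1))  # 5
```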
  • $Verify(Chal, P, pk_c, pk_{S_l}, pp) \rightarrow \{0,1\}$
    • The verifier recovers, for each replica, the corresponding $R_{S_l,j}$:
      $R_{S_l,j} = pk_{S_l 1}^{-1} \cdot \left(pk_{S_l 2} \cdot g^{\dot r_{S_l,j}}\right)^{1 / H_0(name \| r_{S_l,j})}$
    • Generate the necessary parameters:
      $\tilde v = \sum_{l=1}^{k} \sum_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} v_{\dot\beta} \sum_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \prod_{\dot k \in [1, i_{\dot\beta}]} \prod_{\theta \in \vartheta_{\dot k}} H_2(\theta \| R_{S_l, j_\varepsilon}), \quad \tau = H_3\left(\{v_{\dot\beta}\}_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1}, u\right), \quad w = \tilde v \bmod \tau$
    • Verify:
      $U = \Omega^{\tau} \cdot u^{w}, \quad \tilde R = \tilde M \cdot U, \quad \tilde\sigma^{\hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} H_1(name \| i_{\dot\beta})^{v_{\dot\beta}}\right)^{pk_{c1}} \cdot \tilde M^{\gamma'}$

4.2. Security Analysis

Theorem 1.
The proof can pass final validation if all participants honestly follow the specified process.
Proof. 
Proof 1: U = Ω τ · u w
$U = \prod_{l=1}^{k} U_{S_l} = \prod_{l=1}^{k} \dfrac{\tilde R_{S_l}}{\tilde M_{S_l}^{|[1, c_p] \cap S_l^{Rep}|}} = \prod_{l=1}^{k} \dfrac{\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \left(m_{i_{\dot\beta}}^{(S_l, j_\varepsilon)}\right)^{v_{\dot\beta}}}{\left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} m_{i_{\dot\beta}}^{v_{\dot\beta}}\right)^{|[1, c_p] \cap S_l^{Rep}|}} = \prod_{l=1}^{k} \dfrac{\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \left(m_{i_{\dot\beta}} \cdot \mu_{S_l, i_{\dot\beta}, j_\varepsilon}\right)^{v_{\dot\beta}}}{\left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} m_{i_{\dot\beta}}^{v_{\dot\beta}}\right)^{|[1, c_p] \cap S_l^{Rep}|}} = \prod_{l=1}^{k} \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \mu_{S_l, i_{\dot\beta}, j_\varepsilon}^{v_{\dot\beta}} = \mu^{\tilde v} = \Omega^{\tau} \cdot u^{w}$
Proof 2: R ˜ = M ˜ · U
$\tilde R = \prod_{l=1}^{k} \tilde R_{S_l} = \prod_{l=1}^{k} \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \left(m_{i_{\dot\beta}}^{(S_l, j_\varepsilon)}\right)^{v_{\dot\beta}} = \prod_{l=1}^{k} \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} \left(m_{i_{\dot\beta}} \cdot \mu_{S_l, i_{\dot\beta}, j_\varepsilon}\right)^{v_{\dot\beta}} = \tilde M \cdot \prod_{l=1}^{k} \mu^{\tilde v_{S_l}} = \tilde M \cdot \mu^{\tilde v} = \tilde M \cdot U$
Proof 3: $\tilde\sigma^{\hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} H_1(name \| i_{\dot\beta})^{v_{\dot\beta}}\right)^{pk_{c1}} \cdot \tilde M^{\gamma'}$
$\tilde\sigma^{\hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \sigma_{i_{\dot\beta}}^{v_{\dot\beta}}\right)^{\hat\gamma} = \prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} \left(\left(H_1(name \| i_{\dot\beta})\right)^{\gamma} \cdot m_{i_{\dot\beta}}^{sk_{c1}}\right)^{v_{\dot\beta} \hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} H_1(name \| i_{\dot\beta})^{v_{\dot\beta}}\right)^{pk_{c1}} \cdot \tilde M^{\gamma'}$
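The identity in Proof 3 can also be checked numerically. The following toy example (tiny safe primes, fixed squares standing in for the hash values $H_1(name\|i)$, all concrete values our own) reproduces $\tilde\sigma^{\hat\gamma} = (\prod H_1(name\|i_{\dot\beta})^{v_{\dot\beta}})^{pk_{c1}} \cdot \tilde M^{\gamma'}$:

```python
q0, q1 = 23, 47                              # toy safe primes (23 = 2*11+1, 47 = 2*23+1)
N = q0 * q1
order = 11 * 23                              # |QR_N| = q0' * q1'

alpha = 17                                   # public pk_c1 (a prime coprime to the order)
alpha_inv = pow(alpha, -1, order)            # secret sk_c1 = alpha^{-1} mod q0'q1'
gamma = 19                                   # stands in for H4(sk_c1 || name)
gamma_p = pow(gamma, -1, order)              # gamma'
gamma_hat = gamma_p * alpha % order          # gamma_hat = gamma' * pk_c1

m = [pow(3, 2, N), pow(5, 2, N), pow(7, 2, N)]       # data blocks, forced into QR_N
h = [pow(11, 2, N), pow(13, 2, N), pow(29, 2, N)]    # stand-ins for H1(name||i)
v = [2, 9, 5]                                        # challenge coefficients

# sigma_i = H1(name||i)^gamma * m_i^{sk_c1} mod N
sigma = [pow(h[i], gamma, N) * pow(m[i], alpha_inv, N) % N for i in range(3)]

sig_agg = M_agg = H_agg = 1
for i in range(3):
    sig_agg = sig_agg * pow(sigma[i], v[i], N) % N
    M_agg = M_agg * pow(m[i], v[i], N) % N
    H_agg = H_agg * pow(h[i], v[i], N) % N

# Proof 3 identity: sigma_agg^gamma_hat == (prod H1^v)^{pk_c1} * M_agg^{gamma'}
assert pow(sig_agg, gamma_hat, N) == pow(H_agg, alpha, N) * pow(M_agg, gamma_p, N) % N
```

The check works because every element lies in $QR_N$, whose order $q_0' q_1'$ divides out the secret exponents exactly as in the derivation above.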
Theorem 2.
The protocol ensures extractability under the RSA assumption in the random oracle model.
Proof. 
First, we assume the existence of an adversary A capable of breaking the extractability of the protocol. We then construct a verifier B that interacts with adversary A , leveraging the assumption that A can indeed compromise extractability.
  • Preliminary stage: Specify $y \in Z_N^*$ and the large prime $\alpha$ to generate the RSA instance $(N, \alpha, y)$.
  • Setup stage: B calculates $a = y^2 \bmod N$ and sends $(N, \alpha)$ to the adversary A. Meanwhile, B itself takes $\gamma = H_4(\alpha' \| name)$, with $\alpha' = \alpha^{-1} \bmod q_0' q_1'$ and $\gamma' = \gamma^{-1} \bmod q_0' q_1'$, as the key.
  • Query phase: Adversary A adaptively queries B in this phase with the only restriction that A can never query individual blocks with the same index i. B honestly responds as follows:
    • When B receives a query for $\sigma_i$, if there is no record for $\sigma_i$ in B's record table, it generates $\theta_i \in_R QR_N$ as $\sigma_i$ and returns it to A. It also records $(m_i, i, \theta_i, \sigma_i)$ in the record table.
    • When B receives a query for σ i , it returns σ i directly if there is already a record for σ i in B’s record table.
    • When B receives a query for $H_1(name \| i)$, if there is no record for it in B's record table, B generates $H_1(name \| i) = \left(\theta_i / m_i^{\alpha'}\right)^{\gamma'}$ and returns it to A. It also records $(name, i, H_1(name \| i))$ in the record table.
    • When B receives a query for H 1 ( n a m e | | i ) , return it to A if there is already a record for H 1 ( n a m e | | i ) in B’s record table.
  • Challenge Phase: B generates a challenge set for the data blocks and data replicas to be challenged: $C_A = (C_1, C_{A2}) = (\{(i_{\dot\beta}, v_{\dot\beta})\}, \{j_\varepsilon\})$, where $i_{\dot\beta} = \pi(r_{s1}, \dot\beta)$, $v_{\dot\beta} = \phi(r_{s2}, \dot\beta)$, $j_\varepsilon = \pi(r_{s1}, \varepsilon)$, $\dot\beta \in [1, c]$, and $\varepsilon \in [1, c_p] \cap A^{Rep}$. B gives A the challenge set $C_A$.
  • Forge phase: Based on the challenge given by B, A generates its own proof P A * = ( M ˜ A * , σ ˜ A * , R ˜ A * , U A * ) . At the same time, we have other honest provers for generating proofs of their own challenged data blocks and data replicas: { P o r } o t h e r = { ( M ˜ o r , σ ˜ o r , R ˜ o r , U ˜ o r ) } . Also, record a special honest prover that is challenged in the same way as adversary A . We record the proof generated by this special honest prover as P u n i = ( M ˜ u n i , σ ˜ u n i , R ˜ u n i , U ˜ u n i ) . We determine, by assumption, that P A * + { P o r } o t h e r is able to pass the verification after undergoing aggregation. At the same time, based on the correctness of the protocol, we are able to obtain that the final proof generated as a result of the aggregation by { P o r } o t h e r + P u n i must also be able to pass validation given that we have the assumption that the adversary A is able to similarly pass validation. Then, comparing the adversary A and the special honesty prover u n i , both of the following should hold:
    $\tilde\sigma_{uni}^{\hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} H_1(name \| i_{\dot\beta})^{v_{\dot\beta}}\right)^{\alpha} \cdot \tilde M_{uni}^{\gamma'}$
    $\tilde\sigma_{A^*}^{\hat\gamma} = \left(\prod_{(i_{\dot\beta}, v_{\dot\beta}) \in C_1} H_1(name \| i_{\dot\beta})^{v_{\dot\beta}}\right)^{\alpha} \cdot \tilde M_{A^*}^{\gamma'}$
    So there we have it:
    $\left(\dfrac{\tilde\sigma_{uni}}{\tilde\sigma_{A^*}}\right)^{\hat\gamma} = \left(\dfrac{\tilde M_{uni}}{\tilde M_{A^*}}\right)^{\gamma'}$
    simplifying gives the following:
    $\left(\dfrac{\tilde\sigma_{uni}}{\tilde\sigma_{A^*}}\right)^{\alpha} = \dfrac{\tilde M_{uni}}{\tilde M_{A^*}}$
    Let $x = \tilde\sigma_{uni} / \tilde\sigma_{A^*}$ and $a^{\Phi(m,v)} = \tilde M_{uni} / \tilde M_{A^*}$; since $a = y^2$, we have $x^{\alpha} = y^{2\Phi(m,v)}$. Since $\alpha$ is a large prime, $\gcd(\alpha, 2 \cdot \Phi(m,v)) = 1$. Therefore, by the extended Euclidean algorithm we obtain $b_0, b_1$ such that $b_0 \cdot \alpha + b_1 \cdot 2\Phi(m,v) = \gcd(\alpha, 2 \cdot \Phi(m,v)) = 1$, and B generates the solution to the RSA instance:
    $y^{1/\alpha} = y^{b_0} \cdot x^{b_1}$
Second, we assume that the adversary A is $\epsilon$-admissible, i.e., adversary A can pass an $\epsilon$ fraction of the challenges. We set the ratio of recovered files to $\rho$ and compute $\omega = 1/2^{\lambda} + (\rho n)^{|C_1|} / (n - |C_1| + 1)^{|C_1|}$. Since $\epsilon - \omega$ is non-negligible, A is able to perform a number of interactions before recovering a $\rho$ fraction of the encoded file.
Eventually, the entire file can be recovered from a number of encoded file portions, each recovered at the $\rho$ ratio. □
Theorem 3.
If the replica generation algorithm based on time-locked puzzles is time-consuming for the prover, the proposed protocol guarantees storage allocation, which is an authentic-replica feature.
Proof. 
Through the $Replicate$ algorithm, we find that for each replica, the generation time depends on when the computation of $\mu_{S_l,i,j}$ finishes, and this time is determined by $\vartheta_i$, which is simplified according to (9); $\mu_{S_l,i,j}$ is governed by the tunable parameter $\delta$. Therefore, we define the replica generation time as $T_{cp}(\delta)$. At the same time, we assume that each adversary $A_i$ does not keep all the replicas, but keeps only a $\varrho$ fraction of them. Thus, the only data actually stored by adversary $A_i$ are the original encoded data and a $\varrho$ fraction of each replica it should store, that is, $|F|(1 + \varrho \cdot |S_A|)$; upon receiving a challenge on $|C_1|$ data blocks, adversary $A_i$ needs to compute the missing portion, i.e., a $(1 - \varrho)$ fraction of each replica. We therefore denote by $T_A^{min}$ the time adversary $A_i$ needs to compute the missing $(1 - \varrho)|C_1|$ replica blocks in its storage. To defeat this, we only need to set the enforced replica generation time to exceed the time in which the adversary can compute the missing blocks, i.e., $T_{cp}(\delta) > T_A^{min}$. When adversary $A_i$ tries to recover a missing block, the time for it to regenerate the replica block exceeds $T_A^{min}$ by the constraints of the algorithm. In such a case, adversary $A_i$ cannot obtain the replica between receiving the challenge and producing the replica proof, and an honest verifier clearly notices that adversary $A_i$ does not store a full replica. Moreover, in distributed environments, it is common to set $T_{cp}(\delta)$ to differ significantly from $T_A^{min}$, so that the verifier can clearly distinguish an honest prover from the adversary $A_i$; this resists both outsourcing attacks and generation attacks.
We are given the premise that the time for adversary $A_i$ to recover the $(1 - \varrho)$ fraction of a replica upon receiving the challenge is at least $T_A^{min}$; equivalently, within $T_A^{min}$ time, adversary $A_i$ can recover at most a $(1 - \varrho)$ fraction. If the fraction of blocks to be recovered is less than $(1 - \varrho)$, then adversary $A_i$ may complete the challenge without being suspected by the verifier. We therefore generalize and bound the probability that the adversary defeats the scheme in the worst case.
Given all challenged data blocks $C_1$, the adversary $A_i$ still keeps only a $\varrho$ fraction of the replicas, and $P_r$ denotes the probability that, out of all $|C_1|$ challenge-selected blocks, enough of the selected challenge blocks fall in replica blocks actually kept by the adversary $A_i$ for the challenge to be answered in time:
$$P_r = \frac{C_{\varrho n}^{|C_1|} + C_{\varrho n}^{|C_1| - 1} + \cdots + C_{\varrho n}^{\varrho |C_1| + 1}}{C_n^{|C_1|}}$$
We have $|C_1| \ll n$ for the number of challenged blocks. For ease of computation, we make the reasonable assumption that $|C_1| < \varrho n / 2$. Thus, for $a > b$, we have $C_{\varrho n}^{a} > C_{\varrho n}^{b}$.
$$P_r = \frac{C_{\varrho n}^{|C_1|} + C_{\varrho n}^{|C_1| - 1} + \cdots + C_{\varrho n}^{\varrho |C_1| + 1}}{C_n^{|C_1|}} \le (1 - \varrho)|C_1| \cdot \frac{C_{\varrho n}^{|C_1|}}{C_n^{|C_1|}} = (1 - \varrho)|C_1| \cdot \frac{\varrho n (\varrho n - 1) \cdots (\varrho n - |C_1| + 1)}{n (n - 1) \cdots (n - |C_1| + 1)} = (1 - \varrho)|C_1| \cdot \frac{\varrho n}{n} \cdot \frac{\varrho n - 1}{n - 1} \cdots \frac{\varrho n - |C_1| + 1}{n - |C_1| + 1} \le (1 - \varrho)|C_1| \cdot \varrho^{|C_1|}$$
Each $S_l$ stores all the source data blocks, so the probability that any single $S_l$ alone can pass the sampling scheme while cheating is at most $(1 - \varrho)|C_1| \cdot \varrho^{|C_1|}$. Note that the $S_l$ are independent of each other, so we conclude that the probability that all $S_l$ are breached is at most $\left((1 - \varrho)|C_1| \cdot \varrho^{|C_1|}\right)^{k}$. □
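The bound at the end of Theorem 3 can be sanity-checked numerically: the exact probability that every challenged block lands in the retained fraction is hypergeometric and is dominated by $\varrho^{|C_1|}$, hence by the looser bound $(1 - \varrho)|C_1|\varrho^{|C_1|}$ whenever $(1 - \varrho)|C_1| \ge 1$ (parameter values are our own):

```python
from math import comb

# Toy parameters: n blocks, adversary keeps a rho fraction, |C1| challenged blocks
n, rho, c = 10000, 0.5, 100
kept = int(rho * n)

# Exact probability that all |C1| challenged blocks fall in the kept fraction
p_escape = comb(kept, c) / comb(n, c)

# Dominated by rho**|C1|, hence by Theorem 3's bound when (1 - rho) * |C1| >= 1
assert p_escape <= rho ** c <= (1 - rho) * c * rho ** c

# With k independent servers, all escaping simultaneously is exponentially unlikely
k = 5
assert p_escape ** k < rho ** (c * k)
```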
Theorem 4.
The proposed protocol is $(\zeta_0, \zeta_1, \ldots, \zeta_N, \psi)$-detectable. Specifically, each $S_l$ is $(\zeta_0, \{\zeta_{j_\varepsilon}\}, \psi_{S_l})$-detectable, where $\psi_{S_l} = 1 - \prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} (1 - \zeta_{j_\varepsilon})^{|C_1|}$, and $(\zeta_0, \zeta_1, \ldots, \zeta_N)$ are the corruption rates of the source file and replicas.
Proof. 
Consider a data file with corruption rate $\zeta_0$ and $N$ replicas, each with corruption rate $\zeta_i$, $i \in [1, N]$; the corruption detection probability is not less than $1 - \prod_{j_\varepsilon \in [0, c_p]} (1 - \zeta_{j_\varepsilon})^{|C_1|}$.
A challenged data block or replica block is deemed complete only if it is selected from the intact portion, which happens with probability $1 - \zeta_{j_\varepsilon}$, where $j_\varepsilon \in [1, c_p] \cap S_l^{Rep}$; then, $(1 - \zeta_{j_\varepsilon})^{|C_1|}$ represents the probability that all challenged blocks are deemed well preserved, applying to both the data file and its replicas. In this scenario, corruption of both the data and the replicas remains undetected. In addition, a challenged replica is specified in each $S_l$, so the probability of non-detection is $\prod_{j_\varepsilon \in [1, c_p] \cap S_l^{Rep}} (1 - \zeta_{j_\varepsilon})^{|C_1|}$. Therefore, across all challenged replicas in the protocol, the probability of non-detection is $\prod_{j_\varepsilon \in [0, c_p]} (1 - \zeta_{j_\varepsilon})^{|C_1|}$, and for this reason, our protocol is detectable with probability at least $1 - \prod_{j_\varepsilon \in [0, c_p]} (1 - \zeta_{j_\varepsilon})^{|C_1|}$. □
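As a worked instance of the detectability bound (the corruption rates and challenge size below are our own example values):

```python
from math import prod

# Example corruption rates for the source file (j = 0) and two replicas
zetas = [0.01, 0.02, 0.05]
c1 = 100                          # challenged blocks per file/replica

# psi = 1 - prod_j (1 - zeta_j)^{|C1|}: probability at least one bad block is hit
psi = 1 - prod((1 - z) ** c1 for z in zetas)
assert psi > 0.999                # even 1-5% corruption is caught w.h.p.
```

This illustrates why a modest constant number of challenged blocks suffices in practice: the escape probability decays exponentially in $|C_1|$.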
Theorem 5.
If the q-DABDHE problem is hard, then the process by which the Ref-Table $T$ is transmitted between C and each $S_l$ in our scheme is secure.
Proof. 
Let us assume that there is an adversary A capable of attacking the scheme. We are given a problem instance $(g_0, g_0^{a^{q+2}}, g, g^a, g^{a^2}, \ldots, g^{a^q}, Z)$ on a bilinear pairing group PG and an honest entity B:
  • Setup: B chooses three random polynomials of degree $q$, $F_1(x), F_2(x), F_3(x)$, from $Z_p[x]$. Then, it computes
    $g_1 = g^a, \quad h_1 = g^{F_1(a)}, \quad h_2 = g^{F_2(a)}, \quad h_3 = g^{F_3(a)}$
    Then, there is a private–public key pair $(\alpha = a, g_1), (\dot\beta_1 = F_1(a), h_1), (\dot\beta_2 = F_2(a), h_2), (\dot\beta_3 = F_3(a), h_3)$.
  • Query 1: Adversary A asks for $ID$'s private key; B computes $d_{ID} = \left(F_1(ID), g^{\frac{F_1(a) - F_1(ID)}{a - ID}}, F_2(ID), g^{\frac{F_2(a) - F_2(ID)}{a - ID}}, F_3(ID), g^{\frac{F_3(a) - F_3(ID)}{a - ID}}\right)$ and gives $d_{ID}$ to adversary A. Adversary A asks for the decryption result of $CT$, and B computes the decryption result and gives it to adversary A.
  • Challenge: The adversary A outputs two equal-length messages $m_0, m_1 \in G_T$ and the challenge identity $ID^* \in \{0,1\}^n$. Let $d_{ID^*} = (d_1^*, d_2^*, d_3^*, d_4^*, d_5^*, d_6^*)$ be a private key corresponding to $ID^*$. B randomly selects $c \in \{0,1\}$ and computes the ciphertext $CT^* = (C_1^*, C_2^*, C_3^*, C_4^*)$, where $C_1^* = g_0^{a^{q+2} - (ID^*)^{q+2}}$, $C_2^* = Z \cdot e\left(g_0, \prod_{i=0}^{q} g^{f_i a^i}\right)$, $C_3^* = e(C_1^*, d_6^*) \cdot (C_2^*)^{d_5^*} \cdot m_c$, and $C_4^* = e(C_1^*, d_2^* (d_4^*)^{w^*}) \cdot (C_2^*)^{d_1^* + d_3^* w^*}$, where $w^* = H(C_1^*, C_2^*, C_3^*)$ and $f_i$ represents the coefficient of $x^i$ in the polynomial $\frac{x^{q+2} - (ID^*)^{q+2}}{x - ID^*}$.
  • Query 2: Adversary A asks for the private key of $ID \ne ID^*$ and for the decryption result of a ciphertext $CT \ne CT^*$; B computes $d_{ID} = \left(F_1(ID), g^{\frac{F_1(a) - F_1(ID)}{a - ID}}, F_2(ID), g^{\frac{F_2(a) - F_2(ID)}{a - ID}}, F_3(ID), g^{\frac{F_3(a) - F_3(ID)}{a - ID}}\right)$ and the corresponding decryption result.
    We let $z$ be a random non-zero integer and $Z = e(g_0, g)^{a^{q+1}} \cdot e(g, g)^{z}$. The challenge ciphertext components are $C_1^* = g^{s(a - ID^*)}$, $C_2^* = e(g,g)^{s+z}$, and $C_3^* = e(g,g)^{z d_5^*} \cdot e(h_3, g)^{s} \cdot m_c$.
By randomness, adversary A can only obtain $d_5^*$ from a decryption query on $(ID^*, CT)$. B can take $CT = (C_1, C_2, C_3, C_4)$ with $C_1 = (g_1 g^{-ID^*})^{s'}$, $C_2 = e(g,g)^{s}$, and $w = H(C_1, C_2, C_3)$. If $s' = s$, then B can output $\frac{C_3}{e(C_1, d_6^*) \cdot C_2^{d_5^*}} = \frac{C_3}{e(h_3, g)^{s}}$, and the adversary cannot obtain $d_5^*$ from it. If $s' \ne s$, then the ciphertext is malformed and B can reject it outright.
Since $a, F_1(x), F_2(x), F_3(x), \log_g g_0$ are all random, the encryption scheme is randomized. Then,
  • If $(C_1, C_2, C_3) = (C_1^*, C_2^*, C_3^*)$ and $H(C_1, C_2, C_3) = w = w^*$, then for the incorrect ciphertext to be verified, it would need $C_4 = C_4^*$. However, that ciphertext is the challenge ciphertext and is not queried.
  • Else, $H(C_1, C_2, C_3) = w \ne w^*$. For the ciphertext to pass, the adversary needs to compute $C_4$. But in $C_4 = e(C_1, d_2^* (d_4^*)^{w}) \cdot C_2^{d_1^* + d_3^* w}$, since $d_1^* + d_3^* w$ and $d_1^* + d_3^* w^*$ are random and independent, the adversary has no advantage in generating a verifiable $C_4$. The probability of the first adaptive choice of $C_4$ succeeding is $1/p$ and of the second is $1/(p-1)$; thus, $q_d$ decryption queries succeed with probability at most $q_d / (p - q_d)$. In addition, the probability of guessing $c$ correctly is $1/2$. Thus, the probability that the adversary succeeds is at most $1/2 + q_d / (p - q_d)$. □

5. Performance of Our Scheme

To evaluate performance, we compare the property analysis and experimental results of the proposed protocol with those of the schemes by Armknecht et al. [22], Guo et al. [39], and Shen et al. [23]. All three protocols support proof of data replication.

5.1. Property Analysis

The property analysis is shown in Table 2. Armknecht et al. [22] and Guo et al. [39] require a large amount of client-side precomputation of private replica parameters to generate replicas, owing to their use of secret and public LFSRs, which means these two protocols only support private verification. Shen et al. [23] address these two problems with a user-friendly scheme built on a new, publicly verifiable replica structure. However, the replica structure in Shen et al. [23] does not meet the requirements of a distributed environment: it cannot provide public verifiability in a distributed setting, lacks the authentic-replica property, and cannot withstand disaster scenarios. Therefore, we propose a new public authentic-replica sampling mechanism for distributed storage environments, which keeps the preparation work for replica generation low. It also ensures that, for the same number of replicas, each server can generate its replicas faster, and that the verification cost is independent of the number of replicas. At the same time, it resists outsourcing attacks and generation attacks, and it achieves public verifiability in distributed storage environments.

5.2. Communication Overhead

The communication overhead of the different schemes is shown in Table 3, where $\lambda^*$ denotes the length of the public LFSR. In this comparison, the other schemes are run repeatedly to emulate the k-cloud environment, and the results show that we achieve the transmission of Ref-Tables and random numbers with only a small amount of additional overhead, which must be borne to ensure the security of the scheme.

5.3. Experimental Performance

In the experiments, the test file size was 100 MB and the RSA modulus size was set to 3072 bits. The experiments were performed on Ubuntu 20.04.4 LTS; the computer was configured with an Intel i5 CPU running VMware Fusion 13.0.0, and the programs were executed with Python 3.8.
A numerical comparison of the computational costs is shown in Table 4, where P denotes exponentiation on $Z_N$, M is multiplication on groups, R denotes a PRF evaluation, V is the inverse operation, $\lambda$ is the length of the secret LFSR, $\delta$ is the tunable parameter for replica generation, and $|C_1|$ and $|C_2|$ are the numbers of challenged blocks and challenged replicas, respectively. Several operations with negligible computational overhead, such as hashing and modular addition, are ignored in the comparison. To ensure that the single-replica generation time is consistent across experiments, we set the tunable parameter to 8 in both the Shen et al. [23] scheme and our scheme.
We demonstrate that the authentic-replica sampling scheme is effective in five respects: the overhead of generating replica parameters, the time to generate data tags, the time to generate the same number of replicas, the time to generate and verify proofs for the challenged blocks, and the total time for different numbers of replicas and challenged blocks. We compare our scheme with those of Guo et al. [39], Armknecht et al. [22], Gritti et al. [45], and Shen et al. [23].
We first measure the time required to generate copy parameters for different numbers of replicas. To maintain consistency with the subsequent experiments, we limit the number of replicas to 20 or fewer. As shown in Figure 2a, the copy parameters in our scheme are simply the random seeds sent by the client to each cloud storage node together with the Ref-Table, so the transmission time is extremely short and can be almost neglected. The overhead of generating copy parameters in our scheme is therefore independent of the number of replicas, amounting to a very low constant cost.
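A minimal sketch of this constant-size preparation step follows; the function name, seed length, and delegation counts are hypothetical choices for exposition, not the scheme's exact parameters:

```python
# Sketch: the client's copy parameters are one random seed per cloud
# plus the Ref-Table of delegation counts. Nothing here grows with the
# total number of replicas N, only with the number of clouds k.
import os, json

def make_copy_params(cloud_ids, delegations):
    """delegations: {cloud_id: number of replicas that cloud must hold}."""
    ref_table = {cid: delegations[cid] for cid in cloud_ids}
    seeds = {cid: os.urandom(16).hex() for cid in cloud_ids}  # 128-bit seeds
    return seeds, ref_table

seeds, ref_table = make_copy_params(
    ["C1", "C2", "C3"], {"C1": 8, "C2": 6, "C3": 6})

# The uploaded payload depends on k (here 3 clouds), not on the
# 20 total replicas recorded in the Ref-Table.
payload = json.dumps({"seeds": seeds, "ref_table": ref_table})
```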
We next measure the time consumed to generate tags for different numbers of data blocks. As shown in Figure 2b, tag generation in our scheme is not the fastest. This is because our tags must remain aggregatable, and to achieve higher security, in particular resistance to outsourcing attacks and generation attacks, we accept a compromise in tag-generation time. Constructing more secure tags requires additional computation, which increases the time overhead, but this cost is justified: the extra time enhances the security and robustness of the overall scheme and better protects data integrity.
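The trade-off can be illustrated with a generic RSA-style homomorphic tag in the PDP tradition. This is an assumption-laden sketch, not the paper's exact tag construction, and the toy parameters are far too small for real use:

```python
# Hedged sketch of an *aggregatable* homomorphic tag (generic PDP style,
# not this paper's exact tag). Toy parameters -- never use such sizes.
import hashlib

p, q = 1009, 1013                      # toy primes
N = p * q
phi = (p - 1) * (q - 1)
e = 13                                 # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)                    # private exponent (Python 3.8+)
g = 5                                  # public base

def H(i):                              # hash an index into Z_N
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % N

def tag(i, m):                         # sigma_i = (H(i) * g^m)^d mod N
    return pow(H(i) * pow(g, m, N) % N, d, N)

blocks = {1: 12, 2: 34, 3: 56}
sigmas = {i: tag(i, m) for i, m in blocks.items()}

# Aggregation: multiply tags, then verify one equation for all blocks.
agg = 1
for s in sigmas.values():
    agg = agg * s % N
mu = sum(blocks.values())
lhs = pow(agg, e, N)
rhs = 1
for i in blocks:
    rhs = rhs * H(i) % N
rhs = rhs * pow(g, mu, N) % N
assert lhs == rhs                      # the aggregate tag verifies
```

The multiplicative structure is what makes aggregation possible, and it is also why each tag costs exponentiations rather than a single cheap hash.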
We then measure the time needed to generate replicas of different total sizes when the data block size is fixed at 8 KB (Figure 3a). All the parameters necessary for replica generation, such as the random numbers and the Ref-Table, are obtained before generation begins. Therefore, once the delegated replica data are distributed across the different clouds, replica generation in our scheme is significantly faster. In contrast, in the single-cloud schemes of Armknecht et al. [22], Guo et al. [39], and Shen et al. [23], the replica generation time increases linearly with the number of replicas. Additionally, we constructed a random Ref-Table to measure the replica generation time in a multi-cloud environment (Figure 3b). Although that time overhead is not ideal, it shows that our scheme can handle different delegation counts for different clouds based on the Ref-Table.
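A sketch of Ref-Table-driven replica generation follows, assuming a simple PRF-masking rule; the function names and the masking rule are our own illustration, not the scheme's exact construction:

```python
# Illustrative sketch: each cloud expands its seed with a PRF to mask
# the delegated data blocks into distinct replicas, guided by the
# Ref-Table. Masking rule and names are assumptions for exposition.
import hashlib

def prf(seed, cloud_id, replica_j, block_i):
    data = f"{seed}|{cloud_id}|{replica_j}|{block_i}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def gen_replicas(seed, cloud_id, n_replicas, blocks):
    """Return n_replicas masked copies of the block list."""
    replicas = []
    for j in range(n_replicas):
        replicas.append([(m + prf(seed, cloud_id, j, i)) % (1 << 256)
                         for i, m in enumerate(blocks)])
    return replicas

ref_table = {"C1": 3, "C2": 2}         # delegations per cloud
blocks = [11, 22, 33]
store = {cid: gen_replicas("seed42", cid, k, blocks)
         for cid, k in ref_table.items()}
# Clouds work in parallel, so wall-clock time tracks the largest single
# delegation in the Ref-Table, not the total replica count.
```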
We measure the time required to generate proofs for different numbers of challenged blocks. As shown in Figure 4a, compared with the single-cloud schemes of Armknecht et al. [22], Guo et al. [39], and Shen et al. [23], our approach generates proofs more efficiently once communication delays are eliminated and the same delegated replicas are used. Unlike the single-cloud setting, each challenged cloud only needs to generate proofs for the challenged blocks of the replicas it stores; because the replicas are distributed across clouds, the challenges are also spread out, which accelerates proof generation.
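The challenged-block aggregation can be sketched generically as follows. The random coefficients, toy parameters, and tag form are illustrative assumptions in the PDP style, not the paper's exact protocol:

```python
# Sketch of sampled-proof aggregation: the verifier sends random
# coefficients for a challenged subset C_1; the prover returns one
# aggregated tag and one aggregated data value. Generic illustration
# with toy parameters only.
import hashlib, random

p, q = 1009, 1013
N = p * q
e, phi = 13, (p - 1) * (q - 1)
d = pow(e, -1, phi)
g = 5

H = lambda i: int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % N
tag = lambda i, m: pow(H(i) * pow(g, m, N) % N, d, N)

blocks = {i: 100 + i for i in range(10)}
sigma = {i: tag(i, m) for i, m in blocks.items()}

# Challenge: a random subset of indices with random coefficients nu_i.
rng = random.Random(7)
chal = {i: rng.randrange(1, 1 << 16) for i in rng.sample(range(10), 4)}

# Prover: aggregate over the challenged blocks only.
mu = sum(nu * blocks[i] for i, nu in chal.items())
agg = 1
for i, nu in chal.items():
    agg = agg * pow(sigma[i], nu, N) % N

# Verifier: one equation regardless of how many blocks were challenged.
rhs = pow(g, mu, N)
for i, nu in chal.items():
    rhs = rhs * pow(H(i), nu, N) % N
assert pow(agg, e, N) == rhs
```

Because each cloud only aggregates over the challenged blocks of the replicas it actually stores, distributing replicas across clouds also distributes this work.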
In Figure 4b, compared with the schemes of Armknecht et al. [22], Guo et al. [39], Gritti et al. [45], and Shen et al. [23], our solution aggregates all servers belonging to the same cloud in the verification phase under that cloud's unique identifier. Extending this aggregation to all clouds enables batch verification in multi-cloud environments, which reduces the verification time.
We present the total time overhead required for different numbers of replicas and challenged blocks, with 20 data blocks of 8 KB each, across the various schemes. As shown in Figure 5, the efficiency advantage of our scheme is clear: in every scenario, it achieves the best overall time efficiency.
The proposed scheme performs well under a range of parameter settings, making it adaptable to environments with large files, many replicas, and multi-cloud storage. Furthermore, its resistance to outsourcing attacks and generation attacks, combined with a constant-level client computation overhead, makes it suitable for most multi-cloud environments, for example cloud storage for electronic medical records, electronic evidence, and e-commerce systems.

6. Conclusions

In this paper, we propose a public authentic-replica sampling mechanism for distributed storage environments, give a formal definition of authentic replicas, and present a security model for the authentic-replica sampling mechanism. Based on time-lock puzzles, identity-based encryption, and succinct proof techniques, we design an authentic-replica auditing mechanism that satisfies the authentic-replica property and resists outsourcing attacks and generation attacks. In addition, by combining random numbers with the Ref-Table, the scheme keeps the client-side cost of generating and uploading copy parameters at a constant level even as the number of stored replicas grows, which lets the server receive the data with lower bandwidth, while the publicly recoverable copy parameters guarantee public verifiability. The experimental results show that the scheme is secure, acceptably efficient, and practical: the time to generate copy parameters becomes negligible, replica generation is significantly optimized and does not grow linearly as the amount of replica data increases, and, thanks to the multi-party aggregation design, the verification time is also optimal, roughly 30% lower than in the latest schemes. As future work, we will extend the scheme with data-migration functionality, aiming to migrate user data dynamically in distributed storage at low cost; convincing clients that the new physical space can meet their needs will be a key issue to focus on and resolve.

Author Contributions

Conceptualization, J.Y.; methodology, J.Y.; validation, J.Y. and Y.B.; resources, J.X.; writing—original draft preparation, J.Y.; writing—review and editing, J.Y. and S.H.; visualization, J.Y.; supervision, Z.H.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 62072249, 62032025, and 62172258) and the Shenzhen Science and Technology Program (JCYJ20210324134810028).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Wei Wan was employed by the company State Grid ZaoZhuang Power Supply Company. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Xue, K.; Li, S.; Hong, J.; Xue, Y.; Yu, N.; Hong, P. Two-cloud secure database for numeric-related SQL range queries with privacy preserving. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1596–1608. [Google Scholar] [CrossRef]
  2. Majumdar, A.; Biswas, A.; Majumder, A.; Sood, S.K.; Baishnab, K.L. A novel DNA-inspired encryption strategy for concealing cloud storage. Front. Comput. Sci. 2021, 15, 1–18. [Google Scholar] [CrossRef]
  3. Noor, T.H.; Sheng, Q.Z.; Zeadally, S.; Yu, J. Trust management of services in cloud environments: Obstacles and solutions. ACM Comput. Surv. (CSUR) 2013, 46, 1–30. [Google Scholar] [CrossRef]
  4. Mansouri, Y.; Toosi, A.N.; Buyya, R. Data storage management in cloud environments: Taxonomy, survey, and future directions. ACM Comput. Surv. (CSUR) 2017, 50, 1–51. [Google Scholar] [CrossRef]
  5. Khan, A.A.; Zakarya, M. Energy, performance and cost efficient cloud datacentres: A survey. Comput. Sci. Rev. 2021, 40, 100390. [Google Scholar] [CrossRef]
  6. Li, Y.; Yu, Y.; Min, G.; Susilo, W.; Ni, J.; Choo, K.K.R. Fuzzy identity-based data integrity auditing for reliable cloud storage systems. IEEE Trans. Dependable Secur. Comput. 2017, 16, 72–83. [Google Scholar] [CrossRef]
  7. Wei, J.; Chen, X.; Wang, J.; Huang, X.; Susilo, W. Securing fine-grained data sharing and erasure in outsourced storage systems. IEEE Trans. Parallel Distrib. Syst. 2022, 34, 552–566. [Google Scholar] [CrossRef]
  8. Zhang, Y.; Yu, J.; Hao, R.; Wang, C.; Ren, K. Enabling efficient user revocation in identity-based cloud storage auditing for shared big data. IEEE Trans. Dependable Secur. Comput. 2018, 17, 608–619. [Google Scholar] [CrossRef]
  9. Xu, J.; Wang, C.; Jia, X. A survey of blockchain consensus protocols. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
  10. Xiao, Y.; Zhang, N.; Lou, W.; Hou, Y.T. A survey of distributed consensus protocols for blockchain networks. IEEE Commun. Surv. Tutor. 2020, 22, 1432–1465. [Google Scholar] [CrossRef]
  11. Majumdar, S.; Chawla, G.S.; Alimohammadifar, A.; Madi, T.; Jarraya, Y.; Pourzandi, M.; Wang, L.; Debbabi, M. ProSAS: Proactive security auditing system for clouds. IEEE Trans. Dependable Secur. Comput. 2021, 19, 2517–2534. [Google Scholar] [CrossRef]
  12. He, D.; Kumar, N.; Zeadally, S.; Wang, H. Certificateless provable data possession scheme for cloud-based smart grid data management systems. IEEE Trans. Ind. Inform. 2017, 14, 1232–1241. [Google Scholar] [CrossRef]
  13. Miao, Y.; Huang, Q.; Xiao, M.; Susilo, W. Blockchain assisted multi-copy provable data possession with faults localization in multi-cloud storage. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3663–3676. [Google Scholar] [CrossRef]
  14. Gudeme, J.R.; Pasupuleti, S.K.; Kandukuri, R. Certificateless multi-replica public integrity auditing scheme for dynamic shared data in cloud storage. Comput. Secur. 2021, 103, 102176. [Google Scholar] [CrossRef]
  15. Zhao, Y.; Qu, Y.; Xiang, Y.; Uddin, M.P.; Peng, D.; Gao, L. A comprehensive survey on edge data integrity verification: Fundamentals and future trends. ACM Comput. Surv. 2024, 57, 1–34. [Google Scholar] [CrossRef]
  16. Yu, H.; Yang, Z.; Waqas, M.; Tu, S.; Han, Z.; Halim, Z.; Sinnott, R.O.; Parampalli, U. Efficient dynamic multi-replica auditing for the cloud with geographic location. Future Gener. Comput. Syst. 2021, 125, 285–298. [Google Scholar] [CrossRef]
  17. Garg, N.; Bawa, S.; Kumar, N. An efficient data integrity auditing protocol for cloud computing. Future Gener. Comput. Syst. 2020, 109, 306–316. [Google Scholar] [CrossRef]
  18. Zhou, L.; Fu, A.; Mu, Y.; Wang, H.; Yu, S.; Sun, Y. Multicopy provable data possession scheme supporting data dynamics for cloud-based electronic medical record system. Inf. Sci. 2021, 545, 254–276. [Google Scholar] [CrossRef]
  19. Benisi, N.Z.; Aminian, M.; Javadi, B. Blockchain-based decentralized storage networks: A survey. J. Netw. Comput. Appl. 2020, 162, 102656. [Google Scholar] [CrossRef]
  20. Susilo, W.; Li, Y.; Guo, F.; Lai, J.; Wu, G. Public cloud data auditing revisited: Removing the tradeoff between proof size and storage cost. In Proceedings of the European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022; pp. 65–85. [Google Scholar]
  21. Sellami, Y.; Imine, Y.; Gallais, A. A verifiable data integrity scheme for distributed data sharing in fog computing architecture. Future Gener. Comput. Syst. 2024, 150, 64–77. [Google Scholar] [CrossRef]
  22. Armknecht, F.; Barman, L.; Bohli, J.M.; Karame, G.O. Mirror: Enabling proofs of data replication and retrievability in the cloud. In Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), Austin, TX, USA, 10–12 August 2016; pp. 1051–1068. [Google Scholar]
  23. Shen, J.; Chen, X.; Huang, X.; Xiang, Y. Public Proofs of Data Replication and Retrievability with User-friendly Replication. IEEE Trans. Dependable Secur. Comput. 2023, 31, 2057–2067. [Google Scholar] [CrossRef]
  24. Ren, Y.; Leng, Y.; Cheng, Y.; Wang, J. Secure data storage based on blockchain and coding in edge computing. Math. Biosci. Eng 2019, 16, 1874–1892. [Google Scholar] [CrossRef] [PubMed]
  25. Sookhak, M.; Gani, A.; Talebian, H.; Akhunzada, A.; Khan, S.U.; Buyya, R.; Zomaya, A.Y. Remote data auditing in cloud computing environments: A survey, taxonomy, and open issues. ACM Comput. Surv. (CSUR) 2015, 47, 1–34. [Google Scholar] [CrossRef]
  26. Daniel, E.; Tschorsch, F. IPFS and friends: A qualitative comparison of next generation peer-to-peer data networks. IEEE Commun. Surv. Tutor. 2022, 24, 31–52. [Google Scholar] [CrossRef]
  27. Yu, H.; Chen, Y.; Yang, Z.; Chen, Y.; Yu, S. EDCOMA: Enabling Efficient Double Compressed Auditing for Blockchain-Based Decentralized Storage. IEEE Trans. Serv. Comput. 2024, 17, 2273–2286. [Google Scholar] [CrossRef]
  28. Zhou, M.; Yang, Z.; Yu, H.; Yu, S. VDFChain: Secure and verifiable decentralized federated learning via committee-based blockchain. J. Netw. Comput. Appl. 2024, 223, 103814. [Google Scholar] [CrossRef]
  29. Wang, X.; Yu, H.; Chen, Y.; Sinnott, R.O.; Yang, Z. PrVFL: Pruning-Aware Verifiable Federated Learning for Heterogeneous Edge Computing. IEEE Trans. Mob. Comput. 2024, 1–18. [Google Scholar] [CrossRef]
  30. Ren, Y.; Lv, Z.; Xiong, N.N.; Wang, J. HCNCT: A cross-chain interaction scheme for the blockchain-based metaverse. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–23. [Google Scholar] [CrossRef]
  31. Du, Y.; Duan, H.; Zhou, A.; Wang, C.; Au, M.H.; Wang, Q. Enabling secure and efficient decentralized storage auditing with blockchain. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3038–3054. [Google Scholar] [CrossRef]
  32. Li, Y.; Yu, Y.; Chen, R.; Du, X.; Guizani, M. IntegrityChain: Provable data possession for decentralized storage. IEEE J. Sel. Areas Commun. 2020, 38, 1205–1217. [Google Scholar] [CrossRef]
  33. Yang, Y.; Chen, Y.; Chen, F.; Chen, J. An efficient identity-based provable data possession protocol with compressed cloud storage. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1359–1371. [Google Scholar] [CrossRef]
  34. Tian, G.; Hu, Y.; Wei, J.; Liu, Z.; Huang, X.; Chen, X.; Susilo, W. Blockchain-based secure deduplication and shared auditing in decentralized storage. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3941–3954. [Google Scholar] [CrossRef]
  35. Ren, Y.; Leng, Y.; Qi, J.; Sharma, P.K.; Wang, J.; Almakhadmeh, Z.; Tolba, A. Multiple cloud storage mechanism based on blockchain in smart homes. Future Gener. Comput. Syst. 2021, 115, 304–313. [Google Scholar] [CrossRef]
  36. Tang, J.; Cui, Y.; Li, Q.; Ren, K.; Liu, J.; Buyya, R. Ensuring security and privacy preservation for cloud data services. ACM Comput. Surv. (CSUR) 2016, 49, 1–39. [Google Scholar] [CrossRef]
  37. Sun, L.; Wang, Y.; Ren, Y.; Xia, F. Path signature-based xai-enabled network time series classification. Sci. China Inf. Sci. 2024, 67, 170305. [Google Scholar] [CrossRef]
  38. Barsoum, A.F.; Hasan, M.A. Provable multicopy dynamic data possession in cloud computing systems. IEEE Trans. Inf. Forensics Secur. 2014, 10, 485–497. [Google Scholar] [CrossRef]
  39. Guo, W.; Qin, S.; Lu, J.; Gao, F.; Jin, Z.; Wen, Q. Improved proofs of retrievability and replication for data availability in cloud storage. Comput. J. 2020, 63, 1216–1230. [Google Scholar] [CrossRef]
  40. Zhang, C.; Li, X.; Au, M.H. epost: Practical and client-friendly proof of storage-time. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1052–1063. [Google Scholar] [CrossRef]
  41. Boneh, D.; Bonneau, J.; Bünz, B.; Fisch, B. Verifiable delay functions. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 9–23 August 2018; pp. 757–788. [Google Scholar]
  42. Liu, Y.; Wang, Q.; Yiu, S.M. Towards practical homomorphic time-lock puzzles: Applicability and verifiability. In Proceedings of the European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022; pp. 424–443. [Google Scholar]
  43. Katz, J.; Loss, J.; Xu, J. On the security of time-lock puzzles and timed commitments. In Proceedings of the Theory of Cryptography: 18th International Conference, TCC 2020, Durham, NC, USA, 16–19 November 2020; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2020; pp. 390–413. [Google Scholar]
  44. Boneh, D.; Bünz, B.; Fisch, B. Batching techniques for accumulators with applications to IOPs and stateless blockchains. In Proceedings of the Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2019; Proceedings, Part I 39. Springer: Berlin/Heidelberg, Germany, 2019; pp. 561–586. [Google Scholar]
  45. Gritti, C. Publicly verifiable proofs of data replication and retrievability for cloud storage. In Proceedings of the 2020 International Computer Symposium (ICS), Tainan, Taiwan, 17–19 December 2020; pp. 431–436. [Google Scholar]
Figure 1. System model of the public authentic-replica sampling mechanism.
Figure 2. The time cost of copy parameter generation (a) for our scheme, Mirror [22], IMP-Mirror [39], and Friendly-PoRR [23], and the time cost of tag generation (b) for our scheme, Mirror [22], IMP-Mirror [39], Friendly-PoRR [23], and P-PoRR [45].
Figure 3. The time cost of replica generation with a uniformly distributed Ref-Table (a) and a random Ref-Table (b) for our scheme, Mirror [22], IMP-Mirror [39], and Friendly-PoRR [23].
Figure 4. The time cost of proof generation (a) for our scheme, Mirror [22], IMP-Mirror [39], and Friendly-PoRR [23], and the verification time (b) with different numbers of challenged blocks for our scheme, Mirror [22], IMP-Mirror [39], Friendly-PoRR [23], and P-PoRR [45].
Figure 5. The total time for our scheme, Mirror [22], IMP-Mirror [39], and Friendly-PoRR [23].
Table 1. Notations.
Notation: Description
λ: Security parameter
k: Safety factor
G_1: Multiplicative cyclic group
G_T: Multiplicative cyclic group
p: Order of the multiplicative cyclic group G_1
e: Bilinear pairing
S_l: Cloud server
S_l^{Rep}: Collection of replicas stored on S_l
f: Pseudorandom sequence
φ: Pseudorandom function
π: Pseudorandom sequence
H_i: Hash function
C: Client
N: Total number of replicas stored on all cloud servers
N: RSA modulus
T: Ref-Table
m_i: Encoded data block
σ_i: Tag corresponding to data block m_i
m_i^{S_l, j}: The j-th replica of data block m_i, stored on S_l
i_β: Challenged data blocks
j_ε: Challenged replicas
C_1: Collection of challenged data blocks
Chal: Collection of challenges
R̃_{S_l}: Proof of replica generated by S_l
M̃_{S_l}: Proof of data generated by S_l
F: Original encoded file
C_{S_l}: Ciphertext generated by S_l
n: Number of blocks the original encoded file is cut into
name: Unique identifier for the source file
N_{S_l}: Unique identifier for S_l
Table 2. Comparison of properties.
Protocols compared: Armknecht et al. [22], Guo et al. [39], Shen et al. [23], and ours.
Storage allocation: [22] ×; [39] ×; [23] ✓; Ours ✓
Efficient preparation for replication: [22] ×; [39] ×; [23] ✓; Ours ✓
Public verifiability: [22] ×; [39] ×; [23] ✓; Ours ✓
Verification time: [22] high; [39] high; [23] low; Ours low
Disaster recovery capability: [22] low; [39] low; [23] low; Ours high
Resists generation and outsourcing attacks: [22] ×; [39] ×; [23] ×; Ours ✓
Supports multi-cloud: [22] ×; [39] ×; [23] ×; Ours ✓
Table 3. Comparison of communication.
Schemes compared: Armknecht et al. [22], Guo et al. [39], Shen et al. [23], and ours.
Copy parameters: [22] Nλ*(|Z_{q_0}| + |Z_{q_1}| + 2λ*|Z_N|); [39] Nλ*(|Z_{q_0}| + |Z_{q_1}| + 2λ*|Z_N|); [23] |Z_p| + |Z_N|; Ours |Z_p| + |Z_N|
Data uploading: [22] 2n[k]|Z_N^*|; [39] 2n[k]|Z_N^*|; [23] 2[k](n|Z_N^*| + |Z_{q_0 q_1}|); Ours (2n + 1)k|Z_N^*| + 2k|Z_{q_0 q_1}|
Proof response: [22] 2|Z_N^*|; [39] 2|Z_N^*|; [23] 5|Z_N^*|; Ours 6|Z_N^*|
Table 4. Comparison of computation.
Schemes compared: Armknecht et al. [22], Guo et al. [39], and ours.
Copy parameter generation: [22] 2N(λ(λ* - λ)M + λ*P); [39] 2N(λ(λ* - λ)M + λ*P); Ours 0
Tag generation: [22] nP; [39] n(2P + M + R); Ours 2nP + nM + V
Replica generation: [22] 2Nλ*((n - λ*)(P + M) + M); [39] 2tλ*((n - λ*)(P + M) + M); Ours N((n + 1)P + n(1 + δ)M + |C_1|)
Proof generation: [22] 2|C_1|P + (2 + |C_2|)|C_1|M; [39] 2|C_1|P + (2 + |C_2|)|C_1|M; Ours 3|C_1|P + (3 + |C_2|)|C_1|M
Proof verification: [22] |C_1|(|C_2| + 1)P + (|C_1| + |C_2| + |C_1||C_2| + λn)M; [39] |C_1|(|C_2| + 1)P + (|C_1| + |C_2| + |C_1||C_2| + λn)M + |C_1|(M + R); Ours (|C_1| + 3|C_2|)P + (|C_2| + (δ + 1)|C_1|)M
