Article

Coupling Secret Sharing with Decentralized Server-Aided Encryption in Encrypted Deduplication

1 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
2 Hikvision Digital Technology Co., Ltd., Hangzhou 310052, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(3), 1245; https://doi.org/10.3390/app15031245
Submission received: 9 November 2024 / Revised: 28 December 2024 / Accepted: 30 December 2024 / Published: 26 January 2025
(This article belongs to the Special Issue Application of Deep Learning and Big Data Processing)

Abstract: Outsourcing storage to the cloud can save storage costs and is commonly used in businesses. It should fulfill two major goals: storage efficiency and data confidentiality. Encrypted deduplication achieves both goals by performing deduplication to eliminate the duplicate data within encrypted data. Traditional encrypted deduplication generates the encryption key on the client side, which exposes the outsourced data to offline brute-force attacks. Server-aided encryption schemes have been proposed to strengthen the confidentiality of encrypted deduplication by distributing the encryption process to dedicated servers. Existing schemes rely on expensive cryptographic primitives to provide a decentralized setting on the dedicated servers for scalability. However, this incurs a substantial performance slowdown and cannot be applied in practical encrypted deduplication storage systems. In this paper, we propose a new decentralized server-aided encrypted deduplication approach for outsourced storage, called ECDedup, which leverages secret sharing to achieve secure and efficient key management. We are the first to use the coding matrix as the encryption key to couple the encryption and encoding processes in encrypted deduplication. We also propose an acceleration scheme to speed up the encryption process of ECDedup. We prototype ECDedup in cloud environments, and our experimental results based on real-world backup datasets show that ECDedup can improve the client throughput by up to 51.9% compared to state-of-the-art encrypted deduplication schemes.

1. Introduction

Outsourcing storage management to the cloud is common for businesses, as it saves the overhead of self-managing their massive data. Two major goals, security and storage efficiency, must be satisfied for practical outsourced storage: the outsourced data must be protected from unauthorized access by malicious clients or even the cloud providers, and the storage overhead of the outsourced data needs to be reduced to save outsourcing costs.
To satisfy the above goals, encrypted deduplication that performs deduplication on encrypted data has been widely deployed in outsourced storage. Original data are often organized as chunks in deduplication and duplicate chunks are eliminated to save storage space. For security, encrypted deduplication adds an encryption layer atop deduplication to encrypt the original plaintext chunks (from the same or different clients) into ciphertext chunks based on their content-derived keys (i.e., the cryptographic hash of their content). Thus, duplicate plaintext chunks can be encrypted using an identical key generated from their identical content, which ensures that their corresponding ciphertext chunks are also identical and thus can be deduplicated for storage efficiency.
Although the content-derived key can be used to encrypt plaintext chunks to preserve deduplication capability on their ciphertext chunks, the security of the encryption remains vulnerable to offline brute-force attacks, as an adversary can enumerate all possible plaintext chunks to match the target ciphertext chunk. Existing encrypted deduplication approaches [1,2,3,4,5,6] often employ server-aided encryption to deploy a dedicated key server to manage key generation requests from clients. The encryption key of each chunk is derived based on both the secret key of the key server and the content-based key of the chunk itself, such that an adversary cannot feasibly derive the key without knowing the secret key.
However, previous server-aided encryption schemes presume a centralized key-server-based setting, where only one single secret key (held by a single key server [1] or shared by a group of key servers [7]) is used to manage all encryption keys. As a result, deduplication is only applicable over ciphertext chunks encrypted by the same secret key, and all key generation requests are centralized to the single key server (or the group of key servers), which leads to scalability and security issues for a cloud storage service [8]. To address this issue, a new class of decentralized server-aided key management schemes has been proposed to improve scalability by enabling multiple key servers with different secret keys to collectively manage encryption keys. Identical plaintext chunks are encrypted into different ciphertext chunks using distinct secret keys from multiple key servers. A complicated cryptographic strategy (e.g., computational Diffie–Hellman coupled with blind signatures [8]) is then used to achieve key exchange, such that the different ciphertext chunks encrypted from identical plaintext chunks can be deduplicated in the cloud servers. However, the cryptographic strategy incurs a significant computational overhead (see Section 2.3), making it inapplicable to chunk-level deduplication, which is the current mainstream deduplication method [9].
Fortunately, we find that secret sharing, which is often used in encrypted deduplication storage systems to share data among multiple cloud servers, can also be leveraged to share different secret keys among multiple key servers, so as to achieve efficient key management in decentralized server-aided encryption. Specifically, secret sharing first encrypts the data chunk and encodes the chunk via a specific coding matrix into multiple small-size shares (see Section 2.2), then distributes the shares to multiple cloud servers. Based on secret sharing, we can share the chunk's encryption key among multiple key servers to collectively generate the server-aided encryption key. However, sharing the key incurs additional encryption and encoding overheads from secret sharing, which would impair the performance of the entire system. Therefore, it remains challenging to efficiently apply secret sharing to decentralized server-aided encryption.
In this paper, we propose a new decentralized server-aided encrypted deduplication storage system, called ECDedup, which couples secret sharing with decentralized server-aided encryption to achieve the encryption and encoding of both keys and data chunks in a single process, thereby enabling efficient decentralized server-aided encrypted deduplication. The main contributions of this work include:
  • We are the first to leverage secret sharing to realize decentralized server-aided encryption in the encrypted deduplication storage systems, where we first use the coding matrix instead of the key to couple the encryption process with the encoding process, so as to achieve the secret sharing of both keys and data chunks in one encoding process (Section 2).
  • We propose ECDedup, which realizes efficient decentralized encrypted deduplication via sharing the coding matrix to multiple key servers, and improves the matrix generation performance via two optimization methods (Section 3).
  • We implement the ECDedup prototype and we also deploy a blockchain integration to further strengthen the security of our prototype (Section 4). Our experiments show that ECDedup improves the client throughput by up to 51.9% over existing encrypted deduplication methods (Section 5).

2. Background and Motivation

2.1. Encrypted Deduplication

  • Basics: Deduplication is a data reduction technique that eliminates duplicate data in storage systems. In this paper, we mainly focus on chunk-based deduplication, which generally splits data files into variable-size (an average of 8 KB [9,10,11] for deduplication efficiency) chunks (called chunking) based on the boundary points detected by the Rabin algorithm (a rolling hash algorithm [12]). Each chunk is uniquely identified by a hash fingerprint (called fingerprinting) computed using a cryptographic hash function (e.g., SHA-1, SHA-256). Then, the fingerprints are indexed and only one physical copy of the duplicate data chunks is stored (called indexing), so as to save storage space.
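The chunking-and-fingerprinting pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy rolling hash stands in for the Rabin fingerprint, and the boundary mask and minimum chunk size (`MASK`, 2 KB) are assumed parameters chosen to target an ~8 KB average.

```python
import hashlib

MASK = (1 << 13) - 1  # boundary condition targeting an ~8 KB (2^13) average chunk size

def chunk_and_fingerprint(data: bytes, min_size: int = 2048):
    # Content-defined chunking: a toy rolling hash stands in for the Rabin
    # fingerprint; a chunk boundary is declared where the hash hits MASK.
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) & 0xFFFFFFFF
        if i - start + 1 >= min_size and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    # Fingerprinting: identify each chunk by its SHA-256 digest; the cloud
    # indexes these fingerprints and stores one copy per unique chunk.
    return [(hashlib.sha256(c).hexdigest(), len(c)) for c in chunks]
```

Because boundaries depend only on content, identical regions of different files produce identical chunks and thus identical fingerprints, which is what makes deduplication effective.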
Encrypted deduplication extends traditional data deduplication with encryption to provide both data confidentiality and storage efficiency for outsourced storage systems. Specifically, the client first performs chunking to divide a data file into multiple plaintext chunks. Then, the client encrypts each plaintext chunk with the encryption key and stores all ciphertext chunks in the cloud. Deduplication is then performed in the cloud to eliminate the duplicate ciphertext chunks. For file reconstruction, the client creates a file recipe that lists the fingerprints, sizes, and keys of the ciphertext chunks. The file recipe is encrypted by the master secret key of the client itself and the encrypted file recipe is stored in the cloud.
Encrypted deduplication uses Message-Locked Encryption (MLE) to derive the symmetric secret key (i.e., the same key is used for both the encryption and decryption) based on the chunk’s content. For example, in MLE, a typical instantiation Convergent Encryption (CE) [13] uses the cryptographic hash of a plaintext chunk as the MLE key, such that the identical chunks will have the same encryption key, resulting in identical ciphertext chunks that can be deduplicated after MLE encryption.
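A minimal sketch of Convergent Encryption follows: the key is simply the SHA-256 hash of the chunk, and (as an assumption for self-containment) a SHA-256 counter-mode keystream stands in for a real block cipher such as AES-256.

```python
import hashlib

def ce_key(chunk: bytes) -> bytes:
    # Convergent Encryption: the key is the hash of the chunk itself,
    # so identical chunks always derive identical keys.
    return hashlib.sha256(chunk).digest()

def keystream_encrypt(key: bytes, data: bytes) -> bytes:
    # Illustrative deterministic cipher: XOR with a SHA-256 counter-mode
    # keystream (a stand-in for AES-256; XOR makes decryption the same call).
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

chunk = b"duplicate chunk content"
c1 = keystream_encrypt(ce_key(chunk), chunk)
c2 = keystream_encrypt(ce_key(chunk), chunk)
assert c1 == c2  # identical ciphertexts -> deduplicable
```

The determinism is exactly what preserves deduplication, and also exactly what enables the offline brute-force attack discussed next.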
  • Server-aided MLE: However, CE is vulnerable to offline brute-force attacks since its MLE key can be publicly derived [14], and an adversary can infer the input plaintext chunk from a target ciphertext chunk without knowing the MLE key. Specifically, the adversary can enumerate all possible plaintext chunks and check if any resulting ciphertext chunk of those plaintext chunks matches the target ciphertext chunk. Server-aided MLE [1,2,4,5], which deploys a dedicated key server for MLE key generation, enhances the security of encrypted deduplication against offline brute-force attacks. When the client encrypts a plaintext chunk, a fingerprint of the plaintext chunk is first sent to the key server. The key server then generates and returns the corresponding MLE key based on this fingerprint and a global key maintained by the key server. If this global key is secure, the adversary cannot feasibly launch offline brute-force attacks. Otherwise, the security will degrade to ordinary MLE.
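The key server's role can be sketched as a keyed derivation over the fingerprint; here HMAC-SHA256 with a hypothetical `GLOBAL_KEY` stands in for the cited schemes' constructions (which typically use blind signatures or oblivious PRFs so that the key server never sees the fingerprint in the clear).

```python
import hashlib
import hmac

GLOBAL_KEY = b"key-server-global-secret"  # hypothetical value; held only by the key server

def server_aided_key(fingerprint: bytes) -> bytes:
    # Derive the MLE key from both the chunk fingerprint and the server's
    # global secret: without GLOBAL_KEY, offline enumeration of candidate
    # plaintext chunks no longer yields candidate keys.
    return hmac.new(GLOBAL_KEY, fingerprint, hashlib.sha256).digest()

fp = hashlib.sha256(b"plaintext chunk").digest()
assert server_aided_key(fp) == server_aided_key(fp)  # deterministic -> dedup preserved
```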

2.2. Secret Sharing

Secret sharing aims to provide both fault tolerance and confidentiality guarantees for the shared data via transforming the data input (called the secret) into a set of coded outputs (called shares). A secret-sharing algorithm can be defined based on three configurable parameters, namely n, k, and r (where n > k > r ≥ 0) [14], wherein an (n, k, r) secret-sharing algorithm encodes a secret into n shares; the secret can be reconstructed from any k shares, while no information about the secret (even partial) can be inferred from any r shares.
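As one classic instantiation of these parameters, Shamir's scheme realizes (n, k, k − 1) sharing: any k shares reconstruct the secret, while up to k − 1 shares reveal nothing. The sketch below works over a prime field rather than GF(2^w), and is offered only to make the (n, k, r) definition concrete; it is not the AONT-RS construction used in this paper.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is over the field GF(P)

def share(secret: int, n: int, k: int):
    # Shamir's scheme: hide the secret in the constant term of a random
    # degree-(k-1) polynomial; each share is one evaluation point.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

shares = share(123456789, n=5, k=3)
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[1:4]) == 123456789
```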
AONT-RS [15] is a secret-sharing algorithm that achieves security with reduced storage space. It combines Rivest’s all-or-nothing transform (AONT) [16] for confidentiality and Reed–Solomon (RS) codes [17] for fault tolerance. Specifically, AONT-RS first encrypts the secret into an AONT package with a random key. The package is then divided into k equal-size blocks which are encoded into n blocks (i.e., n shares) using systematic RS codes (meaning that k original blocks are kept in n blocks). In this way, the adversary cannot infer anything about the AONT package without obtaining the whole package.
Many state-of-the-art encrypted deduplication schemes [2,4,5] are based on CDStore [14], which deploys CAONT-RS, using a deterministic rather than a random key to enable deduplication atop AONT-RS. CAONT-RS generates the deterministic key based on the content of the input plaintext chunk, such that identical input plaintext chunks can generate the same ciphertext chunk; then, the same ciphertext chunks can be deduplicated and only one ciphertext chunk is stored, so as to save storage costs.

2.3. Limitations

Previous server-aided MLE implementations typically rely on a centralized key server that stores only a single secret key [2,4,5]. This design makes the key server a bottleneck in the encrypted deduplication storage system. Consequently, a tenant may prefer to deploy multiple key servers, each maintaining the same secret key, in order to scale up the system effectively. However, sharing a secret key among multiple key servers poses a serious security risk; if the secret key from one key server is compromised, it could lead to the breach of the entire system. Thus, decentralized server-aided MLE is proposed to benefit from deduplication while guaranteeing the scalability and confidentiality of the outsourced data.
However, the cryptographic strategy (e.g., computational Diffie–Hellman (CDH) problem and blind signature) applied in the state-of-the-art decentralized server-aided MLE [8] incurs high additional computation overhead for encrypting a small chunk. For example, it only takes about 0.02 ms to compute the hash key of an 8 KB chunk, while the CDH and blind signature will take 8.41 ms and 92.45 ms, respectively [8]. Recall from Section 2.1 that deduplication storage systems usually configure 8 KB as the average chunk size. Therefore, the additional computation overhead will significantly degrade the deduplication efficiency.

2.4. Motivation and Challenges

In this work, our main goal is to design an efficient decentralized server-aided MLE scheme that can be applied to small-size chunks (e.g., 8 KB) in encrypted deduplication storage systems.
Note that the secret-sharing algorithm can be leveraged to share data among multiple nodes (Section 2.2). Thus, it is natural to ask whether we can apply secret sharing with decentralized server-aided MLE to share a single MLE key among multiple key servers to allow multiple key servers with different secret keys to encrypt one MLE key collaboratively without any single server knowing the entire MLE key.
However, sharing the MLE key of a plaintext chunk requires its own encryption and encoding processes, independent of the original encryption and encoding performed on the plaintext chunks, which leads to high computation overhead. Fortunately, non-systematic codes [18] can be leveraged to couple the encryption and encoding processes together. Unlike systematic codes, non-systematic codes only store the parity chunks encoded from plaintext data chunks, and the parity chunks cannot be inferred without the entire coding matrix [18], meaning that the parity chunks naturally maintain data confidentiality. This motivates us to encode plaintext chunks into parity chunks using non-systematic codes, such that we only need to perform one encoding process to enable both the encryption and secret-sharing processes simultaneously, while the coding matrix can naturally be decomposed and shared.

2.5. Threat Model

  • Adversarial capabilities: We consider the cloud servers and the key servers as honest-but-curious, which means that they will honestly follow the storage protocol and execute the assigned tasks in the system. However, they would like to learn as much information as possible about the encrypted outsourced data via inferring the plaintext data from the ciphertext data stored in the cloud. Moreover, some clients are also curious and would try to access unauthorized data out of the scope of their privileges. Specifically, there are two types of adversaries considered in the system:
    • An inside adversary refers to a cloud storage service provider (owns the cloud servers) or any of the key servers. The inside adversary is assumed to be honest-but-curious and its goal is to infer the plaintext data from the outsourced ciphertext data it possesses to obtain useful information.
    • An outside adversary refers to any external entity that attempts to gain unauthorized access to the data stored in the cloud. The outside adversary plays the role of a client, interacting with cloud servers and key servers.
  • Assumptions: Our threat model makes the following assumptions: (i) All communications among the clients, the key servers, and the cloud servers are protected via SSL/TLS against tampering. (ii) The key servers rate-limit the clients' key generation requests to defend against online brute-force attacks [1] (see Section 2.1). (iii) To ensure data availability, we deploy remote auditing [19] for data integrity and deduplication-aware secret sharing [14] across multiple storage sites for fault tolerance.

3. Design

We present ECDedup, employing secret sharing in a decentralized server-aided encrypted deduplication storage system with the following design goals:
  • Enabling decentralized server-aided encryption: ECDedup first realizes coding-based secret sharing, which replaces the encryption key with a coding matrix to achieve the decentralized server-aided encryption, where the data chunks can be encoded into the coded chunks to achieve data confidentiality, while each key server only obtains a vector of the coding matrix so that it cannot restore the full matrix (Section 3.2).
  • Preserving deduplication efficiency: ECDedup designs a content-based coding matrix to generate an identical coding matrix for the identical data chunk, so as to generate an identical coded chunk and preserve the deduplication capability (Section 3.3).
  • Efficient and scalable matrix generation: We design an element-based matrix generation pipeline to reduce the computation and bandwidth overhead of the matrix generation to improve generation efficiency. We further design a rolling-based matrix generation scheme to enhance the scalability of the matrix (Section 3.4).

3.1. Architecture

We first introduce the architecture of ECDedup, which aims to build a decentralized secure deduplicated storage system; such systems are commonly deployed in outsourced storage, particularly for backup workloads with high content redundancy. The idea behind ECDedup's design is also applicable to centralized secure deduplication.
As shown in Figure 1, ECDedup mainly comprises: (i) multiple clients that own data and wish to outsource their data to the cloud storage for backup or file-sharing; (ii) multiple decentralized key servers, providing the interface for the clients to obtain the coding matrices for encrypting their data; and (iii) multiple servers that offer cloud storage services, including data deduplication. Each client first divides its data into variable-size data chunks by defining boundaries based on content, then generates the hash value of each chunk and sends it to the key servers to generate the corresponding coding matrix. The coding matrix is collaboratively generated by multiple key servers based on the following two designs: (i) the coding-based secret-sharing scheme, which ensures the security of the decentralized matrix generation (see Section 3.2), and (ii) the content-based coding matrix, which enables deduplication on the coded chunks for storage efficiency (see Section 3.3). The client then encodes the data chunks into coded chunks using the coding matrix, and the coded chunks are distributed to multiple cloud servers for further deduplication.
ECDedup benefits both performance and security. It avoids the expensive cryptographic primitives (see Section 2.3) of decentralized server-aided encryption and protects the coding matrix via multipartite matrix generation, such that each key server can obtain a vector of the coding matrix to complete part of the computation, without disclosing the entire coding matrix.

3.2. Coding-Based Secret Sharing

  • Design idea: Our idea is that the coding matrix of the non-systematic codes can be leveraged to encrypt the plaintext chunks, such that we can achieve both encryption and encoding in a single encoding process (see Section 2.4). We also find that the coding matrix can naturally be decomposed into multiple vectors, allowing the encryption key (in this case the key is the coding matrix) to be shared, such that the server-aided encryption can be divided into multiple small encryption tasks which can be assigned to multiple key servers for parallel computation. This ensures that each key server only accesses the part of the matrix necessary for its computation.
  • Sharing the coding matrix among key servers: Based on the idea illustrated above, we design our coding-based secret-sharing scheme. We first decompose the coding matrix into multiple vectors and then transfer one vector to each key server. The number of vectors generated from the coding matrix is determined by the number of key servers.
To protect the coding matrix from brute-force attacks, each key server leverages server-aided encryption to encrypt the received vector. We hard-code a unique secret in each key server in the same way as state-of-the-art server-aided encryption [2,3,4,5]. In this way, the coding matrix can only be generated from the vectors encrypted with the different secrets stored in the multiple key servers, which ensures the security of the coding matrix. Note that only with the coding matrix can the client decode the coded chunks into the original data chunks (see Section 2.4); thus, the confidentiality of the outsourced data chunks is guaranteed. If the secrets of all key servers are simultaneously compromised, the security of ECDedup reduces to that of the original MLE (the same as the state-of-the-art server-aided encryption schemes [5]).
We also need to carefully design the coding matrix so that the same coding matrix is generated for the identical data chunk, thereby preserving deduplication capability. Here, we predefine an original matrix for the data chunk (the original matrix can be generated in various ways, as long as it ensures that the chunk can still be deduplicated after encoding; we introduce one generation method in Section 3.3). In ECDedup, the original matrix is divided into n vectors, where n is the number of key servers involved in the encoding.
Figure 2 shows an example of our coding-based secret-sharing scheme with the coding parameters (n, k) = (4, 3), which means that the number of key servers is 4 and the data chunk is divided into 3 shares (the same as CAONT-RS; see Section 2.2). Specifically, we outline the coding matrix generation in four steps: (Step 1) The client first generates the original matrix based on the content of the plaintext chunk. (Step 2) The original matrix is divided into n = 4 vectors V_1, V_2, V_3, and V_4, and each vector is sent to a different key server. (Step 3) Each key server generates a new vector based on the received vector and its own secret (e.g., V′_1 = V_1 ⊕ Secret_1, where ⊕ denotes the encryption operator). (Step 4) The client receives the returned vectors and assembles the coding matrix from the 4 new vectors (i.e., V′_1, V′_2, V′_3, and V′_4) generated by the 4 key servers individually. To maintain decent performance, vectors from different matrices are packed into a batch and then sent to the corresponding key server.
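The four steps can be sketched as follows, using plain XOR as the encryption operator and treating the rows of the matrix as the per-server vectors (an assumption for illustration; the paper does not fix row vs. column decomposition).

```python
import secrets

def split_matrix(matrix):
    # Step 2: decompose the coding matrix into its n row vectors,
    # one destined for each key server.
    return [row[:] for row in matrix]

def server_encrypt(vector, server_secret):
    # Step 3: each key server mixes its own secret into its vector
    # (plain XOR stands in for the encryption operator here).
    return [v ^ s for v, s in zip(vector, server_secret)]

n, k = 4, 3
# Step 1 (simulated): an original n x k matrix derived from chunk content.
original = [[secrets.randbelow(256) for _ in range(k)] for _ in range(n)]
server_secrets = [[secrets.randbelow(256) for _ in range(k)] for _ in range(n)]

# Step 4: the client reassembles the coding matrix from the returned vectors.
coding_matrix = [server_encrypt(v, s)
                 for v, s in zip(split_matrix(original), server_secrets)]
assert len(coding_matrix) == n and all(len(row) == k for row in coding_matrix)
```

No single key server ever sees more than one row, so none can reconstruct the full coding matrix on its own.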

3.3. Content-Based Coding Matrix

After we share the coding matrix among key servers to enable the decentralized server-aided encryption, we need to consider how to design the content-based coding matrix to generate the identical coding matrix for the identical data chunk, so as to achieve storage efficiency.
  • Insight: We observe that the Cauchy matrix, which is used in RS codes to encode data chunks, can be generated in a customized manner. Thus, we can generate the coding matrix based on the Cauchy matrix to obtain the identical coding matrix for the identical data chunk.
Equation (1) shows the definition of the Cauchy matrix. We consider an (n, k) RS code; the corresponding Cauchy matrix used for encoding in the (n, k) RS code is an n × k matrix. Let X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_k} be two disjoint sets of elements of the Galois Field GF(2^w) [20] (i.e., the size of each element is w bits, with w = 8 or 16) such that: (i) ∀i ∈ {1, 2, ..., n}, ∀j ∈ {1, 2, ..., k}: x_i + y_j ≠ 0; (ii) ∀i, j ∈ {1, 2, ..., n}: x_i ≠ x_j (i ≠ j); and (iii) ∀i, j ∈ {1, 2, ..., k}: y_i ≠ y_j (i ≠ j). Then, the Cauchy matrix G can be defined as:
G = \begin{pmatrix} \frac{1}{x_1+y_1} & \frac{1}{x_1+y_2} & \cdots & \frac{1}{x_1+y_k} \\ \frac{1}{x_2+y_1} & \frac{1}{x_2+y_2} & \cdots & \frac{1}{x_2+y_k} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{x_n+y_1} & \frac{1}{x_n+y_2} & \cdots & \frac{1}{x_n+y_k} \end{pmatrix}. (1)
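The construction can be made concrete over GF(2^8). The sketch below builds exp/log tables under the AES reduction polynomial 0x11B (an illustrative parameter choice, not necessarily the paper's) and assembles G entry by entry as 1/(x_i + y_j), where field addition is XOR.

```python
# GF(2^8) arithmetic with the AES reduction polynomial 0x11B, generator 0x03.
EXP, LOG = [0] * 512, [0] * 256

def _xtime(a):
    # Multiply by x in GF(2^8), reducing modulo the field polynomial.
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

v = 1
for i in range(255):
    EXP[i], LOG[v] = v, i
    v = _xtime(v) ^ v  # multiply by the generator 0x03 = x + 1
for i in range(255, 512):
    EXP[i] = EXP[i - 255]  # wrap-around so gf_mul never needs a mod

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]  # multiplicative inverse, a != 0

def cauchy_matrix(X, Y):
    # X and Y must be disjoint sets of distinct elements, so that
    # x_i + y_j != 0 (field addition in GF(2^w) is XOR).
    assert not set(X) & set(Y)
    return [[gf_inv(x ^ y) for y in Y] for x in X]

G = cauchy_matrix([1, 2, 3, 4], [5, 6, 7])  # a 4 x 3 Cauchy matrix
```

Any k × k submatrix of a Cauchy matrix is invertible, which is why RS codes built on it can decode from any k shares.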
  • Design idea: Based on the above observation, we propose a content-based coding matrix generation scheme that generates the identical coding matrix for the identical chunk, so as to preserve deduplication capability while encoding the original data chunks in ECDedup. Our idea is that the Cauchy matrix leveraged in the encoding can be deterministically generated from the chunk's hash (which is derived from the chunk's content). For each chunk, ECDedup first generates a hash h using an (optionally salted) hash function H (e.g., SHA-256); then, we generate the elements x_i and y_j of the Cauchy matrix based on the hash h.
  • Fixed-size sampling: To efficiently generate the coding matrix from the hash value, we directly sample every w bits from the beginning to the end of the hash value h to generate each element of the two sets X and Y of the Cauchy matrix.
For example, to generate an n × k coding matrix, we choose n + k elements sampled from h to form {x_1, x_2, ..., x_n} and {y_1, y_2, ..., y_k}. Figure 3 shows an example of sampling elements of the Galois Field GF(2^8) from a 256-bit hash value generated using SHA-256. Note that the elements must satisfy the definition of the Cauchy matrix above. If not, we continue sampling elements backwards until they satisfy the definition.
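For w = 8, the sampling reduces to walking the hash byte by byte. The sketch below skips candidates that would violate the distinctness and disjointness conditions, and assumes (as the fixed-size scheme implicitly does) that the 256-bit hash supplies enough distinct bytes.

```python
import hashlib

def sample_elements(h: bytes, n: int, k: int):
    # w = 8: every byte of the hash is a candidate GF(2^8) element.
    # Keep a byte only if it preserves the Cauchy conditions: all x_i
    # distinct, all y_j distinct, and X, Y disjoint. Disjointness also
    # guarantees x_i + y_j != 0, since XOR-addition is zero only if x = y.
    X, Y = [], []
    for e in h:
        if len(X) < n:
            if e not in X:
                X.append(e)
        elif len(Y) < k:
            if e not in X and e not in Y:
                Y.append(e)
        else:
            break
    return X, Y

h = hashlib.sha256(b"example chunk").digest()
X, Y = sample_elements(h, n=4, k=3)
assert len(X) == 4 and len(Y) == 3 and not set(X) & set(Y)
```

Because the same chunk always hashes to the same h, identical chunks yield identical (X, Y) and hence an identical coding matrix.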
  • Server-aided encryption: We then share the original content-based coding matrix with the key servers via the coding-based secret-sharing scheme (see Section 3.2). Each key server receives a vector V_i from the client and calculates the corresponding vector V′_i based on its secret:
    V′_i = E(Hash(Secret_i ⊕ V_i), c)
    where Hash is a cryptographic hash function (e.g., SHA-256), and E is an encryption function (e.g., AES-256) that encrypts a constant-value string c (of the same size as V_i) using the hash as the encryption key, so that the result is aligned to the size of the vector. The coding matrix is then generated from the vectors V′_i returned by all key servers (see Section 3.2). The client can then encode the data chunks into coded chunks using the coding matrix and send the coded chunks to the cloud servers for storage.
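A single key server's computation might look as follows. This is a hedged sketch: a SHA-256 counter-mode keystream stands in for AES-256, and the constant string c is taken to be zero-filled (an assumption; the scheme only requires c to be a fixed value of the vector's size).

```python
import hashlib

def derive_vector(secret: bytes, vector: bytes) -> bytes:
    # V'_i = E(Hash(Secret_i XOR V_i), c): mix the server secret into the
    # received vector, hash the result to get a key, then use that key to
    # encrypt a constant string c of the vector's size.
    mixed = bytes(s ^ v for s, v in zip(secret, vector))
    key = hashlib.sha256(mixed).digest()
    c = b"\x00" * len(vector)  # constant-value string, same size as V_i
    stream = bytearray()
    counter = 0
    while len(stream) < len(c):
        # SHA-256 counter-mode keystream as a stand-in for AES-256.
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(c, stream))

v_prime = derive_vector(b"S" * 16, b"V" * 16)
assert v_prime == derive_vector(b"S" * 16, b"V" * 16)  # deterministic per (secret, vector)
```

Determinism per (secret, vector) is essential: identical chunks must yield identical V′_i at every key server, or deduplication breaks.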
  • Invertibility of generated matrices: We need to ensure that the generated Cauchy matrix is invertible, which is a necessary condition for the RS code to decode the coded chunks back to the original data chunks. We ensure invertibility by checking the determinant of the matrix: if the determinant is non-zero, the matrix is invertible. If the current matrix is not invertible, we perform an elementary transformation and then add 1 to the elements in the pivot (diagonal) positions that are 0 to make the matrix invertible. Algorithm 1 details the steps for bringing the matrix to full rank. Note that only the determinants of square matrices can be calculated, which means that this generation scheme can only generate k × k square matrices. This enables sharing of the matrix to achieve decentralized server-aided encryption, yet it cannot generate the additional n − k parity chunks needed for fault tolerance. We discuss how to optimize it and generate n parity chunks in Section 3.4.
    Algorithm 1 Handle Full Rank
    Input: original matrix M
    Output: full-rank matrix M′
     1: procedure HandleFullRank(M)
     2:     rank, flag ← CheckFullRank(M)
     3:     while flag == false do
     4:         for i ← 0 to len(M) − 2 do
     5:             M[(rank + i) mod len(M)][(rank + i) mod len(M)] ← M[(rank + i) mod len(M)][(rank + i) mod len(M)] + i + 1
     6:         end for
     7:         rank, flag ← CheckFullRank(M)
     8:     end while
     9:     return M′
    10: end procedure
    11: function CheckFullRank(M)
    12:     rank ← the rank of M
    13:     if M is not full rank then
    14:         return rank, false
    15:     else
    16:         return −1, true
    17:     end if
    18: end function
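Algorithm 1 can be sketched in a few lines of Python. Note that the rank check here is over the integers via `numpy.linalg.matrix_rank`, whereas ECDedup operates in GF(2^w); this illustrates the control flow, not the field arithmetic.

```python
import numpy as np

def check_full_rank(M):
    # Returns (rank, False) when rank-deficient, (-1, True) otherwise,
    # mirroring CheckFullRank in Algorithm 1.
    rank = np.linalg.matrix_rank(M)
    if rank < len(M):
        return rank, False
    return -1, True

def handle_full_rank(M):
    # Algorithm 1: repeatedly bump the pivot (diagonal) entries, starting
    # at the first deficient position, until the matrix reaches full rank.
    M = np.array(M, dtype=np.int64)
    rank, flag = check_full_rank(M)
    while not flag:
        for i in range(len(M) - 1):          # i = 0 .. len(M) - 2
            p = (rank + i) % len(M)
            M[p, p] += i + 1
        rank, flag = check_full_rank(M)
    return M

M = handle_full_rank([[1, 2], [2, 4]])  # rank-1 (singular) input
assert np.linalg.matrix_rank(M) == 2
```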

3.4. Optimizing Matrix Generation

While ECDedup achieves data security and storage efficiency via server-aided encryption based on coding-based secret sharing (see Section 3.2) and the content-based coding matrix (see Section 3.3), two issues remain to be optimized: (i) decentralized server-aided matrix generation may make the generated coding matrix non-invertible; and (ii) the matrix generation scales poorly. To address these issues, we design the following two methods:
  • Method 1: Element-based matrix generation: As stated in Section 3.3, the elements x i and y j of the coding matrix can be generated from the content-based hash of the data chunk. However, the matrix after encryption cannot be guaranteed to be invertible, such that it requires additional calculations to ensure its invertibility (see Section 3.3). Thus, we propose an element-based matrix generation to couple the server-aided encryption and the generation of elements, and we further accelerate the matrix generation process via pipelining the generation of the elements.
Specifically, the element-based matrix generation has four steps: (Step 1) The client first generates the set of elements {y_i} based on the hash value of the data chunk (see Section 3.3). (Step 2) The client then shares {y_i} among the key servers, and each key server generates the elements y′_i based on the received element and the key server's secret. (Step 3) The client receives y′_i from all key servers and generates the set of elements {x_i} based on the hash value of the data chunk and {y′_i}. This makes {x_i} and {y′_i} automatically satisfy the definition of the Cauchy matrix, which ensures that the generated matrix is invertible. (Step 4) The client finally generates the coding matrix based on the elements {x_i} and {y′_i}.
Figure 4 shows the difference between the original matrix generation and the element-based matrix generation. In the original matrix generation, the client first generates the sets of elements {x_i} and {y_i} based on the hash value of the data chunk. The client then generates the original coding matrix using {x_i} and {y_i} and shares the original matrix among multiple key servers to generate the encrypted coding matrix. In contrast, in the element-based matrix generation, the client only needs to send {y_i} to the key servers, which reduces the communication overhead. Moreover, the matrix directly generated from {x_i} and {y_i} in the element-based matrix generation can benefit from the acceleration schemes hard-coded in SIMD computation libraries [21] for RS codes, while a matrix handled by Algorithm 1 cannot be accelerated during encoding or decoding.
We can further improve the efficiency of the matrix generation by pipelining the matrix generation of multiple data chunks. Figure 5 shows the pipeline of matrix generation and server-aided encryption, where C 1 , C 2 , , C n are the data chunks to be encoded. ECDedup divides the matrix generation process into two stages: (i) generate { y i } and (ii) generate { x i } . Since both of these stages and the server-aided encryption are performed chunk-by-chunk, they can also be pipelined in a chunk-by-chunk manner. For example, we first generate the elements { y i } of the chunk C 1 ; while { y i } of C 1 is sent to the key servers for encryption, the client can generate { y i } of C 2 in parallel.
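The chunk-by-chunk overlap can be expressed with Go channels. The stages below use stand-in arithmetic functions, not the real { y i }/{ x i } generation or the server round-trip; stage is a hypothetical helper name:

```go
package main

import "fmt"

// stage applies f to every chunk value from in and forwards the result
// on its own channel, so successive stages overlap across chunks, like
// the pipeline of {y_i} generation, server-aided encryption, and {x_i}
// generation in Figure 5.
func stage(f func(int) int, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

func main() {
	chunks := make(chan int)
	go func() {
		for c := 1; c <= 4; c++ { // chunk ids C_1..C_4
			chunks <- c
		}
		close(chunks)
	}()
	genY := stage(func(c int) int { return c * 10 }, chunks) // stand-in: generate {y_i}
	encY := stage(func(y int) int { return y + 1 }, genY)    // stand-in: server-aided encryption
	genX := stage(func(y int) int { return y * 2 }, encY)    // stand-in: generate {x_i}
	for v := range genX {
		fmt.Println(v)
	}
}
```

Because each stage runs in its own goroutine, C 2 's first stage can proceed while C 1 is in flight to the key servers, which is exactly the overlap the pipeline exploits.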
  • Method 2: Rolling-based matrix generation: To support large ( n , k ) values, which are often deployed in storage systems to save storage costs [22], we design the rolling-based matrix generation scheme. We leverage a fixed-size sliding window to generate the elements x i and y j based on the hash value of the data chunk. The sliding window moves 1 bit at a time, and we record the w bits within the window as an element (as in Section 3.3).
Figure 6 shows how ECDedup generates x i and y j via an 8-bit sliding window as follows: for each bit moved by the sliding window, ECDedup records the 8 bits in the window as a candidate element and checks whether it, together with the previously recorded elements, still satisfies the Cauchy matrix definition. If so, this element is saved as the corresponding x i or y j .
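A minimal sketch of the sliding-window extraction, assuming w = 8 and treating "satisfies the definition" as pairwise distinctness of the recorded elements (sufficient over GF(2^8) when the x and y pools are later kept disjoint); windowElems is our name, not ECDedup's:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// windowElems slides an 8-bit window over the hash bit string one bit
// at a time and keeps each value that is distinct from all elements
// recorded so far, i.e., the distinctness the Cauchy definition needs.
func windowElems(h []byte, count int) []uint8 {
	bits := len(h) * 8
	used := make(map[uint8]bool)
	out := make([]uint8, 0, count)
	for pos := 0; pos+8 <= bits && len(out) < count; pos++ {
		var v uint8
		for b := 0; b < 8; b++ {
			idx := pos + b
			v = v<<1 | (h[idx/8]>>(7-idx%8))&1 // read bit idx, MSB first
		}
		if !used[v] {
			used[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	h := sha256.Sum256([]byte("example chunk"))
	fmt.Println(windowElems(h[:], 20)) // enough elements for a large (n, k)
}
```

A 256-bit hash yields up to 249 window positions, so even large ( n , k ) settings can be served from a single hash.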

3.5. Discussion on Security

Unlike the state-of-the-art decentralized server-aided encryption schemes, ECDedup does not change the underlying encryption logic of the centralized server-aided encryption schemes. We only deploy a coding-based secret-sharing scheme atop the original encryption logic to share the coding matrix among multiple key servers and thereby enable decentralized server-aided encryption, while the coding matrix is generated based on the content-based hash of the data chunk and the secrets within the key servers (same as the state-of-the-art centralized server-aided encryption schemes [2,3,4,5]). Thus, the security of ECDedup is comparable to that of previous methods.
Specifically, we analyze the security of our ECDedup in comparison with that of the state-of-the-art encrypted deduplication schemes from the following two aspects:
  • Security of encryption key: Similar to the centralized encrypted deduplication schemes, ECDedup calculates the cryptographic hash of the chunk’s content and sends it to the key server to derive the encryption key. With the help of the key server, ECDedup can defend against offline brute-force attacks. The difference between ECDedup and the centralized schemes lies on the client side, where ECDedup generates the corresponding coding matrix based on the cryptographic hash and shares the coding matrix among multiple key servers. In this way, only part of the coding information (i.e., vectors or elements) of the coding matrix is sent to each key server, such that a single key server cannot infer the complete encryption key. Meanwhile, the information sent to each key server is actually a short hash, which is likely to incur many hash collisions (i.e., multiple chunks are mapped to the same short hashes). Thus, the key server cannot readily guess a chunk from the short hashes [1,4].
In the centralized encrypted deduplication schemes, the security is reduced to that of the original MLE when the centralized key server is compromised. Thus, compared with the centralized schemes, our ECDedup provides a stronger security guarantee, since the compromise of a single key server, or even several key servers, does not reduce the security, as analyzed above.
  • Data confidentiality: Li [23] analyzed the confidentiality of erasure-coding-based (e.g., RS codes) secret-sharing algorithms, which holds under the following condition: if every sub-square matrix of the coding matrix is nonsingular, then the scheme provides confidentiality. For example, if an ( n , k ) erasure-coding-based secret-sharing algorithm satisfies this condition, an adversary cannot reconstruct any part of the plaintext data (which is divided into k chunks) with fewer than k ciphertext chunks.
ECDedup adopts the Cauchy matrix as the coding matrix, and any sub-square matrix of a Cauchy matrix remains a Cauchy matrix [24,25] (i.e., nonsingular). According to the confidentiality guarantee provided by Li [23], an adversary cannot reconstruct any part of the original data without obtaining k ciphertext chunks. Moreover, the coding matrix of ECDedup is generated via multiple key servers using an additional secret-sharing scheme. Taking the ( n , k ) = ( 12 , 8 ) Cauchy matrix as an example (in Galois Field G F ( 2 16 ) ), the smallest sub-square matrix space can be up to S = 2 2 × k × w = 2 256 , which is at the same level as that of the state-of-the-art encrypted deduplication storage systems [5,14]. Note that even for a compromised key server, the adversary only obtains part of the coding information of the coding matrix, and thus also needs the coding information held by the other key servers to infer the original data of the target ciphertext chunk.
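The nonsingularity argument can be made explicit with the classical Cauchy determinant. Over G F ( 2 w ), subtraction coincides with XOR addition, so for a k × k Cauchy matrix C with entries C i j = 1 / ( x i ⊕ y j ):

$$ \det C \;=\; \frac{\prod_{1 \le i < j \le k} (x_j \oplus x_i)\,(y_j \oplus y_i)}{\prod_{i=1}^{k} \prod_{j=1}^{k} (x_i \oplus y_j)} $$

Every factor is nonzero exactly when the x i are pairwise distinct, the y j are pairwise distinct, and x i ≠ y j for all i , j; since any sub-square matrix is again a Cauchy matrix over subsets of the same elements, every sub-determinant is nonzero as well.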

4. Implementation

We implement ECDedup (Section 3) in Golang on Ubuntu-22.04 with about 3.9K SLoC. We use OpenSSL (version 1.0) [26] to implement cryptographic operations. Specifically, we use SHA-256 and AES-256 for the hashing and encryption operations in matrix generation (Section 3.3).
  • Chunking and fingerprinting: We first implement content-defined chunking based on Rabin fingerprinting [12] on the client, which takes the minimum, average, and maximum chunk sizes as input (following [2,3,4,5,6,14], we set them to 4 KB, 8 KB, and 16 KB by default, respectively) and computes rolling hashes over the data via a fixed-size sliding window to identify the boundaries of each chunk. To implement coding-based encryption, the client further generates a content-based key for each data chunk via SHA-256.
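For intuition, the boundary detection can be sketched as follows. This uses a toy multiplicative rolling hash as a stand-in for Rabin fingerprinting, and the boundary test and the name chunkBoundaries are ours; avgSize is assumed to be a power of two:

```go
package main

import "fmt"

// chunkBoundaries returns cut points over data: a boundary is declared
// when the low bits of the rolling hash match a fixed pattern, subject
// to the minimum and maximum chunk sizes, yielding an average chunk
// size of roughly avgSize.
func chunkBoundaries(data []byte, minSize, avgSize, maxSize int) []int {
	mask := uint64(avgSize - 1) // avgSize must be a power of two
	var cuts []int
	var h uint64
	start := 0
	for i, b := range data {
		h = h*31 + uint64(b) // toy rolling hash, not real Rabin
		size := i - start + 1
		if size >= minSize && (h&mask == mask || size >= maxSize) {
			cuts = append(cuts, i+1)
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // final, possibly short, chunk
	}
	return cuts
}

func main() {
	data := make([]byte, 64*1024)
	for i := range data {
		data[i] = byte(i * 131 % 251)
	}
	fmt.Println(len(chunkBoundaries(data, 4096, 8192, 16384)), "chunks")
}
```

Because boundaries depend only on content, an insertion in the middle of a file shifts at most a few chunk boundaries, which is what makes content-defined chunking deduplication-friendly.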
  • Encoding and decoding: We implement the encoding and decoding operations based on Klauspost Reed–Solomon [27], an open-source Reed–Solomon library for Golang which supports SIMD instruction sets [21] (including AVX2, SSE2, and NEON) that accelerate encoding and decoding. We use the Cauchy matrix as the coding matrix, such that the matrix can be generated based on the chunk’s content.
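The core operation such libraries accelerate is a GF(2^8) matrix-vector product over shard bytes. The scalar sketch below is independent of Klauspost Reed–Solomon and only mirrors its semantics; real encoders replace gfMul with SIMD table lookups:

```go
package main

import "fmt"

// gfMul multiplies in GF(2^8) with polynomial 0x11d, the field used by
// most RS implementations.
func gfMul(a, b uint8) uint8 {
	var p uint8
	for b > 0 {
		if b&1 == 1 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d
		}
		b >>= 1
	}
	return p
}

// encode multiplies an n-by-k coding matrix by k equal-length data
// shards, byte by byte, producing n coded shards.
func encode(matrix [][]uint8, data [][]uint8) [][]uint8 {
	n, k, l := len(matrix), len(data), len(data[0])
	out := make([][]uint8, n)
	for i := 0; i < n; i++ {
		out[i] = make([]uint8, l)
		for j := 0; j < k; j++ {
			c := matrix[i][j]
			for b := 0; b < l; b++ {
				out[i][b] ^= gfMul(c, data[j][b])
			}
		}
	}
	return out
}

func main() {
	// A 3-by-2 matrix whose top two rows are the identity: the first
	// two coded shards equal the data, the third is a parity mix.
	m := [][]uint8{{1, 0}, {0, 1}, {3, 7}}
	data := [][]uint8{[]byte("ab"), []byte("cd")}
	fmt.Println(encode(m, data))
}
```

In ECDedup the matrix itself is the (shared) secret, so the same multiply routine serves both as erasure coding and as the encryption step.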
  • Server-aided matrix generation: Recall that a client generates a content-based coding matrix for an original data chunk by generating the elements of the Cauchy matrix with the chunk’s content-based key, as in Section 3.3. For each key server, we implement a matrix generation module, which receives the vectors from clients and generates the server-aided encrypted vectors that form the content-based coding matrix.
  • Deduplication: Each server implements a fingerprint index that maps the fingerprint of each ciphertext chunk to the physical address where the ciphertext chunk is stored, using a key-value store based on LevelDB [28]. We also pack the unique ciphertext chunks (averaging a few KB each) into fixed-size containers (e.g., 4 MB [9]) based on the locality among those chunks, such that chunks with locality can be prefetched together to mitigate the disk I/O overhead.
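The index lookup can be sketched with an in-memory map standing in for LevelDB; location and putIfAbsent are illustrative names, not the system's actual schema:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// location records where a ciphertext chunk lives in the container
// store; the field layout here is illustrative only.
type location struct {
	container int
	offset    int
}

// fingerprintIndex maps a chunk's SHA-256 fingerprint to its location.
// A real deployment backs this map with LevelDB, as described above.
type fingerprintIndex map[[32]byte]location

// putIfAbsent stores the chunk's location unless the fingerprint is
// already indexed; a false return means the chunk is a duplicate and
// only a reference needs to be recorded.
func (idx fingerprintIndex) putIfAbsent(chunk []byte, loc location) bool {
	fp := sha256.Sum256(chunk)
	if _, ok := idx[fp]; ok {
		return false
	}
	idx[fp] = loc
	return true
}

func main() {
	idx := fingerprintIndex{}
	fmt.Println(idx.putIfAbsent([]byte("ciphertext chunk"), location{0, 0}))
	fmt.Println(idx.putIfAbsent([]byte("ciphertext chunk"), location{1, 64}))
}
```

Because ECDedup's key servers produce the same ciphertext for the same plaintext chunk, this direct fingerprint lookup suffices, which is exactly what Experiment #5 contrasts with GDH-Dedup's costlier indexing.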
  • Blockchain integration: We integrate blockchain into ECDedup to improve the security of the system. We manage the decentralized key servers using a private blockchain and employ zero-knowledge proofs [29] to ensure that the secret key in the key servers cannot be tampered with. We also implement hierarchical classification access control via the smart contracts of blockchain to manage the access control of the data chunks.

5. Evaluation

We conduct testbed experiments to study the performance of our ECDedup. The main results of our evaluation are summarized as follows:
  • ECDedup achieves the decentralized server-aided encryption with a better encryption (including the key/matrix generation and encoding) performance than the traditional decentralized server-aided encryption schemes.
  • ECDedup can directly index the ciphertext chunk during deduplication on the cloud server, which the traditional decentralized server-aided encryption schemes cannot achieve, thereby reducing the indexing time.
  • ECDedup improves the upload and download performance and can speed up the whole encrypted deduplication storage system.

5.1. Setup

  • Testbed: We conduct all experiments on Alibaba Cloud with an ecs.g7.xlarge instance as the client and four ecs.g7.xlarge instances as the servers. Each instance is equipped with four vCPUs and 16 GB RAM. All instances are connected via a 10 Gbps network.
  • Datasets: We use four well-known deduplication datasets for evaluation, as shown in Table 1. These datasets represent various typical workloads, including website snapshots, tarred source-code files, and database snapshots.
  • Configurations: In our evaluation, we compare four encryption schemes: (i) centralized server-aided encryption based on MetaDedup [3], which has only a centralized key server to handle the key generation requests from clients; (ii) traditional decentralized server-aided encryption based on GDH-Dedup [8], which has multiple key servers, each handling a portion of the key generation requests from clients; and (iii) our ECDedup with two different matrix generation variants from Section 3 (denoted vanilla and pipeline). Since ECDedup only changes the encryption scheme on the client side, we use the same 4 servers and 4 key servers for all schemes in the experiments. For the erasure coding parameters, we set ( n , k ) = ( 4 , 3 ) , as in the state-of-the-art centralized server-aided encryption schemes [2,3,4,14].

5.2. Experiments

  • Experiment #1 (MLE key/matrix generation time): We first evaluate the key/matrix generation time of different encryption schemes on five fixed-size (1 GB, 2 GB, 4 GB, 8 GB, and 16 GB) datasets sampled from the GCC dataset in Table 1. Figure 7a compares the key/matrix generation time for ECDedup, centralized server-aided encryption, and GDH-Dedup. ECDedup (pipeline) demonstrates the best key/matrix generation performance. Compared to centralized server-aided encryption, ECDedup (pipeline) reduces the key/matrix generation time by 14.4%, 26.9%, 33.7%, 32.1% and 32.8% for the 1 GB, 2 GB, 4 GB, 8 GB and 16 GB datasets, respectively, with a maximum reduction of 33.7%.
From Figure 7a, we can also see that our pipeline scheme effectively accelerates the matrix generation: ECDedup (pipeline) is almost twice as fast as ECDedup (vanilla). Compared to ECDedup (vanilla), ECDedup (pipeline) reduces the key/matrix generation time by up to 53.9% on the 16 GB dataset, where ( n , k ) is set to ( 4 , 3 ) by default [14].
  • Experiment #2 (Encryption and encoding time): We then evaluate the encryption and encoding time of different encryption schemes under the same setup as in Experiment #1. Figure 7b shows the encryption and encoding time of ECDedup (vanilla) and ECDedup (pipeline). We can see that the encryption and encoding of ECDedup (pipeline) is faster than that of all other schemes. Compared to centralized server-aided encryption, ECDedup (pipeline) reduces the encryption and encoding time by up to 69.27% for the 2 GB dataset. ECDedup (pipeline) also reduces the encoding time by up to 51.19% compared with ECDedup (vanilla). This is because the pipeline scheme can leverage the built-in acceleration strategy of the encoding library by reusing the element sequences x i and y i during encoding.
  • Experiment #3 (Impact of ( n , k ) for ECDedup): We evaluate the impact of ( n , k ) on both the matrix generation time and encoding time of ECDedup (vanilla) and ECDedup (pipeline). We set ( n , k ) to commonly used parameters of RS codes (e.g., ( 6 , 4 ) , ( 9 , 6 ) , ( 12 , 8 ) , and ( 14 , 10 ) ). Figure 8a shows the matrix generation time of ECDedup (vanilla) and ECDedup (pipeline) with different ( n , k ) settings. We can see that ECDedup (pipeline) is always faster than ECDedup (vanilla). Both the matrix generation time and the encoding time of the two variants increase with k, and the time of ECDedup (vanilla) grows much faster than that of ECDedup (pipeline). For example, when ( n , k ) increases from ( 6 , 4 ) to ( 14 , 10 ) , the matrix generation time of ECDedup (pipeline) increases from 8.06 s to 14.45 s, while that of ECDedup (vanilla) increases from 17.96 s to 205.42 s; that is, ECDedup (pipeline) is about 14 times faster than ECDedup (vanilla) when ( n , k ) = ( 14 , 10 ) .
  • Experiment #4 (Client throughput): We also evaluate the client throughput of different encryption schemes under the same setup as in Experiment #1. Figure 9 shows the client throughput of different encryption schemes. We can see that ECDedup (pipeline) achieves the highest client throughput among all encryption schemes. The throughput of all encryption schemes increases with the size of the dataset and eventually stabilizes. This is because the system resources can be fully utilized by the pipeline of ECDedup (pipeline) to continuously process data when processing large amounts of data (e.g., 16 GB). Thus, ECDedup (pipeline) can effectively improve the client throughput by 43.3% to 51.9% compared to the centralized scheme.
  • Experiment #5 (Average indexing time per chunk): We evaluate the average indexing time per chunk (the average chunk size is set to 8 KB, as in Section 2.1) of GDH-Dedup and our ECDedup. Figure 10 shows the average indexing time per chunk of GDH-Dedup and ECDedup. We can see that the indexing time of ECDedup is much lower than that of GDH-Dedup. This is because different key servers in GDH-Dedup encrypt the same plaintext chunk into different ciphertext chunks, so complex cryptographic computation is required to index a ciphertext chunk.
Table 2 shows the average indexing time per chunk of GDH-Dedup for varying average chunk sizes. We can see that the indexing time of GDH-Dedup decreases sharply as the average chunk size increases, since GDH-Dedup can only be applied to file-level deduplication [8].
  • Experiment #6 (Uploads and downloads): We evaluate the upload and download speed of the centralized scheme and our ECDedup on the four real-world datasets in Table 1. Figure 11 shows the upload and download performance of different encryption schemes. We can see that ECDedup (pipeline) outperforms the centralized scheme in all cases. Compared to the centralized scheme, ECDedup (pipeline) can improve the upload and download speed by up to 15.2% and 34.6%, respectively.
  • Experiment #7 (Blockchain integration): We evaluate the integration of blockchain into ECDedup. Recall that we only use the blockchain to verify the secret key possessed by the key servers (see Section 4). In this experiment, we only focus on the performance overhead incurred from the blockchain integration. Table 3 shows the upload and download performance of ECDedup with and without blockchain. We can see that the blockchain integration only introduces a small overhead to the system, which is less than 0.1% of the total system overhead.

6. Related Work

  • Decentralized server-aided encryption in encrypted deduplication: MLE-based encrypted deduplication is widely deployed in outsourced storage. Traditional encrypted deduplication storage systems realize the MLE construction via convergent encryption (CE), which is vulnerable to offline brute-force attacks. DupLESS proposes server-aided MLE, which performs MLE key generation in a dedicated key server to defend against offline brute-force attacks. Follow-up studies enhance server-aided MLE with decentralized key generation. For example, Duan [33] leverages RSA-based threshold signatures to generate convergent keys among multiple users, eliminating the centralized key server to address single-point-of-failure and key leakage issues. Miao et al. [7] allow each key server to independently generate keys and share the calculated keys with all other key servers via blind signatures. GDH-Dedup is based on the Gap Diffie–Hellman (GDH) group; it reduces the computational overhead by relying on groups where the decisional Diffie–Hellman problem is easy to solve while the computational Diffie–Hellman problem remains hard.
However, all the above implementations of decentralized server-aided encryption build on expensive cryptographic primitives and incur high computational overhead or communication overhead. In this paper, we are the first to apply secret sharing to share the keys among decentralized key servers and achieve efficient decentralized server-aided encryption.
  • Secret sharing: Secret sharing was first formalized by Shamir [34] to provide security and reliability for shared data. Shamir’s secret sharing scheme (SSSS) can be viewed as Reed–Solomon (RS) codes, which encode k shares (including one data share and k − 1 random shares) into n shares to ensure the confidentiality of the shared data. The ramp secret-sharing scheme (RSSS) [35] improves the storage efficiency atop SSSS by dividing a secret into k − r pieces and generating r additional random pieces for fault tolerance and security. Dekey [36] uses RSSS to disperse and store the convergent keys of data blocks across multiple key storage servers, which ensures the reliability of the keys while reducing their storage overhead. AONT-RS [15] combines Rivest’s all-or-nothing transform (AONT) [16] for confidentiality with RS codes for fault tolerance to further improve the storage efficiency while maintaining the same confidentiality as SSSS. CAONT-RS [14] replaces the random key used in AONT-RS with a deterministic (content-derived) key to enable deduplication on shared data.
However, existing secret-sharing schemes only consider sharing data or keys among distributed servers. In this paper, we are the first to leverage secret sharing to realize decentralized server-aided encryption, aiming to achieve better performance and ensure data confidentiality.

7. Conclusions

In this paper, we present ECDedup, a new decentralized server-aided encrypted deduplication approach for outsourced storage, which performs deduplication to eliminate the duplicate data within encrypted data for both storage efficiency and data confidentiality. ECDedup leverages secret sharing to achieve secure and efficient key management. We are the first to use the coding matrix as the MLE encryption key to couple the encryption and encoding processes in encrypted deduplication. We further improve the encoding performance of ECDedup by pipelining the matrix generation and the server-aided encryption. We implement ECDedup, and our experimental results on real-world backup datasets demonstrate its effectiveness.

Author Contributions

Conceptualization, C.G. and Y.H.; Formal analysis, C.G.; Funding acquisition, W.W. (Weichun Wang), Y.H., W.W. (Wei Wang) and H.H.; Investigation, Q.X.; Methodology, C.G.; Project administration, Y.H.; Software, X.Z. and S.D.; Supervision, Y.H.; Validation, X.Z. and S.D.; Writing—original draft, C.G. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program of Hubei Province (No. 2021BAA189).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

Authors Weichun Wang, Wei Wang, and Huadong Huang were employed by the company Hikvision Digital Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Bellare, M.; Keelveedhi, S.; Ristenpart, T. DupLESS: Server-Aided Encryption for Deduplicated Storage. In Proceedings of the USENIX Security, Washington, DC, USA, 14–16 August 2013. [Google Scholar]
  2. Qin, C.; Li, J.; Lee, P.P. The design and implementation of a rekeying-aware encrypted deduplication storage system. ACM Trans. Storage 2017, 13, 1–30. [Google Scholar] [CrossRef]
  3. Li, J.; Lee, P.P.; Ren, Y.; Zhang, X. Metadedup: Deduplicating metadata in encrypted deduplication via indirection. In Proceedings of the IEEE MSST, Santa Clara, CA, USA, 20–24 May 2019. [Google Scholar]
  4. Li, J.; Yang, Z.; Ren, Y.; Lee, P.P.; Zhang, X. Balancing storage efficiency and data confidentiality with tunable encrypted deduplication. In Proceedings of the EuroSys, Heraklion Greece, 27–30 April 2020. [Google Scholar]
  5. Ren, Y.; Li, J.; Yang, Z.; Lee, P.P.; Zhang, X. Accelerating Encrypted Deduplication via SGX. In Proceedings of the USENIX ATC, Online, 14–16 July 2021. [Google Scholar]
  6. Yang, Z.; Li, J.; Lee, P.P. Secure and lightweight deduplicated storage via shielded deduplication-before-encryption. In Proceedings of the USENIX ATC, Carlsbad, CA, USA, 11–13 July 2022. [Google Scholar]
  7. Miao, M.; Wang, J.; Li, H.; Chen, X. Secure multi-server-aided data deduplication in cloud computing. Pervasive Mob. Comput. 2015, 24, 129–137. [Google Scholar] [CrossRef]
  8. Shin, Y.; Koo, D.; Yun, J.; Hur, J. Decentralized server-aided encryption for secure deduplication in cloud storage. IEEE Trans. Serv. Comput. 2020, 13, 1021–1033. [Google Scholar] [CrossRef]
  9. Xia, W.; Jiang, H.; Feng, D.; Douglis, F.; Shilane, P.; Hua, Y.; Fu, M.; Zhang, Y.; Zhou, Y. A comprehensive study of the past, present, and future of data deduplication. Proc. IEEE 2016, 104, 1681–1710. [Google Scholar] [CrossRef]
  10. Zhu, B.; Li, K.; Patterson, R.H. Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the USENIX FAST, San Jose, CA, USA, 26–29 February 2008. [Google Scholar]
  11. Wallace, G.; Douglis, F.; Qian, H.; Shilane, P.; Smaldone, S.; Chamness, M.; Hsu, W. Characteristics of backup workloads in production systems. In Proceedings of the USENIX FAST, San Jose, CA, USA, 14–17 February 2012. [Google Scholar]
  12. Rabin, M.O. Fingerprinting by Random Polynomials; Technical Report; Scientific Research Publishing: Wuhan, China, 1981. [Google Scholar]
  13. Douceur, J.R.; Adya, A.; Bolosky, W.J.; Simon, P.; Theimer, M. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the IEEE ICDCS, Vienna, Austria, 2–5 July 2002. [Google Scholar]
  14. Li, M.; Qin, C.; Lee, P.P. CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal. In Proceedings of the USENIX ATC, San Jose, CA, USA, 8–10 July 2015. [Google Scholar]
  15. Resch, J.K.; Plank, J.S. AONT-RS: Blending Security and Performance in Dispersed Storage Systems. In Proceedings of the USENIX FAST, San Jose, CA, USA, 15–17 February 2011. [Google Scholar]
  16. Rivest, R.L. All-or-nothing encryption and the package transform. In Proceedings of the Springer FSE, Haifa, Israel, 20–22 January 1997. [Google Scholar]
  17. Reed, I.; Solomon, G. Polynomial Codes over Certain Finite Fields. J. Soc. Ind. Appl. Math. 1960, 8, 300–304. [Google Scholar] [CrossRef]
  18. Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C. A Survey on Network Codes for Distributed Storage. Proc. IEEE 2011, 99, 476–489. [Google Scholar] [CrossRef]
  19. Ateniese, G.; Burns, R.; Curtmola, R.; Herring, J.; Kissner, L.; Peterson, Z.; Song, D. Provable data possession at untrusted stores. In Proceedings of the ACM CCS, Alexandria, VA, USA, 28–31 October 2007. [Google Scholar]
  20. Plank, J.S.; Luo, J.; Schuman, C.D.; Xu, L.; OHearn, Z. A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage. In Proceedings of the USENIX FAST, San Jose, CA, USA, 24–27 February 2009. [Google Scholar]
  21. Plank, J.S.; Greenan, K.M.; Miller, E.L. Screaming fast Galois field arithmetic using intel SIMD instructions. In Proceedings of the USENIX FAST, San Jose, CA, USA, 12–15 February 2013. [Google Scholar]
  22. Hu, Y.; Cheng, L.; Yao, Q.; Lee, P.P.C.; Wang, W.; Chen, W. Exploiting combined locality for Wide-Stripe erasure coding in distributed storage. In Proceedings of the USENIX FAST, San Jose, CA, USA, 23–25 February 2021. [Google Scholar]
  23. Li, M. On the confidentiality of information dispersal algorithms and their erasure codes. arXiv 2012, arXiv:1206.4123. [Google Scholar]
  24. Roth, R.M.; Lempel, A. On MDS codes via Cauchy matrices. IEEE Trans. Inf. Theory 1989, 35, 1314–1319. [Google Scholar] [CrossRef]
  25. Blomer, J. An XOR-Based Erasure-Resilient Coding Scheme; Technical Report at ICSI; International Computer Science Institute (ICSI): Berkeley, CA, USA, 1995. [Google Scholar]
  26. OpenSSL. Cryptography and SSL/TLS Toolkit. Available online: https://www.openssl.org/ (accessed on 1 November 2024).
  27. Reed-Solomon Erasure Coding in Go. Available online: https://github.com/klauspost/reedsolomon (accessed on 1 November 2024).
  28. LevelDB. Available online: https://github.com/google/leveldb (accessed on 1 November 2024).
  29. Fiege, U.; Fiat, A.; Shamir, A. Zero knowledge proofs of identity. In Proceedings of the STOC, New York, NY, USA, 25–27 May 1987. [Google Scholar]
  30. Zhang, Y.; Xia, W.; Feng, D.; Jiang, H.; Hua, Y.; Wang, Q. Finesse:Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression. In Proceedings of the USENIX FAST, Boston, MA, USA, 25–28 February 2019. [Google Scholar]
  31. Linux Archives. Available online: https://www.kernel.org (accessed on 1 November 2024).
  32. Redis. Available online: http://redis.io/ (accessed on 1 November 2024).
  33. Duan, Y. Distributed key generation for encrypted deduplication: Achieving the strongest privacy. In Proceedings of the ACM CCSW, Scottsdale, AZ, USA, 3–7 November 2014; pp. 57–68. [Google Scholar]
  34. Shamir, A. How to share a secret. Commun. ACM 1979, 22, 612–613. [Google Scholar] [CrossRef]
  35. Blakley, G.R.; Meadows, C. Security of ramp schemes. In Proceedings of the CRYPTO, Santa Barbara, CA, USA, 18–22 August 1985. [Google Scholar]
  36. Li, J.; Chen, X.; Li, M.; Li, J.; Lee, P.P.; Lou, W. Secure deduplication with efficient and reliable convergent key management. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 1615–1625. [Google Scholar] [CrossRef]
Figure 1. System architecture of ECDedup.
Figure 2. Sharing the coding matrix among the key servers.
Figure 3. An example of the fixed-size sampling.
Figure 4. Two ways of generating matrix in ECDedup.
Figure 5. Pipelining matrix generation and server-aided encryption.
Figure 6. Sliding-window-based matrix generation.
Figure 7. Comparison of different encryption schemes.
Figure 8. Exp 3: Impact of ( n , k ) for ECDedup.
Figure 9. Exp 4: Comparison of client throughput for different encryption schemes.
Figure 10. Exp 5: Average indexing time per chunk.
Figure 11. Exp 6: Comparison of upload and download performance of different encryption schemes.
Table 1. The workload characteristics of the tested datasets.
Name | Size | Workload Description
Webs | 347 GB | 129 days’ snapshots of the website news.sina.com.cn [30]
Linux | 109 GB | 253 versions of Linux kernel source code [31]; each version is packaged as a tar file.
RDB | 11 GB | 10 backups of the Redis key-value store database [32].
GCC | 28 GB | 89 backups of the GNU compiler collection source code files [32].
Table 2. Average indexing time of GDH-Dedup.
Average chunk size | 8 KB | 16 KB | 32 KB | 64 KB
Indexing time (s) | 1726.10 | 813.92 | 312.96 | 137.15
Table 3. Exp 7: Time breakdown of ECDedup on RDB dataset.
Scheme | Upload Time (s) | Download Time (s) | Verification Time (ms)
ECDedup | 79.05 | 67.86 | 9