1. Introduction
Image data transmission has the dual requirements of compression and encryption, like any other type of data. Compression reduces the data size by exploiting the redundancies (such as spatial and psycho-visual redundancies) present in an image, whereas encryption makes an image unintelligible by adding randomness to it. The two processes are therefore related but work in opposite directions, and the order in which they are coupled results in a tradeoff between compression and security efficiencies. The conventional order is to perform compression prior to encryption, as in compression-then-encryption (CtE) methods, because encrypting first destroys the image correlation that compression relies on. In this regard, traditional number theory- and chaos theory-based encryption algorithms have been shown to be secure for the protection of multimedia content [1,2]. The CtE methods perform pixel scrambling or stream encryption and are mainly applicable to the encryption of raw images. However, they are not adequate for encrypting compressed images while preserving the compression savings and the image format and providing the necessary level of security. For example, encrypting a JPEG image can disturb the JPEG format identifiers, which may lead to issues such as format incompatibility and an increase in file size: any change to the JPEG markers may render them uninterpretable, and re-encoding the ciphertext as a JPEG image increases the image size. Image format compliance is necessary for cloud-based photo storage services (CPSS), social networking services (SNS), reversible data-hiding applications, and image processing in the encryption domain (such as image retrieval and privacy-preserving machine learning (PPML)).
Another way to couple compression and encryption is to perform encryption within compression, as in encryption-in-compression (EiC) methods. However, these methods have limitations that depend on where the encryption is placed in the compression pipeline. For example, encryption can be achieved by using multiple new orthogonal transforms during the transformation stage, as proposed in [3,4]. These schemes deliver a better tradeoff between compression and encryption. However, the security strength of [3] is limited, as only the transformation is modified, and the lack of a diffusion property makes it vulnerable to differential attacks [5]. On the other hand, the scheme proposed in [4] has better coding efficiency than [3]; however, its block-level processing limits its decorrelation ability and makes it vulnerable to statistical attacks [5]. An alternative is to perform encryption in the quantization step, either by scrambling the quantized DC and AC coefficients [6] or by changing the magnitudes of the entries in the quantization table [7,8]. The main advantage of these methods is format compliance. The scheme proposed in [6] preserves almost the same compression savings; however, it is vulnerable to the non-zero-counting attack. The methods proposed in [7,8] are not secure enough, and their compression ratio also suffers. In [9], strong security and format compliance are achieved by encrypting only the DC coefficients and the first 14 AC coefficients in the zigzag scan; however, the compression savings are heavily compromised. Alternatively, encryption can be performed in the intermediate encoding step of JPEG compression. For example, Ref. [10] proposed scanning the DCT coefficients with different patterns instead of the standard zigzag scan. Such methods provide better security with good diffusion and confusion properties; however, their main limitations are a high computational cost and an increase in file size. To achieve a better compression–encryption tradeoff, Ref. [11] proposed encrypting selected coefficients specified by a range and shuffling the positions of the block identifiers; however, this leads to a format incompatibility issue. Finally, in the entropy encoding stage, one way to achieve encryption is to use multiple Huffman tables [12]. Because the encryption process leaves the probability distribution of the image unaltered, the compression savings are preserved. However, all the Huffman tables must be made available to the decoder when decoding the cipher image, which results in a format compliance problem. In addition, such schemes are vulnerable to known- and chosen-plaintext attacks [13]. An alternative method proposed in [8] encrypts the output bitstream of the Huffman encoder while keeping the Huffman codes unmodified. The method preserves both the file size and the image format; however, leaving the Huffman codes in plain form makes it vulnerable to the image contour reconstruction attack. In [14], the authors proposed mapping each Huffman codeword to another codeword with the same code length to carry out encryption. The method is compression friendly; however, adopting their mapping strategy leads to a format compliance problem. Refs. [15,16] proposed selective encryption of the DCT coefficients; however, these methods require prior knowledge of the important coefficients. A hybrid compression–encryption scheme based on chaos theory is proposed in [17], but it does not consider the JPEG compression standard.
JPEG image encryption requires format compliance, reasonable security, and a small increase in file size. As discussed above, the CtE and EiC methods are unable to meet these requirements. An alternative approach is to perform encryption before compression, as in encryption-then-compression (EtC) methods. The main challenge of reversing the CtE order is preserving the compression savings, as the encryption process disturbs the image correlation [18]. However, the methods proposed in [19,20,21,22,23,24,25,26,27,28,29] have shown that encrypted images can be compressed with only a slight degradation in compression savings, or even with the same savings. These methods provide the necessary level of security, but they are not JPEG compatible. In recent years, a new class of image encryption algorithms has been proposed that hides only the perceptual details of an image while retaining the intrinsic properties necessary for compression. These methods belong to the EtC class, and in this paper, we refer to them as compressible perceptual encryption methods, or CPE for short. The encryption algorithm is block based and performs four steps: block permutation, block rotation, block inversion, and pixel-level negative–positive transformation. These schemes are robust against various types of attacks, including brute-force and ciphertext-only attacks. The encryption algorithm is computationally inexpensive and JPEG compatible, and it is thereby suitable for CPSS and SNS services, image retrieval systems, PPML applications, and medical image services.
Several studies have improved the encryption efficiency of the CPE schemes. For example, Ref. [30] proposed a CPE scheme with an additional color-channel shuffling step for improved encryption efficiency. However, the keyspace of this scheme is limited by the choice of block size: the smallest block size that can be used is 16 × 16, which avoids distortion in the recovered image when JPEG chroma subsampling is used. In [31], the authors proposed processing each color component independently to obtain a larger keyspace. However, these methods are only compatible with the JPEG lossless compression standard. To deal with these issues, Refs. [32,33] proposed representing the input image as a grayscale image by concatenating the color channels along the horizontal or vertical direction. Such a representation allows the use of a smaller block size of 8 × 8, thus improving the encryption efficiency. The methods proposed in [30,31,32,33] require a color input image for better encryption efficiency. In [34], the authors proposed sub-block processing for the efficient encryption of grayscale images. However, CPE schemes have a tradeoff between encryption and compression efficiency that is governed by the block size: for efficient encryption, a larger number of blocks (i.e., a smaller block size) is desirable to expand the keyspace [35].
In this paper, we present a comprehensive analysis of the JPEG-compatible CPE schemes in terms of their encryption and compression efficiencies. The existing surveys in the literature focus either on image encryption techniques applicable to raw image protection [1,2,36,37] or on nonstandard image compression formats [38,39]. To the best of our knowledge, Refs. [5,40,41] are the surveys most closely related to the current study that deal with JPEG-compatible perceptual encryption schemes. In [40,41], the authors studied CPE and noncompressible perceptual encryption methods mainly from a PPML application point of view. On the other hand, the authors in [5] focused on joint compression and encryption algorithms in general and covered only two CPE schemes. Unlike the existing surveys, the current survey makes the following main contributions: (1) This study evaluates compression performance under various conditions, such as input image representation, colorspace conversion, quantization table choice, and compression with and without chroma subsampling. (2) In the literature, the compression savings of the methods were analyzed subjectively using only peak signal-to-noise ratio (PSNR)-based rate–distortion (RD) curves. In contrast, the current study uses better image quality metrics, such as the multiscale structural similarity index measure (MS–SSIM), and objectively compares the RD curves using Bjøntegaard delta (BD) metrics. (3) In the literature, the security efficiency of the CPE schemes was analyzed only by showing robustness against a jigsaw puzzle solver (JPS) attack. In contrast, the current study compares the CPE methods using differential attack analysis, histogram variance analysis, entropy analysis, and correlation coefficient analysis, along with keyspace size analysis and robustness against the JPS attack.
The rest of the paper is organized as follows:
Section 2 presents the related work on CPE schemes along with their applications.
Section 3 provides preliminary details, including the JPEG image standard.
Section 4 gives an overview of the CPE methods.
In Section 5, several CPE schemes are implemented under different conditions and compared in terms of their compression and encryption efficiencies.
Section 6 discusses the advantages of the CPE schemes with respect to application requirements and gives future research directions.
Section 7 concludes the paper.
2. Related Work
Figure 1 shows a taxonomy of image encryption methods, which classifies them into full and partial encryption methods. Full encryption methods hide all the information in an image and comprise the traditional number theory- and chaos theory-based algorithms. Partial encryption methods hide only selected information in an image. For example, selective encryption algorithms protect only the region of interest in an image, whereas perceptual encryption algorithms hide only the human-perceivable and identifiable information. Perceptual encryption algorithms can be further classified into incompressible methods, which perform pixel-level scrambling, and compressible methods, which process image blocks. In Figure 1, from left to right, the computational complexity of the encryption algorithms decreases, and security is traded off to enable other multimedia applications, such as format-compliant storage and even processing in the encryption domain. The main focus of the present study is the perceptual encryption methods, specifically the block-based compressible methods.
In general, the encryption algorithm of a CPE scheme is block based and consists of four steps: block permutation, block rotation, block inversion, and negative–positive transformation. An optional color-channel shuffling step is used when the input is a color image. By input image representation, the existing CPE methods can be classified into Color CPE, Extended CPE, inter- and intra-block processing-based CPE (IIB–CPE), and pseudo-grayscale-based CPE (PGS–CPE) methods. In the Color CPE, Extended CPE, and IIB–CPE methods, an input color image is represented by its three color components, whereas in the PGS–CPE methods, the color components of an input color image are concatenated along the horizontal or vertical direction to form a pseudo-grayscale image. An alternative classification of CPE methods is based on their mode of processing. Methods that transform an entire block include the Color CPE, Extended CPE, and PGS–CPE methods, and methods that incorporate sub-block processing include the IIB–CPE methods. This classification is beneficial when the input is a grayscale image. The following subsections present the related work in each category along with its applications.
2.1. Color CPE Methods
Watanabe et al. proposed a Color CPE method that performs a color-channel shuffling step for better security; their method is compatible with the JPEG 2000 standard [42] and the Motion JPEG 2000 standard [43]. Kurihara et al. further extended the applications of this method to the JPEG standard [30], the Motion JPEG standard [44], the JPEG XR standard [45], and lossless image compression standards [46]. The Color CPE methods process the image blocks with the same key in each color channel. They use a block size of 16 × 16 in the encryption algorithm so that the JPEG chroma subsampling step can be exploited for better compression savings without any adverse effects. These methods preserve the JPEG file format and almost the same compression savings. However, using a common key to encrypt each channel leaves the color distribution unaltered, and the larger block size results in a smaller keyspace. These weaknesses make the Color CPE schemes vulnerable to the JPS attack [31].
2.2. Extended CPE Methods
To alter the color distribution more effectively than the Color CPE methods, Imaizumi et al. [31,47] proposed processing each color component individually in the permutation, rotation, inversion, and negative–positive transformation steps. This independent processing expands the keyspace and modifies the color distribution significantly. However, it also causes JPEG format compatibility issues. The main reason is that the JPEG standard requires colorspace conversion prior to compression, and the Extended CPE cipher images are not suited to this conversion function.
2.3. PGS–CPE Methods
To deal with the issue of the Extended CPE methods, Chuman et al. proposed in [33] performing the JPEG colorspace conversion prior to the encryption process. In addition, they proposed concatenating the color components along the horizontal or vertical direction to form a pseudo-grayscale image. This grayscale representation allows the smallest allowable block size to be used, because JPEG compresses a grayscale image in 8 × 8 blocks. The smaller block size results in a larger keyspace than in the Color CPE and Extended CPE schemes. However, the PGS–CPE method proposed in [33] is not suitable for the JPEG chroma subsampling function. To deal with this issue, Sirichotedumrong et al. proposed in [32,48] performing both the JPEG colorspace conversion and chroma subsampling prior to the encryption. The idea is to downsample the color components after the colorspace conversion and concatenate them with the luminance component. In addition, they proposed custom quantization tables in [48] that can be used in the JPEG standard for better compression performance.
2.4. IIB–CPE Methods
The Extended CPE and PGS–CPE methods have improved the security efficiency of the Color CPE methods, as the color distribution is scrambled significantly and the keyspace is expanded (especially in the PGS–CPE methods). However, these schemes require a color input image. The large number of blocks achieved by individual color component processing (Extended CPE methods) or by the pseudo-grayscale representation (PGS–CPE methods) is only possible when the input is a color image. This advantage diminishes when the input is a grayscale image with only one channel [49]. To overcome this limitation, Ahmad et al. proposed in [34,49,50] an inside-out transformation function that performs the rotation and inversion steps at the sub-block level. Compared with the CPE methods that transform an entire block, these methods have a larger keyspace for grayscale image processing. However, they are not suitable when the JPEG algorithm is implemented with the chroma subsampling function for color image compression.
Overall, the CPE schemes (block-based perceptual encryption methods) have a tradeoff between encryption and compression efficiency that arises from the choice of block size. Specifically, block sizes no smaller than 16 × 16 and 8 × 8 should be used to preserve the compression efficiency of the JPEG standard for color and grayscale images, respectively.
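The block-size tradeoff can be made concrete with a short Python sketch that estimates the keyspace of a Color CPE-style scheme as a function of block size. It assumes the counting commonly used for these schemes, where $n$ is the number of blocks: $n!$ block permutations, 8 rotation–inversion combinations and 2 negative–positive choices per block, and, for color input with a shared key, $3! = 6$ channel orders per block. The function name `log2_keyspace` is hypothetical, and the count ignores equivalent keys.

```python
import math


def log2_keyspace(height, width, block, channels=3, color_shuffle=True):
    """log2 of n! * 8**n * 2**n (* 6**n with color shuffling), n = number of blocks.

    Assumed counting: n! block permutations, 8 rotation/inversion combinations
    and 2 negative-positive choices per block, plus 3! = 6 channel orders per
    block when a color image is encrypted with a shared key.
    """
    n = (height // block) * (width // block)
    bits = math.lgamma(n + 1) / math.log(2)   # log2(n!) without huge integers
    bits += n * (3 + 1)                       # log2(8**n) + log2(2**n)
    if color_shuffle and channels == 3:
        bits += n * math.log2(6)              # log2(6**n)
    return bits


for bs in (16, 8):
    print(f"{bs} x {bs} blocks, 256 x 256 image: about 2^{log2_keyspace(256, 256, bs):.0f} keys")
```

Halving the block side quadruples $n$. Because $\log_2 n!$ grows faster than linearly in $n$, the keyspace grows much faster than the block count, which is why smaller blocks are attractive from a security standpoint.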
2.5. CPE Scheme Applications
The CPE schemes are suitable for privacy-preserving applications such as privacy-preserving photo sharing and storage services, privacy-preserving image retrieval systems, and PPML applications. They can also be used for reversible data-hiding applications.
Privacy-preserving photo sharing and storage applications: A privacy-preserving image trading system was proposed in [51] that uses the Color CPE algorithm of [30] for image copyright protection. In [52,53], the authors extended the applications of the Color CPE scheme in [30] to privacy-preserving photo sharing over SNS provided by third parties. The main challenge in such applications is the artifacts resulting from the recompression of images by the SNS providers. The authors in [53] determined some parameters that can be used to resist such manipulations. Similarly, photo-sharing schemes based on an extended algorithm of the Color CPE and on the PGS–CPE were proposed in [54,55] and [56], respectively. The main advantage of these schemes was the identification of images re-encrypted with different keys. In [34,50], the authors proposed privacy-preserving photo storage for medical image applications based on an IIB–CPE scheme.
Privacy-preserving image retrieval applications: The cipher images of the CPE schemes preserve the local image contents at the block level. As demonstrated in [57,58,59,60], this information can be exploited for image retrieval without revealing the visual information of the image. To achieve security, these works used a Color CPE scheme with the JPEG and JPEG–LS standards.
Privacy-preserving computation applications: In [61,62], the authors identified a novel property of the CPE schemes that allows machine learning algorithms, such as support vector machines (SVM), to be computed in the encryption domain. They showed that, under the different transformation functions of the CPE schemes, both the Euclidean distance and the inner product of two vectors are preserved. In their experiments, they used a Color CPE algorithm without the color shuffling step for face recognition on a grayscale image dataset. Their analysis showed that the CPE schemes have no effect on the performance of the SVM algorithm. In similar work [63], the authors used an Extended CPE method for face recognition on a color image dataset. Besides face recognition, CPE-based privacy-preserving image classification has been performed in [49,64,65,66]. Specifically, in [64], the authors implemented an isotropic network, such as a vision transformer, with the Color CPE scheme for natural image classification. In [65], the authors implemented four different extensions of IIB–CPE and analyzed their effect on the accuracy of a CNN model. The same authors implemented a CNN-based model with an IIB–CPE scheme for natural image classification in [49] and for COVID-19 diagnosis in chest X-ray images in [66].
Reversible data-hiding applications: In [67,68,69,70,71], the authors proposed reversible data-hiding schemes using CPE cipher images. The ability to recover the original image exactly is an essential requirement of any reversible data-hiding algorithm [69]. The lossless JPEG standard should therefore be used to meet this requirement. Although both the Color CPE and Extended CPE schemes are suitable for these applications, the data-hiding methods proposed in [67,68,69,70,71] are based on the Extended CPE methods to benefit from their larger keyspace for efficient encryption.
3. Preliminaries
3.1. Notation Convention
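Throughout this paper, scalars are denoted by italic letters ($a$), row vectors by boldface letters ($\mathbf{a}$), and matrices by capital boldface letters ($\mathbf{A}$), where $a_{ij}$ represents the entry of $\mathbf{A}$ at row $i$, column $j$. The transpose of a matrix/vector is denoted by $(\cdot)^{\top}$. Matrices are sometimes expressed in the compact form $\mathbf{A} = [\mathbf{a}_1; \mathbf{a}_2; \ldots; \mathbf{a}_m]$, where $\mathbf{a}_i$ is the $i$th row. Sets are denoted using script letters ($\mathcal{A}$).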
3.2. Image Block Partition
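To represent image partitioning conveniently, the numbers of rows and columns of an image can each be expressed as a product of two integers, $M = M_1 M_2$ rows and $N = N_1 N_2$ columns. The image can therefore be divided into $M_1 \times N_1$ blocks, each with $M_2 \times N_2$ pixels. The blocks of the image can be represented as $\mathbf{B}_{(i,j)}$ with $1 \le i \le M_1$ and $1 \le j \le N_1$, where the entry $(k, l)$ of $\mathbf{B}_{(i,j)}$ corresponds to the entry $((i-1)M_2 + k,\ (j-1)N_2 + l)$ of the original image. For the sub-block partitioning of a block $\mathbf{B}_{(i,j)}$, its numbers of rows and columns can be expressed in the same way as $M_2 = P_1 P_2$ and $N_2 = Q_1 Q_2$. Consequently, this block has $P_1 \times Q_1$ sub-blocks, each with $P_2 \times Q_2$ elements. These sub-blocks are denoted as $\mathbf{S}_{(u,v)}$, where the entry $(k, l)$ of $\mathbf{S}_{(u,v)}$ corresponds to the entry $((u-1)P_2 + k,\ (v-1)Q_2 + l)$ of the block.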
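The partition above can be sketched in Python with plain nested lists. This sketch makes several assumptions: indices are 0-based, so entry (k, l) of block (i, j) is image entry (i·bh + k, j·bw + l); the image size is an exact multiple of the block size; and the helper names `split_blocks` and `merge_blocks` are hypothetical. Applying the same function to a single block yields its sub-blocks.

```python
def split_blocks(img, bh, bw):
    """Split an H x W image (list of rows) into non-overlapping bh x bw blocks.

    Returns a dict mapping the 0-based block index (i, j) to the block. Entry
    (k, l) of block (i, j) is entry (i*bh + k, j*bw + l) of the image.
    """
    h, w = len(img), len(img[0])
    if h % bh or w % bw:
        raise ValueError("image size must be a multiple of the block size")
    return {(i, j): [row[j * bw:(j + 1) * bw] for row in img[i * bh:(i + 1) * bh]]
            for i in range(h // bh) for j in range(w // bw)}


def merge_blocks(blocks, nbh, nbw):
    """Inverse of split_blocks for a grid of nbh x nbw blocks."""
    bh, bw = len(blocks[(0, 0)]), len(blocks[(0, 0)][0])
    img = [[0] * (nbw * bw) for _ in range(nbh * bh)]
    for (i, j), blk in blocks.items():
        for k in range(bh):
            for l in range(bw):
                img[i * bh + k][j * bw + l] = blk[k][l]
    return img


img = [[r * 8 + c for c in range(8)] for r in range(4)]   # 4 x 8 toy image
blocks = split_blocks(img, 2, 4)                          # 2 x 2 grid of 2 x 4 blocks
print(blocks[(1, 1)])                                     # [[20, 21, 22, 23], [28, 29, 30, 31]]
sub_blocks = split_blocks(blocks[(1, 1)], 1, 2)           # sub-blocks of one block
```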
3.3. The JPEG Image Standard
The JPEG compression standard is one of the most widely used image formats. A block diagram of the JPEG algorithm is illustrated in Figure 2. The JPEG compression and decompression procedures can be described in the following steps.
In the first step, the luminance component of an input image is separated from its color components, which is necessary to achieve greater compression savings. The human visual system (HVS) is less sensitive to color than to luminance. The JPEG algorithm therefore represents the color components at a lower resolution, which yields further savings [72]. This process is called color or chroma subsampling. The chroma subsampling ratio depends on the application requirements; however, the most commonly used ratios are 4:2:2 (half of the color samples) and 4:2:0 (a quarter of the color samples). The image luminance component (Y) can be separated from the image color components (Cb and Cr) by a colorspace conversion function defined as

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.168736 & -0.331264 & 0.5 \\ 0.5 & -0.418688 & -0.081312 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix} \tag{1}$$

where $R$ is the red, $G$ is the green, and $B$ is the blue color channel of the image. Equation (1) converts an image from the RGB colorspace to the YCbCr colorspace. During decoding, an inverse operation converts the YCbCr image back to an RGB image, and this operation is defined as

$$\begin{aligned} R &= Y + 1.402\,(C_r - 128) \\ G &= Y - 0.344136\,(C_b - 128) - 0.714136\,(C_r - 128) \\ B &= Y + 1.772\,(C_b - 128) \end{aligned} \tag{2}$$

Note that when chroma subsampling is performed during compression, the color components must be upsampled before the YCbCr-to-RGB conversion during decompression to recover the full-resolution image.
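The following Python sketch shows the per-pixel conversion of Equation (1), its inverse, and 4:2:0 chroma subsampling. It assumes the JFIF full-range coefficients and applies no rounding or clipping. The 2 × 2 averaging filter for downsampling and the nearest-neighbour upsampling are simplifying assumptions, since encoders are free to use other filters. All function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward conversion of Equation (1) for one pixel (floats, no rounding)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion (YCbCr back to RGB) for one pixel."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


def subsample_420(plane):
    """4:2:0 subsampling: average each 2 x 2 neighbourhood of an even-sized plane."""
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


def upsample_420(plane):
    """Nearest-neighbour upsampling back to full resolution before the inverse conversion."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


y, cb, cr = rgb_to_ycbcr(200, 100, 50)
print(ycbcr_to_rgb(y, cb, cr))            # approximately (200, 100, 50)
print(subsample_420([[0, 4], [8, 12]]))   # [[6.0]]
```

With 4:2:0 subsampling, each chroma plane keeps one sample per 2 × 2 luminance neighbourhood. This is why a 16 × 16 luminance area maps onto a single 8 × 8 chroma block.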
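The YCbCr image is divided into non-overlapping blocks, and each block is then transformed using the DCT [73]. The goal is to represent a large amount of information with a few data samples by exploiting the correlations among adjacent pixels. In natural images, pixels are usually highly correlated with their neighbors up to 8 pixels away in either direction [17]. The JPEG standard therefore uses a block size of 8 × 8. The forward DCT of an 8 × 8 image block $f(x, y)$, level-shifted by subtracting 128 from each 8-bit sample, can be defined as [72]

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases} \tag{3}$$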
The result of the DCT for an 8 × 8 image block is a 64-coefficient matrix of 2D spatial frequencies. The element (0,0) of the matrix is called the “DC coefficient” and has zero frequency in both directions. The remaining 63 elements are called the “AC coefficients”, and their frequencies increase from the top-left corner to the bottom-right corner of the matrix [72]. The inverse of Equation (3), used during decompression, can be defined as

$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \tag{4}$$
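The following Python sketch evaluates Equations (3) and (4) directly, so that they can be checked numerically. It is deliberately slow, at 64 multiply-adds per coefficient; practical codecs use fast factorized DCTs instead. The input block is assumed to be level-shifted already (pixel value minus 128), and the function names are hypothetical.

```python
import math


def _c(k):
    """Normalization factor C(k) of Equations (3) and (4)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct8x8(f):
    """Forward 2D DCT of Equation (3); f[x][y] is a level-shifted 8 x 8 block."""
    return [[0.25 * _c(u) * _c(v) * sum(
                f[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct8x8(F):
    """Inverse 2D DCT of Equation (4)."""
    return [[0.25 * sum(
                _c(u) * _c(v) * F[u][v]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


flat = [[100 - 128] * 8 for _ in range(8)]   # uniform block, level-shifted
F = dct8x8(flat)
print(round(F[0][0], 6))                     # -224.0 (8 x the mean); all AC terms ~0
```

A uniform block concentrates all of its energy in the DC coefficient. This is the energy-compaction behaviour that the quantization step exploits.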
As a result of the DCT, most of the image content is concentrated in a few low-frequency coefficients, mostly in the top-left corner of each block. The remaining DCT coefficients correspond to higher frequencies; they are visually insignificant psycho-visual redundancies and can be discarded. The next step in JPEG compression is therefore quantization, which divides each DCT coefficient by its corresponding element in a 64-element quantization table (QT). The quantization step is controlled by a scalar value known as the JPEG quality factor (qf), whose range is [0, 100], where 0 represents the lowest and 100 the highest image quality. The quantization function of JPEG compression can be defined as

$$F_q(u,v) = \operatorname{round}\!\left(\frac{F(u,v)}{QT(u,v)}\right) \tag{5}$$
The JPEG standard includes two quantization tables, one for the luminance component and one for the chrominance components, given in Table 1 and Table 2, respectively. The standard tables are specified for qf = 50, and the tables for other quality factors can be calculated from them. These tables can also be supplied to the encoder as user-defined input. Examples of the custom quantization tables proposed in [48] for PGS–CPE cipher image compression are given in Table 3 and Table 4. During decoding, the inverse of Equation (5) simply performs a multiplication to estimate the closest representation of the original DCT values as

$$\hat{F}(u,v) = F_q(u,v) \times QT(u,v) \tag{6}$$
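The Python sketch below shows quantization and dequantization, together with one common way to derive a table for another quality factor from the qf = 50 table. The scaling rule is the one used by the IJG libjpeg library, valid for qf from 1 to 100. It is an assumption here, because the JPEG standard does not mandate how tables are scaled. The table is the example luminance table from Annex K of ITU-T T.81, and the function names are hypothetical.

```python
import math

# Example luminance table for qf = 50 (ITU-T T.81, Annex K, Table K.1).
LUMA_QT50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_qt(base, qf):
    """Derive a table for quality factor qf from the qf = 50 table (IJG libjpeg rule)."""
    qf = min(max(qf, 1), 100)
    s = 5000 // qf if qf < 50 else 200 - 2 * qf
    # Entries are clamped to 1..255 so they remain valid baseline (8-bit) entries.
    return [[min(max((q * s + 50) // 100, 1), 255) for q in row] for row in base]


def _round(x):
    """Round half away from zero (Python's built-in round() rounds half to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(F, qt):
    """Equation (5): divide each DCT coefficient by its table entry and round."""
    return [[_round(F[u][v] / qt[u][v]) for v in range(8)] for u in range(8)]


def dequantize(Fq, qt):
    """Inverse of Equation (5): multiply back to approximate the DCT coefficients."""
    return [[Fq[u][v] * qt[u][v] for v in range(8)] for u in range(8)]


qt25 = scale_qt(LUMA_QT50, 25)   # lower quality -> larger divisors
print(qt25[0][:4])               # [32, 22, 20, 32]
```

Larger table entries map more coefficients to zero, which lengthens the zero runs exploited in the next step, at the cost of a larger reconstruction error.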
In this step, the quantized DCT coefficients are represented so that more compression savings can be achieved in the final step. First, the coefficients of each block are scanned in zigzag order into a one-dimensional vector. As a result, the zeros corresponding to the higher frequencies end up together and can be encoded efficiently: an End of Block (EOB) symbol is appended after the last non-zero coefficient. Because the DC and AC coefficients have different properties, the DC coefficient is treated differently from the remaining 63 AC coefficients. The DC coefficients of adjacent blocks are highly correlated, so they are encoded with differential pulse code modulation (DPCM). The prediction error between adjacent DC coefficients is encoded as the amplitude of the coefficient in ones' complement form, and the size category of the prediction error is included in the head of the coefficient. The quantized AC coefficients are run-length coded (RLC) so that consecutive zero coefficients are compressed. Each non-zero AC coefficient is encoded as [(run length, size), amplitude]. Here, run length is the number of zeros preceding the non-zero coefficient, and size is the number of bits required to represent the amplitude. The run length together with the size forms the head of the coefficient, and the value of the coefficient is encoded as an amplitude in ones' complement form. The head of each coefficient is entropy encoded, as discussed below.
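The intermediate encoding of one quantized block can be sketched in Python as follows. The sketch covers the zigzag order, the DPCM difference of the DC coefficient, and the [(run length, size), amplitude] symbols of the AC coefficients. As in baseline JPEG, it writes EOB as (0, 0) and uses the ZRL symbol (15, 0) for each run of 16 zeros. It stops before Huffman coding and keeps amplitudes as integers rather than bit strings. The function names are hypothetical.

```python
# Zigzag order: anti-diagonals s = i + j; odd diagonals run down-left, even ones up-right.
ZIGZAG = sorted(((i, j) for i in range(8) for j in range(8)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def zigzag(block):
    """Flatten an 8 x 8 block into a 64-element vector in zigzag order."""
    return [block[i][j] for i, j in ZIGZAG]


def size_category(v):
    """Number of bits needed to represent |v| (0 for v == 0)."""
    return abs(v).bit_length()


def amplitude(v):
    """Ones' complement amplitude: a negative v is stored as v + 2**size - 1."""
    return v if v >= 0 else v + (1 << size_category(v)) - 1


def encode_block(block, prev_dc):
    """Return (symbols, dc) for one quantized block.

    DC symbol: (size, amplitude) of the DPCM difference from the previous DC.
    AC symbols: ((run, size), amplitude); ZRL = ((15, 0), None), EOB = ((0, 0), None).
    """
    zz = zigzag(block)
    diff = zz[0] - prev_dc
    symbols = [(size_category(diff), amplitude(diff))]
    ac = zz[1:]
    last_nz = max((k for k, v in enumerate(ac) if v != 0), default=-1)
    run = 0
    for v in ac[:last_nz + 1]:
        if v == 0:
            run += 1
            continue
        while run > 15:                       # runs longer than 15 need ZRL symbols
            symbols.append(((15, 0), None))
            run -= 16
        symbols.append(((run, size_category(v)), amplitude(v)))
        run = 0
    if last_nz < len(ac) - 1:                 # trailing zeros collapse into EOB
        symbols.append(((0, 0), None))
    return symbols, zz[0]


b = [[0] * 8 for _ in range(8)]
b[0][0], b[0][1], b[1][0] = -14, 3, -2
print(encode_block(b, -10))
# ([(3, 3), ((0, 2), 3), ((0, 2), 1), ((0, 0), None)], -14)
```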
In the previous step, the quantized DCT coefficients were represented so that they can be efficiently compressed with an entropy encoder such as the Huffman encoder. Huffman coding assigns a variable-length code (VLC) to each symbol based on its probability, with shorter codes for more probable symbols and longer codes for less probable ones. During decompression, a Huffman decoder uses the same coding tables to recover the symbols from the compressed bitstream.
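A textbook Huffman construction illustrates the variable-length coding idea. This is a generic Python sketch, not the JPEG procedure. Baseline JPEG transmits canonical Huffman tables as code-length counts and symbol lists, or uses the example tables in Annex K of ITU-T T.81, and it codes the head symbols from the previous step rather than arbitrary symbols. The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free variable-length code from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap items: (weight, tie-breaker, {symbol: code suffix built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codewords of a symbol sequence."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Greedy decoding works because no codeword is a prefix of another."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


msg = list("aaaaabbbc")
code = huffman_code(msg)
print(code, encode(msg, code))
```

The integer tie-breaker keeps the heap from ever comparing two dictionaries when weights are equal.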
4. Block-Based Compressible Perceptual Encryption Methods
The main idea of the CPE methods is to divide an image into blocks, as discussed in Section 3.2, and perform geometric and color transformations on them to protect the global image contents. Such block-level processing preserves local image contents, such as the spatial correlation of neighboring pixels within a block. An image compression algorithm can exploit this correlation to compress the cipher images. The block size must be chosen carefully to achieve the best tradeoff between compression and encryption efficiencies. For example, for the JPEG standard, the smallest allowable block sizes are 16 × 16 and 8 × 8 for color and grayscale image compression, respectively. In general, CPE methods consist of the following three steps:
Step 1 (input representation): An input color image $\mathbf{I}$ with $M$ rows, $N$ columns, and three components $\mathbf{I}_1$, $\mathbf{I}_2$, and $\mathbf{I}_3$ can be represented either as a true color image of size $M \times N \times 3$ or as a pseudo-grayscale image. The pseudo-grayscale image is formed by concatenating the color components either in the vertical direction as $\mathbf{I}_v = [\mathbf{I}_1; \mathbf{I}_2; \mathbf{I}_3]$ (of size $3M \times N$) or in the horizontal direction as $\mathbf{I}_h = [\mathbf{I}_1, \mathbf{I}_2, \mathbf{I}_3]$ (of size $M \times 3N$). When the input is a grayscale image of size $M \times N$, this step is omitted.
Step 2 (encryption): CPE methods perform geometric transformations to change block positions (block permutation) and block orientations (block rotations and inversions). They also perform color transformations (color channel shuffling and negative–positive transformation) to alter the pixel values in the blocks. Each transformation function is controlled by a randomly generated key, and the set of all these keys serves as the secret key of the CPE scheme. The encryption algorithm of the CPE schemes is a symmetric-key algorithm: the same set of keys is used both to encrypt plain images and to decrypt cipher images. The encryption and decryption processes are shown in Figure 3, where $K_i$ is the secret symmetric key used in the $i$th step.
Step 3 (compression): The final step is to compress the cipher image using the JPEG image standard. The JPEG color or grayscale compression mode is chosen based on the input image representation in Step 1.
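The pseudo-grayscale representation of Step 1 amounts to concatenating the component planes. The Python sketch below assumes three equal-sized M × N planes given as lists of rows, so it does not cover the chroma-subsampled variant of [32,48]. The function names are hypothetical.

```python
def to_pseudo_gray(c1, c2, c3, direction="vertical"):
    """Concatenate three M x N component planes into one grayscale plane."""
    if direction == "vertical":            # 3M x N: stack the planes top to bottom
        return [list(r) for r in c1 + c2 + c3]
    if direction == "horizontal":          # M x 3N: place the planes side by side
        return [list(a) + list(b) + list(c) for a, b, c in zip(c1, c2, c3)]
    raise ValueError("direction must be 'vertical' or 'horizontal'")


def from_pseudo_gray(g, direction="vertical"):
    """Split a pseudo-grayscale plane back into its three component planes."""
    if direction == "vertical":
        m = len(g) // 3
        return g[:m], g[m:2 * m], g[2 * m:]
    if direction == "horizontal":
        n = len(g[0]) // 3
        return [r[:n] for r in g], [r[n:2 * n] for r in g], [r[2 * n:] for r in g]
    raise ValueError("direction must be 'vertical' or 'horizontal'")


y, cb, cr = [[1, 2]], [[3, 4]], [[5, 6]]
print(to_pseudo_gray(y, cb, cr, "vertical"))     # [[1, 2], [3, 4], [5, 6]]
print(to_pseudo_gray(y, cb, cr, "horizontal"))   # [[1, 2, 3, 4, 5, 6]]
```

Because the result is a single channel, a JPEG encoder compresses it in grayscale mode with 8 × 8 blocks. This is what allows the smaller CPE block size discussed in Section 2.3.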
The CPE methods can be classified into two categories based on their preprocessing step: methods that represent the input as a color image and methods that represent the input as a pseudo-grayscale image. The basic form of the first category processes each color component with the same key; we name these Color CPE methods. These methods can be extended to process each color component independently (Extended CPE) and to introduce sub-block-level processing (IIB–CPE). The second category, where the input is represented in grayscale, is named PGS–CPE methods.
4.1. Color CPE Methods
A Color CPE algorithm was proposed in [30,44] for SNS and CPSS applications. In the algorithm, an image $\mathbf{I}$ with $M \times N$ pixels in $C$ color channels is divided into $n_b = \lfloor M/B_x \rfloor \times \lfloor N/B_y \rfloor$ blocks, each with $B_x \times B_y$ pixels. A cipher image can be generated as shown in Figure 4, and the procedure is described below:
An input color image , whose dimensions are specified by rows, columns, and components, is represented as a true color image in the RGB colorspace.
Divide the image into blocks where and , and each block has color channels with pixels.
Shuffle the block positions in the image using a secret key generated randomly. The key size is equal to the number of blocks, where each of its entries represent a block’s new position in the scrambled image.
Change the block orientations in the shuffled image by a composite function of rotation and inversion transformations. This transformation is controlled by a randomly generated key where its entries represent rotation and inversion axis.
Change the pixel values by applying a negative–positive transformation function to each pixel in a block randomly chosen by a key
. The
is a binary key where the elements are uniformly distributed. The negative–positive transformation function for a block
is defined as
where
(
) is a pixel value in the block and
is its modified value, and
is the ith element of the key
.
Shuffle the color components of each block using key . Each element of the represents a unique permutation of the color channels.
The final step is to JPEG compress the cipher image obtained in the previous step. Because the input was represented as a color image in the RGB colorspace (Step 1), the JPEG compression can be carried out in the color mode using either the RGB or the YCbCr colorspace. When a suitable block size is used during encryption, such as $16 \times 16$, a user can also benefit from JPEG chroma subsampling for additional compression savings, because each encrypted block then coincides with a 4:2:0 minimum coded unit.
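The encryption steps above (Steps 2 to 6) can be sketched in Python as follows. This is a minimal illustration, not the implementation from [30,44]. Several details are assumptions made only for illustration: the helper names (`split_blocks`, `merge_blocks`, `orient`, `color_cpe_encrypt`), the key labels K1 to K4, the encoding of the eight rotation/inversion combinations as codes 0 to 7, and the use of a seeded `random.Random` as the key generator. The image is assumed to be 8-bit, given as three H × W planes whose sides are multiples of the block size. The final JPEG step is left to an external encoder.

```python
import itertools
import random


def split_blocks(plane, b):
    """Split an H x W plane (list of rows) into a row-major list of b x b blocks."""
    h, w = len(plane), len(plane[0])
    return [[row[x:x + b] for row in plane[y:y + b]]
            for y in range(0, h, b) for x in range(0, w, b)]


def merge_blocks(blocks, h, w, b):
    """Reassemble a row-major list of b x b blocks into an H x W plane."""
    plane = [[0] * w for _ in range(h)]
    per_row = w // b
    for idx, blk in enumerate(blocks):
        y0, x0 = (idx // per_row) * b, (idx % per_row) * b
        for dy, row in enumerate(blk):
            plane[y0 + dy][x0:x0 + b] = row
    return plane


def orient(blk, code):
    """Apply one of 8 orientations: (code % 4) clockwise quarter turns, then a horizontal flip if code >= 4."""
    for _ in range(code % 4):
        blk = [list(r) for r in zip(*blk[::-1])]
    if code >= 4:
        blk = [r[::-1] for r in blk]
    return blk


PERMS = list(itertools.permutations(range(3)))  # the 3! = 6 possible channel orders


def color_cpe_encrypt(planes, b, seed):
    """Encrypt an 8-bit RGB image given as three H x W planes (H and W multiples of b)."""
    rng = random.Random(seed)  # illustration only: use a cryptographic source for real keys
    h, w = len(planes[0]), len(planes[0][0])
    blocks = [split_blocks(p, b) for p in planes]         # blocks[c][i]
    n = len(blocks[0])
    k1 = list(range(n))
    rng.shuffle(k1)                                       # K1[i]: new position of block i
    k2 = [rng.randrange(8) for _ in range(n)]             # K2: orientation code per position
    k3 = [rng.randrange(2) for _ in range(n)]             # K3: negative-positive flag per position
    k4 = [rng.randrange(len(PERMS)) for _ in range(n)]    # K4: channel order per position

    # Step 3: block shuffling with the same K1 for every channel.
    shuffled = [[None] * n for _ in range(3)]
    for i in range(n):
        for c in range(3):
            shuffled[c][k1[i]] = blocks[c][i]

    out = [[None] * n for _ in range(3)]
    for j in range(n):
        chans = []
        for c in range(3):
            blk = orient(shuffled[c][j], k2[j])           # Step 4: rotation and inversion
            if k3[j]:                                     # Step 5: Equation (7) with 8-bit pixels
                blk = [[v ^ 255 for v in row] for row in blk]
            chans.append(blk)
        for c in range(3):                                # Step 6: color-component shuffling
            out[c][j] = chans[PERMS[k4[j]][c]]
    return [merge_blocks(out[c], h, w, b) for c in range(3)]


# Usage: a 32 x 32 synthetic RGB image encrypted with 16 x 16 blocks.
image = [[[(x + y + 40 * c) % 256 for x in range(32)] for y in range(32)] for c in range(3)]
cipher = color_cpe_encrypt(image, 16, seed=2024)
```

The same K1 to K3 are applied to all three planes, so an all-zero input encrypts to blocks that are uniformly 0 or uniformly 255 across every channel. This shared-key property is what the Extended CPE variant in Section 4.2 removes.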
4.2. Extended CPE Methods
An extension of the Color CPE method is proposed in [31,47] to better alter the color distribution. The principal idea is to process each color component independently. The Extended CPE methods can be implemented using the same steps as described in Section 4.1. The main difference between the Color CPE and Extended CPE methods lies in the encryption keys. In Color CPE methods, the same keys are used to encrypt all color components of the image, that is, $K_i^{R} = K_i^{G} = K_i^{B}$, where $i \in \{1, 2, 3\}$ indexes the key of each encryption step and $R$, $G$, and $B$ denote the color components. However, in the Extended CPE methods, the encryption keys used for each color component are different, that is, $K_i^{R}$, $K_i^{G}$, and $K_i^{B}$ are generated independently for each $i$. Because of this independent processing, the spatial information in each color channel is modified differently, as shown in Figure 5.
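To make the key difference concrete, the sketch below contrasts the two key-generation policies. It is hypothetical. `make_keys`, `color_cpe_keys`, `extended_cpe_keys`, and `shuffle_blocks` are illustrative names, and a seeded `random.Random` stands in for a secure key generator. Blocks are represented by integer labels so that the effect of each channel's K1 on the block layout is easy to inspect.

```python
import random


def make_keys(n_blocks, rng):
    """Hypothetical key set for one channel: K1 (block positions), K2 (orientation codes), K3 (negative-positive flags)."""
    k1 = list(range(n_blocks))
    rng.shuffle(k1)
    return {
        "K1": k1,
        "K2": [rng.randrange(8) for _ in range(n_blocks)],
        "K3": [rng.randrange(2) for _ in range(n_blocks)],
    }


def color_cpe_keys(n_blocks, seed):
    """Color CPE: a single key set shared by R, G, and B."""
    shared = make_keys(n_blocks, random.Random(seed))
    return {c: shared for c in "RGB"}


def extended_cpe_keys(n_blocks, seed):
    """Extended CPE: an independently drawn key set for each color component."""
    rng = random.Random(seed)  # illustration only, not a secure key source
    return {c: make_keys(n_blocks, rng) for c in "RGB"}


def shuffle_blocks(blocks, k1):
    """Move block i to position k1[i]."""
    out = [None] * len(blocks)
    for i, blk in enumerate(blocks):
        out[k1[i]] = blk
    return out


# Usage: label 16 blocks per channel and shuffle each channel with its own K1.
labels = list(range(16))
shared = color_cpe_keys(16, seed=3)
indep = extended_cpe_keys(16, seed=3)
same_layout = {c: shuffle_blocks(labels, shared[c]["K1"]) for c in "RGB"}
diff_layout = {c: shuffle_blocks(labels, indep[c]["K1"]) for c in "RGB"}
```

With shared keys, all three channels end up with the same block layout. With independent keys, each channel is scrambled differently, which is the effect illustrated in Figure 5.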
In addition, the JPEG compression can be carried out in the color mode, as the input was represented as a color image. However, because each color component is processed independently, the cipher image should be compressed in a mode that keeps the components separate, that is, in the RGB colorspace and without chroma subsampling.
4.3. IIB–CPE Methods
An IIB–CPE scheme is proposed in [34,49,50] to expand the keyspace of Color CPE methods. The core idea is to perform sub-block processing. A cipher image can be generated as illustrated in Figure 6, and the procedure is described below:
An input color image $I$, whose dimensions are specified by $M$ rows, $N$ columns, and $C = 3$ components, is represented as a true color image in the RGB colorspace.
Divide the image into $n_m \times n_n$ blocks of $B_m \times B_n$ pixels.
Perform an inside-out transformation on each block. It is carried out in two steps: first, each block is divided into sub-blocks, and then the orientation of each sub-block is changed. For example, a $B_m \times B_n$ block can be divided into $(B_m/b_m) \times (B_n/b_n)$ sub-blocks, where $b_m$ and $b_n$ divide $B_m$ and $B_n$, respectively, and each sub-block has $b_m \times b_n$ pixels. The sub-block orientations in a given block are changed by a composite function of rotation and inversion transformations using a random key $K_2$.
Shuffle the positions of the whole blocks in the image using a randomly generated secret key $K_1$.
Change the pixel values by applying the negative–positive transformation function to each pixel in a block randomly chosen using a random key $K_3$, as in Equation (7).
Shuffle the color components of each block using key $K_4$. Each element of $K_4$ represents a unique permutation of the color channels.
The final step is to JPEG compress the cipher image obtained in the previous step. Because the input was represented as a color image in the RGB colorspace (Step 1), the JPEG compression can be carried out in the color mode.
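The inside-out transformation (Step 3) is the step that distinguishes IIB–CPE. The following is a minimal sketch of it, assuming square blocks and sub-blocks and the same hypothetical 0 to 7 orientation codes as in the Color CPE sketch. Following the description above, sub-blocks keep their positions inside the block and only their orientation changes. `inside_out` and the 16 × 16 / 8 × 8 sizes in the usage lines are illustrative choices, not values taken from [34,49,50].

```python
import random


def orient(blk, code):
    """Apply one of 8 orientations: (code % 4) clockwise quarter turns, then a horizontal flip if code >= 4."""
    for _ in range(code % 4):
        blk = [list(r) for r in zip(*blk[::-1])]
    if code >= 4:
        blk = [r[::-1] for r in blk]
    return blk


def inside_out(block, sb, codes):
    """Re-orient each sb x sb sub-block of a square block; sub-block j uses codes[j].

    Sub-blocks stay at their positions inside the block; only their orientation changes.
    """
    size = len(block)
    per = size // sb                          # sub-blocks per row (and per column)
    out = [row[:] for row in block]
    for j in range(per * per):
        y0, x0 = (j // per) * sb, (j % per) * sb
        sub = [row[x0:x0 + sb] for row in block[y0:y0 + sb]]
        for dy, row in enumerate(orient(sub, codes[j])):
            out[y0 + dy][x0:x0 + sb] = row
    return out


# Usage: one 16 x 16 block split into four 8 x 8 sub-blocks (sizes chosen for illustration).
rng = random.Random(5)                        # illustration only, not a secure key source
block = [[16 * r + c for c in range(16)] for r in range(16)]
codes = [rng.randrange(8) for _ in range(4)]  # one key entry per sub-block
scrambled = inside_out(block, 8, codes)
```

Each block now consumes one key entry per sub-block instead of a single orientation entry, which is the source of the larger keyspace.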
4.4. PGS–CPE Methods
A PGS–CPE scheme is proposed in [32,33,48] to deal with format compatibility and chroma-subsampling issues in color-based CPE methods. The principal idea is to represent the input color image in a pseudo-grayscale form in order to benefit from the smallest block size allowed by the JPEG standard ($8 \times 8$) for better encryption efficiency. A cipher image can be generated as illustrated in Figure 7, and the procedure is described below:
An input color image $I$ in the RGB colorspace, whose dimensions are specified by $M$ rows, $N$ columns, and $C = 3$ components, is converted into the YCbCr colorspace. The three components $Y$, $C_b$, and $C_r$ are concatenated either in a horizontal direction to form an $M \times 3N$ image $I_h$ or in a vertical direction to form a $3M \times N$ image $I_v$, as shown in Figure 8. However, when chroma subsampling is used (for example, a ratio of 4:2:0), the chroma components are downsampled to $C_b'$ and $C_r'$, each of size $M/2 \times N/2$. The three components $Y$, $C_b'$, and $C_r'$ are then concatenated either in a horizontal direction to form an image $I_h'$ or in a vertical direction to form an image $I_v'$. Here, we assume that the input image is represented in pseudo-grayscale form without chroma subsampling, that is, as $I_h$ or $I_v$.
Divide the pseudo-grayscale image of $M' \times N'$ pixels into $n_m \times n_n$ blocks, where $n_m = \lfloor M'/B_m \rfloor$ and $n_n = \lfloor N'/B_n \rfloor$, and each block has $B_m \times B_n$ pixels.
Shuffle the block positions in the image using a randomly generated secret key $K_1$.
Change the block orientations in the shuffled image by a composite function of rotation and inversion transformations. This transformation is controlled by a randomly generated key $K_2$.
Change the pixel values by applying the negative–positive transformation function to each pixel in a block chosen using a random key $K_3$, as in Equation (7).
The final step is to JPEG compress the cipher image obtained in the previous step. Because the input was represented as a grayscale image, the JPEG compression can be carried out in the grayscale mode by using either the luminance or chrominance standard table in the quantization step.
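The representation step (Step 1) of PGS–CPE can be sketched as follows. The text does not fix the color conversion, so this sketch assumes the full-range JFIF RGB-to-YCbCr equations commonly used by baseline JPEG encoders. It also omits chroma subsampling, matching the assumption stated in Step 1. The function names are hypothetical. If $N$ is a multiple of 8, the horizontally concatenated $M \times 3N$ image divides evenly into $8 \times 8$ blocks for the remaining steps.

```python
def clip8(v):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF RGB -> YCbCr conversion (assumed; the text does not fix the formula)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clip8(y), clip8(cb), clip8(cr)


def to_ycbcr_planes(rgb):
    """M rows of N (R, G, B) tuples -> [Y, Cb, Cr], each an M x N list of rows."""
    converted = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
    return [[[px[k] for px in row] for row in converted] for k in range(3)]


def concat_horizontal(planes):
    """Y | Cb | Cr side by side: an M x 3N pseudo-grayscale image."""
    return [[v for p in planes for v in p[i]] for i in range(len(planes[0]))]


def concat_vertical(planes):
    """Y above Cb above Cr: a 3M x N pseudo-grayscale image."""
    return [list(row) for p in planes for row in p]


# Usage: a 2 x 4 RGB image becomes a 2 x 12 (horizontal) or 6 x 4 (vertical) grayscale image.
rgb = [[(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)] for _ in range(2)]
planes = to_ycbcr_planes(rgb)
pgs_h = concat_horizontal(planes)
pgs_v = concat_vertical(planes)
```

Because the result is a single-component image, the block-level steps and the grayscale JPEG encoder operate on luminance and chrominance samples alike.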
4.5. Extension to Grayscale Image Processing
Besides color image encryption and compression, the CPE methods presented above can also be used with grayscale images. A grayscale image consists of only one component, as opposed to a color image, which has three components. The CPE methods consist of the following two steps for grayscale image encryption and compression:
The CPE methods perform geometric transformations to change block positions (block permutations) and orientations (block rotations and inversions), and intensity transformation (negative–positive transformation) to alter pixel values.
The final step is to compress the cipher image using the JPEG image standard in the grayscale mode either using the standard luminance or chrominance quantization tables.
For the grayscale input, the image representation step (Step 1 in Section 4) is omitted, and the PE methods can be classified as methods that transform an entire block (GS–CPE) and methods that incorporate sub-block processing (GS–IIB–CPE). The Color CPE, Extended CPE, and PGS–CPE methods belong to the GS–CPE class, and the IIB–CPE method belongs to the GS–IIB–CPE class. The following subsections provide an overview of these methods.
4.5.1. GS–CPE
A cipher image can be generated by following the procedure described below:
Divide the $M \times N$ grayscale image into $n_m \times n_n$ blocks, where $n_m = \lfloor M/B_m \rfloor$ and $n_n = \lfloor N/B_n \rfloor$, and each block has $B_m \times B_n$ pixels.
Shuffle the block positions in the image using a randomly generated secret key $K_1$.
Change the block orientations in the shuffled image by a composite function of rotation and inversion transformations. This transformation is controlled by a randomly generated key $K_2$.
Change the pixel values by applying the negative–positive transformation function to each pixel in a block randomly chosen using a random key $K_3$, as in Equation (7).
The final step is to JPEG compress the cipher image obtained in the previous step. Because the input image is a grayscale image, the JPEG compression is carried out in the grayscale mode with either of the standard quantization tables.
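The GS–CPE steps above are all invertible, and a minimal sketch can show this directly by encrypting and then decrypting a grayscale image with the same keys. The helper names, the 0 to 7 orientation codes, and the seeded `random.Random` key generator are assumptions made for illustration, and the JPEG step is omitted. Decryption undoes the steps in reverse order. The negative–positive transformation is its own inverse, a pure rotation is undone by the complementary rotation, and a rotate-then-flip code is its own inverse.

```python
import random


def split_blocks(img, b):
    """Split an H x W image (list of rows) into a row-major list of b x b blocks."""
    h, w = len(img), len(img[0])
    return [[row[x:x + b] for row in img[y:y + b]]
            for y in range(0, h, b) for x in range(0, w, b)]


def merge_blocks(blocks, h, w, b):
    """Reassemble a row-major list of b x b blocks into an H x W image."""
    img = [[0] * w for _ in range(h)]
    per_row = w // b
    for idx, blk in enumerate(blocks):
        y0, x0 = (idx // per_row) * b, (idx % per_row) * b
        for dy, row in enumerate(blk):
            img[y0 + dy][x0:x0 + b] = row
    return img


def orient(blk, code):
    """(code % 4) clockwise quarter turns, then a horizontal flip if code >= 4."""
    for _ in range(code % 4):
        blk = [list(r) for r in zip(*blk[::-1])]
    if code >= 4:
        blk = [r[::-1] for r in blk]
    return blk


# Inverse orientation code for each code 0..7.
INVERSE = [0, 3, 2, 1, 4, 5, 6, 7]


def gs_cpe_keys(n_blocks, seed):
    rng = random.Random(seed)                          # illustration only, not a secure key source
    k1 = list(range(n_blocks))
    rng.shuffle(k1)                                    # K1[i]: new position of block i
    k2 = [rng.randrange(8) for _ in range(n_blocks)]   # K2: orientation code per position
    k3 = [rng.randrange(2) for _ in range(n_blocks)]   # K3: negative-positive flag per position
    return k1, k2, k3


def gs_cpe_encrypt(img, b, keys):
    k1, k2, k3 = keys
    shuffled = [None] * len(k1)
    for i, blk in enumerate(split_blocks(img, b)):
        shuffled[k1[i]] = blk                           # block permutation
    out = []
    for j, blk in enumerate(shuffled):
        blk = orient(blk, k2[j])                        # rotation and inversion
        if k3[j]:
            blk = [[255 - v for v in row] for row in blk]   # Equation (7), 8-bit pixels
        out.append(blk)
    return merge_blocks(out, len(img), len(img[0]), b)


def gs_cpe_decrypt(img, b, keys):
    k1, k2, k3 = keys
    blocks = split_blocks(img, b)
    restored = []
    for i in range(len(k1)):
        j = k1[i]                                       # position that block i was moved to
        blk = blocks[j]
        if k3[j]:
            blk = [[255 - v for v in row] for row in blk]   # undo negative-positive
        restored.append(orient(blk, INVERSE[k2[j]]))    # undo rotation and inversion
    return merge_blocks(restored, len(img), len(img[0]), b)


# Usage: a 16 x 16 grayscale image with 8 x 8 blocks (4 blocks).
image = [[(7 * y + 3 * x) % 256 for x in range(16)] for y in range(16)]
keys = gs_cpe_keys(4, seed=11)
cipher = gs_cpe_encrypt(image, 8, keys)
```

In a real system, the JPEG step sits between encryption and decryption, so the decrypted image matches the original only up to the compression loss.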
4.5.2. GS–IIB–CPE
A cipher image can be generated by following the procedure described below:
Divide the $M \times N$ grayscale image into $n_m \times n_n$ blocks, where $n_m = \lfloor M/B_m \rfloor$ and $n_n = \lfloor N/B_n \rfloor$, and each block has $B_m \times B_n$ pixels.
Perform an inside-out transformation on each block: divide each block into sub-blocks and then change the orientation of each sub-block. For example, a $B_m \times B_n$ block can be divided into $(B_m/b_m) \times (B_n/b_n)$ sub-blocks, where $b_m$ and $b_n$ divide $B_m$ and $B_n$, respectively, and each sub-block has $b_m \times b_n$ pixels. The sub-block orientations in a given block are changed by a composite function of rotation and inversion transformations with a random key $K_2$.
Shuffle the positions of the whole blocks using a randomly generated secret key $K_1$.
Change the pixel values by applying the negative–positive transformation function to each pixel in a block randomly chosen using a random key $K_3$, as in Equation (7).
The final step is to JPEG compress the cipher image obtained in the previous step. Because the input image is a grayscale image, the JPEG compression is carried out in the grayscale mode with either of the standard quantization tables.
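One way to quantify the claim that sub-block processing expands the keyspace is to count key combinations for the two procedures above. The sketch below does this in bits. It is an upper bound on brute-force effort, not a security assessment: for example, flat blocks look identical under every orientation. The count assumes 8 distinct rotation/inversion combinations (the symmetries of a square), $n!$ block orders, and 2 negative–positive choices per block. Following the steps as listed, GS–IIB–CPE orients sub-blocks instead of whole blocks. The function names and the image, block, and sub-block sizes in the usage lines are hypothetical.

```python
import math


def log2_factorial(n):
    """log2(n!) via the log-gamma function, so large n stays cheap."""
    return math.lgamma(n + 1) / math.log(2)


def gs_cpe_bits(n_blocks):
    """K1: n! block orders; K2: 8 orientations per block; K3: 2 flags per block."""
    return log2_factorial(n_blocks) + n_blocks * (3 + 1)


def gs_iib_cpe_bits(n_blocks, subblocks_per_block):
    """K1: n! block orders; sub-block key: 8 orientations per sub-block; K3: 2 flags per block."""
    return log2_factorial(n_blocks) + n_blocks * (3 * subblocks_per_block + 1)


# Usage with hypothetical sizes: a 512 x 512 image with 16 x 16 blocks (1024 blocks)
# and 8 x 8 sub-blocks (4 per block).
n = (512 // 16) ** 2
print(f"GS-CPE:     {gs_cpe_bits(n):.0f} bits")
print(f"GS-IIB-CPE: {gs_iib_cpe_bits(n, 4):.0f} bits")
```

For the same number of blocks, each block contributes $3s + 1$ bits instead of 4 bits when it holds $s$ sub-blocks, so the gain grows linearly with $s$.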
4.6. CPE Encryption Level
For multimedia applications where the security requirement is flexible, the encryption level of the CPE schemes described in Sections 4.1, 4.2, 4.3, 4.4, and 4.5 can be adjusted accordingly. This can be achieved by performing the CPE steps on selected blocks. For example, to preserve the global contents of the plain image during encryption, the block permutations can be applied selectively to certain blocks of the image. Similarly, the composite function of rotation and inversion, the negative–positive transformation function, and the color-channel shuffling function can be set as identity functions for the selected blocks to preserve the local contents of the image on a block level.
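Selective encryption can be expressed purely at the key level, as in the hypothetical sketch below. Blocks outside a chosen set keep their position, and positions outside a second chosen set receive identity codes for rotation/inversion, negative–positive, and channel shuffling. The name `selective_keys`, the two-set interface, and the convention that code 0 means identity are assumptions. For K4, index 0 is the identity order only if channel orders are enumerated as in `itertools.permutations(range(3))`.

```python
import random


def selective_keys(n_blocks, permuted, transformed, seed):
    """Hypothetical keys for selective CPE.

    Only blocks in `permuted` change position (and only among themselves); only
    positions in `transformed` get a non-identity orientation (K2), negative-positive
    flag (K3), and channel order (K4). Code 0 means "identity" for K2, K3, and K4.
    """
    rng = random.Random(seed)              # illustration only, not a secure key source
    permuted, transformed = set(permuted), set(transformed)
    sources = sorted(permuted)
    targets = sources[:]
    rng.shuffle(targets)                   # selected blocks move among selected positions
    k1 = list(range(n_blocks))             # identity: unselected blocks stay in place
    for src, dst in zip(sources, targets):
        k1[src] = dst
    k2 = [rng.randrange(8) if j in transformed else 0 for j in range(n_blocks)]
    k3 = [rng.randrange(2) if j in transformed else 0 for j in range(n_blocks)]
    k4 = [rng.randrange(6) if j in transformed else 0 for j in range(n_blocks)]
    return k1, k2, k3, k4


# Usage: shuffle only the first 4 of 16 blocks; alter the local content of blocks 5 and 6 only.
k1, k2, k3, k4 = selective_keys(16, permuted=range(4), transformed={5, 6}, seed=2)
```

Because the transformation keys are indexed by block position after shuffling, the same encryption routine can be reused unchanged, and only the keys determine how much of the image is concealed.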