1. Introduction
As an interdisciplinary research field, information hiding (IH) is generally modeled as a covert communication problem involving three participants:
Alice,
Bob and
Eve [
1]. Alice plays the role of the data sender. Bob plays the role of the data receiver. However, Eve serves as the attacker. Given a digital media object such as digital video and image, Alice first embeds a secret message into the digital media object (also called
cover) by modifying the content of the cover without introducing noticeable artifacts. The resulting object concealing the secret message will be sent to Bob via an insecure channel that will be monitored by Eve. Eve will attempt to detect the existence of the secret message within the conveyed object or alter the conveyed object to remove the possibly embedded information. Once Bob receives the probably altered object containing hidden information, he will try to extract the secret message from the received object. Thus, IH is successfully realized if the secret message can be extracted without error. Otherwise, it is deemed failed. Compared with cryptography that may leave noticeable marks on the encrypted data, IH even conceals the existence of the present communication activity, which has good potential in applications and will become increasingly important in modern information security [
2,
3].
For IH, one of the most important requirements is that the difference between the cover media and the media containing secret information caused by the data embedding operation should be low so that any adversary will not notice the existence of the secret message within the embedded object [
4]. Along this line, many IH algorithms have been introduced in the past two decades [
5,
6,
7,
8]. Given a secret payload to be embedded, most IH algorithms either embed the secret payload by minimizing a well-designed distortion function or embed the secret payload by preserving the selective model of the cover source [
9]. On the other hand, for a pre-specified distortion, we expect to embed as many secret bits as possible. This is typically referred to as the payload-distortion problem. Nevertheless, these algorithms inevitably distort the original cover content. In other words, though the secret message can be successfully extracted without any error, the original cover media cannot be perfectly rebuilt from the object containing the secret message, which is not applicable to sensitive application scenarios that require no degradation of the original cover object [
10]. This has motivated researchers to study reversible information hiding (RIH) [
11] or say reversible data hiding (RDH) [
12], reversible watermarking [
13] to ensure that both the secret message and the original cover media can be reconstructed at the receiver side. Many RDH algorithms can be found in the literature such as [
13,
14,
15,
16,
17,
18,
19,
20].
RDH can be applied to various cover sources. For example, due to the popularity over social networks and the ease of handling, digital imagery is still one of the most popular sources for RDH. Other cover sources such as video sequences [
21], speech signals [
22] and texts [
23] are also of increasing interest to researchers recently due to the fast development of multimedia technologies and social networking services. From the viewpoint of the data embedding mechanism, most advanced RDH algorithms use the so-called prediction errors (PEs) of the elements in the cover to be embedded to realize RDH. There are two main reasons [
14]. First, the PEs are noise-like, meaning that modifying them will not produce obvious artifacts of the cover since the prediction process significantly reduces the impact caused by the cover content. Second, the PEs are collected to construct a prediction error histogram (PEH), which is a pooled vector that can be easily handled for RDH based on histogram shifting (HS) [
12] or its variants [
13,
24,
25,
26,
27]. In terms of data embedding positions, mainstream RDH works could be roughly divided into spatial domain and transform domain. The former uses the values of spatial pixels to carry secret bits, whereas the latter often uses the transformed values as the cover elements such as discrete cosine transform (DCT) coefficients to carry secret bits. In brief summary, regardless of the data embedding domain, exploiting PEs for RDH is efficient.
From the viewpoint of system design, many existing PE-based works mainly consist of five steps, i.e.,
content prediction,
content selection,
data embedding,
data extraction and
cover reconstruction [
14]. Taking a gray-scale image as the cover for example, content prediction aims to predict the pixels and obtain the PEs. For content selection, its target is to sort the collected PEs by a local complexity function so that smooth pixels are embedded preferentially since smooth pixels have a small PE that can result in superior payload-distortion performance. Thereafter, the aforementioned HS or its variants can be applied to the sorted PE sequence to embed secret bits. Once the secret bits are embedded, the resulting new image (or
marked image) will be sent to the receiver, who will perform secret data extraction and cover image reconstruction. Extracting the embedded secret data and recovering the original image can be roughly viewed as an inverse process of the data hider. Therefore, it is straightforward to draw out that, in order to provide superior payload-distortion performance, the content prediction, content selection and data embedding procedures can be optimized. For content prediction, it is required for us to design a predictor that can accurately estimate the pixels to be embedded. Since it is hard to model all natural images, even a well-designed predictor cannot predict all pixels accurately, e.g., many existing RDH algorithms use a fixed predictor for content prediction, which does not take into account the statistical characteristics of local context. Therefore, a trade-off strategy is to further select the pixels with a small PE out for data embedding preferentially, which is referred to as content selection. However, during the process of constructing an ordered sequence of PEs, many existing algorithms ignore the fact that adjacent PEs may have the identical priority of data embedding. As a result, there is still room for improving the payload-distortion performance. In addition, for data embedding, once the operation is pre-specified, we should further optimize the data embedding parameters so that the distortion will be low for a given payload.
Motivated by the above analysis, in this article, we are to study the optimization of content prediction and content selection so that the payload-distortion performance can be further improved. Meanwhile, since we use the PEs of image pixels to carry secret data, the optimized HS operation is used. In the proposed work, the pixels are adaptively predicted according to their local context, leading to a sharp prediction error histogram to be embedded. Furthermore, the main idea of content selection is to construct a graph containing multiple connected components for the cover image so that the PEs of the pixels within a connected component are close to each other. Accordingly, the sorted connected components can be used for carrying additional information. Experimental results have demonstrated that the proposed method outperforms a part of advanced works, displaying superiority and applicability of the proposed method.
The remainder of this article is organized as follows. First, we detail the proposed method in
Section 2. Experiments and analysis are then provided in
Section 3. Finally, we conclude this work and provide discussion in
Section 4.
2. Proposed Method
In this section, we first describe the general framework of the proposed method. Then, we will introduce each important part in detail. Before a detailed introduction, we list all the important symbols used in this section and their meaning in
Table 1.
2.1. General Framework
Throughout this paper, gray-scale images are used to act as the cover image.
Figure 1 shows the general technical framework for the proposed algorithm. We describe it as follows. We first pre-process the given cover image adjusting all pixel values into the usable range to avoid the pixel overflow and pixel underflow problem during pixel modification. The modified pixels should be recorded to construct a so-called location map which will be self-embedded into the cover image together with the secret data. Moreover, we self-embed some embedding parameters into the cover image so that the receiver has the ability to extract secret data and recover the original image.
After pre-processing, we predict the cover pixels to be embedded based on the local context, by which we can generate a set of PEs that will be used for carrying secret bits. In order to provide good payload-distortion performance, we construct a graph, whose nodes correspond to the pixels to be embedded and edges represent the adjacent relationship between pixels. We sort the PEs by determining the connected components of the graph and a local complexity function. Therefore, by applying optimized HS, we embed the secret data into the sorted PEs. For the data receiver, they first extract the embedding parameters from the marked image and then perform an inverse process of the data hider to reconstruct the embedded information and the original cover content without error. In this way, efficient RDH can be realized. Below, we provide the details.
2.2. Pre-Processing and Pixel Prediction
Let
denote a gray-scale image whose size is denoted by
, where each pixel
. Our mission is to embed a secret binary stream
into
to generate a marked image
such that the distortion between
and
is low. In addition, both
and
can be reconstructed from
without any error. Mathematically, we have
and
where
represents the secret key shared between the hider and the receiver.
The proposed method modifies the spatial pixels to embed secret data. In order to avoid the above-mentioned pixel underflow and pixel overflow problem, the values of boundary pixels to be embedded should be adjusted into the usable range. Since each pixel is modified by
during data embedding, a boundary pixel always has a value of 0 or 255. Therefore, we need to modify each pixel with a value of 0 or 255 as a pixel with a value of 1 or 254 to avoid the pixel underflow and pixel overflow problem. To ensure reversibility, the positions and original values of these boundary pixels are recorded to construct a location map which will be compressed by an efficient lossless compression technique as a binary stream to act as the side information that will be self-embedded into the cover image. This strategy has been applied by many RDH algorithms such as [
13,
16,
17]. The losslessly compressed location map can be considered as a part of the secret message. Since the number of boundary pixels in a natural image is very small, the size of the losslessly compressed location map is also very small. This indicates that the impact of the losslessly compressed location map on the pure embedding payload can be ignored. In addition, the data embedding parameters should be self-embedded into the cover image in advance so that the data receiver is capable of extracting the secret data and reconstructing the original image. This can be achieved by using the least significant bits (LSBs) of some specified pixels to store the parameters. These specified pixels should be unchanged in the subsequent process. The original LSBs of these pixels should be recorded and embedded to ensure reversibility [
14].
Given the cover image
x, we divide the pixels to be embedded in
x into two disjoint sets
and
. The pixels in
will be used to predict the pixels in
. The pixels in
will be used for embedding secret data. Since the data embedding operation is reversible, once
has been embedded, we can further use the (modified) pixels in
to predict the pixels in
. Thus, the pixels in
can be used thereafter for data embedding. Without the loss of generalization, in the following, we will use
for pixel prediction and
for data embedding unless otherwise specified. It is noted that when we use
to predict
,
should be unchanged during the process of embedding secret data into
, and vice versa. We are free to determine
and
. In this paper,
and
are determined as:
Many existing RDH algorithms use a fixed predictor for pixel prediction, which does not take into account the difference between different local contexts and therefore may not accurately predict the pixels. In this paper, we propose a content-adaptive strategy for pixel prediction. Suppose that the pixels in
are used to predict the pixels in
, for each pixel
, as shown in
Figure 2, we determine three candidate prediction values by:
where
returns the largest integer that is no more than
x. The operation of “+0.5” is to adjust the prediction value to the corresponding nearest integer. Taking Equation (
4) for example, if
, then
will be equal to 11, rather than 10. The final prediction value of
, denoted by
, is selected from the candidate-set
according to the local context of
which is shown in
Figure 3. To this end, we determine a
priority for each element in
. In detail, let
denote the priority of
. We determine
by the following equation:
where
,
if
, and 0 otherwise. The feasibility of Equation (7) relies on the fact that adjacent pixels in natural images tend to have similar statistical characteristics, which inspires us to use the difference between the prediction values of adjacent pixels to determine which pixels can be embedded preferentially. In this way,
can be finally determined by:
where
. Thus, the PE of
is determined by:
which will be used for carrying secret data in the subsequent process. It can be concluded that the final prediction value of a pixel is actually selected from multiple candidate values, resulting in that the final PE value of the pixel is essentially selected from multiple candidate PE values as well. It can be said that, as an effective PE adjustment strategy, the proposed method for determining PE is more applicable in practice compared with many existing methods that use a fixed predictor. This is why we term the proposed pixel prediction method as
prediction error adjustment.
2.3. Connected Component Construction
After pre-processing and pixel prediction, we need to generate an ordered PE sequence that will be used for data embedding. We propose a novel method to construct the PE sequence. Suppose that
is used for pixel prediction and
is used for data embedding, our goal is to sort the pixels in
so that a pixel sequence and the corresponding PE sequence can be determined. We achieve this goal by applying connected component construction. Clearly, for any two different pixels
and
, they are adjacent to each other if we have
where
is a small integer threshold that needs to be pre-determined. If we model every pixel in
as a graph node and the adjacent relationship between two pixels as a graph edge connecting the corresponding two graph nodes, we are able to construct a graph. Without loss of generality, let
be the constructed graph, where
denotes the node set and
represents the edge set. Clearly,
G is a non-directed graph, which means that two edges
and
are equivalent to each other.
Figure 4 provides an example for constructing
, from which it is easily inferred that
G may contain multiple connected components. A connected component of
G is defined as such a sub-graph
that for any two different nodes
and
, there is at least one
path between
and
. The detailed pseudo-code to collect all the connected components of
G is shown in Algorithm 1, which is based on the classical graph search technique called depth-first search (DFS). It can be inferred from Algorithm 1 that the computational complexity of collecting all the connected components given
G is
, which is very efficient. Clearly, the number of connected components and the number of nodes of a connected component are both affected by
. Specifically, a larger
allows more pixels to be adjacent to each other in the graph, resulting in that the number of nodes in a connected component becomes larger, but the number of connected components becomes smaller. A smaller
makes the adjacent condition become more strict. As a result, the number of nodes in a connected component becomes smaller, but the number of connected components becomes larger.
Algorithm 1: The pseudo-code to collect all the connected components. |
Input:: a non-directed graph. |
Output:t: the number of connected components, : all the connected components.
- 1:
Initialize - 2:
for each node do - 3:
if v has been previously processed then - 4:
Continue - 5:
end if - 6:
Set - 7:
Initialize by and - 8:
Call DFS(v, , ) - 9:
end for - 10:
Sub-procedure DFS(v, , ) - 11:
Mark v as processed - 12:
for each do - 13:
if has been previously processed then - 14:
Continue - 15:
end if - 16:
Update and - 17:
DFS(, , ) - 18:
end for - 19:
End sub-procedure - 20:
returnt, , , …,
|
By setting a small
, the prediction values of most pixels in the same connected component will be close to each other. Based on this, it is reasonable to further assume that the original values of most pixels in the same connected component are close to each other. This indicates that the PEs of most pixels within a connected component are close to each other. Therefore, in order to generate an order PE sequence, it is natural to treat the pixels in the same connected component as equally important. For two different connected components, the one containing more nodes (i.e., pixels) has a higher priority for data embedding since the connected component containing more nodes is likely to have more smooth pixels which is more helpful for data embedding [
13,
14].
Based on the above analysis, we determine all the connected components of
, denoted by
, where
t is the total number of connected components. It is required that
and
We sort the connected components according to the number of nodes of a graph. To this end, we determine a permutation of
as
so that
where the
-th connected component has the maximum number of nodes, whereas the
-th connected component has the minimum number of nodes. It is noted that
can be easily determined by sorting
, whose computational complexity is
. In order to generate the ordered PE sequence, we process each of the sorted connected components in an orderly manner. First of all, we initialize the PE sequence as an empty sequence. Then, for each
, we sort all the pixels corresponding to
according to the local complexity function defined in [
13]. After sorting the pixels, we orderly append the corresponding PEs to the end of the above PE sequence. By processing all connected components, we can finally generate an ordered PE sequence. Clearly, during the construction of the PE sequence, for each PE in the sequence, we can easily identify the position of the corresponding pixel, which enables us to modify the corresponding pixel value in the subsequent data embedding procedure. It is worth mentioning that one may not use the local complexity function defined in [
13] for sorting the PEs of a connected component. It is free to define other local complexity functions to order the PEs. In summary, sorting the PE sequence in this paper requires us to build a graph and find all connected components. Therefore, we term this process as
connected component construction.
2.4. Data Embedding
We are now able to embed secret data into the sorted PE sequence by applying HS. Mathematically, we express the sorted PE sequence to be embedded as
. Two pairs of peak-zero bins, denoted by
and
, where
, are used to embed the secret data
into
by applying HS as mentioned above. Here, the peak bins
and
are used to carry secret bits. The bins in range
will be shifted to ensure reversibility. The remaining bins will be unchanged. For a given bit
to be embedded and a PE
, the PE carrying secret information
(also called marked PE) is determined by [
28]:
The sum of
and the prediction value of the corresponding pixel will be used as the final value of the marked pixel. We terminate the procedure of embedding secret bits when the secret data
is entirely embedded. In other words, there must be a PE position
where all the PEs
are unchanged. It is necessary to optimize the two pairs of peak-zero bins so that the distortion introduced by embedding
into
can be kept low. To this end, we apply the optimized method introduced in [
16] for determining the near-optimal
and
. It is highlighted that one may exhaust all possible
and
and find the optimal solution, given sufficient computational resources. Nevertheless, suppose that we have found
and
, as mentioned previously, we should self-embed
and
, as the data embedding parameters, into the cover image so that the data receiver can extract them before extracting secret data and recovering the cover image. In addition, the parameter
in Equation (
10) should be self-embedded as well.
2.5. Data Extraction and Image Recovery
By extracting the data embedding parameters from the marked image, the data receiver can successfully extract secret data from the marked image and meanwhile recover the original cover image. First of all, the receiver performs pixel prediction and connected component construction in the same way as the data hider, by which a sorted PE sequence carrying the secret information can be obtained. Then, with the data embedding parameters, the secret data can be fully extracted by processing the marked PEs in an orderly manner. In this way, the original secret information and the side information can be retrieved. With the side information, the cover image can be reconstructed without error since the embedding operation is reversible.
2.6. Effectiveness and Complexity Analysis
The technical motivation behind many existing RDH algorithms is that embedding secret bits into smoother pixels will result in better payload-distortion performance. This is based on the fact that smoother pixels are likely to be predicted with a higher prediction accuracy based on their local context. As a result, the prediction error histogram follows a Gaussian-like distribution centered at zero, which is very helpful for data embedding. In this paper, instead of directly exploiting smoother pixels for data embedding, we propose a connected-component-based method to collect the pixels with close PEs for data embedding preferentially. Though the PE values of the pixels in a connected component may not be closer to zero compared with many existing works, the resultant prediction error histogram is still Gaussian-like distributed, meaning that by optimizing the embedding parameters, superior payload-distortion performance can be achieved. In other words, the proposed method has better applicability and generalization ability.
The main contributions of the proposed method include two aspects. One is optimization of the pixel predictor. Unlike many existing methods which use a fixed pixel predictor, the proposed method predicts a pixel in such a way that the final prediction value of a pixel is adaptively adjusted to the most suitable value according to the local context. As a result, the prediction can be more accurate. Since only three prediction modes are used in the proposed method and the local context of a pixel only consists of four neighboring pixels, the complexity to determine the final prediction value for a single pixel is very low. In other words, the overall complexity to determine all the prediction values is linearly proportional to the number of pixels to be predicted, which is very suitable for practice. On the other hand, as mentioned previously, given the graph , the procedure of constructing all the connected components requires a complexity of , which is linearly proportional to the size of the node set and the size of the edge set. By using a small , can be reduced to , where k is a small coefficient. In other words, the complexity to determine all the connected components is also linearly proportional to the number of pixels to be predicted. Therefore, based on the above analysis, it can be concluded that the time complexity of the proposed method is low.
3. Performance Evaluation and Analysis
In this section, we conduct experiments for evaluating the performance of the proposed method. To this end, we take six standard test images
Airplane,
Lena,
Tiffany,
Peppers,
Baboon, and
Sailboat shown in
Figure 5 varying from smooth to complex for simulation. The test images are all sized at
. Furthermore, all values of the pixels are in the range
. As described in the previous section, the proposed method divides the cover pixels into two disjoint subsets
and
. Both subsets can be used for data embedding. That is, after data embedding with
,
can be used for data embedding as well. Therefore, given the secret data
(in the form of binary stream), we can use
to carry
secret bits. The remaining secret bits can be embedded into
. This type of payload partition strategy has been used in existing methods [
13,
16].
In order to demonstrate the superiority of the proposed method, we first show some visual examples for the proposed method.
Figure 6 shows the marked images of the test images with an embedding rate of 10,000 bits and 20,000 bits. It can be seen that the proposed method does not introduce noticeable visual artifacts. The reason lies in that the proposed method either keeps the pixels unchanged or modifies the pixels by
, which will not introduce significant distortion to the cover image and therefore provides very good visual quality of the marked images. To quantitatively evaluate the payload-distortion performance of the proposed method, we use peak signal-to-noise ratio (PSNR, in dB) to measure the visual quality of the marked image, i.e.,
where
and
represent the original value and the marked value of the pixel at position
. It can be seen that PSNR evaluates the difference between the original cover image and the corresponding marked image. A higher PSNR indicates that the difference between the original cover image and the corresponding marked image is lower, accordingly indicating that the marked image is visually better. When the size of the payload is specified, we expect to achieve as high a PSNR value as possible.
We use a threshold
to control the number of connected components. Since the construction of the sorted PE sequence is dependent on the constructed connected components, we need to analyze the impact of the threshold
on the payload-distortion performance. To this end, we take two representative test images
Lena (smooth image) and
Baboon (complex image) for necessary analysis.
Figure 7 shows the payload-distortion performance for the image
Lena and the image
Baboon due to different
. The abscissa represents the embedding rate, i.e., the size of the embedded payload, in bits. The ordinate represents the PSNR value. It can be inferred from
Figure 7 that different images have different payload-distortion performance and different
also result in different payload-distortion performance. For the smooth image
Lena, a smaller
is superior to a larger
at a low embedding rate, which is due to the reason that a smoother image enables us to collect more smooth pixels for data embedding by setting a small
. In contrast, for the complex image
Baboon, a relatively smaller
is not a good choice for data embedding since the difference between adjacent pixels in
Baboon is significantly larger than that in
Lena. As a result, using a relatively larger
is a better strategy at a low embedding rate so that the number of relatively smooth pixels in a connected component can be increased, thereby benefiting data embedding. From the overall trend, as the embedding rate increases, the performance difference due to different
can be controlled within a small range. This indicates that, from the viewpoint of payload-distortion optimization, we can optimize
in a small range so that the payload-distortion performance is superior and the computational complexity is acceptable for practice. To this end, in the following experiments, for all test images, we limit
to the range
and use the integer resulting in the highest PSNR (for a fixed embedding rate) as the final value of
. It is worth mentioning that one may use a very large
for RDH. However, this will reduce the performance, as shown in
Figure 8. The reason is that, a larger
will relax the constraint on the adjacent relationship between pixels, which can be easily inferred from Equation (
10). For example, in the extreme case that
, all the pixels to be embedded will be in the same connected component. In other words, there is only one connected component for
and the only one connected component is the graph itself. In this case, the proposed method degenerates to the traditional method.
Figure 8 further shows the payload-distortion performance for a large
, from which we can infer that it is desirable to use a small
. This is why we optimize
in range
in this paper, which not only provides good payload-distortion performance but also keeps the computational complexity low.
The above analysis shows that the proposed method has the potential to provide superior payload-distortion performance. To evaluate the payload-distortion performance of the proposed method, we compare the proposed method with some advanced PE-based RDH methods, i.e., DSP [
14], PPE [
14], GP [
29], SP [
13], MPE [
15] and BFS [
28]. Both DSP and PPE are introduced in [
14]. It is fair to compare the proposed method with these related works since they all focus on either improving the prediction accuracy or improving the pixel sorting procedure. For comparison, in our experiments, we use a randomly generated binary stream to represent the secret data to be embedded.
Figure 9 shows the payload-distortion performance for the proposed method and the related works. It can be seen that different images have different payload-distortion performance, which is reasonable because different images have different statistical characteristics. As the embedding rate increases, the PSNR will decline, which is also reasonable since a larger embedding rate means that more modifications to the pixels will be performed, thereby resulting in a larger distortion between the marked image and the original image. It can be inferred from
Figure 9 that the proposed method significantly outperforms the related works in terms of payload-distortion performance for all test images. We explain the reason as follows. DSP [
14] uses a dynamic predictor for pixel prediction, which is efficient for smooth images but not suitable for complex images. Moreover, since the prediction mode of one pixel is affected by the prediction mode of the adjacent pixels, the prediction procedure for DSP is time-consuming. PPE [
14] exploits the PE of a PE for data embedding, which is efficient for complex images. However, since the predictor is fixed for all pixels to be embedded, there is large room for further performance improvement. GP [
29] uses a gradient-based prediction strategy for pixel prediction, which is efficient for RDH. However, processing the pixels from top to bottom and from left to right may result in many pixels not carrying secret bits being modified, accordingly introducing a high distortion of the marked image. Though SP [
13] sorts the pixels to be embedded according to a well-designed local complexity function, the used data embedding parameters are not optimal for low embedding rates. Thus, it will introduce large distortion at low embedding rates. Similarly, for MPE [
15], the data embedding parameters need to be optimized so as to provide better payload-distortion performance. In BFS [
28], the prediction value of a pixel is determined as that of the adjacent pixel, which is not suitable for complex images. Moreover, the BFS method is time-consuming since it needs to optimize many parameters. Therefore, based on our experiments and analysis, it is true that the proposed method achieves better trade-off between embedding payload and embedding distortion compared with related works.