1. Introduction
Interest in information security technologies, such as image steganography and steganalysis, has grown significantly with the spread of digital multimedia and communication. Image steganography is a technique in which a secret message is embedded into an image, called the cover image, and the message-embedded image, called the stego image, is transmitted through a public channel without attracting the attention of a third party, thereby implementing covert communication. Image steganalysis is the reverse process of image steganography: it aims to determine whether the image under test contains a secret message and then to recover the hidden message.
The performance of image steganographic methods depends on two conflicting parameters: the embedding capacity, which represents how much message data can be hidden, and the image quality after embedding, which is closely related to message concealment. Consequently, most image steganographic methods achieve a high embedding capacity at the expense of image quality after embedding, or vice versa.
Early image steganographic methods include the least significant bit (LSB) substitution method [1], which replaces the least significant bits of image pixels with secret message bits, and the pixel value differencing (PVD) methods [2,3,4], which determine the amount of secret message data to be embedded in proportion to the difference between adjacent pixels. These early image steganographic methods sequentially embed secret messages into all pixels of an image, although they have recently been extended to embed messages in randomly selected pixels using pseudo-random generators for secure message hiding [5,6].
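As a minimal illustration of the LSB substitution idea described above, the following sketch embeds one message bit per pixel in sequential order. The function names `lsb_embed` and `lsb_extract` and the pixel values are hypothetical and chosen only for illustration; the sketch is not the implementation of any cited method.

```python
def lsb_embed(pixels, bits):
    """Replace the least significant bit of each pixel with one message bit."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | (bit & 1)
    return stego


def lsb_extract(pixels, n_bits):
    """Read the message back from the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Illustrative 8-bit gray values and message bits (made up for this example).
cover = [154, 155, 160, 97, 98, 201, 33, 34]
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, message)
recovered = lsb_extract(stego, len(message))
```

Each pixel changes by at most one gray level, which is why the embedding is visually imperceptible even though it alters the pixel statistics.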
Sequentially embedding secret messages into all pixels of an image is well known to change the statistical characteristics of the image. In Figure 1, the solid line refers to a probability density function (PDF) of the differences between two adjacent pixels on a cover image. The dotted lines refer to the PDFs on the stego images created by different image steganographic methods. The PDF of the LSB stego image is significantly different from that of the cover image in the region where the differences are small. This statistical difference is easily detected by statistical attacks, such as the RS analysis in [7]. Thus, image steganographic methods have come to consider more how not to be detected by steganalytic attacks than how many messages to embed. To avoid statistical attacks, image steganographic methods began to consider where the message would be embedded. Methods such as HUGO [8], WOW [9], and UNIWARD [10] tried to embed a message into only pixels with a small distortion, mainly on image edges, by analyzing the distortion caused by embedding a message into each pixel. For example, HUGO measured the embedding distortion by reverse-engineering the processes of the subtractive pixel adjacency matrix (SPAM) [11], a steganalytic method that calculated a co-occurrence matrix for the differences of the adjacent pixels in eight directions of vertical, horizontal, and diagonal to analyze the statistical changes in the pixel values caused by the message embedding. HUGO could reduce the probability of being detected by the SPAM by 1/7.
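The statistic discussed above, the distribution of differences between adjacent pixels, can be sketched as follows. This is a simplified illustration that uses only horizontally adjacent pixels; the function name `horizontal_difference_pdf` and the toy image are assumptions, and the full SPAM features additionally use eight directions and co-occurrences of the differences.

```python
from collections import Counter


def horizontal_difference_pdf(image):
    """Empirical PDF of differences between horizontally adjacent pixels."""
    counts = Counter()
    for row in image:
        # Pair each pixel with its right-hand neighbor.
        for left, right in zip(row, row[1:]):
            counts[left - right] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}


# Toy 2 x 4 gray-scale image (values made up for this example).
image = [
    [10, 10, 11, 12],
    [10, 11, 11, 13],
]
pdf = horizontal_difference_pdf(image)
```

Computing this PDF for a cover image and for its stego version, and comparing the two, corresponds to the kind of comparison plotted in Figure 1.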
The performance of image steganalysis has also improved greatly as image steganography has developed to hide messages more covertly and skillfully. Image steganalytic methods generally try to extract traces of image steganography in the image by using high-pass filters (HPF) and identify images to which image steganography has been applied through classification. Early steganalytic methods extracted image features using manually designed HPFs (those features are called handcrafted features hereafter) and detected image steganography using classifiers based on machine learning algorithms, such as support vector machines (SVM) [12] and random forest [13]. A representative method using handcrafted features is the spatial rich model (SRM) [14].
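To make the role of a high-pass filter concrete, the sketch below computes a noise residual with the 5 × 5 "KV" kernel that is widely used as a preprocessing filter in the steganalysis literature. It is an assumption made for illustration that this particular kernel is representative; it is not claimed to be the filter set used in this study, and the function name `residual` is hypothetical.

```python
# 5 x 5 KV high-pass kernel (its coefficients sum to zero); scaled by 1/12.
KV = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]


def residual(image, kernel=KV, scale=12):
    """Filter the image with the kernel over the 'valid' region (no padding)."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    out = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = sum(kernel[i][j] * image[y + i][x + j]
                      for i in range(k) for j in range(k))
            row.append(acc / scale)
        out.append(row)
    return out


# A flat image has no high-frequency content, so its residual is zero.
flat = [[100] * 6 for _ in range(6)]
flat_residual = residual(flat)

# Changing one pixel by a single gray level produces a nonzero residual.
bumped = [row[:] for row in flat]
bumped[2][2] += 1
bumped_residual = residual(bumped)
```

The filter suppresses the smooth image content and leaves mostly the small perturbations, which is where traces of embedding are expected to appear.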
With the great success of convolutional neural networks (CNN) in object detection and recognition [15,16], using CNNs for steganalysis has been actively investigated [17,18,19,20,21,22,23,24,25,26,27]. Unlike handcrafted feature-based methods, a CNN can automatically extract and learn the features that are optimal or well suited for identifying steganographic methods. Therefore, CNN-based steganalytic methods have demonstrated better performance than handcrafted feature-based methods.
However, most existing image steganalytic methods, regardless of whether or not CNNs are used, have focused on identifying whether or not a secret message is hidden in an image (i.e., the binary classification between a normal (or cover) image in which no message has been embedded and a stego image in which a message has been embedded). Discriminating stego images created by different steganographic methods has been less considered; thus, the binary classifiers are not suitable for discriminating these stego images. In particular, discriminating the stego images created by WOW and UNIWARD, which embed a message in a similar and skillful manner, is very difficult.
The classification of stego images created by different steganographic methods plays an important role in restoring embedded messages, beyond judging whether or not a message is embedded. In this study, as the first step toward restoring messages embedded by steganographic methods, a CNN-based steganalytic method is proposed to classify the stego images created by different steganographic methods. The structure of a ternary classifier is specially designed to distinguish between the stego images created by WOW and UNIWARD and the normal images without messages. Through comparative experiments with the existing binary classifiers, we show why multiple steganographic methods should be classified by a single ternary classifier and present various methods for improving the performance of the proposed ternary classifier.
Compared to existing image steganalytic methods, the primary contributions of this study are as follows:
- a single framework is provided for identifying multiple steganographic methods;
- a CNN-based ternary classifier is proposed for image steganalysis; and
- effective methods for extending a CNN to discriminate similar WOW and UNIWARD stego images are proposed and evaluated.
This study is an extension of [28] and differs from the previous study in the following respects:
- a CNN-based ternary classifier with a new preprocessing filter is proposed;
- more details for designing it are provided; and
- the performance of the proposed classifier is intensively evaluated.
The remainder of this paper is organized as follows: Section 2 briefly reviews the conventional image steganographic and steganalytic methods; Section 3 explains the proposed steganalytic method; Section 4 experimentally evaluates its performance using images from a database available online; and Section 5 presents the conclusions and suggestions for future work.
4. Experimental Results and Discussion
All the experiments presented in the previous sections and in this section were conducted under the following conditions: 10,000 grayscale images of 512 × 512 pixels in BOSSBase 1.01 [30] were quartered, and the resulting 40,000 images were divided into training and testing sets comprising 30,000 and 10,000 images, respectively. The stego images for both sets were generated with a random payload of 0.4 bpp using WOW and UNIWARD. (In most steganalytic studies, payloads of 0.1, 0.2, and 0.4 bpp have been used for testing steganalytic methods. However, when using adaptive steganographic methods, 0.1 and 0.2 bpp are too small to identify the stego images, even in binary classification [31]. The average PSNRs of the WOW and UNIWARD stego images at 0.4 bpp are 58.76 and 59.36 dB, respectively; thus, the image quality of the stego images at 0.4 bpp is still very high.) As a result, 90,000 training images of 256 × 256 pixels (30,000 each for cover, WOW stego, and UNIWARD stego images) and 30,000 testing images (10,000 each for cover, WOW stego, and UNIWARD stego images) were used. For training, a momentum optimizer [32] with a momentum value of 0.9 was used. The learning rate started at 0.001 and was reduced to 90% of its value every 5000 iterations. The minibatch size was 64 (32 pairs of cover and stego images). The other hyperparameters were set the same as in the conventional method [17]. All CNNs were implemented using the TensorFlow library [33].
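The data preparation and the learning-rate schedule described above can be sketched as follows. The sketch assumes that "quartered" means splitting each image into four non-overlapping quadrants and that the learning rate decays in discrete steps (a staircase schedule); neither detail is stated explicitly in the text, and the function names `quarter` and `learning_rate` are hypothetical.

```python
def quarter(image):
    """Split an image (list of rows) into four non-overlapping quadrants."""
    height, width = len(image), len(image[0])
    half_h, half_w = height // 2, width // 2
    return [
        [row[x0:x0 + half_w] for row in image[y0:y0 + half_h]]
        for y0 in (0, half_h)
        for x0 in (0, half_w)
    ]


def learning_rate(step, base=0.001, decay=0.9, decay_steps=5000):
    """Staircase decay: multiply the rate by `decay` every `decay_steps` steps."""
    return base * decay ** (step // decay_steps)


# A 4 x 4 toy image stands in for a 512 x 512 BOSSBase image.
toy = [[4 * r + c for c in range(4)] for r in range(4)]
tiles = quarter(toy)

# Dataset sizes implied by the text.
n_quartered = 10_000 * 4              # 40,000 images of 256 x 256
n_train, n_test = 30_000, 10_000
n_train_total = 3 * n_train           # cover + WOW stego + UNIWARD stego
n_test_total = 3 * n_test
```

With these assumptions, the learning rate stays at 0.001 for the first 5000 iterations and is 0.0009 for the next 5000.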
The proposed classifier was evaluated with different preprocessing filters. As a new preprocessing filter set, 16 Gabor filters were used together with the 10 selected SRM filters, as in [19]. Table 9 lists the classification rates for the cover, WOW stego, and UNIWARD stego images obtained using different preprocessing filters.
Unlike with the base CNN, using more filters and feature maps increased the classification rates; however, using too many filters of different types did not help. The 10 selected SRM filters (i.e., the proposed filter set) gave the best results. The experimental results demonstrated that cover, WOW stego, and UNIWARD stego images could be classified with an accuracy of approximately 72% by the single CNN-based ternary classifier proposed herein.
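For readers unfamiliar with how per-class classification rates and an overall accuracy are obtained for a ternary classifier, a minimal sketch is given below. The labels and predictions are invented toy values, not results from this study, and the function name `classification_rates` is hypothetical.

```python
CLASSES = ("cover", "WOW", "UNIWARD")


def classification_rates(true_labels, predicted_labels):
    """Return per-class rates (correct / total per class) and overall accuracy."""
    correct = {c: 0 for c in CLASSES}
    total = {c: 0 for c in CLASSES}
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    rates = {c: correct[c] / total[c] for c in CLASSES if total[c]}
    overall = sum(correct.values()) / sum(total.values())
    return rates, overall


# Toy labels and predictions (made up; they do not reproduce Table 9).
truth = ["cover", "cover", "WOW", "WOW", "UNIWARD", "UNIWARD"]
predicted = ["cover", "WOW", "WOW", "UNIWARD", "UNIWARD", "UNIWARD"]
rates, overall = classification_rates(truth, predicted)
```

Because the test set described above is balanced across the three classes, the overall accuracy equals the mean of the three per-class rates.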
We also attempted to change the tanh functions of the first two convolutional layers to TLU functions, as in [20], and the ReLU functions of the subsequent convolutional layers to leaky ReLU functions, but the classification rates did not improve (Table 10).
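The two alternative activation functions mentioned above can be written in scalar form as follows. The truncated linear unit (TLU) clips its input to the range [-T, T], and the leaky ReLU passes negative inputs scaled by a small slope. The threshold and slope values used here are illustrative assumptions, not the settings used in these experiments.

```python
def tlu(x, threshold=3.0):
    """Truncated linear unit: identity inside [-threshold, threshold], clipped outside."""
    return max(-threshold, min(threshold, x))


def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: identity for x >= 0, slope `alpha` for x < 0."""
    return x if x >= 0 else alpha * x


# Sample inputs spanning the clipped and unclipped ranges.
samples = [-5.0, -1.5, 0.0, 1.5, 5.0]
tlu_outputs = [tlu(x) for x in samples]
leaky_outputs = [leaky_relu(x) for x in samples]
```

Unlike tanh, the TLU is linear near zero and saturates abruptly, while the leaky ReLU differs from ReLU only in keeping a small response for negative inputs.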
5. Conclusions and Future Works
This study proposed a CNN-based ternary classifier to identify cover, WOW stego, and UNIWARD stego images. The existing binary classifiers were designed to learn and detect a specific steganographic method; hence, they were not suitable for discriminating different steganographic methods. Adaptive steganographic methods, such as WOW and UNIWARD, embed a small amount of the secret message in a similar manner; therefore, discriminating their stego images using the existing binary classifiers, or combinations of them, was very difficult. In contrast, the proposed ternary classifier could learn the difference between the two steganographic methods and discriminate them with an accuracy of approximately 72%. The classification between different steganographic methods using the proposed ternary classifier is a first step toward restoring the embedded message, rather than simply determining whether or not a message has been embedded.
It was experimentally confirmed that, in designing a CNN-based ternary classifier for image steganalysis, simply expanding the width or depth of the CNN does not guarantee performance improvements. In other words, the CNN width and depth need experimental optimization. This study demonstrated the results of such an experimental optimization.
The proposed method had an accuracy of approximately 72%, which is not very high. Therefore, ways to improve the accuracy by further highlighting the differences between WOW and UNIWARD must be explored in the future. Ways to design a CNN-based classifier suitable for classifying a larger number (≥3) of steganographic methods, including those with other embedding domains (e.g., DCT and wavelet domains), must also be explored.