1. Introduction
Data are an essential part of different processes, including manufacturing [1,2], social communication [3,4], economics and finance [5,6], medicine [7] and healthcare [8], media and education [9,10], sport [11], entertainment [12], etc. Analyzing the current trends, one may emphasize the following features:
an explosive increase in data volumes [13];
high data protection requirements [14,15];
widespread use of machine learning [16,17];
fast growth in edge computing [18,19].
Storing, processing and transferring big data via networks requires considerable memory, time, computational, and traffic resources. In order to reduce these expenses, numerous data compression techniques and algorithms are applied [20]. However, in most cases, such algorithms do not provide data protection, which makes applying additional encryption methods a must. Various approaches to solving this task have been developed [21]. One problem is that their use increases computational costs.
Digital images occupy a particular place in big data. They are widely used in remote sensing [22,23], environmental monitoring [24,25], agricultural development [26,27], etc. Modern communication via popular messengers and social networks is impossible without digital photos [28].
Due to the wide availability of smartphones and various gadgets, the total number of photos is huge [29]. Moreover, modern sensors provide images of a very high resolution with millions of pixels [30], which makes their processing and analysis a challenging task, especially when performing computations on edge devices or by autonomous robots [31,32]. Hence, the large size of image data samples increases the load on communication networks. In addition, this feature leads to high encryption costs.
A natural solution to the problem of high resource expenses could be the application of image compression algorithms with built-in data encryption features. Discrete atomic compression (DAC) is such an algorithm [33,34].
The DAC algorithm is based on the use of a special class of atomic functions introduced by V. Rvachev [35]. It has low time and spatial complexity [36,37]. For this reason, DAC can be considered a lightweight self-encrypting image compression technique. Similar to many other coders, DAC has two compression modes: lossless and lossy. The former ensures exact reconstruction of compressed data, whilst the latter provides a significantly higher compression ratio at the expense of a certain quality loss [38]. However, due to the designed distortion control mechanism, one can achieve visually lossless compression [33] or, at least, acceptable losses. Then, the following question naturally arises: what is the impact of the quality loss introduced by DAC on the efficiency of subsequent image analysis and, in particular, classification?
Image classification is a particular task in computer vision [39]. It processes image data and assigns a class label or category to a whole image or its parts. There are many methods that provide solutions to this task, and the use of convolutional neural networks (CNNs) has recently demonstrated a very high efficiency [40].
CNNs extract image features that ensure classification. When applying lossy compression, distortions are introduced, and some particular features might be lost or considerably deteriorated. The aim of this paper is to study the impact of the quality loss produced by the DAC algorithm on the performance of MobileNetV2 [41], VGG16 and VGG19 [42], ResNet50 [43], NASNetMobile and NASNetLarge [44]. These networks can be treated as modern, state-of-the-art image classifiers. They have common features in combination with a set of major differences. Further on, we discuss them briefly.
There are several versions of the DAC algorithm, each with certain peculiarities. In this paper, we consider three preprocessing modes of DAC, namely classic, block-splitting and chroma-subsampling [37], and explore their impact on the efficiency of the classifiers mentioned above, which constitutes the novelty of this research. We show that usually only minor negative effects on classification performance are produced, which means that lossy compression by the DAC algorithm preserves the principal features of a processed image. This is the main contribution.
The paper is organized as follows. First, the DAC algorithm is presented, and its particular properties are discussed. Second, a test image dataset is compressed, and the impact of quality loss on classification accuracy is investigated. Next, the obtained results are discussed, and, finally, the conclusions follow.
2. Discrete Atomic Compression
The DAC algorithm is based on the classic lossy image compression approach, involving discrete data transform, quantization and encoding.
Figure 1 shows its main stages [33] under the assumption that it is applied to conventional RGB color images or other types of three-channel data represented as RGB images.
The input for DAC is a 24-bit full color image given by a matrix of red (R), green (G) and blue (B) color intensities. The output is a byte array.
At the first step, preprocessing is applied. It involves the YCrCb color space transform, which computes three matrices: one luma (Y) and two chroma (Cr, Cb) components [45]. As said above, there are three modes:
“classic”—the obtained matrices Y, Cr and Cb are moved to the next stage without any changes;
“block-splitting”—the components Y, Cr and Cb are split into blocks of the same size (the common size is 512 × 512);
“chroma-subsampling”—the components Cr and Cb are processed using the 4:2:0 subsampling scheme [45].
Further, the obtained matrices are processed independently.
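For illustration, a minimal NumPy sketch of this preprocessing stage is given below. The exact color transform coefficients and the handling of image borders in DAC are assumptions here, based on the standard ITU-R BT.601 definition and the description above.

```python
import numpy as np

def rgb_to_ycrcb(rgb):
    """Full-range ITU-R BT.601 RGB -> Y, Cr, Cb conversion (assumed variant)."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return y, cr, cb

def split_into_blocks(channel, size=512):
    """'Block-splitting' mode: cut a channel into size x size tiles (edge tiles may be smaller)."""
    h, w = channel.shape
    return [channel[i:i + size, j:j + size]
            for i in range(0, h, size)
            for j in range(0, w, size)]

def subsample_420(chroma):
    """'Chroma-subsampling' mode: 4:2:0 scheme, i.e. average each 2 x 2 block of a chroma channel."""
    h, w = chroma.shape[0] // 2 * 2, chroma.shape[1] // 2 * 2
    c = chroma[:h, :w]
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0
```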
Next, a discrete atomic transform (DAT) is applied. DAT is a discrete wavelet transform based on the application of V.A. Rvachev's atomic functions ups(x) [35]. The output of this step is a set of matrices of DAT coefficients. Their number is equal to the channel count of the source image. Each of these matrices consists of blocks that contain DAT coefficients corresponding to a specific frequency band. A sample structure is shown in Figure 2.
Further, the DAT coefficients are quantized. The following formula is applied:
v = round(w/q).  (1)
In (1), w and v are, respectively, the source and quantized DAT coefficients, and q is a quantization coefficient; rounding-off is performed to the nearest integer.
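A minimal sketch of Equation (1) and of the corresponding dequantization used at the reconstruction stage is given below; the rule that maps UBMAD to the per-band quantization coefficients q is not reproduced here and the interface is an assumption.

```python
import numpy as np

def quantize(w, q):
    """Equation (1): v = round(w / q), rounding to the nearest integer."""
    return np.rint(w / q).astype(np.int64)

def dequantize(v, q):
    """Approximate reconstruction of DAT coefficients at the decompression stage."""
    return v.astype(float) * q
```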
It is the quantization step that produces distortions. In DAC, the choice of quantization coefficients is specified by a parameter called the upper bound of maximum absolute deviation (UBMAD), which is used in the quality-loss control mechanism [33]. By varying UBMAD, one may obtain results with the desired distortions measured by the maximum absolute deviation (MAD), root mean squared error (RMSE), and/or peak signal-to-noise ratio (PSNR). A larger UBMAD leads to larger MAD and RMSE and a smaller PSNR.
Finally, the quantized DAT coefficients are losslessly compressed by a combination of Golomb codes and binary arithmetic coding [20]. This step produces a byte array containing the compressed image.
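As an illustration of the entropy coding stage, the sketch below shows a Golomb-Rice code (a Golomb code with a power-of-two parameter) for signed integer coefficients. The actual DAC coder combines Golomb codes with binary arithmetic coding [20]; the zig-zag mapping and the parameter choice here are assumptions made for the sake of a self-contained example.

```python
def zigzag(v):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v if v >= 0 else -2 * v - 1

def golomb_rice_encode(values, k):
    """Golomb-Rice code with parameter m = 2**k: unary-coded quotient + k-bit remainder."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.extend([1] * q + [0])                             # quotient in unary, terminated by 0
        bits.extend((r >> i) & 1 for i in reversed(range(k)))  # remainder as k bits, MSB first
    return bits

# Example: golomb_rice_encode([0, -3, 5], k=2) returns a list of 0/1 bits.
```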
To provide lossless compression by DAC, it has been proposed in [34] to apply one extra step that complements the previously obtained byte array with additional information ensuring image reconstruction without distortions. So, DAC has two modes: lossy and lossless. The lossless mode has a slightly higher complexity and provides a lower compression ratio (CR) than the lossy one. In this research, we concentrate on lossy image compression by the aforementioned DAC algorithm.
Besides, in DAC, image encryption is provided by varying the structure of the DAT [38]. The number of admissible structures of this discrete transform is extremely large, and the exact specification must be known for correct reconstruction of a compressed image. This means that data protection is a built-in feature of DAC.
3. Classifying Compressed Images
Lossy compression by DAC produces distortions, and some particular image features might be lost. It is reasonable to expect that this might decrease the accuracy of subsequent classification of decompressed images. In this Section, we explore this impact.
We start with a brief description of the selected neural networks. Then, we perform the following study. A large set of digital images from different content classes is taken. This is carried out to check the generality of the observed tendencies and to obtain statistics. Next, each image from this dataset is compressed by DAC with various preprocessing modes and quality loss settings. After this, the obtained data are decompressed. Further, image classifiers are applied to each decompressed image and the corresponding source image, and the classification results are compared. Finally, the results are aggregated.
3.1. Models
In this research, we apply the MobileNetV2, VGG16, VGG19, ResNet50, NASNetMobile and NASNetLarge models with pre-trained weights. Their parameters have been fitted using the ImageNet database [46]. This image collection contains more than 1.2 million training samples of one thousand classes. We use the TensorFlow Keras tools, which provide out-of-the-box access to many state-of-the-art neural networks, including the selected ones [47].
Each of the selected models is a deep CNN. They have different architectures, in particular regarding the numbers of layers and parameters. However, there is a set of common features. We consider them in more detail below.
VGG16 and VGG19 are neural networks constructed using the so-called conv-blocks followed by fully connected layers. Each of these blocks consists of several sequential convolutional layers combined with a pooling layer. The VGG16 classifier has 16 layers and 138.4 million parameters. The VGG19 classifier, which is a bigger version of VGG16, has 19 layers and 143.7 million parameters.
Next, the ResNet50 model belongs to the family of residual networks. They are built using residual blocks that consist of convolutional layers and residual connections, which enable training of very deep networks. In the current research, we use the model with 107 layers and 25.6 million parameters.
Further, MobileNetV2 represents the MobileNet family of networks designed for on-device computer vision. MobileNetV2 consists of 55 layers and has 4.3 million parameters. Its building blocks combine expansion, depthwise and projection operators with residual connections. Such structures, called bottlenecks, significantly reduce the number of operations, which is of particular importance when deploying a model in edge computing systems.
Finally, NASNetMobile and NASNetLarge are CNNs obtained using neural architecture search (NAS). These models consist of 389 and 533 layers, respectively. The NASNetMobile has 5.3 million parameters, and NASNetLarge has 88.9 million parameters.
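To make the bottleneck structure of MobileNetV2 mentioned above more concrete, the following is a simplified Keras sketch of such a block. Batch normalization and other details of the original architecture are omitted; the layer arrangement follows the description above and is not the exact implementation used in tf.keras.applications.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck(x, out_channels, expansion=6, stride=1):
    """Simplified MobileNetV2 bottleneck: 1x1 expansion -> 3x3 depthwise -> 1x1 linear projection,
    with a residual connection when the input and output shapes match."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(expansion * in_channels, 1, use_bias=False)(x)                   # expansion
    h = layers.ReLU(6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)   # depthwise
    h = layers.ReLU(6.0)(h)
    h = layers.Conv2D(out_channels, 1)(h)                                              # linear projection
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])                                                       # residual connection
    return h
```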
So, in this research, we explore the impact of lossy compression by the DAC algorithm on image classifiers that are neural networks with substantially different properties. Their numbers of layers and parameters span wide ranges. This explains the motivation behind the choice of models. We stress that we use models with pre-trained parameters fitted on the training set of the ImageNet database.
3.2. Test Data
In the current research, a set of 131 images from the ImageNet database [46] is used. Samples are taken from 10 classes (see some examples in Figure 3). The source images are in the JPEG format. The total size is 11.3 MB (JPEG) and 94 MB (raw). The number of image pixels varies from 43,200 to 1,707,200.
In this research, we use open-source digital images with military content. This choice is due to the necessity of developing and applying autonomous systems for the detection and identification of damaged or destroyed military equipment [31]. The use of unmanned aerial vehicles and computer vision (CV) methods for explosive ordnance search is promising. Training highly efficient CV models requires huge amounts of data and computational resources. Therefore, it can be very useful to apply already trained CNNs in combination with transfer learning and fine-tuning. As stated above, lossy compression introduces distortions that might impair the feature extraction ability. So, investigating this impact is of particular importance, especially when processing images with military content.
3.3. Compression
Each sample from the selected image dataset is compressed using DAC. The “classic”, “block-splitting” and “chroma-subsampling” preprocessing modes are applied. The following values for UBMAD, which specify quality loss, are used: 36, 63, 95 and 155.
Source and reconstructed images are available at the link:
https://drive.google.com/drive/folders/1u9m3amxV-7kKmxMfyhlIXWIeGzw_iqSJ?usp=sharing (accessed on 27 June 2024). This Google Drive folder contains four sub-folders: “0. CLASSIC”, “1. BLOCK” and “2. CHROMA”, which contain the results obtained using the “classic”, “block-splitting” and “chroma-subsampling” preprocessing modes, respectively, and “RESULTS”, which contains data regarding their further analysis.
The distortions produced are evaluated using the MAD, RMSE and PSNR metrics (quality loss indicators). In addition, the compression efficiency measured by the compression ratio (CR) is analyzed, and the total time of compression and decompression of the whole test dataset is computed. We note that image processing by DAC was performed on an AMD Ryzen 5 5600H CPU.
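The quality-loss indicators used here can be computed as follows (a straightforward NumPy sketch for 8-bit images):

```python
import numpy as np

def quality_metrics(source, decompressed):
    """MAD, RMSE and PSNR between a source image and its decompressed version (8-bit data)."""
    d = source.astype(float) - decompressed.astype(float)
    mad = np.max(np.abs(d))
    rmse = np.sqrt(np.mean(d ** 2))
    psnr = float("inf") if rmse == 0 else 20.0 * np.log10(255.0 / rmse)
    return mad, rmse, psnr
```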
Analyzing the obtained results, we see that for any UBMAD:
the “chroma-subsampling” mode produces greater distortions than “classic” and “block-splitting” but provides a higher compression ratio;
the “classic” and “block-splitting” modes introduce nearly the same quality loss measured by MAD, RMSE and PSNR;
the “classic” mode compresses slightly better than the “block-splitting” one;
“block-splitting” has the best time performance; decompression is performed slightly faster than compression;
for the considered range of UBMAD, we basically deal with visually lossless compression (the average PSNR exceeds 35 dB), although the introduced distortions can be noticed by visual inspection for some particular images at UBMAD = 155.
Further, we classify each pair of source and decompressed images using MobileNetV2, VGG16, VGG19, ResNet50, NASNetMobile and NASNetLarge, and explore the impact of the produced distortions on their performance.
3.4. Classification
The following investigation procedure is used for each selected model. First, each source image is classified, and the computed class label is compared with the true one. Then, we apply classification to each decompressed image corresponding to a correctly classified source sample. The results obtained are stored in CSV files available at the link to the Google Drive folder given above. Due to page limitations, only the aggregated results are presented. In Table 4, Table 5 and Table 6, the percentage of correctly classified decompressed images is given for each DAC preprocessing mode and each UBMAD value specifying quality loss.
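For clarity, the sketch below shows this evaluation protocol for one of the models (MobileNetV2). The list of (source path, decompressed path, true label) triples is a hypothetical input; the other models differ only in their preprocess_input function and input resolution.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions

model = tf.keras.applications.MobileNetV2(weights="imagenet")

def top1_label(path, size=(224, 224)):
    """Top-1 ImageNet class (WordNet id) predicted for an image file."""
    img = tf.keras.utils.load_img(path, target_size=size)
    x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
    return decode_predictions(model.predict(x, verbose=0), top=1)[0][0][0]

def accuracy_on_decompressed(samples):
    """Percentage of correctly classified decompressed images among those whose source
    counterparts were classified correctly; `samples` is a hypothetical list of
    (source_path, decompressed_path, true_wnid) triples."""
    correct = [(dec, y) for src, dec, y in samples if top1_label(src) == y]
    kept = sum(1 for dec, y in correct if top1_label(dec) == y)
    return 100.0 * kept / len(correct)
```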
Further, we analyze the obtained classification results.
3.5. Analysis
Analyzing the data in Table 4, Table 5 and Table 6 and Figure 11, Figure 12 and Figure 13, we see that, in all cases, the percentage of correctly classified images is less than 100%, which, in general, indicates the expected negative impact of the distortions produced by the DAC algorithm. Meanwhile, the percentage is higher than 94%, which means that the effect of quality loss can be considered insignificant. Moreover, the differences between the computed classification performance indicators are minor. These results might be explained by the models' depth and number of parameters. Indeed, the data in Table 7 show that the considered networks are deep.
Combining the classification results with the architectural features given in Table 7, we conclude that, in most cases, deeper networks provide slightly better performance. At the same time, the number of parameters does not have such an impact. Indeed, the VGG16 model has 32 times more parameters than MobileNetV2; however, these models have nearly the same percentage of correctly classified decompressed images. In addition, we see that, when applying the “chroma-subsampling” mode, which produces the highest distortions (see Figure 4, Figure 5 and Figure 6), MobileNetV2 and VGG16 demonstrate the highest robustness to the introduced distortions.
Finally, the models' performance may not behave monotonically with respect to quality loss. In the case of the VGG19 network, an increase in distortion even has a positive impact. This feature is also observed in several other particular cases.
4. Discussion
The results obtained in the previous Section imply that the distortions produced by the DAC algorithm with various quality loss settings and different preprocessing modes have minor negative effects on the classification performance of MobileNetV2, VGG16, VGG19, ResNet50, NASNetMobile and NASNetLarge. In our opinion, this is due to the combination of several factors.
First, the core of DAC is the DAT, which is based on atomic functions that have good constructive properties in terms of approximation theory [35,36]. Indeed, the corresponding functional spaces are asymptotically extremal for the approximation of wide classes of smooth functions. For this reason, data, in particular digital images, are well represented by DAT coefficients. Hence, high-frequency coefficients can be quantized with large quantization coefficients without a significant impact on the particular features of the processed image.
Second, the neural networks considered are based on convolutions, which are robust to various distortions. In combination with the previous factor, this preserves the high efficiency of the explored models, at least for the considered range of UBMAD variation. This is of particular importance, since many other models use the considered CNNs as image feature extractors. So, the following statement seems to be correct: lossy image compression by DAC has a minor negative impact on any model constructed on the basis of the selected models, in particular by applying transfer learning and fine-tuning techniques.
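For example, a typical transfer learning setup on top of one of the considered CNNs looks as follows. This is a minimal sketch with a hypothetical 10-class task: the base model is frozen and only a small classification head is trained, e.g. on DAC-decompressed images.

```python
import tensorflow as tf

# Reuse a pre-trained CNN as a frozen feature extractor (transfer learning).
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),  # hypothetical 10-class head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Fine-tuning: unfreeze the top layers of `base` and continue training with a small learning rate.
```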
Next, comparing the applied preprocessing modes, one may conclude that the “block-splitting” mode has the best time performance. The main reason is that this mode possesses memory localization features due to the small data buffers employed for compression and decompression [37]. Such algorithms ensure very high performance due to efficient use of memory caches [48]. Also, if we compare the memory expenses required for storing source and compressed data (see Figure 14a), we see that they can be significantly reduced with a minor impact on the further classification accuracy.
The “block-splitting” mode of DAC can be recommended for systems with low computational capabilities, for instance, edge computing [19].
Further, for each UBMAD, the best compression is provided by the “chroma-subsampling” mode of the DAC algorithm. However, the quality loss measured by the MAD, RMSE and PSNR indicators is larger than for the other modes. Comparing the source and decompressed samples, one can see that the distortions are hard to notice by the human eye (Figure 15). In other words, visually lossless compression is obtained when UBMAD is not greater than 155. Taking into account the memory expense reduction (Figure 14b) in combination with the minor impact on the classifiers' performance, this preprocessing mode of DAC can be recommended for compressing digital photos and other types of still images.
We stress that the source images are given in the JPEG format. Figure 14 shows the memory expenses required for storing raw data reconstructed from the JPEG files. For this reason, Figure 14 should not be considered a comparison of DAC with JPEG in terms of lossy image compression. An appropriate exploration has been carried out in our previous research [49]. In particular, it has been shown that DAC provides better compression than JPEG at the same loss of quality measured by PSNR.
Finally, the “classic” mode of DAC does not demonstrate superiority over the other modes. Indeed, in terms of image compression metrics, this mode is similar to “block-splitting”, but it is slower and has higher spatial complexity. Nevertheless, when using the “classic” mode, the matrices of DAT coefficients contain a representation of the whole image, not of its separate blocks. This feature might be of particular importance in constructing DAT-based machine learning methods.
5. Conclusions
In this research, it has been shown that the distortions produced by the DAC algorithm with different preprocessing modes and quality loss settings (UBMAD ≤ 155) have no significant impact on the performance of the MobileNetV2, VGG16, VGG19, ResNet50, NASNetMobile and NASNetLarge classifiers. Hence, these deep convolutional neural networks are robust to lossy (at least, visually lossless) compression by DAC. So, this algorithm can be recommended for reducing memory expenses when classification of decompressed images is subsequently applied. It has been shown that considerable memory savings can be obtained using the “classic”, “block-splitting” and “chroma-subsampling” preprocessing modes of DAC.
The “block-splitting” mode demonstrated the best time performance, which, in combination with its low spatial complexity, makes it preferable for application in edge computing. Also, taking into account the data protection features of DAC, we state that this algorithm can be recommended for use in imaging systems where protection is important, in particular in systems installed on unmanned aerial vehicles [31,50].
It has been demonstrated that DAC with “chroma-subsampling” ensures the highest compression. In addition, despite higher values of the quality-loss metrics, this mode can be considered visually lossless due to the absence of noticeable visual distortions.
Finally, it follows that DAT coefficients contain the image features required for highly effective classification by deep convolutional neural networks. So, the image representation used by DAC can be positioned as machine-learning oriented.