1. Introduction
A general tendency in modern remote sensing (RS) imaging is to acquire data with higher resolution, for larger territories and more frequently [
1]. As a result, the collected data are very large and raise typical big-data problems, i.e., difficulties with image processing, transfer, storage, and dissemination [
2]. These difficulties with the RS data can be partially solved by their compression [
3]. As is widely known, there are lossless and lossy image compression techniques that may be applied for general-purpose images [
4]. The former group of methods allows original (undistorted) data to be obtained after decompression. However, in this case, the compression ratio (CR) is relatively small (limited by image entropy) and may reach values of around 5:1 only for hyper-spectral images with very high inter-band correlation [
5]. Nevertheless, a larger CR is often needed in practical applications, particularly for RS images; hence, the only possibility is the use of lossy compression with inevitable distortions [
6].
An interesting analysis of the influence of the lossy compression on the quality of aerial images using a weighted combination of qualitative parameters is presented in the paper [
7], where the multi-criteria decision-making framework has been proposed for quality evaluation. Although there are many different requirements for lossy compression, their priority and type (obligatory or desired) usually depend on the application [
8]. For example, there may be an initial condition to provide the minimum required CR, which may happen if the channel bandwidth and/or the data transfer time are limited. In such a situation, a coder should provide a simple way to set the desired CR (as JPEG2000-like methods do) and keep the introduced distortions small to prevent their negative influence on further image processing, including the results of image analysis or classification studied in the paper [
9] and object recognition effectiveness depending on the RS image quality [
10]. For multi-channel data (e.g., color, multi-spectral, hyper-spectral), the compression performance may be improved by a preliminary decorrelation and the use of three-dimensional (3D) approaches for compression [
11,
12,
13]. Another recent direction of research is the application of deep neural networks for the optimization of observer-dependent image compression towards a trade-off between the human visual system and classification accuracy [
14]. However, in some applications, the priority is different: the introduced distortions (characterized by a specified measure or metric) must stay below a given level while the CR is kept as high as possible. The related issues include fast and efficient control of the distortion level [
15], and the choice of an adequate metric to characterize distortions, considering a task to be further solved using compressed data [
16]. Some additional requirements may concern a certain format (according to some standards), compression speed, or, e.g., limited power consumption [
17].
To satisfy the above requirements concerning the metrics, some pre-requisites have appeared recently, and considerable attention has been paid to metrics able to characterize the quality of RS data [
16,
18,
19], including the artificial visible-like images based on SAR data generated with the use of deep CNNs [
20]. In particular, special attention has been paid to the so-called visually lossless compression for RS image browsing and other applications [
12,
13,
21,
22,
23]. This is important since compressed RS images are often subject to visual inspection. However, JPEG2000 is not the best available compression technique, and the well-known full-reference image quality metric SSIM used in [
12,
13] is likewise not the best among existing visual quality metrics. Perceptual quality has also received attention in the papers [
24,
25]. The necessity to provide a desired compression ratio and quality quickly enough is important in practical applications where processing time and resources are limited [
26]. No-reference metrics potentially can also be used for quality assessment purposes [
27]; however, to date, no-reference metrics are less adequate in characterizing image quality than full-reference ones.
It has been demonstrated recently [
16] that the Mean Deviation Similarity Index (MDSI) [
28] and some other elementary metrics can perform well in the characterization of three-channel RS images with distortions typical for remote sensing imagery, including distortions caused by lossy compression. It has also been shown that lossy compression, under certain conditions, can lead to practically the same or even better performance of image classification compared to the classification of original uncompressed data [
29,
30,
31,
32,
33]. This may happen when noise suppression is observed or if distortions cannot be detected visually [
33]. This means that two benefits can be obtained simultaneously: the CR differs substantially from unity, and the classification is improved (or, at least, not worse). Additionally, considerable work has been carried out towards accelerating lossy compression while attaining a predefined desired quality. For this purpose, an iterative compression [
34] has been found to be accurate, but it requires an unpredictable number of iterations, which might cause problems with the time and computational efficiency of compression. To solve this issue, two-step methods and algorithms have been proposed and studied [
35,
36]. It has been demonstrated that lossy compression providing a given quality according to a chosen quality metric can be carried out with the appropriate accuracy in two steps.
For the first step, one needs an average dependence of a chosen metric on a compression controlling parameter (sometimes referred to as the
PCC [
35,
36]; however, in this paper, the abbreviation
CCP is used to avoid confusion with the Pearson Correlation Coefficient, typically used in image quality assessment), obtained in advance for a set of basic images (see details in
Section 4). Knowing such a dependence, it is possible to determine the initial
CCP that corresponds to a desired quality metric value according to the average rate-distortion curve, e.g., the dependence of the Peak Signal-to-Noise Ratio (PSNR) on the quantization step (QS) or on the number of bits per pixel (bpp). The value of the metric for the first step is obtained after compression and decompression of an image with the initial CCP. After this, the CCP is corrected using linear interpolation and the average rate-distortion curve to obtain the final CCP, which is used for the final compression in the second step. Such a two-step procedure is quite fast, accurate, and universal, working well, in particular, for coders based on the Discrete Cosine Transform (DCT) and wavelets (e.g., SPIHT).
Moreover, an important feature of the proposed two-step approach is that it may be effectively applied to Better Portable Graphics (BPG), a novel compression method intended to replace JPEG owing to its considerably better performance compared to JPEG, JPEG2000 [
21], and some other popular lossy compression techniques. Compression characteristics can be varied by the so-called quality parameter Q; its increase leads to a larger compression ratio but more distortions introduced for the BPG. However, the same value of the Q parameter leads to different quality if it is characterized by a certain quality metric [
36], e.g., the PSNR-HVS-M metric [
37], where HVS denotes the human visual system and M stands for masking. This means that Q should be adjusted depending on the individual image subject to compression, particularly for grayscale images, as shown in the paper [
36]. However, in remote sensing practice, many modern systems acquire multi-channel images, for which the use of 3D compression is expedient.
Thus, the first research goal is to check whether or not it is possible to apply the two-step method for compressing multi-channel RS images—more precisely, three-channel images that include color images and vision range data of multi-spectral imagery. Since there are no commonly accepted quality metrics for an arbitrary number of RS data components, there is also a need to evaluate some recently proposed metrics in terms of their applicability for the quality control of such images. Hence, the second goal of the paper is to investigate some important properties of the MDSI metric, pre-selected as the most appropriate, and to verify its usefulness for the proposed two-step approach. We also analyze the degree of accuracy of the MDSI that should and can be achieved in practice.
The original contributions of the paper are related to the following:
To the best of our knowledge, MDSI has never been used or analyzed for lossy image compression in general or for the lossy compression of remote sensing data in particular;
MDSI is shown to be very useful for the considered application due to several benefits it provides—in particular, high linear and rank-order correlation of MDSI with MOS values for the types of distortions under interest and fast calculation;
The main areas of MDSI values have been determined and the behavior of the MDSI metric for them has been analyzed;
The analysis has been carried out for the BPG coder that outperforms known standards and provides performance comparable to the state-of-the-art compression techniques;
The two-step procedure of providing the desired quality has been tested for the considered metric and coder, revealing peculiarities related to the CCP (integer values of the Q parameter).
The paper structure is as follows.
Section 2 considers some properties of three-channel images. Some important properties of the MDSI metric are obtained using color images from the TID2013 database [
38]. Then, the proposed two-step procedure is described in
Section 3 and its implementation for the BPG is studied in
Section 4. Some simulation results are provided in
Section 5 and the obtained accuracy is discussed in
Section 6, followed by the conclusions.
2. Metrics for the Assessment of the Visual Quality of Three-Channel RS Images
2.1. Properties of Three-Channel RS Images
Many modern RS systems produce multi-channel images, where the term “multi-channel” covers multi-spectral, hyper-spectral, dual- and multi-polarization radar data, etc. In this paper, we concentrate on three-channel images for three main reasons. Firstly, together with dual-polarization radar data, three-channel images are the simplest example of multi-channel ones and are convenient to process and analyze. Methods and results obtained for three-channel images can then be generalized to images with a higher number of channels. Secondly, three-channel RS images can be easily visualized as color ones, which makes visual analysis simpler than for multi-channel images represented in pseudo-colors. Although the wavelengths of the channels of multi-spectral images usually do not coincide with the wavelengths in traditional RGB representations, visualized three-channel images usually look quite clear for an observer. Finally, it is expected that the BPG compression can be extended to images with a higher number of channels. However, at the moment, only the BPG version designed for the compression of three-channel color images is available.
Compared to conventional color images (photos), RS images have some specific features. First of all, they are usually more highly structured and each object has a semantic meaning [
39], whereas natural images are more chaotic. The objects present in RS images need to be analyzed in further stages of data processing, particularly target recognition, classification, segmentation, and parameter estimation. For example, image segmentation, i.e., partitioning an image into a set of segments that are homogeneous in terms of color or texture, is highly important for remote sensing data [
40]. Meanwhile, the fact that RS images often include large areas of background, which is much less important than foreground objects [
41], may also be taken into account in compression.
Secondly, the correlation between component images of three-channel RS data can differ from the correlation of red, green, and blue components of color images. Whilst, for color images, the cross-correlation factor is usually around 0.7 [
42], the correlation factor between components of three-channel RS data may be significantly higher [
5]. This might influence noticeably the final compression performance if 3D approaches are to be applied.
Since individual channels of RGB images are represented as 8-bit data and such components of RS images might be initially represented in another way, e.g., as 16-bit data, it may be difficult to adequately compare the compression of color and three-channel RS images. Therefore, three-channel images with the 8-bit representation of channels (also referred to as component images) are considered in a further part of the paper. Additionally, if component images originally have more than 8 bits, it is supposed that they are pre-normalized to 8 bits before lossy compression. Certainly, this leads to the introduction of some additional errors, although relatively small, due to rounding to the nearest integer. The PSNR of images subject to normalization, rounding-off, and re-normalization is usually around 59 dB, whereas distortions introduced during lossy compression usually lead to a much smaller PSNR. Thus, distortions caused by normalization (if applied) may be ignored in further analysis.
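The figure of about 59 dB quoted above can be checked with a small simulation. The sketch below assumes that 16-bit samples are scaled linearly to the 8-bit range, rounded to the nearest integer, and compared with the unrounded values on the 8-bit scale (peak value 255). The function names, the uniformly random test data, and this way of measuring the error are illustrative assumptions rather than the procedure used in the paper.

```python
import math
import random


def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean squared error and peak value."""
    return 10.0 * math.log10(peak * peak / mse)


def roundtrip_mse(samples_16bit):
    """MSE, on the 8-bit scale, caused by rounding 16-bit data to 8 bits."""
    scale = 255.0 / 65535.0
    total = 0.0
    for v in samples_16bit:
        exact = v * scale        # ideal (non-integer) 8-bit value
        rounded = round(exact)   # value actually stored in 8 bits
        total += (exact - rounded) ** 2
    return total / len(samples_16bit)


random.seed(0)  # fixed seed so that the run is repeatable
samples = [random.randrange(65536) for _ in range(200_000)]
mse = roundtrip_mse(samples)
print(f"MSE = {mse:.4f}, PSNR = {psnr(mse):.2f} dB")
```

Under these assumptions, the rounding error is close to uniform on [-0.5, 0.5], so its variance is about 1/12 and the expected PSNR is 10 log10(255² · 12) ≈ 58.9 dB, in agreement with the value stated in the text.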
Furthermore, there are no commonly accepted databases of “pristine” (reference, distortion-free) RS images. Moreover, the types of distortions inherent to RS images and color images partly coincide but are partly different. For example, image dithering is not encountered in practice in RS images. Meanwhile, speckle noise is not typical for color images but it might be an important factor for RS images of a special kind (synthetic aperture radar ones). These differences prevent the direct use of color image databases for making conclusions and recommendations for RS images. However, recently, the TID2013 dataset has been indirectly used to analyze the subsets of distortion types that might be present in RS images. This has allowed the determination of good elementary visual quality metrics for the adequate characterization of RS image quality and the design of combined metrics. Already known visual quality metrics, used in a combined metric [
16] as one of the inputs, are referred to as elementary metrics. In particular, the MDSI [
28] has been presented as one of the best elementary metrics [
16]. Hence, more details about this metric are provided below, together with an explanation of why it has caught our attention.
It should be kept in mind that the visual quality metrics describe the quality of data from a specific viewpoint and the relation between visual quality metrics and, e.g., text recognition from document images or image classification accuracy is not fully known [
29]. Nevertheless, preliminary results of the classification of compressed images have already demonstrated that visual quality has a high correlation with classification accuracy, especially for classes represented by small-sized, prolonged, and textural objects, i.e., for classes that are quite heterogeneous [
16]. Since high-frequency information can be lost due to lossy compression for large CR, it might harm the classification as well.
Our task of providing desired quality in compressing a given image can be treated as a particular case of applying the theory of generative adversarial nets [
43,
44,
45] since the image at hand is supposed to belong (according to its basic properties) to the set of images used at the method training stage when the average rate distortion curve has been obtained. Although there are some papers studying how distortions affect image classification and object recognition tasks, particularly with the use of neural networks, it should be kept in mind that lossy compression should not lead to an essential degradation in performance for image classification and object recognition compared to uncompressed original data.
2.2. Analysis of Some Elementary Image Quality Metrics
Image quality assessment (IQA) plays a significant role in numerous image processing applications—for example, image acquisition, lossy compression, restoration, denoising, etc. IQA techniques and metrics can be divided into three categories according to the availability of the original image, namely Full-Reference (FR), Reduced-Reference (RR), and No-Reference (NR). Different methods may be chosen according to the requirements, their priority, and their application. In this paper, the FR IQA methods are utilized to evaluate the compressed images and to provide the desired visual quality in RS lossy compression. The main reason is the knowledge of the reference image (this is simply the image to be compressed). Another reason is that the FR IQA metrics are usually simpler and more adequate (accurate) than the metrics that belong to the two latter groups.
The simplest classical FR IQA metric is the Mean Squared Error (MSE), computed by averaging the squared intensity differences between the distorted and reference images pixel-wise. Another well-known metric directly related to the MSE is the Peak Signal-to-Noise Ratio (PSNR). The significant advantages of MSE and PSNR are that their calculation is simple and their physical meaning is clear since the comparison is made at the pixel level. Nevertheless, the most relevant weakness of these metrics is that they are not very well matched to perceived visual quality [
46]. Because of this, numerous metrics based on the MSE and other principles, particularly calculated locally using the sliding window approach, have been proposed and intensively studied in the last three decades, i.a., Structural Similarity (SSIM) [
47,
48,
49], Feature Similarity (FSIM) [
50], or the MDSI [
28].
In this paper, the MDSI is adopted as the visual quality metric for three-channel RS images. To explain the reasons for this choice, some aspects and requirements for IQA should be recalled. As is widely known, subjective image visual quality is assessed during quite complicated experiments involving a large number of participants and test images of different complexity [
38]. The result of such testing consists in obtaining Mean Opinion Score (MOS) or Differential Mean Opinion Score (DMOS) values. A metric is considered good if, for different databases, it has high absolute values of correlation factor between a given metric and MOS, where both conventional (Pearson) and rank-order (Spearman and/or Kendall) correlations can be taken into consideration (ideally, it is desired that both Pearson and Spearman correlation coefficients have absolute values close to unity). As the first requirement, monotonicity of dependence of a metric on image quality characterized by Spearman Rank Order Correlation Coefficient (SROCC) is required. Meanwhile, linearity of this dependence, better characterized by the Pearson Linear Correlation Coefficient (PLCC), is desired as well. Note that the above-mentioned FSIM [
50] depends on image quality and MOS in an essentially nonlinear way [
51]. During some experiments conducted for 50 metrics [
16], SROCC values have been calculated as one quantitative criterion of metric performance for all distortion types and three subsets of the Tampere Image Database (TID2013). According to these calculations, the SROCC value determined for MDSI versus MOS is 0.8897 for all types and levels of distortions, being higher than for most other metrics, whereas the SROCC for the subset Noise&Actual is 0.9374. It is the highest value among all the considered elementary metrics. Meanwhile, the statistics of average calculation time demonstrate that the computational efficiency of the MDSI is very high. Additionally, the SROCC between MDSI and MOS has been calculated for images with three types of distortions related to lossy compression in the TID2013 dataset. It is equal to 0.966, i.e., very high, meaning that the MDSI is able to adequately characterize the visual quality of lossy compressed images.
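The two correlation criteria discussed above can be illustrated with a short, self-contained sketch. It computes the PLCC directly and the SROCC as the Pearson correlation of ranks (with average ranks for ties). The function names and the six (MDSI, MOS) pairs are made up for illustration and are not taken from TID2013.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores: MDSI grows as quality drops, so MOS falls as MDSI rises.
mdsi = [0.08, 0.12, 0.18, 0.24, 0.31, 0.40]
mos = [6.1, 5.8, 5.0, 4.1, 3.7, 2.2]
print(f"PLCC = {pearson(mdsi, mos):.3f}, SROCC = {spearman(mdsi, mos):.3f}")
```

Both coefficients are negative here because a larger MDSI means lower quality, which is why the text refers to absolute values. A strictly monotonic relation gives |SROCC| = 1 even when the relation is not linear, whereas |PLCC| reaches unity only for a linear one.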
Although the detailed SROCC values for 50 elementary metrics obtained for three subsets and the whole TID2013 dataset may be found in the paper [
16], an additional verification of these metrics may be conducted for the Konstanz Artificially Distorted Image quality Database (KADID-10k) [
52], containing 81 pristine images, each degraded by 25 distortions in five levels—particularly for the subsets containing distortions characteristic for RS images. Considering the JPEG and JPEG2000 compressed images, only two metrics—HaarPSI and MDSI—achieve SROCC values over 0.925 and PLCC over 0.94 (after nonlinear fitting) simultaneously, demonstrating both high prediction monotonicity and accuracy. Nevertheless, it is worth noting that the HaarPSI metric is around 1.4 times slower than MDSI, being one of the fastest elementary metrics (detailed results are presented in the paper [
16]). The performance comparison of individual metrics for the lossy compressed images from the KADID-10k dataset is presented in
Appendix A (
Table A1).
2.3. The MDSI Metric and Its Properties
An important property of the MDSI metric is that, during its computation, a gradient magnitude is used to measure structural distortions, whereas chrominance features are used to measure color distortions (recalling that both these types of distortions are equally important for three-channel RS images). Subsequently, the two obtained similarity maps are combined to form a gradient-chromaticity similarity map. Differently than for SSIM and FSIM, the deviation pooling strategy is used to compute the final quality score. In comparison to previous research, this new gradient similarity map is more likely to follow the human visual system (HVS).
Providing the desired visual quality in lossy compression is a challenging task; however, it would be possible if a metric value was associated with a certain level of quality. It could also be useful to know a range of metric values for which distortions are practically invisible. In lossy compression, the desired visual quality is often within a certain range, also for RS images. As illustrated by some already completed analyses based on other metrics [
33], the lower limit is such that lossy compression has no negative impact on further image processing. It means that the introduced distortions may be noticeable or even visible but not annoying. Concerning the upper limit, the lossy compression should provide a higher CR than can be achieved by lossless compression (limited by entropy). A reasonable threshold should be set in such a way that the introduced distortions are invisible, so that the visual quality of the compressed data is indistinguishable from that of losslessly compressed images while a higher CR is achieved. This threshold is around 40 dB in terms of the PSNR-HVS-M metric for multi-channel RS images [
33].
To provide a reasonable range for the metric MDSI, we have tested 3000 color images of the database TID2013 to obtain the statistical data results [
38] and put them into three categories, namely visible, just noticeable, and invisible distortions. Since MOS values have been provided for each image in the TID2013 dataset, the scatter plot for MDSI vs. MOS for the three mentioned classes of images is shown in
Figure 1.
Combining the statistical results and MOS values, it can be approximately stated that there are three gradations of image quality according to MDSI:
excellent quality (MDSI ≤ 0.15), the distortions are mostly invisible;
good quality (0.15 < MDSI ≤ 0.25), the distortions can be just noticeable;
middle and bad quality (MDSI > 0.25), the distortions are visible or they can be annoying.
Therefore, the reasonable range (of interest in this paper) is set from 0.10 to 0.25. It is also worth noting that the relation between MDSI and MOS is almost linear, which should be considered one more advantage of the MDSI metric.
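The three gradations listed above can be expressed as a simple lookup. The function name and the returned labels below are illustrative; only the thresholds 0.15 and 0.25 come from the text.

```python
def mdsi_quality_class(mdsi):
    """Map an MDSI value to the approximate quality gradation given above."""
    if mdsi <= 0.15:
        return "excellent"      # distortions are mostly invisible
    if mdsi <= 0.25:
        return "good"           # distortions can be just noticeable
    return "middle or bad"      # distortions are visible or annoying


for value in (0.10, 0.15, 0.20, 0.25, 0.30):
    print(value, mdsi_quality_class(value))
```

Since the gradations are stated only approximately, the boundaries should be read as indicative rather than as sharp decision thresholds.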
3. The Two-Step Method for Lossy Compression
The two-step image compression method has been recently proposed to control the visual quality in lossy compression and, further, to provide the desired visual quality of the images at hand in terms of a chosen visual quality metric. Previous research has shown that this method works well for DCT-based coders (such as AGU, ADCTC) [
35,
53] as well as for the DWT-based coder (SPIHT) [
54]. The latest conference paper [
36] demonstrates also some initial results and the advantages of the two-step compression method for the BPG coder.
Although lossy compression can easily achieve higher CR than lossless compression, usually, it reduces the visual quality of an image noticeably. However, the visual quality is also important and it can be even the most important requirement in some cases. Consequently, in such applications, lossy compression should be applied with additional control of introduced distortions. To control the visual quality in lossy compression, the CCP can be adjusted (or just properly set), considering the so-called rate-distortion curve, representing the dependence of visual quality on the CCP. However, for a given lossy compression coder, visual quality dependence on the CCP varies, depending also on image characteristics. Although it is difficult to know the rate-distortion curve for each image to be compressed in advance, it is still possible to obtain the general trend for particular image categories, e.g., three-channel RS images.
Since the provided approach should be possible to apply for images of different terrains or, in other words, of different complexity containing various types of objects, a relatively high number of test images should be used in experiments. This requirement is one of the main reasons for the methodology of design and analysis applied at different stages of the study presented in the paper.
First of all, a certain number of images are chosen to be compressed/decompressed assuming a series of
CCP values, further referred to as the basic image set (sample images are shown in
Figure 2). Each value of MDSI for each test image obtained after compression/decompression using given
CCP values should be registered to evaluate the distortions. As a consequence, a model of visual quality metric dependence on the
CCP may be obtained from these statistical data. At this stage, it is possible to obtain individual dependencies of the metric on the
CCP, further averaged for all basic images corresponding to each
CCP (more details are provided in
Section 4). In this way, it is also possible to obtain an averaged dependence, the so-called average rate-distortion curve, which reflects the monotonic change in visual quality with the
CCP. This process is performed offline; hence, it does not influence the time efficiency of the two-step method. Based on this average rate-distortion curve, the second step of the method can be carried out.
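The offline averaging described above can be sketched as follows. The sketch assumes that every basic image has been compressed with the same list of CCP values and that the resulting metric values are stored per image; the function name, the data layout, and the numbers are hypothetical.

```python
def average_rd_curve(ccp_values, metric_per_image):
    """Average the metric over all images for each CCP value.

    ccp_values:       list of CCP values used for every image
    metric_per_image: dict mapping an image name to the list of metric
                      values obtained for ccp_values (same order)
    Returns a list of (ccp, mean_metric) pairs, i.e., the tabulated
    average rate-distortion curve.
    """
    n_images = len(metric_per_image)
    curve = []
    for k, ccp in enumerate(ccp_values):
        total = sum(values[k] for values in metric_per_image.values())
        curve.append((ccp, total / n_images))
    return curve


# Hypothetical MDSI values for two basic images and three Q values.
ccp_values = [40, 45, 50]
metric_per_image = {
    "image_a": [0.18, 0.24, 0.31],
    "image_b": [0.20, 0.26, 0.35],
}
avg_curve = average_rd_curve(ccp_values, metric_per_image)
print(avg_curve)
```

The resulting table is computed once and then reused for every image to be compressed, which is why this stage does not affect the run time of the two-step method itself.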
The block diagram of the two-step compression method is illustrated in
Figure 3; in the first step, the initial
CCP is determined using the desired visual quality and the average rate-distortion curve. In general, it is calculated using the following equation:
$$CCP_{init} = CCP_{n} + \frac{M_{des} - M_{n}}{M'},$$
where $M_{des}$ is the desired visual quality pre-set by the user, and the other three parameters come from the average rate-distortion curve presented in a tabular form. The $M_{n}$ is the value closest to the desired $M_{des}$ at the right end of the corresponding interval of the average rate-distortion data array, whereas $CCP_{n}$ is the value corresponding to $M_{n}$. The curve derivative for the corresponding $CCP_{n}$ is denoted as $M'$. These calculations do not require image compression, so the $CCP_{init}$ value is the same for all images being compressed, assuming a given desired visual quality.
The first step of the proposed method is the compression and decompression of the original image with the initial CCP; subsequently, the visual quality value $M_{init}$ of the decompressed image can be calculated using the original image as the reference. Since the metric value $M_{init}$ is only close to the desired $M_{des}$, the absolute error is calculated as
$$\Delta M = |M_{init} - M_{des}|$$
and then evaluated as acceptable or not in order to decide on further actions. It is worth noting that, for some images, the above error already meets the required level, so the second step is not needed and the image compressed in the first step can be treated as the final output.
To improve the accuracy of provided visual quality, the CCP value needs to be corrected before the second step using the following equation:
$$CCP_{des} = CCP_{init} + \frac{M_{des} - M_{init}}{M'}.$$
This corrected CCP value may be different for different images. Finally, the second step, compression, is carried out using $CCP_{des}$, and the compressed image file obtained after the second step is considered as the final output with the desired quality.
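The whole procedure can be summarized in a minimal sketch under stated assumptions. The average rate-distortion curve is given as a table of (CCP, metric) pairs with the metric increasing with the CCP, the derivative is approximated by the slope of the table interval that contains the desired value, and the coder is represented by a hypothetical callable `compress_and_measure(ccp)` that compresses, decompresses, and returns the metric value. None of these names come from the paper, and the toy "image" is a made-up linear function rather than a real coder.

```python
def initial_ccp(curve, m_des):
    """First-step CCP from the tabulated average curve.

    curve: list of (ccp, mean_metric) pairs sorted by ccp, metric increasing.
    Returns (ccp_init, slope), where slope approximates the derivative M'.
    """
    for (c0, m0), (c1, m1) in zip(curve, curve[1:]):
        if m0 <= m_des <= m1:
            slope = (m1 - m0) / (c1 - c0)
            # The right end of the interval plays the role of node n.
            return c1 + (m_des - m1) / slope, slope
    raise ValueError("desired metric value is outside the tabulated curve")


def two_step(curve, m_des, compress_and_measure, tolerance):
    """Return (ccp, metric) after one or two compressions."""
    ccp_init, slope = initial_ccp(curve, m_des)
    m_init = compress_and_measure(ccp_init)        # first compression
    if abs(m_init - m_des) <= tolerance:
        return ccp_init, m_init                    # second step not needed
    ccp_des = ccp_init + (m_des - m_init) / slope  # linear correction
    return ccp_des, compress_and_measure(ccp_des)  # second compression


def toy_image(ccp):
    """Made-up stand-in for a coder whose curve differs from the average."""
    return 0.014 * ccp - 0.40


avg_curve = [(40, 0.19), (45, 0.25), (50, 0.33)]   # hypothetical table
ccp_final, m_final = two_step(avg_curve, 0.25, toy_image, tolerance=0.005)
print(f"final CCP = {ccp_final:.2f}, provided metric = {m_final:.4f}")
```

In this toy case, the first step misses the desired value by about 0.02, and the correction reduces the error to roughly 0.003. The residual remains because the slope of the average curve differs from that of the particular image. The sketch treats the CCP as continuous; a coder with an integer-valued parameter would additionally require rounding.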
5. The Experimental Results
To evaluate the performance of the proposed two-step method for the BPG coder for three-channel RS images, some additional test experiments are necessary since the average rate-distortion curve model has been obtained only for the basic image set, which might not be representative enough. Consequently, the two-step compression method has first been applied to the basic image set to provide four typical values of the MDSI metric, representing the four classes provided in
Section 4.2. These four typical values have been set as 0.1, 0.15, 0.2, and 0.25, respectively, and the achieved statistical data are shown in
Table 3, where $MDSI_{des}$ denotes the desired value of the MDSI metric, $\sigma^2_{1}$ stands for the variance of MDSI provided in the first step, and $\sigma^2_{2}$ is the variance of MDSI provided in the second step. For a better understanding of the data, the mean MDSI values finally provided in the second step are given as well, denoted as $\overline{MDSI}_{2}$.
The analysis of the data provided in
Table 3 leads to the conclusion that the variance after the second step of compression has decreased by approximately one order of magnitude for each desired value. It shows that the proposed two-step procedure works well in the considered conditions. It can also be noticed that both variances $\sigma^2_{1}$ and $\sigma^2_{2}$ tend to increase if the desired MDSI increases. This means that the task of providing the desired MDSI is more important for larger $MDSI_{des}$ values, e.g., 0.2 or 0.25 (for $MDSI_{des}$ = 0.1, the distortions are invisible and they remain invisible if the desired MDSI is provided with the error of around 0.01).
The mean absolute error of the desired quality, calculated as $|\overline{MDSI}_{2} - MDSI_{des}|$, does not exceed 0.034, and its value increases as the desired visual quality decreases, which is similar to the trend observed in previous works with the other coders [
35,
54].
To verify the representativeness of the basic set, another 12 RS images have been chosen as the test image set, shown in
Figure 8 (further referred to as RSI #13–#24), which is also a part of the USC-SIPI dataset. Then, the two-step compression method has been applied to these images to verify the correctness and universality of the previously obtained curve model, leading to the statistical data shown in
Table 4.
As shown in
Table 4, for each desired MDSI value, the variance after the second step of compression has also decreased by approximately an order of magnitude, and the mean error does not exceed 0.035. As one can see, the tendencies and values are similar to those observed in
Table 3, so the basic image set has been chosen correctly to obtain the average rate-distortion curve and this model works well for other three-channel RS images.
To analyze the data for 12 test images (RSI #13–#24) in detail, results obtained for $MDSI_{des}$ = 0.25 are presented in
Table 5, where $Q_{init}$ denotes the parameter Q used for the first step of compression. It comes from the average rate-distortion curve and equals 45 for all images. As may be seen in
Table 5, although the initial MDSI values are different for individual test images, their variance after the second step has decreased significantly, being only 1/15 of its value after the first step. The mean value of the MDSI after the second step is also noticeably closer to 0.25; hence, its average relative error has also decreased (from 2.08% to 1.4%).
In general, the accuracy has improved substantially due to the second step of compression. Meanwhile, there are cases when the MDSI after the second step is the same as for the first step, e.g., this happens for RSI #13. This means that there is no need to correct the Q parameter and apply the second step of compression in such cases. As shown in Table 5, 5 out of 12 images needed only one-step compression to meet the quality requirements, whilst the other seven images needed the second step to improve the accuracy. Over all verification experiments (carried out for all 24 images and four desired MDSI values), only the first step of the two-step compression was needed to provide the desired visual quality in 28.1% of cases. The other tables with more detailed results, both for the basic and the test image sets, achieved for different desired MDSI values, are provided in Appendix A (Tables A3–A9).
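The decision of whether the second step is needed can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear correction with a fixed `slope` (the assumed derivative of the average rate-distortion curve near the initial Q), the value 0.014, the clamping range 0..51, and the `measure_mdsi` callable are all assumptions made for the example. The MDSI values for RSI #16 are those quoted in the Discussion.

```python
def two_step_q(measure_mdsi, q_init, mdsi_desired, slope, q_min=0, q_max=51):
    """Return (q_final, mdsi_final, steps_used) for an integer-valued Q.

    measure_mdsi: hypothetical callable that compresses the image with the
                  given Q and returns the MDSI of the decompressed image.
    slope:        assumed dMDSI/dQ of the average curve near q_init.
    q_min, q_max: assumed valid range of the integer Q parameter.
    """
    mdsi_first = measure_mdsi(q_init)  # step 1: compress with the initial Q
    # Linear correction, rounded because the coder accepts only integer Q.
    q_corr = q_init + round((mdsi_desired - mdsi_first) / slope)
    q_corr = max(q_min, min(q_max, q_corr))
    if q_corr == q_init:  # corrected Q equals the initial one: skip step 2
        return q_init, mdsi_first, 1
    return q_corr, measure_mdsi(q_corr), 2  # step 2: recompress


# MDSI values quoted in the text for RSI #16 at Q = 45..49.
rsi16 = {45: 0.2185, 46: 0.2303, 47: 0.2447, 48: 0.2586, 49: 0.2817}

result = two_step_q(rsi16.__getitem__, q_init=45, mdsi_desired=0.25, slope=0.014)
print(result)  # (47, 0.2447, 2)
```

Under these assumptions, the sketch reproduces the correction from Q = 45 to Q = 47 reported for RSI #16; when the rounded correction is zero, only one compression is performed.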
6. Discussion
To analyze the accuracy of the provided visual quality for the BPG-based lossy compression of three-channel remote sensing images, three images (RSI #13, #14, and #16) have been selected as representative examples with the desired visual quality (MDSI) equal to 0.25.
The decompressed images for the two-step compression method are shown in the middle (third) column in Figure 9. For the desired MDSI value equal to 0.25, the initial Q is equal to 45; the corrected Q values are different for different images (equal to 44, 45, and 47, respectively). The two images to the left of the third column are obtained if the parameter Q is set to the corrected value minus 2 and minus 1, and the two images to the right are obtained if Q is set to the corrected value plus 1 and plus 2.
For RSI #14, the initial Q appears to be the appropriate value, since the error of the provided MDSI increases whether Q is increased or decreased. For RSI #13, the parameter Q is corrected to 44 and, in comparison to the four other values, compression with the corrected Q produces the MDSI that is the closest to the desired one. In contrast, for RSI #16, the initial Q is corrected to 47, and compression with this value produces the smallest error between the provided and the desired MDSI. For example, for RSI #16 the provided MDSI is 0.2447, so its error with respect to the desired value of 0.25 is 0.0053.
Figure 9 shows five decompressed images resulting from RSI #16, where images compressed with two values of Q differing by unity seem to be practically identical, but if Q differs by 2 or more, e.g., for RSI #16 (45, 0.2185, 233) and RSI #16 (49, 0.2817, 527) compared to RSI #16 (47, 0.2447, 347), the difference is much easier to observe. Hence, if the difference in MDSI is approximately 0.015, it is difficult to observe the changes in the decompressed images. However, it becomes noticeable if the difference is approximately 0.03, e.g., for RSI #16 (46, 0.2303, 280) and RSI #16 (48, 0.2586, 424). Therefore, in practical applications, it is enough to ensure that the error of providing the desired MDSI is less than ≈0.01. Consequently, it can be concluded that the accuracy of the two-step method for the BPG coder is good enough.
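The figures quoted above for RSI #16 can be checked with a few lines of arithmetic. The sketch below uses only the (Q, MDSI) pairs given in the text; the variable and function names are chosen for illustration.

```python
# (Q, MDSI) pairs quoted in the text for RSI #16.
rsi16 = {45: 0.2185, 46: 0.2303, 47: 0.2447, 48: 0.2586, 49: 0.2817}
desired = 0.25

# Absolute error of the provided MDSI with respect to the desired value.
errors = {q: abs(m - desired) for q, m in rsi16.items()}
best_q = min(errors, key=errors.get)
print(best_q, round(errors[best_q], 4))  # 47 0.0053


def mdsi_gap(q_a, q_b):
    """Absolute MDSI difference between two compressed versions."""
    return abs(rsi16[q_a] - rsi16[q_b])


print(round(mdsi_gap(46, 47), 4))  # 0.0144, hard to notice visually
print(round(mdsi_gap(47, 48), 4))  # 0.0139, hard to notice visually
print(round(mdsi_gap(46, 48), 4))  # 0.0283, noticeable
```

Among the five listed values, Q = 47 gives the smallest error, and the gaps between neighbouring Q values are close to 0.015, whereas Q values two units apart differ by about 0.03.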
In summary, for images where the second step is necessary, regardless of whether the correction is forward or reverse (the initial Q is increased or decreased), it has a positive effect and eventually provides the visual quality that is the closest to the desired one. Additionally, the CR values provided by lossy image compression in the neighborhood of the distortion invisibility threshold are considerably higher than those achievable with lossless compression.
To analyze the computational efficiency of the proposed approach, some tests have also been performed using a notebook with an Intel® Core™ i7-4710HQ CPU @ 2.50 GHz and 16.0 GB RAM, running the 64-bit Windows 10 Pro operating system. For 512 × 512 pixel images, the compression time is from 0.02 s to 0.05 s depending on image complexity and the value of the parameter Q (more time is needed for images with a more complex structure). The decompression time is from 0.006 s to 0.019 s (again, more time is spent on images with a more complex structure). For 1024 × 1024 pixel images, the compression time is from 0.06 s to 0.12 s; the decompression time is considerably smaller (from 0.02 s to 0.06 s). The MDSI values can be calculated very quickly (their calculation takes only around 1.5 times longer than the calculation of MSE).
7. Conclusions
In this paper, a two-step algorithm for providing the desired visual quality for the BPG-based lossy compression of three-channel remote sensing images has been proposed. The MDSI metric has been applied to evaluate the visual quality of the decompressed image. The main contributions of this paper concern the extensions of the basic two-step algorithm utilizing the MDSI metric and some features of the BPG encoder, e.g., integer form of the Q parameter, as well as the properties of three-channel RS images, such as high correlation of multi-channel data.
The MDSI metric has been studied to evaluate the quality of three-channel images, providing a reasonable operation range for the lossy compression of three-channel RS images. Three visual quality levels have been proposed, corresponding to the MDSI values appropriate for excellent quality, good quality, as well as middle and bad quality, respectively.
Experimental results have demonstrated the effectiveness of the proposed algorithm. It allows images to be compressed quickly and with acceptably small errors with respect to the desired quality characterized by the MDSI metric. If the parameter Q calculated for the second step is equal to the initial Q, the second step can be skipped, and the procedure may be accelerated for some similar images or video frames. Otherwise, the second step is needed to improve the accuracy with the corrected Q value. Statistical data show that, due to the second step, the accuracy is considerably improved and the provided visual quality is very close to the desired one. Our methodology is quite general and can also be applied to some other metrics having similar performance to MDSI.
In the future, it is expected that BPG-based lossy compression can be applied to provide the desired characteristics of the classification of decompressed images. A study concerning the impact of visual quality evaluated by the MDSI on the accuracy of classification is planned, as well as a discussion of the applicability of the two-step algorithm for the BPG coder in the classification task of high-resolution multi-channel remote sensing images.