1. Introduction
Scattering media, such as the atmosphere [1,2,3], underwater environments [4,5], and biological tissues [6,7], are among the most important factors that degrade imaging quality in practice. When light passes through a scattering medium, the ballistic light decays rapidly, and the target information is severely corrupted. To obtain images that are as clear as possible, many imaging techniques have been proposed, including transmission matrices [8,9], wavefront shaping [10], the optical memory effect [11,12], and ghost imaging [13,14,15,16]. However, these methods have limitations and do not work well in complex scattering media. Moreover, they are costly in both time and money.
Following developments in polarization transmission theory in recent years [17,18], polarization technology now plays an important role in imaging targets through scattering media [19,20,21]. Physical models and image processing methods based on polarization information have been proposed to improve the clarity of imaging in scattering media [22,23]. In 1996, J.S. Tyo et al. proposed the Polarization Difference (PD) method for imaging through scattering media [24]. In 2001, Y.Y. Schechner et al. added polarization effects to the atmospheric defogging model [25]. Liang et al. proposed that the estimated parameters of the angle of polarization (AoP) can be used in defogging [26], which not only significantly improves the clarity of blurry images but can also be applied to dense fog environments [27]; they also fused visible and infrared polarized images to defog and improve target recognition efficiency [28]. Hu et al. proposed a recovery algorithm based on the estimation of polarization differential imaging that takes into account the previously overlooked polarized light radiated by the target itself and showed that it improves the quality of the recovered image [29]. In addition, they proposed a method based on corrected transmittance to improve the quality of underwater images [30]. Shao et al. developed an active polarization imaging technology based on wavelength selection [31], which exploits the wavelength dependence of scattered light in turbid water. In addition, Guo et al. obtained the Mueller matrix (MM) of a scattering medium based on the Monte Carlo (MC) algorithm [3] and proposed a polarization inversion method to study polarization transmission characteristics in layered dispersion systems [15,17,32,33,34], a layered atmosphere [19], and underwater environments [5].
At the same time, deep learning (DL) has proven very effective for recovering degraded images: researchers have used DL to find a mapping between the speckle images caused by scattering and the original targets [35]. A “one to all” convolutional neural network (CNN) can learn the characteristic information in speckle patterns obtained in the same scattering medium [36]. Li et al. established the “IDiffNet” network, a densely connected CNN architecture, to learn the characteristics of the scattering medium and showed that the network generalizes well, still working on input data from other scattering media [37]. Lyu et al. proposed a hybrid neural network based on computational imaging in thick scattering media to reconstruct target information hidden in the scattering medium [38]. Sun et al. used a DL algorithm to reconstruct speckle images in a low-light environment, where traditional imaging methods fail because the speckle contains limited information and is affected by Poisson noise [39]. Zhu et al. used the autocorrelation of speckles to let DL networks learn the generalized statistical invariants of the scattering medium, which improves the applicability of the network model [40]. The combination of polarization information and DL has also become an important direction in imaging reconstruction. Li et al. used Q information to train the network and showed that the resulting Model-Q has superior generalization and robustness in different aspects [41]. Li et al. proposed the PDRDN to remove the underwater fog effect using polarization images at four angles (0°, 45°, 90°, 135°) [42]. In addition, polarization-based DL has been applied to target detection [43,44,45], underwater imaging [46], image denoising [47], and image fusion [48], where it yields higher detection accuracy, significant noise suppression, effective removal of scattered light, and more detailed target information. However, data-driven network models depend heavily on their data, which limits their generalization capability and is a major difficulty in applying deep learning in practice. Training the network with stable target features improves the stability of the network: even if the external environment changes within a certain range, the reconstruction results of the trained model are not affected. Therefore, to improve the stability of the model, we use the polarization information of the target as the training set, which carries stable target features during transmission. These stable features adapt to many changes of environment, thereby improving the generalization ability of the network model.
Effective physical priors can guide networks to an optimal solution in different situations. The degree of polarization (DoP) is the ratio of the intensity of the polarized component to the total light intensity, and it summarizes how strongly the light is polarized. Thus, in this paper, we use DoP to capture the polarization characteristics of the scattering system and then use DL to learn these characteristics, which addresses the generalization problem for single-material objects in scattering scenes and reduces the dependence of deep learning on data. Experimental results demonstrate that the network model trained on DoP has better recovery performance and that it can still recover targets that are not in the training set with high accuracy. Moreover, the model still works when the imaging distance of the test sample differs from that of the training set. The observed influence of polarization characteristics also provides a basis for applying deep learning to polarization-based remote sensing. Finally, we present quantitative evaluation results with multiple indicators, which show the accuracy and robustness of the scheme and reflect the great potential of combining physical knowledge with deep learning.
2. Materials and Methods
2.1. Physical Foundation
Light can be represented by the Stokes vector S = (I, Q, U, V)^T whether it is polarized or unpolarized [49]. The elements of the Stokes vector can be obtained from intensity measurements, the linear ones at four angles (0°, 45°, 90°, 135°):

$$ I = I_{0^\circ} + I_{90^\circ}, \quad Q = I_{0^\circ} - I_{90^\circ}, \quad U = I_{45^\circ} - I_{135^\circ}, \quad V = I_{R} - I_{L}, \tag{1} $$

where I is the total light intensity, Q is the difference between the horizontal and vertical components, U is the difference between the 45° and 135° components, and V is the difference between the right-handed and left-handed circular components. The components of the Stokes vector satisfy:

$$ I^2 \geq Q^2 + U^2 + V^2. \tag{2} $$

The Stokes vector is defined in terms of light intensities. An existing focal-plane polarization camera directly captures polarization images at four angles (0°, 45°, 90°, 135°). Therefore, we can easily obtain the three elements (I, Q, U), but not the V component.
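As a minimal sketch of the step described above, the linear Stokes components can be computed pixel-wise from the four angle images. The function name `linear_stokes` and the use of NumPy arrays are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def linear_stokes(i0, i45, i90, i135):
    """Return (I, Q, U) from intensity images taken at 0, 45, 90 and 135 degrees."""
    i0, i45, i90, i135 = (
        np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135)
    )
    total = i0 + i90   # I: total intensity from one orthogonal pair
    q = i0 - i90       # Q: horizontal minus vertical component
    u = i45 - i135     # U: 45-degree minus 135-degree component
    return total, q, u


# Ideal horizontally polarized light of unit intensity seen through the
# four micro-polarizers (hypothetical single-pixel example).
I, Q, U = linear_stokes([[1.0]], [[0.5]], [[0.0]], [[0.5]])
```

For this single pixel the result corresponds to the linear part of S = (1, 1, 0, 0)^T; the V component is not accessible from these four measurements.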
The polarization information of light can be destroyed by scattering media during transmission, and the process can be expressed as:

$$ S_{out} = M \cdot S_{obj}, \tag{3} $$

where M is the Mueller matrix (MM) of the scattering medium, S_out is the Stokes vector of the output light, and S_obj is the object’s Stokes vector in the incident light. The aim is to reconstruct targets using S_obj; therefore, Equation (3) is transformed as follows:

$$ S_{obj} = M^{-1} \cdot S_{out}, \tag{4} $$

where M^{-1} is the inverse of M, which contains the polarization characteristics of the scattering medium. For scattering media, the larger the optical thickness (OT), the more severely the target polarization information is degraded; the detector can then only capture speckles that contain limited information from the target. For targets, when the difference in polarization characteristics between target and background is slight, the receiver cannot completely distinguish them.
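The inversion of the scattering process can be illustrated numerically with a toy example. The diagonal Mueller matrix below is a hypothetical partial depolarizer chosen only for illustration; in a real experiment M is unknown and often ill-conditioned, which is part of the motivation for learning the inverse mapping instead.

```python
import numpy as np


def recover_object_stokes(mueller, s_out):
    """Solve M @ s_obj = s_out for s_obj without forming the inverse explicitly."""
    mueller = np.asarray(mueller, dtype=np.float64)
    s_out = np.asarray(s_out, dtype=np.float64)
    return np.linalg.solve(mueller, s_out)


# Hypothetical medium: attenuates intensity and depolarizes Q, U, V more strongly.
M = np.diag([0.5, 0.2, 0.2, 0.1])
s_obj = np.array([1.0, 1.0, 0.0, 0.0])   # horizontally polarized object light
s_out = M @ s_obj                         # forward scattering model
s_rec = recover_object_stokes(M, s_out)   # inverse model
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a common numerical choice; both give the same result for this well-conditioned toy matrix.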
Reconstructing the target can be regarded as the inverse of the imaging process in scattering media, and DL is well suited to solving such inverse problems. Inspired by this, we used the powerful fitting capacity of DL to obtain the mapping between speckles and the original images. To solve the inverse problem better, it is necessary to make full use of polarization physical priors. Specifically, the learning framework consists of a pre-physical step and a post-neural-network step based on a physical prior, as shown in Figure 1. First, the pre-physical step is used to acquire the linear-polarization images, and the degree of linear polarization (DoLP) can be expressed as:

$$ DoLP = \frac{\sqrt{Q^2 + U^2}}{I}. \tag{5} $$

As the ratio of the linear-polarization component to the total light intensity, DoLP is a common polarization parameter and can be used to describe the polarization characteristics of scattering systems. Therefore, when we use DoLP images as the training set, redundant information is filtered out and the network is trained on more effective characteristics.
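A minimal sketch of the pre-physical step, assuming the four angle images are available as NumPy arrays: the function below computes a DoLP map as the ratio of the linear-polarization component to the total intensity. The small constant `eps` and the clipping to [0, 1] are assumptions added here to keep dark or noisy pixels well behaved; they are not described in the paper.

```python
import numpy as np


def dolp(i0, i45, i90, i135, eps=1e-8):
    """Degree of linear polarization from four analyzer-angle images."""
    i0, i45, i90, i135 = (
        np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135)
    )
    total = i0 + i90
    q = i0 - i90
    u = i45 - i135
    # eps avoids division by zero where the total intensity vanishes.
    return np.clip(np.sqrt(q ** 2 + u ** 2) / (total + eps), 0.0, 1.0)


# Two hypothetical pixels: fully horizontally polarized and unpolarized light.
d = dolp([1.0, 0.5], [0.5, 0.5], [0.0, 0.5], [0.5, 0.5])
```

The first pixel yields a DoLP close to 1 and the second yields 0, which is the behavior expected for fully polarized and unpolarized light.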
In addition, polarization information is very sensitive to the material and structure of the targets. Therefore, the generalization performance of the model is closely related to the scattering medium and the polarization properties of the targets. A model trained on targets of a single material is expected to generalize well across targets of that material.
2.2. Measurement System
To build the dataset, we set up a polarization scattering imaging scene; the schematic of the experimental setup is shown in Figure 2. To capture more target information, we placed a polarizer in front of the LED light source to provide polarized illumination. The polarizer produces light with S = (1, 1, 0, 0)^T, which facilitates the implementation of the polarization algorithm [50]. The polarized light then irradiates the target and is reflected from it. Finally, the reflected light passes through the ground glass and is captured by the polarization camera (DoFP). In our experiments, the targets are a series of handwritten digits in ink on white paper. We put the target at a certain distance behind the ground glass of 5 mm and define the distance between the target and the ground glass as “d”.
The polarization camera in our experiment is a commercial DoFP (division of focal plane) polarization camera (LUCID, PHX055S-PC) with 2048 × 2448 pixels, whose pixel array is covered with a micro-polarizer array with four polarization orientations of 0°, 45°, 90°, and 135°. The polarization images at the four angles are used to calculate the DoLP image. We captured 200 images with the DoFP camera and expanded the dataset to 1000 images by data augmentation, such as rotation and cropping; of these, 900 are used for training and 100 for validation.
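The expansion from 200 captures to 1000 samples and the 900/100 split can be sketched as follows. The specific transforms (three 90° rotations and one central crop padded back to the original size), the random seed, and the function names are assumptions for illustration; the paper only states that rotation and cropping were used.

```python
import numpy as np


def augment(image):
    """Return five variants of a square image: original, three rotations, one crop."""
    image = np.asarray(image)
    h, w = image.shape
    if h != w:
        raise ValueError("this sketch assumes square images")
    m = h // 8  # crop margin (hypothetical choice)
    crop = image[m:h - m, m:w - m]
    cropped = np.pad(crop, ((m, m), (m, m)), mode="edge")  # restore original size
    return [image, np.rot90(image, 1), np.rot90(image, 2), np.rot90(image, 3), cropped]


def build_dataset(speckles, labels, n_val=100, seed=0):
    """Apply identical transforms to each speckle/label pair, then split."""
    pairs = []
    for speckle, label in zip(speckles, labels):
        pairs.extend(zip(augment(speckle), augment(label)))
    order = np.random.default_rng(seed).permutation(len(pairs))
    val = [pairs[i] for i in order[:n_val]]
    train = [pairs[i] for i in order[n_val:]]
    return train, val
```

Applying the same deterministic transforms to a speckle image and its label keeps each pair aligned. Splitting after augmentation follows the order described in the text; splitting before augmentation would keep variants of one capture out of both sets at once.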
2.3. Neural Network Design
With the development of DL technology, many excellent network structures have been built for imaging reconstruction. U-Net, a fully convolutional neural network, was proposed for semantic segmentation of medical images and has since shown excellent results in image reconstruction. The principle of U-Net is similar to that of an autoencoder. Our goal is to extract and reconstruct the target information from the polarization speckles, which can be regarded as a process of encoding and decoding. Moreover, the skip connections in U-Net alleviate the exploding and vanishing gradient problems when training deeper networks, which is one of the reasons for its excellent performance. DenseNet, a network structure proposed in 2017 [51], is composed of multiple dense blocks, in which each layer is connected to the subsequent layers by concatenation. This makes the transmission of features and gradients more efficient and the network easier to train [52].
In our scheme, we change the number of convolutional layers and channels of the original U-Net to form an improved U-Net-based DL network, as shown in Figure 1. We replace single convolutional layers with dense blocks for feature extraction, which improves the network performance. In the dense block, we use 3 × 3 convolutional kernels with one pixel of padding so that the input and output feature maps have the same size. Each dense block is followed by batch normalization and an activation function. As the number of network layers and filters increases, a max pooling layer with a stride of 2 × 2 halves the height and width of the feature maps. The decoder acts as the inverse of the encoder, and the last layer of each decoder stage is an up-pooling layer. Throughout the network, the activation function is the rectified linear unit (ReLU), which enables fast and efficient training. To reduce overfitting, we add a dropout layer. Finally, images of 256 × 256 pixels are reconstructed by convolutional layers. To assess the complexity of the network, we calculate the number of parameters and floating-point operations (FLOPs), which are 53.86 M and 68,136.58 M, respectively. During training, the loss function reflects the model’s ability to fit the data. Here, we use the mean absolute error (MAE) as the loss function:

$$ MAE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| X(i,j) - Y(i,j) \right|, \tag{6} $$

where X(i, j) and Y(i, j) are the values of pixel (i, j) in the reconstructed image and in the ground truth, respectively, and M and N are the dimensions of the image.
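The MAE loss can be written in a few lines. This is a plain NumPy sketch for a single image pair; the function name `mae` is chosen here for illustration.

```python
import numpy as np


def mae(reconstruction, ground_truth):
    """Mean absolute error between two images of identical shape."""
    x = np.asarray(reconstruction, dtype=np.float64)
    y = np.asarray(ground_truth, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    # Average of |X(i, j) - Y(i, j)| over all M x N pixels.
    return float(np.mean(np.abs(x - y)))


# Hypothetical 2 x 2 example: two of the four pixels differ by 1.
example = mae([[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
```

In PyTorch the same quantity is provided by `torch.nn.L1Loss` with its default mean reduction.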
We trained the model on a graphics processing unit (NVIDIA RTX 3080) using the PyTorch framework with Python 3.6 for 200 epochs. The optimizer is Adam (adaptive moment estimation) with a learning rate of 0.001.
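The following PyTorch sketch shows one possible form of a dense block with 3 × 3 convolutions, batch normalization and ReLU, together with a single training step using the MAE loss and Adam with a learning rate of 0.001. It is not the authors' network: the growth rate, number of layers, dropout rate, the 1 × 1 output convolution and the random tensors are all assumptions, and the full U-Net encoder and decoder are omitted.

```python
import torch
from torch import nn


class DenseBlock(nn.Module):
    """Each 3x3 conv layer receives the concatenation of all earlier feature maps."""

    def __init__(self, in_channels, growth_rate=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                # padding=1 keeps the spatial size unchanged for a 3x3 kernel
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth_rate),
                nn.ReLU(inplace=True),
            ))
            channels += growth_rate
        self.out_channels = channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


torch.manual_seed(0)
block = DenseBlock(in_channels=1)
# Toy stand-in for the full encoder-decoder (hypothetical).
model = nn.Sequential(
    block,
    nn.Dropout2d(p=0.1),
    nn.Conv2d(block.out_channels, 1, kernel_size=1),
)
criterion = nn.L1Loss()                                   # MAE loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

speckle = torch.rand(2, 1, 32, 32)   # placeholder DoP speckle batch
label = torch.rand(2, 1, 32, 32)     # placeholder ground-truth batch

model.train()
optimizer.zero_grad()
prediction = model(speckle)
loss = criterion(prediction, label)
loss.backward()
optimizer.step()
```

In an actual run this step would be repeated over the 900 training pairs for 200 epochs, with the validation set used to monitor overfitting.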
3. Results
The polarization characteristics of the target are not easily affected by the scattering media during transmission, so a model trained with the polarization information of the target is more stable. In this section, we design different test experiments to verify this stability.
3.1. Subsection
Unlike the speckle images obtained with laser illumination, the images obtained with natural light do not show an obvious distribution of bright and dark grains; instead, the whole image is cloudy. Moreover, the greater the distance between the ground glass and the target, the more blurred the outline of the target. At the same time, the spectral width of the light reduces the correlation length of the scattered light and the field of view (FoV) of the imaging system in a real-world experiment [53,54]. The experiment is set up without ambient light, and the images are obtained only by illuminating the targets with a white-light LED. The results for d = 4.0 cm are shown in Figure 3.
With increasing d, the energy of the light reaching the ground glass decreases, and so does the target information passing through the scattering medium. We collected the data at a distance of d = 4 cm, where the target profile was completely obscured by the noise from the ground glass and the calculated DoP image could not distinguish the target from the background either. Under this condition, we chose the DoP images of the targets as the training set and used the images of the targets without the scattering medium as the corresponding labels. The scattering images and labels have the same size, 256 × 256. After collecting and classifying the data, the proposed method can be used for training and testing.
In the case of d = 4 cm, we prepared 200 scattering images as the training data, which are the DoP imaging results of targets with different structures (10 handwritten digits: 0~9) seen through the ground glass. The original images without scattering served as the respective labels. We expanded the 200 scattering DoP images to 1000 DoP images, of which 900 and 100 served as the training set and validation set, respectively; the trained network is called the Model-DoP.
3.2. The Results of Reconstructing Untrained Structural Targets with DoP
In this section, we test targets with different structures that were not used in training. If the trained Model-DoP can reconstruct those untrained targets, it shows that our proposed method is stable with respect to the structure of the targets. The targets shown in Figure 4a are not among the samples used to train the Model-DoP, and after transmission through the ground glass, the corresponding scattering DoP images cannot be recognized, as depicted in Figure 4b. However, they are reconstructed well by the trained Model-DoP (as shown in Figure 4), and the edges of the targets can be identified accurately. The results show that using scattering DoP images as the training set effectively drives the network to learn the polarization characteristics of the different targets, which helps to achieve target reconstruction.
To further verify the generalization of the Model-DoP, we changed the structure of the target to test the Model-DoP trained on the digit targets. First, we replaced the digit targets with English letter targets while the background remained unchanged. The reconstructed results are shown in Figure 5. Figure 5b shows the scattering DoP images, and Figure 5c the corresponding reconstructed results. Moreover, we also used some graphics as targets to further demonstrate the generalization of the Model-DoP to diverse and complex targets. The ground truth, scattering DoP images, and reconstructed images are shown in Figure 6. Even with a limited amount of training data, the Model-DoP can reconstruct the untrained targets, including both English letters and graphics, which shows that the Model-DoP learns not only the mapping between pixels but also the polarization characteristics of different materials. Therefore, targets with different shapes can also be reconstructed as long as they are made of the same material.
The Structural Similarity Index (SSIM) is a common indicator for evaluating image quality and measuring the similarity of images [55]. Here, we use the SSIM to quantitatively describe the reconstruction results and the performance of our network. The SSIM consists of three parts: luminance, contrast, and structure. Given the original image X and the predicted image Y, their SSIM can be calculated as follows:

$$ SSIM(X, Y) = \frac{(2\mu_X \mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)}, \tag{7} $$

where μ_X is the mean of X, μ_Y is the mean of Y, σ_X^2 is the variance of X, σ_Y^2 is the variance of Y, σ_XY is the covariance of X and Y, and C_1 and C_2 are small positive constants used to avoid a zero denominator. The SSIM value typically ranges from 0 to 1. The higher the SSIM value, the more similar the images.
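A minimal sketch of the SSIM computation from image means, variances and covariance is given below. It evaluates the statistics once over the whole image, whereas SSIM is usually reported as a mean over local windows (as in `skimage.metrics.structural_similarity`), so its values will generally differ from windowed implementations. The constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, with L the data range, are the commonly used defaults and are an assumption here, since the paper does not state them.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM computed from global image statistics."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2   # assumed default constant
    c2 = (0.03 * data_range) ** 2   # assumed default constant
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(0)
image = rng.random((16, 16))            # hypothetical test image in [0, 1)
same = global_ssim(image, image)        # identical images
inverted = global_ssim(image, 1.0 - image)  # contrast-inverted copy
```

Identical images give a value of 1, while the contrast-inverted copy scores lower because its covariance with the original is negative.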
The SSIMs of the three types of targets with different complexity and diversity do not differ much, as can be seen from Table 1. Although the SSIM trends downward with increasing complexity and diversity, the overall fluctuation is small. The graphic targets, whose relevance to the targets in the training set is the weakest, still have more than 70% similarity.
3.3. The Performance of Model-DoP on the Different Polarization Characteristics
To further investigate the sensitivity of the Model-DoP to the polarization properties of the target, we conducted a test using targets composed of materials that were not used in training. First, the target material was set to steel, with the other conditions unchanged and the background being paper. These targets are called “Steel-Paper” targets, as depicted in Figure 7a. Then, scattering DoP images under natural light were obtained, as shown in Figure 7b, and fed into the original “Ink–Paper” trained model. The reconstruction results are shown in Figure 7c. Due to the high reflectivity and low deflection characteristics of steel, the image obtained through the scattering medium retains a large amount of target information. It can also be seen from Table 2 that the polarization characteristics of steel and paper are quite different [19,56]. Therefore, the Model-DoP can also identify the outline of the target.
Figure 7. The test results of Model-DoP for the untrained target materials. (a) Ground truth with target-background as Steel-Paper; (b) Scattering DoP images; (c) Reconstructed images by the Model-DoP.
In addition, the targets can also be set as “Ink–Wood” targets, as shown in Figure 8a, in which the background material is wood. The reconstruction results are shown in Figure 8c. From Table 2, because the values of the corresponding elements of wood and paper are similar, the Model-DoP trained on “Ink–Paper” can distinguish the target from the background. The difference between wood and paper does affect the results for the “Ink–Wood” targets, but it does not prevent the overall identification and recovery of the target. Although the model does not recover letter patterns well, this problem could be addressed by enriching the target structures and materials in the training sets.
Finally, the targets were set as “Steel-Wood” targets, as shown in Figure 9a, in which the materials of the target and background are steel and wood, respectively. The reconstruction results are shown in Figure 9c. The wood background can be distinguished, but its texture cannot be restored. The steel target cannot be recovered with complete structural information, but the difference in polarization characteristics at the edges is captured. From Table 2, it can be seen that the corresponding elements of ink and steel differ greatly, which is why the target cannot be recovered well. When a material has not been used to train the DL network, the performance of the target reconstruction is reduced, and the quality of the reconstruction depends on the difference between the polarization properties of the test material and the training material. Therefore, owing to its sensitivity to the polarization characteristics of the target, a model trained on targets of one material has a certain cross-material generalization to targets with similar polarization characteristics. It should be noted that training the DL network on more materials would improve the reconstruction performance for targets and backgrounds of different materials.
3.4. The Performance of Model-DoP on the Generalization of the Imaging Distance
Different materials have different polarization characteristics, which can be described by a 4 × 4 matrix called the MM. When the targets and scattering media in a system are fixed, their MMs do not change. Therefore, the trained Model-DoP is still able to reconstruct targets at different imaging distances (with the targets moving within a certain range). We explored the influence of the target location by changing the imaging distance between the ground glass and the targets. We captured the scattering DoP images at distances of d = 3.5 cm, 4.0 cm, 4.25 cm, 4.5 cm, 5.0 cm and 5.5 cm, and reconstructed the target images with the Model-DoP trained at an imaging distance of d = 4 cm. The results are shown in Figure 10.
It can be seen that the Model-DoP can reconstruct targets at different imaging distances. When d = 3.5 cm, the images carry enough information to provide features for the Model-DoP, and the good retention of the target polarization information strongly improves the imaging quality. When d is longer than 4.0 cm, the Model-DoP still generalizes to some extent because it can still obtain part of the targets’ polarization characteristics, allowing the target hidden behind the noise to be reconstructed up to d = 5.0 cm. However, at an imaging distance of d = 5.5 cm, the model cannot reconstruct the target details, though it can still distinguish between the background and the target.
The Model-DoP trained with polarization information is less affected by the scattering media because the DoP carries stable target features. Therefore, when the target moves within a certain range, the Model-DoP can still reconstruct it, showing that our proposed method adapts to changes in the imaging distance. Table 3 shows the SSIM of the recovered images with increasing imaging distance: the SSIM gradually decreases, but the magnitude of the decrease is relatively small, which verifies that the DoP retains the transmitted polarization information in scattering media and thus improves the stability of the network.
3.5. Comparison with the Model-I, Model-I_X and Model-Q
DoP can filter out redundant information to a certain extent and focus on the polarization characteristics of targets, so a model with both accuracy and stability can be obtained from a small dataset. To show that DoP images are better training data than I, I_X and Q images, we trained the network to obtain Model-I, Model-I_X, Model-Q and Model-DoP, respectively. Unlike before, we needed to exclude the compensatory effect of the emitted polarized light on the polarization images, making the differences between the models’ results more obvious. We therefore removed the polarizer and acquired a series of data directly under natural light. The results of Model-I, Model-I_X, Model-Q and Model-DoP are compared in Figure 11.
From Figure 11b, it can be seen that targets and backgrounds obtained from Model-I_X can be distinguished; however, the contrast of the recovered images is low, and the target structure is distorted, especially for letter and graphic targets. The I_X component also carries the polarization information of targets, but it contains too much redundant information, making the useful polarization information less prominent. Therefore, with the same amount of data, the network cannot efficiently capture the target polarization information for model building.
As Figure 11c shows, the contrast of the result from Model-I is better than that from Model-I_X, because the intensity is obtained by adding I_X and I_Y and thus carries more information than I_X alone. However, the background of the Model-I result has some noise, especially at the edges, precisely because a network trained on intensity cannot accurately distinguish different polarization characteristics.
The Q component, which is the difference between I_X and I_Y, eliminates some of the effect of scattering. Therefore, in Figure 11d, the background of the result from the Model-Q has less noise than that from the Model-I. However, part of the target may not be recovered completely when the gap between the test target and the training target is large, possibly because the Q component cancels out some target information when the polarization characteristics of the target are not very strong. Therefore, the Model-Q places certain restrictions on the material. In Figure 11e, the results of the Model-DoP not only recover the target but also accurately distinguish the parts with different polarization characteristics, although the reproduction is not complete. Meanwhile, there is no need to consider the offsets of the targets’ polarization information in the DoP information. The quantitative comparison of the four models is shown in Table 4, which further confirms the above discussion.
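One way to see how Q alone can lose target information while the degree of linear polarization keeps it is a small numerical illustration. It assumes ideal linear analyzers, partially linearly polarized light, and that I_X and I_Y correspond to the 0° and 90° channels; this is an illustrative mechanism under those assumptions, not the authors' analysis.

```python
import math


def analyzer_intensity(total, dop, angle_deg, analyzer_deg):
    """Intensity behind an ideal linear analyzer for partially polarized light."""
    delta = math.radians(angle_deg - analyzer_deg)
    return 0.5 * total * (1.0 + dop * math.cos(2.0 * delta))


def q_and_dolp(total, dop, angle_deg):
    """Return (Q, DoLP) for light with the given polarization degree and angle."""
    i0, i45, i90, i135 = (
        analyzer_intensity(total, dop, angle_deg, a) for a in (0.0, 45.0, 90.0, 135.0)
    )
    q = i0 - i90      # I_X - I_Y
    u = i45 - i135
    return q, math.hypot(q, u) / (i0 + i90)


# Same degree of polarization (0.6), two different polarization angles.
q_h, dolp_h = q_and_dolp(1.0, 0.6, 0.0)    # polarized along 0 degrees
q_d, dolp_d = q_and_dolp(1.0, 0.6, 45.0)   # polarized along 45 degrees
```

For light polarized along 45°, I_X and I_Y are equal, so Q vanishes even though the light is 60% polarized, whereas the degree of linear polarization is the same in both cases.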