1. Introduction
Digital holography (DH) allows for capturing and reconstructing three-dimensional (3D) information from real-world objects using digital sensors and numerical processing methods [1,2,3,4,5,6]. It differs from conventional holography, which captures the interference pattern on film; DH uses electronic sensors such as charge-coupled devices (CCDs) or complementary metal–oxide–semiconductor (CMOS) sensors to record these patterns. It offers considerable flexibility, as the captured holograms can be subjected to numerical processing to remove unwanted noise and artifacts. DH is utilized in fields where high-accuracy 3D imaging is needed, including biomedical imaging and optical metrology [1,2,3]. Recently, DH has become more adaptive and intelligent through the integration of artificial intelligence (AI) and machine learning, which can predict complex fields and reconstruct high-quality holographic images [4,5,6].
Enhancing the resolution of holographic images in in-line DH is crucial in many applications, as it allows for amplitude and phase imaging. The core issues that make it difficult to obtain high-quality images are the zero-order and twin images. The phase-shifting digital holography (PSDH) technique is an effective method of improving image quality and removing noise. It involves capturing multiple holograms with different phase shifts to reconstruct the complex amplitude of the light field. Studies have shown that PSDH, which combines phase-shifting technology with digital holography, effectively eliminates the interference of the zero-order and conjugate images [7,8,9]. Traditional phase-shifting methods use mechanical movements of mirrors and components or electro-optic modulators with piezoelectric transducers to introduce the phase shifts [10,11,12]. Other studies have demonstrated methods that reduce the number of captures needed for accurate extraction of the phase information [13,14]. However, these sequential approaches are prone to motion artifacts and misalignment errors between exposures. Researchers have developed parallel phase-shifting digital holography (PPSDH) algorithms to overcome these challenges, using a phase-retarder array (e.g., a phase-shifting array device made of birefringent material) that facilitates the acquisition of polarization-sensitive information [15]. However, implementing this approach necessitates the design and fabrication of a complex phase mask, precise alignment of the mask within the optical system, and correction of any optical aberrations or diffraction effects introduced during the reconstruction process, all of which require interdisciplinary expertise.
Researchers have used polarization-sensitive image sensors to overcome these problems with PPSDH, because doing so simplifies the optical system and enhances performance [16,17,18,19,20,21,22,23]. The polarization image sensor eliminates the need to design, fabricate, and precisely align complex phase masks. These sensors comprise groups of four sub-pixels, each capturing a different phase shift, enabling the simultaneous acquisition of phase-shifted holograms in a single exposure without introducing additional optical aberrations or diffraction effects. The result is a more stable and efficient setup with improved image quality, which is particularly suitable for dynamic imaging applications where speed and accuracy are essential. Polarization sensitivity also enables detection of the polarization state of light, which provides crucial insights into the material properties and structural details of the observed objects [21,22,23]. However, this configuration reduces the acquired hologram's effective spatial resolution. Since four sub-pixels represent the information at a single spatial location, the effective resolution is lower than the total number of pixels on the sensor; specifically, it is approximately one-fourth of the sensor's total pixel count.
To address the inherent low-resolution problem in hologram acquisition with polarization-based image sensors, researchers have employed several hologram demosaicing techniques [24,25] to improve the resolution of the reconstructed hologram. Two primary techniques are used: the super-pixel method (SPM) and the hologram interpolation method (HIM). The SPM groups four neighboring pixels (each corresponding to one of the four phase shifts) into a single "unit cell", effectively ignoring the blank pixels in the sparse sub-phase-shift holograms [26,27]; this method is easy to implement but reduces both the maximum lateral spatial resolution and the image resolution of the reconstructed image to half of what is achieved in temporal PSDH in both the horizontal and vertical directions. In contrast, the HIM fills in the blank pixels using values from adjacent pixels; although it is more complex than the SPM, it avoids losing image resolution. A simulation comparing the root mean square error of images reconstructed using the HIM with bilinear, bicubic, and B-spline interpolation has been reported [25]. To address the limitations of both approaches, researchers have proposed hybrid and advanced techniques: by exploiting sensor-shifting or object-shifting strategies to capture multiple frames with sub-pixel displacements, algorithms can combine the four phase-shifted images to reconstruct an image with resolution beyond the sensor's native sampling.
Conventional demosaicing and interpolation techniques often lack context awareness and the ability to learn from data, and they struggle to recover high-frequency details. While these methods are simple and computationally efficient, they lack the sophistication and adaptability of deep learning (DL)-based approaches. In the domain of polarization imaging, DL techniques have emerged as powerful tools for resolution enhancement. For instance, studies [28,29,30] have proposed DL-based methods aimed at improving overall imaging quality by employing convolutional neural networks (CNNs) to process raw data from polarization sensors. These end-to-end frameworks have demonstrated significant improvements in imaging performance compared to traditional interpolation techniques. Additionally, in [31], a novel DL-based demosaicing method was introduced, incorporating a unique "mosaiced convolution" operation and a tailored data acquisition approach for sensors with integrated micro-polarizer arrays. This method ensures accurate image reconstruction using CNNs, achieving enhanced spatial resolution and improved polarization fidelity.
Recently, DL has emerged as a pivotal tool for achieving super-resolution in holographic imaging. By training networks on large datasets of paired images, DL effectively addresses various inverse problems; it learns the relationship between the input and target output distributions without requiring prior knowledge of the underlying imaging model. One line of work utilizes CNN-based algorithms. The super-resolution convolutional neural network (SRCNN) was a groundbreaking method in the image super-resolution domain and has since undergone continuous improvement, including better choices of loss function, improved upsampling modules, and innovative network design strategies to tackle issues limiting the imaging resolution of holographic displays [32,33,34]. In conventional digital holographic microscopy (DHM), the spatial resolution is constrained by diffraction, as it is limited by the spatial bandwidth product (SBP) of the imaging system. Neural network-driven techniques have been used to surpass this diffraction limit, enhancing the spatial resolution of images beyond traditional optical constraints [35,36,37,38]. Additionally, generative adversarial networks (GANs) have been employed to improve the quality of holographic reconstruction: by learning the distribution of high-quality holographic images, GANs can generate super-resolved images from low-resolution (LR) holograms, leading to significant improvements in both spatial resolution and image quality [39]. In PSDH, where DH meets polarization imaging, several studies have applied DL techniques to enhance the reconstructed image quality by mitigating artifacts and noise. In [40,41,42,43], combining the precision of PSDH with the learning capability of neural networks addresses challenges such as phase wrapping and noise suppression, resulting in high-quality phase reconstructions. These approaches demonstrate the potential of deep learning to improve both accuracy and computational efficiency in holographic imaging.
In this work, we present a deep learning-based complex field extraction technique with a resolution equivalent to the total pixel count of the polarization-based image sensor. We utilize a U-Net-based architecture [44] to reconstruct high-resolution (HR) images in the object plane by extracting complex field information at the hologram plane. The dataset used for training and testing the neural network was generated numerically in MATLAB, considering the sub-pixel structure of the micro-polarized image sensor. Extracting the complex field at the hologram plane offers significant advantages: since the complex field in this plane closely resembles the interference pattern recorded there, it allows for higher-quality extraction than extracting the complex field directly in the object plane. This improved extraction quality relies on the four predicted HR phase-shifted interference patterns obtained from the network. Unlike conventional approaches, where extracting the object's complex field at the object plane requires numerically backpropagating the input interference pattern from the hologram plane to the object plane to generate the training dataset pair (input and ground truth), the proposed method is more flexible. The conventional process inherently ties the training dataset to a specific numerical propagation algorithm, limiting its generalizability. In contrast, our method directly extracts the object's complex field at the hologram plane, eliminating the need for numerical backpropagation during dataset preparation and enhancing the adaptability of the trained network. Once the complex field is extracted at the hologram plane, it can be numerically propagated to any desired distance to reconstruct the object without the presence of the DC and conjugate terms, enabling the reconstruction of objects at multiple depths without being constrained by DC and conjugate noise artifacts. By training the network with real-valued interference patterns, our method enhances holographic imaging, reduces computational complexity, and avoids the challenges associated with processing complex-valued amplitude and phase information.
Notably, the model is trained using only a single pair of LR and HR interference patterns for all examples in the dataset. Once trained, the model can use the LR counterparts as input to predict all four HR phase-shifted interference patterns. Overall, our approach not only improves the quality of complex field extraction but also offers greater flexibility and efficiency in holographic image reconstruction, making it a promising solution for HR holographic imaging applications.
Validation demonstrates the successful reconstruction of higher-quality images at specific depths using the extracted complex field information derived from the four predicted HR interference patterns. Note that the proposed methodology is validated solely through numerical simulations performed in MATLAB, not with optically acquired training or test data from the polarization image sensor; nevertheless, the presented numerical simulation results demonstrate the practicality of the proposed approach.
2. Methods
Our proposed method utilizes a U-Net-based architecture that accounts for the sub-pixel structure of a micro-polarized image sensor to extract HR complex field information at the hologram (sensor) plane from predicted HR interference patterns, as illustrated in Figure 1. The camera is equipped with linear micro-polarizers oriented at angles of 0°, 45°, 90°, and 135°, each aligned with a corresponding image sensor pixel. This one-to-one correspondence ensures that the camera's full resolution is effectively utilized, capturing detailed phase information across the sensor. When object and reference waves with opposite circular polarization directions interfere, the phase shifts recorded by the polarized image sensor are 0, π/2, π, and 3π/2 radians, corresponding to micro-polarizer orientations of 0°, 45°, 90°, and 135°, respectively. Essentially, each micro-polarizer orientation filters the incoming light to capture a specific phase shift of the interferogram. At the hologram plane, the polarization camera records the intensity distribution resulting from the interference between the object and reference beams for the target image located at the object plane. By extracting the pixels that correspond to the same phase shift, we obtain four LR phase-shifted interference patterns corresponding to the phase shifts of 0, π/2, π, and 3π/2, respectively. Due to the spatial arrangement of the micro-polarizers and the resulting phase delays, these four LR interference patterns have half the resolution of the original polarized image sensor matrix along each axis, so each pattern captures only a portion of the total information. By applying the four-step phase-shifting digital holography technique to these patterns, we can calculate the complex amplitude distribution of the object wave at the hologram plane. However, the resolution of this complex amplitude distribution is only one-quarter of the total pixel count of the original interferogram captured by the polarized camera, because the four LR patterns are combined into a single complex field representation. Our method addresses this limitation by leveraging a U-Net-based neural network to predict HR phase-shifted interference patterns from the corresponding LR inputs, as shown in Figure 1. By considering the sub-pixel structure of the micro-polarized image sensor, the network effectively reconstructs higher-resolution complex field information at the hologram plane. This approach enhances the resolution of the complex amplitude distribution without requiring changes to the physical hardware, thereby improving the quality of holographic reconstructions.
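The pixel-extraction step can be sketched as follows. This is a minimal NumPy illustration rather than the paper's MATLAB code, and it assumes a hypothetical 2 × 2 unit-cell layout (0° and 45° polarizers in the top row, 135° and 90° in the bottom row); the actual layout depends on the specific sensor.

```python
import numpy as np

def extract_phase_channels(mosaic):
    """Split a polarization-mosaic frame into four LR phase-shifted patterns.

    Assumed (hypothetical) 2x2 unit-cell layout:
        [0 deg   45 deg]  ->  phase shifts  [0      pi/2]
        [135 deg 90 deg]                    [3pi/2  pi  ]
    Each returned pattern has half the sensor resolution along each axis.
    """
    return {
        0.0:           mosaic[0::2, 0::2],  # 0-deg polarizer pixels
        np.pi / 2:     mosaic[0::2, 1::2],  # 45-deg pixels
        np.pi:         mosaic[1::2, 1::2],  # 90-deg pixels
        3 * np.pi / 2: mosaic[1::2, 0::2],  # 135-deg pixels
    }

frame = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 sensor frame
channels = extract_phase_channels(frame)
print({k: v.shape for k, v in channels.items()})  # four half-resolution patterns
```

Stitching four such sparse channels back into one complex field is what reduces the effective resolution to one-quarter of the sensor's pixel count.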
In this study, we used MATLAB to simulate the interference between the object and reference beams. The overall process flow of our proposed method is illustrated in Figure 1. While various diffraction calculations, such as the Fresnel and Fraunhofer diffraction equations, can be used to simulate the interferogram at the hologram plane, we opted for the angular spectrum method to calculate diffraction during both the recording and reconstruction stages [29,30]. To implement the digital holography recording process, we added specific phase shifts to the reference beam to obtain four phase-shifted holograms, each corresponding to a single phase shift resulting from the interference of the object and reference beams. The intensity of each interference pattern was calculated by squaring the magnitude of its corresponding HR phase-shifted hologram. These high-resolution interference patterns serve as the ground truth for training our neural network. To simulate the micro-polarizer array's arrangement, we performed decimation to extract the corresponding pixel points from the four HR phase-shifted interference patterns. However, this process produces LR interference patterns that lose the original spatial resolution. To address this, we applied linear interpolation to the LR phase-shifted patterns to match the pixel count of the HR patterns, accommodating the symmetric architecture of the U-Net. This ensures that the input and output dimensions align, allowing the neural network to effectively learn the mapping from low- to high-resolution interference patterns. Linear interpolation was chosen for its simplicity, computational efficiency, and ability to preserve the spatial and phase relationships in the LR phase-shifted patterns; it introduces minimal artifacts and ensures that the network is trained on data that closely represent the true characteristics of the decimated patterns. The simulation results presented later in this study confirm that linear interpolation is sufficient for producing HR holographic image reconstructions, while alternative methods such as cubic or spline interpolation can be explored in future work.
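The decimation-then-interpolation step can be sketched with a small separable linear-interpolation routine. This is an illustrative NumPy sketch (the paper uses MATLAB); `bilinear_upsample` is a hypothetical helper, with separable `np.interp` calls standing in for MATLAB's built-in interpolation.

```python
import numpy as np

def bilinear_upsample(lr, shape):
    """Linearly interpolate a low-resolution pattern onto the full sensor grid.

    Interpolation is done separably: first along columns, then along rows,
    which is equivalent to bilinear interpolation on a regular grid.
    """
    H, W = shape
    h, w = lr.shape
    rows = np.linspace(0, h - 1, H)  # fractional source coordinates
    cols = np.linspace(0, w - 1, W)
    tmp = np.array([np.interp(cols, np.arange(w), lr[i]) for i in range(h)])
    return np.array([np.interp(rows, np.arange(h), tmp[:, j]) for j in range(W)]).T

# decimate a 4x4 pattern to its 2x2 "0-degree" sub-pixels, then interpolate back
hr = np.add.outer(np.arange(4.0), np.arange(4.0))  # smooth test pattern
lr = hr[0::2, 0::2]                                # decimation (one phase channel)
up = bilinear_upsample(lr, hr.shape)
print(up.shape)  # matches the HR grid
```

Because linear interpolation reproduces any locally planar intensity exactly, smooth fringes survive this round trip well, which is one reason it is a reasonable default here.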
The process of decimation and interpolation is shown in Figure 2. The complex field CF is then calculated from the four interpolated low-resolution phase-shifted interference patterns, corresponding to the phase shifts of 0, π/2, π, and 3π/2, respectively, using the generalized Equation (1), where I represents an interference pattern with the respective phase shift. However, the reconstructed image quality for the object's complex amplitude is degraded when the interpolated low-resolution interference patterns are utilized. We use these interpolated low-resolution interference patterns as input to the network. The choice of micro-polarizer angles at 0°, 45°, 90°, and 135° ensures orthogonality, providing optimal sampling of the polarization state across the entire phase space. This configuration evenly distributes the phase information, enabling accurate reconstruction of the complex field without redundancy or data loss.
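As a consistency check of this combination step, the standard four-step formula can be verified numerically. The sketch below is a hedged illustration, not a reproduction of the paper's Equation (1): it assumes holograms recorded with a unit-amplitude reference and the sign convention I_k = |O + exp(iδ_k)|², under which the object wave is recovered exactly.

```python
import numpy as np

def four_step_complex_field(I0, I90, I180, I270):
    """Standard four-step PSDH combination (one common sign convention).

    Assumes I_k = |O + exp(1j * delta_k)|^2 with a unit-amplitude reference,
    so O = ((I0 - I180) + 1j * (I90 - I270)) / 4.
    """
    return ((I0 - I180) + 1j * (I90 - I270)) / 4.0

# self-check with a synthetic object wave and unit reference
rng = np.random.default_rng(0)
O = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
I = [np.abs(O + np.exp(1j * d)) ** 2 for d in (0, np.pi / 2, np.pi, 3 * np.pi / 2)]
print(np.allclose(four_step_complex_field(*I), O))  # True
```

The subtraction of opposite phase shifts cancels both the DC terms |O|² + |R|² and the conjugate term, which is why the four-step combination is free of zero-order and twin-image contributions.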
To validate the proposed network structure, the network is trained to predict the real-valued HR interference pattern with the phase shift corresponding to the given LR phase-shifted interpolated input. The network is trained with only a single pair of low- and high-resolution interference patterns for each image in the training dataset, which also reduces the memory requirement of the deep learning network. At the prediction stage, all four interpolated low-resolution phase-shifted interference patterns from the test dataset are given as input to predict the four corresponding HR phase-shifted patterns. The simulations are run in MATLAB. We first train and test the network using the simulation data generated by numerical propagation, as shown in Figure 3. The complete numerical process, starting from the target images in the given dataset to the calculation of the complex field information, utilizing simulated input LR and ground truth HR phase-shifted interference patterns, is summarized in Algorithm 1.
Algorithm 1. Low- and high-resolution training data acquisition and complex field calculation.
Required | Ground truth HR and input LR phase-shifted interference patterns.
Given | The four phase shifts {0, π/2, π, 3π/2}; the Fourier and inverse Fourier transform operators; the target image resolution; the total number of images in the original training dataset; the sub-pixel arrangement of the polarization imaging sensor.
Output | Ground truth HR and input LR complex field information.
Steps | Procedure
1 | Four HR ground truth interference patterns at the hologram plane.
1.1 | For each target image in the dataset, do:
1.2 | Apply resizing to the target image and multiply it by a random phase.
1.3 | Numerically propagate the object field to the hologram plane.
1.4 | Interfere the propagated field with the phase-shifted reference wave for each of the four phase shifts.
1.5 | Take the squared magnitude of each phase-shifted hologram to obtain the HR interference patterns.
2 | Four LR input interference patterns at the hologram plane.
2.1 | Decimate each HR interference pattern according to the micro-polarizer sub-pixel arrangement.
2.2 | Linearly interpolate the decimated patterns back to the HR grid.
2.3 | End.
3 | Complex field information at the hologram plane.
3.1 | Combine the four HR and the four interpolated LR patterns, respectively, using Equation (1) to obtain the ground truth and input complex field information at the hologram plane.
We multiplied the target images at the object plane by a random phase. The light from the object scene is numerically propagated and interfered with a reference plane wave. The phase shift is added to the reference wave to obtain the four phase-shifted holograms. The intensity is then calculated by taking the square of the magnitude of each phase-shifted hologram. For training the network, we use one of the object's four phase-shifted HR interference patterns at the hologram plane as the ground truth; specifically, in all our simulations, we consistently used image pairs with zero phase shift. The corresponding LR interference pattern, produced after decimation and interpolation as previously described, serves as the input to the network. The object distance from the hologram plane is 16 mm. The resolution of the target image at the object plane is 128 × 128. The wavelength of the light and the pixel pitch of the interference pattern at the hologram plane (polarized image sensor) are set to 532 nm and 10 μm, respectively.
We utilized a convolutional neural network (CNN) based on the U-Net architecture, as shown in Figure 4. U-Net is favored for its simple structure and fast learning speed, making it widely used in holographic applications. The network comprises two main components: an encoder and a decoder. The encoder captures context and extracts features from the input data, while the decoder refines spatial information to produce high-resolution outputs. To ensure the preservation of fine details during upsampling, we incorporate skip connections between the corresponding layers of the encoder and decoder. These connections pass feature maps directly from the encoder to the decoder, yielding accurate regression results by combining low-level and high-level features.
In our implementation, the encoder processes an input of size H × W (where H and W are the height and width of the image, for a single-channel input) through a series of convolutional blocks, each containing two 3 × 3 convolutions, batch normalization (BN), and rectified linear unit (ReLU) activation. After each convolutional block, the spatial resolution remains the same, but the number of feature channels increases (e.g., 64, 128, 256, 512) with the encoder level. The spatial dimensions H and W are reduced by a factor of 2 after each max pooling operation as the depth increases. The bottleneck has the smallest spatial dimensions and the largest number of feature channels. In the decoder, the feature maps are upsampled by a factor of 2 using transposed convolutions (also called deconvolutions) with a stride of 2, increasing the spatial resolution at each decoder level. Skip connections concatenate the corresponding encoder feature maps with the decoder feature maps, doubling the channel count of the combined maps. The final output has dimensions H × W for a single-channel prediction. We chose the ReLU function for non-linear activation in the convolutional layers, introducing the non-linearity that helps the network learn complex patterns. To regularize the network and prevent overfitting, we apply a 50% dropout rate to the encoder's last convolutional layer and the decoder's first convolutional layer. The loss function is defined as the sum of mean squared errors (MSEs) between the ground truth and predicted HR interference patterns. To implement this, we replace the SoftMax and segmentation layers after the last convolutional layer of the original U-Net with regression layers that compute the real-valued MSE. The depth of both the encoder and decoder parts of the U-Net is set to four, meaning there are four levels of downsampling and upsampling; this depth allows the network to capture a wide range of feature scales.
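The feature-map bookkeeping described above can be made concrete with a small shape walk-through. This sketch assumes a base of 64 channels and depth 4, matching the channel progression mentioned in the text; the function name is illustrative and not from the paper's implementation.

```python
def unet_shapes(H, W, depth=4, base=64):
    """Trace (height, width, channels) through a U-Net encoder and decoder."""
    enc = []
    h, w, c = H, W, base
    for _ in range(depth):
        enc.append((h, w, c))            # after this level's conv block
        h, w, c = h // 2, w // 2, c * 2  # 2x2 max pooling; channels double
    bottleneck = (h, w, c)
    dec = []
    for eh, ew, ec in reversed(enc):
        h, w, c = eh, ew, c // 2         # transposed conv: upsample 2x, halve channels
        dec.append((h, w, c + ec))       # skip-connection concat doubles channels
    out = (H, W, 1)                      # single-channel regression output
    return enc, bottleneck, dec, out

enc, bottleneck, dec, out = unet_shapes(128, 128)
print(bottleneck)  # (8, 8, 1024)
```

For a 128 × 128 input, the encoder levels are (128, 128, 64) down to (16, 16, 512), with the bottleneck at (8, 8, 1024), mirroring the halving of spatial resolution and doubling of channels at each depth.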
For training, we set the batch size to 30 and the number of epochs to 50, balancing computational efficiency with the need for sufficient training iterations. Figure 5 shows the training and prediction stages of the proposed method. Here, the training and test images are taken from the MNIST dataset. Once trained on a single pair of interference patterns, the proposed network can reconstruct all four interference patterns with varying phase shifts for the corresponding input phase-shifted patterns in Figure 5b. Note that the network-predicted interference patterns must be shifted accordingly by one pixel along the x- or y-axis, due to the sub-pixel structure of the polarized image sensor, before calculating the final complex field information using Equation (1). The amplitude and phase of the low-resolution, predicted high-resolution, and ground truth high-resolution complex field information are shown in Figure 6. The figure shows the clear distinction between the low- and the predicted high-resolution complex fields and the latter's similarity to the ground truth high-resolution complex field.
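The one-pixel alignment step can be sketched as below. The offsets used here assume a hypothetical unit-cell layout (0° top-left, 45° top-right, 135° bottom-left, 90° bottom-right); with a real sensor the shifts must match its actual mosaic, and the circular `np.roll` stands in for the simple shift described in the text.

```python
import numpy as np

def align_predicted_patterns(I0, I90, I180, I270):
    """Shift the predicted HR patterns back onto a common pixel grid.

    Assumed offsets relative to the 0-phase pattern (hypothetical layout):
    pi/2 -> one pixel along x; 3pi/2 -> one pixel along y; pi -> one pixel along both.
    """
    return (
        I0,
        np.roll(I90, -1, axis=1),              # undo the x offset
        np.roll(I180, (-1, -1), axis=(0, 1)),  # undo the x and y offsets
        np.roll(I270, -1, axis=0),             # undo the y offset
    )

base = np.arange(16.0).reshape(4, 4)
shifted = (base, np.roll(base, 1, axis=1),
           np.roll(base, (1, 1), axis=(0, 1)), np.roll(base, 1, axis=0))
aligned = align_predicted_patterns(*shifted)
print(all(np.array_equal(a, base) for a in aligned))  # True
```

Without this realignment, the sub-pixel offsets between the four patterns would leak into the combined complex field as a phase-gradient artifact.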
Figure 7 illustrates the detailed flow diagram of the proposed methodology. The diagram outlines the sequential steps involved in calculating the high-resolution complex field information using the network’s predictions of four high-resolution phase-shifted interference patterns, derived from the corresponding network’s input of four low-resolution phase-shifted interference patterns. Additionally, the flow diagram methodically depicts the steps for acquiring both input and ground truth phase-shifted interference patterns from the target images in the MNIST dataset.
Our trained network also demonstrates robustness to variations in the random phase distributions and transverse shifts applied to the LR phase-shifted input patterns, as illustrated in Figure 8. In Figure 8a, we observe that when provided with an LR interference pattern with zero phase shift as input, the network successfully predicts the corresponding HR phase-shifted interference patterns across different random phase distributions at the object plane. Similarly, Figure 8b shows that the network accurately predicts HR patterns with matching transverse shifts when different transverse shifts are applied. This indicates that our method effectively handles variations in phase and position, ensuring reliable reconstruction under diverse conditions.
3. Simulation and Result Analysis
The proposed method is verified numerically with simulations carried out in MATLAB. For image reconstruction at the object plane, a diffraction calculation is applied to the complex field CF extracted from the four predicted HR interference patterns. Several diffraction calculations, such as the Fresnel and Fraunhofer diffraction equations, can be used; we used the angular spectrum method [45] to obtain the reconstructed image at a distance z using the following equation:
O(x, y; z) = IFT{ FT{CF(x, y)} · exp(i2πz·sqrt(1/λ² − fx² − fy²)) },
where FT and IFT denote the Fourier and inverse Fourier transforms, (x, y) are the spatial coordinates in the object and hologram planes, and (fx, fy) are the spatial frequency coordinates in the hologram plane after the Fourier transform is performed on the extracted complex field. The transformed field is multiplied by the phase term of the transfer function, and finally the inverse Fourier transform gives the reconstructed image O(x, y; z) at the propagation distance z.
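This propagation step can be sketched in a few lines of NumPy, assuming the stated simulation parameters (λ = 532 nm, 10 μm pitch). Clamping the square-root argument to suppress evanescent components is a common implementation choice, not something specified in the text.

```python
import numpy as np

def angular_spectrum(cf, wavelength, pitch, z):
    """Propagate a complex field CF by distance z with the angular spectrum method."""
    ny, nx = cf.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequency coordinates fx, fy
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    H = np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0.0)))  # transfer function
    return np.fft.ifft2(np.fft.fft2(cf) * H)

rng = np.random.default_rng(0)
cf = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
obj = angular_spectrum(cf, 532e-9, 10e-6, 16e-3)    # hologram -> object plane
back = angular_spectrum(obj, 532e-9, 10e-6, -16e-3)  # propagate back
print(np.allclose(back, cf))  # True
```

Because the transfer function is unit-modulus for propagating components, forward and backward propagation are exact inverses, which is what allows refocusing the extracted complex field to arbitrary depths.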
Figure 9 presents examples of image reconstruction using our proposed method, where the network is trained on the MNIST dataset. In Figure 9a,b, the top rows display the object's complex field amplitude and phase in the hologram plane, and the bottom rows show the reconstructed images in the object plane, obtained by numerically propagating the complex field from the hologram plane to the object plane at a depth of 16 mm. We observe that the image reconstruction quality is improved through the training of our proposed network. For quantitative evaluation, intensity normalized cross-correlation (INCC) and peak signal-to-noise ratio (PSNR) values were calculated; they are provided alongside the corresponding reconstructed images in the figure. The high-resolution reconstruction results obtained using the proposed method demonstrate an enhancement in image quality compared to the low-resolution reconstruction results. This improvement is evident in the higher INCC values, which indicate a better structural match in the intensity patterns. The PSNR also improves over the low-resolution reconstructions by about 4 dB. Note that the PSNR measures pixel-level accuracy, which is more strongly affected by the intensity variations caused by the random phase distributions. In contrast, the INCC remains robust to these random phase distributions and is primarily sensitive to the relative similarity of the intensity patterns, making it a more reliable metric for evaluating structural fidelity in this context.
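The two metrics can be sketched as follows. The INCC is taken here as the zero-mean normalized cross-correlation of the intensity patterns, which is one common reading of the term (the paper's exact definition may differ), and the PSNR uses the reference peak value.

```python
import numpy as np

def psnr(ref, img):
    """Peak signal-to-noise ratio in dB, using the reference peak value."""
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

def incc(ref, img):
    """Zero-mean normalized cross-correlation of two intensity patterns."""
    a, b = ref - ref.mean(), img - img.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

ref = np.linspace(0.0, 1.0, 64).reshape(8, 8)
noisy = ref + 0.1                             # constant intensity offset
print(round(psnr(ref, noisy), 1))             # 20.0
print(round(incc(ref, 0.5 * ref + 0.2), 3))   # 1.0
```

Note that the INCC is invariant to affine intensity changes (gain and offset), which illustrates why it is less sensitive than the PSNR to the global intensity variations induced by the random phase distributions.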
Figure 10 showcases the ability of our proposed network to reconstruct the object's complex field from predicted interference patterns at varying distances from the hologram plane, effectively eliminating the DC and conjugate noise. Even though the network was trained exclusively for a single depth of 16 mm from the hologram plane, it performs well within a certain depth tolerance, at distances of 12 mm and 14 mm. The reconstructed complex field at the hologram plane and the amplitude at the object plane closely match the ground truth for small deviations from the network's training depth. INCC and PSNR values are given for quantitative evaluation. However, we acknowledge that the reconstruction quality degrades as the reconstruction distance deviates further from the training depth. This highlights the method's robustness within a small depth tolerance and indicates the need for further improvements to handle larger deviations.
To further validate the proposed method, we trained the network on different datasets. Training on different datasets helps the network learn to handle various types of data, making it more robust and adaptable beyond a single dataset. We trained our network on MNIST, FMNIST, and a custom-built dataset derived from the high-resolution digital USAF 1951 resolution test target image. The custom dataset initially comprised 75 randomly extracted patches from the USAF target image; we augmented these patches using rotation, translation, and resizing, expanding the dataset to 45,000 images. The numbers of training images for the MNIST and FMNIST datasets are 5000 and 40,000, respectively. The network's input and ground truth data are acquired through the process described in Section 2. The network is trained to predict interference patterns of size 128 × 128 with an encoder and decoder depth of 4, a total of 30 epochs, and a batch size of 24. Once trained, the network generalized well to different input test data. We also tested our network on larger input images.
Figure 11 shows a larger image of size 576 × 576, created by combining nine random images from the FMNIST test dataset in a 3 × 3 matrix. The proposed network operates in a sliding-window fashion over the four larger LR phase-shifted interference patterns, predicting the four corresponding HR phase-shifted interference patterns. The images are divided into 8 × 8 overlapping patches of size 128 × 128 pixels; the stride between patches is 64 pixels (half of the patch size), resulting in an overlap of 64 pixels. As shown in the figure, the amplitude and phase of the extracted complex field for the predicted interference patterns have zero boundaries around them. The neural network processes each input patch to predict a corresponding 128 × 128 real-valued interference pattern with the respective phase shift, and we retain only the central 64 × 64 region of each predicted patch. We keep only the central region because CNNs are sensitive to the information available at the edges of input patches, which can cause edge artifacts and less accurate reconstruction; focusing on the central region during the sliding-window operation minimizes the influence of boundary-related inaccuracies or discontinuities in the final image reconstruction. The final complex field is then calculated, enabling the four-step phase-shifting digital holography technique. Note that a one-pixel shift along the x- or y-axis due to the sub-pixel structure of the polarized image sensor should be applied before calculating the final complex field information using Equation (1).
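The sliding-window reconstruction can be sketched as below. Here `model` is a stand-in callable (an identity function in the demo) for the trained network, and only the interior covered by the central crops is filled, so a border of (patch − stride)/2 pixels is left untouched in this simplified version.

```python
import numpy as np

def predict_patchwise(image, model, patch=128, stride=64):
    """Apply a patch-wise predictor over a large pattern, stitching central crops."""
    H, W = image.shape
    out = np.zeros_like(image)
    m = (patch - stride) // 2  # margin discarded on each side (32 px here)
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            pred = model(image[r:r + patch, c:c + patch])
            # keep only the central stride x stride region of the prediction
            out[r + m:r + m + stride, c + m:c + m + stride] = \
                pred[m:m + stride, m:m + stride]
    return out

img = np.random.default_rng(0).random((576, 576))
stitched = predict_patchwise(img, model=lambda p: p)  # identity "network"
print(np.array_equal(stitched[32:544, 32:544], img[32:544, 32:544]))  # True
```

With a 576 × 576 input, patch 128, and stride 64, the loops visit exactly 8 × 8 patch positions, matching the patch count stated above.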
Figure 12 shows the simulation results for the USAF 1951 resolution test target image. In this simulation, the original image size is 4416 × 4692 pixels, which we divide into overlapping 128 × 128 patches as in the previous case. Sliding these patches over the input image yields approximately 72 × 72 patches to be processed by the network. The proposed method significantly enhances the quality of the reconstructed images: using the network’s HR phase-shifted interference predictions to extract the complex field yields visibly improved reconstructions. This enhancement is particularly evident in the last row of
Figure 12, which displays zoomed-in portions of the reconstructed images. The results demonstrate that our method effectively improves the clarity and detail of the reconstructed images, confirming its efficacy.
Table 1 presents a quantitative evaluation of the low-resolution and predicted high-resolution reconstructions shown in
Figure 11 and
Figure 12.
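The metrics reported in Table 1 are not restated here; as an illustration of a typical image-quality metric used for such comparisons, the peak signal-to-noise ratio (PSNR) can be computed as follows. This is a generic sketch, not the authors' exact evaluation code.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images normalized to [0, peak]."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
gt = rng.random((128, 128))                                   # hypothetical ground truth
noisy = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)  # degraded version
score = psnr(gt, noisy)
```

Higher PSNR indicates a reconstruction closer to the reference; structural metrics such as SSIM are often reported alongside it.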
4. Discussion
We proposed a simulation-based study utilizing a U-Net-based architecture to extract HR complex field information at the hologram plane from four network-predicted HR phase-shifted interference patterns. This study addresses the inherently low resolution of hologram acquisition when four-step PSDH is performed on a polarization-based image sensor. Extracting the complex field directly at the hologram plane, rather than at the object plane, offers more flexibility in preparing the training dataset and removes the dependency on a specific numerical propagation algorithm. Moreover, because the complex field at the hologram plane closely resembles the recorded interference pattern, this approach yields higher-quality complex field extraction. The extracted complex field can then be numerically propagated to any desired distance, enabling object reconstruction free of DC and conjugate noise artifacts. By training the network on real-valued interference patterns, our method also reduces computational complexity compared with handling complex-valued amplitude and phase data. Notably, the model is trained using a single pair of LR and HR interference patterns for all examples, and once trained, it can predict all four HR phase-shifted interference patterns from their LR counterparts. This approach demonstrates efficient, high-quality complex field extraction, making it a promising solution for HR holographic imaging applications.
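The numerical propagation of the extracted hologram-plane field to a chosen depth is commonly implemented with the angular spectrum method; a minimal sketch follows. The wavelength, pixel pitch, and propagation distance are illustrative values, not parameters taken from this study.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Propagate a sampled complex field over distance z (angular spectrum method).

    `pitch` is the sampling interval in meters; evanescent components are cut.
    """
    N, M = field.shape
    fy = np.fft.fftfreq(N, d=pitch)          # spatial frequencies along y
    fx = np.fft.fftfreq(M, d=pitch)          # spatial frequencies along x
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)      # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Example: propagate a hypothetical field forward 5 cm and back again.
rng = np.random.default_rng(3)
u0 = rng.random((128, 128)) * np.exp(1j * rng.random((128, 128)))
u1 = angular_spectrum(u0, wavelength=633e-9, pitch=3.45e-6, z=0.05)
u2 = angular_spectrum(u1, wavelength=633e-9, pitch=3.45e-6, z=-0.05)
```

Because propagation here is a pure phase filter (no evanescent components at this pitch), a forward-then-backward round trip recovers the original field, which is a convenient sanity check for the implementation.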
We validated our proposed method with training and test datasets generated in MATLAB, simulating phase-shifted interference patterns at the hologram plane while accounting for the sub-pixel structure of micro-polarized image sensors. The results confirm the effectiveness of the method in reconstructing higher-quality images at specific depths. The artifacts observed in the zoomed-in reconstruction of the digit “5” in
Figure 12 likely stem from the patch-based approach used to synthesize the final complex field at the hologram plane. The network predicts 128 × 128 patches on an overlapping grid of approximately 72 × 72 positions, which are combined to form the 4416 × 4692 phase-shifted interference patterns. This process can introduce boundary inconsistencies and ghosting effects, as no phase constraints or regularization are applied to suppress artifacts in the final large-scale reconstruction. Additionally, there are a few limitations to consider. First, the method has not yet been validated on optically acquired test data. Second, noise inherent in real-world holographic imaging (such as sensor noise, object artifacts, and environmental disturbances) has not been incorporated into the training process, which could affect the performance of the network.
The selection of experimental training and test data is crucial for ensuring the robustness and generalizability of the proposed method in future experimental work. The training data should be diverse enough to account for variations in scattering conditions, object sizes, and angular spectrum distributions; this diversity will help the network generalize to real-world scenarios and mitigate domain shift. The method is designed to remain robust to changes in the complex amplitude distribution of the reference wave, provided the phase-shifted interference patterns accurately capture the interaction between the object and reference waves; however, significant deviations in the reference wave properties may require retraining or additional data. Additionally, training convolutional neural networks (CNNs) on experimental data can be challenging because acquiring large, high-quality datasets is difficult and often impractical. The current model, trained solely on simulated data, may not generalize well to real-world scenarios without further adaptation.
While the choice of orthogonal micro-polarizer arrays enables accurate reconstruction of complex fields, future studies could explore alternative micro-polarizer angle configurations to assess their impact on the network’s generalizability and reconstruction accuracy. Moreover, a comparative analysis with other deep-learning networks, such as fully convolutional neural networks (F-CNNs) and ResNet, would help benchmark the performance of our U-Net-based approach.
By addressing these areas, the proposed method can be further refined and adapted for practical applications, enhancing its utility in high-resolution holographic imaging utilizing polarization image sensors.