1. Introduction
Acquiring high-spectral-resolution multispectral images from low-resolution sensors represents a significant challenge in various fields, including remote sensing, medical imaging, precision agriculture, and scene analysis [1]. High-resolution multispectral images provide a more detailed and accurate analysis of observed scenes, which is important for environmental monitoring and disease-detection applications [2]. Multispectral acquisition systems such as those based on CRI or Hyspex filters provide images of 30 to 100 bands, offering unparalleled information [3]. Each spectral band makes it possible to detect specific characteristics of objects that are not visible to the naked eye. In precision agriculture, these images can help identify areas requiring particular treatments; in healthcare, they can detect tissue abnormalities; and they find applications in many other areas. However, these acquisition systems are slow and therefore poorly suited to moving scenes. Eight-band one-shot MSFA (multispectral filter array) cameras provide a compact, cost-effective solution for spectral data acquisition, but they produce images of low spectral resolution. This band limitation makes it difficult to capture the details needed for in-depth analyses. Therefore, although these sensors are accessible, the quality of the data they provide is insufficient for many critical applications. Thus, as part of the European CAVIAR project, which aims to improve image-detection technologies in various sectors, notably health and agriculture, by relying on advanced imaging techniques, an 8-band MSFA filter was adopted for the camera setup to capture a broader spectrum of light [4,5]. However, the transition to a high-spectral-resolution system (31 to 100 bands) requires more sophisticated and expensive sensors, which are slow, limiting their accessibility. This is why estimating the spectral response of 8-band one-shot MSFA cameras raises significant technical challenges. In this context, deep learning methods are a promising solution. This study focuses on applying advanced deep learning techniques to improve the quality and accuracy of the spectral response of 8-band one-shot cameras so as to approximate hyperspectral systems. By harnessing the power of convolutional neural networks (CNNs) and other deep neural-network architectures, it is possible to transform data captured by low-resolution sensors into high-quality multispectral images [6]. This study aims to demonstrate the potential of deep learning techniques to approximate the spectral response of 8-band one-shot cameras, making this technology more accessible and applicable to a wide range of domains. The results obtained open new perspectives for improving sensor systems and image analysis methods, thus contributing to significant advances in research and practical applications.
The specific contributions of our study are as follows:
We have implemented image datasets that contain 8-band images as input and 31-band images as output. We retrieved the images from the TokyoTech database [7], which offers 31-band multispectral images. Using a Gaussian projection, we generated 8-band images from these data. These images are then mosaicked and demosaicked to achieve a quality comparable to that of the camera developed as part of the CAVIAR project. This approach creates a robust dataset for training and evaluating models for estimating the spectral response of 8-band one-shot cameras.
We have developed several models based on deep-learning algorithms, including very deep super-resolution (VDSR), deeply recursive convolutional networks (DRCN), and Laplacian pyramid super-resolution networks (LapSRN). These architectures were originally designed for super-resolution and were adapted here to approximate the spectral response of 8-band one-shot MSFA cameras. By exploiting their advanced designs, we aim to significantly improve the quality of the spectral response obtained with these cameras.
We compare the algorithms used for building the models by evaluating them using specific metrics such as structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) to measure their performance. Furthermore, a visual comparison between the predicted spectral responses and the ground truths is performed to validate the effectiveness of the proposed methods. This approach allows us to identify the best-performing models and optimize techniques for approximating the spectral response of 8-band one-shot MSFA cameras.
The rest of our manuscript is organized as follows:
Section 2 deals with the state of the art regarding the spectral response of multispectral cameras.
Section 3 details the materials and methods used in our approach.
Section 4 presents the experiments carried out and the results obtained.
Section 5 provides a discussion of these results.
Section 6 addresses the threats to validity.
Finally, Section 7 presents our conclusion.
2. Related Work
Several works have been carried out on image reconstruction; among others, we can cite Toivonen et al., who developed a practical method for estimating the spectral response of camera sensors and the associated uncertainty using a specific imaging process. This method relies on a restricted set of 15 images, including diffraction images and color patches of known spectra, to obtain high-resolution spectral response estimates. The approach is versatile and applicable to any camera sensor in the visible domain. It does not impose strict constraints on the possible estimates, thus ensuring high precision of the results, which agree with previous estimates [8]. Darrodi et al. conducted a study to test the performance of different sensor estimation techniques from the literature. They used measured and synthetic data, which they then compared to reference data to evaluate the effectiveness of these techniques on two cameras [9]. Han et al. proposed an estimation method based on a single image and exploiting fluorescence without requiring prior knowledge of the illumination spectrum. Under different light sources, the spectral distributions of the fluorescence emitted by the same material remain constant up to a scale factor; thus, the camera response to fluorescence maintains the same chromaticity. By relying on this chromatic invariance, they estimated the camera’s spectral sensitivity under arbitrary illumination whose spectrum is unknown. Through numerous experiments, they demonstrated the accuracy of their method under various lighting conditions. Additionally, they showed how to recover daylight spectra from the estimated results and then use the cameras’ estimated spectral sensitivities and daylight spectra to solve color correction problems [10]. Zhao et al. studied spectral features using singular value decomposition (SVD) to extract basis functions. They collected data from the literature and direct measurements of the sensitivity of different cameras. Their paper compares the extracted basis functions with other mathematical basis functions to identify the optimal set. These optimized basis functions can then be used to estimate the unknown spectral sensitivity of any camera [11]. Prasad et al. proposed a quick and straightforward method to obtain a reasonable approximation of camera-response functions using only a color chart and unknown, random lighting. This approach is made possible by carefully designing the cost function, which imposes several constraints to make the problem feasible. Among the components of this cost function, the Luther condition serves as a global shape prior, while commercial lighting systems with unknown spectra provide narrow spectral windows for local reconstruction. The quality of the reconstruction obtained is comparable to that of methods using known lighting [12]. Matanga et al. developed an innovative method to approximate the spectral response of color images. They introduced an approach based on neural networks, aiming to improve on two previous families of methods: those using circular and exponential functions and those based on the Penrose or Wiener inverse. This new method can also be applied to calibrating most color scanning systems and subwavelength filters [13].
When we talk about a camera’s spectral response, we also talk about the quality and reconstruction of the images. Chunwei et al. thus conducted a comparative study on GANs from different angles to address these issues. They began by reviewing recent advances in GANs and then presented popular architectures suitable for image super-resolution, both for small and large samples. They analyzed the motivations, implementations, and differences of optimization methods based on GANs and discriminative learning, including supervised, semi-supervised, and unsupervised approaches. In addition, a comparison of the performance of these GANs was carried out on public datasets, accompanied by a quantitative and qualitative analysis [14]. For their part, Vizilter et al. proposed a new approach to fusing multispectral images via diffusive morphology techniques to improve aviation vision systems. Unlike traditional morphological methods, this approach does not require segmenting the image to describe its shape and offers strong resistance to noise and distortions. They developed procedures for fusing two- and three-channel images using diffusive morphological filtering and an image pyramid. A promising prototype of this improved vision system was tested with multispectral data collected during an aeronautical test [15]. Andriyanov et al. focused on the problem of filtering multispectral images by studying the effectiveness of filtering image sequences. They analyzed variations in error variance as a function of the number of images in the sequence and the inter-image correlation coefficient. To statistically model these images, they used doubly stochastic random fields [16]. Furthermore, Andriyanov et al. explored the restoration of images under conditions where a portion of the observations is subject to additive noise, or more precisely, the restoration of randomly sampled images with pixels affected by white Gaussian noise. They developed nonlinear filters based on deep doubly stochastic Gaussian models, which proved more effective than linear methods and classical algorithms. These filters allow images to be restored using only 50% of the data with a relative error rate of 9% [17]. These studies show the importance of robust and innovative approaches to improve image quality, particularly in multispectral vision and image reconstruction, where precision and resistance to noise are essential.
Most existing methods for approximating spectral responses are not directly suited to our specific problem of 8-band one-shot MSFA cameras. Our article therefore presents a new deep-learning-based approach to address this challenge.
3. Materials and Methods
The experiments were conducted in Python at the ImViA laboratory on a DELL desktop computer equipped with an Intel(R) Core i7-10700 CPU running at 2.90 GHz, 32 GB of RAM, and an NVIDIA Quadro P400 GPU. The models were implemented in Python 3.8.8, using the Keras API 2.4.3 with TensorFlow 2.3 as the backend and the CUDA 10.1/cuDNN 7.6 dependencies for GPU acceleration.
3.1. Dataset
Our dataset comes from the multispectral database called TokyoTech. This dataset contains 31-band multispectral images from 420 nm to 720 nm at 10 nm intervals. Images were captured using a monochrome camera with a VariSpec liquid crystal tunable filter (VariSpec VIS). This dataset is composed of 35 different objects, each containing 31 spectral bands [7].
Figure 1 shows an extract of some images from the database.
We divided each image from this dataset into four segments to create a more extensive database for our experiment. This yields a dataset of 140 images, each containing 31 spectral bands.
Figure 2 shows some images from the new dataset.
After splitting the data, we used a Gaussian projection, a suitable method for modeling the MSFA filters of the color-shade hybrid sensor, to reduce the spectral responses from 31 to 8 bands. This projection reduces the dimensionality of the data while retaining the essential spectral characteristics, thus facilitating the precise estimation of spectral bands from a reduced number of initial bands.
Figure 3 shows the transformation of the 31 bands into 8 bands.
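To make this step concrete, the following minimal sketch shows one way to implement such a Gaussian projection with NumPy. The filter center wavelengths and the 20 nm bandwidth are illustrative assumptions, not the calibrated parameters of the CAVIAR sensor:

```python
import numpy as np

# 31 source bands: 420-720 nm at 10 nm intervals (as in the TokyoTech data)
wavelengths = np.arange(420, 721, 10)          # shape (31,)

# Assumed center wavelengths and bandwidth of the 8 simulated MSFA filters
centers = np.linspace(430, 710, 8)             # illustrative values
sigma = 20.0                                   # illustrative bandwidth (nm)

# Gaussian spectral sensitivity of each filter, normalized to unit sum
weights = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / sigma) ** 2)
weights /= weights.sum(axis=1, keepdims=True)  # shape (8, 31)

def project_31_to_8(cube_31):
    """Project a (H, W, 31) multispectral cube onto 8 simulated MSFA bands."""
    return cube_31 @ weights.T                 # (H, W, 31) x (31, 8) -> (H, W, 8)
```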
Once the Gaussian projection has been performed on the 31-band images to reduce them to 8 bands, we move on to the mosaicking step. The mosaic results from assembling the 8 image bands into a larger, more informative composite image. This mosaic is then demosaicked using the bilinear method, which best restores the images captured by the camera developed as part of the CAVIAR project while preserving spectral details. Using bilinear demosaicking ensures a more faithful representation of the original data, which is essential for subsequent analyses and applications.
Figure 4 shows the images obtained before and after the operation.
Mosaicking is a fundamental process that creates a composite image from multiple sparse pixel images captured at different wavelengths. This process takes place in several key stages:
1. When capturing an image with our multispectral camera, several individual images are generated. In our case, we obtain eight distinct images, each representing the pixels with similar spectral properties. Although they provide detailed information about the observed scene, these images remain incomplete because of their sparsity.
2. The captured pixels are grouped according to their spectral properties. This grouping is essential to reduce data redundancy and improve the clarity of analysis.
3. To reconstruct a fully defined composite image, we rely on our MSFA (multispectral filter array) model [18]. This powerful tool generates mosaics from scattered images while preserving spectral information.
Demosaicking, a crucial part of the process, inverts the mosaicking step: it separates the composite image to recover the dispersed images. For each sparse image, demosaicking uses interpolation techniques to estimate the missing pixel values. In our case, we employed bilinear or bicubic interpolation, which estimates missing values from neighboring pixels. Once the missing values have been interpolated, the different scattered images are merged to create the demosaicked image.
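A minimal sketch of these two steps is given below, assuming a simple periodic 4 × 2 MSFA pattern in which each of the 8 bands is sampled once per tile; the actual arrangement produced by our MSFA model [18] and the CAVIAR camera may differ:

```python
import numpy as np
from scipy.ndimage import convolve

PATTERN = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])   # assumed 4x2 band layout

def band_map(shape, pattern=PATTERN):
    """Tile the periodic pattern over the image to get per-pixel band indices."""
    ph, pw = pattern.shape
    h, w = shape
    return np.tile(pattern, (h // ph + 1, w // pw + 1))[:h, :w]

def mosaic(cube):
    """Keep, at each pixel, only the band selected by the MSFA pattern."""
    h, w, _ = cube.shape
    rows, cols = np.indices((h, w))
    return cube[rows, cols, band_map((h, w))]          # (H, W) raw mosaic

def demosaic_bilinear(raw, n_bands=8):
    """Bilinear demosaicking: interpolate each sparse band from its samples."""
    h, w = raw.shape
    bm = band_map((h, w))
    # Separable tent kernels matching the pattern periods (4 rows, 2 columns)
    kernel = np.outer(np.array([1, 2, 3, 4, 3, 2, 1]) / 4.0,
                      np.array([1, 2, 1]) / 2.0)
    out = np.zeros((h, w, n_bands))
    for b in range(n_bands):
        mask = (bm == b).astype(float)
        num = convolve(raw * mask, kernel, mode='mirror')
        den = convolve(mask, kernel, mode='mirror')
        out[..., b] = num / np.maximum(den, 1e-8)      # normalized interpolation
    return out
```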
Figure 5 illustrates the mosaicking and demosaicking process.
Figure 6 shows mosaic images from the dataset.
Figure 7 summarizes the process of preparing our data for the experimentation of our study.
3.2. Deep-Learning Algorithms
Image reconstruction is a growing field in image processing, aiming to improve image quality with finer and more precise details. Deep-learning algorithms have progressed considerably, enabling more accurate and detailed image reconstructions. Among these algorithms, several innovative approaches stand out for their ability to exploit deep and complex architectures to improve images [19].
In our study, we modify the architecture of three algorithms to meet our specific needs: very deep super-resolution (VDSR), Laplacian pyramid super-resolution network (LapSRN), and deeply recursive convolutional network (DRCN).
Each of these algorithms provides a unique method for tackling the reconstruction problem by exploiting the specific characteristics of its architecture to optimize the quality of the reconstructed images. When these algorithms process images, they must understand and model how each spectral band contributes to the final image. The spectral response of the cameras plays a role in this process, as it directly influences the quality of the input data for these algorithms. An accurate and well-calibrated spectral response allows the algorithms to better capture and restore spectral details, thereby improving the quality and fidelity of the reconstructed images.
3.2.1. Very Deep Super-Resolution (VDSR)
VDSR is a sophisticated deep-learning technique designed to improve image resolution. It uses a convolutional neural network (CNN) architecture of 20 layers, significantly deeper than traditional networks. This depth allows VDSR to effectively capture complex features and high-frequency details in images [20].
The architecture includes the following steps:
Input processing: An 8-band multispectral image is fed to the first convolutional layer, which uses 64 filters of size 3 × 3 and applies ‘same’ padding to maintain the dimensions of the input image. A ReLU activation function is applied to introduce nonlinearity.
Hidden layers: The output of the first layer is passed through 18 further convolutional layers, each using 64 filters of size 3 × 3 to extract features, followed by a final (20th) convolutional layer, which uses a number of filters equal to the number of output channels and applies ‘same’ padding.
Residual learning: Another convolution is applied to the input image to bring it into the desired output shape. A residual addition combines the output of the final layer with this convolution of the input image. This helps the model learn the residual details needed to improve the image resolution.
The equations describing this model are the following:

$F_0 = \mathrm{ReLU}(K_1 * I)$, $F_i = \mathrm{ReLU}(K_{i+1} * F_{i-1})$ for $i = 1, \dots, 18$, $R = K_n * F_{18}$, $O = R + K_{\mathrm{res}} * I$

where:
I is the input image;
$K_1, K_2, \dots, K_n$ are the convolution filters, and $K_{\mathrm{res}}$ is the convolution applied to the input image for the residual connection;
R is the result of the last convolution layer;
O is the output image.
Algorithm 1 presents the pseudocode of the VDSR algorithm.
Algorithm 1: VDSR
Input: I, $K_1, K_2, \dots, K_n$, $K_{\mathrm{res}}$. Output: O
Step 1: Initialize the input: $F_0 \leftarrow \mathrm{ReLU}(K_1 * I)$
Step 2: Apply the sequence of convolutional layers: for i = 1 to 18: $F_i \leftarrow \mathrm{ReLU}(K_{i+1} * F_{i-1})$
Step 3: Compute the result of the last convolutional layer: $R \leftarrow K_n * F_{18}$
Step 4: Compute the residual connection: $S \leftarrow K_{\mathrm{res}} * I$
Step 5: Calculate the final output image: $O \leftarrow R + S$
Figure 8 shows the VDSR architecture implemented.
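As an illustration, the following minimal Keras sketch is consistent with the description above: an 8-band input, 18 hidden layers of 64 filters, a final 31-channel layer, and a residual connection that projects the input to the output shape. The Adam optimizer and MSE loss are assumptions of the sketch, not the exact training configuration of Table 1:

```python
from tensorflow.keras import layers, Model

def build_vdsr(in_bands=8, out_bands=31):
    """VDSR adapted to spectral-response estimation: 8 bands in, 31 bands out."""
    inp = layers.Input(shape=(None, None, in_bands))
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(inp)  # input layer
    for _ in range(18):                                               # hidden layers
        x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    r = layers.Conv2D(out_bands, 3, padding='same')(x)                # 20th layer
    skip = layers.Conv2D(out_bands, 3, padding='same')(inp)           # project input
    out = layers.Add()([r, skip])                                     # residual addition
    return Model(inp, out)

model = build_vdsr()
model.compile(optimizer='adam', loss='mse')
```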
3.2.2. Deeply Recursive Convolutional Network (DRCN)
The deeply recursive convolutional network (DRCN) is an advanced convolutional neural-network architecture for image super-resolution. This model exploits deep recursion to improve the quality of images by increasing their resolution. DRCN is based on applying the same series of convolutional layers recursively, multiple times, to extract and refine image features gradually. This recursion allows the model to be deep yet efficient in terms of parameters [21]. Its architecture is as follows:
Input Layer: The low-resolution image is first fed into a convolutional input layer, initializing feature extraction.
Recursive Convolutional Blocks: The core of DRCN consists of several convolutional blocks that are applied recursively. Each block includes one or more convolutional layers followed by ReLU activation, and the same set of layers is applied multiple times to the image features.
Recursion: Instead of directly stacking numerous convolutional layers, DRCN reuses the same layers to obtain an adequate depth without significantly increasing the parameters.
Fusion Layer: The outputs of each recursive application are combined and fused to create a comprehensive, multi-scale representation of the image, allowing DRCN to capture the full complexity of the input.
Reconstruction Layer: Finally, a reconstruction layer takes the fused output and generates the final high-resolution image.
The equations describing this model are the following:

$F = \mathrm{ReLU}(K * I)$, $R_1 = \mathrm{ReLU}(K_2 * F)$, $R_i = \mathrm{ReLU}(K_{i+1} * R_{i-1})$ for $i = 2, \dots, 17$, $O = K_{\mathrm{final}} * R_{17}$

where:
I is the input image;
K is the convolution filter for feature extraction;
$K_2, K_3, \dots, K_n$ are the recursive layers’ convolution filters;
$K_{\mathrm{final}}$ is the convolution filter for the reconstruction layer;
F is the extracted feature maps;
$R_i$ is the result of the i-th recursive layer;
O is the output image.
Algorithm 2 presents the pseudocode of the DRCN algorithm.
Algorithm 2: DRCN
Input: I, K, $K_2, K_3, \dots, K_n$, $K_{\mathrm{final}}$. Output: O
Step 1: Feature extraction using the first convolutional layer: $F \leftarrow \mathrm{ReLU}(K * I)$
Step 2: Initialize the first recursive layer: $R_1 \leftarrow \mathrm{ReLU}(K_2 * F)$
Step 3: Apply the recursive layers: for i = 1 to 16: $R_{i+1} \leftarrow \mathrm{ReLU}(K_{i+2} * R_i)$
Step 4: Reconstruction layer: $O \leftarrow K_{\mathrm{final}} * R_{17}$
Figure 9 shows the modified DRCN architecture implemented.
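As an illustration, a minimal Keras sketch of the recursive principle is given below. Following the original DRCN, a single weight-shared convolutional layer is applied 16 times and the per-recursion reconstructions are fused; the equal-weight averaging used for the fusion layer is a simplifying assumption (DRCN learns these weights), and distinct per-recursion filters, as in the equations above, would replace the shared layer with a list of layers:

```python
from tensorflow.keras import layers, Model

def build_drcn(in_bands=8, out_bands=31, recursions=16):
    """DRCN-style model: one shared convolutional layer applied recursively."""
    inp = layers.Input(shape=(None, None, in_bands))
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(inp)  # feature extraction
    shared = layers.Conv2D(64, 3, padding='same', activation='relu')  # shared recursive layer
    recon = layers.Conv2D(out_bands, 3, padding='same')               # shared reconstruction layer
    preds = []
    for _ in range(recursions):
        x = shared(x)                          # same weights reused at every recursion
        preds.append(recon(x))                 # intermediate prediction at this depth
    out = layers.Average()(preds)              # fusion of the recursive outputs
    return Model(inp, out)
```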
3.2.3. Laplacian Pyramid Super-Resolution Network (LapSRN)
The Laplacian pyramid super-resolution network (LapSRN) is a neural-network architecture designed for image super-resolution. It was introduced to improve image quality and resolution using a hierarchical approach inspired by the Laplacian pyramid.
LapSRN uses a Laplacian pyramid to represent images at different resolution scales. The Laplacian pyramid decomposes an image into a series of frequency subbands, allowing the network to process image details stepwise at various resolution levels [22,23].
The LapSRN architecture includes several levels, each corresponding to a different resolution scale.
Each pyramid level contains a convolutional block that extracts image features at that specific scale. This block consists of several convolutional layers followed by ReLU activation.
The equations resulting from this model are the following:

$F = \mathrm{ReLU}(K * I)$

where F is the initial output of the first transformation applied to the input image, K is a filter or convolution kernel applied to the input image, and I is the input image (a multispectral or low-resolution image). This equation represents an initial transformation in which the convolution kernel K is applied to the input image I to obtain the first representation F. It is used to extract low-level features, such as edges or textures.

$P_j = K_p * P_{j-1}, \quad P_0 = R_{10}$

where $P_j$ is the intermediate output at pyramid level j, $K_p$ is a kernel linked to the pyramid structure, often used for multi-scale super-resolution, and $R_{10}$ is the final output of the recursive step (after 10 iterations). This equation combines the final recursion $R_{10}$ with the specific kernel $K_p$, indicating a multi-scale mechanism. This allows information from different levels of detail to be aggregated, thereby enhancing the model’s ability to produce images with fine features at different resolutions.

$O = K_{\mathrm{final}} * P_2$

where O is the final output of the model, $K_{\mathrm{final}}$ is a final kernel applied to the previous output for the final transformation, and $P_2$ is the output of the second pyramid level. This last equation combines the kernel $K_{\mathrm{final}}$ with the intermediate output $P_2$ to produce the result O. This final step can be seen as a refinement in which all the information obtained previously is integrated to generate the final version of the image with improved resolution or other desired properties.
Algorithm 3 presents the pseudocode of the LapSRN algorithm.
Algorithm 3: LapSRN
Input: I, K, $K_{\mathrm{res}}$, $K_p$, $K_{\mathrm{final}}$. Output: O
Step 1: Apply the first convolution to extract features from the input image: $F \leftarrow \mathrm{ReLU}(K * I)$
Step 2: Initialize the first residual block: $R_0 \leftarrow F$
Step 3: Apply the residual blocks: for i = 1 to 10: $R_i \leftarrow \mathrm{ReLU}(K_{\mathrm{res}} * R_{i-1})$
Step 4: Create the Laplacian pyramid levels: for j = 1 to 2: $P_j \leftarrow K_p * P_{j-1}$, with $P_0 = R_{10}$
Step 5: Final output: $O \leftarrow K_{\mathrm{final}} * P_2$
Figure 10 shows the LapSRN architecture implemented.
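A minimal Keras sketch of this adapted architecture is shown below, following the structure of Algorithm 3 (10 residual blocks and 2 pyramid levels). Since our task is spectral estimation rather than spatial upscaling, the upsampling stages of the original LapSRN are omitted, and the residual blocks are implemented with identity skip connections, one common interpretation; both choices are assumptions of the sketch:

```python
from tensorflow.keras import layers, Model

def build_lapsrn(in_bands=8, out_bands=31, n_res=10, n_levels=2):
    """LapSRN-style model: residual feature blocks feeding pyramid refinements."""
    inp = layers.Input(shape=(None, None, in_bands))
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(inp)  # first convolution
    for _ in range(n_res):                                            # residual blocks
        r = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
        x = layers.Add()([x, r])                                      # identity skip
    p = x
    for _ in range(n_levels):                                         # pyramid levels
        p = layers.Conv2D(64, 3, padding='same', activation='relu')(p)
    out = layers.Conv2D(out_bands, 3, padding='same')(p)              # final transformation
    return Model(inp, out)
```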
3.3. The General Process of Our Method
Our general methodology is divided into three main parts:
First, we prepare the data necessary for our study. This step includes collecting, transforming, and normalizing the data to fit our models. We describe how we reduce the 31-band images to 8-band images using a Gaussian projection. These transformed data are then organized into mosaics and demosaicked using a bilinear method, which makes it possible to simulate the images obtained by the camera developed as part of the CAVIAR project.
The second part of our methodology consists of implementing different models based on deep-learning algorithms. We use advanced architectures that we have modified to adapt to our study: very deep super-resolution (VDSR), Laplacian pyramid super-resolution network (LapSRN), and deeply recursive convolutional network (DRCN). Each model is adjusted to optimize image reconstruction, exploiting the specific characteristics of its architecture to improve image resolution and quality. These models are trained with the prepared data, thereby learning to predict the 31-band spectral responses from the 8-band images.

Finally, we evaluate our models using test data. This step measures the performance and efficiency of the models. We use various metrics, such as the loss function, PSNR (peak signal-to-noise ratio), and SSIM (structural similarity index), to evaluate the quality of the spectral responses. The evaluation allows us to identify the strengths and limitations of each model and make the necessary adjustments to improve their performance.
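Putting the three parts together, a condensed sketch of the full pipeline, reusing the illustrative helpers defined in the previous and following subsections, might look as follows; the data-loading step (`tokyotech_cubes` is a hypothetical list of preloaded cubes), the 112/28 train/test split, and the training hyperparameters are assumptions:

```python
import numpy as np

# --- Part 1: data preparation (helpers sketched in Section 3.1) ---
# `tokyotech_cubes` is a hypothetical list of (H, W, 31) arrays scaled to [0, 1]
x_data = np.stack([demosaic_bilinear(mosaic(project_31_to_8(c))) for c in tokyotech_cubes])
y_data = np.stack(tokyotech_cubes)

# --- Part 2: model implementation and training (Section 3.2) ---
model = build_drcn()                                   # or build_vdsr() / build_lapsrn()
model.compile(optimizer='adam', loss='mse')
model.fit(x_data[:112], y_data[:112],                  # 112 of the 140 images for training
          validation_split=0.1, epochs=100, batch_size=4)

# --- Part 3: evaluation on held-out images (metrics sketched in Section 3.4) ---
y_pred = model.predict(x_data[112:])
print('PSNR:', psnr(y_data[112:], y_pred), 'SAM:', sam(y_data[112:], y_pred))
```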
Figure 11 illustrates the general outline of our methodology.
3.4. Metric Performance
To evaluate the performance of the spectral responses provided by our models, we use several metrics [18], including the loss function, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), the spectral angle mapper (SAM), the root mean square error (RMSE), and the mean absolute error (MAE).
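For reference, the sketch below gives straightforward NumPy implementations of these metrics for (H, W, B) spectral cubes scaled to [0, 1]; SSIM is best computed with an existing implementation, such as skimage.metrics.structural_similarity applied band by band:

```python
import numpy as np

def psnr(y_true, y_pred, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two cubes scaled to [0, max_val]."""
    mse = np.mean((y_true - y_pred) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def sam(y_true, y_pred, eps=1e-8):
    """Spectral angle mapper (radians), averaged over all pixels."""
    dot = np.sum(y_true * y_pred, axis=-1)
    norms = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1)
    return np.mean(np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0)))
```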
4. Experiments and Results
For our experiment, we present the different spectral responses provided by the implemented models, obtained from 8-band MSFA one-shot camera data. We compare our models using the results obtained for the metrics described above. Our results are presented both quantitatively, with metrics, and qualitatively.
4.1. Experimental Setting
The configuration of the algorithms used for setting up the models is detailed in Table 1.
4.2. Quantitative Results
Table 2 presents three models suitable for estimating the spectral response of 8-band MSFA one-shot cameras: VDSR, DRCN, and LapSRN, using several performance metrics.
The loss function indicates the error between the predicted and actual images. The values obtained are 0.0061, 0.0047, and 0.0057 for the VDSR, DRCN, and LapSRN models, respectively. The DRCN model has the lowest loss, at 0.0047.
PSNR measures the quality of the reconstructed image, with higher values indicating better quality. DRCN achieves the highest PSNR at 25.5059, compared to VDSR at 23.6255 and LapSRN at 23.9505. SSIM evaluates the structural similarity between the original and reconstructed images. Values close to 1 indicate better similarity. DRCN also outperforms other models with an SSIM of 0.8355, while LapSRN has the lowest SSIM at 0.8047.
SAM measures the angle between spectral vectors, with lower values indicating better performance. DRCN has the lowest SAM value (0.13215), indicating better spectral fidelity than VDSR (0.17293) and LapSRN (0.17764).
The RMSE quantifies the average magnitude of errors between the predicted and actual values. A lower RMSE value indicates better model accuracy. DRCN has the lowest RMSE, at 0.05849, while VDSR has the highest, at 0.06916.
MAE, which measures the average absolute errors in predictions, is a critical factor in evaluating the algorithms. DRCN’s best performance, with an MAE of 0.0415, highlights its accuracy, while VDSR’s highest MAE, at 0.0525, indicates its lower accuracy.
We can advance several hypotheses as to why DRCN performs better:
DRCN (deeply recursive convolutional network) uses deeply recursive architectures that better capture complex and delicate details in images, which could explain why DRCN outperforms the other models in terms of PSNR, SSIM, and SAM, where accuracy and fidelity to detail are paramount.
In addition, its recursive mechanism would allow predictions to be refined at each iteration, thus capturing more nuances in image reconstruction. This capability could justify its better balance between noise reduction and texture preservation and its superior performance in terms of loss, RMSE, and MAE compared to VDSR and LapSRN models.
Figures 12–14 illustrate the learning and validation curves of the VDSR-, DRCN-, and LapSRN-based models, respectively, focusing on three key metrics: PSNR, SSIM, and the loss function. These curves allow one to follow the evolution of each model’s performance at each iteration during training.
4.3. Qualitative Results
This section presents the reference images (ground truth) and the spectral responses from the different models. For clarity, we focus on four specific wavelengths (420, 500, 670, and 720 nm) instead of the 31 available bands. This presentation focuses on three categories of images from the TokyoTech database, selected randomly: Chartres, Character, and Tshirts2.
We observed that the fitted DRCN model provides more accurate spectral responses than the fitted VDSR and LapSRN models, thus confirming the previous quantitative results.
5. Discussion
The performance of the adapted VDSR, DRCN, and LapSRN models reveals significant insights into their capabilities to estimate the spectral responses of 8-band one-shot MSFA cameras. Each model has distinct characteristics that directly influence their effectiveness, and the metric results provide valuable insight into their overall performance.
The DRCN model is the best of the three in both the quantitative and qualitative assessments. With the lowest values for loss, RMSE (root mean square error), and MAE (mean absolute error), as well as the highest values for PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index), DRCN demonstrates an exceptional ability to minimize the error of spectral response estimation. These results show that DRCN not only reduces the numerical differences between predictions and actual values but also generates images perceived to be of better quality. DRCN’s complex architecture enables deep learning of image features, which is key to its superior performance. The depth of the network, combined with advanced optimization techniques, allows DRCN to capture finer details and improve spectral fidelity, which is crucial for applications where color and texture accuracy is paramount.
Although not as efficient as DRCN, the LapSRN model shows interesting results in terms of error minimization (loss, RMSE, MAE) and produces slightly higher-quality images in terms of PSNR compared to VDSR. LapSRN is designed to minimize visual artifacts while maintaining reasonable accuracy, making it suitable for applications where visual quality is more important than structural or spectral fidelity.
VDSR, for its part, stands out for its ability to preserve structures and maintain high spectral precision, as shown by its SSIM and SAM (spectral angle mapper) scores, which exceed those of LapSRN. Although its error-minimization performance is not as impressive as that of DRCN or LapSRN, VDSR excels at faithfully reproducing details and fine features of images, which is crucial in scenarios where structural fidelity has priority.
Overall, DRCN emerges as the best-performing model for spectral response estimation among those evaluated, providing superior image quality and better spectral fidelity. However, selecting the appropriate model should not be based on overall performance alone; it must also consider the application’s specific requirements, including available resources, use conditions, and priorities regarding visual quality, structural fidelity, or spectral precision. DRCN is ideal for applications requiring high accuracy and optimal image quality. At the same time, VDSR and LapSRN could be favored in contexts where other aspects, such as speed or simplicity of the model, are more important.
We use data covering wavelengths from 420 nm to 720 nm at an interval of 10 nm. This setting presents a first limitation: it becomes impossible to reproduce spectral variations finer than this 10 nm interval, which could restrict the precision of spectral reconstructions for applications requiring finer resolution. Furthermore, our results are limited to the visible spectrum, since the available data only cover this range. Therefore, extending our findings to other spectral domains, such as the infrared, would be infeasible with the current data. Future exploration in these areas would require acquiring additional data in the corresponding wavelength ranges. These aspects constitute essential limitations for the interpretation of the results of this study.
Furthermore, the quality of the input data directly influences the results obtained. If the data used are of lower quality, with noise or artifacts, it becomes more difficult for models to accurately capture spectral and structural details. On the other hand, using very high-quality data would significantly improve model performance, leading to more accurate predictions and better spectral fidelity. It is, therefore, crucial to recognize that the size and quality of datasets play a key role in assessing model capabilities.
6. Threats to Validity
It is essential to consider some threats to the validity of this study, which could influence the interpretation of the results.
The performance of the VDSR, DRCN, and LapSRN models strongly depends on the quality and representativeness of the data used. If the multispectral datasets used for training and validation are not sufficiently diverse or representative of real-world conditions, models could be overfit or underfit, limiting their ability to generalize to other types of cameras or multispectral scenes.
Model performance also depends on the choice of hyperparameters, such as the networks’ learning rates or depths. If these hyperparameters are not rigorously optimized for each model, comparisons may not reflect the maximum capabilities of each architecture.
The conclusions drawn from this study are based on using a specific multispectral camera (8-band one-shot MSFA). It is important to recognize that these results may not be directly generalizable to other camera configurations, cameras with more or fewer spectral bands, or different application environments. For instance, environments that require more spectral precision or finer spatial resolution may yield different conclusions.
7. Conclusions
Spectral response evaluation of 8-band one-shot MSFA (multispectral filter array) cameras is an innovative research area combining multispectral imaging and advanced deep-learning techniques. In this study, we developed models based on the VDSR, DRCN, and LapSRN algorithms, adapted to approximate the spectral response of 8-band one-shot MSFA cameras. We observed significant differences in performance, assessed using key metrics such as loss, PSNR, SSIM, SAM, RMSE, and MAE. The DRCN model performed best, achieving superior results on all metrics considered. Its ability to reduce reconstruction error while preserving the structural details of images makes it a preferred choice for applications requiring high image quality.
On the other hand, the VDSR and LapSRN models, although efficient, show lower performance, which could limit their use in contexts requiring optimal quality. Our study allowed us to develop models that estimate 31-band spectral responses from 8-band one-shot MSFA cameras. It is essential to continue this work to improve the quality of the spectral response, because the quality of the original images significantly influences the results obtained. The future of image processing lies in developing models that combine performance, efficiency, and adaptability to various situations, ensuring continuous improvement in image quality in different application areas. Future research should focus on improving the input data, optimizing the models used, and exploring other models to find a better balance between performance and efficiency.
Regarding future perspectives, the next step is to validate our approach on the camera developed as part of the CAVIAR project. For this purpose, a database is currently being prepared.