Article

A New Approach to Recognize Faces Amidst Challenges: Fusion Between the Opposite Frequencies of the Multi-Resolution Features

by
Regina Lionnie
*,
Julpri Andika
and
Mudrik Alaydrus
Department of Electrical Engineering, Universitas Mercu Buana, Jl. Meruya Selatan No. 1, West Jakarta 11650, Indonesia
*
Author to whom correspondence should be addressed.
Algorithms 2024, 17(11), 529; https://doi.org/10.3390/a17110529
Submission received: 7 October 2024 / Revised: 1 November 2024 / Accepted: 15 November 2024 / Published: 17 November 2024
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract
This paper proposes a new approach to pixel-level fusion that combines opposite-frequency components from the discrete wavelet transform with the Gaussian or Difference of Gaussian. The low-frequency sub-band from the discrete wavelet transform was fused with the Difference of Gaussian, while the high-frequency sub-bands were fused with the Gaussian. The fused sub-bands were then reconstructed into one enhanced image using the inverse discrete wavelet transform. These enhanced images were utilized to improve recognition performance in the face recognition system. The proposed method was tested against benchmark face datasets such as The Database of Faces (AT&T), the Extended Yale B Face Dataset, the BeautyREC Face Dataset, and the FEI Face Dataset. The results showed that our proposed method was robust and accurate against challenges such as lighting conditions, facial expressions, head pose, 180-degree rotation of the face profile, dark images, acquisition with a time gap, and conditions where the person wears attributes such as glasses. The proposed method is comparable to state-of-the-art methods and generates high recognition performance (more than 99% accuracy).

1. Introduction

Face recognition works by matching facial features from the image under investigation with a collection of features known as a database [1]. With the rapid development of technology, the existing recognition performance based on machine learning has reached a certain level of maturity [2]. Due to the technology’s maturity, face recognition is being employed in various applications. One such use is the possibility of facial recognition as a payment option. Payment transactions can be completed using user facial authentication. The benefit of utilizing face recognition as a payment method is the ability to make payments without relying on a smartphone or any device when the user forgets to carry one or the phone is not working [3]. Facial recognition technology is safer than conventional payment methods, including inputting passwords to allow credit card or mobile payments [4].
A few obstacles must be overcome before face recognition can realize its full potential as a quick, safe breakthrough that facilitates payments. These challenges include head orientation, lighting conditions, facial expressions, aging, and hairstyles [5,6,7]. In this research, to overcome challenges and build a face recognition system that is robust and accurate, we propose to tackle these problems using combinations of methods based on image fusion. Image fusion is a technique that creates a single composite image by fusing several images from various sources while maintaining crucial details and improving overall quality. It has attracted much discussion in several areas, including computer vision, remote sensing, and medical imaging, where better analysis depends on integrating complementary data. Image fusion combines significant information from different sensors using various mathematical techniques to create a single composite image. This technique integrates complementary multi-temporal, multi-view, and multi-sensor data into one image with enhanced quality while preserving key features [8]. The fused image may provide more information than the source images. Due to these advantages, image fusion encompasses various tasks such as image enhancement, noise reduction, feature extraction, image matching, object recognition, image restoration, and tracking [9].
In face recognition, image fusion contributes to improving robustness or performance recognition. The approach of image fusion can be through a combination of visible and infrared images [10,11,12]; a fusion of multi-modal 2D and 3D face images [13,14,15]; a fusion of multi-modal biometric face and other biometric traits, e.g., face and ear [16]; face, iris, and fingerprint [17]; face, iris, and thumbprint [18]; or face and voice [19].
This research proposes an image fusion method using visible 2D face images. Several studies that previously investigated this approach are as follows. A study [20] built a facial recognition system using feature fusion based on facial features such as eyes, nose, and mouth and extracted global features using principal component analysis and local features using local binary patterns. The features were fused using a Laplacian pyramid and classified using an artificial neural network. The proposed model was tested against challenges such as variations in pose, illumination, expression, image resolution, and occlusion. The same features were further investigated and fused using multi-resolution singular value decomposition [21] and multi-resolution discrete cosine transform [22]. In [23], multi-resolution feature fusion that combined low- and high-resolution features was amplified using Gabor-feature hallucination. This proposed combination increased performance, especially in low-resolution face recognition. A study [24] proposed a hierarchy feature fusion on end-to-end layers in the convolutional neural network to tackle challenges such as illumination and occlusion. In [25], to improve recognition effectiveness, multi-feature fusion using 2D principal component analysis for extracting global features, and local binary pattern for extracting local features, was proposed. Feature fusion on a local binary pattern, elliptical local binary pattern, median binary pattern, and local phase quantization was proposed in [26] as improved local descriptors.
In [27], an age-invariant face recognition framework was built using feature fusion and decomposition. The authors presented methods of feature fusion based on recurrent neural network and linear feature fusion. The study [28] assembled a face recognition model based on a multi-scale feature fusion network using a multi-scale bilinear pooled convolutional neural network. In [29], the authors proposed estimated features of frontal face images from fused features based on a generative adversarial network to improve recognition of angled face images. An expression-based convolutional neural network was fused in the layer of multi-level feature fusion and transfer learning convolutional neural network [30] to tackle challenges such as facial expressions in face recognition. In [31], multi-modal biometrics of RGB data and texture descriptors from face and periocular were combined using a fusion strategy in the convolutional neural network. In [16], multi-modal biometrics of ear and face profiles represented by local phase quantization and local directional patterns, reduced by principal component analysis, were fused in a feature-level fusion to improve the overall recognition rate.
Looking deeper into image fusion, one strategy to find the best fuse features is, first, to decompose the source images using wavelet decomposition analysis. This process enables us to investigate source images with different scales and frequencies. The study [32] utilized weighted wavelet visual perception fusion. The study aimed to improve underwater images by correcting color and improving global and local contrast of the color-corrected images. The improved-contrast color-corrected images were then decomposed using the wavelet decomposition method. A study [33] investigated three wavelet fusion strategies, max-min, min-max, and mean-mean, to merge charge-coupled device red, green, and blue infrared images and depth-grayscale images. The experiment concluded that the min-max fusion strategy performed best with five levels of decomposition. In [34], feature fusion from the local binary pattern, Gabor, and local directionality pattern of eyebrows, eyes, mouth, and chin features was combined and classified using a support vector machine or artificial neural network. The study [35] proposed employing feature- and pixel-level fusion of wavelet sub-bands for face recognition. The feature-level fusion combined the results of the dimension-reduced approximation sub-band and the results from the pixel-level fusion of detail sub-bands. The best pixel-level fusion coefficients of detailed sub-bands were found using principal component and linear discriminant analyses. A study [36] used a weighted energy ratio to fuse wavelet packet decomposition images. The fused images were extracted using principal component analysis and the Fisher linear discriminant. A study [37] employed quaternion wavelet transform to decompose images into low-frequency and high-frequency coefficients. The low-frequency coefficients were fused using a weighted-average fusion rule based on phase, magnitude, and spatial variance. Meanwhile, the high-frequency coefficients were fused using the maximum fusion rule based on contrast and energy. The inverse quaternion wavelet transform was performed to obtain the final fused image.
Although accuracy and security have significantly increased due to advancements in facial recognition technology, integrating image fusion techniques poses a particular set of challenges. While image fusion results promise better information than the source images, they also introduce the complexity of determining what features or data are being fused. As mentioned, the image fusion approach can be made through a combination of different sensors, multi-modal biometric traits, and dimensional structures. The selected features to be fused need to be as complementary as possible. The overlap of unnecessary information should be minimized [38]. Moreover, it also raises the question of whether the image fusion can retain the visual information of the original image and how to tackle the effect of reduced contrast in the fused image [39].
We proposed using multi-resolution analysis based on the discrete wavelet transform (DWT) to decompose images into low-frequency and high-frequency sub-bands. At the same time, the images were filtered with a Gaussian filter to produce blurred images, i.e., a lower-frequency representation of the images. Then, the Difference of Gaussian was constructed by calculating the difference between Gaussian-blurred images with different sigmas, which resulted in a high-frequency representation of the images. The low- and high-frequency sub-bands from the DWT decomposition were then fused with the opposite-frequency components from the Gaussian and Difference of Gaussian, i.e., the low-frequency sub-band from the DWT was fused with the Difference of Gaussian result, and the high-frequency sub-bands from the DWT were fused with the Gaussian result. Because these opposite-frequency components come from the same image, the approach avoids the usual requirement of image fusion for inputs from several sources and satisfies the demand that the fused features be as complementary as possible.
The fusion methods employed Laplacian pyramid image fusion (LP-IF) or discrete wavelet transform image fusion (DWT-IF). The fusion rules were as follows: for LP-IF, averaging for the base low-pass coefficients and the highest absolute value for the fused high-pass coefficients; for DWT-IF, we found that the mean-mean fusion rule for both low- and high-frequency coefficients performed best. To our knowledge, no research has previously proposed combining image fusion of discrete wavelet transform decomposition sub-bands with Gaussian-filtered images and Difference of Gaussian coefficients. Using multi-resolution image fusion also helped us retain and improve the contrast of the original image. The low-frequency components carry most of an image's energy; they contain the slowly changing structure and the overall brightness of the image.
The fusion result was reconstructed as an enhanced image using inverse discrete wavelet transform. We used this approach to maintain and improve the information from the original image. A histogram of the oriented gradient was employed, and the extracted features were classified using a support vector machine. Our proposed method was tested against challenges in face recognition such as pose and lighting conditions, facial expressions, dark images, image acquisition variations like time gap and angle of the face profile, and attributes on faces like glasses and makeup. We showed that our proposed method produced high recognition accuracy and was robust against challenges such as lighting conditions, facial expressions, head pose, 180-degree rotation of the face profile, dark images, acquisition with time gap, and conditions where the person uses attributes such as glasses.
To be able to explain our research more clearly, we list several contributions from this paper:
  • We proposed a new combination of image fusion using opposite-frequency components from discrete wavelet transform sub-bands and Gaussian-filtered images or the Difference of Gaussian. The proposed method needs only a single source image to produce the opposite-frequency components, which removes the need for inputs from multiple sources and satisfies the requirement that the fused features be as complementary as possible.
  • We examined the effects of Laplacian pyramid image fusion and discrete wavelet transform image fusion to improve recognition performance.
  • We investigated the variation in parameters inside the methods, such as the wavelet family employed in multi-resolution analysis of discrete wavelet transform and the wavelet family and level of decomposition employed in discrete wavelet transform image fusion.
  • Due to the high potential of other combinations inside our proposed method, including pixel-level or feature-level fusion of features, we showed that our chosen combination outperformed other combinations’ recognition performance.
  • The classification methods were compared using a support vector machine with linear, quadratic, and cubic kernels; nearest neighbor; and neural network.
  • Our proposed method was tested against challenges from six face datasets (four datasets and two sub-datasets) and compared with other image fusion and non-fusion methods.
The rest of this paper is organized as follows: Section 2 displays the materials and methods in our research; Section 3 discusses the findings and compares our results against other methods; Section 4 concludes our research.

2. Materials and Methods

We explain the materials in this experiment in Section 2.1 and the proposed methods in Section 2.2.

2.1. Face Datasets

We employed several face datasets for this research:
  • The Database of Faces (AT&T) [40]
    The Database of Faces (AT&T Laboratories, Cambridge), formerly known as The ORL Database of Faces, consists of 400 face images from 40 people. The images are in grayscale, with each image having 92 × 112 pixels. The database has variations in lighting conditions and facial expressions, with some variations in glasses and no glasses. Several people were photographed at different times. The background of each image is dark, and the image was taken from the front with an upright position. Figure 1a displays examples of images from this database.
  • The BeautyREC Dataset (BeautyREC) [41]
    The BeautyREC Dataset (BeautyREC) is a face makeup transfer dataset containing 3000 color images from various people. It is worth noting that this dataset is imbalanced. Each image is 512 × 512 pixels; in this research, the images were resized to 100 × 100 pixels and converted to grayscale. The makeup transfer method is detailed in [41]. The makeup transfer dataset encompasses various makeup styles, face poses, and races. Figure 1b shows an example of images from this dataset.
  • The Extended Yale B Face dataset (EYB) [42,43]
    The Extended Yale B (EYB) Face dataset has 2414 frontal face images with variations in pose and lighting conditions. All 2414 images are from a total of 38 people. In this research, we employed only 58 images from each person due to the extremely dark images. Each image has a size of 168 × 192 pixels in grayscale. Figure 1c displays an example of images from this dataset.
  • The Extended Yale B Face dataset with dark images (EYB-Dark) [42,43]
    To challenge the method against dark images, we separated four images from each person in the EYB Face dataset and created a subset of EYB-Dark. The four images were taken randomly, but they needed to strictly abide by this rule: one under a slight illumination variation and three dark images. The total of this sub-dataset is 38 people × 4 images = 152 images. The size of each image is the same as the EYB Face dataset, and they are all in grayscale. Figure 1d shows an example of images from this dataset.
  • The FEI Face Database (FEI) [44]
    The FEI Face Database contains 2800 images from 200 people; each person has 14 images. Each image is a color image captured in an upright frontal position with a 180-degree rotation of the profile on a uniform white background. The scale may differ by approximately 10%; each image was originally 640 × 480 pixels. This database is a collection of face images of students and employees, aged 19 to 40, from the Artificial Intelligence Laboratory of FEI in São Bernardo do Campo, São Paulo, Brazil. All people have distinct appearances, hairstyles, and jewelry. The numbers of men and women are approximately equal. In this research, we resized the images to 320 × 240 pixels and converted them to grayscale. Figure 1e shows an example of images from FEI.
  • The FEI Face Database (FEI) Subset Frontal and Expression (FEI-FE) [44]
    The FEI Face Database has a subset that contains only two images per person, 400 images in total, consisting of one frontal face image with a neutral (non-smiling) expression and one frontal face image with a smiling expression. In this research, we resized the images to 180 × 130 pixels and converted them to grayscale, as sketched in the code example after this list. Figure 1f shows an example of images from FEI-FE.
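As a concrete illustration of the preprocessing described above (resizing and grayscale conversion), the following Python sketch loads one dataset. The directory layout, file extension, and use of PIL/NumPy are assumptions for illustration only, since the paper's experiments were run in MATLAB.

```python
# Minimal preprocessing sketch, assuming one sub-folder per person and JPEG files;
# the paper resizes, e.g., FEI images to 320 x 240 and converts them to grayscale.
from pathlib import Path

import numpy as np
from PIL import Image

def load_grayscale(path: str, size: tuple) -> np.ndarray:
    """Load an image, convert it to 8-bit grayscale, and resize it to (width, height)."""
    img = Image.open(path).convert("L")            # "L" = grayscale
    img = img.resize(size, Image.BILINEAR)         # e.g., (320, 240) for FEI
    return np.asarray(img, dtype=np.float64) / 255.0

dataset_dir = Path("datasets/FEI")                 # hypothetical path
images, labels = [], []
for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
    for file in sorted(class_dir.glob("*.jpg")):   # hypothetical file extension
        images.append(load_grayscale(str(file), (320, 240)))
        labels.append(class_dir.name)              # one label (person ID) per image
```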

2.2. Methods

2.2.1. Proposed Design

Our proposed method can be seen in the flowchart design in Figure 2. The method consisted of the discrete wavelet transform (DWT); Gaussian filtering and the Difference of Gaussian; image fusion of the DWT outputs with those of Gaussian filtering and the Difference of Gaussian; the inverse discrete wavelet transform (IDWT) applied to the fusion output to create an enhanced reconstructed image; and the histogram of oriented gradient (HoG), whose output was finally classified using machine learning algorithms. Each of these methods is explained below.

2.2.2. Multi-Resolution Analysis with Discrete Wavelet Transform (MRA-DWT)

The first step of our proposed method was to transform the input image with the discrete wavelet transform (DWT). Multi-resolution analysis with DWT (MRA-DWT) serves this purpose because it can break down a signal into several components at different degrees of detail. Wavelets, small waves with varying scales and positions, break a signal down into its constituent elements. Multiple resolutions were used in this decomposition process to capture both the high-frequency and low-frequency parts.
Two-dimensional DWT transformed an image (I) by using the procedure of decomposition and filtering using low-pass filters and high-pass filters twice, once along the column and once along the row, resulting in four different sub-bands: approximation (A), horizontal (H), vertical (V) and diagonal (D) [45]. The four sub-band results can be seen in Figure 3, while the MRA-DWT process is represented in the blue block in Figure 2. It is important to note that the ↓2 symbol in Figure 2 means down-sample columns (keeping the even-indexed columns) after filtering along rows and down-sample rows (keeping the even-indexed rows) after filtering along columns.
The scaling function determines the low-pass filter. It has a non-zero average, thus enabling it to represent the smooth or average components of the signal. The wavelet function determines the high-pass filter. It has a zero average for detecting changes in the image. The scaling function ϕ(t) and wavelet function ψ(t) are expressed in Equations (1) and (2), respectively, where h(k) and g(k) are the scaling and wavelet coefficients [46].
$\phi(t) = \sum_{k=-\infty}^{\infty} h(k)\,\phi(2t - k)$ (1)
$\psi(t) = \sum_{k=-\infty}^{\infty} g(k)\,\phi(2t - k)$ (2)
The scaling and wavelet coefficients depend on the wavelet family. In this research, we examined the MRA-DWT wavelet family using Haar (haar), Daubechies (db2), Symlet (sym2), and biorthogonal wavelet (bior 1.3, bior 2.2, bior 2.6, bior 3.3, and bior 3.7) with one level of decomposition. For example, Figure 4 illustrates the scaling and wavelet functions from the wavelet Haar, and Equations (3) and (4) express the Haar scaling and wavelet functions, respectively.
$\phi(t) = \begin{cases} 1, & 0 \le t < 1,\\ 0, & \text{otherwise}. \end{cases}$ (3)
$\psi(t) = \begin{cases} 1, & 0 \le t < \frac{1}{2},\\ -1, & \frac{1}{2} \le t < 1,\\ 0, & \text{otherwise}. \end{cases}$ (4)
The approximation sub-band captures the most energy or information of an image, while the horizontal, vertical, and diagonal sub-bands capture finer details and sharp changes in an image. The approximation sub-band results from two low-pass filters, hence consisting of low-frequency components of the original image. The low-frequency component displays slow changes in the area of the image, and it is where the most energy of the data is located. The horizontal, vertical, and diagonal sub-bands are the results of at least one high-frequency filter, which is why these sub-bands are high-frequency components of the original image. The high-frequency components express small and fast changes in the image and extract details. Figure 3(left) shows the low-frequency component or the approximation sub-band, where we can see the representation of the blurred version of the original image due to two low-pass filters. The other parts of Figure 3 show the high-frequency sub-bands where only the changes and details in the horizontal, vertical, and diagonal directions are shown. Figure 3 shows that the horizontal and vertical sub-bands already represent the details of the original image even though they were filtered by a high-pass filter only once.
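To make the decomposition step concrete, the sketch below performs a single-level 2D DWT with PyWavelets; it is an illustrative Python equivalent of the MATLAB processing described in the paper, not the authors' code.

```python
# Single-level 2D DWT (MRA-DWT) sketch: one image in, four sub-bands (A, H, V, D) out.
import numpy as np
import pywt

def mra_dwt(image: np.ndarray, wavelet: str = "haar"):
    """Return the approximation (A) and detail (H, V, D) sub-bands of a 2D image."""
    A, (H, V, D) = pywt.dwt2(image, wavelet)
    return A, H, V, D

# Example with a random array standing in for a face image (AT&T images are 92 x 112).
img = np.random.rand(112, 92)
A, H, V, D = mra_dwt(img, wavelet="haar")
print(A.shape, H.shape, V.shape, D.shape)   # (56, 46) for each sub-band with haar
```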

2.2.3. Gaussian Filtering and the Difference of Gaussian

The second step was to filter the same input image used in MRA-DWT with a Gaussian filter and to create a Gaussian-filtered image and a Difference of Gaussian. Filtering with a Gaussian produces an output (G0) that is a blurred version of the original input image (I), as in Equation (5).
$G_0 = I * \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$ (5)
The value of σ was 2.7713, as in [47] (p. 245) and [48]. Here, x and y are the distances from the center point in the x and y directions, and the * symbol denotes convolution. The next step was to decimate G0 along both rows and columns by a factor of two, so that the Gaussian-filtered features (G1_1 and G1_2), the Difference of Gaussian feature (L), and the multi-resolution sub-bands from MRA-DWT (A, H, V, D) all have the same size. After the decimation, Gaussian filtering was applied once more to G0 with two sigmas, σ1 and σ2, producing G1_1 and G1_2, respectively. The value of σ1 was 1.2263 and that of σ2 was 1.9725, as in [47] (p. 245) and [48]. The kernel size of each Gaussian filter depends on its sigma: kernel size = 2⌈2σ⌉ + 1, where ⌈·⌉ denotes rounding up (toward positive infinity). The kernel size for Equation (5) was therefore 13, that for Equation (6) was 7, and that for Equation (7) was 9.
$G_{1\_1} = G_0 * \frac{1}{2\pi\sigma_1^2} e^{-\frac{x^2 + y^2}{2\sigma_1^2}}$ (6)
$G_{1\_2} = G_0 * \frac{1}{2\pi\sigma_2^2} e^{-\frac{x^2 + y^2}{2\sigma_2^2}}$ (7)
It is worth noting that the Laplacian of Gaussian can be approximated by using the difference of two Gaussians, following the rule in [47]. One of the popular methods that employed the Difference of Gaussian to approximate the Laplacian of Gaussian is SIFT (scale-invariant feature transform), by [49]. A scale-normalized Laplacian of Gaussian (LoG) was calculated in (8) and (9) [47] (pp. 235–236).
$\mathrm{LoG} = \hat{\lambda}\,(G_{1\_1} - G_{1\_2})$ (8)
$\hat{\lambda} = 2\kappa^2(\kappa^2 - 1)$ (9)
The Difference of Gaussians (L) was used to approximate the Laplacian of Gaussian, as in Equation (10).
$\mathrm{LoG} = \hat{\lambda}\,(G_{1\_1} - G_{1\_2}) \approx L$ (10)
The factor λ̂ could be omitted from (10) because κ was a fixed-scale ratio between two Gaussians at successive scale-space levels (i.e., G1_1 and G1_2) [47] (pp. 240, 242). In the end, the Difference of Gaussian (L) was calculated as the difference between two Gaussians at adjacent scale levels, as in Equation (11).
$L = G_{1\_1} - G_{1\_2}$ (11)
Gaussian filtering and the Difference of Gaussian are represented in the green block in Figure 2. The results of the Gaussian-filtered features and Difference of Gaussian features can be seen in Figure 5. Gaussian filtering blurs the input image, so we consider the Gaussian-filtered image to be a low-frequency component of the original image. The Difference of Gaussian emphasizes edges and brings out other details of the original image; these edges and details are considered the high-frequency component of the original image.
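The Gaussian/Difference-of-Gaussian branch can be sketched as follows, using OpenCV's GaussianBlur with the sigma values and kernel-size rule quoted above; this is a hedged Python illustration, not the paper's MATLAB implementation.

```python
# Gaussian filtering and Difference of Gaussian sketch; kernel size = 2*ceil(2*sigma) + 1.
import math

import cv2
import numpy as np

def gaussian_and_dog(image, sigma0=2.7713, sigma1=1.2263, sigma2=1.9725):
    def ksize(sigma):
        return 2 * math.ceil(2 * sigma) + 1          # 13, 7, 9 for the sigmas above

    k0 = ksize(sigma0)
    G0 = cv2.GaussianBlur(image, (k0, k0), sigma0)   # Equation (5): blurred input
    G0 = G0[::2, ::2]                                # decimate by two to match DWT sub-band size
    k1, k2 = ksize(sigma1), ksize(sigma2)
    G1_1 = cv2.GaussianBlur(G0, (k1, k1), sigma1)    # Equation (6)
    G1_2 = cv2.GaussianBlur(G0, (k2, k2), sigma2)    # Equation (7)
    L = G1_1 - G1_2                                  # Equation (11): Difference of Gaussian
    return G1_1, L

img = np.random.rand(112, 92).astype(np.float32)
G1_1, L = gaussian_and_dog(img)
print(G1_1.shape, L.shape)                           # (56, 46), same size as the DWT sub-bands
```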

2.2.4. Image Fusion

Image fusion is a technique in computer vision and image processing that merges multiple images into a single composite image that is more informative and suitable for further processing tasks. In this research, the main objective of image fusion was to integrate opposite-frequency components from MRA-DWT and Gaussian filtering or Difference of Gaussian to create new features that had better robustness against facial recognition system challenges than any original image.
In general, according to the data integration phase, there are three types of image fusion: pixel-level fusion, feature-level fusion, and decision-level fusion [50]. In pixel-level fusion, the merging is directly performed on the input images. The essential features are extracted first from the image in feature-level fusion. Later, the process combines these features. Different from pixel-level and feature-level fusion, in decision-level fusion, the image analyses are performed separately until the final process to produce decisions, and these decisions are combined to create the final verdict.
In this research, we compared two different methods of image fusion, i.e., Laplacian pyramid image fusion (LP-IF) and discrete wavelet transform image fusion (DWT-IF). Both methods come from the transform domain image fusion [51]. The LP-IF and DWT/IDWT-IF processes are shown in the red block in Figure 2.
  • Laplacian pyramid image fusion (LP-IF)
The Laplacian pyramid was first proposed by Burt and Adelson [52] and was later used to merge images [53]. LP-IF builds a pyramid-like structure for the two source images using Gaussian filtering with decimation by a factor of two, repeated until the desired level l is reached. The Laplacian at level l is calculated by subtracting, from the Gaussian-filtered image at level l, the Gaussian-filtered image at level l + 1 after it has been up-sampled (un-decimated) by a factor of two. Following the popular fusion rule, we chose averaging for the base low-pass coefficients and the highest absolute value for the fused high-pass coefficients. The final fused image is obtained through the inverse pyramid transformation. We used level l = 4. All LP-IF code in this research comes from the image fusion toolbox in [54].
  • Discrete wavelet transform image fusion (DWT/IDWT-IF)
Image fusion by DWT (DWT/IDWT-IF) was first introduced by [55] with a multi-resolution decomposition that shows the frequency and location where the frequency occurs. Like LP-IF, DWT-IF creates a multi-scale representation of the source images, producing four different sub-bands for each image: A, H, V, and D (as explained in Section 2.2.2). The four sub-bands from each source image were fused using max-min, min-max, and mean-mean rules. The max-min rule indicates that the fusion takes the maximum absolute approximation value and minimum absolute value for details. In contrast, the min-max rule is the opposite of the max-min (minimum absolute value for approximation and maximum absolute value for details). The mean-mean rule means we take the mean value for approximation and details [33]. The final fused image is obtained through inverse DWT using the combined coefficients with different rules.
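The following sketch illustrates DWT-based image fusion with the three rules described above (mean-mean, max-min, min-max); it is a hedged Python approximation of the procedure rather than the authors' MATLAB code, and both inputs are assumed to have the same size.

```python
# DWT/IDWT image fusion sketch: decompose both sources, fuse coefficients by rule,
# then reconstruct the fused image with the inverse transform.
import numpy as np
import pywt

def dwt_if(img1, img2, wavelet="db2", level=5, rule="mean-mean"):
    c1 = pywt.wavedec2(img1, wavelet, level=level)
    c2 = pywt.wavedec2(img2, wavelet, level=level)

    def fuse_approx(a, b):                                    # rule for the approximation band
        if rule == "mean-mean":
            return (a + b) / 2.0
        if rule == "max-min":
            return np.where(np.abs(a) >= np.abs(b), a, b)     # max absolute value
        return np.where(np.abs(a) <= np.abs(b), a, b)         # min-max: min absolute value

    def fuse_detail(a, b):                                    # rule for the detail bands
        if rule == "mean-mean":
            return (a + b) / 2.0
        if rule == "max-min":
            return np.where(np.abs(a) <= np.abs(b), a, b)     # min absolute value
        return np.where(np.abs(a) >= np.abs(b), a, b)         # min-max: max absolute value

    fused = [fuse_approx(c1[0], c2[0])]
    for (h1, v1, d1), (h2, v2, d2) in zip(c1[1:], c2[1:]):
        fused.append((fuse_detail(h1, h2), fuse_detail(v1, v2), fuse_detail(d1, d2)))
    return pywt.waverec2(fused, wavelet)
```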
After we acquired the four sub-bands from MRA-DWT (A, H, V, and D), the Gaussian-filtered image (G1_1) from Equation (6), and the Difference of Gaussian (L) from Equation (11), we fused these coefficients to produce an enhanced image. The fusion combined coefficients of opposite frequencies, i.e., the low-pass coefficient (which lacks a high-frequency component) was combined with the high-pass coefficient (which lacks a low-frequency component), and vice versa. The idea was to combine the opposite frequencies so that they complement each other. The approximation sub-band from MRA-DWT (A) consists only of slowly changing components, so we combined it with the Difference of Gaussian (L), which carries edge details. The detail sub-bands from MRA-DWT (H, V, and D) represent small and fast changes, so we combined them with the Gaussian-filtered image (G1_1), which mainly retains the slowly changing content of the original image. In addition, to support this idea, we ran an experiment combining components of the same frequency, but the results were unsatisfactory. Figure 6 displays the results of the proposed image fusion.
The fusion rules in our proposed method were as follows (Figure 2):
  • Fused A (a low-frequency component) with L (a high-frequency component), resulting in AL;
  • Fused H (a high-frequency component) with G1_1 (a low-frequency component), resulting in HG;
  • Fused V (a high-frequency component) with G1_1 (a low-frequency component), resulting in VG;
  • Fused D (a high-frequency component) with G1_1 (a low-frequency component), resulting in DG.
The image fusion rules were as follows (a code sketch illustrating the full fusion scheme follows this list):
  • LP-IF: averaging for the base low-pass and highest absolute value for the fused high-pass coefficient;
  • DWT/IDWT-IF: max-min, min-max, and mean-mean rules with variations in wavelet family and level of decomposition.
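Putting the branches together, the sketch below pairs each MRA-DWT sub-band with its opposite-frequency Gaussian/DoG feature using the mean-mean DWT/IDWT-IF rule. It reuses the mra_dwt, gaussian_and_dog, and dwt_if helpers from the earlier sketches, and the haar wavelet is assumed so that the sub-band and Gaussian feature sizes match exactly.

```python
# Opposite-frequency fusion sketch producing the four fused coefficients AL, HG, VG, DG.
import numpy as np

img = np.random.rand(112, 92).astype(np.float32)

A, H, V, D = mra_dwt(img, wavelet="haar")   # A: low frequency; H, V, D: high frequency
G1_1, L = gaussian_and_dog(img)             # G1_1: low frequency; L: high frequency

def fuse(x, y):
    # PyWavelets may warn that five levels is high for such small sub-bands;
    # the fusion still runs, with stronger boundary effects.
    return dwt_if(x, y, wavelet="db2", level=5, rule="mean-mean")

AL = fuse(A, L)       # low-frequency sub-band  + high-frequency Difference of Gaussian
HG = fuse(H, G1_1)    # high-frequency sub-band + low-frequency Gaussian
VG = fuse(V, G1_1)
DG = fuse(D, G1_1)
```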

2.2.5. Inverse Discrete Wavelet Transform for Multi-Resolution Analysis (MRA-IDWT)

To reconstruct an enhanced image from the fusion results, we took the four fused coefficients, AL, HG, VG, and DG, and reconstructed them using a single-level inverse discrete wavelet transform (MRA-IDWT). The same wavelet used for the MRA-DWT decomposition was used for the MRA-IDWT reconstruction. The reconstruction process started by up-sampling the fused coefficients (AL, HG, VG, and DG), inserting zeros between the samples at odd-indexed columns/rows to increase the image's resolution in both dimensions. The corresponding synthesis filters, matching those employed in the DWT, were then applied to these up-sampled coefficients. The inverse wavelet transform was first applied along the columns of the up-sampled coefficients to rebuild the intermediate row data, merging AL with HG and VG with DG. The procedure was then applied along the rows of the intermediate data to completely recreate the enhanced image.
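A minimal sketch of this reconstruction step, assuming the AL, HG, VG, and DG arrays from the previous sketch and the same haar wavelet used for the forward MRA-DWT:

```python
# Single-level inverse 2D DWT: reassemble the four fused coefficients into one image.
import pywt

enhanced = pywt.idwt2((AL, (HG, VG, DG)), "haar")
print(enhanced.shape)   # approximately the original image size, e.g. (112, 92)
```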

2.2.6. Histogram of Oriented Gradient (HoG)

The histogram of oriented gradient (HoG) [56] is a feature descriptor used in image processing and computer vision for object detection and recognition. The technique has become popular due to its robustness and effectiveness in obtaining local shape information while being comparatively unaffected by global variations in position and illumination [57].
The gradient of an enhanced reconstructed image from the previous steps was computed first. The gradients show the edges and contours of the face in the image. Then, the image was divided into small, connected regions called cells. For each cell, a histogram of gradient orientations was created, where the gradient orientation was binned, and the magnitude of the gradient was used to vote for the corresponding orientation bin. The HoG descriptor included a step where groups of cells, known as blocks, were normalized. This normalization process involved adjusting the histogram values to account for changes in lighting conditions, enhancing the descriptor’s invariance to factors such as illumination and contrast. The block normalization was performed by computing the L2 norm of the histogram vector within the block. The final HoG descriptor was a concatenated vector of the normalized histograms from all blocks within the detection window.
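As an illustration, HoG features can be extracted from the enhanced reconstructed image as below, here with scikit-image; the cell and block sizes are illustrative defaults, since the paper does not report its exact HoG parameters.

```python
# HoG descriptor sketch: gradients -> per-cell orientation histograms -> L2 block
# normalization -> one concatenated feature vector per image.
from skimage.feature import hog

features = hog(enhanced,                 # enhanced image from the previous sketch
               orientations=9,
               pixels_per_cell=(8, 8),   # illustrative cell size
               cells_per_block=(2, 2),   # illustrative block size
               block_norm="L2",          # L2 block normalization, as described above
               feature_vector=True)      # flattened descriptor
print(features.shape)
```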

2.2.7. Classification and Experiment Setup

The vector of HoG descriptor from each enhanced reconstructed image in the face dataset was reshaped into one row (one image for one row). Then, we combined them into one big matrix, where each row represented features from each image. In the end, we had a big matrix with the size of m × n, where m indicates the number of images inside the dataset and n indicates the size of the feature vector. Afterwards, we trained them using 1-nearest neighbor (1-NN); support vector machine (SVM) with linear, quadratic, and cubic kernels; and neural network with ReLU (NN-ReLU) as classification methods in our machine learning simulations. It is worth noting that in this research, individual people in all datasets represented a class. For example, the Database of Faces (AT&T) consisted of images from 40 people, so we had 40 classes for that dataset. We employed 5-fold cross-validation that divided testing and training data into 20:80 ratios to prevent overfitting. Finally, the average recognition accuracy from each simulation was calculated and compared.
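A hedged scikit-learn sketch of this classification setup is shown below (the paper's classifiers were run in MATLAB). Here, hog_features is assumed to be a list with one HoG vector per image (e.g., built by applying the previous sketch to every enhanced image), and labels holds the matching person IDs.

```python
# Classification sketch: build the m x n feature matrix, then evaluate a quadratic-kernel
# SVM with 5-fold cross-validation (approximately the 20:80 test/train split described above).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.vstack([f.reshape(1, -1) for f in hog_features])   # one row per image
y = np.asarray(labels)                                     # one class (person) per image

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))  # quadratic kernel
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```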
In this research, due to the high potential of other combinations inside our proposed method, including pixel-level or feature-level fusion of features, we examined several possibilities (derived from the flowchart design in Figure 2) to find the best recognition performance. These possibilities are shown in Figures S1 and S2. The first possibility (Figure S1) was to let one of the four features AL, HG, VG, and DG go directly into the next step of HoG and classification. The second possibility (Figure S2) was to fuse the four features AL, HG, VG, and DG into one concatenated feature vector. Table 1 shows the overall experiment designs from our proposed method in Figure 2 and the possibilities from Figures S1 and S2.
It is worth noting that in this research, for some experiments, we might have had a maximum of two pairs of DWT/IDWT with different purposes:
  • Multi-resolution analysis with discrete wavelet transform, to divide the input image into four different sub-bands, and inverse discrete wavelet transform, to produce an enhanced reconstructed image after fusion (MRA-DWT/IDWT);
  • The discrete wavelet transform and inverse discrete wavelet transform as a method of image fusion (DWT/IDWT-IF).
We compared using Haar (haar), Daubechies (db2), Symlet (sym2), and biorthogonal wavelet (bior 1.3, bior 2.2, bior 2.6, bior 3.3, and bior 3.7) families with one level of decomposition for all MRA-DWT/IDWT. We also compared wavelet families (haar, db2, sym2, and bior 2.6/bior 3.3) and decomposition levels (1, 3, 5, and 7) for DWT/IDWT-IF.
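The parameter comparison described above amounts to a simple grid search; the sketch below shows its structure, where run_pipeline is a hypothetical stand-in for the full chain (MRA-DWT, fusion, MRA-IDWT, HoG, classification) that returns a cross-validated accuracy.

```python
# Grid of wavelet families and decomposition levels compared in the experiments
# (PyWavelets naming, e.g. "bior1.3" for bior 1.3).
mra_wavelets = ["haar", "db2", "sym2", "bior1.3", "bior2.2", "bior2.6", "bior3.3", "bior3.7"]
fusion_wavelets = ["haar", "db2", "sym2", "bior3.3"]
fusion_levels = [1, 3, 5, 7]

results = {}
for mra_w in mra_wavelets:
    for fus_w in fusion_wavelets:
        for level in fusion_levels:
            # run_pipeline is hypothetical and must be supplied by the reader.
            results[(mra_w, fus_w, level)] = run_pipeline(mra_w, fus_w, level)

best = max(results, key=results.get)
print("best setting:", best, "accuracy:", results[best])
```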
In this research, four different face datasets with two subsets (mentioned in Section 2.1) and their variations were the challenges to be tested with our proposed method. Table 2 displays the list of face datasets with their challenges.
All simulations in this research were performed using Matlab 2024a (Update 5) with 16 GB installed RAM and Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz 2.90 GHz.

3. Results and Discussion

We display the results of our research regarding the effect of our proposed method against variations and challenges in the datasets.

3.1. Results for AT&T Face Dataset

3.1.1. Results for All Experiment Designs

First, we explored the results of the six experiment designs from Table 1 against the AT&T Face Dataset. The AT&T Face Dataset has variations such as lighting conditions, facial expressions, face attributes using glasses and no glasses, and image acquisition with a time gap. Table 3 shows the recognition performance of accuracy results using 1-nearest neighbor (1-NN); support vector machine (SVM) with linear, quadratic, and cubic kernels; and neural network with ReLU (NN-ReLU) classification.
We can see from Table 3 that our proposed method (Exp. 5) produced the highest accuracy (99.2%) among the variations. The options of using only AL, HG, VG, and DG fusion features (Exp. 1 and 3) or concatenating the four features (Exp. 2 and 4) did not produce better results compared with the system using the one enhanced reconstructed image (Exp. 5 and 6). Although they did not perform the best, we learned that Exp. 2 and 4 generally produced results as high as the best accuracies from four features from Exp. 1 and 3 (AL/HG/VG/DG).
In terms of image fusion methods, LP-IF (Exp. 1 and 2) and DWT-IF (Exp. 3 and 4), Table 3 shows that both image fusion methods achieved accuracies higher than 97% (SVM) with LP-IF producing the best. In terms of classification method, overall, the SVM yielded better results compared with 1-NN and NN-ReLU.
In a more detailed analysis, Table 4 displays the confusion matrix performance for Exp. 5 and Exp. 6, while the confusion matrix chart can be seen in Figures S3 and S4. Table 4 and Figures S3 and S4 show a balanced performance between accuracy, precision, recall, and F1-Score. Our proposed methods (Exp. 5 and 6) were likely reliable and accurate at discriminating across classes. They generalized well throughout the dataset and did not overfit any one class or subset of the data.
Table 4 also shows the results if there were only five images per individual/class, and the results were compared with using full (10) images per individual/class. There was a decrease in accuracy of approximately 3.75% to 5.5% if our proposed method employed only five images per class.
We calculated the processing time for our proposed methods, Exp. 5 and 6, against the AT&T Face Dataset. These calculations were applied to Exp. 5 and Exp. 6 with variations in the level of decomposition in DWT/IDWT-IF. Figure 7 displays the results of the processing time in seconds. T1 is the time taken from the input to the creation of the one big matrix (before classification), T2 is the time from the input to the accuracy output using SVM quadratic, and T3 is the time from the input to the accuracy output using SVM cubic. We observed a significant time difference, approximately 87 s, between T1 for Exp. 5 and Exp. 6: DWT/IDWT-IF required much more time than LP-IF. We were not surprised to find an increase in the processing time from Exp. 6a to Exp. 6d; this increase is due to the rise in the total decomposition levels, since the higher the decomposition level in DWT/IDWT-IF, the more time it takes. We also calculated the inference time, i.e., the time required for one input image to be processed by our simulation until the system produced the output (prediction). The average inference time was 0.0093 s.

3.1.2. Results for Different Wavelet Families

The experiments in Table 3 used the default settings of haar with one level of decomposition for MRA-DWT/IDWT, db2 with five levels of decomposition for DWT/IDWT-IF, and the mean-mean rule of image fusion. The following investigation shows the results of varying the wavelet family for MRA-DWT/IDWT. It is worth noting that different wavelet filters in the wavelet family for MRA-DWT/IDWT have effects both when the process decomposes an image into sub-bands and when it reconstructs the features from image fusion into one enhanced reconstructed image.
Different wavelet families for MRA-DWT/IDWT show different characteristics of low- and high-pass filters. The explanation is as follows [58]: wavelet Haar, with only one vanishing moment (haar), is not smooth and has anti-symmetry; Daubechies wavelet, with only two vanishing moments (db2) for the wavelet function, is not considered smooth and does not have symmetry characteristics; Symlet, with two vanishing moments (sym2), is also not considered smooth and has near symmetry characteristics; biorthogonal (bior nr.nd) has exact symmetry and is considered smooth only if the wavelet order is large. In the abbreviation, nr is the number of vanishing moments inside the reconstruction filter, and nd is the number of vanishing moments inside the decomposition filter. In this research, we employed bior 1.3, bior 2.2, bior 2.6, bior 3.3, and bior 3.7.
Figure 8a and Figure 8b display the results of using haar, db2, sym2, bior 1.3, bior 2.2, bior 2.6, bior 3.3, and bior 3.7 with one level of decomposition against the classification methods for Exp. 5 and 6, respectively. Figure 8a shows that the haar wavelet yielded the best result, followed by bior 1.3 and bior 2.2. Figure 8b shows that bior 2.6 and bior 3.3 performed better. They improved the accuracy results by 0.8% (from 98% using haar to 98.8% using bior 2.6/bior 3.3).
In the subsequent investigation, we examined the effect of variations inside DWT/IDWT-IF from Exp. 6. Different from MRA-DWT/IDWT, various wavelet families inside DWT/IDWT-IF have the effects of decomposing input features and reconstructing them into fused features for image fusion purposes.
To investigate DWT/IDWT-IF, we employed bior 3.3 for MRA-DWT/IDWT since it yielded the best results from the previous investigation. First, we checked using the db2 wavelet for one, three, five, and seven levels of decomposition. The peak of the best accuracy results came from five levels of decomposition. Figure 9 shows the results of this investigation. Knowing that five levels of decomposition performed best, we then varied the wavelet family for DWT/IDWT-IF, using haar, db2, sym2, and bior 3.3. The results in Figure 10 show that db2 performed the best among others for DWT/IDWT-IF if paired with five levels of decomposition.
From these wavelet filter variations, we can summarize the results for the AT&T Face Dataset: using haar with one level of decomposition for MRA-DWT/IDWT yielded 99.2% accuracy for Experiment 5, while using bior 3.3 with one level of decomposition for MRA-DWT/IDWT and db2 with five levels of decomposition for DWT/IDWT-IF yielded 98.8% accuracy for Experiment 6. As mentioned, the AT&T Face Dataset has variations such as lighting conditions, facial expressions, facial attributes (glasses and no glasses), and image acquisition with a time gap. Our proposed method produced high recognition accuracy against these challenges inside the AT&T Face Dataset.

3.2. Results for Other Face Datasets

3.2.1. Results for EYB and EYB-Dark Face Datasets

The following investigation evaluated our proposed method on the EYB Face Dataset, which has variations in pose and lighting conditions. Of the six experiment designs, we investigated only Exp. 2, 4, 5, and 6, since the previous experiments showed that using concatenated vectors (Exp. 2 and 4) generally performed as well as the best accuracy from each individual fused coefficient (Exp. 1 and 3).
We examined Exp. 2, 4, 5, and 6 using only SVM for classification, since SVM produced the best results in the previous experiments. Figure 11 shows the accuracy results of these settings. The results from Exp. 2, 4, and 5 produced almost the same highest value; the best recognition accuracy was 99.8%.
To challenge our proposed method further against lighting conditions, we evaluated it against the EYB-Dark Face Dataset. Figure 1d shows that this subset of the EYB Face Dataset has only four images for each individual, where one image is considered to use better lighting, and three other images are dark.
The same experiment was operated for the EYB-Dark Face Dataset, i.e., Exp. 2, 4, 5, and 6. Figure 12 shows the results of recognition performance. Overall, the accuracy was not as high as the EYB Face Dataset and was reduced by approximately 3.7–43.9%. This decrement is understandable, since the EYB-Dark has only four images per person, three of which are dark (Figure 1d). While Exp. 2 and 4 failed to produce high accuracy, our proposed method, Exp. 5 and 6, generated better results with the highest performance, achieving 93.4%.
To further investigate and generate better results, we examined the fusion rule of mean-mean, min-max, and max-min on the EYB-Dark Face Dataset for Experiment 6. Figure 13 displays the results and shows that the mean-mean rule produced the best result between the three different fusion rules. Our results showed different outcomes than those of the study in [33], which found that using the min-max fusion rules provided the best recognition results for hand gesture intention.
Figure 14 shows the results of DWT/IDWT-IF using each fusion rule and illustrates why the mean-mean rule performed best, followed by the min-max rule. The mean-mean rule made details previously obscured by darkness become visible. While the min-max rule also brought out obscured details, the result was not as informative as with the mean-mean rule.
Moreover, we varied the wavelet family for MRA-DWT/IDWT. The results are shown in Figure 15. This experiment found a better wavelet filter for this challenge inside the EYB-Dark Face Dataset. Using the sym2 wavelet improved the recognition by 2.7%, achieving an accuracy of 96.1%.
Correspondingly, we investigated the effect of the level of decomposition and various wavelet families in DWT/IDWT-IF for Experiment 6 with the same settings as in Figure 9 and Figure 10 for the AT&T Face Dataset. The same conclusion was achieved, which was that using five levels of decomposition produced the best results. These results can be seen in Figure S5.
While using the five levels of decomposition and sym2 wavelet for MRA-DWT/IDWT, we varied the wavelet family for DWT/IDWT-IF for Experiment 6, i.e., haar, sym2, db2, and bior 3.3. The best results (see Figure S6) were also produced using the db2 wavelet. The same outcome was previously obtained from the AT&T Face Dataset.
Against the challenges of dark images, our proposed method was proven to produce a high result of 96.1% accuracy. This result can be compared against the study in [59] using the same subset of the dataset, producing an accuracy of 95.4% but with the help of the contrast-limited adaptive histogram equalization (CLAHE) method inside the recognition system. Our proposed method performed better compared with the system using a contrast adjustment method.

3.2.2. Results for BeautyREC Face Dataset

Moving to the next dataset, the BeautyREC Face Dataset was examined to investigate our proposed method against the challenge of face images with makeup. It is worth noting that the BeautyREC Face Dataset was created by [41] by transferring the makeup style of a reference image to the corresponding components (skin, lips, eyes) of a source image. The dataset contains diverse variations in face pose and race.
First, we tested our proposed method (Exp. 5 and 6) against the BeautyREC Face Dataset to observe the effect of the total number of images. Figure 16 shows the recognition accuracy results for our proposed method. We noticed that using all 3000 images caused a slight decrease in accuracy. Overall, the method did not yet produce satisfactory results for this dataset: the best result was 46% accuracy using only 1820 images and 44.4% using all images.
Next, we observed thirteen variations in our experiments (Figure 17) to find the most suitable setting and parameters to increase recognition performance. We varied the wavelet family for MRA-DWT/IDWT for both fusion with LP-IF (Exp. 5a–Exp. 5e) and DWT/IDWT-IF (Exp. 6a–Exp. 6e). The other wavelet families for MRA-DWT/IDWT (sym2, db2, bior2.6, and bior 3.3) did not improve the accuracy. Using haar for MRA-DWT/IDWT and db2 for DWT/IDWT-IF, we examined the effect of decomposition levels (Exp. 6f–Exp. 6h). Unfortunately, the variations in the level of decomposition also did not improve the results. We then investigated the wavelet family for DWT/IDWT-IF with haar for MRA-DWT/IDWT and five levels of decomposition (Exp. 6i–Exp. 6k). We found that db2 still produced better results. Last, we studied the fusion rules using min-max and max-min (Exp. 6l and Exp. 6m), yet still found that using the mean-mean rule was a better choice.
Although the purpose of [41] in creating BeautyREC was to produce a robust makeup transfer dataset and not to test face recognition systems, we still wanted to challenge our proposed method against this case. Regrettably, our proposed method reached only 44.4% accuracy on this dataset. The most probable reasons are the high variation in head pose and makeup style (see Figure 18), the non-uniform background, and the imbalance of the BeautyREC Face Dataset.
To examine the effect of using an imbalanced versus a balanced dataset, we observed the confusion matrix performance for the BeautyREC Face Dataset. To create a balanced dataset, we gathered 5 images from each of 30 people in the dataset, giving a total of 150 images to be analyzed. It is also worth noting how imbalanced the BeautyREC Face Dataset is: there were originally 41 people, and each person contributed a widely varying number of images, ranging from only 1 image to 182 images. The average number of images per person was 73, while the standard deviation was as high as 46.22. These values show that this dataset has a wide range and significant variability.
Table 5 (Figures S7 and S8) displays the confusion matrix results for this dataset. Table 5 shows that, overall, our proposed method performed better when the data were balanced and only five images per class/individual were used. The accuracy difference between the imbalanced and balanced datasets was around 24.51% to 32.16%. We also observed that the imbalanced dataset had a higher recall, whereas the balanced dataset had equal or lower recall. High recall in a face recognition system means the system correctly identifies most of the faces it is supposed to recognize (fewer false negatives); low recall means the system misses more faces than it should (more false negatives). For financial transactions using face recognition, the system may generally prefer lower recall balanced with higher precision, so that matching is stricter.
The accuracy, precision, and recall results varied vastly (Figure S7). The accuracy range was from 3.3% to 100%, the precision range was from 0% to 100%, and the recall range was from 0.2% to 75%. Meanwhile, a balanced dataset showed a narrower range for accuracy, precision, recall, and lowered misclassification results (Figure S8). Overall, after studying the results and understanding the distribution of the original imbalanced BeautyREC Face Dataset more deeply, a balanced dataset was much preferred for better performance in the recognition system.

3.2.3. Results for FEI and FEI-FE Face Database

In our last test, we examined our proposed method, Exp. 5 and 6, against the FEI and FEI-FE Face databases. The FEI Face Database has wider variations compared with the FEI-FE Face Database. The FEI Face Database not only has neutral and smiling face images, but also a 180-degree rotation of face profiles. On the other hand, the FEI-FE Face Database has only a frontal position with a neutral and smiling expression. Because the FEI-FE Face Database consists of only two images per person, we evaluated this data using 2-fold cross-validation.
As we can see from the previous experiments (on the AT&T and EYB Face Datasets), using the haar wavelet with one level of decomposition for MRA-DWT/IDWT and db2 with five levels of decomposition and the mean-mean fusion rule for DWT/IDWT-IF produced the best results, so in this investigation we did not vary the wavelet family. Figure 19 and Figure 20 display the recognition results for the FEI and FEI-FE Face Databases, respectively. Unsurprisingly, our proposed method performed better on the FEI-FE Face Database than on the FEI Face Database. The best accuracy for the FEI Face Database was 97.5%, while the best for the FEI-FE Face Database was 99.5%; the difference between the two databases was approximately 2–2.3%. As previously mentioned, the FEI-FE Face Database consists only of frontal face images with neutral and smiling expressions, whereas the FEI Face Database includes a 180-degree rotation of the face profile, which explains the higher results for FEI-FE. Our proposed method was thus tested against facial expressions (neutral and smiling) and face profile rotation (180 degrees) and was proven to yield high accuracy.

3.3. Comparisons with Other Methods

This section compares our proposed method results against state-of-the-art image and non-image fusion methods inside face recognition. Table 6 displays the comparisons, and we organized them based on the dataset employed. Our proposed method is comparable to the state-of-the-art methods’ results and produces high recognition performance.

4. Conclusions

In this research, we proposed a pixel-level image fusion of opposite frequencies from MRA-DWT/IDWT sub-bands with Gaussian-filtered results (high-frequency from the wavelet sub-bands and low-frequency from the Gaussian filtered) and MRA-DWT/IDWT sub-bands with Difference of Gaussian (low-frequency from the wavelet sub-bands and high-frequency from the Difference of Gaussian). The pixel-level fusion of creating one enhanced reconstructed image (Exp. 5 and 6) outperformed other possibilities inside the proposed method (Exp. 1–Exp. 4).
The proposed method was challenged against AT&T, EYB, EYB-Dark, BeautyREC, FEI, and FEI-FE Face Datasets. The results showed that our proposed method was robust and accurate against challenges such as lighting conditions, facial expressions, head pose, 180-degree rotation of the face profile, dark images, acquisition with a time gap, and conditions where the person uses attributes such as glasses. The proposed method is comparable to state-of-the-art methods and generates high recognition performance. The proposed method successfully achieved an accuracy of 99.2% for AT&T, 99.8% for EYB, 96.1% for EYB-Dark, 97.5% for FEI, and 99.5% for FEI-FE Face Dataset. The detailed results of the best parameters were as follows: overall, LP-IF and DWT/IDWT-IF both produced the best performance, but notably, DWT/IDWT-IF with sym2 performed better when tested against challenges such as dark images; the mean-mean fusion rule for five levels of decomposition DWT/IDWT-IF always yielded the best accuracy, while the support vector machine was undeniably the best classifier.
Our proposed method did not produce satisfactory results against the makeup face challenge (BeautyREC) due to the high variations in head pose, makeup style, and non-uniform background, and due to this dataset being an imbalanced dataset with a vast range in the number of images. For future development, we suggest the research use pre-processing methods, such as background removal and facial landmark detection. We also know that our proposed algorithm has a relatively long process and sequence of steps, so we aim to simplify the algorithm in future research while maintaining the recognition performance.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/a17110529/s1, Figure S1: The first possibility of variations inside our proposed method: choose AL or HG or VG or DG; Figure S2: The second possibility of variations inside our proposed method: a concatenated vector of extracted HoG from AL, HG, VG, and DG; Figure S3: The row-normalized confusion matrix from Exp. 5 using SVM with (a) quadratic kernel and (b) cubic kernel; Figure S4: The row-normalized confusion matrix from Exp. 6 using SVM with (a) quadratic kernel and (b) cubic kernel; Figure S5: Accuracy results (%) for the EYB-Dark Face Dataset from Experiment 6 using the db2 wavelet in DWT/IDWT-IF and sym2 in MRA-DWT/IDWT with variations in level of decomposition; Figure S6: Accuracy results (%) for the EYB-Dark Face Dataset from Experiment 6 using various wavelet families in DWT/IDWT-IF with five levels of decomposition and sym2 in MRA-DWT/IDWT; Figure S7: The row-normalized confusion matrix from Exp. 5 using SVM with (a) quadratic kernel and (b) cubic kernel for the imbalanced BeautyREC Face Dataset; Figure S8: The row-normalized confusion matrix from Exp. 6 using SVM with (a) quadratic kernel and (b) cubic kernel for the balanced BeautyREC Face Dataset.

Author Contributions

Conceptualization, R.L.; methodology, R.L. and M.A.; software, R.L. and J.A.; validation, J.A. and M.A.; investigation, R.L.; writing—original draft preparation, R.L.; writing—review and editing, J.A.; supervision, J.A. and M.A.; project administration, J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DRTPM, Ditjen Diktiristek Indonesia, 2024 Research Grant, Main Contract No: 105/E5/PG.02.00.PL/2024, and Derivative Contract No: 808/LL3/AL.04/2024; 01-1-4/651/SPK/VII/2024.

Data Availability Statement

The Database of Faces (AT&T) [40] can be found at https://cam-orl.co.uk/facedatabase.html (accessed on 14 November 2024). The BeautyREC Dataset [41] can be found at https://li-chongyi.github.io/BeautyREC_files/ (accessed on 14 November 2024). The EYB Dataset was created by [42,43]. The FEI Face Database (including frontal face subset FEI-FE) [44] can be found at https://fei.edu.br/~cet/facedatabase.html (accessed on 14 November 2024). The Laplacian pyramid image fusion toolbox [54] can be found at https://github.com/yuliu316316/MST-SR-Fusion-Toolbox (accessed on 14 November 2024).

Acknowledgments

The authors thank Lembaga Penelitian dan Pengabdian Masyarakat (LPPM) Universitas Mercu Buana Jakarta for their support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, H.; Hu, J.; Yu, J.; Yu, N.; Wu, Q. Ufacenet: Research on multi-task face recognition algorithm based on CNN. Algorithms 2021, 14, 268. [Google Scholar] [CrossRef]
  2. Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, present, and future of face recognition: A review. Electronics 2020, 9, 1188. [Google Scholar] [CrossRef]
  3. Zhong, Y.; Oh, S.; Moon, H.C. Service transformation under industry 4.0: Investigating acceptance of facial recognition payment through an extended technology acceptance model. Technol. Soc. 2021, 64, 101515. [Google Scholar] [CrossRef]
  4. Li, C.; Li, H. Disentangling facial recognition payment service usage behavior: A trust perspective. Telemat. Inform. 2023, 77, 101939. [Google Scholar] [CrossRef]
  5. Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face recognition systems: A survey. Sensors 2020, 20, 342. [Google Scholar] [CrossRef]
  6. Muwardi, R.; Qin, H.; Gao, H.; Ghifarsyam, H.U.; Hajar, M.H.I.; Yunita, M. Research and Design of Fast Special Human Face Recognition System. In Proceedings of the 2nd International Conference on Broadband Communications, Wireless Sensors and Powering (BCWSP), Yogyakarta, Indonesia, 28–30 September 2020; pp. 68–73. [Google Scholar] [CrossRef]
  7. Setiawan, H.; Alaydrus, M.; Wahab, A. Multibranch Convolutional Neural Network for Gender and Age Identification Using Multiclass Classification And FaceNet Model. In Proceedings of the 7th International Conference on Informatics and Computing (ICIC), Denpasar, Indonesia, 8–9 December 2022. [Google Scholar] [CrossRef]
  8. Kaur, H.; Koundal, D.; Kadyan, V. Image Fusion Techniques: A Survey. Arch. Comput. Methods Eng. 2021, 28, 4425–4447. [Google Scholar] [CrossRef]
  9. Karim, S.; Tong, G.; Li, J.; Qadir, A.; Farooq, U.; Yu, Y. Current advances and future perspectives of image fusion: A comprehensive review. Inf. Fusion 2023, 90, 185–217. [Google Scholar] [CrossRef]
  10. Singh, S.; Gyaourova, A.; Bebis, G.; Pavlidis, I. Infrared and visible image fusion for face recognition. In Proceedings of the Biometric Technology for Human Identification, Orlando, FL, USA, 12–16 April 2004; Volume 5404, pp. 585–596. [Google Scholar] [CrossRef]
  11. Heo, J.; Kong, S.G.; Abidi, B.R.; Abidi, M.A. Fusion of visual and thermal signatures with eyeglass removal for robust face recognition. In Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA, 27 June–2 July 2004. [Google Scholar] [CrossRef]
  12. Chen, X.; Wang, H.; Liang, Y.; Meng, Y.; Wang, S. A novel infrared and visible image fusion approach based on adversarial neural network. Sensors 2022, 22, 304. [Google Scholar] [CrossRef]
  13. Hüsken, M.; Brauckmann, M.; Gehlen, S.; von der Malsburg, C. Strategies and benefits of fusion of 2D and 3D face recognition. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)—Workshops, San Diego, CA, USA, 21–23 September 2005. [Google Scholar] [CrossRef]
  14. Kusuma, G.P.; Chua, C.S. Image level fusion method for multimodal 2D + 3D face recognition. In Image Analysis and Recognition, Proceedings of the 5th International Conference, ICIAR 2008, Póvoa de Varzim, Portugal, 25–27 June 2008; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar] [CrossRef]
  15. Ouamane, A.; Belahcene, M.; Benakcha, A.; Bourennane, S.; Taleb-Ahmed, A. Robust multimodal 2D and 3D face authentication using local feature fusion. Signal Image Video Process. 2016, 10, 129–137. [Google Scholar] [CrossRef]
  16. Sarangi, P.P.; Nayak, D.R.; Panda, M.; Majhi, B. A feature-level fusion based improved multimodal biometric recognition system using ear and profile face. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 1867–1898. [Google Scholar] [CrossRef]
  17. Alay, N.; Al-Baity, H.H. Deep learning approach for multimodal biometric recognition system based on fusion of iris, face, and finger vein traits. Sensors 2020, 20, 5523. [Google Scholar] [CrossRef] [PubMed]
  18. Safavipour, M.H.; Doostari, M.A.; Sadjedi, H. A hybrid approach to multimodal biometric recognition based on feature-level fusion of face, two irises, and both thumbprints. J. Med. Signals Sens. 2022, 12, 177–191. [Google Scholar] [CrossRef] [PubMed]
  19. Byahatti, P.; Shettar, M.S. Fusion Strategies for Multimodal Biometric System Using Face and Voice Cues. IOP Conf. Ser. Mater. Sci. Eng. 2020, 925, 012031. [Google Scholar] [CrossRef]
  20. AL-Shatnawi, A.; Al-Saqqar, F.; El-Bashir, M.; Nusir, M. Face Recognition Model based on the Laplacian Pyramid Fusion Technique. Int. J. Adv. Soft Comput. Its Appl. 2021, 13, 27–46. [Google Scholar]
  21. Alfawwaz, B.M.; Al-Shatnawi, A.; Al-Saqqar, F.; Nusir, M.; Yaseen, H. Face recognition system based on the multi-resolution singular value decomposition fusion technique. Int. J. Data Netw. Sci. 2022, 6, 1249–1260. [Google Scholar] [CrossRef]
  22. Alfawwaz, B.M.; Al-Shatnawi, A.; Al-Saqqar, F.; Nusir, M. Multi-Resolution Discrete Cosine Transform Fusion Technique Face Recognition Model. Data 2022, 7, 80. [Google Scholar] [CrossRef]
  23. Pong, K.H.; Lam, K.M. Multi-resolution feature fusion for face recognition. Pattern Recognit. 2014, 47, 556–567. [Google Scholar] [CrossRef]
  24. Zhang, J.; Yan, X.; Cheng, Z.; Shen, X. A face recognition algorithm based on feature fusion. Concurr. Comput. Pract. Exp. 2022, 34, e5748. [Google Scholar] [CrossRef]
  25. Zhu, Y.; Jiang, Y. Optimization of face recognition algorithm based on deep learning multi feature fusion driven by big data. Image Vis. Comput. 2020, 104, 104023. [Google Scholar] [CrossRef]
  26. Karanwal, S. Improved local descriptor (ILD): A novel fusion method in face recognition. Int. J. Inf. Technol. 2023, 15, 1885–1894. [Google Scholar] [CrossRef]
  27. Meng, L.; Yan, C.; Li, J.; Yin, J.; Liu, W.; Xie, H.; Li, L. Multi-Features Fusion and Decomposition for Age-Invariant Face Recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3146–3154. [Google Scholar] [CrossRef]
  28. Li, Y.; Gao, M. Face Recognition Algorithm Based on Multiscale Feature Fusion Network. Comput. Intell. Neurosci. 2022, 2022, 5810723. [Google Scholar] [CrossRef] [PubMed]
  29. Charoqdouz, E.; Hassanpour, H. Feature Extraction from Several Angular Faces Using a Deep Learning Based Fusion Technique for Face Recognition. Int. J. Eng. Trans. B Appl. 2023, 36, 1548–1555. [Google Scholar] [CrossRef]
  30. Kumar, P.M.A.; Raj, L.A.; Sagayam, K.M.; Ram, N.S. Expression invariant face recognition based on multi-level feature fusion and transfer learning technique. Multimed. Tools Appl. 2022, 81, 37183–37201. [Google Scholar] [CrossRef]
  31. Tiong, L.C.O.; Kim, S.T.; Ro, Y.M. Multimodal facial biometrics recognition: Dual-stream convolutional neural networks with multi-feature fusion layers. Image Vis. Comput. 2020, 102, 103977. [Google Scholar] [CrossRef]
  32. Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater Image Enhancement via Weighted Wavelet Visual Perception Fusion. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 2469–2483. [Google Scholar] [CrossRef]
  33. Ding, I.J.; Zheng, N.W. CNN Deep Learning with Wavelet Image Fusion of CCD RGB-IR and Depth-Grayscale Sensor Data for Hand Gesture Intention Recognition. Sensors 2022, 22, 803. [Google Scholar] [CrossRef]
  34. Bellamkonda, S.; Gopalan, N.P. An enhanced facial expression recognition model using local feature fusion of Gabor wavelets and local directionality patterns. Int. J. Ambient. Comput. Intell. 2020, 11, 48–70. [Google Scholar] [CrossRef]
  35. Huang, Z.H.; Li, W.J.; Wang, J.; Zhang, T. Face recognition based on pixel-level and feature-level fusion of the top-level’s wavelet sub-bands. Inf. Fusion 2015, 22, 95–104. [Google Scholar] [CrossRef]
  36. Wenjing, T.; Fei, G.; Renren, D.; Yujuan, S.; Ping, L. Face recognition based on the fusion of wavelet packet sub-images and fisher linear discriminant. Multimed. Tools Appl. 2017, 76, 22725–22740. [Google Scholar] [CrossRef]
  37. Chai, P.; Luo, X.; Zhang, Z. Image Fusion Using Quaternion Wavelet Transform and Multiple Features. IEEE Access 2017, 5, 6724–6734. [Google Scholar] [CrossRef]
  38. Ye, S. A Face Recognition Method Based on Multifeature Fusion. J. Sens. 2022, 2022, 2985484. [Google Scholar] [CrossRef]
  39. Dey, A.; Chowdhury, S.; Sing, J.K. Performance evaluation on image fusion techniques for face recognition. Int. J. Comput. Vis. Robot. 2018, 8, 455–475. [Google Scholar] [CrossRef]
  40. Samaria, F.S.; Harter, A.C. Parameterisation of a stochastic model for human face identification. In Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA, 5–7 December 1994; pp. 138–142. [Google Scholar] [CrossRef]
  41. Yan, Q.; Guo, C.; Zhao, J.; Dai, Y.; Loy, C.C.; Li, C. Beautyrec: Robust, efficient, and component-specific makeup transfer. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 1102–1110. [Google Scholar] [CrossRef]
  42. Georghiades, A.S.; Belhumeur, P.N.; Kriegman, D.J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660. [Google Scholar] [CrossRef]
  43. Lee, K.C.; Ho, J.; Kriegman, D.J. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 684–698. [Google Scholar] [CrossRef]
  44. Thomaz, C.E.; Giraldi, G.A. A new ranking method for principal components analysis and its application to face image analysis. Image Vis. Comput. 2010, 28, 902–913. [Google Scholar] [CrossRef]
  45. Starosolski, R. Hybrid adaptive lossless image compression based on discrete wavelet transform. Entropy 2020, 22, 751. [Google Scholar] [CrossRef]
  46. Sundararajan, D. Discrete Wavelet Transform: A Signal Processing Approach; John Wiley & Sons: Singapore, 2015. [Google Scholar]
  47. Burger, W.; Burge, M.J. Principles of Digital Image Processing: Advanced Methods; Springer: London, UK, 2013. [Google Scholar]
  48. Lionnie, R.; Apriono, C.; Gunawan, D. Eyes versus Eyebrows: A Comprehensive Evaluation Using the Multiscale Analysis and Curvature-Based Combination Methods in Partial Face Recognition. Algorithms 2022, 15, 208. [Google Scholar] [CrossRef]
  49. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  50. Li, J.; Zhang, J.; Yang, C.; Liu, H.; Zhao, Y.; Ye, Y. Comparative Analysis of Pixel-Level Fusion Algorithms and a New High-Resolution Dataset for SAR and Optical Image Fusion. Remote Sens. 2023, 15, 5514. [Google Scholar] [CrossRef]
  51. Liu, Y.; Wang, L.; Cheng, J.; Li, C.; Chen, X. Multi-focus image fusion: A Survey of the state of the art. Inf. Fusion 2020, 64, 71–91. [Google Scholar] [CrossRef]
  52. Burt, P.J.; Adelson, E.H. The Laplacian Pyramid as a Compact Image Code. IEEE Trans. Commun. 1983, 31, 532–540. [Google Scholar] [CrossRef]
  53. Burt, P.J.; Adelson, E.H. Merging Images Through Pattern Decomposition. In Applications of Digital Image Processing VIII, Proceedings of the 29th Annual Technical Symposium, San Diego, CA, USA, 20–23 August 1985; SPIE: Bellingham, WA, USA, 1985; Volume 0575. [Google Scholar] [CrossRef]
  54. Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
  55. Li, H.; Manjunath, B.S.; Mitra, S.K. Multisensor Image Fusion Using the Wavelet Transform. Graph. Models Image Process. 1995, 57, 235–245. [Google Scholar] [CrossRef]
  56. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef]
  57. Teng, J.H.; Ong, T.S.; Connie, T.; Anbananthen, K.S.M.; Min, P.P. Optimized Score Level Fusion for Multi-Instance Finger Vein Recognition. Algorithms 2022, 15, 161. [Google Scholar] [CrossRef]
  58. Lionnie, R.; Apriono, C.; Chai, R.; Gunawan, D. Curvature Best Basis: A Novel Criterion to Dynamically Select a Single Best Basis as the Extracted Feature for Periocular Recognition. IEEE Access 2022, 10, 113523–113542. [Google Scholar] [CrossRef]
  59. Lionnie, R.; Hermanto, V. Human vs machine learning in face recognition: A case study from the travel industry. SINERGI 2024. in editing. [Google Scholar]
  60. Aleem, S.; Yang, P.; Masood, S.; Li, P.; Sheng, B. An accurate multi-modal biometric identification system for person identification via fusion of face and finger print. World Wide Web 2020, 23, 1299–1317. [Google Scholar] [CrossRef]
  61. Miakshyn, O.; Anufriiev, P.; Bashkov, Y. Face Recognition Technology Improving Using Convolutional Neural Networks. In Proceedings of the 2021 IEEE 3rd International Conference on Advanced Trends in Information Theory (ATIT), Kyiv, Ukraine, 15–17 December 2021; pp. 116–120. [Google Scholar] [CrossRef]
  62. Hung, B.T.; Khang, N.N. Student Attendance System Using Face Recognition. In Proceedings of the Integrated Intelligence Enable Networks and Computing. Algorithms for Intelligent Systems, Gopeshwar, India, 25–27 May 2020; Springer: Singapore, 2021. [Google Scholar] [CrossRef]
  63. Bahrami, S.; Dornaika, F.; Bosaghzadeh, A. Joint auto-weighted graph fusion and scalable semi-supervised learning. Inf. Fusion 2021, 66, 213–228. [Google Scholar] [CrossRef]
  64. Zhang, Y.; Zheng, S.; Zhang, X.; Cui, Z. Multi-resolution dictionary learning method based on sample expansion and its application in face recognition. Signal Image Video Process. 2021, 15, 307–313. [Google Scholar] [CrossRef]
  65. Kas, M.; El-Merabet, Y.; Ruichek, Y.; Messoussi, R. A comprehensive comparative study of handcrafted methods for face recognition LBP-like and non LBP operators. Multimed. Tools Appl. 2020, 79, 375–413. [Google Scholar] [CrossRef]
  66. Nikan, S.; Ahmadi, M. Local gradient-based illumination invariant face recognition using local phase quantisation and multi-resolution local binary pattern fusion. IET Image Process. 2015, 9, 12–21. [Google Scholar] [CrossRef]
  67. Curtidor, A.; Baydyk, T.; Kussul, E. Analysis of random local descriptors in face recognition. Electronics 2021, 10, 1358. [Google Scholar] [CrossRef]
  68. Talab, M.A.; Awang, S.; Ansari, M.D. A Novel Statistical Feature Analysis-Based Global and Local Method for Face Recognition. Int. J. Opt. 2020, 2020, 4967034. [Google Scholar] [CrossRef]
  69. Al-Ghrairi, A.H.T.; Mohammed, A.A.; Sameen, E.Z. Face detection and recognition with 180 degree rotation based on principal component analysis algorithm. IAES Int. J. Artif. Intell. 2022, 11, 593–602. [Google Scholar] [CrossRef]
  70. Al-Shebani, Q.; Premarante, P.; Vial, P.J. A hybrid feature extraction technique for face recognition. In Proceedings of the International Proceedings of Computer Science and Information Technology, Shanghai, China, 27–28 March 2014; pp. 166–170. Available online: https://ro.uow.edu.au/eispapers/2231/ (accessed on 14 November 2024).
Figure 1. Examples of images inside each dataset: (a) AT&T [40], (b) BeautyREC [41], (c) EYB [42,43], (d) EYB-Dark [42,43], (e) FEI [44], (f) FEI-FE [44].
Figure 2. The flowchart of our proposed method.
Figure 3. The MRA-DWT sub-bands (from left to right): approximation, horizontal, vertical, diagonal sub-bands with Haar and one level of decomposition.
Figure 4. The illustration of the scaling function (left) and wavelet function (right) from the Haar wavelet.
Figure 5. Results from Gaussian filtering and the Difference of Gaussian (from left to right): original image, Gaussian-filtered image with σ1, Gaussian-filtered image with σ2, Difference of Gaussian.
Figure 6. Example of results from proposed fusion (from top to bottom): AL, HG, VG, DG with image fusion DWT/IDWT-IF using the mean-mean rule.
Figure 7. The comparison of processing times for the AT&T Face Dataset: Exp. 5 and Exp. 6 using db2 in DWT/IDWT-IF with levels of decomposition one (Exp. 6a), three (Exp. 6b), five (Exp. 6c), and seven (Exp. 6d).
Figure 8. Accuracy results (%) for the AT&T Face Dataset (proposed method) using different wavelet families in MRA-DWT/IDWT with one level of decomposition: (a) Experiment 5; (b) Experiment 6.
Figure 9. Accuracy results (%) for AT&T Face Dataset from Experiment 6 (proposed method) using db2 wavelet in DWT/IDWT-IF and bior3.3 in MRA-DWT/IDWT with variations in the level of decomposition.
Figure 10. Accuracy results (%) for AT&T Face Dataset from Experiment 6 (proposed method) using various wavelet families in DWT/IDWT-IF with five levels of decomposition and bior3.3 in MRA-DWT/IDWT.
Figure 11. Accuracy results (%) for the EYB Face Dataset for Experiments 2, 4, 5, and 6.
Figure 12. Accuracy results (%) for the EYB-Dark Face Dataset for Experiments 2, 4, 5, and 6.
Figure 13. Accuracy results (%) for the EYB-Dark Face Dataset for Experiment 6 using fusion rules: mean-mean, min-max, and max-min.
Figure 14. Fusion results of DWT/IDWT-IF with db2 and five levels of decomposition (from left to right), top: original image, fusion using the min-max rule, max-min rule, and mean-mean rule; bottom: the same fusion results scaled based on the pixel value range.
Figure 15. Accuracy results (%) for the EYB-Dark Face Dataset for Experiment 6 with the mean-mean fusion rule using different wavelet families for MRA-DWT/IDWT.
Figure 16. Accuracy results (%) for the BeautyREC Dataset from Exp. 5 and 6 with variations of employing 1820 images and all (3000) images.
Figure 17. Accuracy results (%) for the BeautyREC Dataset: Exp. 5, LP-IF with MRA-DWT/IDWT (a) haar, (b) db2, (c) sym2, (d) bior2.6, (e) bior3.3; Exp. 6, DWT/IDWT-IF with MRA-DWT/IDWT (a) haar, (b) db2, (c) sym2, (d) bior2.6, (e) bior3.3; Exp. 6, DWT/IDWT-IF with haar for MRA-DWT/IDWT and db2 wavelet with total level of decomposition (f) one, (g) three, (h) seven; Exp. 6, DWT/IDWT-IF with haar for MRA-DWT/IDWT and five levels of decomposition using wavelets (i) haar, (j) sym2, (k) bior2.6; Exp. 6, DWT/IDWT-IF using fusion rule (l) min-max, (m) max-min. All results came from SVM with the cubic kernel.
Figure 18. Example of high variations for one person inside the BeautyREC Face Dataset.
Figure 19. Accuracy results (%) for the FEI Face Database from Exp. 5 and 6.
Figure 20. Accuracy results (%) for the FEI-FE Face Database from Exp. 5 and 6.
Table 1. The overall experiment designs.

Exp. | Flowchart | Fusion Level | Method of Fusion * | Coefficients
1 | Figure S1 | Pixel-level | LP-IF | AL, HG, VG, DG (choose one)
2 | Figure S2 | Pixel- and feature-level | LP-IF | Concatenated vector (c-AL+HG+VG+DG)
3 | Figure S1 | Pixel-level | DWT-IF ** | AL, HG, VG, DG (choose one)
4 | Figure S2 | Pixel- and feature-level | DWT-IF ** | Concatenated vector (c-AL+HG+VG+DG)
5 | Figure 2 | Pixel-level | LP-IF | Enhanced reconstructed image using MRA-IDWT
6 | Figure 2 | Pixel-level | DWT-IF ** | Enhanced reconstructed image using MRA-IDWT

* Default MRA-DWT/IDWT using Haar wavelet with one level of decomposition. ** Default DWT/IDWT-IF using db2 with five levels of decomposition with the mean-mean fusion rule.
Table 2. Face datasets and their challenges.

Dataset | Challenges
AT&T | Lighting conditions, facial expressions, glasses and no glasses, image acquisition with a time gap
EYB | Pose, variation in lighting conditions
EYB-Dark | Very dark face images (1 normal and 3 dark images)
BeautyREC | Face makeup (transfer), imbalanced dataset
FEI | 180-degree rotation of the face profile
FEI-FE | Neutral and smiling expression
Table 3. The average accuracy results (%) for AT&T Face Dataset from all experiments.

Exp. | Coefficients | 1-NN | SVM (Linear) | SVM (Quadratic) | SVM (Cubic) | NN-ReLU
1 | AL | 97.0 | 97.5 | 97.8 | 97.8 | 96.2
1 | HG | 96.8 | 95.8 | 97.8 | 98.0 | 96.5
1 | VG | 96.5 | 96.0 | 97.5 | 97.2 | 96.5
1 | DG | 96.2 | 96.8 | 98.0 | 98.2 | 97.0
2 | (c-AL+HG+VG+DG) | 97.2 | 97.5 | 98.2 | 98.2 | 97.8
3 | AL | 97.0 | 97.8 | 98.2 | 98.2 | 97.0
3 | HG | 96.8 | 96.2 | 97.8 | 97.8 | 96.8
3 | VG | 97.0 | 96.2 | 97.8 | 97.5 | 96.2
3 | DG | 96.0 | 96.8 | 98.2 | 97.5 | 97.0
4 | (c-AL+HG+VG+DG) | 96.2 | 97.5 | 98.2 | 98.2 | 97.5
5 | I’ | 96.8 | 98.2 | 99.2 | 99.2 | 97.2
6 | I’ | 96.0 | 97.2 | 98.0 | 97.8 | 95.5
Table 4. The confusion matrix performance results (%) for the AT&T Face Dataset.

Exp. | Number of Images | SVM Kernel | Accuracy | Precision | Recall | F1-Score
5 | 10 | Quadratic | 98.5 | 98.5 | 98.64 | 98.57
5 | 10 | Cubic | 98.5 | 98.5 | 98.64 | 98.57
5 | 5 | Quadratic | 93 | 93 | 93.87 | 93.43
5 | 5 | Cubic | 93.5 | 93.5 | 94.82 | 94.16
6 | 10 | Quadratic | 97.75 | 97.75 | 97.93 | 97.84
6 | 10 | Cubic | 97.75 | 97.75 | 97.95 | 97.85
6 | 5 | Quadratic | 94 | 94 | 95.16 | 94.57
6 | 5 | Cubic | 94 | 94 | 95.04 | 94.52
Table 5. The confusion matrix performance results (%) for the BeautyREC Face Dataset.

Exp. | Dataset | SVM Kernel | Accuracy | Precision | Recall | F1-Score
5 | Imbalanced | Quadratic | 28.82 | 28.82 | 38.84 | 33.09
5 | Imbalanced | Cubic | 30.7 | 30.7 | 43.76 | 36.09
5 | Balanced | Quadratic | 53.33 | 53.33 | 53.33 | 53.33
5 | Balanced | Cubic | 50.67 | 50.67 | 48.79 | 49.71
6 | Imbalanced | Quadratic | 21.17 | 21.17 | 25.7 | 23.22
6 | Imbalanced | Cubic | 22.77 | 22.77 | 30.09 | 25.92
6 | Balanced | Quadratic | 53.33 | 53.33 | 53.33 | 53.33
6 | Balanced | Cubic | 52 | 52 | 50.03 | 50.99
Table 6. Comparison with other methods (fusion and non-fusion).

Dataset | Research | Fusion/Non-Fusion | Methods | Accuracy/Recognition Rate (%)
AT&T (ORL) | [60] | Fusion | Extended local binary patterns and reduction with local non-negative matrix factorization (on unimodal face) | 97.20
AT&T (ORL) | [22] | Fusion | Multi-resolution discrete cosine transform fusion | 97.70
AT&T (ORL) | [21] | Fusion | Multi-resolution singular value decomposition fusion | 97.78
AT&T (ORL) | [61] | Non-fusion | Convolutional neural network | 98
AT&T (ORL) | [20] | Fusion | Laplacian pyramid fusion | 98.2
AT&T (ORL) | [62] | Non-fusion | Haar-cascade and convolutional neural network | 98.57
AT&T (ORL) | Ours | Fusion | Opposite-frequency features of MRA-DWT/IDWT and image fusion with LP-IF or DWT/IDWT-IF | 99.2
EYB | [63] | Fusion (graph) | Auto-weighted multi-view semi-supervised learning method | 88.57
EYB | [64] | Non-fusion | Multi-resolution dictionary learning method based on sample expansion | 89.59
EYB | [65] | Non-fusion | Local neighborhood difference pattern | 97.64
EYB | [66] | Fusion | Pre-processing and local phase quantization and multi-scale local binary pattern with score-level fusion and decision-level fusion | 98.30
EYB | Ours | Fusion | Opposite-frequency features of MRA-DWT/IDWT and image fusion with LP-IF or DWT/IDWT-IF | 99.8
EYB-Dark | [59] | Non-fusion | Histogram of oriented gradient and contrast-limited adaptive histogram equalization | 95.4
EYB-Dark | Ours | Fusion | Opposite-frequency features of MRA-DWT/IDWT and image fusion with LP-IF or DWT/IDWT-IF | 96.1
FEI | [67] | Non-fusion | Permutation coding neural classifier based on random local descriptor | 93.57
FEI | [68] | Non-fusion | Integration of the binary-level occurrence matrix and the fuzzy local binary pattern and neural network classifier | 95.27
FEI | [69] | Non-fusion | Principal component analysis | 96
FEI | Ours | Fusion | Opposite-frequency features of MRA-DWT/IDWT and image fusion with LP-IF or DWT/IDWT-IF | 97.5
FEI-FE | [70] | Non-fusion | Using eye region with Gabor transform and nearest neighbor | 97
FEI-FE | Ours | Fusion | Opposite-frequency features of MRA-DWT/IDWT and image fusion with LP-IF or DWT/IDWT-IF | 99.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
