HE-CycleGAN: A Symmetric Network Based on High-Frequency Features and Edge Constraints Used to Convert Facial Sketches to Images
Abstract
1. Introduction
- (1) We propose HE-CycleGAN, a network for converting facial sketch images to facial images.
- (2) We add a high-frequency feature extractor (HFFE) to the generator of HE-CycleGAN, which alleviates the loss of detail that the traditional CycleGAN incurs while satisfying the cycle-consistency constraint.
- (3) We design a multi-scale wavelet edge discriminator (MSWED) that addresses the problem of edge overflow in generated faces.
- (4) Finally, we validate the effectiveness of the proposed HE-CycleGAN both quantitatively and qualitatively.
2. Related Work
3. The Proposed Method
3.1. Generator Network Structure
Algorithm 1. Extract features using the Haar wavelet transform

Input: F // the input feature map
Output: W // the four feature components
K[ ] // the four filtering kernels, defined as a list
Initialize the four feature components W[ ]
Step ← 2 // stride used during filtering
for each position (i, j) in the output feature W do
    for each kernel K[k] do
        for each element (m, n) of K[k] do
            S ← F[i × Step + m, j × Step + n] × K[k][m, n]
            Add S to W[k][i, j]
        end
    end
end
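The strided Haar filtering of Algorithm 1 can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function name `haar_wavelet_features` and the 1/2-scaled 2 × 2 kernels are our assumptions, following the standard Haar decomposition into four half-resolution components.

```python
import numpy as np

def haar_wavelet_features(F):
    """Decompose a 2D feature map F into four half-resolution
    components via 2x2 Haar kernels applied with stride 2."""
    # The four standard 2x2 Haar filtering kernels (low-low, low-high,
    # high-low, high-high), scaled by 1/2 for orthonormality.
    kernels = {
        "LL": 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]),
        "LH": 0.5 * np.array([[-1.0, -1.0], [1.0, 1.0]]),
        "HL": 0.5 * np.array([[-1.0, 1.0], [-1.0, 1.0]]),
        "HH": 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]]),
    }
    out_h, out_w = F.shape[0] // 2, F.shape[1] // 2
    W = {name: np.zeros((out_h, out_w)) for name in kernels}
    for name, k in kernels.items():          # one pass per component
        for i in range(out_h):               # traverse output rows
            for j in range(out_w):           # traverse output columns
                patch = F[2 * i:2 * i + 2, 2 * j:2 * j + 2]  # stride-2 window
                W[name][i, j] = np.sum(patch * k)            # filter response
    return W

# Demo on a small 4x4 feature map.
feats = haar_wavelet_features(np.arange(16, dtype=float).reshape(4, 4))
```

The LL component carries the low-frequency content, while LH, HL, and HH carry the horizontal, vertical, and diagonal high-frequency detail that the HFFE is designed to preserve.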
3.2. Discriminator Network Structure
3.3. The Loss Function
3.3.1. Adversarial Loss
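For context, HE-CycleGAN builds on the CycleGAN framework [1], whose standard adversarial loss for the mapping $G: X \to Y$ with discriminator $D_Y$ is given below; the paper may add its own modifications to this baseline.

```latex
\mathcal{L}_{GAN}(G, D_Y, X, Y) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big]
+ \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]
```

A symmetric term is defined for the reverse mapping $F: Y \to X$ with discriminator $D_X$.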
3.3.2. Multi-Scale Wavelet Edge Discrimination Adversarial Loss
3.3.3. Cycle-Consistency Loss
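The cycle-consistency constraint referenced in the contributions is the standard CycleGAN term [1], which requires each translated image to map back to its source under the reverse generator:

```latex
\mathcal{L}_{cyc}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
+ \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```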
3.3.4. Color-Identity Loss
3.3.5. HE-CycleGAN Objective Function
4. Experiments
4.1. Datasets
4.2. Experimental Procedure
4.3. Result Analysis
4.4. Ablation Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhu, J.Y.; Park, T.; Isola, P. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Babu, K.K.; Dubey, S.R. CSGAN: Cyclic-synthesized generative adversarial networks for image-to-image transformation. Expert Syst. Appl. 2021, 169, 114431. [Google Scholar] [CrossRef]
- Babu, K.K.; Dubey, S.R. CDGAN: Cyclic discriminative generative adversarial networks for image-to-image transformation. J. Vis. Commun. Image Represent. 2022, 82, 103382. [Google Scholar] [CrossRef]
- Wang, G.; Shi, H.; Chen, Y.; Wu, B. Unsupervised image-to-image translation via long-short cycle-consistent adversarial networks. Appl. Intell. 2023, 53, 17243–17259. [Google Scholar] [CrossRef]
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Senapati, R.K.; Satvika, R.; Anmandla, A.; Ashesh Reddy, G.; Anil Kumar, C. Image-to-image translation using Pix2Pix GAN and cycle GAN. In International Conference on Data Intelligence and Cognitive Informatics; Springer Nature Singapore: Singapore, 2023; pp. 573–586. [Google Scholar]
- Zhang, Y.; Yu, L.; Sun, B.; He, J. ENG-Face: Cross-domain heterogeneous face synthesis with enhanced asymmetric CycleGAN. Appl. Intell. 2022, 52, 15295–15307. [Google Scholar] [CrossRef]
- Chu, C.; Zhmoginov, A.; Sandler, M. CycleGAN, a master of steganography. arXiv 2017, arXiv:1712.02950. [Google Scholar]
- Porav, H.; Musat, V.; Newman, P. Reducing Steganography In Cycle-consistency GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019; pp. 78–82. [Google Scholar]
- Gao, Y.; Wei, F.; Bao, J.; Gu, S.; Chen, D.; Wen, F.; Lian, Z. High-fidelity and arbitrary face editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 16115–16124. [Google Scholar]
- Lin, C.T.; Kew, J.L.; Chan, C.S.; Lai, S.H.; Zach, C. Cycle-object consistency for image-to-image domain adaptation. Pattern Recognit. 2023, 138, 109416. [Google Scholar] [CrossRef]
- Wang, X.; Tang, X. Face photo-sketch synthesis and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 1955–1967. [Google Scholar] [CrossRef] [PubMed]
- Xiao, B.; Gao, X.; Tao, D.; Li, X. A new approach for face recognition by sketches in photos. Signal Process. 2009, 89, 1576–1588. [Google Scholar] [CrossRef]
- Bono, F.M.; Radicioni, L.; Cinquemani, S.; Conese, C.; Tarabini, M. Development of soft sensors based on neural networks for detection of anomaly working condition in automated machinery. In Proceedings of the NDE 4.0, Predictive Maintenance, and Communication and Energy Systems in a Globally Networked World, Long Beach, CA, USA, 4–10 April 2022; pp. 56–70. [Google Scholar]
- Zhang, L.; Lin, L.; Wu, X.; Ding, S.; Zhang, L. End-to-end photo-sketch generation via fully convolutional representation learning. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, China, 23–26 June 2015. [Google Scholar]
- Zhou, G.; Fan, Y.; Shi, J.; Lu, Y.; Shen, J. Conditional generative adversarial networks for domain transfer: A survey. Appl. Sci. 2022, 12, 8350. [Google Scholar] [CrossRef]
- Porkodi, S.P.; Sarada, V.; Maik, V.; Gurushankar, K. Generic image application using gans (generative adversarial networks): A review. Evol. Syst. 2023, 14, 903–917. [Google Scholar] [CrossRef]
- Li, Y.; Chen, X.; Wu, F.; Zha, Z.J. Linestofacephoto: Face photo generation from lines with conditional self-attention generative adversarial networks. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2323–2331. [Google Scholar]
- Chen, S.Y.; Su, W.; Gao, L.; Xia, S.; Fu, H. Deep generation of face images from sketches. arXiv 2020, arXiv:2006.01047. [Google Scholar]
- Li, L.; Tang, J.; Shao, Z.; Tan, X.; Ma, L. Sketch-to-photo face generation based on semantic consistency preserving and similar connected component refinement. Vis. Comput. 2022, 38, 3577–3594. [Google Scholar] [CrossRef]
- Sun, J.; Yu, H.; Zhang, J.J.; Dong, J.; Yu, H.; Zhong, G. Face image-sketch synthesis via generative adversarial fusion. Neural Netw. 2022, 154, 179–189. [Google Scholar] [CrossRef] [PubMed]
- Shao, X.; Qiang, Z.; Dai, F.; He, L.; Lin, H. Face Image Completion Based on GAN Prior. Electronics 2022, 11, 1997. [Google Scholar] [CrossRef]
- Ren, G.; Geng, W.; Guan, P.; Cao, Z.; Yu, J. Pixel-wise grasp detection via twin deconvolution and multi-dimensional attention. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4002–4010. [Google Scholar] [CrossRef]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Gao, G.; Lai, H.; Jia, Z. Unsupervised image dedusting via a cycle-consistent generative adversarial network. Remote Sens. 2023, 15, 1311. [Google Scholar] [CrossRef]
- Zhang, W.; Wang, X.; Tang, X. Coupled information-theoretic encoding for face photo-sketch recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 20–25 June 2011. [Google Scholar]
- Koch, B.; Grbić, R. One-shot lip-based biometric authentication: Extending behavioral features with authentication phrase information. Image Vis. Comput. 2024, 142, 104900. [Google Scholar] [CrossRef]
- Liu, F.; Chen, D.; Wang, F.; Li, Z.; Xu, F. Deep learning based single sample face recognition: A survey. Artif. Intell. Rev. 2023, 56, 2723–2748. [Google Scholar] [CrossRef]
- Rajeswari, G.; Ithaya Rani, P. Face occlusion removal for face recognition using the related face by structural similarity index measure and principal component analysis. J. Intell. Fuzzy Syst. 2022, 42, 5335–5350. [Google Scholar] [CrossRef]
- Ko, K.; Yeom, T.; Lee, M. Superstargan: Generative adversarial networks for image-to-image translation in large-scale domains. Neural Netw. 2023, 162, 330–339. [Google Scholar] [CrossRef]
- Kynkäänniemi, T.; Karras, T.; Aittala, M.; Aila, T.; Lehtinen, J. The role of ImageNet classes in Fréchet inception distance. arXiv 2022, arXiv:2203.06026. [Google Scholar]
- Song, Z.; Zhang, Z.; Fang, F.; Fan, Z.; Lu, J. Deep semantic-aware remote sensing image deblurring. Signal Process. 2023, 211, 109108. [Google Scholar] [CrossRef]
- Jayasumana, S.; Ramalingam, S.; Veit, A.; Glasner, D.; Chakrabarti, A.; Kumar, S. Rethinking FID: Towards a better evaluation metric for image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
Datasets | Number of Sample Pairs | Size | Train/Test |
---|---|---|---|
CUHK_student | 188 | 250 × 200 | 100/88 |
XM2VTS | 295 | 250 × 200 | 195/100 |
AR | 123 | 250 × 200 | 80/43 |
Metric | Pix2Pix [5] | CycleGAN [1] | CSGAN [2] | CDGAN [3] | LSCIT [4] | Ours
---|---|---|---|---|---|---|
SSIM ⬆ | 0.6866 | 0.6938 | 0.6837 | 0.6916 | 0.6944 | 0.7118 |
LPIPS ⬇ | 0.2756 | 0.2300 | 0.2524 | 0.2270 | 0.2281 | 0.2017 |
FID ⬇ | 127.6600 | 65.5658 | 87.0142 | 60.4071 | 67.2641 | 51.4870 |
Metric | Pix2Pix [5] | CycleGAN [1] | CSGAN [2] | CDGAN [3] | LSCIT [4] | Ours
---|---|---|---|---|---|---|
SSIM ⬆ | 0.5834 | 0.5940 | 0.5984 | 0.5967 | 0.6057 | 0.6109 |
LPIPS ⬇ | 0.2481 | 0.2371 | 0.2426 | 0.2453 | 0.2452 | 0.2207 |
FID ⬇ | 66.5135 | 47.1245 | 58.0198 | 47.1513 | 50.8334 | 41.2961 |
Metric | Pix2Pix [5] | CycleGAN [1] | CSGAN [2] | CDGAN [3] | LSCIT [4] | Ours
---|---|---|---|---|---|---|
SSIM ⬆ | 0.6836 | 0.6801 | 0.6930 | 0.6830 | 0.6816 | 0.7048 |
LPIPS ⬇ | 0.2423 | 0.2596 | 0.2276 | 0.2585 | 0.2529 | 0.2128 |
FID ⬇ | 99.0394 | 92.0436 | 77.0084 | 77.5533 | 74.7337 | 51.8288 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
CycleGAN [1] | 0.6938 | 0.2300 | 65.5658 |
+HFFE | 0.7034 | 0.2080 | 54.0132 |
+MSWED | 0.7085 | 0.2107 | 56.7533 |
HE-CycleGAN | 0.7118 | 0.2017 | 51.4870 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
CycleGAN [1] | 0.5940 | 0.2371 | 47.1245 |
+HFFE | 0.6004 | 0.2288 | 43.3593 |
+MSWED | 0.6031 | 0.2250 | 41.9002 |
HE-CycleGAN | 0.6109 | 0.2207 | 41.2961 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
CycleGAN [1] | 0.6801 | 0.2596 | 92.0436 |
+HFFE | 0.7029 | 0.2169 | 52.6087 |
+MSWED | 0.6978 | 0.2240 | 64.8579 |
HE-CycleGAN | 0.7048 | 0.2128 | 51.8288 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
+HFFE | 0.7034 | 0.2080 | 54.0132 |
HFFE − ECANet [24] | 0.6969 | 0.2115 | 56.8686 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
+HFFE | 0.6004 | 0.2288 | 43.3593 |
HFFE − ECANet [24] | 0.5978 | 0.2340 | 44.9255 |
Method | SSIM ⬆ | LPIPS ⬇ | FID ⬇ |
---|---|---|---|
+HFFE | 0.7029 | 0.2169 | 52.6087 |
HFFE − ECANet [24] | 0.6971 | 0.2203 | 56.2123 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, B.; Du, R.; Li, J.; Tang, Y. HE-CycleGAN: A Symmetric Network Based on High-Frequency Features and Edge Constraints Used to Convert Facial Sketches to Images. Symmetry 2024, 16, 1015. https://doi.org/10.3390/sym16081015