Article

Adversarial and Random Transformations for Robust Domain Adaptation and Generalization

Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
* Authors to whom correspondence should be addressed.
Sensors 2023, 23(11), 5273; https://doi.org/10.3390/s23115273
Submission received: 29 April 2023 / Revised: 18 May 2023 / Accepted: 29 May 2023 / Published: 1 June 2023

Abstract

Data augmentation has been widely used to improve generalization in training deep neural networks. Recent works show that using worst-case transformations or adversarial augmentation strategies can significantly improve accuracy and robustness. However, due to the non-differentiable properties of image transformations, search algorithms such as reinforcement learning or evolution strategies must be applied, which are not computationally practical for large-scale problems. In this work, we show that by simply applying consistency training with random data augmentation, state-of-the-art results on domain adaptation (DA) and generalization (DG) can be obtained. To further improve accuracy and robustness with adversarial examples, we propose a differentiable adversarial data augmentation method based on spatial transformer networks (STNs). The combined adversarial and random-transformation-based method outperforms the state of the art on multiple DA and DG benchmark datasets. Furthermore, the proposed method shows desirable robustness to corruption, which is also validated on commonly used datasets.

1. Introduction

For modern computer vision applications, we expect a model trained on large-scale datasets to be able to perform uniformly well across various testing scenarios. For example, consider the perception system of a self-driving car; we want it to be able to generalize well across weather conditions and city environments. However, current supervised-learning-based models remain weak when it comes to out-of-distribution generalization [1]. When testing and training data are drawn from different distributions, the model can suffer from a significant accuracy drop. This is known as the domain shift problem, which has drawn increasing attention in recent years [1,2,3,4].
Domain adaptation (DA) and domain generalization (DG) are two typical techniques used to address the domain shift problem. DA and DG aim to utilize one or multiple labeled source domains to learn a model that performs well on an unlabeled target domain. The major difference between DA and DG is that DA methods require target data during training, whereas DG methods do not require target data in the training phase. DA can be categorized as supervised, semi-supervised, and unsupervised, depending on the availability of the labels of target data. In this paper, we consider unsupervised DA, which does not require labels of target data. In recent years, many works have been proposed to address either DA or DG problems [3,5]. In this work, we address both DA and DG in a unified framework.
Data augmentation is an effective technique for reducing overfitting and has been widely used in many computer vision tasks to improve the generalization ability [6] of the model. Recent studies show that using worst-case transformations or adversarial augmentation strategies can greatly improve the generalization and robustness of the model [7,8]. However, due to the non-differentiable properties of image transformations, search algorithms such as reinforcement learning [9,10] or evolution strategies [8] must be applied, which are not computationally practical for large-scale problems. In this work, we are concerned with the effectiveness of data augmentation for DA and DG, especially adversarial data augmentation strategies, without resorting to computationally heavy search-based methods.
Motivated by the recent success of RandAugment [11] in improving the generalization of deep learning models and of consistency training in semi-supervised and unsupervised learning [12,13,14], we propose a unified DA and DG method that incorporates consistency training with random data augmentation. The idea is quite simple: during the forward pass, we force a randomly augmented and non-augmented pair of training examples to produce similar responses by applying a consistency loss. Because consistency training does not require labeled examples, we can apply it to unlabeled target-domain data for domain adaptation training. Consistency training and source-domain supervised training are combined within a joint multi-task framework and can be trained end-to-end. Random augmentation can also be regarded as a method of noise injection, and by applying consistency training with noisy and original examples, the model's generalization ability is expected to improve. Following VAT [15] and UDA [13], we use the KL divergence to compute the consistency loss.
To further improve accuracy and robustness, we consider employing adversarial augmentations to find worst-case transformations. Our interest is in performing adversarial augmentation for DA and/or DG without using search-based methods. Most image transformations are non-differentiable, except for a subset of geometric transformations. Inspired by the spatial transformer networks (STNs) of [16], we propose a differentiable adversarial spatial transformer network for both DA and DG. As we will show in the experimental section, the adversarial STN alone achieves promising results on both DA and DG tasks. When combined with random image transformations, it outperforms the state of the art, which is validated on several DA and DG benchmark datasets.
In this work, apart from the cross-domain generalization ability, robustness is also our concern. This is particularly important for real applications when applying a model to unseen domains, which, however, is largely ignored in current DA and DG literature. We evaluate the robustness of our models on CIFAR-10-C [17], which is a robustness benchmark with 15 types of corruptions algorithmically simulated to mimic real-world corruptions. The experimental results show that our proposed method not only reduces the cross-domain accuracy drop but also improves the robustness of the model.
Our contributions can be summarized as follows:
(1) We build a unified framework for domain adaptation and domain generalization based on data augmentation and consistency training.
(2) We propose an end-to-end differentiable adversarial data-augmentation strategy with spatial transformer networks to improve accuracy and robustness.
(3) We show that our proposed methods outperform state-of-the-art DA and DG methods on multiple object recognition datasets.
(4) We show that our model is robust to common corruptions and obtains promising results on the CIFAR-10-C robustness benchmark.

2. Related Work

2.1. Domain Adaptation

Modern domain adaptation methods usually address domain shifts by learning domain-invariant features. This can be achieved by minimizing a certain measure of domain variance, such as the Maximum Mean Discrepancy (MMD) [1,18] and fuzzy MMD [19], or by aligning the second-order statistics of the source and target distributions [20,21].
Another line of work uses adversarial learning to learn features that are discriminative in source space and at the same time invariant with respect to domain shift [2,22,23]. In [2], a gradient reverse layer is proposed to achieve domain-adversarial learning by back-propagation. In [22], a method that combines adversarial learning and MMD is proposed. Ref. [23] outlined a generalized framework for adversarial adaptation and proposed ADDA, which uses an inverted label GAN loss to enforce domain confusion. In [24], a multi-layer adversarial DA method was proposed, in which a feature-level domain classifier is used to learn domain-invariant representation while a prediction-level domain classifier is used to reduce domain discrepancy in the decision layer. In [3], CycleGAN [25]-based unpaired image translation is employed to achieve both feature-level and pixel-level adaptation. In [26], cluster assumption is applied to domain adaptation, and a method called Virtual Adversarial Domain Adaptation (VADA) is proposed. VADA utilizes VAT [15] to enforce classifier consistency within the vicinity of samples. Drop to Adapt [27] also enforces the cluster assumption by leveraging adversarial dropout. In [28], adversarial learning and self-training are combined, in which an adversarial-learned confusion matrix is utilized to correct the pseudo label and then align the feature distribution.
Recently, self-supervised-learning-based domain adaptation was proposed [4]. Self-supervised DA integrates a pretext learning task, such as image rotation prediction in the target domain with the main task in the source domain. Self-supervised DA has shown the capability of learning domain-invariant feature representations [4,29]. In [30], label-consistent contrastive learning is proposed for source-free domain adaptation.

2.2. Domain Generalization

Similar to domain adaptation, existing work usually learns domain-invariant features by minimizing the discrepancy between the given multiple source domains, assuming that the source-domain-invariant feature works well for the unknown target domain. Domain-Invariant Component Analysis (DICA) is proposed in [31] to learn an invariant transformation by minimizing the dissimilarity across domains. In [32], a multi-domain reconstruction auto-encoder is proposed to learn domain-invariant features.
Adversarial learning has also been applied in DG. In [33], an MMD-based adversarial autoencoder (AAE) is proposed to align the distributions among different domains and match the aligned distribution to an arbitrary prior distribution. In [34], correlation alignment is combined with adversarial learning to minimize the domain discrepancy. In [35], optimal transport with Wasserstein distance is adopted in the adversarial learning framework to align the marginal feature distribution over all the source domains.
Some works utilize low-rank constraints to achieve domain generalization capability, such as [36,37,38]. Meta-learning has recently been applied to domain generalization, including [39,40,41]. In [42], a method integrating adversarial learning and meta-learning was proposed.
In [5,43], self-supervised DG is proposed by introducing a secondary task to solve a jigsaw puzzle and/or predict image rotation. This auxiliary task helps the network to learn the concepts of spatial correlation while acting as a regularizer for the main task. With this simple model, state-of-the-art domain generalization performance can be achieved.

2.3. Data Augmentation

Data augmentation is a widely used technique for training deep neural networks. In visual learning, early data augmentation usually uses a composition of elementary image transformations, including translation, flipping, rotation, stretching, shearing, and adding noise [44]. Recently, more sophisticated data augmentation approaches have been proposed, such as CutOut [45], Mixup [46], and AugMix [47]. These methods are designed by human experts based on prior knowledge of the task, together with trial and error. To automatically find the best data augmentation for a specific task, policy-search-based automated data-augmentation approaches have been proposed, such as AutoAugment [9] and Population-Based Augmentation (PBA) [48]. The main drawback of these automated data augmentation approaches is their prohibitively high computational cost. Recently, Ref. [7] improved the computational efficiency of AutoAugment by simultaneously optimizing the target-related objective and the augmentation policy search loss.
Another kind of data augmentation method aims at finding the worst-case transformations and utilizing them to improve the robustness of the learned model. In [49], adversarial data augmentation is employed to generate adversarial examples, which are appended during training to improve the generalization ability. In [8], the authors further proposed searching for worst-case image transformations by random search or evolution-based search. Reinforcement learning is used in [7] to search for adversarial examples, in which RandAugment and worst-case transformation are combined.
Recently, consistency training with data augmentation has been used for improving semi-supervised training [13] and the generalization ability of supervised training [50].
Most related works focus on either domain adaptation or domain generalization, whereas in this work we design a general model to address both. Domain-adversarial training is a widely used technique for DA and DG; our work does not follow this mainstream methodology but instead seeks a solution from the perspective of representation learning, e.g., self-supervised learning [4] and consistency learning [29]. For representation learning, data augmentation also plays an important role, as it can reduce model overfitting and improve generalization. However, whether data augmentation can address cross-domain adaptation and generalization problems is still not well explored. In this work, we design a framework that incorporates data augmentation and consistency learning to address both domain adaptation and generalization problems.

3. The Proposed Approach

In this section, we present the proposed method for domain adaptation and generalization in detail.

3.1. Problem Statement

In the domain adaptation and generalization problem, we are given a source domain $\mathcal{D}_s$ and a target domain $\mathcal{D}_t$ containing samples from two different distributions, $P_S$ and $P_T$. Denoting by $\{x_s, \hat{y}_s\} \in \mathcal{D}_s$ a labeled source-domain sample and by $\{x_t\} \in \mathcal{D}_t$ a target-domain sample without a label, we have $x_s \sim P_S$, $x_t \sim P_T$, and $P_S \neq P_T$. When applying a model trained on the source domain to the target domain, the distribution mismatch can lead to a significant performance drop.
The task of unsupervised domain adaptation is to train a classification model $F: x_s \to y_s$ that is able to classify $x_t$ to the corresponding label $y_t$ given $\{x_s, \hat{y}_s\}$ and $\{x_t\}$ as training data. On the other hand, the task of domain generalization is to train a classification model $F: x_s \to y_s$ that is able to classify $x_t$ to the corresponding label $y_t$ given only $\{x_s, \hat{y}_s\}$. The difference between these two tasks is whether $\{x_t\}$ is involved during training. For both domain adaptation and generalization, we assume there are $n_s$ source domains with $n_s \geq 1$ and a single target domain.
Many works have addressed either domain adaptation or domain generalization. In this work, we propose a unified framework to address both problems. In what follows, we first focus on domain adaptation and introduce the main idea and explain the details of the proposed method. Then, we show how this method can be adapted to domain generalization tasks as well.

3.2. Random Image Transformation with Consistency Training

Inspired by a recent line of work [13,29] in semi-supervised learning that incorporates consistency training with unlabeled examples to enforce the smoothness of the model, we propose using image transformations as a method of noise injection and applying consistency training with the noisy and original examples. An overview of the proposed random image transformation with consistency training for domain adaptation is depicted in Figure 1. In this section, we focus on the random image transformation part and leave the adversarial spatial transformer networks to the next section. The main idea can be explained as follows:
(1) Given an input image $x$ from either the source or target domain, we compute the output distribution $p(y \mid x)$ with $x$ and a noisy version $p(y \mid \tilde{x})$ by applying a random image transformation to $x$;
(2) For domain adaptation, we jointly minimize the classification loss with labeled source-domain samples and a divergence metric between the two distributions, $D(p(y \mid x) \,\|\, p(y \mid \tilde{x}))$, with unlabeled source- and target-domain samples, where $D$ is a discrepancy measure between two distributions;
(3) For domain generalization, the procedure is similar to (2) but without using any target-domain samples.
Our intuition is that, on the one hand, minimizing the consistency loss encourages the model to be insensitive to the injected noise and improves its generalization ability; on the other hand, consistency training gradually transmits label information from labeled source-domain examples to unlabeled target-domain ones, which improves the domain adaptation ability.
The applied random image transformations are similar to RandAugment [11]. Table 1 shows the types of image transformations used in this work. The image transformations are categorized into three groups. The first group contains the geometric transformations, including Shear, Translation, Rotation, and Flip. The second group contains the color-enhancing transformations, e.g., Solarize, Contrast, etc., and the last group includes other transformations, e.g., CutOut and SamplePairing. Each type of image transformation has a corresponding magnitude, which indicates the strength of the transformation. The magnitude can be either a continuous or a discrete variable. Following [11], we normalize the magnitude to a range from 0 to 10 in order to employ a linear scale of magnitude for each type of transformation. In other words, a value of 10 indicates the maximum scale for a given transformation, while 0 means the minimum scale. Note that these image transformations are commonly used as search policies in recent auto-augmentation literature, such as [7,9,10]. Following [11], we do not use search but instead uniformly sample from the same set of image transformations. Specifically, for each training sample, we uniformly sample $N_{aug}$ image transformations from Table 1 with the normalized magnitude value $M_{aug}$ and then apply them to the image sequentially. $N_{aug}$ and $M_{aug}$ are hyper-parameters. Following the practice of [11], we sampled $N_{aug} \in \{1, 2, 3, 5, 10\}$ and $M_{aug} \in \{3, 6, 9, 12\}$. We conducted validation experiments on the PACS and VisDA datasets and found that $N_{aug} = 2$ and $M_{aug} = 9$ obtain the best results; thus, we keep $N_{aug} = 2$ and $M_{aug} = 9$ in all our experiments.
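For illustration, the following minimal PyTorch sketch approximates this sampling scheme with torchvision's off-the-shelf RandAugment transform; torchvision's operation set is close to, but not identical to, Table 1, so this is an assumption-laden stand-in rather than the exact augmentation pipeline used in the paper.

```python
from torchvision import transforms

# Approximate the sampling scheme with torchvision's RandAugment: it uniformly
# samples num_ops operations per image and applies them at a shared magnitude,
# mirroring N_aug = 2 and M_aug = 9.
augment = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])
clean = transforms.ToTensor()

# For consistency training, each training image is loaded twice:
# once with `clean` to obtain x and once with `augment` to obtain x_tilde.
```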
Following VAT [15] and UDA [13], we also use the KL divergence to compute the consistency loss. We denote by $\theta_m$ the parameters of the classification model. The classification loss with labeled source-domain samples is written as the following cross-entropy loss:
$$\mathcal{L}_m(\theta_m) = \mathbb{E}_{\{x_s, \hat{y}_s\} \in \mathcal{D}_s} \left[ -\log p(\hat{y}_s \mid x_s) \right]. \tag{1}$$
The consistency loss for domain adaptation can be written as
$$\mathcal{L}_c(\theta_m) = \mathbb{E}_{x \in \mathcal{D}_s \cup \mathcal{D}_t} \, \mathbb{E}_{\tilde{x} \in \tilde{\mathcal{D}}_s \cup \tilde{\mathcal{D}}_t} \left[ D_{\mathrm{KL}}\!\left( \hat{p}(y \mid x) \,\|\, p(y \mid \tilde{x}) \right) \right], \tag{2}$$
where $\hat{p}(y \mid x)$ uses a fixed copy of $\theta_m$, which means that the gradient is not propagated through $\hat{p}(y \mid x)$.
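As a minimal PyTorch sketch of Equation (2) (assuming `model` is any classifier returning logits), the fixed copy of $\theta_m$ on the clean branch can be implemented with a stop-gradient:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, x_tilde):
    """KL(p_hat(y|x) || p(y|x_tilde)) of Equation (2), with the gradient
    blocked on the clean-branch prediction (the fixed copy of theta_m)."""
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)          # p_hat(y|x), no gradient
    log_p_aug = F.log_softmax(model(x_tilde), dim=1)  # log p(y|x_tilde)
    # F.kl_div(input=log-probs, target=probs) computes KL(target || input)
    return F.kl_div(log_p_aug, p_clean, reduction="batchmean")
```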
As a common underlying assumption in many semi-supervised learning methods, the classifier’s decision boundary should not pass through high-density regions of the marginal data distribution [51]. The conditional entropy minimization loss (EntMin) [52] enforces this by encouraging the classifier to output low-entropy predictions on unlabeled data. EntMin is also combined with VAT in [15] to obtain stronger results. Specifically, the conditional entropy minimization loss is written as
$$\mathcal{L}_e(\theta_m) = \mathbb{E}_{x_t \in \mathcal{D}_t} \left[ -p(y_t \mid x_t) \log p(y_t \mid x_t) \right]. \tag{3}$$
Following [4,5], we also apply the conditional entropy minimization loss to the unlabeled target domain data to minimize the classifier prediction uncertainty. The full objective of domain adaptation is thus written as follows:
$$J_{DA}(\theta_m) = \min_{\theta_m} \left( \mathcal{L}_m + \lambda_c \mathcal{L}_c + \lambda_e \mathcal{L}_e \right), \tag{4}$$
where $\lambda_c$ and $\lambda_e$ are the weight factors for the consistency loss and the conditional entropy minimization loss.
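A sketch of how the three terms of Equation (4) can be assembled in PyTorch is given below; it assumes the `consistency_loss` helper sketched above, and the weight values shown are placeholders rather than prescribed settings (the paper only fixes $\lambda_e = 0.1$ in the ablation study).

```python
import torch.nn.functional as F

def entropy_loss(logits_t):
    """Conditional entropy of target-domain predictions, Equation (3)."""
    p = F.softmax(logits_t, dim=1)
    return -(p * F.log_softmax(logits_t, dim=1)).sum(dim=1).mean()

def da_objective(model, x_s, y_s, x_s_aug, x_t, x_t_aug,
                 lambda_c=1.0, lambda_e=0.1):
    """Joint DA loss of Equation (4): supervised cross-entropy on the source
    domain, consistency on source and target pairs, and entropy minimization
    on the unlabeled target domain."""
    l_m = F.cross_entropy(model(x_s), y_s)
    l_c = (consistency_loss(model, x_s, x_s_aug)
           + consistency_loss(model, x_t, x_t_aug))
    l_e = entropy_loss(model(x_t))
    return l_m + lambda_c * l_c + lambda_e * l_e
```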
For domain generalization, as no target domain data are involved during the training, Equation (2) can be written as:
$$\mathcal{L}_c(\theta_m) = \mathbb{E}_{x \in \mathcal{D}_s} \, \mathbb{E}_{\tilde{x} \in \tilde{\mathcal{D}}_s} \left[ D_{\mathrm{KL}}\!\left( \hat{p}(y \mid x) \,\|\, p(y \mid \tilde{x}) \right) \right], \tag{5}$$
and the final objective function is the weighted sum of Equation (5) and the classification loss in Equation (1):
$$J_{DG}(\theta_m) = \min_{\theta_m} \left( \mathcal{L}_m + \lambda_c \mathcal{L}_c \right). \tag{6}$$

3.3. Adversarial Spatial Transformer Networks

The proposed random image transformation with consistency training is a simple and effective method to reduce domain shift. Recent works show that using worst-case transformations or adversarial augmentation strategies can significantly improve the accuracy and robustness of the model [7,8]. However, most of the image transformations in Section 3.2 are non-differentiable, making it difficult to apply gradient-descent-based methods to obtain optimal transformations. To address this problem, search algorithms such as reinforcement learning [7] or evolution strategies [8] have been employed in recent works, which, however, are computationally expensive and do not guarantee finding the global optimum. We observe that a subset of the image transformations in Table 1, namely the geometric transformations, are in fact differentiable. We therefore build our adversarial geometric transformation on top of the spatial transformer network (STN) [16], focusing on affine transformations. The STN consists of a localization network, a grid generator, and a differentiable image sampler. The localization network is a convolutional neural network with parameters $\theta_t$, which takes an image $x$ as input and regresses the affine transformation parameters $\phi$. The grid generator takes $\phi$ as input and generates the transformed pixel coordinates as follows:
$$\begin{pmatrix} u \\ v \end{pmatrix} = \phi \begin{pmatrix} \tilde{u} \\ \tilde{v} \\ 1 \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \end{pmatrix} \begin{pmatrix} \tilde{u} \\ \tilde{v} \\ 1 \end{pmatrix}, \tag{7}$$
where $(\tilde{u}, \tilde{v})$ are the normalized transformed coordinates in the output image and $(u, v)$ are the normalized source coordinates in the input image, i.e., $-1 \leq \tilde{u}, \tilde{v}, u, v \leq 1$. Finally, the differentiable image sampler takes the set of sampling points from the grid generator, along with the input image $x$, and produces the sampled output image $\tilde{x}$. Bilinear interpolation is used during the sampling process. We can thus denote the STN by $T: x \to \tilde{x}$, a differentiable neural network with parameters $\theta_t$ that applies an affine transformation to the input image $x$.
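A minimal PyTorch sketch of such a spatial transformer is shown below; the localization-network layers are an illustrative assumption (the paper does not specify them), while the grid generation and bilinear sampling use the standard `affine_grid`/`grid_sample` operations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Minimal STN in the spirit of [16]: a small localization network
    regresses the six affine parameters phi, and a differentiable bilinear
    sampler warps the input image accordingly."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # Start from the identity transform.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        phi = self.loc(x).view(-1, 2, 3)                     # affine parameters
        grid = F.affine_grid(phi, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)   # bilinear sampling
```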
The goal of the adversarial geometric transformation is to find the worst-case transformations, which is equivalent to maximizing the following objective function:
$$\operatorname*{arg\,max}_{\theta_t} \; \mathbb{E}_{x \in \mathcal{D}_s \cup \mathcal{D}_t} \left[ D_{\mathrm{KL}}\!\left( \hat{p}(y \mid x) \,\|\, p(y \mid T(x)) \right) \right]. \tag{8}$$
The straightforward way to solve the maximization problem in Equation (8) is to apply the gradient reversal trick, i.e., the gradient reversal layer (GRL) of [2], which is popular in domain-adversarial training methods. The GRL has no parameters associated with it. During forward propagation, it acts as an identity transformation. During back-propagation, however, the GRL takes the gradient from the subsequent level and changes its sign, i.e., multiplies it by $-1$, before passing it to the preceding layer. Formally, the forward and backward passes of the GRL can be written as $R(x) = x$ and $\frac{dR}{dx} = -I$. The loss function of the adversarial spatial transformer for domain adaptation can thus be written as
$$\mathcal{L}_{adv}(\theta_t) = \mathbb{E}_{x \in \mathcal{D}_s \cup \mathcal{D}_t} \left[ D_{\mathrm{KL}}\!\left( \hat{p}(y \mid x) \,\|\, p(y \mid R(T(x))) \right) \right], \tag{9}$$
where $T$ is the spatial transformer network with parameters $\theta_t$ and $R$ is the gradient reversal layer (GRL) [2]. For domain generalization, the only difference is that only $x \in \mathcal{D}_s$ is involved in Equation (9). With the adversarial spatial transformer network, the final objective function for domain adaptation is written as
$$J_{DA}(\theta_m, \theta_t) = \min_{\theta_m} \max_{\theta_t} \left( \mathcal{L}_m + \lambda_c \mathcal{L}_c + \lambda_e \mathcal{L}_e + \lambda_t \mathcal{L}_{adv} \right), \tag{10}$$
and the final objective function for domain generalization is written as
$$J_{DG}(\theta_m, \theta_t) = \min_{\theta_m} \max_{\theta_t} \left( \mathcal{L}_m + \lambda_c \mathcal{L}_c + \lambda_t \mathcal{L}_{adv} \right). \tag{11}$$
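The min-max objectives in Equations (10) and (11) can be optimized with a single backward pass once the GRL is in place. The sketch below shows one way to implement the GRL and the adversarial term of Equation (9) in PyTorch, reusing the `SpatialTransformer` and `consistency_loss` helpers sketched earlier; it is an illustrative implementation under those assumptions, not the authors' released code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity in the forward pass,
    sign-flipped gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def adversarial_stn_loss(model, stn, x):
    """L_adv of Equation (9): the GRL makes the STN ascend the KL term
    that the classifier simultaneously descends."""
    x_adv = GradReverse.apply(stn(x))
    return consistency_loss(model, x, x_adv)

# A joint update then minimizes, e.g. for DA,
#   l_total = da_objective(...) + lambda_t * (adversarial_stn_loss(model, stn, x_s)
#                                             + adversarial_stn_loss(model, stn, x_t))
# over the parameters of both `model` and `stn`; the GRL turns this shared
# minimization into the min-max of Equation (10).
```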

4. Experiments

In this section, we conduct experiments to evaluate the proposed method and compare the results with the state-of-the-art domain adaptation and generalization methods.

4.1. Datasets

Our method was evaluated on the following popular domain adaptation and generalization datasets:
PACS [53] is a standard dataset for DG. It contains 9991 images collected from Sketchy, Caltech256, TU-Berlin, and Google Images. It has 4 domains (Photo, Art Paintings, Cartoon, and Sketches), and each domain consists of 7 object categories. Following [5], we evaluated both domain generalization and multi-source domain adaptation on this dataset.
ImageCLEF-DA [54] is a benchmark dataset in the domain adaptation community, created for the ImageCLEF 2014 domain adaptation challenge. It consists of three domains: Caltech-256 (C), ImageNet ILSVRC 2012 (I), and Pascal VOC 2012 (P). Each domain consists of 12 common classes. Six domain adaptation tasks are evaluated on ImageCLEF: I→P, P→I, I→C, C→I, C→P, and P→C. There are 600 images in each domain and 50 images in each category.
Office-Home [55] was used for evaluating both domain adaptation and generalization. It contains 4 domains, and each domain consists of images from 65 categories of everyday objects. The 4 domains are Art (A), Clipart (C), Product (P), and Real-World (R). The Clipart domain is formed with clipart images. The Art domain consists of artistic images in the form of paintings, sketches, ornamentation, etc. The Real-World domain’s images are captured by a regular camera and the Product domain’s images have no background. The total number of images is about 15,500.
VLCS [56] was used for evaluating domain generalization. It contains images of 5 object categories shared by 4 separate domains: PASCAL VOC 2007, LabelMe, Caltech and Sun datasets. Unlike Office-Home and PACS, which are related in terms of domain types, VLCS offers different challenges because it combines object categories from Caltech with scene images of the other domains.
VisDA (http://ai.bu.edu/visda-2017/, accessed on 10 May 2020) is a simulation-to-real domain-adaptation dataset with over 280 K images across 12 classes. The synthetic domain contains renderings of 3D models from different angles and under different lighting conditions, and the real domain contains natural images.
To investigate the robustness of the proposed model, we also evaluated it on popular robustness benchmarks, including the following.
CIFAR-10.1 [57] is a new test set for CIFAR-10 with 2000 images and exactly the same classes and image dimensionality. Its creation followed the data-collection process of the original CIFAR-10 paper as closely as possible. The purpose of this dataset is to investigate the distribution shift between the two test sets and its effect on object recognition.
CIFAR-10-C [17] is a robustness benchmark in which 15 types of corruption are algorithmically simulated to mimic real-world corruption as closely as possible on copies of the CIFAR-10 [58] test set. The 15 corruption types come from four broad categories: noise, blur, weather, and digital. Each corruption type comes in five levels of severity, with level 5 being the most severe. In this work, we evaluated the models at severity level 5.
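For reference, a small loading sketch is given below. To the best of our knowledge, the official CIFAR-10-C release stores each corruption as a single `.npy` array of 50,000 images ordered by severity (10,000 per level) plus a shared `labels.npy`; the file paths and this layout should be treated as assumptions to verify against the downloaded data.

```python
import numpy as np

def load_cifar10c(corruption, severity=5, root="CIFAR-10-C"):
    """Return the images and labels of one corruption at a given severity."""
    images = np.load(f"{root}/{corruption}.npy")   # assumed shape (50000, 32, 32, 3)
    labels = np.load(f"{root}/labels.npy")         # assumed shape (50000,)
    lo, hi = (severity - 1) * 10000, severity * 10000
    return images[lo:hi], labels[lo:hi]

# Example: the severity-5 split of the "fog" corruption used in our evaluation.
x_fog, y_fog = load_cifar10c("fog", severity=5)
```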

4.2. Experimental Setting

We implemented the proposed method using the PyTorch framework on a single RTX 2080 Ti GPU with 11 GB memory. The Alexnet [59], Resnet-18, and Resnet-50 [60] architectures were used as base networks and initialized with ImageNet [61] pretrained weights.
For training the model, we used an SGD solver with an initial learning rate of 0.001. We trained the model for 60 epochs and decayed the learning rate to 0.0001 after 80% of the training epochs. For training baseline models, we used a simple data-augmentation protocol consisting of random cropping, horizontal flipping, and color jittering.
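The optimization schedule above can be reproduced with a few lines of PyTorch; the momentum and weight-decay values in this sketch are assumptions, as they are not stated in the text, and the pretrained-weights API requires a recent torchvision.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone
epochs = 60
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
# Decay the learning rate from 0.001 to 0.0001 after 80% of the epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.8 * epochs)], gamma=0.1)

for epoch in range(epochs):
    # ... one training epoch over the source (and, for DA, target) loaders ...
    scheduler.step()
```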
We followed the standard protocol for unsupervised domain adaptation [2,62], where all labeled source domain examples and all unlabeled target domain examples were used for adaptation tasks. We also followed the standard protocol for domain-generalization transfer tasks as per [5], where the target domain examples were unavailable in the training phase. We set three different random seeds and ran each experiment three times. The final result is the average over the three repetitions.
We compared our proposed method with state-of-the-art DA and DG methods. The descriptions of the compared methods are shown in Table 2. In the following, we use Deep All to denote the baseline model trained with all available source-domain examples when all the introduced domain adaptive conditions are disabled. For the compared methods in Table 2, we used the results reported from the original papers if the protocol is the same.

4.3. Experimental Results

4.3.1. Unsupervised Domain Adaptation

The multi-source domain adaptation results on PACS are reported in Table 3. We followed the settings in [4,5] and trained our model taking three domains as the source datasets and the remaining one as the target. RotC is an improved version of Rot, which applies the consistency loss with the simplest image rotation transformations [29]. We used the open-source code of [65] to produce the results of CDAN and CDAN+E and the open-source code of [66] to produce the results of MDD. Our proposed approach outperforms all baseline methods on all transfer tasks. The last column shows the average accuracy over the four tasks. Our proposed approach outperforms the state-of-the-art CDAN+E [65] by 4.7 percentage points and MDD by 1.4 percentage points.
To investigate the improvement from data augmentation, we added the same type of data augmentation as ours to Deep All, DANN, CDAN, CDAN+E, and MDD, denoted by Deep All (Aug), DANN (Aug), CDAN (Aug), CDAN+E (Aug), and MDD (Aug), respectively. From these results, we can see that data augmentation yields an improvement of 1.2 to 2.6 percentage points for existing domain-adaptation methods. Even with the same type of data augmentation, our proposed method still outperforms these baselines.
The results on the Office-Home dataset are reported in Table 4. On Office-Home, we conducted 12 transfer tasks over four domains in the single-source domain adaptation setting. We achieved state-of-the-art performance on 8 out of 12 transfer tasks. Note that although Office-Home and PACS are related in terms of domain types, the total numbers of categories in Office-Home and PACS are 65 and 7, respectively. From the results, we can see that the proposed method scales well when the number of categories changes from 7 to 65. The average accuracy achieved by our proposed method is 67.6%, which outperforms all the compared methods.
The results on the ImageCLEF-DA dataset are reported in Table 5. As the three domains in ImageCLEF-DA are of equal size, balanced in each category, and visually more similar, there is little room for improvement on this dataset. Even so, our method still outperforms the comparison methods on four out of six transfer tasks. Our method achieves 88.2% average accuracy, outperforming the latest methods, including CDAN+E [65], RotC [29], and MLADA [24].
The proposed method also obtains strong results on VisDA as reported in Table 6. It outperforms CDAN+E by 2.6 percentage points.
It is important to understand the improvement contributed by the consistency loss. Because RotC combines a simple image rotation transformation with the consistency loss, without using complex data augmentation, it allows us to gauge how much of the improvement comes from the consistency loss. Compared to Rot, which does not use the consistency loss, RotC obtains an improvement of about 0.7 to 3.5 percentage points in the above DA experiments thanks to the consistency loss.

4.3.2. Domain Generalization

In the context of multi-source domain generalization, we conducted four transfer tasks on the PACS dataset. We compared the performance of our proposed method against several recent domain-generalization methods. We evaluated the method with both Alexnet and Resnet-18 and report the results in Table 7 and Table 8. From the results, we can observe that our proposed method achieves state-of-the-art domain generalization performance with both backbone architectures. With Alexnet, our method outperforms the comparison methods on all four transfer tasks. The average accuracy of our method exceeds that of the prior best method, WADG, by around 1.7 percentage points, setting a new state of the art. With Resnet-18, the average accuracy of our method is 82.73%, also outperforming the latest existing methods.
As consistency loss is not mandatory for DG, we replaced consistency loss with cross-entropy loss and ran our methods. The results are denoted by Ours w/o consis. We can see that the model trained with cross-entropy loss obtains similar accuracy to ours with consistency loss. However, the consistency loss is required for DA because of the unlabeled target domain samples. To keep a unified framework for both DA and DG problems, we used consistency loss for DG in this work.
To investigate the improvement from pure data augmentation without consistency, we ran JiGen and Deep All with the same type of data augmentation as ours, denoted by JiGen (Aug) and Deep All (Aug). Using pure augmentation, Deep All obtains an improvement of around 0.8 to 1.2 percentage points, and JiGen obtains an improvement of around 1.2 to 1.6 percentage points. Even so, our proposed method still outperforms these baselines.
We also conducted experiments on Office-Home and VLCS datasets for multi-source domain generalization. Compared to PACS, these two datasets are more difficult, and most recent works have only obtained small accuracy gains with respect to the Deep All baselines. The results on Office-Home and VLCS dataset are reported in Table 9 and Table 10, respectively. Our proposed method outperforms the compared methods on the four transfer tasks on the Office-Home dataset, and the results tested on the VLCS dataset show that our method achieves the best or close to the best performance on the four tasks, outperforming the recently proposed ones on average. It is noted that our baseline Deep All has relatively higher accuracy than other baselines. This is because we also add data augmentations such as random crop, horizontal flipping, and color jittering when training Deep All models. In this case, it is fairer to compare with the proposed method, which incorporates various data augmentation operations.

4.3.3. Robustness

Apart from domain adaptation and generalization, we are also interested in the robustness of the learned model. In this part, we evaluate the proposed method on the robustness benchmarks CIFAR-10.1 and CIFAR-10-C. We trained on the standard CIFAR-10 training set and tested on the various corruption datasets, i.e., in a single-source domain-generalization setting. Figure 2 shows the testing error on the different datasets. We evaluated different image transformation strategies and also compared them with the recently proposed methods in [74], i.e., JT and TTT. Following [74], we used the same architecture and hyper-parameters across all experiments.
The method denoted by baseline refers to the plain Resnet model, which is equivalent to Deep All in the DG setting. JT and TTT are the joint training and test-time training methods of [74], respectively. We denote by rnd-all the random image transformations including both geometric and color-based transformations, by adv-stn the proposed adversarial STN without random image transforms, and by adv-stn-color the adversarial STN combined with random color-based transformations.
On the left is the standard CIFAR-10 test set, where all the compared methods obtain similar accuracies. On CIFAR-10.1, the testing errors of all these methods increase simultaneously, but there is no significant gap between them. On the CIFAR-10-C corruption datasets, the performance of these methods varies considerably. Our proposed methods show improved accuracies compared to the baseline. adv-stn-color performs better than its variants and also outperforms JT and TTT. It can also be seen that adv-stn even outperforms rnd-all in most cases, although it only applies geometric transformations, which indicates the effectiveness of the proposed adversarial spatial transformations for improving robustness.

4.4. Ablation Studies and Analysis

Below, we focus on the PACS DG and DA setting for ablation analyses of the proposed method.

4.4.1. Ablation Study on Image Transformation Strategies

In this part, we conducted ablation studies on adversarial and random image transformations. Table 11 shows the ablation studies of domain adaptation on PACS with different image transformation strategies, and Table 12 shows the domain-generalization results. rnd-color and rnd-geo are subsets of rnd-all, where rnd-color refers to color-based transformations and rnd-geo refers to geometric transformations. Please see Table 1 for details of each subset of transformations. For the DA task, when comparing individual transformation strategies, rnd-color obtained the best accuracy, and adv-stn outperformed rnd-geo. The combination of rnd-color + adv-stn also outperformed rnd-color + rnd-geo. However, in this experiment, adv-stn did not further improve rnd-color, which might be due to the limited room for improvement in the baseline method.
In the DG experiment, we reach a similar conclusion: adv-stn outperforms rnd-geo. As the baseline accuracy for DG is far from saturated compared to DA, adv-stn further improves rnd-color, and the combination of rnd-color + adv-stn obtains the best accuracy.

4.4.2. Ablation Study on the Hyperparameters Setting

In this part, we conducted ablation studies on the hyperparameter settings. The final objective functions of our proposed method for domain adaptation (10) and domain generalization (11) are weighted sums of several terms, with the weighting factors as hyperparameters. Since the conditional entropy minimization loss is widely used in domain adaptation, we fixed its weight at $\lambda_e = 0.1$ and conducted a grid test over different settings of $\lambda_c$ and $\lambda_t$, which are shared by domain generalization. We tested ten values logarithmically spaced between $10^{-2}$ and $10$ for both $\lambda_c$ and $\lambda_t$. For each setting, we ran the experiment with three different random seeds and calculated the mean accuracy. The results of multi-source domain adaptation and domain generalization, which take photo, cartoon, and sketch as the source domains and art_painting as the target domain, are reported in Figure 3. Resnet-18 was used as the base network.
From the figures, we can see that when both $\lambda_c$ and $\lambda_t$ are not too large, the accuracies are relatively stable, validating the insensitivity of our proposed method to the hyperparameters. However, when $\lambda_c$ and $\lambda_t$ grow too large, the performance decreases, especially with a small $\lambda_c$ and a large $\lambda_t$. The reason may be that overwhelmingly large weights on the consistency loss and adversarial spatial transformer loss relative to the main classification loss make the learned features less discriminative for the main classification task, resulting in lower accuracy. Moreover, an overly large $\lambda_t$ may place excessive emphasis on extreme geometric distortions, which can be harmful to general cross-domain performance.
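The tested $(\lambda_c, \lambda_t)$ grid itself is easy to reproduce; a short sketch:

```python
import numpy as np

# Ten values logarithmically spaced between 1e-2 and 10, used for both
# lambda_c and lambda_t (lambda_e is fixed at 0.1).
lambdas = np.logspace(-2, 1, num=10)
grid = [(lc, lt) for lc in lambdas for lt in lambdas]
print(len(grid))  # 100 settings, each run with three random seeds
```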

4.4.3. Visualization of Learned Deep Features

To better understand the learned domain-invariant feature representations, we use t-SNE [75] to visualize the embeddings. We conducted experiments on the transfer task photo, cartoon, sketch → art_painting under both DA and DG settings and visualized the feature embeddings. Figure 4 shows the visualization in the PACS DA setting, and Figure 5 shows the visualization in the PACS DG setting. In both figures, we visualize category alignment as well as domain alignment. We also compare to the baseline Deep All, which does not apply any adaptation.
From the visualization of the embeddings, we can see that the clusters created by our model not only separate the categories but also mix the domains. The visualization from the DG model suggests that our proposed method is able to learn feature representations generalizable to unseen domains. It also implies that the proposed method can effectively learn domain-invariant representation with unlabeled target domain examples.
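A minimal sketch of how such an embedding plot can be produced with scikit-learn is shown below; `features` stands for penultimate-layer activations extracted from the trained model and `labels` for either category or domain indices, both of which are assumptions about the plotting pipeline rather than details given in the paper.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_file="tsne.png"):
    """Project high-dimensional features to 2D with t-SNE and color the
    points by the given labels (categories or domains)."""
    emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)
    plt.figure(figsize=(6, 6))
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.axis("off")
    plt.savefig(out_file, dpi=200)
```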

4.4.4. Visualization of Adversarial Examples

To visually examine the adversarial spatial transformations learned by adv-stn, we plot transformed examples from training in Figure 6. The first row shows the original images with simple random horizontal flipping and jittering augmentation. The second and third rows show images transformed by rnd-all and adv-stn-color, respectively. From the figure, we can see that the proposed adversarial STN indeed finds more difficult image transformations than random augmentation. Training with these adversarial examples greatly improves the generalization ability and robustness of the model.

5. Conclusions

In this work, we proposed a unified framework for addressing both domain adaptation and generalization problems. Our domain adaptation and generalization methods are built upon random image transformation and consistency training. This simple strategy can be used to obtain promising DA and DG performance on multiple benchmarks. To further improve its performance, we proposed a novel adversarial spatial transformer network that is differentiable and able to find the worst-case image transformation to improve the generalizability and robustness of the model. Experimental results on multiple object recognition DA and DG benchmarks verified the effectiveness of the proposed methods. Additional experiments tested on CIFAR-10.1 and CIFAR-10-C also validated the robustness of the proposed method.

Author Contributions

Conceptualization, L.X. and J.X.; methodology, L.X. and J.X.; software, J.X. and L.X.; validation, D.Z. and E.S.; formal analysis, Q.Z.; investigation, E.S.; data curation, D.Z. and Q.Z.; writing—original draft preparation, L.X. and J.X.; writing—review and editing, B.D.; project administration, B.D.; funding acquisition, L.X. and B.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant numbers 61803380 and 61790565.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the study are publicly available. See Section 4.1 for details.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
  2. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. In Domain Adaptation in Computer Vision Applications; Springer: Cham, Switzerland, 2017. [Google Scholar]
  3. Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.Y.; Isola, P.; Saenko, K.; Efros, A.A.; Darrell, T. CyCADA: Cycle Consistent Adversarial Domain Adaptation. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
  4. Xu, J.; Xiao, L.; Lopez, A.M. Self-Supervised Domain Adaptation for Computer Vision Tasks. IEEE Access 2019, 7, 156694–156706. [Google Scholar] [CrossRef]
  5. Carlucci, F.M.; D’Innocente, A.; Bucci, S.; Caputo, B.; Tommasi, T. Domain Generalization by Solving Jigsaw Puzzles. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  6. Ranaldi, L.; Pucci, G. Knowing Knowledge: Epistemological Study of Knowledge in Transformers. Appl. Sci. 2023, 13, 677. [Google Scholar] [CrossRef]
  7. Zhang, X.; Wang, Q.; Zhang, J.; Zhong, Z. Adversarial AutoAugment. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  8. Volpi, R.; Murino, V. Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  9. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. Autoaugment: Learning augmentation policies from data. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  10. Lim, S.; Kim, I.; Kim, T.; Kim, C.; Kim, S. Fast AutoAugment. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  11. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical data augmentation with no separate search. In Proceedings of the Advances in Neural Information Processing Systems, virtual, 6–12 December 2020. [Google Scholar]
  12. Sajjadi, M.; Javanmardi, M.; Tasdizen, T. Regularization with Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  13. Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised Data Augmentation for Consistency Training. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020. [Google Scholar]
  14. Suzuki, T.; Sato, I. Adversarial Transformations for Semi-Supervised Learning. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020. [Google Scholar]
  15. Miyato, T.; Maeda, S.I.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1979–1993. [Google Scholar] [CrossRef] [PubMed]
  16. Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
  17. Hendrycks, D.; Dietterich, T. Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  18. Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep Domain Confusion: Maximizing for Domain Invariance. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
  19. Zhao, F.; Liu, W.; Wen, C. A New Method of Image Classification Based on Domain Adaptation. Sensors 2022, 22, 1315. [Google Scholar] [CrossRef] [PubMed]
  20. Sun, B.; Feng, J.; Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  21. Sun, B.; Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016. [Google Scholar]
  22. Sun, H.; Chen, X.; Wang, L.; Liang, D.; Liu, N.; Zhou, H. C2DAN: An Improved Deep Adaptation Network with Domain Confusion and Classifier Adaptation. Sensors 2020, 20, 3606. [Google Scholar] [CrossRef] [PubMed]
  23. Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial Discriminative Domain Adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  24. Fang, Y.; Xiao, Z.; Zhang, W. Multi-layer adversarial domain adaptation with feature joint distribution constraint. Neurocomputing 2021, 463, 298–308. [Google Scholar] [CrossRef]
  25. Zhu, J.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  26. Shu, R.; Bui, H.H.; Narui, H.; Ermon, S. A DIRT-T Approach to Unsupervised Domain Adaptation. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  27. Lee, S.; Kim, D.; Kim, N.; Jeong, S.G. Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  28. Chen, M.; Zhao, S.; Liu, H.; Cai, D. Adversarial-Learned Loss for Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3521–3528. [Google Scholar]
  29. Xiao, L.; Xu, J.; Zhao, D.; Wang, Z.; Wang, L.; Nie, Y.; Dai, B. Self-Supervised Domain Adaptation with Consistency Training. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021. [Google Scholar]
  30. Zhao, X.; Stanislawski, R.; Gardoni, P.; Sulowicz, M.; Glowacz, A.; Krolczyk, G.; Li, Z. Adaptive Contrastive Learning with Label Consistency for Source Data Free Unsupervised Domain Adaptation. Sensors 2022, 22, 4238. [Google Scholar] [CrossRef]
  31. Muandet, K.; Balduzzi, D.; Schölkopf, B. Domain Generalization via Invariant Feature Representation. In Proceedings of the 30th International Conference on International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 28, pp. 10–18. [Google Scholar]
  32. Ghifary, M.; Kleijn, W.B.; Zhang, M.; Balduzzi, D. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
  33. Li, H.; Pan, S.J.; Wang, S.; Kot, A.C. Domain generalization with adversarial feature learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  34. Rahman, M.M.; Fookes, C.; Baktashmotlagh, M.; Sridharan, S. Correlation-aware Adversarial Domain Adaptation and Generalization. Pattern Recognit. 2019, 100, 107124. [Google Scholar] [CrossRef]
  35. Zhou, F.; Jiang, Z.; Shui, C.; Wang, B.; Chaib-draa, B. Domain generalization via optimal transport with metric similarity learning. Neurocomputing 2021, 456, 469–480. [Google Scholar] [CrossRef]
  36. Xu, Z.; Li, W.; Niu, L.; Xu, D. Exploiting Low-Rank Structure from Latent Domains for Domain Generalization. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
  37. Li, W.; Xu, Z.; Xu, D.; Dai, D.; Gool, L.V. Domain Generalization and Adaptation using Low Rank Exemplar SVMs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1114–1127. [Google Scholar] [CrossRef]
  38. Ding, Z.; Fu, Y. Deep Domain Generalization With Structured Low-Rank Constraint. IEEE Trans. Image Process. 2018, 27, 304–313. [Google Scholar] [CrossRef]
  39. Balaji, Y.; Sankaranarayanan, S.; Chellappa, R. Metareg: Towards domain generalization using meta-regularization. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  40. Li, D.; Yang, Y.; Song, Y.Z.; Hospedales, T.M. Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  41. Dou, Q.; Castro, D.C.; Kamnitsas, K.; Glocker, B. Domain Generalization via Model-Agnostic Learning of Semantic Features. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver BC, Canada, 8–14 December 2019. [Google Scholar]
  42. Chen, K.; Zhuang, D.; Chang, J.M. Discriminative adversarial domain generalization with meta-learning based cross-domain validation. Neurocomputing 2022, 467, 418–426. [Google Scholar] [CrossRef]
  43. Bucci, S.; D’Innocente, A.; Liao, Y.; Carlucci, F.M.; Caputo, B.; Tommasi, T. Self-Supervised Learning Across Domains. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5516–5528. [Google Scholar] [CrossRef] [PubMed]
  44. Dosovitskiy, A.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative unsupervised feature learning with convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  45. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
  46. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  47. Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  48. Ho, D.; Liang, E.; Stoica, I.; Abbeel, P.; Chen, X. Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
  49. Volpi, R.; Namkoong, H.; Sener, O.; Duchi, J.C.; Murino, V.; Savarese, S. Generalizing to unseen domains via adversarial data augmentation. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  50. Chen, W.; Tian, L.; Fan, L.; Wang, Y. Augmentation Invariant Training. In Proceedings of the International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  51. Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. MixMatch: A Holistic Approach to Semi-Supervised Learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  52. Grandvalet, Y.; Bengio, Y. Semi-supervised learning by entropy minimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004. [Google Scholar]
  53. Li, D.; Yang, Y.; Song, Y.Z.; Hospedales, T.M. Deeper, broader and artier domain generalization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  54. Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010. [Google Scholar]
  55. Venkateswara, H.; Eusebio, J.; Chakraborty, S.; Panchanathan, S. Deep Hashing Network for Unsupervised Domain Adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  56. Torralba, A.; Efros, A.A. Unbiased look at dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011. [Google Scholar]
  57. Recht, B.; Roelofs, R.; Schmidt, L.; Shankar, V. Do CIFAR-10 Classifiers Generalize to CIFAR-10? arXiv 2018, arXiv:1806.00451. [Google Scholar]
  58. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  59. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
  60. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  61. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
  62. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  63. Carlucci, F.M.; Porzi, L.; Caputo, B.; Ricci, E.; Bulo, S.R. Just dial: Domain alignment layers for unsupervised domain adaptation. In Proceedings of the International Conference on Image Analysis and Processing, Catania, Italy, 11–15 September 2017. [Google Scholar]
64. Mancini, M.; Porzi, L.; Rota Bulò, S.; Caputo, B.; Ricci, E. Boosting domain adaptation by discovering latent domains. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  65. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional Adversarial Domain Adaptation. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  66. Zhang, Y.; Liu, T.; Long, M.; Jordan, M. Bridging Theory and Algorithm for Domain Adaptation. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
  67. Sun, J.; Wang, Z.; Wang, W.; Li, H.; Sun, F. Domain adaptation with geometrical preservation and distribution alignment. Neurocomputing 2021, 454, 152–167. [Google Scholar] [CrossRef]
  68. Motiian, S.; Piccirilli, M.; Adjeroh, D.A.; Doretto, G. Unified deep supervised domain adaptation and generalization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  69. Li, Y.; Tian, X.; Gong, M.; Liu, Y.; Liu, T.; Zhang, K.; Tao, D. Deep domain generalization via conditional invariant adversarial networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  70. D’Innocente, A.; Caputo, B. Domain generalization with domain-specific aggregation modules. In Proceedings of the 40th German Conference on Pattern Recognition (GCPR), Stuttgart, Germany, 9–12 October 2018. [Google Scholar]
  71. Matsuura, T.; Harada, T. Domain Generalization Using a Mixture of Multiple Latent Domains. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11749–11756. [Google Scholar]
  72. Zhao, S.; Gong, M.; Liu, T.; Fu, H.; Tao, D. Domain Generalization via Entropy Regularization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 16096–16107. [Google Scholar]
  73. Sankaranarayanan, S.; Balaji, Y.; Castillo, C.D.; Chellappa, R. Generate to Adapt: Aligning Domains Using Generative Adversarial Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  74. Sun, Y.; Wang, X.; Liu, Z.; Miller, J.; Efros, A.A.; Hardt, M. Test-Time Training for Out-of-Distribution Generalization. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020. [Google Scholar]
  75. van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Figure 1. Overview of our proposed model. We propose using random image transformations and adversarial spatial transformer networks (STN) to achieve domain adaptation and generalization (without the dashed line bounding box).
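To make the adversarial spatial transformation in Figure 1 more concrete, the following PyTorch sketch shows one way a differentiable STN can be constrained to a small neighborhood of the identity transform and driven by gradient ascent on the task loss. The localization architecture, the eps bound, and the tanh parameterization are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdvSTN(nn.Module):
    """Illustrative adversarial spatial transformer (a sketch, not the paper's exact module).

    A small localization network predicts an affine matrix. Because
    affine_grid/grid_sample are differentiable, the predicted transform can be
    optimized by gradient ascent on the task loss to produce worst-case spatial
    perturbations while staying within an eps-ball around the identity.
    """

    def __init__(self, eps: float = 0.1):
        super().__init__()
        self.eps = eps  # assumed bound on the deviation from the identity transform
        self.loc = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),
        )
        # Start at the identity transform so early training is unperturbed.
        nn.init.zeros_(self.loc[-1].weight)
        with torch.no_grad():
            self.loc[-1].bias.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.loc(x).view(-1, 2, 3)
        identity = torch.eye(2, 3, device=x.device).expand_as(theta)
        # Constrain the transform to a bounded neighborhood of the identity.
        theta = identity + self.eps * torch.tanh(theta - identity)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

In such a setup, the STN parameters would be updated to increase the classification (or consistency) loss on the transformed images, while the feature extractor and classifier are updated to decrease it, yielding worst-case yet fully differentiable spatial perturbations.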
Figure 2. Test error (%) on CIFAR-10, CIFAR-10.1 and CIFAR-10-C (level 5). Best viewed in color.
Figure 3. Accuracies with different hyperparameter settings on the PACS photo, cartoon, sketch → art_painting task (Resnet-18). (a) Multi-source domain adaptation with fixed λ_e. (b) Domain generalization. Best viewed in color.
Figure 4. The t-SNE visualization on the PACS DA setting: (a) class visualization of Deep All, (b) domain visualization of Deep All, (c) class visualization of Ours, (d) domain visualization of Ours.
Figure 5. The t-SNE visualization on the PACS DG setting: (a) class visualization of Deep All, (b) domain visualization of Deep All, (c) class visualization of Ours, (d) domain visualization of Ours.
Figure 6. Visualization of the transformed images during the PACS DG training. First row: original image with random horizontal flipping and jittering; second row: original image with the proposed random augmentation; third row: the original image with the proposed adversarial spatial transformation combined with random color-based transformations.
Table 1. Image transform operations. Some operations have discrete magnitude parameters, while others have no or continuous magnitude parameters.

Name | Magnitude Type | Magnitude Range
Geometric transformations
  ShearX | continuous | [0, 0.3]
  ShearY | continuous | [0, 0.3]
  TranslateX | continuous | [0, 100]
  TranslateY | continuous | [0, 100]
  Rotate | continuous | [0, 30]
  Flip | none | none
Color-based transformations
  Solarize | discrete | [0, 255]
  Posterize | discrete | [0, 4]
  Invert | none | none
  Contrast | continuous | [0.1, 1.9]
  Color | continuous | [0.1, 1.9]
  Brightness | continuous | [0.1, 1.9]
  Sharpness | continuous | [0.1, 1.9]
  AutoContrast | none | none
  Equalize | none | none
Other transformations
  CutOut | discrete | [0, 40]
  SamplePairing | continuous | [0, 0.4]
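For readers who want a concrete sense of how the operations in Table 1 can be composed into a random augmentation policy, the snippet below sketches one possible implementation with Pillow for a subset of the listed operations. The choice of two operations per image and uniformly sampled magnitudes is an assumption made for illustration, not necessarily the sampling strategy used in this work.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# A subset of the operations in Table 1, with the magnitude ranges listed there.
RANGES = {
    "Rotate": (0.0, 30.0),
    "Solarize": (0.0, 255.0),
    "Posterize": (0.0, 4.0),
    "Contrast": (0.1, 1.9),
    "Brightness": (0.1, 1.9),
    "Sharpness": (0.1, 1.9),
    "AutoContrast": (0.0, 0.0),  # no magnitude
    "Equalize": (0.0, 0.0),      # no magnitude
}

OPS = {
    "Rotate": lambda img, m: img.rotate(m),
    "Solarize": lambda img, m: ImageOps.solarize(img, int(m)),
    # Pillow requires at least 1 bit, so the lower end of [0, 4] is clamped.
    "Posterize": lambda img, m: ImageOps.posterize(img, max(1, int(m))),
    "Contrast": lambda img, m: ImageEnhance.Contrast(img).enhance(m),
    "Brightness": lambda img, m: ImageEnhance.Brightness(img).enhance(m),
    "Sharpness": lambda img, m: ImageEnhance.Sharpness(img).enhance(m),
    "AutoContrast": lambda img, m: ImageOps.autocontrast(img),
    "Equalize": lambda img, m: ImageOps.equalize(img),
}


def random_augment(img: Image.Image, n_ops: int = 2) -> Image.Image:
    """Apply n_ops randomly chosen operations with uniformly sampled magnitudes."""
    for name in random.sample(list(OPS), n_ops):
        lo, hi = RANGES[name]
        img = OPS[name](img, random.uniform(lo, hi))
    return img
```

Two independent draws of random_augment on the same image produce the differently transformed views between which prediction consistency can be enforced.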
Table 2. The compared state-of-the-art methods on domain adaptation (DA) and domain generalization (DG) tasks. The column Year shows the year the method was published.

Method | Task | Year | Description
DANN [2] | DA | 2015 | Domain adversarial training.
DAN [1] | DA | 2015 | Deep adaptation network.
ADDA [23] | DA | 2017 | Adversarial discriminative domain adaptation.
JAN [62] | DA | 2017 | Joint adaptation network.
Dial [63] | DA | 2017 | Domain alignment layers.
DDiscovery [64] | DA | 2018 | Domain discovery.
CDAN [65] | DA | 2018 | Conditional domain adversarial training.
MDD [66] | DA | 2019 | Adversarial training with margin disparity discrepancy.
Rot [4] | DA | 2019 | Self-supervised learning by rotation prediction.
RotC [29] | DA | 2020 | Self-supervised learning with consistency training.
ALDA [28] | DA | 2020 | Adversarial-learned loss for domain adaptation.
MLADA [24] | DA | 2021 | Multi-layer adversarial domain adaptation.
GPDA [67] | DA | 2021 | Geometrical preservation and distribution alignment.
CCSA [68] | DA and DG | 2017 | Embedding subspace learning.
JiGen [5] | DA and DG | 2019 | Self-supervised learning by solving jigsaw puzzles.
JigRot [43] | DA and DG | 2021 | Self-supervised learning by combining jigsaw and rotation.
TF [53] | DG | 2017 | Low-rank parametrized network.
SLRC [38] | DG | 2017 | Low-rank constraint.
CIDDG [69] | DG | 2018 | Conditional invariant deep domain generalization.
MMD-AAE [33] | DG | 2018 | Adversarial auto-encoders.
D-SAM [70] | DG | 2018 | Domain-specific aggregation modules.
MLDG [40] | DG | 2018 | Meta-learning approach.
MetaReg [39] | DG | 2018 | Meta-learning approach.
MMLD [71] | DG | 2020 | Mixture of multiple latent domains.
ER [72] | DG | 2020 | Domain generalization via entropy regularization.
DADG [42] | DG | 2021 | Discriminative adversarial domain generalization.
WADG [35] | DG | 2021 | Wasserstein adversarial domain generalization.
Table 3. Multi-source domain adaptation results on PACS (Resnet-18). Each column title indicates the name of the domain used as the target. We use bold font to highlight the best results.

PACS-DA | Art_Paint. | Cartoon | Sketches | Photo | Avg.
[64] Deep All | 74.70 | 72.40 | 60.10 | 92.90 | 75.03
Dial | 87.30 | 85.50 | 66.80 | 97.00 | 84.15
DDiscovery | 87.70 | 86.90 | 69.60 | 97.00 | 85.30
[5] Deep All | 77.85 | 74.86 | 67.74 | 95.73 | 79.05
JiGen | 84.88 | 81.07 | 79.05 | 97.96 | 85.74
[4] Deep All | 74.70 | 72.40 | 60.10 | 92.90 | 75.00
Rot | 88.70 | 86.40 | 74.90 | 98.00 | 87.00
[29] Deep All | 74.70 | 72.40 | 60.10 | 92.90 | 75.00
RotC | 90.30 | 87.40 | 75.10 | 97.90 | 87.70
[43] Deep All | 77.83 | 74.26 | 65.81 | 95.71 | 78.40
JigRot | 89.67 | 82.87 | 83.93 | 98.17 | 88.66
[65] DANN | 82.91 | 83.83 | 69.50 | 96.29 | 83.13
DANN (Aug) | 89.01 | 83.06 | 78.54 | 97.25 | 86.96
[65] CDAN | 85.70 | 88.10 | 73.10 | 97.20 | 86.00
CDAN+E | 87.40 | 89.40 | 75.30 | 97.80 | 87.50
CDAN (Aug) | 90.67 | 85.96 | 80.50 | 97.43 | 88.64
CDAN+E (Aug) | 90.28 | 85.41 | 81.37 | 98.08 | 88.78
[66] MDD | 89.60 | 88.99 | 87.35 | 97.78 | 90.92
MDD (Aug) | 90.28 | 86.26 | 85.72 | 97.54 | 89.95
Deep All | 77.26 | 72.64 | 69.05 | 95.41 | 78.59
Deep All (Aug) | 80.03 | 74.49 | 67.85 | 95.27 | 79.41
Ours | 92.56 | 91.44 | 87.08 | 98.04 | 92.28
Table 4. Accuracy (%) on Office-Home for unsupervised domain adaptation (Resnet-50). The bold font highlights the best domain-adaptation results. A → C indicates that A (Art) is the source domain and C (Clipart) is the target domain.

Office-Home | A→C | A→P | A→R | C→A | C→P | C→R | P→A | P→C | P→R | R→A | R→C | R→P | Avg.
ResNet-50 [60] | 34.9 | 50.0 | 58.0 | 37.4 | 41.9 | 46.2 | 38.5 | 31.2 | 60.4 | 53.9 | 41.2 | 59.9 | 46.1
DAN [1] | 43.6 | 57.0 | 67.9 | 45.8 | 56.5 | 60.4 | 44.0 | 43.6 | 67.7 | 63.1 | 51.5 | 74.3 | 56.3
DANN [2] | 45.6 | 59.3 | 70.1 | 47.0 | 58.5 | 60.9 | 46.1 | 43.7 | 68.5 | 63.2 | 51.8 | 76.8 | 57.6
JAN [62] | 45.9 | 61.2 | 68.9 | 50.4 | 59.7 | 61.0 | 45.8 | 43.4 | 70.3 | 63.9 | 52.4 | 76.8 | 58.3
GPDA [67] | 52.9 | 73.4 | 77.1 | 52.9 | 66.1 | 65.6 | 52.9 | 44.9 | 76.1 | 65.6 | 49.7 | 79.2 | 63.0
CDAN [65] | 49.0 | 69.3 | 74.5 | 54.4 | 66.0 | 68.4 | 55.6 | 48.3 | 75.9 | 68.4 | 55.4 | 80.5 | 63.8
Rot [4] | 50.4 | 67.8 | 74.6 | 58.7 | 66.7 | 67.4 | 55.7 | 52.4 | 77.5 | 71.0 | 59.6 | 81.2 | 65.3
CDAN+E [65] | 50.7 | 70.6 | 76.0 | 57.6 | 70.0 | 70.0 | 57.4 | 50.9 | 77.3 | 70.9 | 56.7 | 81.6 | 65.8
ALDA [28] | 53.7 | 70.1 | 76.4 | 60.2 | 72.6 | 71.5 | 56.8 | 51.9 | 77.1 | 70.2 | 56.3 | 82.1 | 66.6
RotC [29] | 51.7 | 69.0 | 75.4 | 60.4 | 70.3 | 70.7 | 57.7 | 53.3 | 78.6 | 72.2 | 59.9 | 81.7 | 66.7
Ours | 55.1 | 69.0 | 74.5 | 62.5 | 66.7 | 69.8 | 62.2 | 56.0 | 77.7 | 73.5 | 61.9 | 82.2 | 67.6
Table 5. Accuracy (%) on ImageCLEF-DA for unsupervised domain adaptation (Resnet-50). The bold font highlights the best domain adaptation results. I → P indicates that ImageNet ILSVRC 2012 is the source domain and Pascal VOC 2012 is the target domain.

ImageCLEF-DA | I→P | P→I | I→C | C→I | C→P | P→C | Avg.
ResNet-50 [60] | 74.8 | 83.9 | 91.5 | 78.0 | 65.5 | 91.2 | 80.7
DAN [1] | 74.5 | 82.2 | 92.8 | 86.3 | 69.2 | 89.8 | 82.5
Rot [4] | 77.9 | 91.6 | 95.6 | 86.9 | 70.5 | 94.8 | 84.2
DANN [2] | 75.0 | 86.0 | 96.2 | 87.0 | 74.3 | 91.5 | 85.0
JAN [62] | 76.8 | 88.0 | 94.7 | 89.5 | 74.2 | 91.7 | 85.8
CDAN [65] | 76.7 | 90.6 | 97.0 | 90.5 | 74.5 | 93.5 | 87.1
MLADA [24] | 78.2 | 91.2 | 95.5 | 90.8 | 76.0 | 92.2 | 87.3
CDAN+E [65] | 77.7 | 90.7 | 97.7 | 91.3 | 74.2 | 94.3 | 87.7
RotC [29] | 78.6 | 92.5 | 96.1 | 88.9 | 73.9 | 95.9 | 87.7
Ours | 78.1 | 92.7 | 96.5 | 91.6 | 74.9 | 95.9 | 88.2
Table 6. Accuracy (%) on VisDA (Synthetic → Real) for unsupervised domain adaptation (ResNet-50).

Method | JAN [62] | GTA [73] | CDAN [65] | CDAN+E [65] | Ours
Synthetic → Real | 61.6 | 69.5 | 66.8 | 70.0 | 72.6
Table 7. Domain generalization results on PACS (Alexnet). For details about the meaning of columns and the use of bold fonts, see Table 3.

PACS-DG | Art_Paint. | Cartoon | Sketches | Photo | Avg.
[53] Deep All | 63.30 | 63.13 | 54.07 | 87.70 | 67.05
TF | 62.86 | 66.97 | 57.51 | 89.50 | 69.21
[69] Deep All | 57.55 | 67.04 | 58.52 | 77.98 | 65.27
DeepC | 62.30 | 69.58 | 64.45 | 80.72 | 69.26
CIDDG | 62.70 | 69.73 | 64.45 | 78.65 | 68.88
[40] Deep All | 64.91 | 64.28 | 53.08 | 86.67 | 67.24
MLDG | 66.23 | 66.88 | 58.96 | 88.00 | 70.01
[70] Deep All | 64.44 | 72.07 | 58.07 | 87.50 | 70.52
D-SAM | 63.87 | 70.70 | 64.66 | 85.55 | 71.20
[42] Deep All | 63.12 | 66.16 | 60.27 | 88.65 | 69.55
DADG | 66.21 | 70.28 | 62.18 | 89.76 | 72.11
[39] Deep All | 67.21 | 66.12 | 55.32 | 88.47 | 69.28
MetaReg | 69.82 | 70.35 | 59.26 | 91.07 | 72.63
[5] Deep All | 66.68 | 69.41 | 60.02 | 89.98 | 71.52
JiGen | 67.63 | 71.71 | 65.18 | 89.00 | 73.38
JiGen (Aug) | 71.53 | 69.50 | 68.06 | 91.08 | 75.04
[43] Deep All | 66.50 | 69.65 | 61.42 | 89.68 | 71.81
JigRot | 69.70 | 71.00 | 66.00 | 89.60 | 74.08
[71] Deep All | 68.09 | 70.23 | 61.80 | 88.86 | 72.25
MMLD | 69.27 | 72.83 | 66.44 | 88.98 | 74.38
[72] Deep All | 68.35 | 70.14 | 64.98 | 90.83 | 73.57
ER | 71.34 | 70.29 | 71.15 | 89.92 | 75.67
[35] Deep All | 63.30 | 63.13 | 54.07 | 87.70 | 67.05
WADG | 70.21 | 72.51 | 70.32 | 89.81 | 75.71
Deep All | 68.26 | 74.52 | 63.65 | 90.78 | 74.30
Deep All (Aug) | 73.73 | 70.09 | 65.79 | 92.22 | 75.45
Ours w/o consis. | 73.44 | 71.42 | 73.91 | 89.70 | 77.12
Ours | 74.02 | 72.23 | 72.36 | 91.16 | 77.44
Table 8. Domain generalization results on PACS (Resnet-18). For details about the meaning of columns and the use of bold fonts, see Table 3.

PACS-DG | Art_Paint. | Cartoon | Sketches | Photo | Avg.
[42] Deep All | 75.60 | 72.30 | 68.10 | 93.06 | 77.27
DADG | 79.89 | 76.25 | 70.51 | 94.86 | 80.38
[70] Deep All | 77.87 | 75.89 | 69.27 | 95.19 | 79.55
D-SAM | 77.33 | 72.43 | 77.83 | 95.30 | 80.72
[5] Deep All | 77.85 | 74.86 | 67.74 | 95.73 | 79.05
JiGen | 79.42 | 75.25 | 71.35 | 96.03 | 80.51
JiGen (Aug) | 79.44 | 71.50 | 70.86 | 95.33 | 79.28
[72] Deep All | 78.93 | 75.02 | 70.48 | 96.60 | 80.25
ER | 80.70 | 76.40 | 71.77 | 96.65 | 81.38
[43] Deep All | 77.83 | 74.26 | 65.81 | 95.71 | 78.40
JigRot | 81.07 | 73.97 | 74.67 | 95.93 | 81.41
[39] Deep All | 79.90 | 75.10 | 69.50 | 95.20 | 79.93
MetaReg | 83.70 | 77.20 | 70.30 | 95.50 | 81.68
[71] Deep All | 78.34 | 75.02 | 65.24 | 96.21 | 78.70
MMLD | 81.28 | 77.16 | 72.29 | 96.09 | 81.83
Deep All | 77.26 | 72.64 | 69.05 | 95.41 | 78.59
Deep All (Aug) | 80.03 | 74.49 | 67.85 | 95.27 | 79.41
Ours w/o consis. | 81.84 | 75.05 | 77.01 | 95.07 | 82.24
Ours | 82.32 | 75.70 | 77.03 | 95.87 | 82.73
Table 9. Domain generalization results on Office-Home (Resnet-18). For details about the meaning of columns and the use of bold fonts, see Table 3.

Office-Home-DG | Art | Clipart | Product | Real-World | Avg.
[70] Deep All | 55.59 | 42.42 | 70.34 | 70.86 | 59.81
D-SAM | 58.03 | 44.37 | 69.22 | 71.45 | 60.77
Deep All | 52.15 | 45.86 | 70.86 | 73.15 | 60.51
[5] JiGen | 53.04 | 47.51 | 71.47 | 72.79 | 61.20
[35] WADG | 55.34 | 44.82 | 72.03 | 73.55 | 61.44
[43] JigRot | 58.33 | 49.67 | 72.97 | 75.27 | 64.06
[42] Deep All | 54.31 | 41.41 | 70.31 | 73.03 | 59.77
DADG | 55.57 | 48.71 | 70.90 | 73.70 | 62.22
Deep All | 57.16 | 49.06 | 72.22 | 73.59 | 63.01
Ours | 59.20 | 54.67 | 73.21 | 73.93 | 65.25
Table 10. Domain generalization results on VLCS (Alexnet). For details about the meaning of columns and the use of bold fonts, see Table 3.

VLCS-DG | Caltech | Labelme | Pascal | Sun | Avg.
[69] Deep All | 85.73 | 61.28 | 62.71 | 59.33 | 67.26
DeepC | 87.47 | 62.60 | 63.97 | 61.51 | 68.89
CIDDG | 88.83 | 63.06 | 64.38 | 62.10 | 69.59
[68] Deep All | 86.10 | 55.60 | 59.10 | 54.60 | 63.85
CCSA | 92.30 | 62.10 | 67.10 | 59.10 | 70.15
[38] Deep All | 86.67 | 58.20 | 59.10 | 57.86 | 65.46
SLRC | 92.76 | 62.34 | 65.25 | 63.54 | 70.97
[53] Deep All | 93.40 | 62.11 | 68.41 | 64.16 | 72.02
TF | 93.63 | 63.49 | 69.99 | 61.32 | 72.11
[33] MMD-AAE | 94.40 | 62.60 | 67.70 | 64.40 | 72.28
[70] Deep All | 94.95 | 57.45 | 66.06 | 65.87 | 71.08
D-SAM | 91.75 | 56.95 | 58.59 | 60.84 | 67.03
[43] Deep All | 96.15 | 59.05 | 70.84 | 63.92 | 72.49
JigRot | 96.30 | 59.20 | 70.73 | 66.37 | 73.15
[5] Deep All | 96.93 | 59.18 | 71.96 | 62.57 | 72.66
JiGen | 96.93 | 60.90 | 70.62 | 64.30 | 73.19
[71] Deep All | 95.89 | 57.88 | 72.01 | 67.76 | 73.39
MMLD | 96.66 | 58.77 | 71.96 | 68.13 | 73.88
[72] Deep All | 97.15 | 58.07 | 73.11 | 68.79 | 74.28
ER | 96.92 | 58.26 | 73.24 | 69.10 | 74.38
[35] Deep All | 92.86 | 63.10 | 68.67 | 64.11 | 72.19
WADG | 96.68 | 64.26 | 71.47 | 66.62 | 74.76
[42] Deep All | 94.44 | 61.30 | 68.11 | 63.58 | 71.86
DADG | 96.80 | 63.44 | 70.77 | 66.81 | 74.76
Deep All | 97.72 | 63.03 | 71.93 | 66.70 | 74.85
Ours | 98.74 | 62.27 | 72.79 | 68.16 | 75.49
Table 11. Ablation studies of domain adaptation on PACS. The first three columns indicate the types of image transformations applied. Each column title in the middle indicates the name of the domain used as the target. We use bold font to highlight the best results.

PACS-DA
Rnd-Color | Rnd-Geo | Adv-Stn | Art_Paint. | Cartoon | Sketches | Photo | Avg.
  |   |   | 93.02 | 91.51 | 86.62 | 98.00 | 92.29
  |   |   | 91.83 | 89.45 | 83.42 | 97.98 | 90.67
  |   |   | 91.85 | 91.61 | 82.45 | 97.92 | 90.96
  |   |   | 93.10 | 91.01 | 86.33 | 98.14 | 92.15
  |   |   | 92.56 | 91.44 | 87.08 | 98.04 | 92.28
Table 12. Ablation studies of domain generalization on PACS. The first three columns indicate the types of image transformations applied. Each column title in the middle indicates the name of the domain used as the target. We use bold font to highlight the best results.

PACS-DG
Rnd-Color | Rnd-Geo | Adv-Stn | Art_Paint. | Cartoon | Sketches | Photo | Avg.
  |   |   | 71.40 | 72.43 | 71.44 | 90.20 | 76.37
  |   |   | 71.08 | 72.40 | 66.98 | 91.36 | 75.46
  |   |   | 70.92 | 73.46 | 69.66 | 90.78 | 76.21
  |   |   | 73.05 | 72.15 | 69.08 | 91.64 | 76.48
  |   |   | 74.02 | 72.23 | 72.36 | 91.16 | 77.44
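The rows labeled "Ours w/o consis." in Tables 7 and 8 remove the consistency term from training. As a rough illustration of how such a term can be combined with the supervised objective in the adaptation setting, the sketch below uses a KL-divergence consistency loss between predictions on an unlabeled image and a transformed view of it, plus an entropy term; the specific loss form, the weights lambda_c and lambda_e, and the reading of λ_e as the entropy weight are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def consistency_training_step(model, x_src, y_src, x_unl, transform,
                              lambda_c=1.0, lambda_e=0.1):
    """One illustrative training step combining supervised, consistency,
    and entropy losses (hypothetical weights lambda_c, lambda_e)."""
    # Supervised cross-entropy on labeled source images.
    loss = F.cross_entropy(model(x_src), y_src)

    # Class-probability predictions on the original unlabeled images.
    p_clean = F.softmax(model(x_unl), dim=1)

    # Consistency: the randomly/adversarially transformed view should match
    # the (detached) prediction on the untransformed image.
    log_p_aug = F.log_softmax(model(transform(x_unl)), dim=1)
    loss = loss + lambda_c * F.kl_div(log_p_aug, p_clean.detach(),
                                      reduction="batchmean")

    # Entropy minimization on the unlabeled predictions (cf. [52]).
    entropy = -(p_clean * torch.log(p_clean + 1e-8)).sum(dim=1).mean()
    return loss + lambda_e * entropy
```

In the generalization setting, where no target images are available during training, the same consistency term would be computed on transformed views of the source images instead.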