Unsupervised Domain Adaptation for Image Classification and Object Detection Using Guided Transfer Learning Approach and JS Divergence
Abstract
1. Introduction
- To the best of our knowledge, we propose the first-of-its-kind layer selection strategy using the Guided Transfer Learning approach to fine-tune the domain adaptation network and maximize feature transfer between domains.
- We employ JS-Divergence to reduce the feature distribution gap between source and target domains.
- We introduce the weighted cross entropy loss to tackle the class imbalance problem.
- We further propose a robust object detection UDA framework that is applied effectively to both the two-stage Faster R-CNN and the single-stage SSD (Single Shot MultiBox Detector) object detectors.
- We conduct extensive experiments on benchmark datasets to validate the performance of our UDA image classification and object detection method, compare it with state-of-the-art (SOTA) methods, and obtain promising results. Moreover, we present ablation studies to show the impact of each component of our proposed framework. Furthermore, we present a first-of-its-kind Indian Vehicle dataset for the domain adaptive object detection task to evaluate the adaptivity of our object detector in a new domain.
2. Related Work
2.1. Unsupervised Domain Adaptive Image Classification
2.1.1. Discrepancy-Based Approaches
2.1.2. Adversarial-Based Approaches
2.2. Unsupervised Domain Adaptive Object Detection
3. Methodology
3.1. Problem Formulation
3.2. Guided Transfer Learning Approach
Algorithm 1: Guided Transfer Learning (GTL) approach to find the kth layer
Input: labeled source domain data, unlabeled target domain data. Output: the kth layer.
1: Take five random samples from the source domain and five from the target domain for each label. Each label is taken from the labeled data available in the source domain; since the label space is the same in both domains, five random samples are likewise selected from the target domain. Whole images are used for the image classification problem, and images cropped as per the bounding box are used for the object detection problem.
2: Pass all samples through the ResNet-50 network up to the last convolutional layer to generate a flattened feature vector, and average the five feature vectors of the same-labeled images to obtain one mean source feature vector and one mean target feature vector per label, where j = 1 to 5 (number of samples) and i = 1 to the number of labels. The mean of the five samples' feature vectors of each label is calculated using Equations (1) and (2).
3: Apply the JS (Jensen–Shannon) distance between the mean source feature vector and the mean target feature vector of each label to obtain a similarity score per label. The transferability measure should be symmetric in domain adaptation because the label space is common to both domains, so the distance between the two feature vectors is computed using JS-Divergence, which, unlike KL-Divergence, is symmetric. The JS-Divergence between two distributions P and Q is calculated as follows:
JS(P || Q) = 1/2 × KL(P || M) + 1/2 × KL(Q || M), where M = 1/2 × (P + Q)
4: Calculate the Transferability Score by averaging the per-label similarity scores computed in step 3.
5: Find the kth layer from the Transferability Score. Layers 1 to k are frozen and the remaining layers are fine-tuned during the training process.
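For illustration, the short Python sketch below follows steps 2–5 under stated assumptions: the flattened ResNet-50 features are softmax-normalized so that the Jensen–Shannon distance is well defined, the per-label similarity is taken as one minus the JS distance, and the rule mapping the transferability score to the number of frozen layers is a hypothetical proportional rule, not a formula taken from the paper.

```python
# Sketch of Algorithm 1 (GTL layer selection). Assumptions: features are
# softmax-normalized before the JS distance, and the score-to-k mapping
# below is a hypothetical proportional rule, not the authors' exact one.
import numpy as np
from scipy.spatial.distance import jensenshannon

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def transferability_score(source_feats, target_feats):
    """source_feats/target_feats: dict label -> (5, d) array of flattened
    ResNet-50 features for the five sampled images of that label."""
    scores = []
    for label, fs in source_feats.items():
        mu_s = softmax(fs.mean(axis=0))                    # Eq. (1): mean source vector
        mu_t = softmax(target_feats[label].mean(axis=0))   # Eq. (2): mean target vector
        js_dist = jensenshannon(mu_s, mu_t, base=2)        # JS distance in [0, 1]
        scores.append(1.0 - js_dist)                       # per-label similarity (step 3)
    return float(np.mean(scores))                          # Transferability Score (step 4)

def select_kth_layer(ts, total_layers=50):
    # Hypothetical rule consistent with the intuition above: the more similar
    # the domains (higher score), the more backbone layers are frozen.
    return max(1, round(ts * total_layers))
```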
3.3. Proposed Approach
3.3.1. Unsupervised Domain Adaptation for Image Classification (DAGTL-IC)
3.3.2. Loss Functions for Domain Adaptive Image Classification
- Classification loss
Classification loss is calculated using the first stream of the architecture. It uses labeled data from the source domain only, as the target domain does not have labeled data. The features extracted from the last flattened layer are fed into the output layer with a softmax classifier to optimize the classification loss, i.e., the cross-entropy loss over the source label space. The classifier is expected to learn well the conditional probability of the input data over the source label space. However, this assumption holds true only when the labeled data are divided equally among the classes. In domain adaptation datasets, it is observed that the data are not divided equally, which results in a biased classifier. To mitigate this situation, we introduce a weight for each class to improve the performance of the classifier. Let F denote the set of class frequencies; the weight of the ith category is then defined as follows:
- F = {f1, f2, ..., fC} is the set of frequencies (numbers of samples) of the categories, where fi represents the frequency of the ith category.
- C is the total number of categories in the dataset.
- fmax is the maximum frequency in F, i.e., the highest number of samples among all categories.
- wi is the weight for the ith category, which represents how important that category is in the dataset.
- The weight is calculated as the ratio of fmax to fi, i.e., wi = fmax/fi.
The intuition behind this definition is to assign lower weights to categories that appear more frequently and higher weights to categories that occur less frequently. This balances the importance of each category during training and reduces the class bias caused by an imbalanced dataset.
- Domain discrepancy loss
The domain discrepancy loss is computed between the bottleneck layers of the two domain streams. To minimize the distance between the domains, the JS-Distance is employed to learn domain-invariant features; by minimizing this domain alignment loss, we intend to transfer as much knowledge as possible from the source domain to the target domain. The loss is computed between the feature vectors of the bottleneck layers of the source and target domains. JS-Distance is the square root of the JS-Divergence, and its value ranges between 0 (identical distributions) and 1 (maximally different distributions) when a base-2 logarithm is used. JS-Divergence measures the similarity between two probability distributions. The reasons for using JS-Divergence are twofold: (i) it is a symmetric version of KL-Divergence and is suitable as a distance between distributions because it has a finite range between 0 and 1; (ii) it is a kind of average between the two distributions, so both distributions participate equally in finding the domain-invariant features.
- Overall objective loss function for domain adaptive image classification
To achieve efficient domain adaptation in image classification, the aim is to minimize the distance between the domains and to train a classifier that can be transferred across them. To meet both criteria, an integrated approach combines the classification loss and the domain discrepancy loss into an overall objective loss function with a trade-off parameter; the goal is to minimize this overall loss. After the overall loss is minimized, the trained model is applied directly to the target domain. The overall objective loss function of DAGTL-IC is
L = Lc + λ Ld,
where Lc denotes the classification loss in the source domain, Ld represents the domain discrepancy loss between the domains, and λ > 0 is the trade-off parameter (a short sketch below illustrates these losses).
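To make the pieces above concrete, the following PyTorch sketch shows one way the class weights, the weighted classification loss, the JS-Distance domain discrepancy loss, and the overall objective could be implemented. It is a minimal illustration under our own assumptions: the bottleneck activations are softmax-normalized to obtain probability distributions before the JS computation, and the tensor shapes, function names, and default λ are placeholders rather than the paper's exact implementation.

```python
# Minimal sketch of the DAGTL-IC losses (not the authors' code).
# Assumptions: bottleneck features are softmax-normalized before the JS
# computation; shapes, names, and the default lambda are illustrative.
import torch
import torch.nn.functional as F

def class_weights(frequencies):
    """wi = fmax / fi, so frequent classes get smaller weights."""
    f = torch.as_tensor(frequencies, dtype=torch.float32)
    return f.max() / f

def weighted_classification_loss(logits, labels, weights):
    """Weighted cross-entropy on labeled source-domain samples (Lc)."""
    return F.cross_entropy(logits, labels, weight=weights)

def js_divergence(p, q, eps=1e-8):
    """Base-2 JS-Divergence between probability distributions (rows of p, q)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        return (a * ((a + eps) / (b + eps)).log2()).sum(dim=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def domain_discrepancy_loss(src_bottleneck, tgt_bottleneck):
    """JS-Distance (square root of JS-Divergence) between the bottleneck
    activations of the source and target streams (Ld)."""
    p = F.softmax(src_bottleneck, dim=1)
    q = F.softmax(tgt_bottleneck, dim=1)
    return torch.sqrt(js_divergence(p, q).clamp(min=0.0)).mean()

def overall_loss(src_logits, src_labels, src_bottleneck, tgt_bottleneck,
                 weights, lam=1.0):
    """Overall DAGTL-IC objective: Lc + lambda * Ld."""
    l_c = weighted_classification_loss(src_logits, src_labels, weights)
    l_d = domain_discrepancy_loss(src_bottleneck, tgt_bottleneck)
    return l_c + lam * l_d
```

As a small usage check of the weighting rule, class frequencies of [500, 100, 50] yield weights [1.0, 5.0, 10.0], so the rarest class contributes ten times more per sample to the classification loss than the most frequent one.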
Algorithm 2: Unsupervised Domain Adaptation for Image Classification (DAGTL-IC)
Input: labeled source domain data, unlabeled target domain data, trade-off parameter λ. Output: domain-invariant features, classifier C.
1: Configure the CNN. Initialize the ResNet-50 model up to the last convolutional layer, then add a bottleneck layer of 512 neurons. Finally, add an output (classification) layer with a number of neurons equal to the number of categories in the dataset.
2: Find the kth layer using the Guided Transfer Learning approach, according to Algorithm 1.
3: Freeze layers 1 to k and fine-tune the remaining layers during the training process.
4: Repeat
5: Sample a mini-batch from the source domain with labeled data and from the target domain with unlabeled data.
6: Feed the sampled mini-batch and calculate the domain discrepancy loss, the classification loss, and the overall objective loss function.
7: Update the parameters of the network by minimizing the overall loss using the stochastic gradient descent (SGD) method.
8: Until the overall loss converges.
9: Return the trained classifier C.
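A rough sketch of this training loop is given below; the model wrapper, its freeze_up_to helper, the two-output forward pass, and the optimizer hyperparameters are hypothetical stand-ins, while overall_loss refers to the loss sketch shown earlier.

```python
# Sketch of the DAGTL-IC training loop (Algorithm 2). The model wrapper,
# freeze_up_to(), the two-output forward pass, and all hyperparameters are
# hypothetical; overall_loss() is the helper from the earlier loss sketch.
import torch

def train_dagtl_ic(model, source_loader, target_loader, class_weights_vec,
                   k, lam=1.0, epochs=50, lr=1e-3):
    model.freeze_up_to(k)                               # Step 3: freeze layers 1..k
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)

    for _ in range(epochs):                             # Step 4: repeat until convergence
        for (xs, ys), xt in zip(source_loader, target_loader):   # Step 5: mini-batches
            src_logits, src_bottleneck = model(xs)      # labeled source stream
            _, tgt_bottleneck = model(xt)               # unlabeled target stream
            loss = overall_loss(src_logits, ys, src_bottleneck,  # Step 6
                                tgt_bottleneck, class_weights_vec, lam)
            optimizer.zero_grad()
            loss.backward()                             # Step 7: SGD update
            optimizer.step()
    return model                                        # Step 9: trained classifier
```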
3.3.3. Unsupervised Domain Adaptation for Object Detection (DAGTL-OD)
3.3.4. Loss Functions for Domain Adaptive Object Detection
- Detection loss
The object detection model is trained with a classification loss and a regression loss: the classification loss assigns a label to each object, and the regression loss refines the bounding box for better object localization from the ROIs. The classification loss is calculated as per Equation (8), with class weights to handle the class imbalance problem. The regression loss is computed by applying the smooth L1 loss function to the difference between the predicted and ground-truth bounding box values. These losses are computed in the source network only, as this network alone is trained with labeled data. The detection loss is the sum of the classification loss and the regression loss.
- Domain discrepancy loss
In object detection, there are two important aspects of reducing the shift between domains: whole-image differences, such as scale and illumination, and object-level differences, such as scale and appearance. To align the distributions between domains, we introduce two losses into the training of the proposed network: an image-level domain discrepancy loss and an object-level domain discrepancy loss. The image-level discrepancy loss is calculated using JS-Divergence between the features extracted from the flattened layers of the source and target networks; it reduces the distance between the distributions of both domains at the image level and learns domain-invariant features across the domains. The object-level features are obtained from the region-of-interest (ROI) vectors, and these feature vectors from both domains are used to compute the object-level discrepancy loss with JS-Divergence. Since the number of ROI vectors is not fixed in either domain, the object-level domain discrepancy loss is computed per ROI vector (the jth ROI vector in the ith image) and then aggregated.
- Overall objective loss function for domain adaptive object detection
To obtain an effective domain adaptive object detector, we attempt to reduce the domain shift across the domains while retaining the classification and regression losses of the object detection model. We combine the detection loss and the domain discrepancy losses into an overall objective loss function with a trade-off parameter, and our goal is to minimize this total loss. After the overall loss is minimized, the trained detector is applied directly to the target domain. The overall objective loss function of DAGTL-OD combines the object detection loss (classification loss plus regression loss), the image-level domain discrepancy loss, and the object-level domain discrepancy loss, weighted by the trade-off parameter λ > 0 (a short sketch follows this list).
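As an illustration of the two-level alignment, the sketch below reuses the domain_discrepancy_loss helper from the image-classification sketch. Mean-pooling the variable number of ROI features per image into a single vector is our simplification, and det_loss, the feature shapes, and the way λ scales both discrepancy terms are assumptions rather than the paper's exact equations.

```python
# Sketch of the DAGTL-OD losses (illustrative, not the authors' code).
# Mean-pooling the per-image ROI features is a simplifying assumption, and
# det_loss is assumed to be the detector's weighted classification loss plus
# smooth-L1 regression loss from the source stream.
import torch

def image_level_loss(src_flat, tgt_flat):
    # JS-Distance between flattened backbone features of source and target streams.
    return domain_discrepancy_loss(src_flat, tgt_flat)

def object_level_loss(src_roi_feats, tgt_roi_feats):
    # src_roi_feats / tgt_roi_feats: lists with one (num_rois_i, d) tensor per
    # image; the number of ROIs varies across images and domains.
    src_pooled = torch.stack([r.mean(dim=0) for r in src_roi_feats])
    tgt_pooled = torch.stack([r.mean(dim=0) for r in tgt_roi_feats])
    return domain_discrepancy_loss(src_pooled, tgt_pooled)

def dagtl_od_loss(det_loss, src_flat, tgt_flat, src_rois, tgt_rois, lam=1.0):
    # Overall objective: detection loss plus both discrepancy terms, balanced
    # by the trade-off parameter lambda (how lambda enters is our assumption).
    return det_loss + lam * (image_level_loss(src_flat, tgt_flat)
                             + object_level_loss(src_rois, tgt_rois))
```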
Algorithm 3: Unsupervised Domain Adaptation for Object Detection (DAGTL-OD)
Input: labeled source domain data, unlabeled target domain data, trade-off parameter λ. Output: domain-invariant features at the image level and object level, detector D.
1: Configure the object detection model. Initialize the backbone network as a ResNet-50 model up to the last convolutional layer and add the detection head (Faster R-CNN/SSD).
2: Find the kth layer of the ResNet-50 network using the Guided Transfer Learning approach, according to Algorithm 1.
3: Freeze layers 1 to k of ResNet-50 and fine-tune the remaining layers, including the whole detection head, during the training process.
4: Repeat
5: Sample a mini-batch from the source domain with labeled data and from the target domain with unlabeled data.
6: Feed the sampled mini-batch and calculate the object detection loss, the image-level domain discrepancy loss, the object-level domain discrepancy loss, and the overall objective loss function.
7: Update the parameters of the network by minimizing the overall loss using the SGD method.
8: Until the overall loss converges.
9: Return the trained detection network D.
4. Experimental Analysis
4.1. Dataset Description
4.1.1. Office-31
4.1.2. Office-Home
4.1.3. Cityscapes
4.1.4. Foggy Cityscapes
4.1.5. Indian Vehicle Dataset
4.2. Implementation Details
4.3. Results and Discussion
4.3.1. Office-31
4.3.2. Office-Home
4.3.3. Cityscapes → Foggy Cityscapes
4.3.4. Cityscapes → Indian Vehicle Dataset
4.4. Feature Visualization
4.5. Parameter Sensitivity and Convergence
4.6. Ablation Studies
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Patel, V.M.; Gopalan, R.; Li, R.; Chellappa, R. Visual Domain Adaptation: A survey of recent advances. IEEE Signal Process. Mag. 2015, 32, 53–69. [Google Scholar] [CrossRef]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153. [Google Scholar] [CrossRef]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328. [Google Scholar]
- Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar]
- Sun, B.; Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Proceedings, Part III 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 443–450. [Google Scholar]
- Kang, G.; Jiang, L.; Yang, Y.; Hauptmann, A.G. Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4893–4902. [Google Scholar]
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Shen, J.; Qu, Y.; Zhang, W.; Yu, Y. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
- Ghifary, M.; Kleijn, W.B.; Zhang, M. Domain adaptive neural networks for object recognition. In Proceedings of the PRICAI 2014: Trends in Artificial Intelligence: 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, Australia, 1–5 December 2014; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 898–904. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning transferable features with deep adaptation networks. Int. Conf. Mach. Learn. PMLR 2015, 37, 97–105. [Google Scholar]
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. Int. Conf. Mach. Learning. PMLR 2017, 70, 2208–2217. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Yoo, D.; Kim, N.; Park, S.; Paek, A.; Kweon, I. Pixel-level domain transfer. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
- Sun, B.; Feng, J.; Saenko, K. Correlation alignment for unsupervised domain adaptation. Domain Adapt. Comput. Vis. Appl. 2017, 153–171. [Google Scholar] [CrossRef]
- Lee, C.Y.; Batra, T.; Baig, M.H.; Ulbricht, D. Sliced wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10285–10295. [Google Scholar]
- Deng, W.; Zheng, L.; Sun, Y.; Jiao, J. Rethinking triplet loss for domain adaptation. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 29–37. [Google Scholar] [CrossRef]
- Samsudin, M.R.; Abu-Bakar, S.A.; Mokji, M.M. Balanced Weight Joint Geometrical and Statistical Alignment for Unsupervised Domain Adaptation. J. Adv. Inf. Technol. 2022, 13, 21–28. [Google Scholar] [CrossRef]
- Xie, B.; Li, S.; Lv, F.; Liu, C.H.; Wang, G.; Wu, D. A collaborative alignment framework of transferable knowledge extraction for unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 2022. [Google Scholar] [CrossRef]
- Wang, J.; Chen, Y.; Feng, W.; Yu, H.; Huang, M.; Yang, Q. Transfer learning with dynamic distribution adaptation. ACM Trans. Intell. Syst. Technol. (TIST) 2020, 11, 1–25. [Google Scholar] [CrossRef]
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
- Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176. [Google Scholar]
- Cao, Z.; Long, M.; Wang, J.; Jordan, M.I. Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2724–2732. [Google Scholar]
- Volpi, R.; Morerio, P.; Savarese, S.; Murino, V. Adversarial feature augmentation for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5495–5504. [Google Scholar]
- Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3723–3732. [Google Scholar]
- Zhang, Y.; Tang, H.; Jia, K.; Tan, M. Domain-symmetric networks for adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5031–5040. [Google Scholar]
- Hu, L.; Kan, M.; Shan, S.; Chen, X. Unsupervised domain adaptation with hierarchical gradient synchronization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4043–4052. [Google Scholar]
- Zhang, C.; Zhao, Q.; Wang, Y. Hybrid adversarial network for unsupervised domain adaptation. Inf. Sci. 2020, 514, 44–55. [Google Scholar] [CrossRef]
- Tang, H.; Chen, K.; Jia, K. Unsupervised domain adaptation via structurally regularized deep clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8725–8735. [Google Scholar]
- Na, J.; Jung, H.; Chang, H.J.; Hwang, W. Fixbi: Bridging domain spaces for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1094–1103. [Google Scholar]
- Pei, Z.; Cao, Z.; Long, M.; Wang, J. Multi-adversarial domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Pinheiro, P.O. Unsupervised domain adaptation with similarity learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8004–8013. [Google Scholar]
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar] [CrossRef]
- Chen, L.; Chen, H.; Wei, Z.; Jin, X.; Tan, X.; Jin, Y.; Chen, E. Reusing the task-specific classifier as a discriminator: Discriminator-free adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7181–7190. [Google Scholar]
- Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Proceedings of the Computer Vision—ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part IV 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 213–226. [Google Scholar]
- Venkateswara, H.; Eusebio, J.; Chakraborty, S.; Panchanathan, S. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5018–5027. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Denker, J.; Gardner, W.; Graf, H.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L.D.; Baird, H.; Guyon, I. Neural network recognizer for hand-written zip code digits. Adv. Neural Inf. Process. Syst. 1988, 1, 323–331. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3339–3348. [Google Scholar]
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6956–6965. [Google Scholar]
- Zhu, X.; Pang, J.; Yang, C.; Shi, J.; Lin, D. Adapting object detectors via selective cross-domain alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 687–696. [Google Scholar]
- Zheng, Y.; Huang, D.; Liu, S.; Wang, Y. Cross-domain object detection through coarse-to-fine feature adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13766–13775. [Google Scholar]
- He, Z.; Zhang, L. Multi-adversarial faster-rcnn for unrestricted object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6668–6677. [Google Scholar]
- Kim, T.; Jeong, M.; Kim, S.; Choi, S.; Kim, C. Diversify and match: A domain adaptive representation learning paradigm for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12456–12465. [Google Scholar]
- Su, P.; Wang, K.; Zeng, X.; Tang, S.; Chen, D.; Qiu, D.; Wang, X. Adapting object detectors with conditional domain normalization. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 403–419. [Google Scholar]
- Chen, C.; Zheng, Z.; Ding, X.; Huang, Y.; Dou, Q. Harmonizing transferability and discriminability for adapting object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8869–8878. [Google Scholar]
- Rodriguez, A.L.; Mikolajczyk, K. Domain adaptation for object detection via style consistency. arXiv 2019, arXiv:1911.10033. [Google Scholar]
- Wang, W.; Cao, Y.; Zhang, J.; He, F.; Zha, Z.J.; Wen, Y.; Tao, D. Exploring sequence feature alignment for domain adaptive detection transformers. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 1730–1738. [Google Scholar]
- Zhou, W.; Du, D.; Zhang, L.; Luo, T.; Wu, Y. Multi-granularity alignment domain adaptation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9581–9590. [Google Scholar]
- Gong, K.; Li, S.; Li, S.; Zhang, R.; Liu, C.H.; Chen, Q. Improving Transferability for Domain Adaptive Detection Transformers. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10 October 2022; pp. 1543–1551. [Google Scholar]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Method | Type of Domain Adaptation | Base Network | Loss | Office-31 [37] | Office-Home [38] | Digits (MNIST [39]/USPS [40]) | Year
DDC [5] | Discrepancy-based | AlexNet | MMD | ✓ | - | - | 2014 |
DAN [13] | Discrepancy-based | AlexNet | MK-MMD | ✓ | - | - | 2015 |
DANN [23] | Adversarial-based | AlexNet | GAN-based Discriminator | ✓ | - | ✓ | 2015 |
CORAL [17] | Discrepancy-based | AlexNet | CORAL | ✓ | - | - | 2016 |
ADDA [24] | Adversarial-based | AlexNet & ResNet-50 | GAN-based Discriminator | ✓ | - | ✓ | 2017 |
JAN [14] | Discrepancy-based | ResNet-50 | JMMD | ✓ | - | - | 2017 |
CDAN [35] | Adversarial-based | ResNet-50 | Conditional-based Discriminator | ✓ | ✓ | ✓ | 2018 |
MADA [33] | Adversarial-based | ResNet-50 | GAN-based Discriminator | ✓ | - | - | 2018 |
SimNets [34] | Adversarial-based | ResNet-50 | GAN-based Discriminator | ✓ | - | ✓ | 2018 |
CAN [7] | Discrepancy-based | ResNet-50 | CCD | ✓ | - | - | 2019 |
SymNets [28] | Adversarial-based | ResNet-50 | GAN-based domain confusion | ✓ | ✓ | - | 2019 |
SGC [19] | Discrepancy-based | ResNet-50 | JMMD | ✓ | ✓ | ✓ | 2020 |
MDDA [22] | Discrepancy-based | ResNet-50 | MMD | ✓ | ✓ | ✓ | 2020 |
HAN [30] | Discrepancy & Adversarial-based | ResNet-50 | CORAL and GAN-based Discriminator | ✓ | ✓ | - | 2020 |
GSDA [29] | Adversarial-based | ResNet-50 | Global and local Adversarial Discriminator | ✓ | ✓ | - | 2020 |
SRDC [31] | Adversarial-based | ResNet-50 | Clustering-based Discriminator | ✓ | ✓ | - | 2020 |
FixBi [32] | Adversarial-based | ResNet-50 | Augmentation | ✓ | ✓ | - | 2021 |
CAF [21] | Discrepancy-based | ResNet-50 | Wasserstein distance | ✓ | - | - | 2022 |
DALN [36] | Adversarial-based | ResNet-50 | NWD-based Discriminator | ✓ | ✓ | - | 2022 |
Method | Detection Network | Loss | Cityscapes [58] | Foggy Cityscapes [58] | KITTI [59] | Year
DA-Faster [46] | Faster R-CNN | H-divergence based Discriminator | ✓ | ✓ | ✓ | 2018 |
SWDA [47] | Faster R-CNN | Weak Global and Strong local Feature Alignment | ✓ | ✓ | ✓ | 2019 |
SCDA [48] | Faster R-CNN | Region-Level Adversarial Alignment | ✓ | ✓ | - | 2019 |
CFA [49] | Faster R-CNN | Prototype-based Semantic Alignment | ✓ | ✓ | ✓ | 2020 |
MAF [50] | Faster R-CNN | Adversarial domain alignment loss | ✓ | ✓ | ✓ | 2019 |
CDN [52] | Faster R-CNN | CDN-based adversarial loss | ✓ | ✓ | ✓ | 2020 |
HTCN [53] | Faster R-CNN | Pixel-wise adversarial loss | ✓ | ✓ | - | 2020 |
ODSC [54] | SSD | Pseudo Labels and Style Transfer alignment | ✓ | ✓ | - | 2020 |
SFA [55] | DefDETR | Token-wise and Hierarchical Sequence Feature Alignment loss | ✓ | ✓ | - | 2021 |
MGA [56] | Faster R-CNN & FCOS | Pixel-level, instance-level, and category-level. | ✓ | ✓ | ✓ | 2022 |
O2net [57] | DefDETR | Pixel- and instance-level | ✓ | ✓ | - | 2022 |
Domains | A & W | A & D | W & D | Ar & Cl | Ar & Pr | Ar & Rw | Cl & Pr | Cl & Rw | Pr & Rw | C & F | C & I |
---|---|---|---|---|---|---|---|---|---|---|---|
Transferability Score | 0.758 | 0.721 | 0.93 | 0.65 | 0.72 | 0.775 | 0.7 | 0.68 | 0.836 | 0.88 | 0.67
kth layer | 37 | 35 | 45 | 31 | 35 | 38 | 34 | 33 | 41 | 43 | 32
Methods (Source → Target) | A → D | A → W | D → A | D → W | W → A | W → D | Avg. Accuracy |
---|---|---|---|---|---|---|---|
ResNet-50 | 68.9 | 68.4 | 62.5 | 96.7 | 60.7 | 99.3 | 76.1 |
CORAL [17] | 81.5 | 77.0 | 65.9 | 97.1 | 64.3 | 99.6 | 80.9 |
DANN [23] | 79.7 | 82.0 | 68.2 | 96.9 | 67.4 | 99.1 | 82.2 |
ADDA [24] | 77.8 | 86.2 | 69.5 | 96.2 | 68.9 | 98.4 | 82.9 |
JAN [14] | 84.7 | 85.4 | 68.6 | 97.4 | 70.0 | 99.8 | 84.3 |
MADA [33] | 87.8 | 90.0 | 70.3 | 97.4 | 66.4 | 100 | 85.2 |
MDDA [22] | 86.3 | 86.0 | 72.1 | 97.1 | 73.2 | 99.2 | 85.7 |
SimNets [34] | 88.6 | 85.3 | 73.4 | 98.2 | 71.8 | 99.7 | 86.2 |
SymNets [28] | 93.9 | 90.8 | 74.6 | 98.8 | 72.5 | 100 | 88.4 |
HAN [30] | 95.3 | 94.4 | 72.1 | 98.8 | 71.7 | 100 | 88.7 |
GSDA [29] | 94.8 | 95.7 | 73.5 | 99.1 | 74.9 | 100 | 89.7 |
SRDC [31] | 95.8 | 95.7 | 76.7 | 99.2 | 77.1 | 100 | 90.8 |
DALN [36] | 95.4 | 95.8 | 76.4 | 99.1 | 76.5 | 100 | 90.4 |
FixBi [32] | 95.0 | 96.1 | 78.7 | 99.3 | 79.4 | 100 | 91.4 |
Ours (DAGTL-IC) | 97.2 | 97.1 | 82.9 | 99.2 | 82.7 | 100 | 93.2 |
Source → Target | Ar → Cl | Ar → Pr | Ar → Rw | Cl → Ar | Cl → Pr | Cl → Rw | Pr → Ar | Pr → Cl | Pr → Rw | Rw → Ar | Rw → Cl | Rw → Pr | Avg. Accuracy
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 | 34.9 | 50.0 | 58.0 | 37.4 | 41.9 | 46.2 | 38.5 | 31.2 | 60.4 | 53.9 | 41.2 | 59.9 | 46.1 |
CORAL [17] | 42.2 | 59.1 | 64.9 | 46.4 | 56.3 | 58.3 | 45.4 | 41.2 | 68.5 | 60.1 | 48.2 | 73.1 | 55.3 |
DANN [23] | 45.6 | 59.3 | 70.1 | 47.0 | 58.5 | 60.9 | 46.1 | 43.7 | 68.5 | 63.2 | 51.8 | 76.8 | 57.6 |
JAN [14] | 45.9 | 61.2 | 68.9 | 50.4 | 59.7 | 61.0 | 45.8 | 43.4 | 70.3 | 63.9 | 52.4 | 76.8 | 58.3 |
CDAN [35] | 46.6 | 65.9 | 73.4 | 55.7 | 62.7 | 64.2 | 51.8 | 49.1 | 74.5 | 68.2 | 56.9 | 80.7 | 62.8 |
MDDA [22] | 54.9 | 75.9 | 77.2 | 58.1 | 73.3 | 71.5 | 59.0 | 52.6 | 77.8 | 67.9 | 57.6 | 81.8 | 67.3 |
SymNets [28] | 47.7 | 72.9 | 78.5 | 64.2 | 71.3 | 74.2 | 63.6 | 47.6 | 79.4 | 73.8 | 50.8 | 82.6 | 67.2 |
GSDA [29] | 61.3 | 76.1 | 79.4 | 65.4 | 73.3 | 74.3 | 65.0 | 53.2 | 80.0 | 72.2 | 60.6 | 83.1 | 70.3 |
SRDC [31] | 52.3 | 76.3 | 81.0 | 69.5 | 76.2 | 78.0 | 68.7 | 53.8 | 81.7 | 76.3 | 57.1 | 85.0 | 71.3 |
DALN [36] | 57.8 | 79.9 | 82.0 | 66.3 | 76.2 | 77.2 | 66.7 | 55.5 | 81.3 | 73.5 | 60.4 | 85.3 | 71.8 |
FixBi [32] | 58.1 | 77.3 | 80.4 | 67.7 | 79.5 | 78.1 | 65.8 | 57.9 | 81.7 | 76.4 | 62.9 | 86.7 | 72.7 |
Ours (DAGTL-IC) | 61.3 | 80.5 | 83.2 | 70.2 | 82.5 | 80.4 | 69.2 | 61.8 | 84.1 | 75.8 | 65.1 | 89.5 | 75.3 |
Methods | Person | Rider | Car | Truck | Bus | Train | Mcycle | Bicycle | mAP |
---|---|---|---|---|---|---|---|---|---|
Faster R-CNN | 17.8 | 23.6 | 27.1 | 11.9 | 23.8 | 9.1 | 14.4 | 22.8 | 18.8 |
DA-Faster [46] | 25.0 | 31.0 | 40.5 | 22.1 | 35.3 | 20.2 | 20.1 | 27.1 | 27.6 |
SCDA [48] | 33.5 | 38.0 | 48.5 | 26.5 | 39.0 | 23.3 | 28.0 | 33.6 | 33.8 |
ODSC [54] | 29.9 | 42.3 | 43.5 | 24.5 | 36.2 | 32.6 | 35.3 | 30.0 | 34.3 |
SWDA [47] | 30.3 | 42.5 | 44.6 | 24.5 | 36.7 | 31.6 | 30.2 | 35.8 | 34.8 |
CDN [52] | 35.8 | 45.7 | 50.9 | 30.1 | 42.5 | 29.8 | 30.8 | 36.5 | 36.6 |
HTCN [53] | 33.2 | 47.5 | 47.9 | 31.6 | 47.4 | 40.9 | 32.3 | 37.1 | 39.8 |
SFA [55] | 46.5 | 48.6 | 62.6 | 25.1 | 46.2 | 29.4 | 28.3 | 44.0 | 41.3 |
MGA [56] | 43.9 | 49.6 | 60.6 | 29.6 | 50.7 | 39.0 | 38.3 | 42.8 | 44.3 |
O2net [57] | 48.7 | 51.5 | 63.6 | 31.1 | 47.6 | 47.8 | 38.0 | 45.9 | 46.8 |
Ours (FRCNN) | 50.2 | 52.2 | 63.5 | 36.7 | 57.5 | 47.8 | 40.6 | 49.8 | 49.7 |
Ours (SSD) | 51.8 | 51.4 | 62.2 | 38.4 | 63.1 | 49.8 | 38.8 | 53.4 | 51.1 |
Methods | Car | Truck | Bus | Mcycle | Bicycle | mAP |
---|---|---|---|---|---|---|
Faster R-CNN | 70.8 | 48.6 | 50.3 | 65.2 | 55.3 | 58.0 |
Ours (FRCNN) | 85.8 | 61.3 | 65.4 | 78.5 | 61.3 | 70.5 |
Ours (SSD) | 82.5 | 65.9 | 69.7 | 82.5 | 63.1 | 72.7 |
ResNet-50 | A → W | A → D | W → A | D → A | Ar → Rw | Cl → Pr | ||||
---|---|---|---|---|---|---|---|---|---|---|
ResNet-50 | ✓ | 68.4 | 68.9 | 60.7 | 62.5 | 58 | 41.9 | |||
Proposed Model | ✓ | ✓ | 95.6 | 95.4 | 77.3 | 77.6 | 78.9 | 77.8 | ||
✓ | ✓ | ✓ | 96.2 | 96.4 | 81.1 | 80.1 | 81.3 | 80.5 | ||
✓ | ✓ | ✓ | ✓ | 97.1 | 97.2 | 82.7 | 82.9 | 83.2 | 82.5 |
Faster R-CNN | Faster R-CNN | Faster R-CNN + GTL | C → F | C → I | |||
---|---|---|---|---|---|---|---|
Faster R-CNN | ✓ | 18.8 | 58.0 | ||||
Proposed Model | ✓ | ✓ | 44.6 | 63.5 | |||
✓ | ✓ | ✓ | 46.2 | 66.9 | |||
✓ | ✓ | ✓ | ✓ | 48.9 | 69.2 | ||
✓ | ✓ | ✓ | ✓ | ✓ | 49.7 | 70.5 |