An Out-of-Distribution Generalization Framework Based on Variational Backdoor Adjustment
Abstract
1. Introduction
- We present a novel perspective on the OOD generalization problem: changes in the data-sampling environment are captured as shifts in the distribution of confounders.
- We propose a method for out-of-distribution generalization that requires no environment labels. The method performs backdoor adjustment on features via a variational approach, thereby eliminating the impact of environmental changes on model training (a minimal illustrative sketch follows this list).
- We propose an OOD generalization framework that can be combined with any backbone model.
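As a minimal illustration of the second contribution, the sketch below shows one way a backdoor-adjusted prediction can be approximated by Monte Carlo averaging over confounders drawn from a learned pseudo-input prior. All class, method, and parameter names (`VariationalBackdoorPredictor`, `n_pseudo_inputs`, `n_samples`, the layer sizes) are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class VariationalBackdoorPredictor(nn.Module):
    """Illustrative sketch: predict y under do(x) by marginalizing a latent
    confounder z over its prior, p(y | do(x)) ~ (1/S) * sum_s p(y | x, z_s)."""

    def __init__(self, x_dim, z_dim, y_dim, n_pseudo_inputs=10):
        super().__init__()
        # q(z | x): amortized variational posterior over the confounder
        self.encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * z_dim))
        # p(y | x, z): inference network used for backdoor-adjusted prediction
        self.predictor = nn.Sequential(nn.Linear(x_dim + z_dim, 64), nn.ReLU(),
                                       nn.Linear(64, y_dim))
        # Learnable pseudo-inputs defining a VampPrior-style aggregate prior p(z)
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo_inputs, x_dim))
        self.z_dim = z_dim

    def posterior(self, x):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        return mu, log_var

    def sample_prior(self, n_samples):
        # p(z) = (1/K) sum_k q(z | u_k): pick a pseudo-input, sample its posterior
        idx = torch.randint(len(self.pseudo_inputs), (n_samples,))
        mu, log_var = self.posterior(self.pseudo_inputs[idx])
        return mu + torch.randn_like(mu) * (0.5 * log_var).exp()

    def forward(self, x, n_samples=32):
        # Backdoor adjustment: average predictions over z drawn from p(z),
        # not from q(z | x), so environment-dependent confounding is cut off.
        z = self.sample_prior(n_samples)                    # (S, z_dim)
        x_rep = x.unsqueeze(1).expand(-1, n_samples, -1)    # (B, S, x_dim)
        z_rep = z.unsqueeze(0).expand(x.size(0), -1, -1)    # (B, S, z_dim)
        return self.predictor(torch.cat([x_rep, z_rep], dim=-1)).mean(dim=1)
```

The key design choice in this sketch is that, at prediction time, the confounder is sampled from the environment-independent prior rather than from q(z | x), which is what the backdoor adjustment p(y | do(x)) = Σ_z p(y | x, z) p(z) prescribes.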
2. Related Work
3. Methodology
3.1. Problem Formulation
3.2. Limitations of the Empirical Risk Minimization Model
3.3. Backdoor Adjustment Based on Variational Inference
3.4. Model Structure
3.4.1. Variational Posterior Encoder
3.4.2. Inference Model
Algorithm 1 Variational backdoor adjustment framework
Input: the training data, the dimension of the confounders m, the number of pseudo-inputs K, and the weight parameter; initialize the parameters of all model components.
Output: the trained parameters of the model components.
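To make Algorithm 1 concrete, here is a hypothetical single training step that continues the sketch above, assuming an ELBO-style objective whose KL term is scaled by the weight parameter from the input list. The helper names and the simplified single-mode prior matching are illustrative assumptions, not the authors' exact procedure (a full VampPrior would evaluate the mixture over all K pseudo-inputs).

```python
import torch
import torch.nn as nn

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL between two diagonal Gaussians, KL(q || p)."""
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
                  - 1).sum(dim=-1)

def train_step(model, optimizer, x, y, weight=0.5, criterion=nn.MSELoss()):
    # 1) Variational posterior for the observed batch (reparameterization trick)
    mu_q, logvar_q = model.posterior(x)
    z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
    # 2) Prediction term: p(y | x, z) with z drawn from the posterior
    y_hat = model.predictor(torch.cat([x, z], dim=-1))
    pred_loss = criterion(y_hat, y)
    # 3) Regularization term: pull q(z | x) towards the pseudo-input prior.
    #    For brevity we match a single averaged pseudo-input posterior here.
    mu_p, logvar_p = model.posterior(model.pseudo_inputs)
    kl = kl_diag_gaussians(mu_q, logvar_q,
                           mu_p.mean(0, keepdim=True),
                           logvar_p.mean(0, keepdim=True)).mean()
    loss = pred_loss + weight * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```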
4. Experimental Results
4.1. Linear Simulated Data
4.2. Colored MNIST
4.3. Real-World Data
5. Discussion
6. Conclusions
7. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
OOD | Out-of-Distribution
VBA | Variational Backdoor Adjustment
Appendix A. The Description of Backdoor Adjustment
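For reference, given a confounder $z$ that satisfies the backdoor criterion relative to $(x, y)$, Pearl's backdoor adjustment takes the standard form (with the sum replaced by an integral for continuous $z$):

```latex
P(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
```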
Appendix B. The Derivation of ELBO
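The derivation in this appendix follows the standard variational argument; in its generic form (the exact conditioning used in the paper may differ), the evidence lower bound is:

```latex
\log p_\theta(y \mid x)
  \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(y \mid x, z)\big]
  \;-\; \mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```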
Appendix C. Calculation of Kullback–Leibler Divergence
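For diagonal Gaussian posteriors and priors over an m-dimensional confounder, the KL term in the ELBO has the standard closed form:

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_q, \sigma_q^2 I)\,\|\,\mathcal{N}(\mu_p, \sigma_p^2 I)\big)
  = \frac{1}{2} \sum_{i=1}^{m}
    \left( \log \frac{\sigma_{p,i}^2}{\sigma_{q,i}^2}
         + \frac{\sigma_{q,i}^2 + (\mu_{q,i} - \mu_{p,i})^2}{\sigma_{p,i}^2}
         - 1 \right)
```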
Appendix D. Generating Linear Simulated Data
[Table: per-case data-generating settings for Case 1, Case 2, and Case 3; the generating equations are not reproduced here.]
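The table above lists the per-case data-generating settings. As a loose illustration of this kind of setup, the snippet below builds linear data in which causal features carry fixed weights (matching the true coefficient values 0, 0.5, 1, 1.5, and 2 shown in parentheses in the coefficient-recovery table later in the results), while spurious features correlate with the label through an environment-dependent strength. The function name, noise scales, and the specific `r_env` values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def make_linear_env(n, r_env, dim_c=5, dim_s=5, seed=0):
    """Illustrative generator (not the paper's exact settings): causal
    features drive y with fixed weights, while spurious features are
    generated from y with an environment-dependent strength r_env."""
    rng = np.random.default_rng(seed)
    w = np.linspace(0.0, 2.0, dim_c)              # fixed causal weights 0..2
    x_c = rng.normal(size=(n, dim_c))
    y = x_c @ w + 0.5 * rng.normal(size=n)
    x_s = y[:, None] * r_env + rng.normal(size=(n, dim_s))
    return np.hstack([x_c, x_s]), y

# Training environments use a strong positive spurious correlation; the test
# environment flips its sign, so models relying on x_s generalize poorly.
x_train, y_train = make_linear_env(2000, r_env=+2.0, seed=1)
x_test,  y_test  = make_linear_env(2000, r_env=-2.0, seed=2)
```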
Appendix E. Model Settings
Appendix E.1. Linear Simulated Data
Appendix E.2. Colored MNIST
Appendix E.3. Real-World Data
References
- Arjovsky, M.; Bottou, L.; Gulrajani, I.; Lopez-Paz, D. Invariant Risk Minimization. arXiv 2019, arXiv:1907.02893.
- Liu, J.; Hu, Z.; Cui, P.; Li, B.; Shen, Z. Heterogeneous risk minimization. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 6804–6814.
- Liu, J.; Shen, Z.; He, Y.; Zhang, X.; Xu, R.; Yu, H.; Cui, P. Towards out-of-distribution generalization: A survey. arXiv 2021, arXiv:2108.13624.
- Beery, S.; Van Horn, G.; Perona, P. Recognition in terra incognita. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 456–473.
- Yin, M.; Wang, Y.; Blei, D.M. Optimization-based causal estimation from heterogeneous environments. arXiv 2021, arXiv:2109.11990.
- Schölkopf, B.; Locatello, F.; Bauer, S.; Ke, N.R.; Kalchbrenner, N.; Goyal, A.; Bengio, Y. Toward causal representation learning. Proc. IEEE 2021, 109, 612–634.
- Peters, J.; Bühlmann, P.; Meinshausen, N. Causal inference using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B 2016, 78, 947–1012.
- Koyama, M.; Yamaguchi, S. Out-of-distribution generalization with maximal invariant predictor. CoRR 2020.
- Wang, R.; Yi, M.; Chen, Z.; Zhu, S. Out-of-distribution generalization with causal invariant transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 375–385.
- Liu, J.; Hu, Z.; Cui, P.; Li, B.; Shen, Z. Kernelized heterogeneous risk minimization. arXiv 2021, arXiv:2110.12425.
- Pearl, J. Causal inference in statistics: An overview. Stat. Surv. 2009, 3, 96–146.
- Yang, C.; Wu, Q.; Wen, Q.; Zhou, Z.; Sun, L.; Yan, J. Towards out-of-distribution sequential event prediction: A causal treatment. Adv. Neural Inf. Process. Syst. 2022, 35, 22656–22670.
- Pearl, J.; Glymour, M.; Jewell, N.P. Causal Inference in Statistics: A Primer; John Wiley & Sons: Hoboken, NJ, USA, 2016; pp. 24–29.
- Muandet, K.; Balduzzi, D.; Schölkopf, B. Domain generalization via invariant feature representation. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 10–18.
- Recht, B.; Roelofs, R.; Schmidt, L.; Shankar, V. Do ImageNet classifiers generalize to ImageNet? In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 9413–9424.
- Schneider, S.; Rusak, E.; Eck, L.; Bringmann, O.; Brendel, W.; Bethge, M. Improving robustness against common corruptions by covariate shift adaptation. arXiv 2020, arXiv:2006.16971.
- Tu, L.; Lalwani, G.; Gella, S.; He, H. An empirical study on robustness to spurious correlations using pre-trained language models. Trans. Assoc. Comput. Linguist. 2020, 8, 621–633.
- Yi, M.; Wang, R.; Sun, J.; Li, Z.; Ma, Z.-M. Improved OOD generalization via conditional invariant regularizer. arXiv 2022, arXiv:2207.06687.
- Sinha, A.; Namkoong, H.; Duchi, J. Certifying some distributional robustness with principled adversarial training. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018.
- Cui, P.; Athey, S. Stable learning establishes some common ground between causal inference and machine learning. Nat. Mach. Intell. 2022, 4, 110–115.
- Rojas-Carulla, M.; Schölkopf, B.; Turner, R.; Peters, J. Invariant models for causal transfer learning. J. Mach. Learn. Res. 2018, 19, 1309–1342.
- Kuang, K.; Xiong, R.; Cui, P.; Athey, S.; Li, B. Stable prediction across unknown environments. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1617–1626.
- Schölkopf, B. Causality for machine learning. arXiv 2019, arXiv:1911.10500.
- Chang, S.; Zhang, Y.; Yu, M.; Jaakkola, T.S. Invariant rationalization. In Proceedings of the International Conference on Machine Learning, ICML, Virtual Event, 13–18 July 2020.
- Belcastro, L.; Carbone, D.; Cosentino, C.; Marozzo, F.; Trunfio, P. Enhancing Cryptocurrency Price Forecasting by Integrating Machine Learning with Social Media and Market Data. Algorithms 2023, 16, 542.
- Shen, Z.; Cui, P.; Zhang, T.; Kuang, K. Stable learning via sample reweighting. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 5692–5699.
- Duchi, J.; Namkoong, H. Learning models with uniform performance via distributionally robust optimization. Ann. Stat. 2021, 49, 1378–1406.
- Yi, M.; Hou, L.; Sun, J.; Shang, L.; Jiang, X.; Liu, Q.; Ma, Z.-M. Improved OOD generalization via adversarial training and pretraining. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11987–11997.
- Kamath, P.; Tangella, A.; Sutherland, D.J.; Srebro, N. Does invariant risk minimization capture invariance? In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 4069–4077.
- Creager, E.; Jacobsen, J.H.; Zemel, R. Environment inference for invariant learning. In Proceedings of the ICML Workshop on Uncertainty and Robustness, Virtual, 17 July 2020.
- Dawid, A.P. Causal inference without counterfactuals. J. Am. Stat. Assoc. 2000, 95, 407–424.
- Rubin, D.B. Causal inference using potential outcomes: Design, modeling, decisions. J. Am. Stat. Assoc. 2005, 100, 322–331.
- Robins, J.M.; Hernán, M.A.; Brumback, B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000, 11, 550–560.
- Pearl, J. Causality: Models, Reasoning, and Inference; Cambridge University Press: Cambridge, UK, 2009.
- Greenland, S.; Pearl, J.; Robins, J.M. Causal diagrams for epidemiologic research. Epidemiology 1999, 10, 37–48.
- Richardson, T.S.; Robins, J.M. Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality; Working Paper 128; Center for Statistics and the Social Sciences, University of Washington: Seattle, WA, USA, 2013.
- Spirtes, P.; Glymour, C.N.; Scheines, R. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2000.
- Hair, J.F., Jr.; Sarstedt, M. Data, measurement, and causal inferences in machine learning: Opportunities and challenges for marketing. J. Mark. Theory Pract. 2021, 29, 65–77.
- Brand, J.E.; Zhou, X.; Xie, Y. Recent developments in causal inference and machine learning. Annu. Rev. Sociol. 2023, 49, 81–110.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014.
- Hoffman, M.D.; Johnson, M.J. ELBO surgery: Yet another way to carve up the variational evidence lower bound. In Proceedings of the Workshop in Advances in Approximate Bayesian Inference, Barcelona, Spain, 9 December 2016.
- Tomczak, J.; Welling, M. VAE with a VampPrior. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR 2018, Playa Blanca, Spain, 9–11 April 2018.
- Dinh, L.; Krueger, D.; Bengio, Y. NICE: Non-linear independent components estimation. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30, Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; Neural Information Processing Systems Foundation, Inc.: La Jolla, CA, USA, 2017; p. 30.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- He, Y.; Shen, Z.; Cui, P. Towards non-i.i.d. image classification: A dataset and baselines. Pattern Recognit. 2021, 110, 107383.
Datasets | Backbone | Network 2 | Network 3 | Network 4 | Network 5 | m | K | Weight parameter
---|---|---|---|---|---|---|---|---
Linear | 2-layer MLP | 1-layer FNN | 1-layer FNN | 1-layer FNN | 2-layer MLP | 1 | 5 | 0.5 |
Colored MNIST | 3-layer CNN | 2-layer MLP | 2-layer MLP | 1-layer FNN | 2-layer MLP | 5 | 10 | 0.5 |
NICO | ResNet [46] | 3-layer MLP | 3-layer MLP | 1-layer FNN | 4-layer MLP | 20 | 10 | 0.2 |
House prices | 2-layer MLP | 2-layer MLP | 2-layer MLP | 2-layer MLP | 3-layer MLP | 5 | 5 | 0.4 |
Methods | Case 1 MSE | Case 1 Var | Case 2 MSE | Case 2 Var | Case 3 MSE | Case 3 Var | Need Environment Labels?
---|---|---|---|---|---|---|---
ERM | 5.16 | 0.28 | 5.04 | 0.31 | 5.97 | 0.35 | no |
DRO | 4.74 | 0.21 | 4.51 | 0.25 | 5.22 | 0.23 | no |
KerHRM | 4.52 | 0.18 | 4.56 | 0.21 | 4.97 | 0.28 | no |
EIIL | 4.83 | 0.23 | 4.74 | 0.22 | 5.18 | 0.25 | no |
IRM | 4.14 | 0.16 | 4.17 | 0.18 | 4.68 | 0.20 | yes |
VBA | 4.04 | 0.18 | 3.87 | 0.15 | 5.07 | 0.23 | no |
Methods | Case 1 Conv_Time | Case 1 Infer_Time | Case 2 Conv_Time | Case 2 Infer_Time | Case 3 Conv_Time | Case 3 Infer_Time
---|---|---|---|---|---|---
ERM | 20.91 s | 1.55 s | 15.83 s | 1.04 s | 25.47 s | 1.53 s |
DRO | 29.95 s | 2.89 s | 23.57 s | 1.91 s | 37.20 s | 3.01 s |
KerHRM | 25.37 s | 2.53 s | 23.69 s | 1.86 s | 29.42 s | 2.58 s |
EIIL | 33.50 s | 2.26 s | 29.16 s | 1.81 s | 36.11 s | 2.30 s |
IRM | 17.75 s | 2.13 s | 16.36 s | 1.46 s | 19.53 s | 2.16 s |
VBA | 18.56 s | 2.50 s | 17.20 s | 1.83 s | 18.76 s | 2.39 s |
Cases | Coef. 1 (True) | Coef. 2 (True) | Coef. 3 (True) | Coef. 4 (True) | Coef. 5 (True)
---|---|---|---|---|---
Case 1 | 0.02 (0) | 0.47 (0.5) | 0.97 (1) | 1.50 (1.5) | 1.91 (2) |
Case 2 | −0.01 (0) | 0.49 (0.5) | 1.03 (1) | 1.47 (1.5) | 1.94 (2) |
Case 3 | 0.06 (0) | 0.09 (0) | 0.11 (0) | 0.09 (0) | 0.09 (0) |
Methods | Train_Acc | Test_Acc | Gener_Gap | Conv_Time | Infer_Time | Need Environment Labels? |
---|---|---|---|---|---|---|
ERM | 0.94 | 0.48 | 0.46 | 33.32 s | 5.10 s | no |
DRO | 0.79 | 0.55 | 0.24 | 87.12 s | 8.31 s | no |
KerHRM | 0.81 | 0.65 | 0.16 | 42.60 s | 9.02 s | no |
EIIL | 0.85 | 0.64 | 0.21 | 79.35 s | 6.38 s | no |
IRM | 0.81 | 0.68 | 0.13 | 35.38 s | 5.94 s | yes |
VBA | 0.84 | 0.75 | 0.09 | 31.61 s | 8.40 s | no |
Methods | Train_Acc | Test_Acc | Gener_Gap | Conv_Time | Infer_Time | Need Environment Labels? |
---|---|---|---|---|---|---|
ERM | 0.90 | 0.54 | 0.36 | 1648.64 s | 10.91 s | no |
DRO | 0.79 | 0.55 | 0.24 | 4681.59 s | 16.89 s | no |
KerHRM | 0.81 | 0.62 | 0.19 | 3980.41 s | 19.90 s | no |
EIIL | 0.80 | 0.58 | 0.18 | 5023.60 s | 16.32 s | no |
IRM | 0.85 | 0.69 | 0.16 | 2351.21 s | 15.64 s | yes |
VBA | 0.83 | 0.73 | 0.10 | 1631.63 s | 17.75 s | no |