C2RL: Convolutional-Contrastive Learning for Reinforcement Learning Based on Self-Pretraining for Strong Augmentation
Abstract
1. Introduction
2. Related Work
2.1. Soft Actor Critic (SAC)
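For reference, SAC (Haarnoja et al.) trains a stochastic policy to maximize expected return augmented with the policy's entropy; the standard maximum-entropy objective is

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \right]
```

where $\alpha$ is the entropy temperature that trades off reward maximization against exploration.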
2.2. Self-Supervised Learning
2.3. Network Randomization
3. Proposed Convolutional–Contrastive Learning for RL (C2RL)
3.1. Randomized Input Observation
Image Blending
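Read together with Section 3.1, the randomized observation follows the network-randomization idea of Lee et al.: the observation is passed through a freshly re-initialized convolution layer and then alpha-blended with the original frame. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a PyTorch pipeline, and reading `alpha` as the blending coefficient is our interpretation of the C2RL(0.8)/C2RL(0.2) variants reported in Section 4.

```python
import torch
import torch.nn as nn

def randomize_observation(obs: torch.Tensor, alpha: float = 0.8) -> torch.Tensor:
    """Blend an observation with a randomly convolved copy of itself (sketch).

    obs:   batch of image observations, shape (B, C, H, W), values in [0, 1].
    alpha: weight of the original observation; the 0.8/0.2 values are an
           assumption based on the C2RL(0.8)/C2RL(0.2) columns in Section 4.
    """
    # Fresh random convolution on every call, as in network randomization
    # (Lee et al.): the weights are re-sampled rather than learned.
    rand_conv = nn.Conv2d(obs.shape[1], obs.shape[1], kernel_size=3,
                          padding=1, bias=False)
    nn.init.xavier_normal_(rand_conv.weight)  # Glorot/Xavier initialization
    with torch.no_grad():
        randomized = torch.clamp(rand_conv(obs), 0.0, 1.0)  # keep pixel range
    # Image blending: interpolate between the clean and randomized views.
    return alpha * obs + (1.0 - alpha) * randomized
```

Re-sampling the convolution weights at every step yields a stream of visually distinct views of the same state, which is what makes this augmentation "strong" relative to crops or color jitter.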
3.2. Strong Convolutional–Contrastive Learning
3.2.1. Self-Pretraining for Strong Augmentation
3.2.2. Convolutional–Contrastive Learning Strategy for Reinforcement Learning
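As a rough illustration of the contrastive objective that CURL-style methods optimize, and which the strategy here presumably adapts, below is a hedged sketch of the bilinear InfoNCE loss from CURL/CPC; the names `z_q`, `z_k`, and `W` are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def infonce_loss(z_q: torch.Tensor, z_k: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Bilinear InfoNCE loss as used in CURL (sketch).

    z_q: query embeddings from the online encoder,  shape (B, D).
    z_k: key embeddings from the momentum encoder,  shape (B, D),
         assumed already detached from the graph (MoCo-style).
    W:   learnable (D, D) bilinear similarity matrix.
    """
    logits = z_q @ W @ z_k.t()                                 # (B, B) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values   # numerical stability
    # Positive pairs lie on the diagonal; every other batch entry is a negative.
    labels = torch.arange(z_q.shape[0], device=z_q.device)
    return F.cross_entropy(logits, labels)
```

In CURL, the keys come from an exponential-moving-average (momentum) copy of the encoder, following He et al.; the augmented views fed to `z_q` and `z_k` would here be the blended observations from Section 3.1.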
4. Results
4.1. Augmentation Methods for Convolutional–Contrastive Learning
4.2. Comparison with Existing Reinforcement Learning Networks
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A.S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J.; Schrittwieser, J.; et al. StarCraft II: A new challenge for reinforcement learning. arXiv 2017, arXiv:1708.04782.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Jaderberg, M.; Mnih, V.; Czarnecki, W.M.; Schaul, T.; Leibo, J.Z.; Silver, D.; Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. arXiv 2016, arXiv:1611.05397.
- Espeholt, L.; Soyer, H.; Munos, R.; Simonyan, K.; Mnih, V.; Ward, T.; Doron, Y.; Firoiu, V.; Harley, T.; Dunning, I.; et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018.
- Jaderberg, M.; Czarnecki, W.M.; Dunning, I.; Marris, L.; Lever, G.; Castaneda, A.G.; Beattie, C.; Rabinowitz, N.C.; Morcos, A.S.; Ruderman, A.; et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 2019, 364, 859–865.
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of the Conference on Robot Learning, PMLR, Zürich, Switzerland, 29–31 October 2018.
- Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253.
- Kaiser, L.; Babaeizadeh, M.; Milos, P.; Osinski, B.; Campbell, R.H.; Czechowski, K.; Erhan, D.; Finn, C.; Kozakowski, P.; Levine, S.; et al. Model-based reinforcement learning for Atari. arXiv 2019, arXiv:1903.00374.
- Laskin, M.; Srinivas, A.; Abbeel, P. CURL: Contrastive unsupervised representations for reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020.
- Zhang, C.; Vinyals, O.; Munos, R.; Bengio, S. A study on overfitting in deep reinforcement learning. arXiv 2018, arXiv:1804.06893.
- Cobbe, K.; Klimov, O.; Hesse, C.; Kim, T.; Schulman, J. Quantifying generalization in reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019.
- Ma, G.; Wang, Z.; Yuan, Z.; Wang, X.; Yuan, B.; Tao, D. A comprehensive survey of data augmentation in visual reinforcement learning. arXiv 2022, arXiv:2210.04561.
- Hansen, N.; Wang, X. Generalization in reinforcement learning by soft data augmentation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021.
- Tassa, Y.; Doron, Y.; Muldal, A.; Erez, T.; Li, Y.; Casas, D.D.; Budden, D.; Abdolmaleki, A.; Merel, J.; Lefrancq, A.; et al. DeepMind Control Suite. arXiv 2018, arXiv:1801.00690.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018.
- Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
- Zhang, R.; Yang, S.; Zhang, Q.; Xu, L.; He, Y.; Zhang, F. Graph-based few-shot learning with transformed feature propagation and optimal class allocation. Neurocomputing 2022, 470, 247–256.
- Ding, B.; Zhang, R.; Xu, L.; Liu, G.; Yang, S.; Liu, Y.; Zhang, Q. U2D2 Net: Unsupervised Unified Image Dehazing and Denoising Network for Single Hazy Image Enhancement. IEEE Trans. Multimed. 2023, 1–16.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Oord, A.V.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
- Osband, I.; Aslanides, J.; Cassirer, A. Randomized prior functions for deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2018, 31, 8626–8638.
- Burda, Y.; Edwards, H.; Storkey, A.; Klimov, O. Exploration by random network distillation. arXiv 2018, arXiv:1810.12894.
- Lee, K.; Lee, K.; Shin, J.; Lee, H. Network randomization: A simple technique for generalization in deep reinforcement learning. arXiv 2019, arXiv:1910.05396.
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010.
- Hansen, N.; Jangir, R.; Sun, Y.; Alenyà, G.; Abbeel, P.; Efros, A.A.; Pinto, L.; Wang, X. Self-supervised policy adaptation during deployment. arXiv 2020, arXiv:2007.04309.
- Laskin, M.; Lee, K.; Stooke, A.; Pinto, L.; Abbeel, P.; Srinivas, A. Reinforcement learning with augmented data. Adv. Neural Inf. Process. Syst. 2020, 33, 19884–19895.
- Kostrikov, I.; Yarats, D.; Fergus, R. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv 2020, arXiv:2004.13649.
Episode return (mean ± standard deviation) under the color-hard setting; SP denotes self-pretraining.

| Color-Hard | SAC | CURL | C2RL(0.8) | C2RL(0.2) | C2RL(+SP) |
|---|---|---|---|---|---|
| Walker, walk | 414 ± 74 | 445 ± 99 | 707 ± 43 | 617 ± 46 | 899 ± 15 |
| Walker, stand | 719 ± 74 | 662 ± 54 | 874 ± 46 | 912 ± 27 | 954 ± 16 |
| Cartpole, swingup | 592 ± 50 | 454 ± 110 | 790 ± 59 | 375 ± 39 | 794 ± 20 |
| Cartpole, balance | 857 ± 60 | 782 ± 13 | 921 ± 15 | 970 ± 22 | 978 ± 12 |
| Ball in cup, catch | 411 ± 183 | 231 ± 92 | 713 ± 166 | 713 ± 93 | 893 ± 44 |
| Finger, turn_easy | 270 ± 43 | 202 ± 32 | 438 ± 95 | 454 ± 133 | 464 ± 111 |
| Cheetah, run | 154 ± 41 | 202 ± 22 | 251 ± 33 | 274 ± 13 | 292 ± 5 |
| Reacher, easy | 163 ± 45 | 325 ± 32 | 317 ± 67 | 212 ± 91 | 332 ± 61 |
Episode return (mean ± standard deviation) under the video-easy setting.

| Video-Easy | SAC | CURL | C2RL(0.8) | C2RL(0.2) | C2RL(+SP) |
|---|---|---|---|---|---|
| Walker, walk | 616 ± 80 | 556 ± 133 | 784 ± 34 | 689 ± 46 | 948 ± 15 |
| Walker, stand | 899 ± 53 | 852 ± 75 | 766 ± 47 | 891 ± 35 | 969 ± 23 |
| Cartpole, swingup | 375 ± 90 | 404 ± 67 | 589 ± 44 | 415 ± 38 | 600 ± 16 |
| Cartpole, balance | 693 ± 109 | 850 ± 91 | 926 ± 13 | 942 ± 18 | 948 ± 12 |
| Ball in cup, catch | 393 ± 175 | 316 ± 119 | 692 ± 85 | 643 ± 93 | 747 ± 79 |
| Finger, turn_easy | 355 ± 108 | 248 ± 56 | 461 ± 188 | 367 ± 154 | 421 ± 143 |
| Cheetah, run | 194 ± 30 | 154 ± 50 | 287 ± 21 | 234 ± 32 | 265 ± 24 |
Comparison with existing augmentation-based methods under the color-hard setting (mean ± standard deviation).

| Color-Hard | CURL | RAD | DrQ | PAD | C2RL + SP (Ours) |
|---|---|---|---|---|---|
| Walker, walk | 445 ± 99 | 400 ± 61 | 520 ± 91 | 468 ± 47 | 899 ± 15 |
| Walker, stand | 662 ± 54 | 644 ± 88 | 770 ± 71 | 797 ± 46 | 954 ± 16 |
| Cartpole, swingup | 454 ± 110 | 590 ± 53 | 586 ± 52 | 630 ± 63 | 794 ± 20 |
| Ball in cup, catch | 231 ± 92 | 541 ± 29 | 365 ± 210 | 563 ± 50 | 893 ± 44 |
Comparison with existing augmentation-based methods under the video-easy setting (mean ± standard deviation).

| Video-Easy | CURL | RAD | DrQ | PAD | C2RL + SP (Ours) |
|---|---|---|---|---|---|
| Walker, walk | 556 ± 133 | 606 ± 63 | 682 ± 89 | 717 ± 79 | 948 ± 15 |
| Walker, stand | 852 ± 75 | 745 ± 146 | 873 ± 83 | 935 ± 20 | 969 ± 23 |
| Cartpole, swingup | 404 ± 67 | 373 ± 72 | 485 ± 105 | 521 ± 76 | 600 ± 16 |
| Ball in cup, catch | 316 ± 119 | 481 ± 26 | 318 ± 157 | 436 ± 55 | 747 ± 19 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, S.; Kim, J.; Jeong, H.-Y.; Kim, T.-K.; Yoo, J. C2RL: Convolutional-Contrastive Learning for Reinforcement Learning Based on Self-Pretraining for Strong Augmentation. Sensors 2023, 23, 4946. https://doi.org/10.3390/s23104946