A Review of Deep Reinforcement Learning Approaches for Smart Manufacturing in Industry 4.0 and 5.0 Framework
Abstract
1. Introduction
Reinforcement Learning
- State-value function for policy π, $V^{\pi}(s)$, which assigns to each state $s$ the expected cumulative reward the agent will receive if the interaction starts in state $s$ and the policy π is followed thereafter.
- Action-value function for policy π, $Q^{\pi}(s,a)$, which determines, for each state–action pair $(s,a)$, the expected cumulative reward the agent will receive if it starts in state $s$, takes action $a$ and follows the policy π thereafter.
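Following the standard discounted-return formulation (e.g., [15]), with γ ∈ [0, 1] the discount factor and the expectation taken over trajectories generated by π, these two functions can be written as:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right], \qquad Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right].$$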
2. Paper Research and Evaluation
2.1. Methodology
- To perform the bibliometric analysis, the Bibliometrix tool [43] was used, which allows bibliographic information to be extracted quickly from a bibliographic export. This export was obtained from the search engine of the academic database https://www.scopus.com (accessed on 15 November 2022). To this end, the main research question defining the bibliometric analysis must be formulated: What are the main deep reinforcement learning approaches used in manufacturing processes? This question must then be translated into a query for the search engine, using keywords and logical operators that delimit the field on which this review focuses. The most accurate search was performed on 1 November 2022 with the following query: TITLE-ABS-KEY ((“DEEP REINFORCEMENT LEARNING” OR DRL OR “DEEP RL”) AND (MANUFACTURING OR ROBOTS OR “PRODUCTION SYSTEM” OR AUTOMATION) AND (“MODEL-FREE” OR “MODEL-BASED” OR “ON-POLICY” OR “OFF-POLICY”)).
- In this way, the search returns documents that focus on the different DRL-based approaches and techniques applied to manufacturing processes, such as model-free, model-based, on-policy or off-policy algorithms. The resulting list of scientific publications contains the information needed to answer the research question above.
- The next step involves creating the citation network, for which the bibliographic export is processed with a script programmed in R [44]. For its visualisation, the open-source software Cytoscape [45] was employed. Furthermore, the methodology for removing unwanted information described by Zuluaga et al. (2016) [46] was applied to achieve a cleaner visualisation of the results.
2.2. Bibliometric Analysis
2.3. Citation Network
2.4. Analysis of Results
3. Deep Reinforcement Learning
- In which state the planning starts;
- How many computational resources are assigned to it;
- Which optimisation or planning algorithm is used;
- What the relation is between the planning and the DRL algorithm.
- Forward model: given the current state and the action selected by the agent, it estimates the next state;
- Backward model: a retrospective model that predicts which state and action led to the current state;
- Inverse model: it infers which action is required to move from one state to another (see the sketch below).
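To make the distinction concrete, the following minimal sketch shows the interfaces of the three model types. The linear dynamics, dimensions and data are purely illustrative assumptions for this example and are not taken from any of the reviewed works.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2
A = 0.1 * rng.normal(size=(state_dim, state_dim))   # toy transition matrices
B = 0.1 * rng.normal(size=(state_dim, action_dim))

def forward_model(state, action):
    """Forward model: estimates the next state from the current state and action."""
    return A @ state + B @ action

def backward_model(next_state):
    """Backward model: predicts a (state, action) pair that could have led to next_state."""
    M = np.hstack([A, B])                            # next_state ≈ M @ [state; action]
    z, *_ = np.linalg.lstsq(M, next_state, rcond=None)
    return z[:state_dim], z[state_dim:]

def inverse_model(state, next_state):
    """Inverse model: infers the action that moves the agent from state to next_state."""
    action, *_ = np.linalg.lstsq(B, next_state - A @ state, rcond=None)
    return action

s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
s_next = forward_model(s, a)
print("recovered action:", inverse_model(s, s_next))  # approximately equal to a
```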
Integration in the Industry
- Dealing with high-dimensional problems and datasets with moderate effort;
- Capability to simplify potentially difficult outputs and establish an intuitive interaction with operators;
- Adapting to changes in the environment in a cost-effective manner, ideally with some degree of automation;
- Expanding the previous knowledge with the acquired experience;
- Ability to deal with available manufacturing data without particular needs for the initial capture of very detailed information;
- Capability to discover relevant intra- and inter-process relationships and, preferably, correlation and/or causation.
- Stability. In industrial RL applications, the sample efficiency of off-policy algorithms is desirable. However, they show unstable performance in high-dimensional problems, which worsens if the state and action spaces are continuous. To mitigate this deficiency, two approaches predominate: (i) reducing the brittleness with respect to hyperparameter tuning and (ii) avoiding local optima and coping with delayed rewards. The former can be addressed with tools that optimise the selection of hyperparameter values, such as Optuna [106] (see the tuning sketch after this list), or with algorithms that internally optimise some of their hyperparameters, such as SAC [107]. The latter can be addressed with stochastic policies, for example, by introducing entropy maximisation as in SAC, and with improved exploration strategies [108].
- Sample efficiency. Learning better policies with less experience is key for efficient RL applications in industrial processes. This is because, in many cases, data availability is limited, and it is preferable to train an algorithm in the shortest possible time. As stated before, among model-free DRL algorithms, off-policy algorithms are more sample-efficient than on-policy ones (see the replay-buffer sketch after this list), and model-based algorithms perform even better in this respect, although obtaining an accurate model of the environment is often challenging in industry. Other alternatives to enhance sample efficiency are input remapping, which is often implemented with high-dimensional observations [109], and offline training, which consists of training the algorithm in a simulated environment [110].
- Training with real processes. Although training directly on the real system is possible, it is very time-consuming and causes wear and tear on robots and automatons [105]. Moreover, human supervision is needed to guarantee safety conditions. Therefore, simulated environments are used in practice, allowing much more experience to be generated at a lower cost and enabling faster training. Nonetheless, a real gap exists between simulated and real-world environments, making it difficult to transfer the policy learned during training [111]. Some techniques to overcome this issue are presented in Section 5.
- Sparse reward. Manufacturing tasks usually involve a long sequence of steps before reaching their goal. Generally, this is modelled with a zero reward most of the time and a high reward at the end if the goal is reached [112]. This can discourage the agent during the exploration phase, leading to poor performance. Some solutions are: combining demonstration data with the agent's own experience in a model-based RL algorithm to learn better models; using scripted policies to initialise training, as in QT-Opt [113]; and reward shaping, which provides additional guidance for exploration and boosts the learning process (a minimal shaping sketch is given after this list).
- Reward function. The reward is the most important signal the agent receives because it guides the learning process [114]. For this reason, clearly specifying the goals and rewards is key to achieving a successful learning process. This becomes harder as the task and the environment grow more complex, as is the case for manufacturing tasks in industrial environments. To mitigate this problem, some alternatives are integrating intelligent sensors to provide more information, using heuristic techniques, and replacing the reward function with a model that predicts the reward [115].
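Regarding the stability item above, the following sketch shows how a library such as Optuna can automate hyperparameter selection. The `train_and_evaluate` routine, the toy objective and the chosen hyperparameters are placeholders assumed for illustration, not the procedure of any cited work.

```python
import optuna

def train_and_evaluate(learning_rate, gamma):
    # Placeholder for training a DRL agent (e.g., SAC) and returning its mean episodic return.
    # In a real study this would launch a full training run in the target environment.
    return -(learning_rate - 3e-4) ** 2 - (gamma - 0.99) ** 2   # toy objective

def objective(trial):
    # Optuna samples candidate hyperparameter values for each trial.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    return train_and_evaluate(learning_rate, gamma)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("best hyperparameters:", study.best_params)
```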
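The sample-efficiency advantage of off-policy algorithms rests on reusing past experience. A minimal experience replay buffer, with illustrative capacity and batch size, could look as follows:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions so an off-policy agent can reuse them many times during training."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one interaction with the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Draw a random mini-batch of past transitions for one off-policy update.
        return random.sample(list(self.buffer), batch_size)

buffer = ReplayBuffer()
for step in range(200):                     # toy interaction loop
    buffer.add(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
print(len(buffer.sample()), "transitions sampled for one update")
```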
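The reward shaping mentioned under the sparse-reward item can be illustrated with the classic potential-based formulation, which adds dense guidance without changing the optimal policy. The goal position, potential function and discount value below are arbitrary choices for the sketch:

```python
import numpy as np

GOAL = np.array([1.0, 1.0])   # hypothetical goal state
GAMMA = 0.99                  # discount factor

def phi(state):
    """Potential function: increases (towards zero) as the state approaches the goal."""
    return -np.linalg.norm(state - GOAL)

def shaped_reward(state, reward, next_state):
    """Potential-based shaping: r' = r + GAMMA * phi(s') - phi(s)."""
    return reward + GAMMA * phi(next_state) - phi(state)

# A sparse step (environment reward 0) still yields a positive signal when moving towards the goal.
print(shaped_reward(np.array([0.0, 0.0]), 0.0, np.array([0.2, 0.2])))
```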
4. Deep Reinforcement Learning in the Production Industry
4.1. Path Planning
4.2. Process Control
4.3. Robotics
4.4. Scheduling
4.5. Maintenance
4.6. Energy Management
5. Simulation-to-Reality Transfer
- System identification. This is a wide field that involves creating a precise mathematical representation of the real world in order to increase the realism of the simulation [222]. To this end, the observed data (inputs and outputs) and the knowledge of the process dynamics are used to build the model [223]. This approach has certain drawbacks, which are exacerbated when applied to industrial processes, such as the need for parameter calibration, data gathering and the modelling of how external influences affect the operation of the robot (e.g., the wear and tear of its joints) [224]. Nonetheless, as observed in the aforementioned applications [69,158,159], the development of digital twins is a growing trend for building such accurate mathematical models and is becoming a fundamental pillar of smart manufacturing [225].
- Domain randomisation. To accommodate a wider variety of environmental setups and potential scenarios, simulation parameters are randomly generated [226]. This includes two groups of techniques that bridge the gap between the actual world and the virtual one. On the one hand, visual randomisation introduces randomness into the observations, for example by changing the camera location, artificial lighting and textures [227]. On the other hand, dynamics randomisation entails changing the simulator's physical settings, such as item sizes, friction factors or joint motor characteristics [228] (a minimal dynamics-randomisation sketch is given after this list). Furthermore, the way these parameters are varied is not arbitrary: their sampling distribution affects the performance of the algorithm [229]. Indeed, many variants have been proposed to obtain a more robust performance, such as Active Domain Randomisation [230] and Neural Posterior Domain Randomisation (NPDR) [231].
- Domain adaptation. It aims to harmonise the spaces in order to bridge the gap between the simulated and actual environments. Although action, reward and transition spaces are often similar in simulation and reality, state spaces show more pronounced differences [232]. This is mainly due to the tools for perception and feature extraction employed [233]. For this reason, this technique can be found in many applications as state adaptation. Among the most popular domain adaptation methods, the discrepancy-based [234], adversarial-based [235] and reconstruction-based approaches [236] are noteworthy.
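A minimal sketch of dynamics randomisation, in which the physical parameters of a simulator are resampled before every training episode. The parameter names, ranges, toy environment and placeholder policy are assumptions made only for this example; they do not correspond to any of the cited frameworks.

```python
import random

# Illustrative ranges for the randomised physical settings.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "link_mass_scale": (0.8, 1.2),
    "motor_gain": (0.7, 1.3),
}

def sample_dynamics(ranges):
    """Draw one random configuration of the simulator's physical settings."""
    return {name: random.uniform(low, high) for name, (low, high) in ranges.items()}

class ToySimEnv:
    """Stand-in for a physics simulator whose dynamics can be reconfigured."""
    def set_dynamics(self, params):
        self.params = params
    def reset(self):
        self.steps = 0
        return 0.0
    def step(self, action):
        self.steps += 1
        next_state = action * self.params["motor_gain"] - 0.1 * self.params["friction"]
        done = self.steps >= 5
        return next_state, -abs(next_state), done

def train_with_domain_randomisation(env, episodes):
    """Resample the simulated dynamics each episode so the learned policy becomes
    robust to the mismatch between the simulation and the real system."""
    for episode in range(episodes):
        env.set_dynamics(sample_dynamics(PARAM_RANGES))
        state, done = env.reset(), False
        while not done:
            action = random.uniform(-1.0, 1.0)          # placeholder policy
            state, reward, done = env.step(action)
        print(f"episode {episode}: dynamics = {env.params}")

train_with_domain_randomisation(ToySimEnv(), episodes=3)
```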
Publication | Domain Randomisation | Domain Adaptation | System Identification
---|---|---|---
Deep Reinforcement Learning for Robotic Control in High-Dexterity Assembly Tasks—A Reward Curriculum Approach [103] | ✓ | ||
Towards Real-World Force-Sensitive Robotic Assembly through Deep Reinforcement Learning in Simulations [154] | ✓ | ||
Variable Compliance Control for Robotic Peg-in-Hole Assembly: A Deep-Reinforcement-Learning Approach [104] | ✓ | ||
A flexible manufacturing assembly system with deep reinforcement learning [158] | ✓ | ||
A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping [159] | ✓ | ||
A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence [69] | ✓ | ||
Learning to Centralise Dual-Arm Assembly [237] | ✓ | ||
Sim-to-Real Visual Grasping via State Representation Learning Based on Combining Pixel-Level and Feature-Level Domain Adaptation [238] | ✓ | ||
Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks [110] | ✓ | ||
Preparing for the Unknown: Learning a Universal Policy with Online System Identification [239] | ✓ | ||
MANGA: Method Agnostic Neural-policy Generalisation and Adaptation [220] | ✓ | ||
Sim-to-real transfer reinforcement learning for control of thermal effects of an atmospheric pressure plasma jet [240] | ✓ | ||
Policy Transfer via Kinematic Domain Randomisation and Adaptation [241] | ✓ | ✓ | |
Latent Attention Augmentation for Robust Autonomous Driving Policies [242] | ✓ |
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Abbreviations
A2C | Advantage Actor-Critic |
A3C | Asynchronous Advantage Actor-Critic |
AI | Artificial intelligence |
ANN | Artificial neural network |
AOD | Active object detection |
CNC | Computer numerical control |
CPS | Cyber-physical Systems |
CSTR | Continuous stirred tank reactor |
DDPG | Deep Deterministic Policy Gradient |
DDQN | Double DQN |
DNN | Deep neural network |
DQN | Deep Q-Networks |
DRL | Deep reinforcement learning |
FIFO | First-in first-out |
GDP | Gross domestic product |
HER | Hindsight experience replay |
HRL | Hierarchical reinforcement learning |
I2A | Imagination-Augmented Agents |
I4.0 | Industry 4.0 |
I5.0 | Industry 5.0 |
IoT | Internet of Things |
MDP | Markov Decision Process |
ME-MADDPG | Mixed Experience Multi-agent DDPG |
ML | Machine learning |
NPDR | Neural Posterior Domain Randomisation |
PCB | Printed circuit board |
P-DQN | Prioritised DQN |
PER | Prioritised experience replay |
PID | Proportional-integral-derivative |
PO | Policy optimisation |
PPO | Proximal policy optimisation |
PRM | Probabilistic Roadmap |
QR-DQN | Quantile Regression DQN |
RL | Reinforcement learning |
SAC | Soft Actor-Critic |
SSTGCN | Social Spatial–Temporal Graph Convolution Network |
TD3 | Twin Delayed Deep Deterministic Policy Gradient |
References
- Pereira, A.; Romero, F. A review of the meanings and the implications of the Industry 4.0 concept. Procedia Manuf. 2017, 13, 1206–1214. [Google Scholar] [CrossRef]
- Lasi, H.; Fettke, P.; Kemper, H.G.; Feld, T.; Hoffmann, M. Industry 4.0. Bus. Inf. Syst. Eng. 2014, 6, 239–242. [Google Scholar] [CrossRef]
- Meena, M.; Wangtueai, S.; Mohammed Sharafuddin, A.; Chaichana, T. The Precipitative Effects of Pandemic on Open Innovation of SMEs: A Scientometrics and Systematic Review of Industry 4.0 and Industry 5.0. J. Open Innov. Technol. Mark. Complex. 2022, 8, 152. [Google Scholar] [CrossRef]
- Industry 5.0—Publications Office of the EU. Available online: https://op.europa.eu/en/publication-detail/-/publication/468a892a-5097-11eb-b59f-01aa75ed71a1/ (accessed on 10 October 2022).
- Xu, X.; Lu, Y.; Vogel-Heuser, B.; Wang, L. Industry 4.0 and Industry 5.0—Inception, conception and perception. J. Manuf. Syst. 2021, 61, 530–535. [Google Scholar] [CrossRef]
- Crnjac, Z.M.; Mladineo, M.; Gjeldum, N.; Celent, L. From Industry 4.0 towards Industry 5.0: A Review and Analysis of Paradigm Shift for the People, Organization and Technology. Energies 2022, 15, 5221. [Google Scholar] [CrossRef]
- The World Bank. Manufacturing, Value Added (% of GDP)—World|Data. 2021. Available online: https://data.worldbank.org/indicator/NV.IND.MANF.ZS (accessed on 11 October 2022).
- The World Bank. Manufacturing, Value Added (% of GDP)—European Union|Data. 2021. Available online: https://data.worldbank.org/indicator/NV.IND.MANF.ZS?locations=EU&name_desc=false (accessed on 11 October 2022).
- Yin, R. Concept and Theory of Dynamic Operation of the Manufacturing Process. In Theory and Methods of Metallurgical Process Integration; Academic Press: Cambridge, MA, USA, 2016; pp. 13–53. [Google Scholar] [CrossRef]
- Stavropoulos, P.; Chantzis, D.; Doukas, C.; Papacharalampopoulos, A.; Chryssolouris, G. Monitoring and Control of Manufacturing Processes: A Review. Procedia CIRP 2013, 8, 421–425. [Google Scholar] [CrossRef] [Green Version]
- Wuest, T.; Weimer, D.; Irgens, C.; Thoben, K.D. Machine learning in manufacturing: Advantages, challenges, and applications. Prod. Manuf. Res. 2016, 4, 23–45. [Google Scholar] [CrossRef] [Green Version]
- Panzer, M.; Bender, B. Deep reinforcement learning in production systems: A systematic literature review. Int. J. Prod. Res. 2021, 60, 4316–4341. [Google Scholar] [CrossRef]
- Maddikunta, P.K.R.; Pham, Q.-V.; B, P.; Deepa, N.; Dev, K.; Gadekallu, T.R.; Ruby, R.; Liyanage, M. Industry 5.0: A survey on enabling technologies and potential applications. J. Ind. Inf. Integr. 2022, 26, 100257. [Google Scholar] [CrossRef]
- Bigan, C. Trends in Teaching Artificial Intelligence for Industry 5.0. In Sustainability and Innovation in Manufacturing Enterprises; Springer: Singapore, 2022; pp. 257–274. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Finite Markov Decision Processes. In Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2020; pp. 47–68. Available online: http://incompleteideas.net/book/RLbook2020.pdf (accessed on 17 September 2022).
- Virvou, M.; Alepis, E.; Tsihrintzis, G.A.; Jain, L.C. Machine Learning Paradigms; Springer: Cham, Switzerland, 2020. [Google Scholar] [CrossRef]
- Coursera. 3 Types of Machine Learning You Should Know. 2022. Available online: https://www.coursera.org/articles/types-of-machine-learning (accessed on 5 November 2022).
- Wiering, M.; Otterlo, M. Reinforcement learning. Adaptation, learning, and optimization. In Reinforcement Learning State-of-the-Art; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
- Bellman, R.E. A Markovian Decision Process. J. Math. Mech. 1957, 6, 679–684. [Google Scholar] [CrossRef]
- van Otterlo, M.; Wiering, M. Reinforcement learning and markov decision processes. In Reinforcement Learning; Springer: Berlin, Germany, 2012; Volume 12. [Google Scholar] [CrossRef]
- Dayan, P.; Balleine, B.W. Reward, motivation, and reinforcement learning. Neuron 2002, 36, 285–298. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hayes, C.F.; Rădulescu, R.; Bargiacchi, E.; Källström, J.; Macfarlane, M.; Reymond, M. A practical guide to multi-objective reinforcement learning and planning. Auton. Agents Multi-Agent Syst. 2022, 36, 1–59. [Google Scholar] [CrossRef]
- Yogeswaran, M.; Ponnambalam, S.G. Reinforcement learning: Exploration-exploitation dilemma in multi-agent foraging task. OPSEARCH 2012, 49, 223–236. [Google Scholar] [CrossRef]
- Coggan, M. Exploration and exploitation in reinforcement learning. In CRA-W DMP Project; Working Paper of the Research Supervised by Prof. Doina Precup; McGill University: Montreal, QC, Canada, 2004. [Google Scholar]
- Mcfarlane, R. A Survey of Exploration Strategies in Reinforcement Learning. J. Mach. Learn. Res. 2018, 1, 10. [Google Scholar]
- Furelos-Blanco, D.; Law, M.; Jonsson, A.; Broda, K.; Russo, A. Induction and exploitation of subgoal automata for reinforcement learning. J. Artif. Intell. Res. 2021, 70, 1031–1116. [Google Scholar] [CrossRef]
- Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Perez, P. Deep Reinforcement Learning for Autonomous Driving: A Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4909–4926. [Google Scholar] [CrossRef]
- Polvara, R.; Patacchiola, M.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R.; Cangelosi, A. Autonomous quadrotor landing using deep reinforcement learning. arXiv 2017, arXiv:1709.03339. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Lee, S.; Bang, H. Automatic Gain Tuning Method of a Quad-Rotor Geometric Attitude Controller Using A3C. Int. J. Aeronaut. Space Sci. 2020, 21, 469–478. [Google Scholar] [CrossRef]
- Laud, A.D. Theory and Application of Reward Shaping in Reinforcement Learning. Ph.D. Dissertation, University of Illinois, Urbana-Champaign, IL, USA, 2004. [Google Scholar]
- Marom, O.; Rosman, B. Belief Reward Shaping in Reinforcement Learning. Proc. AAAI Conf. Artif. Intell. 2018, 32, 3762–3769. [Google Scholar] [CrossRef]
- Clark, J.; Amodei, D. Faulty Reward Functions in the Wild. 2016. Available online: https://openai.com/blog/faulty-reward-functions/ (accessed on 9 November 2022).
- Irpan, A. Deep Reinforcement Learning Doesn’t Work Yet. 2018. Available online: https://www.alexirpan.com/2018/02/14/rl-hard.html (accessed on 9 November 2022).
- Ladosz, P.; Weng, L.; Kim, M.; Oh, H. Exploration in deep reinforcement learning: A survey. Inf. Fusion 2022, 85, 1–22. [Google Scholar] [CrossRef]
- Asiain, E.; Clempner, J.B.; Poznyak, A.S. Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies. Soft Comput. 2019, 23, 3591–3604. [Google Scholar] [CrossRef]
- Schäfer, L.; Christianos, F.; Hanna, J.; Albrecht, S.V. Decoupling exploration and exploitation in reinforcement learning. arXiv 2021, arXiv:2107.08966. [Google Scholar]
- Chen, W.H. Perspective view of autonomous control in unknown environment: Dual control for exploitation and exploration vs reinforcement learning. Neurocomputing 2022, 497, 50–63. [Google Scholar] [CrossRef]
- Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296. [Google Scholar] [CrossRef]
- McLaren, C.D.; Bruner, M.W. Citation network analysis. Int. Rev. Sport Exerc. Psychol. 2022, 15, 179–198. [Google Scholar] [CrossRef]
- Shi, Y.; Blainey, S.; Sun, C.; Jing, P. A literature review on accessibility using bibliometric analysis techniques. J. Transp. Geogr. 2020, 87, 102810. [Google Scholar] [CrossRef]
- Aria, M.; Cuccurullo, C. Bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975. [Google Scholar] [CrossRef]
- R-Project. Available online: https://www.r-project.org (accessed on 1 November 2022).
- Cytoscape. Available online: https://cytoscape.org (accessed on 1 November 2022).
- Zuluaga, M.; Robledo, S.; Osorio-Zuluaga, G.A.; Yathe, L.; Gonzalez, D.; Taborda, G. Metabolomics and pesticides: Systematic literature review using graph theory for analysis of references. Nova 2016, 14, 121–138. [Google Scholar] [CrossRef]
- Thakur, D.; Wang, J.; Cozzens, S. What does international co-authorship measure? In Proceedings of the 2011 Atlanta Conference on Science and Innovation Policy, Atlanta, GA, USA, 15–17 September 2011. [Google Scholar] [CrossRef]
- Khor, K.A.; Yu, L.G. Influence of international co-authorship on the research citation impact of young universities. Scientometrics 2016, 107, 1095–1110. [Google Scholar] [CrossRef] [Green Version]
- Nash-Stewart, C.E.; Kruesi, L.M.; del Mar, C.B. Does Bradford’s Law of Scattering predict the size of the literature in Cochrane Reviews? J. Med. Libr. Assoc. 2012, 100, 135. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3389–3396. [Google Scholar]
- Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-End Training of Deep Visuomotor Policies. J. Mach. Learn. Res. 2016, 17, 1–40. [Google Scholar]
- Uhlenbeck, G.E.; Ornstein, L.S. On the Theory of the Brownian Motion. Phys. Rev. 1930, 36, 823. [Google Scholar] [CrossRef]
- Maciejewski, A.A.; Klein, C.A. Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments. Int. J. Robot. Res. 1985, 4, 109–117. [Google Scholar] [CrossRef] [Green Version]
- François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learn. 2018, 11, 219–354. [Google Scholar] [CrossRef] [Green Version]
- Chen, L. Deep reinforcement learning. In Deep Learning and Practice with MindSpore; Springer: Singapore, 2021. [Google Scholar] [CrossRef]
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef] [Green Version]
- Sewak, M. Deep Reinforcement Learning—Frontiers of Artificial Intelligence, 1st ed.; Springer: Singapore, 2019. [Google Scholar]
- Yang, Y.; Kayid, A.; Mehta, D. State-of-the-Art Reinforcement Learning Algorithms. IJERT J. Int. J. Eng. Res. Technol. 2020, 8, 6. [Google Scholar]
- Moerland, T.M.; Broekens, J.; Jonker, C.M. Model-based Reinforcement Learning: A Survey. arXiv 2020, arXiv:2006.16712. [Google Scholar]
- Kaiser, Ł.; Babaeizadeh, M.; Miłos, P.; Osinski, B.; Campbell, R.H.; Czechowski, K.; Erhan, D.; Finn, C.; Kozakowski, P.; Levine, S.; et al. Model-Based Reinforcement Learning for Atari. In Proceedings of the International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Plaat, A.; Kosters, W.; Preuss, M. Deep model-based reinforcement learning for high-dimensional problems, a survey. arXiv 2020, arXiv:2008.05598. [Google Scholar]
- Janner, M.; Fu, J.; Zhang, M.; Levine, S. When to trust your model: Model-based policy optimization. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Wang, T.; Bao, X.; Clavera, I.; Hoang, J.; Wen, Y.; Langlois, E.; Zhang, S.; Zhang, G.; Abbeel, P.; Ba, J. Benchmarking model-based reinforcement learning. arXiv 2019, arXiv:1907.02057. [Google Scholar]
- Sun, W.; Jiang, N.; Krishnamurthy, A.; Agarwal, A.; Langford, J. Model-based RL in contextual decision processes: PAC bounds and exponential improvements over model-free approaches. In Proceedings of the Thirty-Second Conference on Learning Theory, Phoenix, AZ, USA, 25–28 June 2019. [Google Scholar]
- Luo, F.-M.; Xu, T.; Lai, H.; Chen, X.-H.; Zhang, W.; Yu, Y. A survey on model-based reinforcement learning. arXiv 2022, arXiv:2206.09328. [Google Scholar]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef] [PubMed]
- Feng, D.; Gomes, C.P.; Selman, B. Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan, 7–15 January 2021; pp. 2198–2205. [Google Scholar]
- Matulis, M.; Harvey, C. A robot arm digital twin utilising reinforcement learning. Comput. Graph. 2021, 95, 106–114. [Google Scholar] [CrossRef]
- Xia, K.; Sacco, C.; Kirkpatrick, M.; Saidy, C.; Nguyen, L.; Kircaliali, A.; Harik, R. A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence. J. Manuf. Syst. 2021, 58, 210–230. [Google Scholar] [CrossRef]
- Wiering, M.A.; Withagen, M.; Drugan, M.M. Model-based multi-objective reinforcement learning. In Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Orlando, FL, USA, 9–12 December 2014. [Google Scholar] [CrossRef] [Green Version]
- Kurutach, T.; Clavera, I.; Duan, Y.; Tamar, A.; Abbeel, P. METRPO: Model-ensemble trust-region policy optimization. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Rajeswaran, A.; Mordatch, I.; Kumar, V. A game theoretic framework for model based reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual, 13–18 July 2020. [Google Scholar]
- Shen, J.; Zhao, H.; Zhang, W.; Yu, Y. Model-based policy optimization with unsupervised model adaptation. Adv. Neural Inf. Process. Syst. 2020, 33, 2823–2834. [Google Scholar]
- Ha, D.; Schmidhuber, J. World Models. Forecast. Bus. Econ. 2018, 201–209. [Google Scholar] [CrossRef]
- Racanière, S.; Weber, T.; Reichert, D.; Buesing, L.; Guez, A.; Rezende, D.J.; Badia, A.P.; Vinyals, O.; Heess, N.; Li, Y.; et al. Imagination-Augmented Agents for Deep Reinforcement Learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5691–5702. [Google Scholar]
- Edwards, A.D.; Downs, L.; Davidson, J.C. Forward-backward reinforcement learning. arXiv 2018, arXiv:1803.10227. [Google Scholar]
- van Hasselt, H.; Hessel, M.; Aslanides, J. When to use parametric models in reinforcement learning? In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Ramírez, J.; Yu, W.; Perrusquía, A. Model-free reinforcement learning from expert demonstrations: A survey. Artif. Intell. Rev. 2022, 55, 3213–3241. [Google Scholar] [CrossRef]
- Otto, F. Model-Free Deep Reinforcement Learning—Algorithms and Applications. In Reinforcement Learning Algorithms: Analysis and Applications; Springer: Cham, Switzerland, 2021; Volume 883. [Google Scholar] [CrossRef]
- Hausknecht, M.; Stone, P.; Mc, O. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York City, NY, USA, 9–15 July 2016. [Google Scholar]
- Tan, Z.; Karakose, M. On-Policy Deep Reinforcement Learning Approach to Multi Agent Problems. In Interdisciplinary Research in Technology and Management; CRC Press: Boca Raton, FL, USA, 2021. [Google Scholar] [CrossRef]
- Andrychowicz, M.; Raichuk, A.; Stańczyk, P.; Orsini, M.; Girgin, S.; Marinier, R.; Hussenot, L.; Geist, M.; Pietquin, O.; Michalski, M.; et al. What matters in on-policy reinforcement learning? a large-scale empirical study. arXiv 2020, arXiv:2006.05990. [Google Scholar]
- Agarwal, R.; Schuurmans, D.; Norouzi, M. Striving for Simplicity in Off-Policy Deep Reinforcement Learning. 2019. Available online: https://openreview.net/forum?id=ryeUg0VFwr (accessed on 14 October 2022).
- Zimmer, M.; Boniface, Y.; Dutech, A. Off-Policy Neural Fitted Actor-Critic. In Proceedings of the Deep Reinforcement Learning Workshop (NIPS 2016), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Fujimoto, S.; Meger, D.; Precup, D. Off-policy deep reinforcement learning without exploration. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
- Clemente, A.V.; Castejón, H.N.; Chandra, A. Efficient Parallel Methods for Deep Reinforcement Learning. arXiv 2017, arXiv:1705.04862. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Openai, O.K. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Huang, Y. Deep Q-networks. In Deep Reinforcement Learning: Fundamentals, Research and Applications; Dong, H., Ding, Z., Zhang, S., Eds.; Springer Nature: Singapore, 2020. [Google Scholar]
- Dabney, W.; Rowland, M.; Bellemare, M.G.; Munos, R. Distributional Reinforcement Learning with Quantile Regression. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, Hilton New Orleans Riverside, New Orleans, LA, USA, 2–7 February 2018; pp. 2892–2901. [Google Scholar]
- Andrychowicz, M.; Wolski, F.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Abbeel, P.; Zaremba, W. Hindsight Experience Replay. Available online: https://goo.gl/SMrQnI (accessed on 4 October 2022).
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. January 2018. Available online: http://arxiv.org/abs/1801.01290 (accessed on 4 October 2022).
- Casas, N. Deep deterministic policy gradient for urban traffic light control. arXiv 2017, arXiv:1703.09035. [Google Scholar]
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; pp. 2587–2601. [Google Scholar]
- Saeed, M.; Nagdi, M.; Rosman, B.; Ali, H.H.S.M. Deep Reinforcement Learning for Robotic Hand Manipulation. In Proceedings of the 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE 2020), Khartoum, Sudan, 26 February–1 March 2021. [Google Scholar] [CrossRef]
- Serrano-Ruiz, J.C.; Mula, J.; Poler, R. Smart manufacturing scheduling: A literature review. J. Manuf. Syst. 2021, 61, 265–287. [Google Scholar] [CrossRef]
- Kuo, R.J.; Cohen, P.H. Manufacturing process control through integration of neural networks and fuzzy model. Fuzzy Sets Syst. 1998, 98, 15–31. [Google Scholar] [CrossRef]
- Chien, C.F.; Dauzère-Pérès, S.; Huh, W.T.; Jang, Y.J.; Morrison, J.R. Artificial intelligence in manufacturing and logistics systems: Algorithms, applications, and case studies. Int. J. Prod. Res. 2020, 58, 2730–2731. [Google Scholar] [CrossRef]
- Morgan, J.; Halton, M.; Qiao, Y.; Breslin, J.G. Industry 4.0 smart reconfigurable manufacturing machines. J. Manuf. Syst. 2020, 59, 481–506. [Google Scholar] [CrossRef]
- Oliff, H.; Liu, Y.; Kumar, M.; Williams, M.; Ryan, M. Reinforcement learning for facilitating human-robot-interaction in manufacturing. J. Manuf. Syst. 2020, 56, 326–340. [Google Scholar] [CrossRef]
- Lin, C.C.; Deng, D.J.; Chih, Y.L.; Chiu, H.T. Smart Manufacturing Scheduling with Edge Computing Using Multiclass Deep Q Network. IEEE Trans. Ind. Inform. 2019, 15, 4276–4284. [Google Scholar] [CrossRef]
- Rodríguez, M.L.R.; Kubler, S.; de Giorgio, A.; Cordy, M.; Robert, J.; Le Traon, Y. Multi-agent deep reinforcement learning based Predictive Maintenance on parallel machines. Robot. Comput. Integr. Manuf. 2022, 78, 102406. [Google Scholar] [CrossRef]
- Leyendecker, L.; Schmitz, M.; Zhou, H.A.; Samsonov, V.; Rittstieg, M.; Lutticke, D. Deep Reinforcement Learning for Robotic Control in High-Dexterity Assembly Tasks-A Reward Curriculum Approach. In Proceedings of the 2021 Fifth IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan, 15–17 November 2021; pp. 35–42. [Google Scholar] [CrossRef]
- Beltran-Hernandez, C.C.; Petit, D.; Ramirez-Alpizar, I.G.; Harada, K. Variable Compliance Control for Robotic Peg-in-Hole Assembly: A Deep-Reinforcement-Learning Approach. Appl. Sci. 2020, 10, 6923. [Google Scholar] [CrossRef]
- Ibarz, J.; Tan, J.; Finn, C.; Kalakrishnan, M.; Pastor, P.; Levine, S. How to train your robot with deep reinforcement learning: Lessons we have learned. Int. J. Robot. Res. 2021, 40, 698–721. [Google Scholar] [CrossRef]
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft Actor-Critic Algorithms and Applications. arXiv 2018, arXiv:1812.05905. [Google Scholar]
- Yang, T.; Tang, H.; Bai, C.; Liu, J.; Hao, J.; Meng, Z.; Liu, P.; Wang, Z. Exploration in Deep Reinforcement Learning: A Comprehensive Survey. arXiv 2021, arXiv:2109.06668. [Google Scholar]
- He, L.; Aouf, N.; Whidborne, J.F.; Song, B. Integrated moment-based LGMD and deep reinforcement learning for UAV obstacle avoidance. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 7491–7497. [Google Scholar] [CrossRef]
- Aumjaud, P.; McAuliffe, D.; Rodríguez-Lera, F.J.; Cardiff, P. Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks. Adv. Intell. Syst. Comput. 2021, 1285, 318–331. [Google Scholar] [CrossRef]
- Salvato, E.; Fenu, G.; Medvet, E.; Pellegrino, F.A. Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning. IEEE Access 2021, 9, 153171–153187. [Google Scholar] [CrossRef]
- Sutton, R.; Barto, A. Frontiers. In Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2020; pp. 459–475. Available online: http://incompleteideas.net/book/RLbook2020.pdf (accessed on 1 October 2022).
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. arXiv 2018, arXiv:1806.10293. [Google Scholar]
- Matignon, L.; Laurent, G.J.; le Fort-Piat, N. Reward function and initial values: Better choices for accelerated goal-directed reinforcement learning. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 2006; pp. 840–849. [Google Scholar] [CrossRef]
- Eschmann, J. Reward Function Design in Reinforcement Learning. Stud. Comput. Intell. 2021, 883, 25–33. [Google Scholar] [CrossRef]
- Lee, J.; Bagheri, B.; Kao, H.A. A Cyber-Physical Systems architecture for Industry 4.0-based manufacturing systems. Manuf. Lett. 2015, 3, 18–23. [Google Scholar] [CrossRef]
- OpenAI. Available online: https://openai.com (accessed on 1 November 2022).
- DeepMind. Available online: https://www.deepmind.com (accessed on 1 November 2022).
- Azeem, M.; Haleem, A.; Javaid, M. Symbiotic Relationship between Machine Learning and Industry 4.0: A Review. J. Ind. Integr. Manag. 2021, 7. [Google Scholar] [CrossRef]
- Nguyen, H.; La, H. Review of Deep Reinforcement Learning for Robot Manipulation. In Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019. [Google Scholar] [CrossRef]
- Liu, Y.; Ping, Y.; Zhang, L.; Wang, L.; Xu, X. Scheduling of decentralized robot services in cloud manufacturing with deep reinforcement learning. Robot. Comput.-Integr. Manuf. 2023, 80, 102454. [Google Scholar] [CrossRef]
- Xing, Q.; Chen, Z.; Zhang, T.; Li, X.; Sun, K.Y. Real-time optimal scheduling for active distribution networks: A graph reinforcement learning method. Int. J. Electr. Power Energy Syst. 2023, 145, 108637. [Google Scholar] [CrossRef]
- Rupprecht, T.; Wang, Y. A survey for deep reinforcement learning in markovian cyber–physical systems: Common problems and solutions. Neural Netw. 2022, 153, 13–26. [Google Scholar] [CrossRef] [PubMed]
- Cao, D.; Hu, W.; Zhao, J.; Zhang, G.; Zhang, B.; Liu, Z.; Chen, Z.; Blaabjerg, F. Reinforcement Learning and Its Applications in Modern Power and Energy Systems: A Review. J. Mod. Power Syst. Clean Energy 2020, 8, 1029–1042. [Google Scholar] [CrossRef]
- Sun, Y.; Jia, J.; Xu, J.; Chen, M.; Niu, J. Path, feedrate and trajectory planning for free-form surface machining: A state-of-the-art review. Chin. J. Aeronaut. 2022, 35, 12–29. [Google Scholar] [CrossRef]
- Sánchez-Ibáñez, J.R.; Pérez-Del-Pulgar, C.J.; García-Cerezo, A. Path planning for autonomous mobile robots: A review. Sensors 2021, 21, 7898. [Google Scholar] [CrossRef]
- Jiang, J.; Ma, Y. Path planning strategies to optimize accuracy, quality, build time and material use in additive manufacturing: A review. Micromachines 2020, 11, 633. [Google Scholar] [CrossRef]
- Patle, B.; L, G.B.; Pandey, A.; Parhi, D.; Jagadeesh, A. A review: On path planning strategies for navigation of mobile robot. Def. Technol. 2019, 15, 582–606. [Google Scholar] [CrossRef]
- Qiu, T.; Cheng, Y. Applications and Challenges of Deep Reinforcement Learning in Multi-robot Path Planning. J. Electron. Res. Appl. 2021, 5, 25–29. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhang, Y.; Wang, S. A Review of Mobile Robot Path Planning Based on Deep Reinforcement Learning Algorithm. J. Phys. Conf. Ser. 2021, 2138, 012011. [Google Scholar] [CrossRef]
- Huo, Q. Multi-objective vehicle path planning based on DQN. In Proceedings of the International Conference on Cloud Computing, Performance Computing, and Deep Learning (CCPCDL 2022), Wuhan, China, 11–13 March 2022; p. 12287. [Google Scholar]
- Wang, J.; Zhang, T.; Ma, N.; Li, Z.; Ma, H.; Meng, F.; Meng, M.Q. A survey of learning-based robot motion planning. IET Cyber-Syst. Robot. 2021, 3, 302–314. [Google Scholar] [CrossRef]
- Fang, F.; Liang, W.; Wu, Y.; Xu, Q.; Lim, J.-H. Self-Supervised Reinforcement Learning for Active Object Detection. IEEE Robot. Autom. Lett. 2022, 7, 10224–10231. [Google Scholar] [CrossRef]
- Lv, L.; Zhang, S.; Ding, D.; Wang, Y. Path Planning via an Improved DQN-Based Learning Policy. IEEE Access 2019, 7, 67319–67330. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, Z.; Li, Y.; Lu, M.; Chen, C.; Zhang, X. Robot Search Path Planning Method Based on Prioritized Deep Reinforcement Learning. Int. J. Control. Autom. Syst. 2022, 20, 2669–2680. [Google Scholar] [CrossRef]
- Wang, Y.; Fang, Y.; Lou, P.; Yan, J.; Liu, N. Deep Reinforcement Learning based Path Planning for Mobile Robot in Unknown Environment. J. Phys. Conf. Ser. 2020, 1576, 012009. [Google Scholar] [CrossRef]
- Zhou, Z.; Zhu, P.; Zeng, Z.; Xiao, J.; Lu, H.; Zhou, Z. Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning. Appl. Intell. 2022, 52, 15600–15616. [Google Scholar] [CrossRef]
- Lu, Y.; Ruan, X.; Huang, J. Deep Reinforcement Learning Based on Social Spatial–Temporal Graph Convolution Network for Crowd Navigation. Machines 2022, 10, 703. [Google Scholar] [CrossRef]
- Gao, J.; Ye, W.; Guo, J.; Li, Z. Deep Reinforcement Learning for Indoor Mobile Robot Path Planning. Sensors 2020, 20, 5493. [Google Scholar] [CrossRef]
- Wu, D.; Wan, K.; Gao, X.; Hu, Z. Multiagent Motion Planning Based on Deep Reinforcement Learning in Complex Environments. In Proceedings of the 2021 6th International Conference on Control and Robotics Engineering (ICCRE 2021), Beijing, China, 16–18 April 2021; pp. 123–128. [Google Scholar] [CrossRef]
- Nolan, D.P. Process Controls. In Handbook of Fire and Explosion Protection Engineering Principles, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2011; pp. 113–118. [Google Scholar] [CrossRef]
- Karigiannis, J.N.; Laurin, P.; Liu, S.; Holovashchenko, V.; Lizotte, A.; Roux, V.; Boulet, P. Reinforcement Learning Enabled Self-Homing of Industrial Robotic Manipulators in Manufacturing. Manuf. Lett. 2022, 33, 909–918. [Google Scholar] [CrossRef]
- Szarski, M.; Chauhan, S. Composite temperature profile and tooling optimization via Deep Reinforcement Learning. Compos. Part A Appl. Sci. Manuf. 2021, 142, 106235. [Google Scholar] [CrossRef]
- Deng, J.; Sierla, S.; Sun, J.; Vyatkin, V. Reinforcement learning for industrial process control: A case study in flatness control in steel industry. Comput. Ind. 2022, 143, 103748. [Google Scholar] [CrossRef]
- Li, Y. Deep Reinforcement Learning: An Overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
- Fusayasu, H.; Heya, A.; Hirata, K. Robust control of three-degree-of-freedom spherical actuator based on deep reinforcement learning. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 749–756. [Google Scholar] [CrossRef]
- Ma, Y.; Zhu, W.; Benton, M.G.; Romagnoli, J. Continuous control of a polymerization system with deep reinforcement learning. J. Process. Control. 2019, 75, 40–47. [Google Scholar] [CrossRef]
- Neumann, M.; Palkovits, D.S. Reinforcement Learning Approaches for the Optimization of the Partial Oxidation Reaction of Methane. Ind. Eng. Chem. Res. 2022, 61, 3910–3916. Available online: https://doi.org/10.1021/ACS.IECR.1C04622/ASSET/IMAGES/LARGE/IE1C04622_0010.JPEG (accessed on 31 October 2022).
- Yifei, Y.; Lakshminarayanan, S. Multi-Agent Reinforcement Learning System for Multiloop Control of Chemical Processes. In Proceedings of the 2022 IEEE International Symposium on Advanced Control of Industrial Processes (AdCONIP), Vancouver, BC, Canada, 7–9 August 2022; pp. 48–53. [Google Scholar] [CrossRef]
- Dutta, D.; Upreti, S.R. A survey and comparative evaluation of actor-critic methods in process control. Can. J. Chem. Eng. 2022, 100, 2028–2056. [Google Scholar] [CrossRef]
- Suomalainen, M.; Karayiannidis, Y.; Kyrki, V. A survey of robot manipulation in contact. Robot. Auton. Syst. 2022, 156, 104224. [Google Scholar] [CrossRef]
- Mohammed, M.Q.; Kwek, L.C.; Chua, S.C.; Al-Dhaqm, A.; Nahavandi, S.; Eisa, T.A.E.; Miskon, M.F.; Al-Mhiqani, M.N.; Ali, A.; Abaker, M.; et al. Review of Learning-Based Robotic Manipulation in Cluttered Environments. Sensors 2022, 22, 7938. [Google Scholar] [CrossRef]
- Zhou, Z.; Ni, P.; Zhu, X.; Cao, Q. Compliant Robotic Assembly based on Deep Reinforcement Learning. In Proceedings of the 2021 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), Chongqing, China, 9–11 July 2021. [Google Scholar] [CrossRef]
- Hebecker, M.; Lambrecht, J.; Schmitz, M. Towards real-world force-sensitive robotic assembly through deep reinforcement learning in simulations. In Proceedings of the 2021 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Delft, The Netherlands, 12–16 July 2021; pp. 1045–1051. [Google Scholar] [CrossRef]
- Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum learning for reinforcement learning domains: A framework and survey. arXiv 2020, arXiv:2003.04960. [Google Scholar]
- Bosch, A.V.D.; Hengst, B.; Lloyd, J.; Miikkulainen, R.; Blockeel, H. Hierarchical Reinforcement Learning. In Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2011; pp. 495–502. [Google Scholar] [CrossRef]
- Wang, C.; Lin, C.; Liu, B.; Su, C.; Xu, P.; Xie, L. Deep Reinforcement Learning with Shaping Exploration Space for Robotic Assembly. In Proceedings of the 2021 3rd International Symposium on Robotics & Intelligent Manufacturing Technology (ISRIMT), Changzhou, China, 24–26 September 2021. [Google Scholar] [CrossRef]
- Li, J.; Pang, D.; Zheng, Y.; Guan, X.; Le, X. A flexible manufacturing assembly system with deep reinforcement learning. Control Eng. Pract. 2022, 118, 104957. [Google Scholar] [CrossRef]
- Liu, Y.; Xu, H.; Liu, D.; Wang, L. A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping. Robot. Comput. Integr. Manuf. 2022, 78, 102365. [Google Scholar] [CrossRef]
- Lobbezoo, A.; Qian, Y.; Kwon, H.-J. Reinforcement Learning for Pick and Place Operations in Robotics: A Survey. Robotics 2021, 10, 105. [Google Scholar] [CrossRef]
- Zeng, R.; Liu, M.; Zhang, J.; Li, X.; Zhou, Q.; Jiang, Y. Manipulator Control Method Based on Deep Reinforcement Learning. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 415–420. [Google Scholar] [CrossRef]
- Dai, J.; Zhu, M.; Feng, Y. Stiffness Control for a Soft Robotic Finger based on Reinforcement Learning for Robust Grasping. In Proceedings of the 2021 27th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), Shanghai, China, 26–28 November 2021; pp. 540–545. [Google Scholar] [CrossRef]
- Marzari, L.; Pore, A.; Dall’Alba, D.; Aragon-Camarasa, G.; Farinelli, A.; Fiorini, P. Towards Hierarchical Task Decomposition using Deep Reinforcement Learning for Pick and Place Subtasks. In Proceedings of the 2021 20th International Conference on Advanced Robotics (ICAR 2021), Virtual Event, 6–10 December 2021; pp. 640–645. [Google Scholar]
- Kim, M.; Han, D.-K.; Park, J.-H.; Kim, J.-S. Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay. Appl. Sci. 2020, 10, 575. [Google Scholar] [CrossRef] [Green Version]
- Shahid, A.A.; Piga, D.; Braghin, F.; Roveda, L. Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning. Auton. Robot. 2022, 46, 483–498. [Google Scholar] [CrossRef]
- Wang, L.; Pan, Z.; Wang, J. A Review of Reinforcement Learning Based Intelligent Optimization for Manufacturing Scheduling. Complex Syst. Model. Simul. 2022, 1, 257–270. [Google Scholar] [CrossRef]
- Prashar, A.; Tortorella, G.L.; Fogliatto, F.S. Production scheduling in Industry 4.0: Morphological analysis of the literature and future research agenda. J. Manuf. Syst. 2022, 65, 33–43. [Google Scholar] [CrossRef]
- Rosenberger, J.; Urlaub, M.; Rauterberg, F.; Lutz, T.; Selig, A.; Bühren, M.; Schramm, D. Deep Reinforcement Learning Multi-Agent System for Resource Allocation in Industrial Internet of Things. Sensors 2022, 22, 4099. [Google Scholar] [CrossRef]
- Hu, C.; Wang, Q.; Gong, W.; Yan, X. Multi-objective deep reinforcement learning for emergency scheduling in a water distribution network. Memetic Comput. 2022, 14, 211–223. [Google Scholar] [CrossRef]
- Baer, S.; Bakakeu, J.; Meyes, R.; Meisen, T. Multi-agent reinforcement learning for job shop scheduling in flexible manufacturing systems. In Proceedings of the 2019 Second International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills, CA, USA, 25–27 September 2019. [Google Scholar] [CrossRef]
- Esteso, A.; Peidro, D.; Mula, J.; Díaz-Madroñero, M. Reinforcement learning applied to production planning and control. Int. J. Prod. Res. 2022. [Google Scholar] [CrossRef]
- Liu, L.; Zhu, J.; Chen, J.; Ye, H. Cooperative optimal scheduling strategy of source and storage in microgrid based on soft actor-critic. Dianli Zidonghua Shebei/Electr. Power Autom. Equip. 2022, 42. [Google Scholar] [CrossRef]
- Andreiana, D.S.; Galicia, L.E.A.; Ollila, S.; Guerrero, C.L.; Roldán, Á.O.; Navas, F.D.; Torres, A.D.R. Steelmaking Process Optimised through a Decision Support System Aided by Self-Learning Machine Learning. Processes 2022, 10, 434. [Google Scholar] [CrossRef]
- Roldán, Á.O.; Gassner, G.; Schlautmann, M.; Galicia, L.E.A.; Andreiana, D.S.; Heiskanen, M.; Guerrero, C.L.; Navas, F.D.; Torres, A.D.R. Optimisation of Operator Support Systems through Artificial Intelligence for the Cast Steel Industry: A Case for Optimisation of the Oxygen Blowing Process Based on Machine Learning Algorithms. J. Manuf. Mater. Process. 2022, 6, 34. [Google Scholar] [CrossRef]
- Fu, F.; Kang, Y.; Zhang, Z.; Yu, F.R. Transcoding for live streaming-based on vehicular fog computing: An actor-critic DRL approach. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020. [Google Scholar] [CrossRef]
- Xu, Y.; Zhao, J. Actor-Critic with Transformer for Cloud Computing Resource Three Stage Job Scheduling. In Proceedings of the 2022 7th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 22–24 April 2022; pp. 33–37. [Google Scholar] [CrossRef]
- Fu, F.; Kang, Y.; Zhang, Z.; Yu, F.R.; Wu, T. Soft Actor-Critic DRL for Live Transcoding and Streaming in Vehicular Fog-Computing-Enabled IoV. IEEE Internet Things J. 2020, 8, 1308–1321. [Google Scholar] [CrossRef]
- Palombarini, J.A.; Martinez, E.C. Automatic Generation of Rescheduling Knowledge in Socio-technical Manufacturing Systems using Deep Reinforcement Learning. In Proceedings of the 2018 IEEE Biennial Congress of Argentina (ARGENCON), San Miguel de Tucuman, Argentina, 6–8 June 2018. [Google Scholar] [CrossRef]
- Palombarini, J.A.; Martínez, E.C. Closed-loop rescheduling using deep reinforcement learning. IFAC-PapersOnLine 2019, 52, 231–236. [Google Scholar] [CrossRef]
- Park, I.-B.; Huh, J.; Kim, J.; Park, J. A Reinforcement Learning Approach to Robust Scheduling of Semiconductor Manufacturing Facilities. IEEE Trans. Autom. Sci. Eng. 2020, 17. [Google Scholar] [CrossRef]
- Upkeep. Industrial Maintenance. Available online: https://www.upkeep.com/learning/industrial-maintenance (accessed on 28 October 2022).
- ATS. The Evolution of Industrial Maintenance. Available online: https://www.advancedtech.com/blog/evolution-of-industrial-maintenance/ (accessed on 28 October 2022).
- Moubray, J. RCM II-Reliability-centered Maintenance; Butterworth-Heinemann: London, UK, 1997. [Google Scholar]
- Menčík, J. Maintenance. In Concise Reliability for Engineers; IntechOpen: London, UK, 2016; Available online: https://www.intechopen.com/chapters/50096 (accessed on 29 October 2022).
- Pelantová, V. The Maintenance Management. In Maintenance Management-Current Challenges, New Developments, and Future Directions; IntechOpen: London, UK, 2022; Available online: https://www.intechopen.com/online-first/82473 (accessed on 29 October 2022).
- Nguyen, V.-T.; Do, P.; Vosin, A.; Iung, B. Artificial-intelligence-based maintenance decision-making and optimization for multi-state component systems. Reliab. Eng. Syst. Saf. 2022, 228, 108757. [Google Scholar] [CrossRef]
- Yan, Q.; Wu, W.; Wang, H. Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem. Machines 2022, 10, 210. [Google Scholar] [CrossRef]
- Mohammadi, R.; He, Q. A deep reinforcement learning approach for rail renewal and maintenance planning. Reliab. Eng. Syst. Saf. 2022, 225, 108615. [Google Scholar] [CrossRef]
- Ong, K.S.H.; Wang, W.; Hieu, N.Q.; Niyato, D.; Friedrichs, T. Predictive Maintenance Model for IIoT-Based Manufacturing: A Transferable Deep Reinforcement Learning Approach. IEEE Internet Things J. 2022, 9, 15725–15741. [Google Scholar] [CrossRef]
- Acernese, A.; Yerudkar, A.; Del Vecchio, C. A Novel Reinforcement Learning-based Unsupervised Fault Detection for Industrial Manufacturing Systems. In Proceedings of the 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 2650–2655. [Google Scholar] [CrossRef]
- Li, C.; Chang, Q. Hybrid feedback and reinforcement learning-based control of machine cycle time for a multi-stage production system. J. Manuf. Syst. 2022, 65, 351–361. [Google Scholar] [CrossRef]
- Yousefi, N.; Tsianikas, S.; Coit, D.W. Dynamic maintenance model for a repairable multi-component system using deep reinforcement learning. Qual. Eng. 2022, 34, 16–35. [Google Scholar] [CrossRef]
- United Nations for Climate Change (UNFCCC). The Paris Agreement. Available online: https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement (accessed on 1 November 2022).
- Cheng, L.; Yu, T. A new generation of AI: A review and perspective on machine learning technologies applied to smart energy and electric power systems. Int. J. Energy Res. 2019, 43, 1928–1973. [Google Scholar] [CrossRef]
- Perera, A.; Kamalaruban, P. Applications of reinforcement learning in energy systems. Renew. Sustain. Energy Rev. 2021, 137, 110618. [Google Scholar] [CrossRef]
- Leng, J.; Ruan, G.; Song, Y.; Liu, Q.; Fu, Y.; Ding, K.; Chen, X. A loosely-coupled deep reinforcement learning approach for order acceptance decision of mass-individualized printed circuit board manufacturing in industry 4.0. J. Clean. Prod. 2021, 280, 124405. [Google Scholar] [CrossRef]
- Lu, R.; Li, Y.-C.; Li, Y.; Jiang, J.; Ding, Y. Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management. Appl. Energy 2020, 276, 115473. [Google Scholar] [CrossRef]
- Deng, Y.; Hou, Z.; Yang, W.; Xu, J. Sample-Efficiency, Stability and Generalization Analysis for Deep Reinforcement Learning on Robotic Peg-in-Hole Assembly. In International Conference on Intelligent Robotics and Applications; Springer: Cham, Switzerland, 2021; pp. 393–403. [Google Scholar] [CrossRef]
- Mohammed, M.Q.; Chung, K.L.; Chyi, C.S. Review of deep reinforcement learning-based object grasping: Techniques, open challenges, and recommendations. IEEE Access 2020, 8, 178450–178481. [Google Scholar] [CrossRef]
- Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020; pp. 737–744. [Google Scholar] [CrossRef]
- Arents, J.; Greitans, M. Smart Industrial Robot Control Trends, Challenges and Opportunities within Manufacturing. Appl. Sci. 2022, 12, 937. [Google Scholar] [CrossRef]
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, 28 September–2 October 2004; Volume 3, pp. 2149–2154. [Google Scholar] [CrossRef] [Green Version]
- Juliani, A.; Berges, V.-P.; Teng, E.; Cohen, A.; Harper, J.; Elion, C.; Goy, C.; Gao, Y.; Henry, H.; Mattar, M.; et al. Unity: A General Platform for Intelligent Agents. arXiv 2018, arXiv:1809.02627. [Google Scholar]
- Bullet Real-Time Physics Simulation|Home of Bullet and PyBullet: Physics Simulation for Games, Visual Effects, Robotics and Reinforcement Learning. Available online: https://pybullet.org/wordpress/ (accessed on 1 November 2022).
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 5026–5033. [Google Scholar] [CrossRef]
- Kastner, L.; Bhuiyan, T.; Le, T.A.; Treis, E.; Cox, J.; Meinardus, B.; Kmiecik, J.; Carstens, R.; Pichel, D.; Fatloun, B.; et al. Arena-Bench: A Benchmarking Suite for Obstacle Avoidance Approaches in Highly Dynamic Environments. IEEE Robot. Autom. Lett. 2022, 7, 9477–9484. [Google Scholar] [CrossRef]
- Joshi, S.; Kumra, S.; Sahin, F. Robotic Grasping using Deep Reinforcement Learning. In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE 2020), Virtual Event, 20–21 August 2020; pp. 1461–1466. [Google Scholar]
- Li, Z.; Xin, J.; Li, N. End-To-End Autonomous Exploration for Mobile Robots in Unknown Environments through Deep Reinforcement Learning. In Proceedings of the 2022 IEEE International Conference on Real-time Computing and Robotics (RCAR), Guiyang, China, 17–22 July 2022; pp. 475–480. [Google Scholar] [CrossRef]
- Zhou, S.; Li, B.; Ding, C.; Lu, L.; Ding, C. An Efficient Deep Reinforcement Learning Framework for UAVs. In Proceedings of the 2020 21st International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 25–26 March 2020; pp. 323–328. [Google Scholar] [CrossRef]
- Krishnan, K.G.; Mohan, A.; Vishnu, S.; Eapen, S.A.; Raj, A.; Jacob, J. Path Planning of Mobile Robot Using Reinforcement Learning. J. Trends Comput. Sci. Smart Technol. 2022, 4, 153–162. [Google Scholar] [CrossRef]
- Gurnani, R.; Rastogi, S.; Chitkara, S.S.; Kumari, S.; Gagneja, A. Goal-Oriented Obstacle Avoidance by Two-Wheeled Self Balancing Robot. Smart Innov. Syst. Technol. 2022, 269, 345–360. [Google Scholar] [CrossRef]
- De Blasi, S.; Klöser, S.; Müller, A.; Reuben, R.; Sturm, F.; Zerrer, T. KIcker: An Industrial Drive and Control Foosball System automated with Deep Reinforcement Learning. J. Intell. Robot. Syst. Theory Appl. 2021, 102, 20. [Google Scholar] [CrossRef]
- Yang, J.; Liu, L.; Zhang, Q.; Liu, C. Research on Autonomous Navigation Control of Unmanned Ship Based on Unity3D. In Proceedings of the 2019 IEEE International Conference on Control, Automation and Robotics (ICCAR), Beijing, China, 19–22 April 2019; pp. 422–426. [Google Scholar] [CrossRef]
- Sun, L.; Zhai, J.; Qin, W. Crowd Navigation in an Unknown and Dynamic Environment Based on Deep Reinforcement Learning. IEEE Access 2019, 7, 109544–109554. [Google Scholar] [CrossRef]
- Lin, M.; Shan, L.; Zhang, Y. Research on robot arm control based on Unity3D machine learning. J. Phys. Conf. Ser. 2020, 1633, 012007. [Google Scholar] [CrossRef]
- Chen, L.; Jiang, Z.; Cheng, L.; Knoll, A.C.; Zhou, M. Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints. Front. Neurorobot. 2022, 16, 80. [Google Scholar] [CrossRef]
- Remman, S.B.; Lekkas, A.M. Robotic Lever Manipulation using Hindsight Experience Replay and Shapley Additive Explanations. In Proceedings of the 2021 European Control Conference (ECC), Delft, The Netherlands, 29 June–2 July 2021; pp. 586–593. [Google Scholar]
- Bellegarda, G.; Nguyen, Q. Robust Quadruped Jumping via Deep Reinforcement Learning. arXiv 2020, arXiv:2011.07089. [Google Scholar]
- Shahid, A.A.; Roveda, L.; Piga, D.; Braghin, F. Learning Continuous Control Actions for Robotic Grasping with Reinforcement Learning. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 4066–4072. [Google Scholar] [CrossRef]
- Bharadhwaj, H.; Yamaguchi, S.; Maeda, S.-I. MANGA: Method Agnostic Neural-policy Generalization and Adaptation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 3356–3362. [Google Scholar]
- Hong, Z.W.; Shann, T.Y.; Su, S.Y.; Chang, Y.H.; Fu, T.J.; Lee, C.Y. Diversity-Driven Exploration Strategy for Deep Reinforcement Learning. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montreal, QC, Canada, 2–8 December 2018; pp. 10489–10500. [Google Scholar]
- Farid, K.; Sakr, N. Few Shot System Identification for Reinforcement Learning. In Proceedings of the 2021 6th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2021), Online, 16–18 July 2021; pp. 76–82. [Google Scholar]
- Keesman, K.J. System Identification: An Introduction, 2nd ed.; Springer: London, UK, 2011. [Google Scholar]
- Ljung, L. Perspectives on system identification. Annu. Rev. Control 2010, 34, 1–12. [Google Scholar] [CrossRef]
- Jiang, Y.; Yin, S.; Li, K.; Luo, H.; Kaynak, O. Industrial applications of digital twins. Philos. Trans. R. Soc. A 2021, 379, 20200360. [Google Scholar] [CrossRef]
- Chen, X.; Hu, J.; Jin, C.; Li, L.; Wang, L. Understanding Domain Randomization for Sim-to-real Transfer. arXiv 2021, arXiv:2110.03239. [Google Scholar]
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30. [Google Scholar] [CrossRef]
- Osinski, B.; Jakubowski, A.; Ziecina, P.; Milos, P.; Galias, C.; Homoceanu, S.; Michalewski, H. Simulation-Based Reinforcement Learning for Real-World Autonomous Driving. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6411–6418. [Google Scholar] [CrossRef]
- Vuong, Q.; Vikram, S.; Su, H.; Gao, S.; Christensen, H.I. How to Pick the Domain Randomization Parameters for Sim-to-Real Transfer of Reinforcement Learning Policies? arXiv 2019, arXiv:1903.11774. [Google Scholar]
- Mehta, B.; Diaz, M.; Golemo, F.; Pal, C.J.; Paull, L. Active Domain Randomization. Available online: https://proceedings.mlr.press/v100/mehta20a.html (accessed on 8 November 2022).
- Muratore, F.; Gruner, T.; Wiese, F.; Belousov, B.; Gienger, M.; Peters, J. Neural Posterior Domain Randomization. Available online: https://proceedings.mlr.press/v164/muratore22a.html (accessed on 8 November 2022).
- Xing, J.; Nagata, T.; Chen, K.; Zou, X.; Neftci, E.; Krichmar, J.L. Domain Adaptation in Reinforcement Learning via Latent Unified State Representation. Proc. Conf. AAAI Artif. Intell. 2021, 35, 10452–10459. [Google Scholar] [CrossRef]
- Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A Brief Review of Domain Adaptation. In Advances in Data Science and Information Engineering; Springer: Cham, Switzerland, 2021; pp. 877–894. [Google Scholar] [CrossRef]
- Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; Volume 1, pp. 97–105. [Google Scholar]
- Carr, T.; Chli, M.; Vogiatzis, G. Domain Adaptation for Reinforcement Learning on the Atari. In Proceedings of the 17th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Stockholm, Sweden, 10–15 July 2018; pp. 1859–1861. [Google Scholar]
- Ghifary, M.; Kleijn, W.B.; Zhang, M.; Balduzzi, D.; Li, W. Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 8–16 October 2016; pp. 597–613. [Google Scholar]
- Alles, M.; Aljalbout, E. Learning to Centralize Dual-Arm Assembly. Front. Robot. AI 2021, 9, 830007. [Google Scholar] [CrossRef] [PubMed]
- Park, Y.; Lee, S.H.; Suh, I.H. Sim-to-Real Visual Grasping via State Representation Learning Based on Combining Pixel-Level and Feature-Level Domain Adaptation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
- Yu, W.; Tan, J.; Liu, C.K.; Turk, G. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. In Proceedings of the Robotics: Science and Systems (RSS 2017), Cambridge, MA, USA, 12–16 July 2017. [Google Scholar]
- Witman, M.; Gidon, D.; Graves, D.B.; Smit, B.; Mesbah, A. Sim-to-real transfer reinforcement learning for control of thermal effects of an atmospheric pressure plasma jet. Plasma Sources Sci. Technol. 2019, 28, 095019. [Google Scholar] [CrossRef]
- Exarchos, I.; Jiang, Y.; Yu, W.; Liu, C.K. Policy Transfer via Kinematic Domain Randomization and Adaptation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
- Cheng, R.; Agia, C.; Shkurti, F.; Meger, D.; Dudek, G. Latent Attention Augmentation for Robust Autonomous Driving Policies. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021. [Google Scholar] [CrossRef]
Simulator | Example
---|---
Gazebo | (example image in the original article)
Unity3D | (example image in the original article)
PyBullet | (example image in the original article)
MuJoCo | (example image in the original article)
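The simulators in the table above are the physics engines most often used for sim-to-real DRL in the reviewed works. As a minimal, illustrative sketch only (it assumes PyBullet and its bundled pybullet_data assets are installed; the URDF files are PyBullet's stock example models, not assets from any cited study), the snippet below shows the kind of simulation loop such engines expose to a learning agent: connect to a physics server, load a robot, and step the dynamics.

```python
# Minimal PyBullet sketch (assumed setup, not from any cited work):
# load a stock robot model and advance the physics, the basic loop a
# DRL training environment would wrap with observations and rewards.
import pybullet as p
import pybullet_data

client = p.connect(p.DIRECT)                       # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")                   # stock ground plane
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])  # stock example robot

for _ in range(240):                               # one simulated second at 240 Hz
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(robot)
print("final base position:", pos)
p.disconnect(client)
```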
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).