Adaptive Sliding Mode Disturbance Observer and Deep Reinforcement Learning Based Motion Control for Micropositioners
Abstract
1. Introduction
- (1) A DDPG-ID algorithm based on deep reinforcement learning is introduced as the basic motion controller of the micropositioner system, which removes the dependence of traditional control strategies on an accurate and comprehensive dynamic model;
- (2) An adaptive sliding mode disturbance observer (ASMDO) is proposed to eliminate the combined effect of the lumped disturbances acting on the micropositioner system and the inaccurate estimation of the value function in deep reinforcement learning;
- (3) An integral differential compensator is introduced into DDPG-ID to compensate the feedback state of the system, which improves the accuracy and response time of the controller and further strengthens its robustness against external disturbances (a minimal sketch follows this list).
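For illustration, the following is a minimal sketch of how such an integral differential (ID) compensator can be realized: the tracking error is integrated and differentiated, and the two weighted terms are appended to the state fed back to the agent. The gains (0.01 and 0.001) and the 0.01 s time step are taken from the hyperparameter table later in this paper; the class name and interface are illustrative assumptions, not the authors' code.

```python
class IDCompensator:
    """Sketch of an integral differential (ID) compensator: augments the
    feedback state with weighted integral and derivative error terms."""

    def __init__(self, ki=0.01, kd=0.001, dt=0.01):
        # Gains and time step follow the hyperparameter table in Section 4.
        self.ki, self.kd, self.dt = ki, kd, dt
        self.e_int = 0.0
        self.e_prev = 0.0

    def reset(self):
        self.e_int, self.e_prev = 0.0, 0.0

    def augment(self, e):
        """Return the (integral, derivative) terms for tracking error e."""
        self.e_int += e * self.dt                    # accumulate error
        e_dot = (e - self.e_prev) / self.dt          # backward-difference derivative
        self.e_prev = e
        return self.ki * self.e_int, self.kd * e_dot
```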
2. System Description
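Section 2 models the electromagnetic micropositioner in terms of the magnetic force F, working air gap y, excitation current i, coil resistance R, coil inductance H, control input u, and lumped disturbance D (see the Abbreviations). As a purely illustrative sketch, one simulation step under an *assumed* force law F = a*i^2/(c + y)^2, chosen only because the parameter table in Section 4 lists a, c, k, and m, could look as follows; the paper's actual dynamic model may differ.

```python
# p: parameter dict using the symbols from the parameter table in Section 4,
# e.g. p = {'a': ..., 'c': ..., 'k': ..., 'm': 0.0272, 'R': 43.66, 'H': ...}.
# The force law, the spring term k*y, and the circuit equation are assumptions.

def plant_step(y, y_dot, i, u, p, D=0.0, dt=0.01):
    """One Euler step of the assumed micropositioner dynamics."""
    F = p['a'] * i**2 / (p['c'] + y)**2        # assumed magnetic force law F(i, y)
    y_ddot = (F - p['k'] * y + D) / p['m']     # assumed mechanics with lumped disturbance D
    i_dot = (u - p['R'] * i) / p['H']          # coil circuit: u = R*i + H*di/dt
    return y + y_dot * dt, y_dot + y_ddot * dt, i + i_dot * dt
```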
3. Design of ASMDO and DDPG-ID Algorithm
3.1. Design of Adaptive Sliding Mode Disturbance Observer
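Since the observer derivation is not reproduced here, the following is a minimal sketch of one ASMDO update for a generic second-order plant x1' = x2, x2' = f(x) + b*u + D, where D is the lumped disturbance. The sliding variable on the velocity-estimation error and the adaptive switching gain are a common ASMDO structure, not the paper's exact law; the adaptation rate gamma and the smoothing width eps are illustrative values.

```python
import numpy as np

def asmdo_step(x2, x2_hat, k_hat, f_x, b, u, dt, gamma=5.0, eps=1e-3):
    """One Euler step of a generic adaptive sliding mode disturbance observer."""
    e = x2_hat - x2                      # estimation error = sliding variable
    sgn = np.tanh(e / eps)               # smoothed sign() to limit chattering
    d_hat = -k_hat * sgn                 # switching term serves as disturbance estimate
    x2_hat = x2_hat + (f_x + b * u + d_hat) * dt   # observer dynamics
    k_hat = k_hat + gamma * abs(e) * dt  # adaptation law: gain grows with |e|
    return x2_hat, k_hat, d_hat
```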
3.2. Design of DDPG-ID Algorithm for Micropositioner
3.2.1. Critic Update
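A minimal PyTorch sketch of one critic update, consistent with standard DDPG rather than reproduced from the paper: the temporal-difference target y = r + gamma*(1 - done)*Q'(s', mu'(s')) is formed with the target actor and target critic, and the critic minimizes the squared TD error. The discount factor 0.99 follows the hyperparameter table; all function and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def critic_update(batch, critic, critic_targ, actor_targ, optim_c, gamma=0.99):
    """One TD-error minimization step for the critic (standard DDPG form)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        a2 = actor_targ(s2)                              # target action mu'(s')
        y = r.unsqueeze(-1) + gamma * (1.0 - done.unsqueeze(-1)) * critic_targ(s2, a2)
    q = critic(s, a)
    loss = F.mse_loss(q, y)                              # squared TD error
    optim_c.zero_grad()
    loss.backward()
    optim_c.step()
    return loss.item()
```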
3.2.2. Actor Update
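Correspondingly, a sketch of the deterministic policy-gradient step: the actor ascends the gradient of Q(s, mu(s)), implemented here as minimizing -Q; only the actor's optimizer steps, so the critic is unchanged. Again, the names are illustrative assumptions.

```python
def actor_update(batch, actor, critic, optim_a):
    """One deterministic policy-gradient step: maximize Q(s, mu(s))."""
    s = batch[0]
    loss = -critic(s, actor(s)).mean()   # minimize -Q == maximize Q
    optim_a.zero_grad()
    loss.backward()
    optim_a.step()
    return loss.item()
```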
Algorithm 1: DDPG-ID Algorithm.
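As the full pseudocode of Algorithm 1 is not reproduced here, the Python skeleton below shows how its pieces plausibly fit together, using the hyperparameters reported later (1500 episodes, 250 steps of 0.01 s, replay buffer of 100,000, minibatch size 64). The gym-style env interface, the Gaussian exploration noise, and the helpers (critic_update and actor_update sketched above, soft_update given after the hyperparameter table) are assumptions, not the authors' implementation.

```python
import random
from collections import deque

import numpy as np
import torch

def train(env, actor, critic, actor_targ, critic_targ, optim_a, optim_c,
          compensator, episodes=1500, steps=250, batch_size=64):
    buffer = deque(maxlen=100_000)                       # experience replay buffer
    for ep in range(episodes):
        s = env.reset()
        compensator.reset()                              # clear ID compensator state
        for t in range(steps):
            a = actor(torch.as_tensor(s, dtype=torch.float32)).detach().numpy()
            a = a + np.random.normal(0.0, 0.1, a.shape)  # exploration noise (assumed Gaussian)
            s2, r, done, info = env.step(a)              # assumed gym-style interface
            buffer.append((s, a, r, s2, float(done)))
            if len(buffer) >= batch_size:
                sample = map(np.array, zip(*random.sample(buffer, batch_size)))
                batch = [torch.as_tensor(x, dtype=torch.float32) for x in sample]
                critic_update(batch, critic, critic_targ, actor_targ, optim_c)
                actor_update(batch, actor, critic, optim_a)
                soft_update(critic, critic_targ)         # tau = 0.05, see hyperparameter table
                soft_update(actor, actor_targ)
            s = s2
            if done:
                break
```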
4. Simulation and Experimental Results
4.1. Simulation Results
4.2. Experimental Results
5. Conclusions and Future Works
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
PID | Proportional–integral–derivative control
RBFNN | Radial basis function neural network
RL | Reinforcement learning
SARSA | State-Action-Reward-State-Action
Q | The value of an action in reinforcement learning
DRL | Deep reinforcement learning
DNN | Deep neural network
DQN | Deep Q-network
PG | Policy gradient
DDPG | Deep deterministic policy gradient
ID | Integral differential compensator
F | The magnetic force
y | The working air gap in the micropositioner
i | The excitation current in the micropositioner
EMA | The electromagnetic actuator
U | The input voltage of the electromagnetic actuator
R | The resistance of the coil in the micropositioner
H | The coil inductance in the micropositioner
u | The control input
D | The lumped system disturbance
ASMDO | Adaptive sliding mode disturbance observer
s_t | The state at time t in reinforcement learning
a_t | The action at time t in reinforcement learning
r_t | The reward at time t in reinforcement learning
ReLU | Rectified linear unit activation function
Tanh | Hyperbolic tangent activation function
Method | Advantages | Disadvantages
---|---|---
PID control | Simple design structure; easy to implement | Mainly used in linear systems; requires full-state feedback; lacks adaptivity
SMC control | Simple design structure; easy to implement; high robustness | Excessive chattering effect; lacks adaptivity
Adaptive control | Lower initial cost; lower cost of redundancy; high reliability and performance | Stability is not treated rigorously; high-gain observers needed; slow convergence
Backstepping control | Global stability; simple design structure; easy to integrate | Low anti-interference ability; sensitive to system models; lacks adaptivity
RL control | No need for an accurate model; improved control performance; high adaptivity | Poor anti-interference ability; prone to state errors
Network Layer Name | Number of Nodes
---|---
StateLayer | 5
CriticStateFC1 | 120
CriticStateFC2 | 60
CriticStateFC3 | 60
ActionInput | 1
CriticActionFC1 | 60
addLayer | 2
CriticOutput | 1

Network Layer Name | Number of Nodes
---|---
StateLayer | 5
ActorFC1 | 30
ActorOutput | 1
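For illustration, the two tables above translate into the following PyTorch modules. The layer sizes match the tables exactly; the placement of the ReLU activations, the reading of "addLayer" as an element-wise sum of the two 60-unit state and action paths, and the Tanh output of the actor (per the abbreviations list) are assumptions.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q(s, a) network wired to the critic table above."""

    def __init__(self):
        super().__init__()
        self.s1 = nn.Linear(5, 120)   # CriticStateFC1
        self.s2 = nn.Linear(120, 60)  # CriticStateFC2
        self.s3 = nn.Linear(60, 60)   # CriticStateFC3
        self.a1 = nn.Linear(1, 60)    # CriticActionFC1
        self.out = nn.Linear(60, 1)   # CriticOutput

    def forward(self, s, a):
        h = torch.relu(self.s1(s))
        h = torch.relu(self.s2(h))
        h = self.s3(h)
        return self.out(torch.relu(h + self.a1(a)))  # addLayer: sum of both paths

class Actor(nn.Module):
    """mu(s) network wired to the actor table above."""

    def __init__(self):
        super().__init__()
        self.f1 = nn.Linear(5, 30)    # ActorFC1
        self.f2 = nn.Linear(30, 1)    # ActorOutput

    def forward(self, s):
        return torch.tanh(self.f2(torch.relu(self.f1(s))))  # Tanh bounds the action
```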
Notation | Value | Unit
---|---|---
 | 13.21 |
 | 0.67 |
a | |
R | 43.66 | Ω
c | |
k | |
m | 0.0272 |
Hyperparameters | Value
---|---
Learning rate for actor | 0.001
Learning rate for critic | 0.001
Discount factor | 0.99
Initial exploration | 1
Experience replay buffer size | 100,000
Minibatch size M | 64
Max episode | 1500
Soft update factor | 0.05
Max exploration steps T | 250 (25 s)
Time step | 0.01 s
Integral gain | 0.01
Differential gain | 0.001
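For completeness, a sketch of the soft (Polyak) target-network update referenced in the training skeleton of Section 3, using the soft update factor tau = 0.05 from the table above; net and target are assumed to be torch modules with matching parameters.

```python
import torch

@torch.no_grad()
def soft_update(net, target, tau=0.05):
    """Polyak averaging: target <- (1 - tau)*target + tau*net."""
    for p, pt in zip(net.parameters(), target.parameters()):
        pt.mul_(1 - tau).add_(tau * p)
```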
Method | RMSE | MAX | MEAN
---|---|---|---
DDPG-ID | | |
DDPG | | |
PID | | |
Method | RMSE | MAX | MEAN
---|---|---|---
DDPG-ID | | |
DDPG | | |
PID | | |
Method | RMSE | MAX | MEAN
---|---|---|---
DDPG-ID | | |
DDPG | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).