Multi-Phase Focused PID Adaptive Tuning with Reinforcement Learning
Round 1
Reviewer 1 Report
To improve the performance of the PID controller, the authors propose a multi-phase PID parameter adaptive tuning method. With the proposed method, the tracking error of the PID controller decreases by 16-30%. My concerns are listed below.
- Can you provide a result table for the MSE of the several algorithms?
- How do you choose the parameters for DDPG? Are the parameters in Table 1 the optimal ones?
- In Line 188, please correct [31?].
- R^(th2) values of -7 and -3 perform worse than -5, but this alone is not enough to prove that -5 is the best parameter. Do you have any plan to investigate continuous values of R^(th2)?
Author Response
Dear Professor,
I hope this message finds you well. I am truly honored that my paper has been considered for your review. Your insightful and professional feedback has been immensely valuable to me, and it has significantly contributed to enhancing my academic experience. After carefully reviewing your comments, I have diligently revised the paper. Allow me to provide you with a summary of my responses to your feedback:
Comments 1:
The table pertaining to MSE has been added, as seen in Table 3.
Comment 2:
Regarding the selection of DDPG parameters, I primarily drew inspiration from two seminal papers in the field of reinforcement learning: "Human-Level Control through Deep Reinforcement Learning" and "Deterministic Policy Gradient Algorithms." The reason for this choice lies in the fact that, unlike classical machine learning tasks such as image recognition, the computational requirements for learning PID parameters are relatively modest. To conserve computational resources and expedite training, we opted to reduce both the depth and the number of neurons in our neural network. Under this premise, the computational workload of gradient backpropagation is correspondingly reduced, and we introduced layer normalization as part of our operations. The relative simplicity of the neural network architecture renders the algorithm less sensitive to changes in hyperparameters, which we consider to be one of the strengths of our approach. With this in mind, we believe that the selection of hyperparameters in this paper is appropriate, and experimental results have similarly substantiated this assertion.
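For illustration, a minimal sketch of the kind of compact actor network with layer normalization described above is given below; the layer widths, the state and action dimensions, and the bounded output are assumptions made for illustration, not the exact architecture reported in the manuscript.

```python
# Minimal sketch of a compact DDPG actor with layer normalization.
# Layer sizes, state/action dimensions, and the tanh output scaling are
# illustrative assumptions, not the architecture used in the manuscript.
import torch
import torch.nn as nn

class CompactActor(nn.Module):
    def __init__(self, state_dim=3, action_dim=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.LayerNorm(hidden),   # layer normalization helps stabilize training
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        # tanh keeps the proposed PID-parameter adjustments bounded
        return torch.tanh(self.net(state))
```

With only two hidden layers of modest width, the gradient backpropagation cost stays small, which is the trade-off described above.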
Comments 3:
The citation error has been rectified.
Comments 4:
We have conducted additional experiments with R^(th2) values of -6 and -4, building upon our existing experimental foundation (Figure 6(a) and (b)). This makes the experimental design for threshold selection more continuous. The results continue to support that setting R^(th2) to -5 is reasonable. When R^(th2) is set to values less than -7 or greater than -3, the experimental results clearly show that training effectiveness is significantly hindered, so values outside this range need not be considered. Furthermore, based on the reward function design in this article, when the reward differences between different training outcomes are within 1, the corresponding PID control performance does not differ significantly. Therefore, a step size of 1 for the reward thresholds in these experiments is justified. Please refer to the highlighted explanation in lines 380-397 of the manuscript for further details.
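For illustration, a minimal sketch of how a reward threshold can gate the transition between tuning phases is given below; the first threshold R_TH1 and the phase names are illustrative assumptions, with only R_TH2 = -5 reflecting the value supported by the threshold experiments discussed above.

```python
# Minimal sketch of reward-threshold phase switching in a multi-phase
# tuning loop. R_TH1 and the phase labels are illustrative assumptions;
# R_TH2 = -5 is the value supported by the experiments discussed above.
R_TH1 = -20.0   # assumed threshold for leaving the coarse-exploration phase
R_TH2 = -5.0    # threshold studied in the experiments above

def select_phase(episode_reward: float) -> str:
    """Map the latest episode reward to a tuning phase."""
    if episode_reward < R_TH1:
        return "coarse"    # explore PID parameters widely
    elif episode_reward < R_TH2:
        return "refine"    # constrain actions around the current reference
    return "focused"       # small corrections only, keeping the system stable
```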
Thank you again for your valuable feedback, and I wish you a joyful life.
Best regards.
Reviewer 2 Report
The authors present the article entitled “Multi-phase Focused PID Adaptive Tuning with Reinforcement Learning”
This paper proposes a multi-phase focused PID adaptive tuning method, leveraging the deep deterministic policy gradient (DDPG) algorithm to automatically establish reference values for PID tuning.
The article presents the following concerns:
- Add a short introduction or description between sections and subsections.
- Avoid using "we"; instead, use the passive voice.
- It is necessary to add quantitative results of the analyzed works in subsection 1.2.
- Avoid mentioning the same figure more than once in a paragraph, such as in the section between lines 202 and 211.
- The meaning of the variables st, at, and rt is unclear. Please clarify it.
- Move Figure 3 to after it is first mentioned.
- Add hyperlinks to Figures, Tables, and references.
- Please justify the established hyperparameters or the methodology implemented to reach them.
- It is necessary to analyze and describe each of the subfigures of Figures 6 and 7.
- Please change references 1 and 2 (in line 22, for example) to the following to have up-to-date citations: Implementation of ANN-based auto-adjustable for a pneumatic servo system embedded on FPGA; Self-tuning neural network PID with dynamic response control; A PID-type fuzzy logic controller-based approach for motion control applications; Comparison of PD, PID and sliding-mode position controllers for V-tail quadcopter stability; Auto-regression model-based off-line PID controller tuning: an adaptive strategy for DC motor control.
- Please carry out a deeper analysis of Figure 9.
- Add a comparative table in the discussion section with the most outstanding results of this work and similar works.
- Please update the references used.
- Writing the text in the third person or passive voice is recommended in this kind of text.
The following misspellings should be checked:
- Line 73: "which reduce the demand of…" should be rewritten as "reducing the demand for…".
- Line 166: "shown as Figure 1…" It seems that the preposition use may be incorrect here. It should be rewritten as "shown by Figure 1…".
- Line 245: The word "effective" is often overused. Consider using a more specific synonym, such as "adequate", to improve the sharpness of your writing.
- Line 31: The noun phrase "threshold" seems to be missing a determiner before it. Consider adding an article: "the threshold" or "a threshold".
Author Response
Dear Professor,
I hope this message finds you well. I am truly honored that my paper has been considered for your review. Your insightful and professional feedback has been immensely valuable to me, and it has significantly contributed to enhancing my academic experience. After carefully reviewing your comments, I have diligently revised the paper. Allow me to provide you with a summary of my responses to your feedback:
Comments 1:
Descriptions have been added between each section and subsection.
Comment 2:
All first-person phrasing has been replaced with the third person or the passive voice.
Comments 3:
Quantitative results of the analyzed works have been added in subsection 1.2.
Comments 4:
The issue of repeatedly mentioning a particular figure has been resolved.
Comments 5:
The meaning of the variables st, at, and rt has been explained in the first paragraph of subsection 2.2.
Comments 6:
Figure 3 has been moved to after its first mention.
Comments 7:
All the Figures, Tables, and references have been provided with hyperlinks.
Comments 8:
Regarding the selection of DDPG parameters, I primarily drew inspiration from two seminal papers in the field of reinforcement learning: "Human-Level Control through Deep Reinforcement Learning" and "Deterministic Policy Gradient Algorithms." The reason for this choice lies in the fact that, unlike classical machine learning tasks such as image recognition, the computational requirements for learning PID parameters are relatively modest. To conserve computational resources and expedite training, we opted to reduce both the depth and the number of neurons in our neural network. Under this premise, the computational workload of gradient backpropagation is correspondingly reduced, and we introduced layer normalization as part of our operations. The relative simplicity of the neural network architecture renders the algorithm less sensitive to changes in hyperparameters, which we consider to be one of the strengths of our approach. With this in mind, we believe that the selection of hyperparameters in this paper is appropriate, and experimental results have similarly substantiated this assertion. The parameters in Table 2 are explained in subsections 4.1 and 4.2.
Comments 9:
The subfigures of Figures 6 and 7 have been explained in subsections 4.2 and 4.3.
Comments 10:
The references have been changed.
Comments 11:
A deeper analysis of Figure 9 has been added in the second paragraph of subsection 4.3.
Comments 12:
Subsection 4.5 has been added, which includes a comparison of MF-DDPG with several classical PID adaptive tuning algorithms, accompanied by figures and tables.
Comments 13:
References have been updated.
Comments 14:
All first-person phrasing has been replaced with the third person or the passive voice.
The misspelling issues:
The mentioned misspellings have been corrected.
Thank you again for your valuable feedback, and I wish you a joyful life.
Best regards.
Reviewer 3 Report
Authors contributions:
The authors have introduced a novel method for tuning PID controllers in industrial control systems. This method incorporates a multi-phase approach that constrains agent actions based on reward thresholds. This approach aims to automatically establish reference values for PID tuning, providing better adaptability and maintaining control stability even in situations with limited prior knowledge. The proposed method leverages the DDPG algorithm, a reinforcement learning (RL) technique, to fine-tune PID parameters. This utilization of RL allows for the exploration of a broader range of PID parameters independently, which can lead to improved controller performance.
The authors acknowledge the potential issue of gradient vanishing that may occur due to action constraints. To counteract this problem, a residual structure is incorporated into the actor network, enhancing the training stability of the RL-based algorithm. The proposed method is evaluated through experiments conducted on both first-order and second-order systems. The results indicate that the new approach can reduce tracking error in PID controllers by 16%-30% when compared to baseline methods. Additionally, it maintains system stability during the tuning process.
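For readers of this summary, a minimal sketch of the kind of residual (skip-connection) structure referred to above is given below; the block width and its placement inside the actor network are assumptions for illustration only.

```python
# Minimal sketch of a residual block of the kind summarized above; the
# width and where it sits in the actor network are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.act = nn.ReLU()

    def forward(self, x):
        # The identity path lets gradients bypass the constrained layers,
        # which is what mitigates gradient vanishing under action constraints.
        return self.act(x + self.fc2(self.act(self.fc1(x))))
```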
Limitations of this work:
The proposed method relies on deep reinforcement learning techniques, specifically the DDPG algorithm. RL methods can be computationally intensive and require a significant amount of training data. This could limit the practicality of the method in real-time control systems or systems with limited computational resources.
Implementing an RL-based approach in practical industrial control systems may be challenging. It often involves setting up appropriate simulations, defining reward functions, and fine-tuning hyperparameters, which can be complex and time-consuming.
RL algorithms are sensitive to hyperparameters, and finding the right combination of hyperparameters can be a non-trivial task. Poorly tuned hyperparameters may lead to suboptimal results or unstable controller behavior.
The method may assume stationarity in the controlled systems, which may not hold true in some dynamic or evolving industrial processes. Adapting the method to non-stationary systems may require additional considerations.
I have some reviewer notes:
Abstract. Indicate where the results can be implemented in practice.
Introduction. The aim of this research is not clearly presented.
2.1. PID Control. You have to describe the reason for choosing parallel PID control.
Line 259. Use appropriate citation (3).
Figure 11. It will be good to add axis titles.
The discussion part is missing. You have to compare your results with at least three other papers (more is better).
Conclusion. How does your work improve the known solutions in this study area? How can your results be implemented in practice? Show some values of the obtained accuracy compared with the known solutions.
I have some suggestions:
Provide a better description of your methods and the reasons why you chose them. Improve the presentation of your figures. Make more comparative analyses with other papers. It would be good to compare your results with more PID tuning methods that are appropriate for your system.
Minor editing of English language required.
Author Response
Dear Professor,
I hope this message finds you well. I am truly honored that my paper has been considered for your review. Your insightful and professional feedback has been immensely valuable to me, and it has significantly contributed to enhancing my academic experience. After carefully reviewing your comments, I have diligently revised the paper. Allow me to provide you with a summary of my responses to your feedback:
Comments 1:
The practical applications of this study have been included in the abstract, in lines 1-5.
Comment 2:
The aim of the research has been explained in subsection 1.1, 4th paragraph.
Comments 3:
The reason for choosing a parallel PID controller has been explained in subsection 2.1.
Comments 4:
The citation issue has been corrected.
Comments 5:
Figure 11 represents a commonly used gradient visualization method in the field of machine learning. In this figure, the color depth of each block corresponds to the magnitude of the gradient – the lighter the color, the larger the gradient. The numerical values on the axes are intuitively mapped to the color depth and do not hold any specific physical significance. Therefore, axis titles have not been included in this figure.
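For illustration, a minimal sketch of this kind of gradient heatmap is given below; the colormap and the assumption that the gradients are available as a 2-D array are illustrative choices, not the exact plotting code used for Figure 11.

```python
# Minimal sketch of a gradient-magnitude heatmap of the kind described
# above; the colormap and input format are illustrative assumptions.
import matplotlib.pyplot as plt
import numpy as np

def plot_gradient_heatmap(weight_grad):
    """Render a 2-D array of weight gradients as a heatmap:
    lighter cells correspond to larger gradient magnitudes."""
    fig, ax = plt.subplots()
    im = ax.imshow(np.abs(np.asarray(weight_grad)), cmap="viridis")
    fig.colorbar(im, ax=ax)  # the color scale carries the magnitude information
    plt.show()
```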
Comments 6:
Subsection 4.5 has been added as the discussion part, which includes a comparison of MF-DDPG with several classical PID adaptive tuning algorithms, accompanied by figures and tables.
Comments 7:
The primary focus of this research is on the theoretical study of PID controllers themselves and does not involve specific control systems or application scenarios. The rapid and stable adaptive tuning of PID controllers has practical significance across various application domains. Our research on the practical application of this algorithm in specific domains has been summarized in another paper titled 'MLEAT: An RL-based Approach for Vibration Damping System Control.' In this paper, we applied the algorithm proposed in this study to the control of vibration damping systems and achieved promising results. This paper has been accepted by IEEE and is scheduled for publication. We welcome your feedback and valuable insights when it becomes available.
Thank you again for your valuable feedback, and I wish you a joyful life.
Best regards.
Round 2
Reviewer 2 Report
The manuscript has been improved; it can be accepted for publication.