An Improved Adam’s Algorithm for Stomach Image Classification
Abstract
1. Introduction
2. Design of the CG-Adam Algorithm
2.1. The Adam Optimization Algorithm
2.2. Gradient Norm Joint Clipping
2.3. Control Restart Strategy
2.4. The CG-Adam Algorithm
Algorithm 1 CG-Adam
1: Input: initial parameters θ₀, learning rate lr, first-moment decay β1, second-moment decay β2, gradient clipping threshold clip_value, restart period T, small constant ε.
2: Initialize: first- and second-moment estimates m₀ = 0, v₀ = 0; cycle counter t′ = 0; restart_step = 0; optimizer step = 0.
3: for t = 1, 2, … (each training step) do
4:   g_t = ∇f_t(θ_{t−1})  (compute the stochastic gradient)
5:   g_t ← g_t · min(1, clip_value / ‖g_t‖₂)  (gradient norm joint clipping)
6:   t′ = step mod T
7:   if t′ == 0 then
8:     restart_step = 0
9:   end if
10:  if restart_step == 0 then
11:    m_{t−1} ← 0, v_{t−1} ← 0  (restart the moment estimates)
12:  end if
13:  for each parameter θ in group[params] do
14:    m_t = β1 · m_{t−1} + (1 − β1) · g_t
15:    v_t = β2 · v_{t−1} + (1 − β2) · g_t²
16:    m̂_t = m_t / (1 − β1^t)
17:    v̂_t = v_t / (1 − β2^t)
18:    θ_t = θ_{t−1} − lr · m̂_t / (√v̂_t + ε)
19:    step = step + 1
20:  end for
21:  update t′ to the value of the next cycle, t′ = step mod T
22:  restart_step = (restart_step + 1) mod T
end for
Return: the final optimized parameters θ_t.
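Since the listing above omits implementation details, the following is a minimal PyTorch-style sketch of how the two mechanisms could be combined in a single optimizer. The class name CGAdam, the default values of clip_value and restart_period, and the choice to zero both moment estimates at the start of each restart period are illustrative assumptions, not the authors' released code.

```python
import torch
from torch.optim import Optimizer


class CGAdam(Optimizer):
    """Illustrative sketch of Adam with gradient-norm clipping and a
    periodic (controlled) restart of the moment estimates."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 clip_value=1.0, restart_period=50):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        clip_value=clip_value, restart_period=restart_period)
        super().__init__(params, defaults)
        self._global_step = 0   # counts calls to step(); drives the restart cycle
        self._restart_step = 0  # position inside the current restart period

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        # Control restart: at the start of every period, reset the restart counter.
        restart_period = self.defaults["restart_period"]
        if self._global_step % restart_period == 0:
            self._restart_step = 0
        restart_now = self._restart_step == 0

        for group in self.param_groups:
            # Joint gradient-norm clipping over all parameters in this group.
            torch.nn.utils.clip_grad_norm_(group["params"], group["clip_value"])

            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)

                # One plausible reading of the restart: zero both moment estimates
                # at the start of each period (assumption, not the published code).
                if restart_now:
                    state["exp_avg"].zero_()
                    state["exp_avg_sq"].zero_()

                state["step"] += 1
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]

                # Standard Adam moment updates with bias correction.
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                bias_c1 = 1 - beta1 ** state["step"]
                bias_c2 = 1 - beta2 ** state["step"]
                denom = (exp_avg_sq / bias_c2).sqrt().add_(group["eps"])
                p.addcdiv_(exp_avg / bias_c1, denom, value=-group["lr"])

        self._restart_step = (self._restart_step + 1) % restart_period
        self._global_step += 1
        return loss
```

Such an optimizer would be used like any other torch.optim optimizer, e.g., optimizer = CGAdam(model.parameters(), lr=0.001, clip_value=1.0, restart_period=T), with clip_value and T tuned per dataset as described in Section 3.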
3. Experimental Design and Analysis of Results
3.1. Experimental Environment Configuration
3.2. Experimental Results and Analysis
- (1) The number of epochs is set to 100 so that all algorithms are compared over the same number of training rounds.
- (2) The batch size is set to 128 to keep the experimental setup consistent across algorithms.
- (3) The initial learning rate of all algorithms is set to 0.001.
- (4) Because the restart period and clipping threshold of the CG-Adam algorithm need to be tuned, multiple runs are performed on each dataset and the best result is reported.
- (5) Comparison experiments are conducted on the MNIST, CIFAR10, and Stomach datasets; the shared settings above are summarized in the configuration sketch after this list.
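For concreteness, the shared settings can be collected in a small configuration sketch. The dictionary layout and the build_optimizer helper below are illustrative assumptions (the paper does not publish its training script); CGAdam refers to the sketch in Section 2.4, and its clip_value and restart_period are left as placeholders to be tuned per dataset.

```python
# Illustrative experiment configuration (assumed layout, not the authors' script).
import torch

COMMON = {
    "epochs": 100,       # same number of training rounds for every optimizer
    "batch_size": 128,   # fixed batch size across all experiments
    "lr": 0.001,         # initial learning rate for all algorithms
}

DATASETS = ["MNIST", "CIFAR10", "Stomach"]


def build_optimizer(name, params, lr=COMMON["lr"], clip_value=None, restart_period=None):
    """Return the optimizer for one comparison run.

    For CG-Adam, clip_value and restart_period must be supplied; both are
    tuned per dataset and the best run is reported. StochGradAdam is omitted
    here because it comes from its own reference implementation, not torch.
    """
    if name == "CG-Adam":
        assert clip_value is not None and restart_period is not None
        return CGAdam(params, lr=lr, clip_value=clip_value, restart_period=restart_period)
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr)
    if name == "Adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    if name == "Adadelta":
        return torch.optim.Adadelta(params, lr=lr)
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "Nadam":
        return torch.optim.NAdam(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```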
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
CG-Adam | Control restart strategy gradient norm joint clipping Adam |
AdamMCMC | Combining Metropolis-adjusted Langevin with momentum-based optimization |
SADAM | Stochastic Adam |
lr | Learning rate |
ELRA | Exponential learning rate adaption gradient descent |
References
- Yun, J. StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling. arXiv 2024, arXiv:2310.17042.
- Xia, L.; Massei, S. AdamL: A fast adaptive gradient method incorporating loss function. arXiv 2023, arXiv:2312.15295.
- Tang, Q.; Shpilevskiy, F.; Lécuyer, M. DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction). arXiv 2023, arXiv:2312.14334.
- Kleinsorge, A.; Kupper, S.; Fauck, A.; Rothe, F. ELRA: Exponential learning rate adaption gradient descent optimization method. arXiv 2023, arXiv:2309.06274.
- Hong, Y.; Lin, J. High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise. arXiv 2023, arXiv:2311.02000.
- Shao, Y.; Fan, S.; Sun, H.; Tan, Z.; Cai, Y.; Zhang, C.; Zhang, L. Multi-Scale Lightweight Neural Network for Steel Surface Defect Detection. Coatings 2023, 13, 1202.
- Zhuang, Z. Adaptive Strategies in Non-convex Optimization. arXiv 2023, arXiv:2306.10278.
- Zhang, G.; Zhang, D.; Zhao, S.; Liu, D.; Toptan, C.M.; Liu, H. Asymmetric Momentum: A Rethinking of Gradient Descent. arXiv 2023, arXiv:2309.02130.
- Song, Z.; Yang, C. An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent. arXiv 2023, arXiv:2310.11291.
- Zhang, W.; Bao, Y. SADAM: Stochastic Adam, A Stochastic Operator for First-Order Gradient-based Optimizer. arXiv 2022, arXiv:2205.10247.
- Wang, R.; Klabjan, D. Divergence Results and Convergence of a Variance Reduced Version of ADAM. arXiv 2022, arXiv:2210.05607.
- Li, H.; Rakhlin, A.; Jadbabaie, A. Convergence of Adam Under Relaxed Assumptions. arXiv 2023, arXiv:2304.13972.
- He, M.; Liang, Y.; Liu, J.; Xu, D. Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case. arXiv 2023, arXiv:2307.11782.
- Bu, Z.; Wang, Y.-X.; Zha, S.; Karypis, G. Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger. arXiv 2023, arXiv:2206.07136.
- Shao, Y.; Zhang, C.; Xing, L.; Sun, H.; Zhao, Q.; Zhang, L. A new dust detection method for photovoltaic panel surface based on Pytorch and its economic benefit analysis. Energy AI 2024, 16, 100349.
- Notsawo, P.J.T. Stochastic Average Gradient: A Simple Empirical Investigation. arXiv 2023, arXiv:2310.12771.
- Chen, B.; Wang, H.; Ba, C. Differentiable Self-Adaptive Learning Rate. arXiv 2022, arXiv:2210.10290.
- Chen, A.C.H. Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms. arXiv 2022, arXiv:2212.12279.
- Bieringer, S.; Kasieczka, G.; Steffen, M.F.; Trabs, M. AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization. arXiv 2023, arXiv:2312.14027.
- Zhang, C.; Shao, Y.; Sun, H.; Xing, L.; Zhao, Q.; Zhang, L. The WuC-Adam algorithm based on joint improvement of Warmup and cosine annealing algorithms. Math. Biosci. Eng. 2023, 21, 1270–1285.
Software | Version |
---|---|
Python | 3.10 |
torch | 2.0.1 |
torchvision | 0.15.0 |
lightning | 2.1.2 |
Wandb | 0.16.0 |
Dataset | Number of Samples | Training Set | Test Set | Validation Set | Category |
---|---|---|---|---|---|
MNIST | 70,000 | 55,000 | 10,000 | 5000 | 10 |
CIFAR10 | 60,000 | 45,000 | 10,000 | 5000 | 10 |
Stomach | 1885 | 900 | 485 | 500 | 8 |
Dataset | Optimization Algorithm | Accuracy | Loss |
---|---|---|---|
MNIST | SGD | 97.36% | 0.091 |
MNIST | Adagrad | 97.84% | 0.066 |
MNIST | Adadelta | 96.42% | 0.133 |
MNIST | Adam | 98.52% | 0.064 |
MNIST | Nadam | 98.50% | 0.062 |
MNIST | StochGradAdam | 97.82% | 0.078 |
MNIST | CG-Adam | 98.59% | 0.059 |
CIFAR10 | SGD | 49.09% | 1.467 |
CIFAR10 | Adagrad | 33.54% | 1.852 |
CIFAR10 | Adadelta | 25.58% | 1.979 |
CIFAR10 | Adam | 69.55% | 1.232 |
CIFAR10 | Nadam | 68.87% | 1.638 |
CIFAR10 | StochGradAdam | 68.07% | 1.04 |
CIFAR10 | CG-Adam | 70.7% | 1.181 |
Stomach | SGD | 57.6% | 1.17 |
Stomach | Adagrad | 68.40% | 0.992 |
Stomach | Adadelta | 55.80% | 2.110 |
Stomach | Adam | 69.80% | 1.046 |
Stomach | Nadam | 67.66% | 2.10 |
Stomach | StochGradAdam | 67.2% | 1.166 |
Stomach | CG-Adam | 73.2% | 1.020 |