LMD-DARTS: Low-Memory, Densely Connected, Differentiable Architecture Search
Abstract
1. Introduction
- A continuous strategy redistributes weights to accelerate updates for optional operations during the search, minimizing the impact of low-weight operations on classification results and reducing search iterations.
- A dynamic sampler prunes underperforming operations in real time, cutting memory usage and simplifying individual search processes.
- An adaptive downsampling search algorithm is proposed that sparsifies the dense connection matrix to reduce redundant connections while ensuring the performance of the network.
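A minimal sketch of the dynamic-sampling idea from the second bullet (the keep ratio, operation names, and ranking rule here are illustrative assumptions, not the paper's exact procedure):

```python
import math

def softmax(weights):
    """Plain softmax over a list of architecture weights."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def prune_operations(ops, alpha, keep_ratio=0.5):
    """Drop the lowest-weighted candidate operations on one edge.

    ops   -- candidate operation names on the edge
    alpha -- their architecture weights (same length as ops)
    Returns the surviving (op, probability) pairs, strongest first.
    """
    probs = softmax(alpha)
    ranked = sorted(zip(ops, probs), key=lambda p: p[1], reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]

ops = ["sep_conv_3x3", "max_pool_3x3", "skip_connect", "none"]
alpha = [1.2, 0.3, 0.8, -0.5]
survivors = prune_operations(ops, alpha)
# Only the strongest candidates survive; the pruned operations no longer
# need activations stored during the search, which is where the memory
# saving comes from.
```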
2. Related Work
3. LMD-DARTS: Low-Memory, Densely Connected, Differentiable Architecture Search
3.1. Search Space
3.2. Weight Redistribution Softmax
3.3. Dynamic Sampling
3.4. Adaptively Downsampling
Algorithm 1: Adaptive downsampling strategy
Algorithm 2: LMD-DARTS
4. Experiment
4.1. Evaluation Criteria
- (1) Parameters measure the storage space consumed by a convolutional neural network. Fewer parameters mean less memory usage, allowing larger batch sizes during training and thus reducing training time. The parameters derive mainly from the convolutional layers, the fully connected layers, the fully connected layers inside the squeeze-and-excitation module, and the index layer of the learned group convolution, with the convolutional layers contributing the most. The units M and G denote one million and one billion parameters, respectively.
- (2) GPU days measure the complexity of a neural architecture search algorithm: the number of days the algorithm needs to complete its search on a single GPU. For example, an algorithm that searches for three days on four GPUs costs 12 GPU days. For faster searches, GPU hours can be used instead.
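Both metrics are straightforward to compute. The layer shapes below are hypothetical and only illustrate where parameters accumulate; the GPU-days helper reproduces the arithmetic from the example above:

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k convolution layer."""
    return (k * k * c_in + (1 if bias else 0)) * c_out

def linear_params(n_in, n_out, bias=True):
    """Parameter count of a fully connected layer."""
    return (n_in + (1 if bias else 0)) * n_out

def gpu_days(num_gpus, days):
    """Search cost: wall-clock days multiplied by the number of GPUs used."""
    return num_gpus * days

# A hypothetical 3x3 convolution from 256 to 512 channels already costs
# over a million parameters, dwarfing a 10-class fully connected head:
conv = conv2d_params(256, 512, 3)   # (3*3*256 + 1) * 512 = 1,180,160
head = linear_params(512, 10)       # (512 + 1) * 10 = 5,130

# The example from the text: three days of searching on four GPUs.
cost = gpu_days(4, 3)               # 12 GPU days
```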
4.2. Compared Methods
4.3. Experimental Setting
4.4. Experimental Result
4.5. Ablation Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cong, S.; Zhou, Y. A review of convolutional neural network architectures and their optimizations. Artif. Intell. Rev. 2023, 56, 1905–1969. [Google Scholar] [CrossRef]
- Xie, X.; Song, X.; Lv, Z.; Yen, G.G.; Ding, W. Efficient Evaluation Methods for Neural Architecture Search: A Survey. arXiv 2023, arXiv:2301.05919. [Google Scholar]
- Tian, S. Research on Neural Architecture Automatic Search and Neural Network Acceleration Technology; National University of Defense Technology: Changsha, China, 2021. (In Chinese) [Google Scholar]
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 4780–4789. [Google Scholar]
- Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing neural network architectures using reinforcement learning. arXiv 2016, arXiv:1611.02167. [Google Scholar]
- Zhong, Z.; Yan, J.; Wu, W.; Shao, J.; Liu, C.L. Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2423–2432. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
- Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; Wang, J. Efficient architecture search by network transformation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 2787–2794. [Google Scholar]
- Qin, X.; Wang, Z. Nasnet: A neuron attention stage-by-stage net for single image deraining. arXiv 2019, arXiv:1912.03151. [Google Scholar]
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2820–2828. [Google Scholar]
- Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1379–1388. [Google Scholar]
- Suganuma, M.; Shirakawa, S.; Nagao, T. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 15–19 July 2017; pp. 497–504. [Google Scholar]
- Cubuk, E.D.; Zoph, B.; Schoenholz, S.S.; Le, Q.V. Intriguing properties of adversarial examples. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; Kavukcuoglu, K. Hierarchical representations for efficient architecture search. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- So, D.; Le, Q.; Liang, C. The evolved transformer. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5877–5886. [Google Scholar]
- Liu, H.; Simonyan, K.; Yang, Y. Darts: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Cai, H.; Zhu, L.; Han, S. Proxylessnas: Direct neural architecture search on target task and hardware. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.J.; Tian, Q.; Xiong, H. Pc-darts: Partial channel connections for memory-efficient architecture search. arXiv 2019, arXiv:1907.05737. [Google Scholar]
- Chen, X.; Xie, L.; Wu, J.; Tian, Q. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1294–1303. [Google Scholar]
- Zheng, X.; Ji, R.; Tang, L.; Zhang, B.; Liu, J.; Tian, Q. Multinomial distribution learning for effective neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1304–1313. [Google Scholar]
- Hundt, A.; Jain, V.; Hager, G.D. sharpdarts: Faster and more accurate differentiable architecture search. arXiv 2019, arXiv:1903.09900. [Google Scholar]
- Wang, H.; Wang, Y.; Sun, R.; Li, B. Global convergence of maml and theory-inspired neural architecture search for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9797–9808. [Google Scholar]
- Xue, Y.; Qin, J. Partial connection based on channel attention for differentiable neural architecture search. IEEE Trans. Ind. Inform. 2023, 19, 6804–6813. [Google Scholar] [CrossRef]
- Huang, H.; Shen, L.; He, C.; Dong, W.; Liu, W. Differentiable neural architecture search for extremely lightweight image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 2672–2682. [Google Scholar] [CrossRef]
- Luo, X.; Liu, D.; Kong, H.; Huai, S.; Chen, H.; Liu, W. Surgenas: A comprehensive surgery on hardware-aware differentiable neural architecture search. IEEE Trans. Comput. 2023, 72, 1081–1094. [Google Scholar] [CrossRef]
- Li, Y.; Li, S.; Yu, Z. DARTS-PAP: Differentiable neural architecture search by polarization of instance complexity weighted architecture parameters. In Proceedings of the International Conference on Multimedia Modeling, Bergen, Norway, 9–12 January 2023; Springer Nature: Cham, Switzerland, 2023; pp. 277–288. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Zhang, H.; Li, Y.; Chen, H.; Gong, C.; Bai, Z.; Shen, C. Memory-efficient hierarchical neural architecture search for image restoration. Int. J. Comput. Vis. 2022, 130, 157–178. [Google Scholar] [CrossRef]
- Priyadarshi, S.; Jiang, T.; Cheng, H.P.; Krishna, S.; Ganapathy, V.; Patel, C. DONNAv2-Lightweight Neural Architecture Search for Vision tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 4–6 October 2023; pp. 1384–1392. [Google Scholar]
- Liu, C.; Chen, L.C.; Schroff, F.; Adam, H.; Hua, W.; Yuille, A.L.; Fei-Fei, L. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 82–92. [Google Scholar]
- Mandal, M.; Meedimale, Y.R.; Reddy, M.S.K.; Vipparthi, S.K. Neural architecture search for image dehazing. IEEE Trans. Artif. Intell. 2022, 4, 1337–1347. [Google Scholar] [CrossRef]
- Liu, Y.; Yan, Z.; Tan, J.; Li, Y. Multi-purpose oriented single nighttime image haze removal based on unified variational retinex model. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1643–1657. [Google Scholar] [CrossRef]
- Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameters sharing. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4095–4104. [Google Scholar]
- Ye, P.; Li, B.; Li, Y.; Chen, T.; Fan, J.; Ouyang, W. b-darts: Beta-decay regularization for differentiable architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10864–10873. [Google Scholar]
- Lin, Y.; Endo, Y.; Lee, J.; Kamijo, S. Bandit-NAS: Bandit sampling and training method for Neural Architecture Search. Neurocomputing 2024, 597, 127684. [Google Scholar] [CrossRef]
- Yu, H.; Peng, H.; Huang, Y.; Fu, J.; Du, H.; Wang, L.; Ling, H. Cyclic differentiable architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 211–228. [Google Scholar] [CrossRef] [PubMed]
- Nayman, N.; Noy, A.; Ridnik, T.; Friedman, I.; Jin, R.; Zelnik, L. Xnas: Neural architecture search with expert advice. Adv. Neural Inf. Process. Syst. 2019, 32, 1975–1985. [Google Scholar]
- Cai, Z.; Chen, L.; Liu, H.L. EPC-DARTS: Efficient partial channel connection for differentiable architecture search. Neural Netw. 2023, 166, 344–353. [Google Scholar] [CrossRef] [PubMed]
- Xue, Y.; Han, X.; Wang, Z. Self-Adaptive Weight Based on Dual-Attention for Differentiable Neural Architecture Search. IEEE Trans. Ind. Inform. 2024, 20, 6394–6403. [Google Scholar] [CrossRef]
- He, H.; Liu, L.; Zhang, H.; Zheng, N. IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 12367–12375. [Google Scholar]
- Ma, B.; Zhang, J.; Xia, Y.; Tao, D. VNAS: Variational Neural Architecture Search. Int. J. Comput. Vis. 2024, 1–25. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Xiao, H.; Wang, Z.; Zhu, Z.; Zhou, J.; Lu, J. Shapley-NAS: Discovering operation contribution for neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11892–11901. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Initial value | 0.15 | 0.1 | 0.05 | 0 | −0.05 | −0.1 | −0.15 |
---|---|---|---|---|---|---|---|
Softmax | 0.165 | 0.157 | 0.150 | 0.142 | 0.135 | 0.129 | 0.122 |
WRD-softmax(t = 0.9) | 0.174 | 0.160 | 0.145 | 0.130 | 0.130 | 0.130 | 0.130 |
WRD-softmax(t = 0.8) | 0.179 | 0.162 | 0.145 | 0.129 | 0.129 | 0.129 | 0.129 |
WRD-softmax(t = 0.7) | 0.184 | 0.164 | 0.145 | 0.127 | 0.127 | 0.127 | 0.127 |
WRD-softmax(t = 0.6) | 0.192 | 0.168 | 0.145 | 0.124 | 0.124 | 0.124 | 0.124 |
WRD-softmax(t = 0.5) | 0.203 | 0.173 | 0.145 | 0.120 | 0.120 | 0.120 | 0.120 |
WRD-softmax(t = 0.4) | 0.220 | 0.180 | 0.145 | 0.114 | 0.114 | 0.114 | 0.114 |
WRD-softmax(t = 0.3) | 0.250 | 0.191 | 0.143 | 0.104 | 0.104 | 0.104 | 0.104 |
WRD-softmax(t = 0.2) | 0.314 | 0.210 | 0.136 | 0.085 | 0.085 | 0.085 | 0.085 |
WRD-softmax(t = 0.1) | 0.519 | 0.233 | 0.098 | 0.038 | 0.038 | 0.038 | 0.038 |
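As a sanity check, the plain-softmax baseline row of the table above is reproducible in a few lines; the WRD-softmax variant depends on the redistribution rule defined in Section 3.2 and is not restated here:

```python
import math

def softmax(xs):
    """Standard softmax over a list of architecture weights."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# The "Initial value" row of the table.
initial = [0.15, 0.10, 0.05, 0.0, -0.05, -0.10, -0.15]
baseline = softmax(initial)
# Agrees with the "Softmax" row of the table to within +/-0.001.
```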
Model | Reported Acc./% | Our Impl./% | Params/M | Search Time/GPU Days | Search Strategy |
---|---|---|---|---|---|
ResNet-50 [45] | 93.03 | - | 25.56 | - | Manual |
DenseNet121 [46] | 94.04 | - | 6.96 | - | Manual |
SENet [47] | 95.23 | - | 11.2 | - | Manual |
NASNet [8] | 97.35 | 97.33 | 3.3 | 1800 †† | RL |
ENAS [34] | 97.11 | 97.14 | 4.6 | 0.45 | RL |
Beta-DARTS [35] | 97.49 | - | 3.78 | 0.4 # | Gradient |
CDARTS [37] | 97.52 | - | 3.98 | 0.3 * | Gradient |
AmoebaNet-B [5] | 97.42 | 97.47 | 3.2 | 3150 ‡‡ | Evolution |
XNAS [38] | 98.2 | 97.45 | 3.79 | 0.3 | Gradient |
DARTS [17] | 97.08 | 97.09 | 4.38 | 1.0 | Gradient |
PC-DARTS [19] | 97.36 | 97.18 | 3.98 | 0.15 | Gradient |
EPC-DARTS [39] | 97.6 | - | 3.2 | 0.2 † | Gradient |
SWD-NAS [40] | 97.49 | - | 3.17 | 0.13 ‡ | Gradient |
IS-DARTS [41] | 97.6 | - | 4.47 | 0.42 ‡ | Gradient |
Bandit-NAS [36] | 97.06 | - | 3.4 | 0.3 # | RL |
VNAS [42] | 97.69 | - | 3.5 | 0.3 * | Gradient |
LMD-DARTS a | 97.34 | - | 4.25 | 0.12 | Gradient |
LMD-DARTS b | 97.2 | - | 4.03 | 0.11 | Gradient |
LMD-DARTS | 97.42 | - | 4.23 | 0.12 | Gradient |
Model | Top-1 Acc./% | Top-5 Acc./% | Params/M | FLOPs | Search Cost/GPU Days | Search Method |
---|---|---|---|---|---|---|
VGG-16 [48] | 71.93 | 90.67 | 138.36 | 15.48 G | - | Manual |
ResNet-101 [22] | 80.13 | 95.4 | 44.55 | 7.83 G | - | Manual |
MobileNet V2 [49] | 71.8 | 91 | 3.5 | 0.3 G | - | Manual |
EfficientNet [43] | 77.1 | 93.3 | 5.3 | 399 M | - | Grid Search |
NASNet [8] | 74 | 91.6 | 5.3 | 564 M | 1800 †† | RL |
DARTS [17] | 74.3 | 91.3 | 4.7 | 574 M | 4 | Gradient |
PC-DARTS [19] | 74.9 | 92.2 | 5.3 | 586 M | 3.8 | Gradient |
Shapley-NAS [44] | 76.1 | - | 5.4 | 582 M | 4.2 # | Gradient |
VNAS [42] | 76.3 | 92.9 | 5.4 | 599 M | 5 * | Gradient |
LMD-DARTS | 75.2 | 93.2 | 5.5 | 602 M | 2.9 | Gradient |
Module | Results/% | Parm/M |
---|---|---|
Adaptively downsampling | 97.08 | 4.38 |
DAS-Block | 97.36 | 3.98 |
LMD-DARTS | 97.42 | 4.23 |
Share and Cite
Li, Z.; Xu, Y.; Ying, P.; Chen, H.; Sun, R.; Xu, X. LMD-DARTS: Low-Memory, Densely Connected, Differentiable Architecture Search. Electronics 2024, 13, 2743. https://doi.org/10.3390/electronics13142743