Zero-Shot Proxy with Incorporated-Score for Lightweight Deep Neural Architecture Search
Abstract
1. Introduction
- We propose a zero-cost proxy for NAS called Incorporated-Score. It combines the Zen-score with entropy factors into an efficient proxy for ranking networks; the entropy of the network serves as an auxiliary optimization factor to the Zen-score.
- EZenNet, the architecture designed with Incorporated-Score, outperforms the SOTA baseline ZenNet designed by Zen-NAS, achieving a new SOTA on the CIFAR-10/CIFAR-100/ImageNet datasets at a lightweight scale.
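As a concrete illustration, the combination above can be sketched as a weighted sum of the two normalized factors. The function names, default weights, and the exact combination rule below are illustrative assumptions, not the paper's precise formulation:

```python
import math

def incorporated_score(zen_score: float, entropy: float,
                       alpha: float = 0.8, beta: float = 0.2,
                       normalize=math.log) -> float:
    """Hypothetical sketch: combine the Zen-score with an entropy term.

    `normalize` maps both factors to a comparable scale (the paper
    reports logarithm and square-root variants); `alpha`/`beta` are
    the balance weights searched over in the experiments.
    """
    return alpha * normalize(zen_score) + beta * normalize(entropy)

# Ranking two hypothetical candidate networks by the combined proxy.
candidates = {"net_a": (120.0, 45.0), "net_b": (95.0, 80.0)}
ranked = sorted(candidates,
                key=lambda k: incorporated_score(*candidates[k]),
                reverse=True)
```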
2. Related Works
2.1. Information Theory in Deep Learning
2.2. Neural Architecture Search (NAS)
3. Preliminary
3.1. L-Layer Neural Network Notation
3.2. Vanilla Convolutional Neural Network
3.3. Entropy of MLP Models
3.4. Effectiveness of Controlling Extremely Deep Networks in MLP
4. Method
4.1. Zen-Score
4.2. Maximizing Entropy as Expressivity of Network
4.2.1. Entropy of CNN
4.2.2. Effectiveness of Entropy for Scoring Networks
4.3. Effectiveness of the Proposed Incorporated-Score as a Proxy for NAS
4.4. Incorporated-NAS with Optimized Incorporated-Score
Algorithm 1 Incorporated-NAS

Algorithm 2 Mutation
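Algorithm 1 follows the evolutionary scheme inherited from Zen-NAS: repeatedly mutate a member of the population (Algorithm 2) and retain the candidates with the highest proxy scores. A minimal sketch under that assumption, with stand-in `score_fn` and `mutate_fn` rather than the paper's exact operators:

```python
import math
import random

def evolutionary_search(init_arch, score_fn, mutate_fn,
                        population_size=16, iterations=1000, seed=0):
    """Sketch of an Incorporated-NAS-style evolutionary loop.

    `score_fn` stands in for the Incorporated-Score proxy and
    `mutate_fn` for Algorithm 2 (Mutation); both are assumptions.
    """
    rng = random.Random(seed)
    population = [init_arch]
    for _ in range(iterations):
        parent = rng.choice(population)
        population.append(mutate_fn(parent, rng))
        if len(population) > population_size:
            # Drop the lowest-scoring architecture.
            population.remove(min(population, key=score_fn))
    return max(population, key=score_fn)

# Toy example: an "architecture" is a tuple of layer widths and the
# proxy score is the sum of log-widths (purely illustrative).
score = lambda arch: sum(math.log(w) for w in arch)
mutate = lambda arch, rng: tuple(
    max(8, w + rng.choice((-8, 8))) for w in arch)
best = evolutionary_search((16, 32, 64), score, mutate)
```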
5. Experiments
5.1. Experimental Setting
5.1.1. NAS Settings
- Incorporated-Score-l: Incorporated-Score is generated by using the logarithm function as the normalization function.
- Incorporated-Score-s: Incorporated-Score is generated by using the square-root function as the normalization function.
- Incorporated-Score-w/o-ls: To isolate the effect of our normalization approach, we evaluate Incorporated-Score without any normalization function, using balanced weights.
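The three variants above differ only in the normalization function applied to the raw factors. A small sketch (the dictionary keys and the identity mapping for the w/o-ls variant are illustrative assumptions):

```python
import math

# Hypothetical mapping of the variants in Section 5.1.1 to their
# normalization functions: "-l" = logarithm, "-s" = square root,
# "-w/o-ls" = no normalization (identity).
NORMALIZERS = {
    "incorporated-score-l": math.log,
    "incorporated-score-s": math.sqrt,
    "incorporated-score-w/o-ls": lambda x: x,
}

def normalized(value: float, variant: str) -> float:
    """Apply the variant's normalization function to a raw factor."""
    return NORMALIZERS[variant](value)
```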
5.1.2. Training Setting
- Dataset: CIFAR-10 and CIFAR-100 are two benchmark datasets for image classification. CIFAR-10 consists of 50,000 training images and 10,000 test images in 10 classes; each image has a resolution of 32 × 32. CIFAR-100 has the same numbers of training and test samples but is divided into 100 classes. ImageNet-1k is a large dataset with over 1.2 million training images and 50,000 validation images divided into 1000 classes. We experiment with the official training and validation splits.
- Optimizer: The optimizer is SGD with a momentum of 0.9 for all experiments, with different weight decay values for CIFAR-10/100 and ImageNet. The batch size is 256, with an initial learning rate of 0.1 and cosine learning rate decay [63]. For CIFAR-10 and CIFAR-100, we train the models for up to 1440 epochs; for ImageNet, we train for 480 epochs. Following previous research [17,18,64], we use EfficientNet-B3 as a teacher network when training EZenNets.
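The cosine learning-rate decay [63] referenced above can be written out in plain Python; decaying to zero at the final epoch is an assumption of this sketch.

```python
import math

def cosine_lr(epoch: int, total_epochs: int, base_lr: float = 0.1) -> float:
    """Cosine learning-rate decay: base_lr at epoch 0, ~0 at the end."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

# CIFAR schedule from the text: 1440 epochs, initial learning rate 0.1.
schedule = [cosine_lr(e, 1440) for e in range(1440)]
```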
5.2. Search for the Balance Hyperparameters of Each Normalization Proxy
5.3. Comparison of Results between Incorporated-Score and Zen-Score on CIFAR-10 and CIFAR-100 Datasets
5.4. The Effectiveness of Normalization
5.5. Incorporated-Score for Lightweight Model on ImageNet Dataset
5.6. Architecture Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Guo, Z.; Zhang, X.; Mu, H.; Heng, W.; Liu, Z.; Wei, Y.; Sun, J. Single Path One-Shot Neural Architecture Search with Uniform Sampling. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 544–560. [Google Scholar] [CrossRef]
- Luo, R.; Tian, F.; Qin, T.; Chen, E.; Liu, T.Y. Neural architecture optimization. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 3–8 December 2018; pp. 7827–7838. [Google Scholar]
- Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, USA, 29–31 January 2019. [Google Scholar] [CrossRef]
- Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, ICML’17, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 2902–2911. [Google Scholar]
- Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1388–1397. [Google Scholar] [CrossRef]
- Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing Neural Network Architectures using Reinforcement Learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Wen, W.; Liu, H.; Chen, Y.; Li, H.; Bender, G.; Kindermans, P.J. Neural Predictor for Neural Architecture Search. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 660–676. [Google Scholar] [CrossRef]
- Luo, R.; Tan, X.; Wang, R.; Qin, T.; Chen, E.; Liu, T.Y. Semi-Supervised Neural Architecture Search. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33. [Google Scholar]
- Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.; Tian, Q.; Xiong, H. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Zhou, D.; Zhou, X.; Zhang, W.; Loy, C.C.; Yi, S.; Zhang, X.; Ouyang, W. EcoNAS: Finding Proxies for Economical Neural Architecture Search. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11393–11401. [Google Scholar]
- Yang, Z.; Wang, Y.; Chen, X.; Shi, B.; Xu, C.; Xu, C.; Tian, Q.; Xu, C. CARS: Continuous Evolution for Efficient Neural Architecture Search. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1826–1835. [Google Scholar]
- Xie, S.; Zheng, H.; Liu, C.; Lin, L. SNAS: Stochastic neural architecture search. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Cai, H.; Zhu, L.; Han, S. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Benyahia, Y.; Yu, K.; Smires, K.B.; Jaggi, M.; Davison, A.C.; Salzmann, M.; Musat, C. Overcoming Multi-model Forgetting. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 594–603. [Google Scholar]
- Wan, A.; Dai, X.; Zhang, P.; He, Z.; Tian, Y.; Xie, S.; Wu, B.; Yu, M.; Xu, T.; Chen, K.; et al. FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 13–19 June 2020; pp. 12962–12971. [Google Scholar] [CrossRef]
- Cai, H.; Gan, C.; Wang, T.; Zhang, Z.; Han, S. Once-for-All: Train One Network and Specialize it for Efficient Deployment. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Li, C.; Peng, J.; Yuan, L.; Wang, G.; Liang, X.; Lin, L.; Chang, X. Block-Wisely Supervised Neural Architecture Search with Knowledge Distillation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1986–1995. [Google Scholar]
- Yu, K.; Sciuto, C.; Jaggi, M.; Musat, C.; Salzmann, M. Evaluating The Search Phase of Neural Architecture Search. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Ying, C.; Klein, A.; Christiansen, E.; Real, E.; Murphy, K.; Hutter, F. NAS-Bench-101: Towards Reproducible Neural Architecture Search. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 7105–7114. [Google Scholar]
- Abdelfattah, M.S.; Mehrotra, A.; Dudziak, Ł.; Lane, N.D. Zero-Cost Proxies for Lightweight NAS. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Tanaka, H.; Kunin, D.; Yamins, D.L.K.; Ganguli, S. Pruning neural networks without any data by iteratively conserving synaptic flow. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 6–12 December 2020. [Google Scholar]
- Chen, W.; Gong, X.; Wang, Z. Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Mellor, J.; Turner, J.; Storkey, A.; Crowley, E.J. Neural Architecture Search without Training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 7588–7598. [Google Scholar]
- Lin, M.; Wang, P.; Sun, Z.; Chen, H.; Sun, X.; Qian, Q.; Li, H.; Jin, R. Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 337–346. [Google Scholar]
- Shen, X.; Wang, Y.; Lin, M.; Huang, Y.; Tang, H.; Sun, X.; Wang, Y. DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Montúfar, G.; Pascanu, R.; Cho, K.; Bengio, Y. On the number of linear regions of deep neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, Cambridge, MA, USA, 8–14 December 2014; pp. 2924–2932. [Google Scholar]
- Daniely, A.; Frostig, R.; Singer, Y. Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Red Hook, NY, USA, 5–10 December 2016; pp. 2261–2269. [Google Scholar]
- Liang, S.; Srikant, R. Why Deep Neural Networks for Function Approximation? In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Poole, B.; Lahiri, S.; Raghu, M.; Sohl-Dickstein, J.; Ganguli, S. Exponential expressivity in deep neural networks through transient chaos. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Red Hook, NY, USA, 5–10 December 2016; pp. 3368–3376. [Google Scholar]
- Cohen, N.; Shashua, A. Inductive Bias of Deep Convolutional Networks through Pooling Geometry. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Lu, Z.; Pu, H.; Wang, F.; Hu, Z.; Wang, L. The expressive power of neural networks: A view from the width. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, 4–9 December 2017; pp. 6232–6240. [Google Scholar]
- Raghu, M.; Poole, B.; Kleinberg, J.; Ganguli, S.; Sohl-Dickstein, J. On the Expressive Power of Deep Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 2847–2854. [Google Scholar]
- Rolnick, D.; Tegmark, M. The power of deeper networks for expressing natural functions. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Serra, T.; Tjandraatmadja, C.; Ramalingam, S. Bounding and Counting Linear Regions of Deep Neural Networks. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 4558–4566. [Google Scholar]
- Hanin, B.; Rolnick, D. Complexity of Linear Regions in Deep Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 2596–2604. [Google Scholar]
- Xiong, H.; Huang, L.; Yu, M.; Liu, L.; Zhu, F.; Shao, L. On the number of linear regions of convolutional neural networks. In Proceedings of the 37th International Conference on Machine Learning, ICML’20, Virtual, 13–18 July 2020. [Google Scholar]
- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
- Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959. [Google Scholar]
- Sun, Z.; Ge, C.; Wang, J.; Lin, M.; Chen, H.; Li, H.; Sun, X. Entropy-Driven Mixed-Precision Quantization for Deep Network Design. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
- Sun, Z.; Lin, M.; Sun, X.; Tan, Z.; Li, H.; Jin, R. MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022. [Google Scholar]
- Wang, J.; Sun, Z.; Qian, Y.; Gong, D.; Sun, X.; Lin, M.; Pagnucco, M.; Song, Y. Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition. arXiv 2023, arXiv:2303.02693. [Google Scholar]
- Yu, Y.; Chan, K.H.R.; You, C.; Song, C.; Ma, Y. Learning diverse and discriminative representations via the principle of maximal coding rate reduction. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 6–12 December 2020. [Google Scholar]
- Chan, K.H.R.; Yu, Y.; You, C.; Qi, H.; Wright, J.; Ma, Y. ReduNet: A white-box deep network from the principle of maximizing rate reduction. J. Mach. Learn. Res. 2022, 23, 1–103. [Google Scholar]
- Roberts, D.A.; Yaida, S.; Hanin, B. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar] [CrossRef]
- Saxe, A.M.; Bansal, Y.; Dapello, J.; Advani, M.; Kolchinsky, A.; Tracey, B.D.; Cox, D.D. On the Information Bottleneck Theory of Deep Learning. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Hua, W.; Li, L.J.; Fei-Fei, L.; Yuille, A.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; pp. 19–35. [Google Scholar] [CrossRef]
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2815–2823. [Google Scholar] [CrossRef]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
- Kakade, S.M.; Sridharan, K.; Tewari, A. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization. In Proceedings of the NIPS, Vancouver, BC, Canada, 12 December 2008; pp. 793–800. [Google Scholar]
- Bartlett, P.L.; Foster, D.J.; Telgarsky, M. Spectrally-normalized margin bounds for neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, 4–9 December 2017; pp. 6241–6250. [Google Scholar]
- Neyshabur, B.; Li, Z.; Bhojanapalli, S.; LeCun, Y.; Srebro, N. The role of over-parametrization in generalization of neural networks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Ruderman, D.L. The statistics of natural images. Netw. Comput. Neural Syst. 1994, 5, 517. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing Network Design Spaces. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10425–10433. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient Neural Architecture Search via Parameters Sharing. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 4095–4104. [Google Scholar]
- Cubuk, E.D.; Zoph, B.; Mané, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Strategies From Data. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–19 June 2019; pp. 113–123. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef]
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13001–13008. [Google Scholar]
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Aguilar, G.; Ling, Y.; Zhang, Y.; Yao, B.; Fan, X.; Guo, C. Knowledge Distillation from Internal Representations. Proc. AAAI Conf. Artif. Intell. 2020, 34, 7350–7357. [Google Scholar]
- Li, X.; Zhou, Y.; Pan, Z.; Feng, J. Partial Order Pruning: For Best Speed/Accuracy Trade-off in Neural Architecture Search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
Proxy | Weights | CIFAR-10 | CIFAR-100 | FLOPs
---|---|---|---|---
Incorporated-Score-s | (0.9; 0.1) | 96.10% | 79.00% | 280 M
| (0.8; 0.2) | 96.86% | 81.10% | 309 M
| (0.7; 0.3) | 96.86% | 80.50% | 472 M
| (0.6; 0.4) | 97.21% | 79.85% | 592 M
| (0.5; 0.5) | 96.63% | 79.57% | 315 M
| (0.4; 0.6) | 96.74% | 81.00% | 572 M
| (0.3; 0.7) | 96.80% | 79.61% | 692 M
| (0.2; 0.8) | 97.08% | 80.10% | 630 M
| (0.1; 0.9) | 96.71% | 80.40% | 634 M
Incorporated-Score-l | (0.9; 0.1) | 96.66% | 80.67% | 112 M
| (0.8; 0.2) | 96.50% | 80.18% | 411 M
| (0.7; 0.3) | 96.09% | 79.32% | 472 M
| (0.6; 0.4) | 96.56% | 78.03% | 592 M
| (0.5; 0.5) | 96.29% | 79.19% | 315 M
| (0.4; 0.6) | 96.21% | 79.17% | 572 M
| (0.3; 0.7) | 96.53% | 79.49% | 692 M
| (0.2; 0.8) | 96.39% | 79.40% | 630 M
| (0.1; 0.9) | 96.33% | 79.66% | 634 M
Incorporated-Score-w/o-ls | (0.5; 0.5) | 96.66% | 75.97% | 560 M
| (0.9999; ) | 96.33% | 80.10% | 411 M
Proxy | CIFAR-10 | CIFAR-100 |
---|---|---
zen-score | 96.20% | 80.10% |
grad | 92.80% | 65.40% |
synflow | 95.10% | 75.90% |
TE-Score | 96.10% | 77.20% |
Incorporated-Score-l | 96.66% | 80.67% |
Incorporated-Score-s | 96.86% | 81.10% |
Incorporated-Score-w/o-ls | 96.59% | 75.97% |
Proxy | Model | N | Time
---|---|---|---
Incorporated-Score | ZenNet-400M-imagenet | 16 | 0.0345
| ZenNet-1M-cifar | 16 | 0.0348
zen-score | ZenNet-400M-imagenet | 16 | 0.0337
| ZenNet-1M-cifar | 16 | 0.0346
Model | Top1-Acc (%) | FLOPs |
---|---|---
EZenNet-400M-SE-l | 78.30 | 405 M |
EZenNet-400M-SE-s | 78.29 | 418 M |
ZenNet-400M-SE [25] | 78.00 | 410 M |
EZenNet-600M-SE-l | 79.64 | 610 M |
EZenNet-600M-SE-s | 79.73 | 609 M |
ZenNet-600M-SE [25] | 79.10 | 611 M |
Model | Resolution | Params | FLOPs | Top1-Acc |
---|---|---|---|---
MobileNetV2-0.25 | 224 | 1.5 M | 44 M | 51.80% |
MobileNetV2-0.5 | 224 | 2.0 M | 108 M | 64.40% |
MobileNetV2-0.75 | 224 | 2.6 M | 226 M | 69.40% |
MobileNetV2-1.0 | 224 | 3.5 M | 320 M | 74.70% |
MobileNetV2-1.4 | 224 | 6.1 M | 610 M | 74.70% |
DFNet-1 | 224 | 8.5 M | 746 M | 69.80% |
RegNetY-200 MF | 224 | 3.2 M | 200 M | 70.40% |
RegNetY-400 MF | 224 | 4.3 M | 400 M | 74.10% |
RegNetY-600 MF | 224 | 6.1 M | 600 M | 75.50% |
RegNetY-800 MF | 224 | 6.3 M | 800 M | 76.30% |
OFANet-9 ms (+) | 224 | 5.2 M | 313 M | 75.30% |
OFANet-11 ms (+) | 224 | 6.2 M | 352 M | 76.10% |
OFANet-389 M (*) | 224 | 8.4 M | 389 M | 76.30% |
OFANet-482 M (*) | 224 | 9.1 M | 482 M | 78.80% |
OFANet-595 M (*) | 224 | 9.1 M | 595 M | 79.80% |
EfficientNet-B0 | 224 | 5.3 M | 390 M | 76.30% |
EfficientNet-B1 | 240 | 7.8 M | 700 M | 78.80% |
EfficientNet-B2 | 260 | 9.2 M | 1.0 G | 79.80%
DNANet-a | 224 | 4.2 M | 348 M | 77.10% |
DNANet-b | 224 | 4.9 M | 406 M | 77.50% |
DNANet-c | 224 | 5.3 M | 466 M | 77.80% |
DNANet-d | 224 | 6.4 M | 611 M | 78.40% |
MnasNet-1.0 | 224 | 4.4 M | 330 M | 74.20% |
Deep MAD-B0 | 224 | 5.3 M | 390 M | 76.10% |
ZenNet-400 M-SE | 224 | 5.7 M | 410 M | 78.00% |
ZenNet-600 M-SE | 224 | 7.1 M | 611 M | 79.10% |
ZenNet-900 M-SE | 224 | 13.3 M | 926 M | 80.80% |
EZenNet-400 M-SE-l | 224 | 7.1 M | 405 M | 78.30% |
EZenNet-400 M-SE-s | 224 | 7.2 M | 418 M | 78.29% |
EZenNet-600 M-SE-l | 224 | 9.1 M | 610 M | 79.64% |
EZenNet-600 M-SE-s | 224 | 9.6 M | 609 M | 79.73% |
EZenNet-800 M-SE-s | 224 | 12.9 M | 801 M | 80.10% |
Model | Method | Top1-Acc (%) | GPU Day |
---|---|---|---
CARS-I [12] | EA | 75.20 | 0.40 |
PC-DARTS [10] | GD | 75.80 | 3.80 |
FBNetV2 [16] | GD | 77.20 | 25.00 |
MetaQNN [6] | RL | 77.40 | 96.00 |
TE-NAS [23] | ZS | 74.10 | 0.20 |
OFANet [17] | PS | 80.10 | 51.60 |
EZenNet-400M-SE-l | ZS | 78.30 | 0.13 |
EZenNet-400M-SE-s | ZS | 78.29 | 0.09 |
EZenNet-600M-SE-l | ZS | 79.64 | 0.13 |
EZenNet-600M-SE-s | ZS | 79.73 | 0.11 |
EZenNet-800M-SE-s | ZS | 80.10 | 0.20 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottlenecks | Bottlenecks | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 88 | 24 | 1 | 1 | - | - | 1 | 1 |
Btn | Res | 7 | 5 | 88 | 24 | 120 | 88 | 1 | 1 | 16 | 8 | 1 | 1 |
Btn | Btn | 7 | 7 | 120 | 88 | 192 | 304 | 2 | 2 | 16 | 16 | 3 | 5 |
Btn | Res | 5 | 5 | 192 | 304 | 224 | 48 | 1 | 1 | 24 | 8 | 4 | 3 |
Btn | Btn | 5 | 7 | 224 | 48 | 96 | 304 | 2 | 1 | 24 | 16 | 2 | 4 |
Btn | Btn | 3 | 5 | 96 | 304 | 168 | 80 | 2 | 2 | 40 | 32 | 3 | 2 |
Btn | Btn | 3 | 5 | 168 | 80 | 112 | 256 | 1 | 2 | 48 | 40 | 3 | 1 |
Conv | Conv | 1 | 1 | 112 | 256 | 512 | 232 | 1 | 1 | - | - | 1 | 1 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottlenecks | Bottlenecks | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 88 | 56 | 1 | 1 | - | - | 1 | 1 |
Btn | Res | 7 | 7 | 88 | 56 | 120 | 48 | 1 | 1 | 16 | 8 | 1 | 5 |
Btn | Btn | 7 | 3 | 120 | 48 | 192 | 200 | 2 | 2 | 16 | 32 | 3 | 3 |
Btn | Btn | 5 | 5 | 192 | 200 | 224 | 160 | 1 | 1 | 24 | 24 | 4 | 3 |
Btn | Btn | 5 | 7 | 224 | 160 | 96 | 624 | 2 | 2 | 24 | 16 | 2 | 4 |
Btn | Btn | 3 | 3 | 96 | 624 | 168 | 48 | 2 | 2 | 40 | 40 | 3 | 1 |
Btn | - | 3 | - | 168 | - | 112 | - | 1 | - | 48 | - | 3 | - |
Conv | Conv | 1 | 1 | 112 | 48 | 512 | 304 | 1 | 1 | - | - | 1 | 1 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottleneck | Bottleneck | Expansion | Expansion | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 16 | 32 | 2 | 2 | - | - | - | - | 1 | 1 |
MB | MB | 7 | 7 | 16 | 32 | 40 | 40 | 2 | 2 | 40 | 40 | 1 | 1 | 1 | 1 |
MB | MB | 7 | 7 | 40 | 40 | 64 | 64 | 2 | 2 | 64 | 40 | 1 | 2 | 1 | 1 |
MB | MB | 7 | 7 | 64 | 64 | 96 | 128 | 2 | 2 | 96 | 176 | 4 | 2 | 5 | 1 |
MB | MB | 7 | 7 | 96 | 128 | 224 | 256 | 2 | 2 | 224 | 152 | 2 | 4 | 5 | 3 |
- | MB | - | 7 | - | 256 | - | 104 | - | 1 | - | 152 | - | 6 | - | 4 |
Conv | Conv | 1 | 1 | 224 | 104 | 2048 | 2048 | 1 | 1 | - | - | - | - | 1 | 1 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottleneck | Bottleneck | Expansion | Expansion | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 16 | 40 | 2 | 2 | - | - | - | - | 1 | 1 |
MB | MB | 7 | 7 | 16 | 40 | 40 | 80 | 2 | 2 | 40 | 16 | 1 | 1 | 1 | 1 |
MB | MB | 7 | 7 | 40 | 80 | 64 | 80 | 2 | 2 | 64 | 48 | 1 | 1 | 1 | 1 |
MB | MB | 7 | 7 | 64 | 80 | 96 | 336 | 2 | 2 | 96 | 112 | 4 | 1 | 5 | 1 |
MB | MB | 7 | 7 | 96 | 336 | 224 | 360 | 2 | 2 | 224 | 360 | 2 | 1 | 5 | 3 |
- | MB | - | 7 | - | 360 | - | 360 | - | 1 | - | 360 | - | 1 | - | 4 |
Conv | Conv | 1 | 1 | 224 | 360 | 2048 | 2048 | 1 | 1 | - | - | - | - | 1 | 1 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottleneck | Bottleneck | Expansion | Expansion | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 24 | 16 | 2 | 2 | - | - | - | - | 1 | 1 |
MB | MB | 7 | 7 | 24 | 16 | 48 | 32 | 2 | 2 | 16 | 32 | 1 | 4 | 1 | 1 |
MB | MB | 7 | 7 | 48 | 32 | 72 | 72 | 2 | 2 | 16 | 40 | 2 | 4 | 1 | 1 |
MB | MB | 7 | 7 | 72 | 72 | 96 | 304 | 2 | 2 | 24 | 96 | 6 | 2 | 5 | 1 |
MB | MB | 7 | 7 | 96 | 304 | 192 | 360 | 2 | 2 | 24 | 360 | 4 | 1 | 5 | 3 |
- | MB | - | 7 | - | 360 | - | 176 | - | 1 | 40 | 240 | - | 4 | - | 5 |
Conv | Conv | 1 | 1 | 192 | 176 | 2048 | 2048 | 1 | 1 | - | - | - | - | 1 | 1 |
Block | Block | Kernel | Kernel | Input | Input | Output | Output | Stride | Stride | Bottleneck | Bottleneck | Expansion | Expansion | # Layers | # Layers
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Conv | Conv | 3 | 3 | 3 | 3 | 24 | 16 | 2 | 2 | - | - | - | - | 1 | 1 |
MB | MB | 7 | 7 | 24 | 16 | 48 | 24 | 2 | 2 | 16 | 64 | 1 | 2 | 1 | 1 |
MB | MB | 7 | 7 | 48 | 24 | 72 | 120 | 2 | 2 | 16 | 120 | 2 | 1 | 1 | 1 |
MB | MB | 7 | 7 | 72 | 120 | 96 | 192 | 2 | 2 | 24 | 144 | 6 | 2 | 5 | 1 |
MB | MB | 7 | 7 | 96 | 192 | 192 | 320 | 2 | 2 | 24 | 224 | 4 | 2 | 5 | 4 |
- | MB | - | 7 | - | 320 | - | 384 | - | 1 | 40 | 384 | - | 1 | - | 5 |
Conv | Conv | 1 | 1 | 192 | 384 | 2048 | 2048 | 1 | 1 | - | - | - | - | 1 | 1 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nguyen, T.-T.; Han, J.-H. Zero-Shot Proxy with Incorporated-Score for Lightweight Deep Neural Architecture Search. Electronics 2024, 13, 3325. https://doi.org/10.3390/electronics13163325