Compression of Neural Networks for Specialized Tasks via Value Locality
Abstract
1. Introduction
- We introduce the notion of value locality in the context of deep neural networks used for specialized tasks.
- We present the VELCRO algorithm, which exploits value locality to compress neural networks that are deployed for specialized tasks.
- VELCRO introduces a fast compression process which solely employs statistics gathering through the inference process and avoids heavy computations involved in backpropagation training, which is usually used by traditional compression approaches such as pruning.
- VELCRO can be used directly in conjunction with other compression methods such as pruning and quantization.
- The results of our experiments indicate that
- VELCRO produces a compression-saving ratio of computations in the range 20.0–27.7% for ResNet-18, 25–30% for GoogLeNet, and 13.5–20% for MobileNet V2 with no impact on model accuracy;
- VELCRO significantly improves accuracy by 2–20% for specialized-task CNNs when given a relatively small compression-savings target.
- We demonstrate the computational and energy savings of VELCRO through a hardware implementation on an FPGA. Our experimental results indicate that VELCRO reduces energy consumption by 13.5–30%, in line with its compression-saving ratio.
2. Prior Works
3. Method and Algorithm
3.1. Value Locality of Specialized Convolutional Neural Networks
3.2. VELCRO Algorithm for Specialized Neural Networks
- Preprocessing stage: In this stage, VELCRO runs inference with the original CNN model on a small subset of images drawn from the specialized-task preprocessing dataset. Note that the performance of the compressed model is evaluated on a validation dataset that is distinct from the preprocessing dataset; this is discussed in detail in Section 4. During this stage, the variance tensor is computed using Equation (1) for each activation output of each convolution layer in the CNN model. Because the preprocessing stage of VELCRO relies only on inference, its computational overhead is significantly smaller than that of traditional compression methods, which employ heavy backpropagation training processes that can last from a few hours up to hundreds of hours [61].
- Compression stage: The compression stage takes a tuple of threshold values, provided by the user as a hyperparameter of the algorithm. Each threshold in the tuple corresponds to an individual activation-function output in each convolution layer and represents the percentile of elements in that layer's variance tensor to be compressed. All elements of the activation tensor whose variance falls within the percentile threshold are replaced by a constant: the arithmetic average of the values observed at the same coordinates during preprocessing. All other activation elements remain unchanged. Replacing activation-function output elements with constants avoids not only the activation-function computation but also the convolution computations that produce the corresponding output-feature-map (OFM) elements. The compression saving of each layer is thus determined by its corresponding threshold, so the user controls the overall compression-saving ratio C of the model through the threshold tuple; an illustrative sketch of the variance computation and constant replacement is given below.
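To make the two stages concrete, the following minimal NumPy sketch (our illustration, not the authors' implementation; the function names, toy tensor shapes, and the 20% threshold are assumptions) computes the per-element average and variance tensors of a single activation output over a preprocessing set and then freezes the lowest-variance elements to their averages:

```python
import numpy as np

def velcro_layer_stats(acts):
    """acts: activations of one convolution layer gathered over the
    preprocessing images, shape (N, C, H, W)."""
    mean = acts.mean(axis=0)   # arithmetic-average tensor B[k] (Step 4)
    var = acts.var(axis=0)     # variance tensor V[k] (Equation (1), Step 5)
    return mean, var

def velcro_compress_layer(act, mean, var, threshold_pct):
    """Freeze the elements whose variance lies within the lowest
    `threshold_pct` percentile to their precomputed averages (Step 6)."""
    cutoff = np.percentile(var, threshold_pct)
    frozen = var <= cutoff
    return np.where(frozen, mean, act), frozen

# Toy usage: 100 preprocessing images, a 16x8x8 activation tensor, 20% threshold.
acts = np.random.rand(100, 16, 8, 8).astype(np.float32)
mean, var = velcro_layer_stats(acts)
compressed, frozen = velcro_compress_layer(acts[0], mean, var, threshold_pct=20)
print(f"{frozen.mean():.0%} of the activation elements are frozen to constants")
```

In an actual deployment the frozen coordinates are known ahead of time, so the convolutions feeding them can simply be skipped rather than computed and then overwritten.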
Algorithm 1: VELCRO algorithm for specialized neural networks
Input: A CNN model M with K activation-function outputs (each in a different convolution layer), N preprocessing images, and a threshold tuple T = {T0, T1, …, TK}, where 0 ≤ Tk ≤ 100 for every k.
Output: A compressed CNN model MC.
Preprocessing stage:
Step 1: Let A[k] be the activation-function output tensor of convolution layer k, and let A(m)[k] be the corresponding activation values at the inference of image m, 1 ≤ m ≤ N. The tensors A[k] and A(m)[k] have dimensions ck × wk × hk, where ck, wk, and hk are the number of channels, the width, and the height of the tensor at convolution layer k, respectively.
Step 2: For every convolution layer k, channel 1 ≤ c ≤ ck, and spatial position 1 ≤ i ≤ wk, 1 ≤ j ≤ hk: initialize the tensors S and Q such that S[k][c][i][j] = 0 and Q[k][c][i][j] = 0.
Step 3: For each image 1 ≤ m ≤ N: perform inference with model M on image m. For every convolution layer k and every c, i, j, accumulate
S[k][c][i][j] = S[k][c][i][j] + A(m)[k][c][i][j]
Q[k][c][i][j] = Q[k][c][i][j] + (A(m)[k][c][i][j])^2.
Step 4: Let B[k] be the arithmetic-average tensor of convolution layer k, such that for every c, i, j: B[k][c][i][j] = S[k][c][i][j] / N.
Step 5: Let V[k] be the variance tensor of convolution layer k, such that for every c, i, j: V[k][c][i][j] = Q[k][c][i][j] / N − (B[k][c][i][j])^2.
Compression stage:
Step 6: For each convolution layer k: let p(x, Y) be the percentile function, which returns the percentile of element x with respect to all elements of tensor Y. Define the tensor Â[k] such that, for every c, i, j:
Â[k][c][i][j] = B[k][c][i][j] if p(V[k][c][i][j], V[k]) ≤ Tk, and Â[k][c][i][j] = A[k][c][i][j] otherwise.
Step 7: Let the compressed CNN model MC be model M in which every activation-function output tensor A[k] is replaced with Â[k], for every convolution layer k.
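For completeness, the sketch below shows one way the compressed model MC could be emulated in PyTorch with forward hooks that overwrite low-variance activation elements with their precomputed averages. This is our own hedged illustration: the class and function names (VelcroMask, attach_velcro_hooks) are assumptions, the statistics are presumed to have been gathered in the preprocessing stage, and a real implementation (such as the paper's FPGA design) would skip the frozen computations entirely rather than recompute and overwrite them.

```python
import torch
import torch.nn as nn

class VelcroMask:
    """Per-layer hook state: which activation elements are frozen and the
    constant (average) values that replace them."""
    def __init__(self, mean, var, threshold_pct):
        cutoff = torch.quantile(var.flatten(), threshold_pct / 100.0)
        self.frozen = var <= cutoff          # low-variance elements (Step 6)
        self.const = mean                    # arithmetic-average tensor B[k]

    def __call__(self, module, inputs, output):
        # Forward hook: overwrite the frozen elements with their averages.
        return torch.where(self.frozen, self.const, output)

def attach_velcro_hooks(model, stats, thresholds):
    """stats: {layer_name: (mean, var)} gathered during preprocessing;
    thresholds: {layer_name: percentile}, i.e., the threshold tuple T."""
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU) and name in stats:
            mean, var = stats[name]
            hook = VelcroMask(mean, var, thresholds[name])
            handles.append(module.register_forward_hook(hook))
    return handles
```

The returned handles can later be removed with handle.remove() to restore the original model; how the frozen convolutions are actually skipped in hardware is outside the scope of this sketch.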
4. Experimental Results and Discussion
4.1. Experimental Environment
4.2. Experimental Analysis of Value Locality
4.3. Performance of Compression Algorithm
4.4. Hardware Implementation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
Threshold tuples used for each compression-saving ratio in the Cats specialized task:
Compression-Saving Ratio | ResNet-18 Threshold Tuple | GoogLeNet Threshold Tuple | MobileNet V2 Threshold Tuple |
---|---|---|---|
10% | (3, 3, 4, 4, 10, 10, 10, 10, 10, 10, 10, 10, 70, 10, 80, 90) | (8, 7, 7, 7, 7, 7, 8, 7, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 5, 5, 5, 5, 5, 5, 14, 14, 5, 5, 5, 8, 5, 5, 5,15, 15, 15, 36, 37, 18, 20, 34, 34, 34, 25, 40, 34, 34, 90, 90, 96, 90, 92, 92) | (22, 21, 10, 11, 11, 11, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 40, 40, 90) |
13.5% | n/a | n/a | (40, 40, 7, 2, 5, 2, 1, 11, 2, 10, 10, 15, 10, 15, 20, 10, 10, 15, 2, 5, 5, 10, 10, 10, 10, 15, 7, 5, 40, 42, 40, 40, 40, 40, 90) |
20% | (17, 16, 17, 10, 10, 20, 20, 20, 20, 20, 30, 27, 70, 10, 80, 90) | (17, 15, 15, 15, 15, 15, 17, 15, 15, 15, 15, 19, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 22, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 37, 37, 37, 36, 37, 37, 40, 34, 34, 34, 34, 40, 34, 34, 90, 90, 96, 90, 92, 92) | (35, 35, 18, 17, 17, 20, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 4, 16, 0, 16, 3, 16, 16, 16, 23, 16, 42, 40, 40, 42, 40, 45, 90) |
27.3% | (34, 36, 30, 15, 10, 25, 20, 20, 20, 20, 30, 27, 70, 10, 80, 90) | n/a | n/a |
30% | (40, 40, 36, 15, 12, 25, 20, 20, 20, 20, 30, 27, 70, 10, 80, 90) | (34, 24, 24, 24, 24, 24, 27, 24, 24, 24, 24, 29, 25, 24, 25, 24, 24, 25, 26, 24, 25, 24, 33, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 24, 27, 56, 56, 54, 52, 56, 54, 60, 52, 52, 52, 52, 60, 52, 52, 90, 90, 96, 90, 92, 92) | (38, 38, 33, 32, 32, 31, 20, 20, 25, 31, 31, 26, 26, 26, 26, 26, 26, 26, 4, 28, 0, 27, 6, 26, 26, 25, 24, 17, 42, 40, 40, 42, 40, 45, 90) |
40% | (70, 61, 60, 15, 12, 25, 20, 20, 20, 20, 30, 27, 70, 10, 80, 90) | (46, 32, 32, 32, 32, 32, 36, 32, 32, 32, 32, 36, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 44, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 36, 75, 75, 75, 75, 75, 75, 80, 75, 75, 75, 75, 80, 75, 75, 92, 92, 96, 92, 94, 94) | (66, 66, 35, 17, 32, 20, 17, 38, 13, 52, 52, 64, 44, 58, 58, 42, 42, 54, 13, 33, 33, 36, 36, 36, 36, 54, 26, 21, 80, 80, 80, 80, 80, 80, 90) |
Threshold tuples used for each compression-saving ratio in the Dogs specialized task:
Compression-Saving Ratio | ResNet-18 Threshold Tuple | GoogLeNet Threshold Tuple | MobileNet V2 Threshold Tuple |
---|---|---|---|
10% | (8, 8, 8, 3, 8, 9, 9, 9, 9, 9, 9, 17, 17, 6, 60, 80) | (7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 27, 16, 72, 72, 90, 92, 90, 91) | (20, 20, 10, 10, 10, 10, 5, 0, 10, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 40, 40, 90) |
19.76% | n/a | n/a | (29, 29, 16, 11, 15, 18, 13, 13, 36, 35, 43, 21, 13, 5, 16, 30, 23, 8, 5, 6, 0, 1, 7, 17, 10, 10, 5, 13, 40, 47, 41, 40, 42, 42, 90) |
20% | (21, 21, 20, 6, 10, 21, 20, 20, 20, 18, 33, 24, 30, 6, 71, 80) | (15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 20, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 55, 32, 72, 72, 90, 92, 90, 91) | n/a |
25.46% | n/a | (20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 22, 22, 20, 20, 20, 22, 20, 20, 20, 20, 20, 20, 20, 20, 20, 22, 20, 20, 20, 20, 20, 20, 20, 25, 50, 52, 50, 50, 50, 52, 50, 52, 50, 50, 55, 55, 68, 42, 92, 90, 90, 92, 90, 91) | n/a |
27.7% | (44, 28, 38, 12, 12, 21, 20, 20, 20, 20, 33, 24, 32, 10, 71, 87) | n/a | n/a |
30% | (49, 33, 44, 13, 12, 21, 20, 20, 20, 20, 33, 24, 32, 12, 72, 90) | (25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 26, 24, 24, 24, 26, 24, 24, 24, 24, 24, 24, 24, 24, 24, 26, 24, 24, 24, 24, 24, 24, 24, 30, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 65, 65, 68, 50, 92, 90, 90, 92, 90, 91) | (45, 45, 25, 27, 23, 28, 19, 19, 54, 53, 64, 31, 19, 7, 24, 45, 34, 12, 7, 9, 0, 1, 10, 25, 15, 15, 7, 20, 60, 70, 61, 60, 62, 62, 90) |
40% | (50, 50, 50, 50, 40, 21, 20, 20, 20, 20, 33, 24, 32, 10, 70, 90) | (36, 36, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 38, 36, 36, 36, 38, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 45, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 65, 65, 68, 50, 92, 90, 90, 92, 90, 91) | (67, 67, 34, 37, 31, 38, 28, 28, 73, 71, 64, 42, 27, 9, 32, 60, 45, 16, 9, 12, 0, 1, 13, 33, 20, 20, 9, 26, 65, 75, 65, 65, 65, 70, 90) |
Threshold tuples used for each compression-saving ratio in the Cars specialized task:
Compression-Saving Ratio | ResNet-18 Threshold Tuple | GoogLeNet Threshold Tuple | MobileNet V2 Threshold Tuple |
---|---|---|---|
10% | (3, 5, 5, 10, 10, 10, 10, 10, 10, 10, 10, 12, 30, 13, 50, 80) | (6, 9, 9, 9, 9, 9, 15, 7, 7, 7, 7, 7, 10, 7, 8, 7, 10, 7, 7, 7, 8, 8, 8, 10, 8, 8, 8, 10, 8, 10, 12, 9, 7, 7, 2, 8, 10, 7, 7, 5, 12, 10, 30, 1, 27, 27, 30, 1, 32, 27, 23, 40, 90, 90, 95, 90) | (20, 20, 20, 10, 10, 10, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 20, 30, 75) |
16.8% | n/a | n/a | (30, 31, 20, 10, 15, 15, 11, 10, 10, 21, 16, 11, 1, 0, 26, 17, 21, 20, 2, 10, 0, 2, 8, 5, 5, 20, 4, 5, 2, 22, 20, 23, 20, 30, 75) |
20% | (21, 21, 20, 20, 20, 20, 10, 20, 20, 10, 10, 12, 30, 13, 50, 80) | (15, 20, 20, 20, 20, 20, 30, 14, 14, 14, 14, 14, 20, 14, 16, 14, 20, 14, 14, 14, 17, 17, 17, 20, 17, 17, 17, 20, 17, 20, 24, 18, 14, 14, 4, 17, 20, 14, 14, 10, 24, 20, 60, 3, 55, 55, 60, 2, 65, 55, 50, 90, 90, 90, 95, 90) | (35, 35, 25, 12, 19, 19, 14, 13, 13, 26, 24, 13, 1, 0, 30, 17, 21, 20, 2, 12, 0, 2, 9, 6, 6, 23, 5, 6, 2, 22, 20, 23, 20, 30, 75) |
27.70% | n/a | (20, 30, 30, 30, 30, 30, 45, 20, 20, 20, 20, 20, 30, 20, 25, 20, 30, 20, 20, 20, 25, 25, 25, 30, 25, 25, 25, 30, 25, 30, 35, 25, 20, 20, 7, 25, 30, 20, 30, 15, 35, 20, 60, 3, 55, 55, 60, 2, 65, 55, 50, 90, 90, 90, 95, 90) | n/a
30% | (35, 30, 30, 30, 30, 30, 30, 30, 30, 10, 12, 12, 30, 30, 50, 85) | (24, 32, 32, 32, 32, 32, 49, 22, 22, 22, 22, 22, 32, 22, 27, 22, 33, 22, 22, 22, 28, 28, 27, 33, 28, 28, 27, 33, 27, 33, 38, 27, 22, 22, 7, 27, 32, 22, 33, 16, 38, 20, 66, 3, 60, 60, 66, 2, 71, 60, 55, 90, 90, 90, 94, 89) | (52, 52, 42, 20, 29, 29, 21, 19, 19, 39, 36, 19, 1, 0, 45, 25, 31, 30, 3, 18, 0, 2, 13, 9, 9, 34, 7,8, 3, 22, 20, 23, 20, 30, 75) |
40% | (50, 40, 40, 40, 40, 40, 40, 40, 40, 10, 20, 20, 35, 35, 80, 90) | (36, 44, 44, 44, 44, 44, 55, 32, 32, 32, 30, 30, 43, 30, 37, 30, 44, 30, 30, 30, 40, 40, 38, 44, 36, 36, 36, 44, 30, 44, 52, 37, 30, 30, 12, 40, 44, 40, 45, 18, 55, 30, 67, 10, 65, 65, 66, 60, 71, 60, 55, 90, 90, 90, 94, 89) | (67, 67, 34, 37, 31, 38, 28, 28, 73, 71, 64, 42, 27, 9, 32, 60, 45, 16, 9, 12, 0, 1, 13, 33, 20, 20, 9, 26, 65, 75, 65, 65, 65, 70, 90) |
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Bianco, S.; Cadene, R.; Celona, L.; Napoletano, P. Benchmark Analysis of Representative Deep Neural Network Architectures. IEEE Access 2018, 6, 64270–64277. [Google Scholar] [CrossRef]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
- Reed, R. Pruning algorithms—A survey. IEEE Trans. Neural Netw. 1993, 4, 740–747. [Google Scholar] [CrossRef]
- LeCun, Y.; Denker, J.S.; Solla, S.; Howard, R.E.; Jackel, L.D. Optimal brain damage. In Advances in Neural Information Processing Systems (NIPS 1989); Touretzky, D., Ed.; Morgan Kaufmann: Denver, CO, USA, 1990; Volume 2. [Google Scholar]
- Hassibi, B.; Stork, D.G.; Wolff, G.J. Optimal Brain Surgeon and general network pruning. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; Volume 1, pp. 293–299. [Google Scholar]
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding Deep Learning Requires Rethinking Generalization. arXiv 2016, arXiv:1611.03530. [Google Scholar]
- Vanhoucke, V.; Senior, A.; Mao, M.Z. Improving the speed of neural networks on CPUs. In Deep Learning and Unsupervised Feature Learning Workshop; NIPS: Granada, Spain, 2011. [Google Scholar]
- Gong, Y.; Liu, L.; Yang, M.; Bourdev, L. Compressing deep convolutional networks using vector quantization. arXiv 2014, arXiv:1412.6115. [Google Scholar]
- Courbariaux, M.; Bengio, Y.; David, J.-P. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- Lin, Z.; Courbariaux, M.; Memisevic, R.; Bengio, Y. Neural networks with few multiplications. arXiv 2015, arXiv:1510.03009. [Google Scholar]
- Shen, H.; Han, S.; Philipose, M.; Krishnamurthy, A. Fast Video Classification via Adaptive Cascading of Deep Models. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Kang, D.; Emmons, J.; Abuzaid, F.; Bailis, P.; Zaharia, M. NoScope: Optimizing Neural Network Queries over Video at Scale. Proc. VLDB Endow. 2017, 10, 1586–1597. [Google Scholar] [CrossRef]
- Kosaian, J.; Phanishayee, A.; Philipose, M.; Dey, D.; Vinayak, R. Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference beyond Increasing Batch Size. In Proceedings of the 38th International Conference on Machine Learning (PMLR 139), Virtual Event, 18–24 July 2021. [Google Scholar]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. I-511–I-518. [Google Scholar]
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.V.; Hinton, G.E.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 22–24 June 2009. [Google Scholar]
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both weights and connections for efficient neural networks. arXiv 2015, arXiv:1506.02626. [Google Scholar]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Castellano, G.; Fanelli, A.M.; Pelillo, M. An iterative pruning algorithm for feedforward neural networks. IEEE Trans. Neural Netw. 1997, 8, 519–531. [Google Scholar] [CrossRef] [PubMed]
- Collins, M.D.; Kohli, P. Memory bounded deep convolutional networks. arXiv 2014, arXiv:1412.1442. [Google Scholar]
- Stepniewski, S.W.; Keane, A.J. Pruning backpropagation neural networks using modern stochastic optimisation techniques. Neural Comput. Appl. 1997, 5, 76–98. [Google Scholar] [CrossRef] [Green Version]
- Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the Value of Network Pruning. arXiv 2018, arXiv:1810.05270. [Google Scholar]
- Anwar, S.; Hwang, K.; Sung, W. Structured pruning of deep convolutional neural networks. ACM J. Emerg. Technol. Comput. Syst. 2017, 13, 1–18. [Google Scholar] [CrossRef] [Green Version]
- Lebedev, V.; Lempitsky, V. Fast ConvNets using group-wise brain damage. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Zhou, H.; Alvarez, J.M.; Porikli, F. Less is more: Towards compact CNNs. In Computer Vision—ECCV 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 662–677. ISBN 9783319464923. [Google Scholar]
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning structured sparsity in Deep Neural Networks. Adv. Neural Inf. Process. Syst. 2016, 29, 2074–2082. [Google Scholar]
- Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
- Srinivas, S.; Babu, R.V. Data-Free Parameter Pruning for Deep Neural Networks. In Proceedings of the British Machine Vision Conference 2015, Swansea, UK, 7–10 September 2015; British Machine Vision Association: Guildford, UK, 2015. [Google Scholar]
- Rao, Y.; Lu, J.; Lin, J.; Zhou, J. Runtime Neural Pruning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 2181–2191. [Google Scholar]
- Shomron, G.; Weiser, U. Spatial Correlation and Value Prediction in Convolutional Neural Networks. IEEE Comput. Arch. Lett. 2019, 18, 10–13. [Google Scholar] [CrossRef] [Green Version]
- Shomron, G.; Banner, R.; Shkolnik, M.; Weiser, U. Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks. In Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 234–250. [Google Scholar]
- See, A.; Luong, M.-T.; Manning, C.D. Compression of Neural Machine Translation Models via Pruning. arXiv 2016, arXiv:1606.09274. [Google Scholar]
- Narang, S.; Elsen, E.; Diamos, G.; Sengupta, S. Exploring Sparsity in Recurrent Neural Networks. arXiv 2017, arXiv:1704.05119. [Google Scholar]
- Zhu, M.; Gupta, S. To prune, or not to prune: Exploring the efficacy of pruning for model compression. arXiv 2017, arXiv:1710.01878. [Google Scholar]
- Yu, R.; Li, A.; Chen, C.-F.; Lai, J.-H.; Morariu, V.I.; Han, X.; Gao, M.; Lin, C.-Y.; Davis, L.S. NISP: Pruning Networks Using Neuron Importance Score Propagation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Cireşan, D.C.; Meier, U.; Masci, J.; Gambardella, L.M.; Schmidhuber, J. High-Performance Neural Networks for Visual Object Classification. arXiv 2011, arXiv:1102.0183. [Google Scholar]
- Chen, W.; Wilson, J.T.; Tyree, S.; Weinberger, K.Q.; Chen, Y. Compressing Neural Networks with the Hashing Trick. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015. [Google Scholar]
- Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv 2016, arXiv:1611.06440. [Google Scholar]
- Luo, J.-H.; Wu, J. Neural Network Pruning with Residual-Connections and Limited-Data. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Choi, J.; Wang, Z.; Venkataramani, S.; Chuang, P.I.-J.; Srinivasan, V.; Gopalakrishnan, K. PACT: Parameterized Clipping acTivation for quantized neural networks. arXiv 2018, arXiv:1805.06085. [Google Scholar]
- Park, E.; Yoo, S.; Vajda, P. Value-aware quantization for training and inference of neural networks. In Computer Vision—ECCV 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 608–624. ISBN 9783030012243. [Google Scholar]
- Zhou, S.; Wu, Y.; Ni, Z.; Zhou, X.; Wen, H.; Zou, Y. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv 2016, arXiv:1606.06160. [Google Scholar]
- Banner, R.; Nahshan, Y.; Hoffer, E.; Soudry, D. Post-training 4-bit quantization of convolution networks for rapid-deployment. arXiv 2018, arXiv:1810.05723. [Google Scholar]
- Choukroun, Y.; Kravchik, E.; Yang, F.; Kisilev, P. Low-bit quantization of neural networks for efficient inference. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019. [Google Scholar]
- Fang, J.; Shafiee, A.; Abdel-Aziz, H.; Thorsley, D.; Georgiadis, G.; Hassoun, J.H. Post-training piecewise linear quantization for deep neural networks. In Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 69–86. ISBN 9783030585358. [Google Scholar]
- Shomron, G.; Gabbay, F.; Kurzum, S.; Weiser, U. Post-Training Sparsity-Aware Quantization. arXiv 2021, arXiv:2105.11010. [Google Scholar]
- Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model Compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD’06, Philadelphia, PA, USA, 20–23 August 2006; ACM Press: New York, NY, USA. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
- Cai, H.; Gan, C.; Wang, T.; Zhang, Z.; Han, S. Once-for-All: Train One Network and Specialize It for Efficient Deployment. arXiv 2019, arXiv:1908.09791. [Google Scholar]
- Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Computer Vision—ECCV 2014; Springer International Publishing: Cham, Switzerland, 2014; pp. 818–833. [Google Scholar]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Object Detectors Emerge in Deep Scene CNNs. arXiv 2014, arXiv:1412.6856. [Google Scholar]
- Morcos, A.S.; Barrett, D.G.T.; Rabinowitz, N.C.; Botvinick, M. On the Importance of Single Directions for Generalization. arXiv 2018, arXiv:1803.06959. [Google Scholar]
- Zhou, B.; Sun, Y.; Bau, D.; Torralba, A. Revisiting the Importance of Individual Units in CNNs via Ablation. arXiv 2018, arXiv:1806.02891. [Google Scholar]
- Boone-Sifuentes, T.; Robles-Kelly, A.; Nazari, A. Max-Variance Convolutional Neural Network Model Compression. In Proceedings of the 2020 Digital Image Computing: Techniques and Applications (DICTA), Melbourne, Australia, 29 November–2 December 2020; pp. 1–6. [Google Scholar]
- Li, Y.; Lin, S.; Zhang, B.; Liu, J.; Doermann, D.; Wu, Y.; Huang, F.; Ji, R. Exploiting kernel sparsity and entropy for interpretable CNN compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2800–2809. [Google Scholar]
- Wang, Y.; Zhang, X.; Xie, L.; Zhou, J.; Su, H.; Zhang, B.; Hu, X. Pruning from Scratch. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12273–12280. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
- Xilinx. Breathe New Life into Your Data Center with Alveo Adaptable Accelerator Cards. Xilinx White Paper, WP499 (v1.0). Available online: https://www.xilinx.com/support/documentation/white_papers/wp499-alveo-intro.pdf (accessed on 19 November 2018).
- Xilinx. Vivado Design Suite. Xilinx White Paper, WP416 (v1.1). Available online: https://www.xilinx.com/support/documentation/white_papers/wp416-Vivado-Design-Suite.pdf (accessed on 22 June 2012).
Specialized Tasks | ILSVRC-2012 Classes |
---|---|
Cats-2 | Egyptian cat, Persian cat
Cats-3 | Egyptian cat, Persian cat, Cougar
Cats-4 (Cats) | Egyptian cat, Persian cat, Cougar, Tiger cat
Dogs | English setter, Siberian husky, English springer, Scottish deerhound
Cars | Beach wagon, Cab, Convertible, Minivan
Maximum compression-saving ratio achieved with no accuracy degradation, per specialized task:
Specialized Task | ResNet-18 | GoogLeNet | MobileNet V2 |
---|---|---|---|
Cats | 27.73% | 30.00% | 13.50% |
Dogs | 27.70% | 25.46% | 19.76% |
Cars | 20.00% | 27.70% | 16.80% |
Compression Method | Network | Specialized Task | Training Required | Computation Acceleration | Accuracy Loss |
---|---|---|---|---|---|
Taylor criterion [42] | AlexNet | Yes | Yes | 1.9X | 0.3% |
CURL [43] | MobileNet V2 / ResNet-50 | Yes / Yes | Yes / Yes | 3X / 4X | Up to 4% / Up to 2%
Deep compression [22] | Various CNN models | No | Yes | 3X | None |
Weights and connection learning [21] | AlexNet | No | Yes | 3X | None |
KSE [59] | ResNet-50 | No | Yes | 3.8–4.7X | 0.84–0.64% |
VELCRO | ResNet-18 / GoogLeNet / MobileNet V2 | Yes | No | 1.25–1.38X / 1.38–1.42X / 1.15–1.24X | None
Specialized Task | ResNet-18 | GoogLeNet | MobileNet V2 |
---|---|---|---|
Cats | 0.08% | 0.56% | 14.91% |
Dogs | 0.20% | 0.63% | 10.48% |
Cars | 0.31% | 0.64% | 12.00% |
Specialized Task | ResNet-18 | GoogLeNet | MobileNet V2 |
---|---|---|---|
Cats | 13.00% | 20.00% | 3.50% |
Dogs | 8.50% | 11.00% | 2.50% |
Cars | 4.00% | 15.00% | 4.50% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).