Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models
Abstract
1. Introduction
- We propose "selected path skipping" during the quantization of recurrent units to reduce error rates (a minimal sketch of the idea follows this list).
- We analyze and demonstrate a positive synergistic effect of quantization on the delta networks method.
- We propose a method for delta threshold selection in a post-training scenario.
- We compare the four RNN models (LSTM, GRU, LiGRU, and SRU) to find the smallest model size and the fewest operations at different error levels.
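As a hedged illustration of the first contribution, the sketch below applies symmetric uniform post-training quantization to the weight tensors of a recurrent cell while leaving selected recurrent paths in full precision. The function names, path names, and default skipped path are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_symmetric(x, bits):
    """Symmetric uniform post-training quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # de-quantized ("fake-quantized") weights used for evaluation

def quantize_cell(weights, bits, skip_paths=("recurrent",)):
    """Quantize every weight tensor of a recurrent cell except the skipped paths.

    `weights` maps path names (e.g., "input", "recurrent") to NumPy arrays;
    the names and the default skipped path are illustrative only.
    """
    return {name: (w if name in skip_paths else quantize_symmetric(w, bits))
            for name, w in weights.items()}

# Illustrative usage on random weights:
rng = np.random.default_rng(0)
cell = {"input": rng.normal(size=(128, 64)), "recurrent": rng.normal(size=(128, 128))}
quantized = quantize_cell(cell, bits=4)
```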
2. Related Work
3. Background
3.1. Recurrent Neural Network Models
3.2. Post-Training Quantization
3.3. Delta Networks Method
4. Method
4.1. Quantization of Recurrent Neural Network Models
4.2. Applying the Delta Networks Method on Quantized Recurrent Models
5. Evaluation
5.1. Impact of Skipping Quantization in Selected Recurrent Paths
5.2. Post-Training Quantization of RNN Models
5.3. Redundant Operation Elimination
6. Discussion
6.1. Compressibility Analysis
6.2. Absolute Comparison
7. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
| Type | Precision | No Skip Error | No Skip Size (MB) | Skip Error | Skip Size (MB) | Skip Part Error | Skip Part Size (MB) |
|---|---|---|---|---|---|---|---|
| LSTM | 8-bit | 16% | 13.6 | 14.2% | 14.8 | 14.4% | 14.2 |
| LSTM | 4-bit | 17.2% | 6.8 | 15.1% | 8.5 | 15.1% | 7.7 |
| GRU | 8-bit | 19.3% | 13.3 | 15.5% | 14.8 | 15.6% | 14.2 |
| GRU | 4-bit | 21.9% | 6.7 | 16% | 8.8 | 16.3% | 8 |
| LiGRU | 8-bit | 21% | 9.6 | 14.9% | 10.5 | 14.9% | 10 |
| LiGRU | 4-bit | 21.6% | 4.8 | 15.5% | 6.2 | 15.5% | 5.7 |
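As a rough consistency check on the Size column (assuming sizes are reported in MB and the baseline in the next table is stored as 32-bit floating point), quantized model size scales with the weight bit-width, and the Skip variants are slightly larger because the weights of the skipped recurrent paths stay at higher precision:

$$
\text{Size}_{8\text{-bit LSTM}} \approx \text{Size}_{32\text{-bit LSTM}} \times \frac{8}{32} = 54.4 \times \frac{1}{4} = 13.6,
$$

which matches the No Skip entry above, while the Skip entry grows to 14.8.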
W/A denotes the weight/activation bit-width used for quantization.

| Config | W/A | W/A | W/A | LSTM Error | LSTM Size (MB) | GRU Error | GRU Size (MB) | LiGRU Error | LiGRU Size (MB) | SRU Error | SRU Size (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Base | 32/32 | 32/32 | 32/32 | 14.1% | 54.4 | 15.4% | 53 | 14.5% | 38.2 | 17.2% | 21.2 |
| F-pt. | 16/16 | 16/16 | 16/16 | 14.1% | 27.2 | 15.4% | 26.6 | 14.5% | 19.1 | 17.2% | 10.6 |
| 8-bit | 8/8 | 8/8 | 8/8 | 14.2% | 14.8 | 15.5% | 14.8 | 14.9% | 10.5 | 16.9% | 5.3 |
| 4-bit | 4/4 | 4/4 | 4/4 | 15.1% | 8.5 | 16% | 8.8 | 15.5% | 6.2 | 17.9% | 2.7 |
| M1 | 8/8 | 2/8 | 8/8 | 14.7% | 7 | 15.4% | 7.4 | 16.6% | 5.6 | 17.9% | 2.9 |
| M2 | 8/8 | 2/8 | 4/4 | 15.1% | 6 | 15.5% | 6.4 | 17.1% | 4.6 | 18.4% | 1.9 |
| M3 | 8/8 | 2/8 | 2/8 | 15.8% | 5.5 | 15.7% | 5.9 | 17.4% | 4.1 | 18.5% | 1.4 |
| M4 | 2/8 | 2/8 | 2/8 | 17.4% | 5.4 | 17.2% | 5.8 | 23.3% | 4.1 | 21.4% | 1.3 |
| M5 | 8/8 | 4/4 | 4/4 | 14.6% | 8.6 | 15.5% | 8.9 | 15.4% | 6.2 | 17.4% | 2.7 |
| M6 | 8/8 | 2/4 | 4/4 | 14.8% | 6 | 15.5% | 6.4 | 16.9% | 4.6 | 19.3% | 1.9 |
| M7 | 8/8 | 2/4 | 2/4 | 15.8% | 5.7 | 15.7% | 6 | 17.4% | 4.2 | 19.7% | 1.5 |
| M8 | 4/4 | 2/4 | 2/4 | 16.4% | 5.7 | 16.2% | 6 | 18.5% | 4.2 | 20.6% | 1.4 |
| Config | W/A | W/A | W/A | LSTM Error | LSTM EO | GRU Error | GRU EO | LiGRU Error | LiGRU EO | SRU Error | SRU EO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8-bit | 8/8 | 8/8 | 8/8 | 14.2% (14.4%) | 17% (56%) | 15.5% (15.5%) | 18% (56%) | 14.9% (15%) | 48% (55%) | 16.9% (16.8%) | 20% (31%) |
| 4-bit | 4/4 | 4/4 | 4/4 | 15.1% (14.9%) | 51% (66%) | 16% (16.1%) | 46% (64%) | 15.5% | 64% | 17.9% | 55% |
| M1 | 8/8 | 2/8 | 8/8 | 14.7% (14.7%) | 22% (55%) | 15.4% (15.5%) | 17% (56%) | 16.6% (16.8%) | 49% (56%) | 17.9% (18.4%) | 20% (31%) |
| M2 | 8/8 | 2/8 | 4/4 | 15.1% (15.1%) | 25% (57%) | 15.5% (15.6%) | 18% (57%) | 17.1% (17.2%) | 53% (56%) | 18.4% (18.8%) | 34% (45%) |
| M3 | 8/8 | 2/8 | 2/8 | 15.8% (16%) | 20% (56%) | 15.7% (15.8%) | 17% (56%) | 17.4% (17.4%) | 49% (56%) | 18.5% (18.8%) | 20% (31%) |
| M4 | 2/8 | 2/8 | 2/8 | 17.4% (17.6%) | 21% (56%) | 17.2% (17.3%) | 19% (57%) | 23.3% (23.9%) | 51% (57%) | 21.4% (21.5%) | 20% (33%) |
| M5 | 8/8 | 4/4 | 4/4 | 14.6% (14.7%) | 52% (66%) | 15.5% (15.8%) | 47% (64%) | 15.4% | 65% | 17.4% | 56% |
| M6 | 8/8 | 2/4 | 4/4 | 14.8% (15.2%) | 52% (66%) | 15.5% (15.8%) | 47% (64%) | 16.9% | 63% | 19.3% | 55% |
| M7 | 8/8 | 2/4 | 2/4 | 15.8% (16.1%) | 51% (66%) | 15.7% (16.1%) | 46% (64%) | 17.4% | 63% | 19.7% | 55% |
| M8 | 4/4 | 2/4 | 2/4 | 16.4% (16.6%) | 51% (65%) | 16.2% (16.6%) | 46% (63%) | 18.5% | 63% | 20.6% | 53% |
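The EO columns report the share of operations eliminated by the delta networks method on top of each quantization configuration. As a minimal sketch (illustrative shapes and threshold, not the authors' implementation or their threshold-selection method), a vector component triggers a recomputation only when it has changed by more than the delta threshold since the value last used, so the eliminated fraction is the share of components whose change stays below the threshold:

```python
import numpy as np

def delta_eliminated_ops(sequence, theta):
    """Fraction of per-component updates skipped by the delta networks rule.

    `sequence` has shape (timesteps, features). A component is recomputed only
    when it differs from its last used value by more than `theta`; otherwise the
    corresponding column of the matrix-vector product can be skipped.
    """
    last_used = sequence[0].copy()     # the first timestep is always computed
    skipped, total = 0, 0
    for x in sequence[1:]:
        update = np.abs(x - last_used) > theta
        last_used[update] = x[update]  # refresh only the updated components
        skipped += np.count_nonzero(~update)
        total += x.size
    return skipped / total if total else 0.0

# Illustrative usage with a slowly drifting random sequence and an arbitrary threshold:
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(scale=0.05, size=(100, 64)), axis=0)
print(f"eliminated operations: {delta_eliminated_ops(seq, theta=0.1):.0%}")
```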
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).