A Methodology and Open-Source Tools to Implement Convolutional Neural Networks Quantized with TensorFlow Lite on FPGAs
Abstract
1. Introduction
2. Related Work
3. Background
3.1. Model Compression
3.2. Quantization with TensorFlow Lite
4. Materials and Methods
4.1. Accelerator Architecture
4.1.1. TFLite_mbqm Core
Algorithm 1 Computation of the value mbqm using the TFLite_mbqm core.
- tflite_core0: Adds the bias to the input value (e.g., cv_in) and checks for overflow.
- tflite_core1: Multiplies the quantized_multiplier value by the input-plus-bias value (xls) using two DSP48s, because the expected result is 64 bits wide. This sub-module also computes the nudge variable.
- tflite_core2: Adds the ab_64 value to the nudge, producing ab_nudge.
- tflite_core3: Saturates ab_nudge, bounding it to the int32_t maximum. The result is the srdhm value.
- tflite_core4: Rounds the srdhm value using the shift parameter and outputs the mbqm value.
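The five sub-modules above can be mirrored by a short software model. The following is an illustrative sketch, not the HDL: the function name and scalar interface are ours, and it assumes a non-negative right_shift (the usual case after TFLite splits the shift parameter). The intermediate names (xls, ab_64, nudge, ab_nudge, srdhm, mbqm) follow the text.

```c
#include <stdint.h>

/* Software model of the TFLite_mbqm pipeline (tflite_core0..tflite_core4).
 * Illustrative sketch; signal names follow the accompanying description. */
int32_t tflite_mbqm(int32_t cv_in, int32_t bias,
                    int32_t quantized_multiplier, int right_shift) {
  int32_t xls = cv_in + bias;                                /* tflite_core0 */
  int64_t ab_64 = (int64_t)xls * quantized_multiplier;       /* tflite_core1 */
  int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));  /* tflite_core1 */
  int64_t ab_nudge = ab_64 + nudge;                          /* tflite_core2 */
  /* tflite_core3: high 32 bits of the doubled product, saturated to INT32_MAX. */
  int32_t srdhm = (xls == quantized_multiplier && xls == INT32_MIN)
                      ? INT32_MAX
                      : (int32_t)(ab_nudge / ((int64_t)1 << 31));
  /* tflite_core4: rounding right shift by the shift parameter. */
  int32_t mask = (int32_t)(((int64_t)1 << right_shift) - 1);
  int32_t remainder = srdhm & mask;
  int32_t threshold = (mask >> 1) + (srdhm < 0 ? 1 : 0);
  int32_t mbqm = (srdhm >> right_shift) + (remainder > threshold ? 1 : 0);
  return mbqm;
}
```

For example, with quantized_multiplier = 1610612736 (0.75 in Q31) and right_shift = 0, an input of 100 is scaled to 75.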
4.1.2. Conv Core
- conv_core0: Adds the offset_ent parameter to the input values x01, …, x025, producing the xo01, …, xo025 signals.
- conv_core1: Multiplies the weights w01, …, w25 by the xo01, …, xo025 values using DSP48 blocks, producing the xow01, …, xow025 signals.
- conv_core2: Adds the xow01, …, xow025 values into the xow signal.
- conv_core3: Adds the previous value cv_in to the present value xow. The result is stored in the output register cv_out.
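The Conv core datapath above can be sketched in C for one 5×5 window (25 inputs and weights). The function name and flat-array interface are illustrative assumptions; the HDL computes the stages in parallel, whereas this model loops over them.

```c
#include <stdint.h>

/* C model of one Conv core pass, mirroring conv_core0..conv_core3. */
int32_t conv_core_step(const int8_t x[25], const int8_t w[25],
                       int32_t offset_ent, int32_t cv_in) {
  int32_t xow = 0;
  for (int i = 0; i < 25; ++i) {
    int32_t xo = (int32_t)x[i] + offset_ent;  /* conv_core0: add input offset  */
    xow += xo * (int32_t)w[i];                /* conv_core1/conv_core2: MAC    */
  }
  return cv_in + xow;                         /* conv_core3: accumulate cv_in  */
}
```

The accumulated cv_out is later requantized to int8 through the TFLite_mbqm core.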
Algorithm 2 Convolutional layer computation employing the Conv and TFLite_mbqm cores.
4.1.3. Mpool Core
Algorithm 3 Maxpooling layer computation using the Mpool core.
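Max-pooling in the int8 scheme reduces to comparisons, since input and output share the same scale and zero-point and no requantization is needed. A minimal sketch follows; the function name and the 2×2, stride-2 configuration are our assumptions, not details taken from Algorithm 3.

```c
#include <stdint.h>

/* Hypothetical 2x2, stride-2 max-pooling over an int8 feature map of
 * size h x w (h and w assumed even). Output is (h/2) x (w/2). */
void maxpool2x2_int8(const int8_t *in, int8_t *out, int h, int w) {
  for (int r = 0; r < h / 2; ++r) {
    for (int c = 0; c < w / 2; ++c) {
      int8_t m = in[2 * r * w + 2 * c];               /* top-left element   */
      if (in[2 * r * w + 2 * c + 1] > m) m = in[2 * r * w + 2 * c + 1];
      if (in[(2 * r + 1) * w + 2 * c] > m) m = in[(2 * r + 1) * w + 2 * c];
      if (in[(2 * r + 1) * w + 2 * c + 1] > m) m = in[(2 * r + 1) * w + 2 * c + 1];
      out[r * (w / 2) + c] = m;                       /* window maximum     */
    }
  }
}
```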
4.1.4. Dense Core
- dense_core0: Adds the offset_ent parameter to the input values x01, …, x025 and copies the results into the xo01, …, xo025 signals.
- dense_core1: Multiplies the weights w01, …, w025 by the xo01, …, xo025 values using DSP48s and copies the results into the xow01, …, xow025 signals. These signals are then added in the top module, and the result is stored in the output register ds_out.
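A software sketch of the Dense core datapath (dense_core0, dense_core1, and the top-level sum) follows. The function name and the variable fan-in n are illustrative assumptions; as with the Conv core, ds_out is subsequently requantized through the TFLite_mbqm core.

```c
#include <stdint.h>

/* C model of the Dense core: add the input offset, multiply by the
 * weights, and accumulate into ds_out. n is the layer fan-in. */
int32_t dense_core_step(const int8_t *x, const int8_t *w, int n,
                        int32_t offset_ent) {
  int32_t ds_out = 0;
  for (int i = 0; i < n; ++i) {
    /* dense_core0 (offset add) and dense_core1 (multiply), then the sum. */
    ds_out += ((int32_t)x[i] + offset_ent) * (int32_t)w[i];
  }
  return ds_out;
}
```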
Algorithm 4 Fully connected layer computation using the Dense and the TFLite_mbqm cores.
4.1.5. Additional Functions
Algorithm 5 Padding computation for a quantized network.
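Algorithm 5's listing is not reproduced here. As a hedged sketch, assuming the standard TFLite convention that the real value 0 quantizes to the input zero-point, "same" padding fills the border with that zero-point rather than with 0, so padded cells contribute nothing once the input offset is applied. The function name and interface below are ours.

```c
#include <stdint.h>

/* Sketch of zero-point padding for an int8 feature map: copy the h x w
 * input into the center of a (h+2*pad) x (w+2*pad) output and fill the
 * border with the input zero-point (the quantized representation of 0). */
void pad_same_int8(const int8_t *in, int8_t *out,
                   int h, int w, int pad, int8_t zero_point) {
  int ow = w + 2 * pad;
  for (int r = 0; r < h + 2 * pad; ++r) {
    for (int c = 0; c < ow; ++c) {
      int ir = r - pad, ic = c - pad;
      out[r * ow + c] = (ir >= 0 && ir < h && ic >= 0 && ic < w)
                            ? in[ir * w + ic]  /* interior: copy input  */
                            : zero_point;      /* border: zero-point    */
    }
  }
}
```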
Algorithm 6 Flatten function.
4.2. Methodology Overview
Algorithm 7 Methodology to implement quantized CNNs in Zynq FPGAs.
Algorithm 8 C application template for mapping TFLite quantized CNNs in Vitis.
4.3. Experimental Setup
5. Results
5.1. Trained Models
5.2. Quantized Models
5.3. Logic Resources and Power Consumption
5.4. Performance Comparison
6. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CNN | Convolutional neural network |
CPU | Central processing unit |
DPU | Deep processing unit |
DSP | Digital signal processor |
FPGA | Field-programmable gate array |
GPU | Graphics processing unit |
HDL | Hardware description language |
JAFFE | Japanese Female Facial Expression |
LBP | Local binary pattern |
MCU | Microcontroller unit |
ML | Machine learning |
MNIST | Modified National Institute of Standards and Technology database |
TFLite | TensorFlow Lite |
Appendix A. TFLite Functions Used in Quantization
Algorithm A1 SaturatingRoundingDoublingHighMul saturates the product of the input value (a) and the quantized_multiplier (b) and bounds its output to the int32_t maximum.

Algorithm A2 RoundingDivideByPOT rounds the saturated value using the exponent parameter and the functions BitAnd, MaskIfLessThan, MaskIfGreaterThan, and ShiftRight.

Algorithm A3 MultiplyByQuantizedMultiplier calls the above functions and uses the exponent obtained from the shift quantization parameter to compute the mbqm factor.
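For reference, the three helpers can be written in C following the published TFLite/gemmlowp fixed-point arithmetic. This is a sketch in C rather than the library's C++ templates, with the bit-mask intrinsics (BitAnd, MaskIfLessThan, etc.) replaced by plain operators.

```c
#include <stdint.h>
#include <stdbool.h>

/* Saturating product of a and b in Q31, rounded to nearest. */
int32_t SaturatingRoundingDoublingHighMul(int32_t a, int32_t b) {
  bool overflow = (a == b) && (a == INT32_MIN);      /* only overflowing case */
  int64_t ab_64 = (int64_t)a * (int64_t)b;
  int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
  int32_t ab_x2_high32 = (int32_t)((ab_64 + nudge) / ((int64_t)1 << 31));
  return overflow ? INT32_MAX : ab_x2_high32;
}

/* Rounding division by 2^exponent (ties rounded away from zero). */
int32_t RoundingDivideByPOT(int32_t x, int exponent) {
  int32_t mask = (int32_t)(((int64_t)1 << exponent) - 1);
  int32_t remainder = x & mask;
  int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
  return (x >> exponent) + (remainder > threshold ? 1 : 0);
}

/* Scale x by the (quantized_multiplier, shift) pair from quantization. */
int32_t MultiplyByQuantizedMultiplier(int32_t x, int32_t quantized_multiplier,
                                      int shift) {
  int left_shift = shift > 0 ? shift : 0;
  int right_shift = shift > 0 ? 0 : -shift;
  return RoundingDivideByPOT(
      SaturatingRoundingDoublingHighMul(x * (1 << left_shift),
                                        quantized_multiplier),
      right_shift);
}
```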
Appendix B. Accelerator Power Consumption
Layer | Inputs/Outputs | Data Type | Range |
---|---|---|---|
Conv_2D | Input 0 | int8 | |
 | Input 1 (Weight) | int8 | |
 | Input 2 (Bias) | int32 | |
 | Output 0 | int8 | |
Fully_Connected | Input 0 | int8 | |
 | Input 1 (Weight) | int8 | |
 | Input 2 (Bias) | int32 | |
 | Output 0 | int8 | |
Max_Pool_2D | Input 0 | int8 | |
 | Output 0 | int8 | |
Name | Model | Precision | Recall | F1-Score | MCC | Accuracy |
---|---|---|---|---|---|---|
CNN+MNIST | Input:; Conv2D:; MaxPooling:; Dense: 10 | | | | | |
CNN+JAFFE | Input:; Conv2D:; MaxPooling:; Conv2D:; MaxPooling:; Conv2D:; MaxPooling:; Dense: 6 | | | | | |
Name | Model | Representation | Accuracy | Parameters | Size |
---|---|---|---|---|---|
CNN+MNIST | Input:; Conv2D:; MaxPooling:; Dense: 10 | Floating Point | | | 40.64 kB |
 | | Integer | | | 13.30 kB |
CNN+JAFFE | Input:; Conv2D:; MaxPooling:; Conv2D:; MaxPooling:; Conv2D:; MaxPooling:; Dense: 6 | Floating Point | | | 1.17 MB |
 | | Integer | | | 0.30 MB |
Resource | Available | Utilization | Utilization % |
---|---|---|---|
LUT | 53,200 | 6373 | 11.98 |
LUTRAM | 17,400 | 71 | 0.41 |
FF | 106,400 | 12,470 | 11.72 |
DSP | 220 | 93 | 42.27 |
IO | 125 | 18 | 14.40 |
Quantized Network | Platform | Accuracy | Inference Time | Power | Cost |
---|---|---|---|---|---|
CNN+MNIST | Laptop with TFLite | | 4.45 s | 50 W | |
CNN+JAFFE | Laptop with TFLite | | 73.97 s | 50 W | |
CNN+MNIST | Zybo-Z7 with C application | | 0.127 s | 4.5 W | |
CNN+JAFFE | Zybo-Z7 with C application | | 99.74 s | 4.5 W | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Parra, D.; Escobar Sanabria, D.; Camargo, C. A Methodology and Open-Source Tools to Implement Convolutional Neural Networks Quantized with TensorFlow Lite on FPGAs. Electronics 2023, 12, 4367. https://doi.org/10.3390/electronics12204367