Flexible Convolver for Convolutional Neural Networks Deployment onto Hardware-Oriented Applications
Abstract
1. Introduction
- State-of-the-art works typically modify the convolution algorithm, which involves tuning the hardware to a single CNN configuration. The proposed method allows processing any convolution layer configuration without customizing the standard algorithm. Hence, the convolver architecture can be adapted to different CNN structures.
- The present work proposes an adaptive line buffer in which the convolution masks are dynamically generated; that is, the buffer is programmable, and the mask size can be set through external control. This avoids the need to reconfigure the design for each new CNN.
- The flexibility of the proposed architecture is demonstrated by implementing different CNN layers on it. The results show a total processing time that is competitive with recently reported architectures.
2. Materials and Methods
Algorithm 1: Standard pseudocode of a convolutional layer
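The loop nest of the standard convolution algorithm can be sketched in software as follows (a minimal pure-Python model for illustration only, not the hardware implementation; the argument shapes and names are assumptions):

```python
def conv_layer(inputs, weights, stride=1, pad=0):
    """Direct convolution of one layer (the standard loop nest).

    inputs:  N input feature maps, each H x W (nested lists)
    weights: M x N kernels, each K x K
    returns: M output feature maps
    """
    N, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    M, K = len(weights), len(weights[0][0])
    # zero-pad every input feature map
    padded = [[[0] * (W + 2 * pad) for _ in range(pad)] +
              [[0] * pad + row + [0] * pad for row in fmap] +
              [[0] * (W + 2 * pad) for _ in range(pad)] for fmap in inputs]
    Ho = (H + 2 * pad - K) // stride + 1
    Wo = (W + 2 * pad - K) // stride + 1
    out = [[[0.0] * Wo for _ in range(Ho)] for _ in range(M)]
    for m in range(M):                 # output feature maps
        for n in range(N):             # input feature maps
            for r in range(Ho):        # output rows
                for c in range(Wo):    # output columns
                    acc = 0.0
                    for i in range(K):       # kernel rows
                        for j in range(K):   # kernel columns
                            acc += (padded[n][r * stride + i][c * stride + j]
                                    * weights[m][n][i][j])
                    out[m][r][c] += acc      # accumulate over input maps
    return out
```

Every parallelism and reuse strategy discussed below corresponds to unrolling, reordering, or buffering some subset of these loops.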
- Layer parallelism:
- Data from different layers are processed in a pipeline through the parallel computation of two or more convolution layers. This type of parallelism is characteristic of streaming architectures and can be used when the target device contains enough hardware resources to host a convolver for each convolution layer. It is not considered in this work.
- Loop parallelism:
- This type of parallelism is based on the fact that all convolutions between feature maps and kernels within a convolutional layer can be processed in parallel due to their independence. Hence, this parallelism can be exploited as intra-output and inter-output parallelism [48], where the intra-output and inter-output factors give the number of input and output feature maps, respectively, that are computed in parallel.
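As an illustration, the two loop-parallelism types amount to tiling the loops over output and input feature maps. The sketch below is a hypothetical software model (the factor names `Pof` and `Pif` are placeholders for the inter-output and intra-output factors, whose original symbols are not preserved here):

```python
def tiled_loops(M, N, Pof=2, Pif=2):
    """Enumerate which (output map m, input map n) convolutions run
    together when the m-loop is unrolled by Pof (inter-output) and
    the n-loop by Pif (intra-output). Returns the parallel groups."""
    groups = []
    for m0 in range(0, M, Pof):          # tile over output feature maps
        for n0 in range(0, N, Pif):      # tile over input feature maps
            group = [(m, n)
                     for m in range(m0, min(m0 + Pof, M))
                     for n in range(n0, min(n0 + Pif, N))]
            groups.append(group)         # Pof * Pif convolutions in flight
    return groups
```

Each group of `Pof * Pif` independent convolutions can be mapped onto parallel hardware units; the outer tile loops remain sequential.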
- Operation parallelism:
- The basic CNN computing operation is the convolution between a convolution mask and a kernel. The input image is scanned from left to right and top to bottom, creating the convolution masks. For each mask position, the SA computes an element-wise multiplication. Operation parallelism corresponds to the number of multipliers used in this element-wise multiplication.
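A software sketch of one mask position follows: all products are formed at once (one per multiplier) and then reduced by a binary adder tree. This models the dataflow only, not the actual SA circuitry:

```python
def mask_convolution(mask, kernel):
    """One mask position: K*K parallel products plus an adder tree."""
    # one multiplier per mask element (operation parallelism = K*K)
    products = [m * w for row_m, row_w in zip(mask, kernel)
                for m, w in zip(row_m, row_w)]
    # binary adder tree: roughly log2(K*K) addition stages
    while len(products) > 1:
        if len(products) % 2:                # pad odd stages with a zero
            products.append(0)
        products = [products[i] + products[i + 1]
                    for i in range(0, len(products), 2)]
    return products[0]
```

In hardware, the multipliers all fire in the same cycle and the adder tree adds only a logarithmic number of pipeline stages, which is why this decomposition is attractive for an SA row.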
- Data parallelism:
- Different areas of the input feature map set are processed by convolving each convolution mask; these operations are computed simultaneously by replicating the SA rows.
- Task parallelism:
- This parallelism type consists of processing more than one input image in parallel and implementing more than one CNN within the same chip. It is the least explored type, since state-of-the-art works focus on accelerating the convolution operation of a single layer, which takes most of the chip’s resources.
- Weight Reuse (WR):
- Power consumption is reduced by reusing the filter weights read from external memory.
- Input Reuse (IR):
- The reuse of the processed feature maps within the convolution layer. This reuse type is the basis of the described hardware architecture: the number of external memory accesses is reduced because each pixel of the input feature map set is read only once.
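The input-reuse principle behind a line buffer can be modeled behaviorally as follows (a fixed mask size `K` here; per the contribution above, the proposed adaptive line buffer makes `K` programmable):

```python
from collections import deque

def line_buffer_masks(image, K):
    """Behavioral model of a K x K line buffer (input reuse).

    Each image pixel is read from 'external memory' exactly once: a row
    enters the buffer as it streams in, and every mask is assembled from
    the K most recently buffered rows.
    """
    H, W = len(image), len(image[0])
    rows = deque(maxlen=K)          # K row buffers (oldest row is dropped)
    masks = []
    for r in range(H):
        rows.append(image[r])       # the single read of this row's pixels
        if len(rows) == K:          # enough rows buffered to form masks
            for c in range(W - K + 1):
                masks.append([row[c:c + K] for row in rows])
    return masks
```

The hardware analogue replaces the deque with `K - 1` on-chip row memories; a mask is available every cycle once the buffer is primed, without re-reading any pixel.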
- Output Reuse (OR):
- The computed convolution layer data are stored in an output buffer until the final result is obtained.
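The output-reuse mechanism amounts to on-chip accumulation of partial sums; a minimal sketch (the list-based buffer stands in for the hardware output buffer):

```python
def output_reuse(partial_results):
    """Accumulate per-input-map partial sums for one output map on chip.

    Only the final accumulated map would be written back to external
    memory; intermediate partial sums never leave the output buffer.
    """
    buffer = None
    for psum in partial_results:        # one contribution per input feature map
        if buffer is None:
            buffer = list(psum)         # first partial sum initializes the buffer
        else:
            buffer = [b + p for b, p in zip(buffer, psum)]
    return buffer
```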
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Aguirre-Álvarez, P.A.; Diaz-Carmona, J.; Arredondo-Velázquez, M. Hardware Flexible Systolic Architecture for Convolution Accelerator in Convolutional Neural Networks. In Proceedings of the 2022 45th International Conference on Telecommunications and Signal Processing (TSP), Prague, Czech Republic, 13–15 July 2022; pp. 305–309.
- Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Lindsay, G.W. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. J. Cogn. Neurosci. 2020, 1–15.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Sze, V.; Chen, Y.H.; Yang, T.J.; Emer, J.S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 2017, 105, 2295–2329.
- Xie, L.; Ahmad, T.; Jin, L.; Liu, Y.; Zhang, S. A new CNN-based method for multi-directional car license plate detection. IEEE Trans. Intell. Transp. Syst. 2018, 19, 507–517.
- Pham, T.A. Effective deep neural networks for license plate detection and recognition. Vis. Comput. 2022, 1–15.
- Kim, J.; Kang, J.K.; Kim, Y. A Resource Efficient Integer-Arithmetic-Only FPGA-Based CNN Accelerator for Real-Time Facial Emotion Recognition. IEEE Access 2021, 9, 104367–104381.
- Vu, H.N.; Nguyen, M.H.; Pham, C. Masked face recognition with convolutional neural networks and local binary patterns. Appl. Intell. 2022, 52, 5497–5512.
- Aladem, M.; Rawashdeh, S.A. A single-stream segmentation and depth prediction CNN for autonomous driving. IEEE Intell. Syst. 2020, 36, 79–85.
- Arefnezhad, S.; Eichberger, A.; Frühwirth, M.; Kaufmann, C.; Moser, M.; Koglbauer, I.V. Driver monitoring of automated vehicles by classification of driver drowsiness using a deep convolutional neural network trained by scalograms of ECG signals. Energies 2022, 15, 480.
- Le, E.; Wang, Y.; Huang, Y.; Hickman, S.; Gilbert, F. Artificial intelligence in breast imaging. Clin. Radiol. 2019, 74, 357–366.
- Ankel, V.; Shribak, D.; Chen, W.Y.; Heifetz, A. Classification of computed thermal tomography images with deep learning convolutional neural network. J. Appl. Phys. 2022, 131, 244901.
- Jameil, A.K.; Al-Raweshidy, H. Efficient CNN Architecture on FPGA Using High Level Module for Healthcare Devices. IEEE Access 2022, 10, 60486–60495.
- Mohana, J.; Yakkala, B.; Vimalnath, S.; Benson Mansingh, P.; Yuvaraj, N.; Srihari, K.; Sasikala, G.; Mahalakshmi, V.; Yasir Abdullah, R.; Sundramurthy, V.P. Application of internet of things on the healthcare field using convolutional neural network processing. J. Healthc. Eng. 2022, 2022, 1892123.
- Venieris, S.I.; Bouganis, C.S. fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 326–342.
- Arredondo-Velazquez, M.; Diaz-Carmona, J.; Torres-Huitzil, C.; Padilla-Medina, A.; Prado-Olivarez, J. A streaming architecture for Convolutional Neural Networks based on layer operations chaining. J. Real-Time Image Process. 2020, 17, 1715–1733.
- Arredondo-Velazquez, M.; Diaz-Carmona, J.; Barranco-Gutierrez, A.I.; Torres-Huitzil, C. Review of prominent strategies for mapping CNNs onto embedded systems. IEEE Lat. Am. Trans. 2020, 18, 971–982.
- NVIDIA. Deep Learning Frameworks. 2019. Available online: https://developer.nvidia.com/deep-learning-frameworks (accessed on 16 July 2019).
- Erickson, B.J.; Korfiatis, P.; Akkus, Z.; Kline, T.; Philbrick, K. Toolkits and libraries for deep learning. J. Digit. Imaging 2017, 30, 400–405.
- Cong, J.; Xiao, B. Minimizing Computation in Convolutional Neural Networks. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2014; Wermter, S., Weber, C., Duch, W., Honkela, T., Koprinkova-Hristova, P., Magg, S., Palm, G., Villa, A.E.P., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 281–290.
- Cong, J.; Fang, Z.; Huang, M.; Wei, P.; Wu, D.; Yu, C.H. Customizable Computing—From Single Chip to Datacenters. Proc. IEEE 2018, 107, 185–203.
- Hailesellasie, M.T.; Hasan, S.R. MulNet: A Flexible CNN Processor With Higher Resource Utilization Efficiency for Constrained Devices. IEEE Access 2019, 7, 47509–47524.
- Liu, Z.; Dou, Y.; Jiang, J.; Xu, J.; Li, S.; Zhou, Y.; Xu, Y. Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2017, 10, 17.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; ACM: New York, NY, USA, 2015; pp. 161–170.
- Guo, K.; Sui, L.; Qiu, J.; Yu, J.; Wang, J.; Yao, S.; Han, S.; Wang, Y.; Yang, H. Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2017, 37, 35–47.
- Chen, Y.H.; Krishna, T.; Emer, J.S.; Sze, V. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 2017, 52, 127–138.
- Abdelouahab, K.; Pelcat, M.; Sérot, J.; Bourrasset, C.; Berry, F. Tactics to Directly Map CNN graphs on Embedded FPGAs. IEEE Embed. Syst. Lett. 2017, 9, 113–116.
- Dundar, A.; Jin, J.; Martini, B.; Culurciello, E. Embedded streaming deep neural networks accelerator with applications. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 1572–1583.
- Du, L.; Du, Y.; Li, Y.; Su, J.; Kuan, Y.C.; Liu, C.C.; Chang, M.C.F. A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 198–208.
- Tu, F.; Yin, S.; Ouyang, P.; Tang, S.; Liu, L.; Wei, S. Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 2220–2223.
- Ma, Y.; Cao, Y.; Vrudhula, S.; Seo, J.s. Optimizing the convolution operation to accelerate deep neural networks on FPGA. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2018, 26, 1354–1367.
- Li, J.; Un, K.F.; Yu, W.H.; Mak, P.I.; Martins, R.P. An FPGA-based energy-efficient reconfigurable convolutional neural network accelerator for object recognition applications. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 3143–3147.
- Chen, Y.X.; Ruan, S.J. A throughput-optimized channel-oriented processing element array for convolutional neural networks. IEEE Trans. Circuits Syst. II Express Briefs 2020, 68, 752–756.
- Gilan, A.A.; Emad, M.; Alizadeh, B. FPGA-based implementation of a real-time object recognition system using convolutional neural network. IEEE Trans. Circuits Syst. II Express Briefs 2019, 67, 755–759.
- Xu, R.; Ma, S.; Wang, Y.; Chen, X.; Guo, Y. Configurable multi-directional systolic array architecture for convolutional neural networks. ACM Trans. Archit. Code Optim. (TACO) 2021, 18, 1–24.
- Jafari, A.; Page, A.; Sagedy, C.; Smith, E.; Mohsenin, T. A low power seizure detection processor based on direct use of compressively-sensed data and employing a deterministic random matrix. In Proceedings of the Biomedical Circuits and Systems Conference (BioCAS), Atlanta, GA, USA, 22–24 October 2015; pp. 1–4.
- Sze, V.; Chen, Y.H.; Yang, T.J.; Emer, J.S. Efficient processing of deep neural networks. Synth. Lect. Comput. Archit. 2020, 15, 1–341.
- Xiyuan, P.; Jinxiang, Y.; Bowen, Y.; Liansheng, L.; Yu, P. A Review of FPGA-Based Custom Computing Architecture for Convolutional Neural Network Inference. Chin. J. Electron. 2021, 30, 1–17.
- Stankovic, L.; Mandic, D. Convolutional neural networks demystified: A matched filtering perspective based tutorial. arXiv 2021, arXiv:2108.11663.
- Lacey, G.; Taylor, G.W.; Areibi, S. Deep Learning on FPGAs: Past, Present, and Future. arXiv 2016, arXiv:1602.04283.
- Chakradhar, S.; Sankaradas, M.; Jakkula, V.; Cadambi, S. A dynamically configurable coprocessor for convolutional neural networks. In Proceedings of the ACM SIGARCH Computer Architecture News, New York, NY, USA, 3 June 2010; ACM: New York, NY, USA, 2010; Volume 38, pp. 247–257.
- Samajdar, A.; Zhu, Y.; Whatmough, P.; Mattina, M.; Krishna, T. SCALE-Sim: Systolic CNN accelerator simulator. arXiv 2018, arXiv:1811.02883.
- Fu, Y.; Wu, E.; Sirasao, A.; Attia, S.; Khan, K.; Wittig, R. Deep Learning with INT8 Optimization on Xilinx Devices. White Paper. 2016. Available online: https://docs.xilinx.com/v/u/en-US/wp486-deep-learning-int8 (accessed on 13 November 2022).
- Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag. 2012, 29, 141–142.
- Li, F.-F.; Andreto, M.; Ranzato, M.A.; Perona, P. Caltech 101. 2022. Available online: https://data.caltech.edu/records/mzrjq-6wc02 (accessed on 6 April 2022).
- Arredondo-Velázquez, M.; Diaz-Carmona, J.; Torres-Huitzil, C.; Barranco-Gutiérrez, A.I.; Padilla-Medina, A.; Prado-Olivarez, J. A streaming accelerator of convolutional neural networks for resource-limited applications. IEICE Electron. Express 2019, 16, 20190633.
- Shan, D.; Cong, G.; Lu, W. A CNN Accelerator on FPGA with a Flexible Structure. In Proceedings of the 2020 5th International Conference on Computational Intelligence and Applications (ICCIA), Beijing, China, 19–21 June 2020; pp. 211–216.
- Bouguezzi, S.; Fredj, H.B.; Belabed, T.; Valderrama, C.; Faiedh, H.; Souani, C. An efficient FPGA-based convolutional neural network for classification: Ad-MobileNet. Electronics 2021, 10, 2272.
- Parmar, Y.; Sridharan, K. A resource-efficient multiplierless systolic array architecture for convolutions in deep networks. IEEE Trans. Circuits Syst. II Express Briefs 2019, 67, 370–374.
- Bassi, P.R.; Attux, R. A deep convolutional neural network for COVID-19 detection using chest X-rays. Res. Biomed. Eng. 2022, 38, 139–148.
- Wang, D.; Hong, D.; Wu, Q. Attention Deficit Hyperactivity Disorder Classification Based on Deep Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022.
Layer configurations per model — columns 2–6: Alexnet; 7–11: VGG16; 12–16: Darknet19 (N: input feature maps, M: output feature maps, K: kernel size, S: stride, P: padding).

# Conf | N | M | K | S | P | N | M | K | S | P | N | M | K | S | P
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 3 | 96 | 11 | 4 | 0 | 3 | 64 | 3 | 1 | 1 | 3 | 32 | 3 | 1 | 1
2 | 96 | 256 | 5 | 1 | 2 | 64 | 64 | 3 | 1 | 1 | 32 | 64 | 3 | 1 | 1
3 | 256 | 384 | 3 | 1 | 1 | 64 | 128 | 3 | 1 | 1 | 64 | 128 | 3 | 1 | 1
4 | 384 | 256 | 3 | 1 | 1 | 128 | 128 | 3 | 1 | 1 | 128 | 64 | 3 | 1 | 0
5 | - | - | - | - | - | 128 | 256 | 3 | 1 | 1 | 128 | 256 | 3 | 1 | 1
6 | - | - | - | - | - | 256 | 256 | 3 | 1 | 1 | 256 | 128 | 1 | 1 | 0
7 | - | - | - | - | - | 256 | 512 | 3 | 1 | 1 | 256 | 512 | 3 | 1 | 1
8 | - | - | - | - | - | 512 | 512 | 3 | 1 | 1 | 512 | 256 | 1 | 1 | 0
9 | - | - | - | - | - | 512 | 512 | 3 | 1 | 1 | 512 | 1024 | 3 | 1 | 1
10 | - | - | - | - | - | - | - | - | - | - | 1024 | 512 | 1 | 1 | 0
11 | - | - | - | - | - | - | - | - | - | - | 1024 | 1000 | 1 | 1 | 0
CNN Model | Config | Processing Time (Calculated) | Processing Time (Hardware)
---|---|---|---
CNN1 | 1/Overall | 245.6 µs | 245.6 µs
CNN2 | 1 | 82.14 µs | 100.26 µs
CNN2 | 2 | 59.32 µs | 61.57 µs
CNN2 | Overall | 141.46 µs | 161.83 µs
CNN3 | 1 | 109.98 µs | 134.49 µs
CNN3 | 2 | 88.30 µs | 91.69 µs
CNN3 | 3 | 35.45 µs | 35.57 µs
CNN3 | Overall | 233.73 µs | 261.75 µs
Modified Alexnet | 1 | 1.73 ms | 1.73 ms
Modified Alexnet | 2 | 10.54 ms | 10.54 ms
Modified Alexnet | 3 | 8.72 ms | 8.73 ms
Modified Alexnet | 4 | 13.09 ms | 13.09 ms
Modified Alexnet | 5 | 8.72 ms | 8.72 ms
Modified Alexnet | Overall | 42.80 ms | 42.81 ms
Alexnet | 1 | 22.96 ms | 23.05 ms
Alexnet | 2 | 155.04 ms | 155.07 ms
Alexnet | 3 | 99.16 ms | 99.17 ms
Alexnet | 4 | 99.16 ms | 99.17 ms
Alexnet | 5 | 66.11 ms | 66.12 ms
Alexnet | Overall | 442.44 ms | 442.58 ms
VGG16 | 1 | 55.06 ms | 56.57 ms
VGG16 | 2 | 1174.45 ms | 1175.97 ms
VGG16 | 3 | 587.27 ms | 587.66 ms
VGG16 | 4 | 1174.55 ms | 1174.93 ms
VGG16 | 5 | 587.49 ms | 587.58 ms
VGG16 | 6 | 1174.98 ms | 1175.08 ms
VGG16 | 7 | 588.35 ms | 588.37 ms
VGG16 | 8 | 1176.70 ms | 1176.73 ms
VGG16 | 9 | 295.89 ms | 295.90 ms
VGG16 | Overall | 9.75 s | 9.76 s
Darknet19 | 1 | 27.53 ms | 29.05 ms
Darknet19 | 2 | 146.82 ms | 147.20 ms
Darknet19 | 3 | 146.88 ms | 146.97 ms
Darknet19 | 4 | 110.17 ms | 110.26 ms
Darknet19 | 5 | 147.09 ms | 147.11 ms
Darknet19 | 6 | 110.39 ms | 110.41 ms
Darknet19 | 7 | 147.95 ms | 147.95 ms
Darknet19 | 8 | 111.25 ms | 111.25 ms
Darknet19 | 9 | 151.39 ms | 151.39 ms
Darknet19 | 10 | 114.69 ms | 114.69 ms
Darknet19 | 11 | 225.79 ms | 225.79 ms
Darknet19 | Overall | 2.62 s | 2.89 s
Work | Method | CNN Model | Complexity | Throughput (GOp/s) | Pt/Image
---|---|---|---|---|---
This Work | PLB for flexible mask generation and flexible SA for standard convolution | One Layer Custom CNN | 668,000 | 2.71 | 245.6 µs
This Work | | Two Layers Custom CNN | 240,266 | 1.48 | 161.83 µs
This Work | | Three Layers Custom CNN | 250,250 | 0.95 | 261.75 µs
This Work | | Modified Alexnet | 65.81 MMACs | 1.53 | 42.81 ms
This Work | | Alexnet | 1.076 GMACs | 2.43 | 442.58 ms
This Work | | VGG16 | 15.34 GMACs | 1.57 | 9.76 s
This Work | | Darknet19 | 2.79 GMACs | 0.96 | 2.89 s
[22] 2020 | Layer-operation-chaining-based processing on a streaming architecture for single-image processing | Three layers custom CNN | 250,250 | 0.354 | 706.36 µs
[53] 2019 | Convolver reusability on a streaming architecture employing the layer-operation-chaining approach | Three layers custom CNN | 250,250 | 0.6271 | 399 µs
[39] 2021 | Kernel partition SA | Alexnet | 1.45 GOPs | 220 | 6.6050 ms
[39] 2021 | | VGG16 | 30.72 GOPs | 230.1 | 137.3626 ms
[40] 2020 | Channel-oriented data pattern SA | Alexnet | 1.076 GMACs | 10.5 | -
[40] 2020 | | VGG16 | 15.3466 GMACs | 12.5 | -
[41] 2019 | Parallel MAC modules and DMA input data control | Modified AlexNet | 1.44 GOPs | 198.1 | 12.2 ms
[54] 2020 | RAM-based LB for fixed 5 × 5 mask generation and parallel DSP block multipliers | Custom CNN for MNIST | 209,280 | 0.598 GMAC/s | 526 µs
[55] 2021 | Cascade of a parallel multichannel point-wise convolution stage and a multichannel depth-wise convolution stage | Ad-MobileNet | 43.54 MMACs | - | -
[56] 2020 | Controlled loading of inputs to each PE from separate memories and CORDIC radix-4 computing instead of multiplications | VGG-16 | 15.3446 GMACs | 177.3 | 5.76 ms
Work | Device | Working Frequency (MHz) | Slices | Registers | DSPs | Memory | Power Consumption (W) |
---|---|---|---|---|---|---|---|
This Work | Cyclone V SE5CSXFC6D | 100 | 11,597 | 24,105 | 112 | 698.11 kb | 0.289 |
[39] 2021 | Virtex 7 VC709 | 200 | 121,472 | 159,872 | 664 | 16.812 Mb | 9.61 |
[40] 2020 | Zynq XC7Z045 | 100 | - | - | 64 | - | - |
[41] 2019 | Zynq XC7Z045 | 200 | 71,741 | 79,347 | 576 | 1.476 MB | - |
[54] 2020 | Artix-7 XC7A200T-SBG484 | 50 | 88,756 | 42,038 | 571 | LUTRAM = 596, BRAM = 218 | 1.225 |
[55] 2021 | Virtex-7 xc7vx980 | 225 | 57,438 | 79,327 | 937 | 5.11 | 3.25 |
[56] 2020 | Virtex 5 XC5VLX50T | 496 | 1465 | 519 | 0 | 45 kB on-chip and 134 MB off-chip | 54 |
Share and Cite
Arredondo-Velázquez, M.; Aguirre-Álvarez, P.A.; Padilla-Medina, A.; Espinosa-Calderon, A.; Prado-Olivarez, J.; Diaz-Carmona, J. Flexible Convolver for Convolutional Neural Networks Deployment onto Hardware-Oriented Applications. Appl. Sci. 2023, 13, 93. https://doi.org/10.3390/app13010093