Survey of Deep Learning Accelerators for Edge and Emerging Computing
Abstract
1. Introduction
- It provides a comprehensive and easy-to-follow description of state-of-the-art edge devices and their underlying architectures.
- It reviews the programming frameworks supported by these processors and the general model compression techniques that enable edge computing.
- It analyzes the technical details of edge-computing processors and provides charts of their hardware parameters.
2. Deep Learning Algorithms in Edge Application
2.1. Classification
2.2. Detection
2.3. Speech Recognition and Natural Language Processing
3. Model Compression
3.1. Quantization
3.2. Pruning
3.3. Knowledge Distillation
4. Framework for Deep Learning Networks
5. Framework for Spiking Neural Networks
6. Edge Processors
6.1. Dataflow Edge Processor
6.2. Neuromorphic Edge AI Processor
6.3. PIM Processor
6.4. Processors in Industrial Research
7. Performance Analysis of Edge Processors
7.1. Overall Analysis of AI Edge Processors
- Performance: tera-operations per second (TOPS);
- Energy efficiency: TOPS/W;
- Power: Watt (W);
- Area: square millimeter (mm2).
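The short sketch below shows how these metrics relate to one another for a notional accelerator; the MAC count, clock, power, and area figures are hypothetical placeholders, not values measured on any processor discussed in this survey.

```python
# Illustrative only: hypothetical accelerator figures, not measurements of any
# processor surveyed in this article.

def peak_tops(num_macs: int, clock_ghz: float) -> float:
    """Peak throughput in TOPS; each MAC counts as 2 ops (multiply + add)."""
    return num_macs * 2 * clock_ghz / 1e3  # Gop/s -> TOPS

def tops_per_watt(tops: float, power_w: float) -> float:
    """Energy efficiency in TOPS/W."""
    return tops / power_w

def tops_per_mm2(tops: float, area_mm2: float) -> float:
    """Area efficiency in TOPS per square millimeter."""
    return tops / area_mm2

if __name__ == "__main__":
    # Hypothetical edge NPU: 4096 MACs at 1.0 GHz, 2 W power, 10 mm2 die area.
    tops = peak_tops(num_macs=4096, clock_ghz=1.0)  # ~8.2 TOPS
    print(f"Performance:       {tops:.2f} TOPS")
    print(f"Energy efficiency: {tops_per_watt(tops, 2.0):.2f} TOPS/W")
    print(f"Area efficiency:   {tops_per_mm2(tops, 10.0):.2f} TOPS/mm2")
```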
7.2. AI Edge Processors with PIM Architecture
7.3. Edge Processors in Industrial Research
7.4. Processor Selection, Price and Applications
8. Summary
Author Contributions
Funding
Conflicts of Interest
References
- Merenda, M.; Porcaro, C.; Iero, D. Edge Machine Learning for AI-Enabled IoT Devices: A Review. Sensors 2020, 20, 2533. [Google Scholar] [CrossRef] [PubMed]
- Vestias, M.P.; Duarte, R.P.; de Sousa, J.T.; Neto, H.C. Moving Deep Learning to the Edge. Algorithms 2020, 13, 125. [Google Scholar] [CrossRef]
- IBM. Why Organizations Are Betting on Edge Computing? May 2020. Available online: https://www.ibm.com/thought-leadership/institute-business-value/report/edge-computing (accessed on 1 June 2023).
- Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
- Statista. IoT: Number of Connected Devices Worldwide 2015–2025. November 2016. Available online: https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ (accessed on 5 June 2023).
- Chabas, J.M.; Gnanasambandam, C.; Gupte, S.; Mahdavian, M. New Demand, New Markets: What Edge Computing Means for Hardware Companies; McKinsey & Company: New York, NY, USA, 2018; Available online: https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/new-demand-new-markets-what-edge-computing-means-for-hardware-companies (accessed on 22 July 2023).
- Google. Cloud TPU. Available online: https://cloud.google.com/tpu (accessed on 5 May 2023).
- Accenture Lab. Driving Intelligence at the Edge with Neuromorphic Computing. 2021. Available online: https://www.accenture.com/_acnmedia/PDF-145/Accenture-Neuromorphic-Computing-POV.pdf (accessed on 3 June 2023).
- Intel Labs. Technology Brief. Taking Neuromorphic Computing to the Next Level with Loihi 2. 2021. Available online: https://www.intel.com/content/www/us/en/research/neuromorphic-computing-loihi-2-technology-brief.html (accessed on 10 May 2023).
- Akopyan, F. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557. [Google Scholar] [CrossRef]
- Videantis. July 2020. Available online: https://www.videantis.com/videantis-processor-adopted-for-tempo-ai-chip.html (accessed on 11 June 2022).
- Konikore. A Living Breathing Machine. 2021. Available online: https://good-design.org/projects/konikore/ (accessed on 10 July 2022).
- Kalray. May 2023. Available online: https://www.kalrayinc.com/press-release/projet-ip-cube/ (accessed on 7 July 2023).
- Brainchip. 2023. Available online: https://brainchipinc.com/akida-neuromorphic-system-on-chip/ (accessed on 21 July 2023).
- SynSense. May 2023. Available online: https://www.synsense-neuromorphic.com/technology (accessed on 1 June 2023).
- Samsung. HBM-PIM. March 2023. Available online: https://www.samsung.com/semiconductor/solutions/technology/hbm-processing-in-memory/ (accessed on 25 July 2023).
- Upmem. Upmem-PIM. October 2019. Available online: https://www.upmem.com/nextplatform-com-2019-10-03-accelerating-compute-by-cramming-it-into-dram/ (accessed on 7 May 2023).
- Mythic. 2021. Available online: https://www.mythic-ai.com/product/m1076-analog-matrix-processor/ (accessed on 5 February 2022).
- Gyrfalcon. Available online: https://www.gyrfalcontech.ai/solutions/2803s/ (accessed on 3 March 2023).
- Syntiant. January 2021. Available online: https://www.syntiant.com/post/the-growing-syntiant-core-family (accessed on 7 February 2023).
- Leapmind. Efficiera. July 2023. Available online: https://leapmind.io/en/news/detail/230801/ (accessed on 6 July 2023).
- Tarwani, K.M.; Swathi, E. Survey on Recurrent Neural Network in Natural Language Processing. Int. J. Eng. Trends Technol. 2017, 48, 301–304. [Google Scholar] [CrossRef]
- Goldberg, Y. A Primer on Neural Network Models for Natural Language Processing. J. Artif. Intell. Res. 2015, 57, 345–420. [Google Scholar] [CrossRef]
- Yao, L.; Guan, Y. An improved LSTM structure for natural language processing. In Proceedings of the 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 10–12 December 2018; pp. 565–569. [Google Scholar] [CrossRef]
- Wang, S.; Jiang, J. Learning natural language inference with LSTM. arXiv 2015, arXiv:1512.08849. [Google Scholar]
- Azari, E.; Vrudhula, S. An Energy-Efficient Reconfigurable LSTM Accelerator for Natural Language Processing. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 4450–4459. [Google Scholar] [CrossRef]
- Li, W.; Xu, Y.; Wang, G. Stance Detection of Microblog Text Based on Two-Channel CNN-GRU Fusion Network. IEEE Access 2019, 7, 145944–145952. [Google Scholar] [CrossRef]
- Zulqarnain, M.; Rozaida, G.; Muhammad, G.G.; Muhammad, F.M. Efficient processing of GRU based on word embedding for text classification. JOIV Int. J. Informatics Vis. 2019, 3, 377–383. [Google Scholar] [CrossRef]
- Liu, Q.; Liu, Q.; Xiao, L.; Yang, J.; Chan, J.C.-W. Content-Guided Convolutional Neural Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 6124–6137. [Google Scholar] [CrossRef]
- Kumar, A.; Sharma, A.; Bharti, V.; Singh, A.K.; Singh, S.K.; Saxena, S. MobiHisNet: A Lightweight CNN in Mobile Edge Computing for Histopathological Image Classification. IEEE Internet Things J. 2021, 8, 17778–17789. [Google Scholar] [CrossRef]
- Wang, M. Multi-path convolutional neural networks for complex image classification. arXiv 2015, arXiv:1506.04701. [Google Scholar]
- Charlton, H. MacRumors. Apple Reportedly Planning to Switch Technology behind A17 Bionic Chip to Cut Cost Next Year. June 2023. Available online: https://www.macrumors.com/2023/06/23/apple-to-switch-tech-behind-a17-to-cut-costs/ (accessed on 5 July 2023).
- Wang, L. Taipei Times. TSMC Says New Chips to Be World’s Most Advanced. May 2023. Available online: https://www.taipeitimes.com/News/biz/archives/2023/05/12/2003799625 (accessed on 25 June 2023).
- Samsung. Exynos. April 2022. Available online: https://www.samsung.com/semiconductor/minisite/exynos/products/all-processors/ (accessed on 6 February 2023).
- Lin, Z.Q.; Chung, A.G.; Wong, A. Edgespeechnets: Highly efficient deep neural networks for speech recognition on the edge. arXiv 2018, arXiv:1810.08559. [Google Scholar]
- Shen, T.; Gao, C.; Xu, D. Analysis of intelligent real-time image recognition technology based on mobile edge computing and deep learning. J. Real-Time Image Process. 2021, 18, 1157–1166. [Google Scholar] [CrossRef]
- Subramaniam, P.; Kaur, M.J. Review of security in mobile edge computing with deep learning. In Proceedings of the 2019 Advances in Science and Engineering Technology International Conferences (ASET), Dubai, United Arab Emirates, 26 March–10 April 2019; pp. 1–5. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2017, 25, 1097–1105. [Google Scholar] [CrossRef]
- Schneible, J.; Lu, A. Anomaly detection on the edge. In Proceedings of the MILCOM 2017–2017 IEEE Military Communications Conference (MILCOM), Baltimore, MD, USA, 23–25 October 2017; pp. 678–682. [Google Scholar] [CrossRef]
- Sirojan, T.; Lu, S.; Phung, B.T.; Zhang, D.; Ambikairajah, E. Sustainable Deep Learning at Grid Edge for Real-Time High Impedance Fault Detection. IEEE Trans. Sustain. Comput. 2018, 7, 346–357. [Google Scholar] [CrossRef]
- Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep Learning for Edge Computing Applications: A State-of-the-Art Survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
- Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292. [Google Scholar] [CrossRef]
- Sengupta, A.; Ye, Y.; Wang, R.; Liu, C.; Roy, K. Going Deeper in Spiking Neural Networks: VGG and Residual Architectures. Front. Neurosci. 2019, 13, 95. [Google Scholar] [CrossRef]
- Wen, L.; Li, X.; Gao, L. A transfer convolutional neural network for fault diagnosis based on ResNet-50. Neural Comput. Appl. 2020, 32, 6111–6124. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
- DeepVision (Kinara). March 2022. Available online: https://kinara.ai/about-us/ (accessed on 8 January 2023).
- Kneron. Available online: https://www.kneron.com/page/soc/ (accessed on 13 January 2023).
- Wang, Q.; Yu, N.; Zhang, M.; Han, Z.; Fu, G. N3LDG: A Lightweight Neural Network Library for Natural Language Processing. Beijing Da Xue Xue Bao 2019, 55, 113–119. [Google Scholar] [CrossRef]
- Desai, S.; Goh, G.; Babu, A.; Aly, A. Lightweight convolutional representations for on-device natural language processing. arXiv 2020, arXiv:2002.01535. [Google Scholar]
- Zhang, M.; Yang, J.; Teng, Z.; Zhang, Y. Libn3l: A lightweight package for neural nlp. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portoroz, Slovenia, 23–28 May 2016; pp. 225–229. Available online: https://aclanthology.org/L16-1034 (accessed on 6 July 2023).
- Tay, Y.; Zhang, A.; Tuan, L.A.; Rao, J.; Zhang, S.; Wang, S.; Fu, J.; Hui, S.C. Lightweight and efficient neural natural language processing with quaternion networks. arXiv 2019, arXiv:1906.04393. [Google Scholar]
- Gyrfalcon. Lightspeeur 5801S Neural Accelerator. 2022. Available online: https://www.gyrfalcontech.ai/solutions/lightspeeur-5801/ (accessed on 10 December 2022).
- Liu, D.; Kong, H.; Luo, X.; Liu, W.; Subramaniam, R. Bringing AI to edge: From deep learning’s perspective. Neurocomputing 2022, 485, 297–320. [Google Scholar] [CrossRef]
- Li, H. Application of IOT deep learning in edge computing: A review. Acad. J. Comput. Inf. Sci. 2021, 4, 98–103. [Google Scholar]
- Zaidi, S.S.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514. [Google Scholar] [CrossRef]
- Chen, J.; Ran, X. Deep Learning with Edge Computing: A Review. Proc. IEEE 2019, 107, 1655–1674. [Google Scholar] [CrossRef]
- Rawat, W.; Wang, Z. Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Comput. 2017, 29, 2352–2449. [Google Scholar] [CrossRef] [PubMed]
- Al-Saffar, A.M.; Tao, H.; Talab, M.A. Review of deep convolution neural network in image classification. In Proceedings of the 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET), Jakarta, Indonesia, 23–24 October 2017; pp. 26–31. [Google Scholar] [CrossRef]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Elhassouny, A.; Smarandache, F. Trends in deep convolutional neural Networks architectures: A review. In Proceedings of the 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), Agadir, Morocco, 22–24 July 2019; pp. 1–8. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. arXiv 2018, arXiv:1801.04381. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Tan, M.; Le, Q.V. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Niv, V. Hailo blog. Object Detection at the Edge: Making the Right Choice. AI on the Edge: The Hailo Blog. October 2022. Available online: https://hailo.ai/blog/object-detection-at-the-edge-making-the-right-choice/ (accessed on 4 January 2023).
- Zhao, Z.-Q.; Zheng, P.; Xu, S.T.; Wu, X. Object Detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef] [PubMed]
- Hung, J.-M.; Huang, Y.H.; Huang, S.P.; Chang, F.C.; Wen, T.H.; Su, C.I.; Khwa, W.S.; Lo, C.C.; Liu, R.S.; Hsieh, C.C.; et al. An 8-Mb DC-Current-Free Binary-to-8b Precision ReRAM Nonvolatile Computing-in-Memory Macro using Time-Space-Readout with 1286.4-21.6TOPS/W for Edge-AI Devices. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Oruh, J.; Viriri, S.; Adegun, A. Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition. IEEE Access 2022, 10, 30069–30079. [Google Scholar] [CrossRef]
- Liu, B.; Zhang, W.; Xu, X.; Chen, D. Time Delay Recurrent Neural Network for Speech Recognition. J. Phys. Conf. Ser. 2019, 1229, 012078. [Google Scholar] [CrossRef]
- Zhao, Y.; Li, J.; Wang, X.; Li, Y. The Speechtransformer for Large-scale Mandarin Chinese Speech Recognition. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 7095–7099. [Google Scholar] [CrossRef]
- Omar, M.; Choi, S.; Nyang, D.; Mohaisen, D. Natural Language Processing: Recent Advances, Challenges, and Future Directions. arXiv 2022, arXiv:2201.00768. [Google Scholar] [CrossRef]
- Yuan, Z.; Yang, Y.; Yue, J.; Liu, R.; Feng, X.; Lin, Z.; Wu, X.; Li, X.; Yang, H.; Liu, Y. 14.2 A 65 nm 24.7 µJ/Frame 12.3 mW Activation-Similarity-Aware Convolutional Neural Network Video Processor Using Hybrid Precision, Inter-Frame Data Reuse and Mixed-Bit-Width Difference-Frame Data Codec. In Proceedings of the 2020 IEEE International Solid- State Circuits Conference—(ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 232–234. [Google Scholar] [CrossRef]
- Tate, G. Advantages of BFloat16 for AI Inference. October 2019. Available online: https://semiengineering.com/advantages-of-bfloat16-for-ai-inference/ (accessed on 7 January 2023).
- OpenAI. GPT-4: Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multi-task learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Fedus, W. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. arXiv 2021, arXiv:2101.03961. [Google Scholar]
- Cao, Q.; Trivedi, H.; Balasubramanian, A.; Balasubramanian, N. DeFormer: Decomposing pre-trained transformers for faster question answering. arXiv 2020, arXiv:2005.00697. [Google Scholar]
- Sun, Z.; Yu, H.; Song, X.; Liu, R.; Yang, Y.; Zhou, D. Mobilebert: A compact task-agnostic bert for resource-limited devices. arXiv 2020, arXiv:2004.02984. [Google Scholar]
- Garret. The Syntiant Journey and the Pervasive NDP. Blog Post, August 2021. Available online: https://www.edge-ai-vision.com/2021/08/the-syntiant-journey-and-the-pervasive-ndp/#:~:text=In%20the%20summer%20of%202019,will%20capitalize%20on%20the%20momentum (accessed on 5 May 2022).
- NXP. iMX Application Processors. Available online: https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-9-processors:IMX9-PROCESSORS (accessed on 10 July 2023).
- NXP. i.MX 8M Plus-Arm Cortex-A53, Machine Learning Vision, Multimedia and Industrial IoT. Available online: https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS (accessed on 17 June 2023).
- NXP Datasheet. i.MX 8M Plus SoM Datasheet. Available online: https://www.solid-run.com/wp-content/uploads/2021/06/i.MX8M-Plus-Datasheet-2021-.pdf (accessed on 10 February 2023).
- Deleo, Cision, PRNewswire. Mythic Expands Product Lineup with New Scalable, Power-Efficient Analog Matrix Processor for Edge AI Applications. Mythic M1076. Available online: https://www.prnewswire.com/news-releases/mythic-expands-product-lineup-with-new-scalable-power-efficient-analog-matrix-processor-for-edge-ai-applications-301306344.html (accessed on 10 May 2023).
- Ward-Foxton, S. EETimes. Mythic Launches Second AI Chip. Available online: https://www.eetasia.com/mythic-launches-second-ai-chip/ (accessed on 20 April 2022).
- Fick, L.; Skrzyniarz, S.; Parikh, M.; Henry, M.B.; Fick, D. Analog Matrix Processor for Edge AI Real-Time Video Analytics. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 260–262. [Google Scholar]
- Gyrfalcon. PIM AI Accelerators. Available online: https://www.gyrfalcontech.ai/ (accessed on 1 August 2023).
- Modha, D.S.; Akopyan, F.; Andreopoulos, A.; Appuswamy, R.; Arthur, J.V.; Cassidy, A.S.; Datta, P.; DeBole, M.V.; Esser, S.K.; Otero, C.O.; et al. IBM NorthPole neural inference machine. In Proceedings of the HotChips Conference, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
- Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2020, arXiv:1710.09282. [Google Scholar]
- Deng, L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE 2020, 108, 485–532. [Google Scholar] [CrossRef]
- Nan, K.; Liu, S.; Du, J.; Liu, H. Deep model compression for mobile platforms: A survey. Tsinghua Sci. Technol. 2019, 24, 677–693. [Google Scholar] [CrossRef]
- Berthelier, A.; Chateau, T.; Duffner, S.; Garcia, C.; Blanc, C. Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey. J. Signal Process. Syst. 2021, 93, 863–878. [Google Scholar] [CrossRef]
- Lei, J.; Gao, X.; Song, J.; Wang, X.L.; Song, M.L. Survey of Deep Network Model Compression. J. Softw. 2018, 29, 251–266. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Qin, Q.; Ren, J.; Yu, J.; Wang, H.; Gao, L.; Zheng, J.; Feng, Y.; Fang, J.; Wang, Z. To compress, or not to compress: Characterizing deep learning model compression for embedded inference. In Proceedings of the 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, Australia, 11–13 December 2018; pp. 729–736. [Google Scholar] [CrossRef]
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2704–2713. [Google Scholar] [CrossRef]
- Yuan, C.; Agaian, S.S. A comprehensive review of Binary Neural Network. arXiv 2023, arXiv:2110.06804. [Google Scholar]
- Analog Devices Inc. MAX78000. Available online: https://www.analog.com/en/products/max78000.html (accessed on 9 July 2024).
- Mouser Electronics. Maxim Integrated’s New Neural-Network-Accelerator MAX78000 SoC Now Available at Mouser. Available online: https://www.mouser.com/publicrelations_maxim_max78000_2020final/ (accessed on 9 July 2024).
- Apple. Press Release. Apple Unleashes M1. 10 November 2020. Available online: https://www.apple.com/newsroom/2020/11/apple-unleashes-m1/ (accessed on 5 December 2021).
- Nanoreview.net. A14 Bionic vs. A15 Bionic. Available online: https://nanoreview.net/en/soc-compare/apple-a15-bionic-vs-apple-a14-bionic (accessed on 16 June 2023).
- Cross, J. Macworld. Apple’s A16 Chip Doesn’t Live up to Its ‘Pro’ Price or Expectations. Available online: https://www.macworld.com/article/1073243/a16-processor-cpu-gpu-lpddr5-memory-performance.html (accessed on 1 January 2023).
- Merritt, R. Startup Accelerates AI at the Sensor. EETimes, 11 February 2019. Available online: https://www.eetimes.com/startup-accelerates-ai-at-the-sensor/ (accessed on 10 June 2023).
- Clarke, P. Indo-US Startup Preps Agent-based AI Processor. EENews. 26 August 2018. Available online: https://www.eenewsanalog.com/en/indo-us-startup-preps-agent-based-ai-processor-2/ (accessed on 20 June 2023).
- Ghilardi, M. SynSense Secures Additional Capital from Strategic Investors. News SynSense. 18 April 2023. Available online: https://www.venturelab.swiss/SynSense-secures-additional-capital-from-strategic-investors (accessed on 5 May 2023).
- ARM. Ethos-N78 NPU. Highly Scalable and Efficient Second Generation ML Inference Processor. Available online: https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-n78 (accessed on 15 May 2022).
- Frumusanu, A. Arm Announces Ethos-N78: Bigger and More Efficient. Anandtech. 27 May 2020. Available online: https://www.anandtech.com/show/15817/arm-announces-ethosn78-npu-bigger-and-more-efficient (accessed on 25 April 2022).
- AIMotive. Industry High 98% Efficiency Demonstrated by AIMotive and Nextchip. 15 April 2021. Available online: https://aimotive.com/-/industry-high-98-efficiency-demonstrated-by-aimotive-and-nextchip (accessed on 25 March 2022).
- AIMotive. NN Acceleration for Automotive AI. Available online: https://aimotive.com/aiware-apache5 (accessed on 25 May 2022).
- Blaize. 2022 Best Edge AI Processor Blaize Pathfinder P1600 Embedded System on Module. Available online: https://www.blaize.com/products/ai-edge-computing-platforms/ (accessed on 5 December 2022).
- Wheeler, B. Bitmain SoC Brings AI to the Edge. Available online: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=5975&year=2019&tag=3 (accessed on 23 July 2023).
- Liang, W. Get Started, Neural Network Stick. Github. 10 May 2019. Available online: https://github.com/BM1880-BIRD/bm1880-system-sdk/wiki/GET-STARTED (accessed on 16 May 2023).
- Brainchip. Introducing the ADK1000 IP and NSOM for Edge AI IoT. May 2020. Available online: https://www.youtube.com/watch?v=EUGx45BCKlE (accessed on 20 November 2022).
- Clarke, P. eeNews. Akida Spiking Neural Processor Could Head to FDSOI. 2 August 2021. Available online: https://www.eenewsanalog.com/news/akida-spiking-neural-processor-could-head-fdsoi (accessed on 25 November 2022).
- Gwennap, L. Kendryte Embeds AI for Surveillance. Available online: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=5992 (accessed on 14 July 2023).
- Canaan. Kendryte K210. Available online: https://canaan.io/product/kendryteai (accessed on 15 May 2023).
- CEVA. Edge AI & Deep Learning. Available online: https://www.ceva-dsp.com/app/deep-learning/ (accessed on 10 July 2023).
- Demler, M. CEVA NeuPro Accelerates Neural Nets. Microprocessor Report, January 2018. Available online: https://www.ceva-dsp.com/wp-content/uploads/2018/02/Ceva-NeuPro-Accelerates-Neural-Nets.pdf (accessed on 10 July 2023).
- Cadence. Tensilica AI Platform. Available online: https://www.cadence.com/en_US/home/tools/ip/tensilica-ip/tensilica-ai-platform.html (accessed on 12 December 2022).
- Cadence Newsroom. Cadence Accelerates Intelligent SoC Development with Comprehensive On-Device Tensilica AI Platform. 13 September 2021. Available online: https://www.cadence.com/en_US/home/company/newsroom/press-releases/pr/2021/cadence-accelerates-intelligent-soc-development-with-comprehensi.html (accessed on 25 August 2022).
- Maxfield, M. Say Hello to Deep Vision’s Polymorphic Dataflow Architecture. EE Journal 24 December 2020. Available online: https://www.eejournal.com/article/say-hello-to-deep-visions-polymorphic-dataflow-architecture/ (accessed on 5 December 2022).
- Ward-Foxton, S. AI Startup Deep Vision Raises Funds, Preps Next Chip. EETimes. 15 September 2021. Available online: https://www.eetasia.com/ai-startup-deep-vision-raises-funds-preps-next-chip/ (accessed on 5 December 2022).
- Eta Compute. Micropower AI Vision Platform. Available online: https://etacompute.com/tensai-flow/ (accessed on 15 May 2023).
- Flex Logix. Flex Logix Announces InferX High Performance IP for DSP and AI Inference. 24 April 2023. Available online: https://flex-logix.com/inferx-ai/inferx-ai-hardware/ (accessed on 12 June 2023).
- Edge TPU. Coral Technology. Available online: https://coral.ai/technology/ (accessed on 20 May 2022).
- Coral. USB Accelerator. Available online: https://coral.ai/products/accelerator/ (accessed on 13 June 2022).
- SolidRun. Janux GS31 AI Server. Available online: https://www.solid-run.com/embedded-networking/nxp-lx2160a-family/ai-inference-server/ (accessed on 25 May 2022).
- GreenWaves. GAP9 Processor for Hearables and Sensors. Available online: https://greenwaves-technologies.com/gap9_processor/ (accessed on 18 June 2023).
- Deleo. GreenWaves. GAP9. GreenWaves Unveils Groundbreaking Ultra-Low Power GAP9 IoT Application Processor for the Next Wave of Intelligence at the Very Edge. Available online: https://greenwaves-technologies.com/gap9_iot_application_processor/ (accessed on 8 August 2023).
- France, G. Design & Reuse, GreenWaves, GAP9. Available online: https://www.design-reuse.com/news/47305/greenwaves-iot-processor.html (accessed on 7 July 2024).
- Horizon, A.I. Efficient AI Computing for Automotive Intelligence. Available online: https://en.horizon.ai/ (accessed on 6 December 2022).
- Horizon Robotics. Horizon Robotics and BYD Announce Cooperation on BYD’s BEV Perception Solution Powered by Journey 5 Computing Solution at Shanghai Auto Show 2023. Cision PR Newswire. 19 April 2023. Available online: https://www.prnewswire.com/news-releases/horizon-robotics-and-byd-announce-cooperation-on-byds-bev-perception-solution-powered-by-journey-5-computing-solution-at-shanghai-auto-show-2023-301802072.html (accessed on 20 June 2023).
- Zheng. Horizon Robotics’ AI Chip with up to 128 TOPS Computing Power Gets Key Certification. Cnevpost. 6 July 2021. Available online: https://cnevpost.com/2021/07/06/horizon-robotics-ai-chip-with-up-to-128-tops-computing-power-gets-key-certification/ (accessed on 16 June 2022).
- Hailo. The World’s Top Performance AI Processor for Edge Devices. Available online: https://hailo.ai/ (accessed on 20 May 2023).
- Brown. Hailo-8 NPU Ships on Linux-Powered Lanner Edge System. 1 June 2021. Available online: https://linuxgizmos.com/hailo-8-npu-ships-on-linux-powered-lanner-edge-systems/ (accessed on 10 July 2022).
- Rajendran, B.; Sebastian, A.; Schmuker, M.; Srinivasa, N.; Eleftheriou, E. Low-Power Neuromorphic Hardware for Signal Processing Applications: A Review of Architectural and System-Level Design Approaches. IEEE Signal Process. Mag. 2019, 36, 97–110. [Google Scholar] [CrossRef]
- Carmelito. Intel Neural Compute Stick 2-Review. Element14. 8 March 2021. Available online: https://community.element14.com/products/roadtest/rv/roadtest_reviews/954/intel_neural_compute_3 (accessed on 24 March 2023).
- Modha, D.S.; Akopyan, F.; Andreopoulos, A.; Appuswamy, R.; Arthur, J.V.; Cassidy, A.S.; Datta, P.; DeBole, M.V.; Esser, S.K.; Otero, C.O.; et al. Neural inference at the frontier of energy, space, and time. Science 2023, 382, 329–335. [Google Scholar] [CrossRef]
- Imagination. PowerVR Series3NX, Advanced Compute and Neural Network Processors Enabling the Smart Edge. Available online: https://www.imaginationtech.com/vision-ai/powervr-series3nx/ (accessed on 10 June 2022).
- Har-Even, B. Separating the Wheat from the Chaff in Embedded AI with PowerVR Series3NX. 24 January 2019. Available online: https://www.imaginationtech.com/blog/separating-the-wheat-from-the-chaff-in-embedded-ai/ (accessed on 25 July 2022).
- Ueyoshi, K.; Papistas, I.A.; Houshmand, P.; Sarda, G.M.; Jain, V.; Shi, M.; Zheng, Q.; Giraldo, S.; Vrancx, P.; Doevenspeck, J.; et al. DIANA: An End-to-End Energy-Efficient Digital and ANAlog Hybrid Neural Network SoC. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Flaherty, N. Axelera Shows DIANA Analog In-Memory Computing Chip. EENews. 21 February 2022. Available online: https://www.eenewseurope.com/en/axelera-shows-diana-analog-in-memory-computing-chip/ (accessed on 22 July 2023).
- Imagination. The Ideal Single Core Solution for Neural Network Acceleration. Available online: https://www.imaginationtech.com/product/img-4nx-mc1/ (accessed on 16 June 2022).
- Memryx. Available online: https://memryx.com/products/ (accessed on 1 August 2023).
- Mobileye. One Automotive-Grade SoC, Many Mobility Solutions. Available online: https://www.mobileye.com/our-technology/evolution-eyeq-chip/ (accessed on 4 August 2023).
- EyeQ5. Wikichip. March 2021. Available online: https://en.wikichip.org/wiki/mobileye/eyeq/eyeq5 (accessed on 22 June 2023).
- Casil, D. Mobileye Presents EyeQ Ultra, the Chip That Promises True Level 4 Autonomous Driving in 2025. 1 July 2022. Available online: https://www.gearrice.com/update/mobileye-presents-eyeq-ultra-the-chip-that-promises-true-level-4-autonomous-driving-in-2025/ (accessed on 5 June 2023).
- Mobileye. Meet EyeQ6: Our Most Advanced Driver-Assistance Chip Yet. 25 May 2022. Available online: https://www.mobileye.com/blog/eyeq6-system-on-chip/ (accessed on 27 May 2023).
- Mediatek. i350. Mediatek Introduces i350 Edge AI Platform Designed for Voice and Vision Processing Applications. 14 October 2020. Available online: https://corp.mediatek.com/news-events/press-releases/mediatek-introduces-i350-edge-ai-platform-designed-for-voice-and-vision-processing-applications (accessed on 16 May 2023).
- Nvidia. Jetson Nano. Available online: https://elinux.org/Jetson_Nano#:~:text=Useful%20for%20deploying%20computer%20vision,5%2D10W%20of%20power%20consumption (accessed on 26 May 2023).
- Nvidia, Jetson Orin. The Future of Industrial-Grade Edge AI. Available online: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/ (accessed on 25 July 2023).
- Perceive. Put High-Power Intelligence in a Low-Power Device. Available online: https://perceive.io/product/ergo/ (accessed on 16 May 2023).
- Tan, Z.; Wu, Y.; Zhang, Y.; Shi, H.; Zhang, W.; Ma, K. A scalable multi-chiplet deep learning accelerator with hub-side 2.5D heterogeneous integration. In Proceedings of the HotChips Conference 2023, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
- Deleon, L. Build Enhanced Video Conference Experiences. Qualcomm. 7 March 2023. Available online: https://developer.qualcomm.com/blog/build-enhanced-video-conference-experiences (accessed on 5 May 2023).
- Qualcomm, QCS8250. Premium Processor Designed to Help You Deliver Maximum Performance for Compute Intensive Camera, Video Conferencing and Edge AI Applications with Support for Wi-Fi 6 and 5G for the Internet of Things (IoT). Available online: https://www.qualcomm.com/products/qcs8250 (accessed on 15 July 2023).
- Snapdragon. 888+ 5G Mobile Platform. Available online: https://www.qualcomm.com/products/snapdragon-888-plus-5g-mobile-platform (accessed on 24 May 2023).
- Qualcomm. Qualcomm Snapdragon 888 Plus, Benchmark, Test and Spec. CPU Monkey. 16 June 2023. Available online: https://www.cpu-monkey.com/en/cpu-qualcomm_snapdragon_888_plus (accessed on 15 July 2023).
- Hsu. Training ML Models at the Edge with Federated Learning. Qualcomm 7 June 2021. Available online: https://developer.qualcomm.com/blog/training-ml-models-edge-federated-learning (accessed on 7 July 2023).
- Mahurin, E. Qualcomm Hexagon NPU. In Proceedings of the HotChips Conference 2023, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
- Yida. Introducing the Rock Pi N10 RK3399Pro SBC for AI and Deep Learning. Available online: https://www.seeedstudio.com/blog/2019/12/04/introducing-the-rock-pi-n10-rk3399pro-sbc-for-ai-and-deep-learning/ (accessed on 17 May 2023).
- GadgetVersus. Amlogic A311D Processor Benchmarks and Specs. Available online: https://gadgetversus.com/processor/amlogic-a311d-specs/ (accessed on 16 May 2023).
- Samsung. The Core that Redefines Your Device. Available online: https://www.samsung.com/semiconductor/minisite/exynos/products/all-processors/ (accessed on 25 May 2023).
- GSMARENA. Exynos 2100 Vs Snapdragon 888: Benchmarking the Samsung Galaxy S21 Ultra Versions. GSMARENA. 7 February 2021. Available online: https://www.gsmarena.com/exynos_2100_vs_snapdragon_888_benchmarking_the_samsung_galaxy_s21_ultra_performance-news-47611.php (accessed on 10 June 2023).
- Samsung. Exynos 2200. Available online: https://semiconductor.samsung.com/us/processor/mobile-processor/exynos-2200/ (accessed on 1 June 2023).
- Samsung. Samsung Brings PIM Technology to Wider Applications. 24 August 2021. Available online: https://www.samsung.com/semiconductor/newsroom/news-events/samsung-brings-in-memory-processing-power-to-wider-range-of-applications/ (accessed on 18 May 2023).
- Kim, J.H.; Kang, S.-H.; Lee, S.; Kim, H.; Song, W.; Ro, Y.; Lee, S.; Wang, D.; Shin, H.; Phuah, B.; et al. Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond. In Proceedings of the 2021 IEEE Hot Chips 33 Symposium (HCS), Palo Alto, CA, USA, 22–24 August 2021; pp. 1–26. [Google Scholar]
- Dhruvanarayan, S.; Bittorf, V. MLSoC™: An Overview. In Proceedings of the HotChips Conference 2023, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
- SiMa.ai. Available online: https://sima.ai/ (accessed on 3 September 2023).
- Synopsys. DesignWare ARC EV Processors for Embedded Vision. Available online: https://www.synopsys.com/designware-ip/processor-solutions/ev-processors.html (accessed on 25 July 2022).
- Synopsys. Synopsys EV7x Vision Processor. Available online: https://www.synopsys.com/dw/ipdir.php?ds=ev7x-vision-processors (accessed on 25 May 2023).
- Syntiant. Making Edge AI a Reality: A New Processor for Deep Learning. Available online: https://www.syntiant.com/ (accessed on 18 June 2023).
- Syntiant. NDP100 Neural Decision Processor: Always-on Speech Recognition. Available online: https://www.syntiant.com/ndp100 (accessed on 28 June 2023).
- Tyler, N. Syntiant Introduces NDP102 Neural Decision Processor. Newelectronics. 16 September 2021. Available online: https://www.newelectronics.co.uk/content/news/syntiant-introduces-ndp102-neural-decision-processor (accessed on 28 June 2023).
- Demler, M. Syntiant NDP120 Sharpens Its Hearing, Wake-Word Detector Combines Ultra-Low Power DLA with HiFi 3 DSP. 2021. Available online: https://www.linleygroup.com/mpr/article.php?id=12455 (accessed on 20 June 2023).
- Halfacree, G. Syntiant’s NDP200 Promises 6.4GOP/s of Edge AI Compute in a Tiny 1mW Power Envelope. Hackster.io. 2021. Available online: https://www.hackster.io/news/syntiant-s-ndp200-promises-6-4gop-s-of-edge-ai-compute-in-a-tiny-1mw-power-envelope-96590283ffbc (accessed on 29 June 2023).
- Think Silicon. Nema Pico XS. Available online: https://www.think-silicon.com/nema-pico-xs#features (accessed on 23 May 2023).
- Wikichip. FSD Chip. Wikichip. Available online: https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip (accessed on 28 May 2023).
- Kong, M. VeriSilicon VIP9000 NPU AI Processor and ZSPNano DSP IP bring AI-Vision and AI-Voice to Low Power Automotive Image Processing SoC. VeriSilicon Press Release. 12 May 2020. Available online: https://www.verisilicon.com/en/PressRelease/VIP9000andZSPAdoptedbyiCatch (accessed on 16 July 2022).
- VeriSilicon. VeriSilicon Launches VIP9000, New Generation of Neural Processor Unit IP. VeriSilicon Press Release. 8 July 2019. Available online: https://www.verisilicon.com/en/PressRelease/VIP9000 (accessed on 25 May 2022).
- Untether. The Most Efficient AI Computer Engine Available. Available online: https://www.untether.ai/press-releases/untether-ai-ushers-in-the-petaops-era-with-at-memory-computation-for-ai-inference-workloads (accessed on 18 May 2023).
- Untether. Untether AI. Available online: https://www.colfax-intl.com/downloads/UntetherAI-tsunAImi-Product-Brief.pdf (accessed on 18 May 2023).
- Upmem. The PIM Reference Platform. Available online: https://www.upmem.com/technology/ (accessed on 19 May 2023).
- Lavenier, D.; Cimadomo, R.; Jodin, R. Variant Calling Parallelization on Processor-in-Memory Architecture. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Republic of Korea, 16–19 December 2020; pp. 204–207. [Google Scholar] [CrossRef]
- Gómez-Luna, J.; El Hajj, I.; Fernandez, I.; Giannoula, C.; Oliveira, G.F.; Mutlu, O. Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware. arXiv 2021, arXiv:2110.01709. [Google Scholar]
- Cutress, I. Hot Chips 31 Analysis: In-Memory Processing by Upmem. Anandtech. 18 August 2019. Available online: https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem (accessed on 20 May 2023).
- Mo, H.; Zhu, W.; Hu, W.; Wang, G.; Li, Q.; Li, A.; Yin, S.; Wei, S.; Liu, L. 9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction. In Proceedings of the 2021 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 146–148. [Google Scholar] [CrossRef]
- Yin, S.; Zhang, B.; Kim, M.; Saikia, J.; Kwon, S.; Myung, S.; Kim, H.; Kim, S.J.; Seok, M.; Seo, J.S. PIMCA: A 3.4-Mb Programmable In-Memory Computing Accelerator in 28nm for On-Chip DNN Inference. In Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, 13–19 June 2021; pp. 1–2. [Google Scholar] [CrossRef]
- Fujiwara, H.; Mori, H.; Zhao, W.C.; Chuang, M.C.; Naous, R.; Chuang, C.K.; Hashizume, T.; Sun, D.; Lee, C.F.; Akarvardar, K.; et al. A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Wang, S.; Kanwar, P. BFloat16: The Secret to High Performance on Cloud TPUs. August 2019. Available online: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus (accessed on 18 September 2022).
- Lee, S.; Kim, K.; Oh, S.; Park, J.; Hong, G.; Ka, D.; Hwang, K.; Park, J.; Kang, K.; Kim, J.; et al. A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based Accelerator-in-Memory supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep-Learning Applications. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Demler, M. Blaize Ignites Edge-AI Performance, Microprocessor Report. September 2020. Available online: https://www.blaize.com/wp-content/uploads/2020/09/Blaize-Ignites-Edge-AI-Performance.pdf (accessed on 1 June 2023).
- Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403. [Google Scholar] [CrossRef]
- Bejani, M.M.; Ghatee, M. A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev. 2021, 54, 6391–6438. [Google Scholar] [CrossRef]
- Yang, H.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft filter pruning for accelerating deep convolutional neural networks. arXiv 2018, arXiv:1808.06866. [Google Scholar]
- Hoefler, T.; Alistarh, D.; Ben-Nun, T.; Dryden, N.; Peste, A. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. arXiv 2021, arXiv:2102.00554. [Google Scholar]
- Sanh, V.; Wolf, T.; Rush, A. Movement pruning: Adaptive sparsity by fine-tuning. arXiv 2020, arXiv:2005.07683. [Google Scholar]
- Buciluă, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541. [Google Scholar] [CrossRef]
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
- Kim, Y.; Rush, A.M. Sequence-level knowledge distillation. arXiv 2016, arXiv:1606.07947. [Google Scholar]
- Allen-Zhu, Z.; Li, Y. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. arXiv 2023, arXiv:2012.09816. [Google Scholar]
- Huang, M.; You, Y.; Chen, Z.; Qian, Y.; Yu, K. Knowledge Distillation for Sequence Model. In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018; pp. 3703–3707. [Google Scholar] [CrossRef]
- Cho, J.H.; Hariharan, B. On the efficacy of knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4794–4802. [Google Scholar]
- Tambe, T.; Hooper, C.; Pentecost, L.; Jia, T.; Yang, E.Y.; Donato, M.; Sanh, V.; Whatmough, P.; Rush, A.M.; Brooks, D.; et al. EdgeBERT: Optimizing On-chip inference for multi-task NLP. arXiv 2020, arXiv:2011.14203. [Google Scholar]
- Tensorflow. An End-to-End Open-Source Machine Learning Platform. Available online: https://www.tensorflow.org/ (accessed on 1 May 2023).
- Li, S. TensorFlow Lite: On-Device Machine Learning Framework. J. Comput. Res. Dev. 2020, 57, 1839–1853. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
- Pytorch, Pytorch Mobile. End to End Workflow from Training to Deployment for iOS and Android Mobile Devices. Available online: https://pytorch.org/mobile/home/ (accessed on 20 December 2022).
- Keras. Keras API References. Available online: https://keras.io/api/ (accessed on 20 December 2022).
- Caffe2. A New Lightweight, Modular, and Scalable Deep Learning Framework. Available online: https://research.facebook.com/downloads/caffe2/ (accessed on 21 December 2022).
- Zelinsky, A. Learning OpenCV—Computer Vision with the OpenCV Library (Bradski, G.R. et al.; 2008) [On the Shelf]. IEEE Robot. Autom. Mag. 2009, 16, 100. [Google Scholar] [CrossRef]
- ONNX. Open Neural Network Exchange-the Open Standard for Machine Learning Interoperability. Available online: https://onnx.ai/ (accessed on 22 December 2022).
- MXNet. A Flexible and Efficient Library for Deep Learning. Available online: https://mxnet.apache.org/versions/1.9.0/ (accessed on 22 December 2022).
- ONNX. Meta AI. Available online: https://ai.facebook.com/tools/onnx/ (accessed on 23 December 2022).
- Vajda, P.; Jia, Y. Delivering Real-Time AI in the Palm of Your Hand. Available online: https://engineering.fb.com/2016/11/08/android/delivering-real-time-ai-in-the-palm-of-your-hand/ (accessed on 27 December 2022).
- CEVA. CEVA NeuPro-S On-Device Computer Vision Processor Architecture. September 2020. Available online: https://www.ceva-dsp.com/wpcontent/uploads/2020/11/09_11_20_NeuPro-S_Brochure_V2.pdf (accessed on 17 July 2022).
- Merolla, P.A.; Arthur, J.V.; Alvarez-Icaza, R.; Cassidy, A.S.; Sawada, J.; Akopyan, F.; Jackson, B.L.; Imam, N.; Guo, C.; Nakamura, Y.; et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 2014, 345, 668–673. [Google Scholar] [CrossRef]
- Yakopcic, C.; Rahman, N.; Atahary, T.; Taha, T.M.; Douglass, S. Solving Constraint Satisfaction Problems Using the Loihi Spiking Neuromorphic Processor. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2020; pp. 1079–1084. [Google Scholar] [CrossRef]
- Bohnstingl, T. Neuromorphic Hardware Learns to Learn. Front. Neurosci. 2019, 13, 483. [Google Scholar] [CrossRef] [PubMed]
- Shrestha, S.B.; Orchard, G. Slayer: Spike layer error reassignment in time. arXiv 2018, arXiv:1810.08646. [Google Scholar]
- Davidson, S.; Furber, S.B. Comparison of Artificial and Spiking Neural Networks on Digital Hardware. Front. Neurosci. 2021, 15, 345. [Google Scholar] [CrossRef]
- Blouw, P.; Choo, X.; Hunsberger, E.; Eliasmith, C. Benchmarking keyword spotting efficiency on neuromorphic hardware. In Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop, Albany, NY, USA, 26–28 March 2019; pp. 1–8. [Google Scholar] [CrossRef]
- NengoLoihi. Available online: https://www.nengo.ai/nengo-loihi/ (accessed on 20 November 2022).
- Nengo. Spinnaker backend for Nengo. Available online: https://nengo-spinnaker.readthedocs.io/en/latest/ (accessed on 20 November 2022).
- NengoDL. Available online: https://www.nengo.ai/nengo-dl/ (accessed on 20 November 2022).
- Brainchip. MetaTF. Available online: https://brainchip.com/metatf-development-environment/ (accessed on 10 July 2023).
- Demler, M. BrainChip Akida Is a Fast Learner. Microprocessor Report, Linley Group. 28 October 2019. Available online: https://d1io3yog0oux5.cloudfront.net/brainchipinc/files/BrainChip+Akida+Is+a+Fast+Learner.pdf (accessed on 12 July 2023).
- Lava. Lava Software Framework. Available online: https://lava-nc.org/ (accessed on 26 November 2022).
- Reuther, A.; Michaleas, P.; Jones, M.; Gadepally, V.; Samsi, S.; Kepner, J. AI and ML Accelerator Survey and Trends. In Proceedings of the 2022 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 19–23 September 2022; pp. 1–10. [Google Scholar] [CrossRef]
- Chen, Y.; Xie, Y.; Song, L.; Chen, F.; Tang, T. A Survey of Accelerator Architectures for Deep Neural Networks. Engineering 2020, 6, 264–274. [Google Scholar] [CrossRef]
- Li, W.; Liewig, M. A survey of AI accelerators for edge environments. In Proceedings of the World Conference on Information Systems and Technologies, Budva, Montenegro, 7–10 April 2020; Springer: Cham, Switzerland, 2020; pp. 35–44. [Google Scholar] [CrossRef]
- Murshed, M.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine Learning at the Network Edge: A Survey. ACM Comput. Surv. 2021, 54, 1–37. [Google Scholar] [CrossRef]
- Lin, W.; Adetomi, A.; Arslan, T. Low-Power Ultra-Small Edge AI Accelerators for Image Recognition with Convolution Neural Networks: Analysis and Future Directions. Electronics 2021, 10, 2048. [Google Scholar] [CrossRef]
- Reuther, A.; Michaleas, P.; Jones, M.; Gadepally, V.; Samsi, S.; Kepner, J. Survey of Machine Learning Accelerators. In Proceedings of the 2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 22–24 September 2020; pp. 1–12. [Google Scholar] [CrossRef]
- Xue, C.-X.; Hung, J.M.; Kao, H.Y.; Huang, Y.H.; Huang, S.P.; Chang, F.C.; Chen, P.; Liu, T.W.; Jhang, C.J.; Su, C.I.; et al. 16.1 A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro with 11.91 to 195.7TOPS/W for Tiny AI Edge Devices. In Proceedings of the 2021 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 245–247. [Google Scholar] [CrossRef]
- Chih, Y.-D.; Lee, P.H.; Fujiwara, H.; Shih, Y.C.; Lee, C.F.; Naous, R.; Chen, Y.L.; Lo, C.P.; Lu, C.H.; Mori, H.; et al. 16.4 An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In Memory Macro in 22nm for Machine-Learning Edge Applications. In Proceedings of the 2021 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 252–254. [Google Scholar] [CrossRef]
- Dong, Q.; Sinangil, M.E.; Erbagci, B.; Sun, D.; Khwa, W.S.; Liao, H.J.; Wang, Y.; Chang, J. 15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications. In Proceedings of the 2020 IEEE International Solid- State Circuits Conference—(ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 242–244. [Google Scholar] [CrossRef]
- Yuan, G.; Behnam, P.; Li, Z.; Shafiee, A.; Lin, S.; Ma, X.; Liu, H.; Qian, X.; Bojnordi, M.N.; Wang, Y.; et al. FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator. In Proceedings of the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain, 14–19 June 2021; pp. 265–278. [Google Scholar] [CrossRef]
- Khaddam-Aljameh, R.; Stanisavljevic, M.; Mas, J.F.; Karunaratne, G.; Brandli, M.; Liu, F.; Singh, A.; Muller, S.M.; Petropoulos, A.; Antonakopoulos, T.; et al. HERMES Core—A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing. In Proceedings of the 2021 Symposium on VLSI Technology, Kyoto, Japan, 13–19 June 2021; pp. 1–2. [Google Scholar]
- Caminal, H.; Yang, K.; Srinivasa, S.; Ramanathan, A.K.; Al-Hawaj, K.; Wu, T.; Narayanan, V.; Batten, C.; Martínez, J.F. CAPE: A Content-Addressable Processing Engine. In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 27 February–3 March 2021; pp. 557–569. [Google Scholar] [CrossRef]
- Park, S.; Park, C.; Kwon, S.; Jeon, T.; Kang, Y.; Lee, H.; Lee, D.; Kim, J.; Kim, H.S.; Lee, Y.; et al. A Multi-Mode 8K-MAC HW-Utilization-Aware Neural Processing Unit with a Unified Multi-Precision Datapath in 4nm Flagship Mobile SoC. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 246–248. [Google Scholar] [CrossRef]
- Zhu, H.; Jiao, B.; Zhang, J.; Jia, X.; Wang, Y.; Guan, T.; Wang, S.; Niu, D.; Zheng, H.; Chen, C.; et al. COMB-MCM: Computing-on-Memory-Boundary NN Processor with Bipolar Bitwise Sparsity Optimization for Scalable Multi-Chiplet-Module Edge Machine Learning. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Niu, D.; Li, S.; Wang, Y.; Han, W.; Zhang, Z.; Guan, Y.; Guan, T.; Sun, F.; Xue, F.; Duan, L.; et al. 184QPS/W 64Mb/mm2 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Chiu, Y.-C.; Yang, C.S.; Teng, S.H.; Huang, H.Y.; Chang, F.C.; Wu, Y.; Chien, Y.A.; Hsieh, F.L.; Li, C.Y.; Lin, G.Y.; et al. A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1–55.1TOPS/W 8b MAC for AI Operations. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 178–180. [Google Scholar] [CrossRef]
- Khwa, W.-S.; Chiu, Y.C.; Jhang, C.J.; Huang, S.P.; Lee, C.Y.; Wen, T.H.; Chang, F.C.; Yu, S.M.; Lee, T.Y.; Chang, M.F. 11.3 A 40-nm, 2M-Cell, 8b-Precision, Hybrid SLC-MLC PCM Computing-in-Memory Macro with 20.5–65.0TOPS/W for Tiny-AI Edge Devices. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Spetalnick, S.D.; Chang, M.; Crafton, B.; Khwa, W.S.; Chih, Y.D.; Chang, M.F.; Raychowdhury, A. A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23× Improvement in Density and >75% Use of Sensing Dynamic Range. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Chang, M.; Spetalnick, S.D.; Crafton, B.; Khwa, W.S.; Chih, Y.D.; Chang, M.F.; Raychowdhury, A. A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Wang, D.; Lin, C.T.; Chen, G.K.; Knag, P.; Krishnamurthy, R.K.; Seok, M. DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware. In Proceedings of the 2022 IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 266–268. [Google Scholar] [CrossRef]
- Yue, J.; Feng, X.; He, Y.; Huang, Y.; Wang, Y.; Yuan, Z.; Zhan, M.; Liu, J.; Su, J.W.; Chung, Y.L.; et al. 15.2 A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 238–240. [Google Scholar] [CrossRef]
- Yue, J.; Yuan, Z.; Feng, X.; He, Y.; Zhang, Z.; Si, X.; Liu, R.; Chang, M.F.; Li, X.; Yang, H.; et al. 14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 234–236. [Google Scholar] [CrossRef]
- Wang, Y.; Qin, Y.; Deng, D.; Wei, J.; Zhou, Y.; Fan, Y.; Chen, T.; Sun, H.; Liu, L.; Wei, S.; et al. A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing. In Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3. [Google Scholar] [CrossRef]
- Matsubara, K.; Lieske, H.; Kimura, M.; Nakamura, A.; Koike, M.; Morikawa, S.; Hotta, Y.; Irita, T.; Mochizuki, S.; Hamasaki, H.; et al. 4.2 A 12nm Autonomous-Driving Processor with 60.4TOPS, 13.8TOPS/W CNN Executed by Task-Separated ASIL D Control. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 56–58. [Google Scholar] [CrossRef]
- Agrawal, A.; Lee, S.K.; Silberman, J.; Ziegler, M.; Kang, M.; Venkataramani, S.; Cao, N.; Fleischer, B.; Guillorn, M.; Cohen, M.; et al. 9.1 A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 144–146. [Google Scholar] [CrossRef]
- Park, J.-S.; Jang, J.W.; Lee, H.; Lee, D.; Lee, S.; Jung, H.; Lee, S.; Kwon, S.; Jeong, K.; Song, J.H.; et al. 9.5 A 6K-MAC Feature-Map-Sparsity-Aware Neural Processing Unit in 5nm Flagship Mobile SoC. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 152–154. [Google Scholar] [CrossRef]
- Eki, R.; Yamada, S.; Ozawa, H.; Kai, H.; Okuike, K.; Gowtham, H.; Nakanishi, H.; Almog, E.; Livne, Y.; Yuval, G.; et al. 9.6 A 1/2.3inch 12.3Mpixel with On-Chip 4.97TOPS/W CNN Processor Back-Illuminated Stacked CMOS Image Sensor. In Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 13–22 February 2021; pp. 154–156. [Google Scholar] [CrossRef]
- Lin, C.-H.; Cheng, C.C.; Tsai, Y.M.; Hung, S.J.; Kuo, Y.T.; Wang, P.H.; Tsung, P.K.; Hsu, J.Y.; Lai, W.C.; Liu, C.H.; et al. 7.1 A 3.4-to-13.3TOPS/W 3.6TOPS Dual-Core Deep-Learning Accelerator for Versatile AI Applications in 7nm 5G Smartphone SoC. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 134–136. [Google Scholar] [CrossRef]
- Huang, W.-H.; Wen, T.H.; Hung, J.M.; Khwa, W.S.; Lo, Y.C.; Jhang, C.J.; Hsu, H.H.; Chin, Y.H.; Chen, Y.C.; Lo, C.C.; et al. A Nonvolatile AI-Edge Processor with 4MB SLC-MLC Hybrid-Mode ReRAM Compute-in-Memory Macro and 51.4–251TOPS/W. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 15–17. [Google Scholar] [CrossRef]
- Tambe, T.; Zhang, J.; Hooper, C.; Jia, T.; Whatmough, P.N.; Zuckerman, J.; Dos Santos, M.C.; Loscalzo, E.J.; Giri, D.; Shepard, K.; et al. 22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 342–344. [Google Scholar] [CrossRef]
- Chiu, Y.-C.; Khwa, W.S.; Li, C.Y.; Hsieh, F.L.; Chien, Y.A.; Lin, G.Y.; Chen, P.J.; Pan, T.H.; You, D.Q.; Chen, F.Y.; et al. A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4–160.1TOPS/W for Edge-AI Devices. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 496–498. [Google Scholar] [CrossRef]
- Desoli, G.; Chawla, N.; Boesch, T.; Avodhyawasi, M.; Rawat, H.; Chawla, H.; Abhijith, V.S.; Zambotti, P.; Sharma, A.; Cappetta, C.; et al. 16.7 A 40–310TOPS/W SRAM-Based All-Digital Up to 4b In-Memory Computing Multi-Tiled NN Accelerator in FD-SOI 18nm for Deep-Learning Edge Applications. In Proceedings of the 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 19–23 February 2023; pp. 260–262. [Google Scholar] [CrossRef]
- Shih, M.-E.; Hsieh, S.-W.; Tsai, P.-Y.; Lin, M.-H.; Tsung, P.-K.; Chang, E.-J.; Liang, J.; Chang, S.-H.; Nian, Y.-Y.; Wan, Z.; et al. NVE: A 3nm 23.2TOPS/W 12b-Digital-CIM-Based Neural Engine for High Resolution Visual-Quality Enhancement on Smart Devices. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 18–22 February 2024. [Google Scholar]
- Khwa, W.-S.; Wu, P.-C.; Wu, J.-J.; Su, J.-W.; Chen, H.-Y.; Ke, Z.-E.; Chiu, T.-C.; Hsu, J.-M.; Cheng, C.-Y.; Chen, Y.-C.; et al. A 16nm 96Kb Integer/Floating-Point Dual-Mode Gain-Cell Computing-in-Memory Macro Achieving 73.3–163.3TOPS/W and 33.2–91.2TFLOPS/W for AI-Edge Devices. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 18–22 February 2024. [Google Scholar]
- Nose, K.; Fujii, T.; Togawa, K.; Okumura, S.; Mikami, K.; Hayashi, D.; Tanaka, T.; Toi, T. A 23.9TOPS/W @ 0.8V, 130TOPS AI Accelerator with 16× Performance-Accelerable Pruning in 14nm Heterogeneous Embedded MPU for Real-Time Robot Applications. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 18–22 February 2024. [Google Scholar]
- Apple. Press Release. Apple Unveils M2, Taking the Breakthrough Performance and Capabilities of M1 Even Further. 6 June 2022. Available online: https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthrough-performance-and-capabilities/ (accessed on 10 July 2023).
- Dahad, N. Hardware Inference Chip Targets Automotive Applications. 24 December 2019. Available online: https://www.embedded.com/hardware-inference-chip-targets-automotive-applications/ (accessed on 25 June 2022).
- Jouppi, N.P.; Yoon, D.H.; Kurian, G.; Li, S.; Patil, N.; Laudon, J.; Young, C.; Patterson, D. A domain-specific supercomputer for training deep neural networks. Commun. ACM 2020, 63, 67–78. [Google Scholar] [CrossRef]
- Google. How Google Tensor Powers Up Pixel Phones. Available online: https://store.google.com/intl/en/ideas/articles/google-tensor-pixel-smartphone/ (accessed on 6 July 2022).
- Wikichip. Intel Nervana, Neural Network Processor (NNP). Available online: https://en.wikichip.org/wiki/nervana/nnp (accessed on 14 July 2023).
- Smith, L. 4th Gen Intel Xeon Scalable Processors Launched. StorageReview. 10 January 2023. Available online: https://www.storagereview.com/news/4th-gen-intel-xeon-scalable-processors-launched (accessed on 12 May 2023).
- Burns, J.; Chang, L. Meet the IBM Artificial Intelligence Unit. 18 October 2022. Available online: https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu (accessed on 16 December 2022).
- Gupta, K. IBM Research Introduces Artificial Intelligence Unit (AIU): Its First Complete System-on-Chip Designed to Run and Train Deep Learning Models Faster and More Efficiently than a General-Purpose CPU. MarkTechPost. 27 October 2022. Available online: https://www.marktechpost.com/2022/10/27/ibm-research-introduces-artificial-intelligence-unit-aiu-its-first-complete-system-on-chip-designed-to-run-and-train-deep-learning-models-faster-and-more-efficiently-than-a-general-purpose-cpu/ (accessed on 20 December 2022).
- Clarke, P. Startup Launches Near-Binary Neural Network Accelerator. EENews 19 May 2020. Available online: https://www.eenewseurope.com/en/startup-launches-near-binary-neural-network-accelerator/ (accessed on 20 December 2022).
- NVIDIA Jetson Nano B01. Deep Learning with Raspberry Pi and Alternatives. 5 April 2023. Available online: https://qengineering.eu/deep-learning-with-raspberry-pi-and-alternatives.html#Compare_Jetson (accessed on 3 July 2023).
- Ambarella. Available online: https://www.ambarella.com/products/iot-industrial-robotics/ (accessed on 5 March 2024).
- Research and Markets. Neuromorphic Chips: Global Strategic Business Report. Research and Markets, ID: 4805280. Available online: https://www.researchandmarkets.com/reports/4805280/neuromorphic-chips-global-strategic-business (accessed on 16 May 2023).
- GrAI VIP. Life Ready AI Processors. Available online: https://www.graimatterlabs.ai/product (accessed on 16 July 2023).
- Cassidy, S.; Alvarez-Icaza, R.; Akopyan, F.; Sawada, J.; Arthur, J.V.; Merolla, P.A.; Datta, P.; Tallada, M.G.; Taba, B.; Andreopoulos, A.; et al. Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution. In Proceedings of the SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA, 16–21 November 2014; pp. 27–38. [Google Scholar] [CrossRef]
- Ward-Foxton, S. Innatera Unveils Neuromorphic AI Chip to Accelerate Spiking Networks. EETimes. 7 July 2021. Available online: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6302&year=2021&tag=3 (accessed on 25 May 2023).
- Aufranc, J.-L. Innatera Neuromorphic AI Accelerator for Spiking Neural Networks Enables Sub-mW AI Inference. CNX Software-Embedded Systems News. 16 July 2021. Available online: https://www.cnx-software.com/2021/07/16/innatera-neuromorphic-ai-accelerator-for-spiking-neural-networks-snn-enables-sub-mw-ai-inference/ (accessed on 25 May 2023).
- Yousefzadeh, A.; Van Schaik, G.J.; Tahghighi, M.; Detterer, P.; Traferro, S.; Hijdra, M.; Stuijt, J.; Corradi, F.; Sifalakis, M.; Konijnenburg, M. SENeCA: Scalable energy-efficient neuromorphic computer architecture. In Proceedings of the 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Incheon, Republic of Korea, 13–15 June 2022; pp. 371–374. [Google Scholar]
- Konikore. Technology That Sniffs Out Danger. Available online: https://theindexproject.org/post/konikore (accessed on 26 May 2023).
- Syntiant. NDP200 Neural Decision Processor, NDP200 Always-on Vision, Sensor and Speech Recognition. Available online: https://www.syntiant.com/ndp200 (accessed on 28 June 2023).
- Demler, M. Syntiant Knows All the Best Words, NDP10x Speech-Recognition Processors Consume Just 200 µW. Microprocessor Report. 2019. Available online: https://www.syntiant.com/post/syntiant-knows-all-the-best-words (accessed on 29 June 2023).
- MemComputing. MEMCPU. Available online: https://www.memcpu.com/ (accessed on 1 July 2023).
- IniLabs. IniLabs. Available online: https://inilabs.com/ (accessed on 1 July 2023).
- Tavanaei, A.; Ghodrati, M.; Kheradpisheh, S.R.; Masquelier, T.; Maida, A. Deep learning in spiking neural networks. Neural Netw. 2019, 111, 47–63. [Google Scholar] [CrossRef] [PubMed]
- Amazon. Coral USB Edge TPU ML Accelerator Coprocessor for Raspberry Pi and Other Embedded Single Board Computers. Available online: https://www.amazon.com/Google-Coral-Accelerator-coprocessor-Raspberry/dp/B07R53D12W (accessed on 2 July 2024).
- Shakir, U. Tesla Slashes Full Self-Driving Price after Elon Musk Said It Would only Get More Expensive. 22 April 2024. Available online: https://www.theverge.com/2024/4/22/24137056/tesla-full-self-driving-fsd-price-cut-8000 (accessed on 5 July 2024).
- Amazon. NVIDIA Jetson AGX Orin. NVIDIA Jetson AGX Orin 64GB Developer Kit. Available online: https://www.amazon.com/NVIDIA-Jetson-Orin-64GB-Developer/dp/B0BYGB3WV4?th=1 (accessed on 8 July 2024).
- Dally, B. Hardware for Deep Learning; NVIDIA Corporation. In Proceedings of the Hot Chips Conference 2023, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
- Kim, J.H.; Ro, Y.; So, J.; Lee, S.; Kang, S.-H.; Cho, Y.; Kim, H.; Kim, B.; Kim, K.; Park, S.; et al. Samsung PIM/PNM for Transformer-Based AI: Energy Efficiency on PIM/PNM Cluster. In Proceedings of the Hot Chips Conference, Palo Alto, CA, USA, 27–29 August 2023. [Google Scholar]
Architecture | Precision | Process (nm) | Metrics | Frameworks | Algorithms/Models | Applications |
---|---|---|---|---|---|---|
GPU, TPU, Neuromorphic, PIM, SoC, ASIC | FP-8, 16, 32; BF-16; INT-1, 2, 4, 8, 16 | 4, 5, 7, 10, 14, 16, 20, 22, 28, 40 | Area, Power, Throughput, Energy Efficiency | TensorFlow (TF), TF Lite, Caffe2, PyTorch, MXNet, ONNX, MetaTF, Lava, Nengo, OpenCV, Darknet | SNN, MLP, CNN, VGG, ResNet, YOLO, Inception, MobileNet, RNN, GRU, BERT, LSTM | Defense, Healthcare, Cyber Security, Vehicles, Smartphones, Transportation, Robotics, Education, UAV/Drones, Communication, Industry, Traffic Control |
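As a reading aid for the processor tables that follow, the throughput, power, and energy-efficiency columns are related as sketched below; the Apple M1 row of the next table (11 TOPS at 10 W) illustrates the arithmetic. This identity is only a sanity check: several vendors quote peak throughput and peak efficiency at different operating points, so the reported TOPS/W does not always equal peak TOPS divided by maximum power.

```latex
\text{Energy efficiency } \left[\tfrac{\text{TOPS}}{\text{W}}\right]
  = \frac{\text{Throughput [TOPS]}}{\text{Power [W]}},
\qquad \text{e.g.,}\quad \frac{11\ \text{TOPS}}{10\ \text{W}} = 1.1\ \tfrac{\text{TOPS}}{\text{W}}
\quad \text{(Apple M1)}.
```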
Company | Latest Chip | Max Power (W) | Process (nm) | Area (mm2) | Precision INT/FP | Max Performance (TOPS) | Energy Efficiency (TOPS/W) | Architecture | Reference |
---|---|---|---|---|---|---|---|---|---|
Analog Devices | MAX78000 | 1 pJ/MAC | -- | 64 | 1, 2, 4, 8 | -- | -- | Dataflow | [100,101] |
Apple | M1 | 10 | 5 | 119 | 64 | 11 | 1.1 | Dataflow | [102] |
Apple | A14 | 6 | 5 | 88 | 64 | 11 | 1.83 | Dataflow | [103] |
Apple | A15 | 7 | 5 | 64 | 15.8 | 2.26 | Dataflow | [103] | |
Apple | A16 | 5.5 | 4 | 64 | 17 | 3 | Dataflow | [104] | |
* AIStorm | AIStorm | 0.225 | 8 | 2.5 | 11 | Dataflow | [105] | ||
* AlphaIC | RAP-E | 3 | 8 | 30 | 10 | Dataflow | [106] | ||
aiCTX | Dynap-CNN | 0.001 | 22 | 12 | 1 | 0.0002 | 0.2 | Neuromorphic | [15,107] |
* ARM | Ethos78 | 1 | 5 | 16 | 10 | 10 | Dataflow | [108,109] | |
* AIMotive | Apache5 IEP | 0.8 | 16 | 121 | 8 | 1.6–32 | 2 | Dataflow | [110,111] |
* Blaize | Pathfinder, El Cano | 6 | 14 | 64, FP-8, BF16 | 16 | 2.7 | Dataflow | [112] |
* Bitmain | BM1880 | 2.5 | 28 | 93.52 | 8 | 2 | 0.8 | Dataflow | [113,114]
* BrainChip | Akida1000 | 2 | 28 | 225 | 1, 2, 4 | 1.5 | 0.75 | Neuromorphic | [115,116] |
* Canaan | Kendryte K210 | 2 | 28 | 8 | 1.5 | 1.25 | Dataflow | [117,118] |
* CEVA | CEVA-Neuro-S | 16 | 2, 5, 8, 12, 16 | 12.7 | Dataflow | [119] | |||
* CEVA | CEVA-Neuro-M | 0.83 | 16 | 2, 5, 8, 12, 16 | 20 | 24 | Dataflow | [120] | |
* Cadence | DNA100 | 0.85 | 16 | 16 | 4.6 | 3 | Dataflow | [121,122] | |
* Deepvision | ARA-1 | 1.7 | 28 | 8, 16 | 4 | 2.35 | Dataflow | [123] | |
* Deepvision | ARA-2 | 16 | Dataflow | [124] | |||||
* Eta | ECM3532 | 0.01 | 55 | 25 | 8 | 0.001 | 0.1 | Dataflow | [125] |
* Flex Logix | InferX X1 | 13.5 | 7 | 54 | 8 | 7.65 | 0.57 | Dataflow | [126]
Edge TPU | 2 | 28 | 96 | 8, BF16 | 4 | 2 | Dataflow | [127,128] | |
* Gyrfalcon | Lightspeeur 2803S | 0.7 | 28 | 81 | 8 | 16.8 | 24 | PIM | [47,89]
* Gyrfalcon | Lightspeeur 5801 | 0.224 | 28 | 36 | 8 | 2.8 | 12.6 | PIM | [89]
* Gyrfalcon | Janux GS31 | 650/900 | 28 | 10457.5 | 8 | 2150 | 3.30 | PIM | [129] |
* GreenWaves | GAP9 | 0.05 | 22 | 12.25 | 8, 16, 32 | 0.05 | 1 | Dataflow | [130,131,132] |
* Horizon | Journey 3 | 2.5 | 16 | 8 | 5 | 2 | Dataflow | [133] | |
* Horizon | Journey5/5P | 30 | 16 | 8 | 128 | 4.8 | Dataflow | [134,135] | |
* Hailo | Hailo 8 M2 | 2.5 | 28 | 225 | 4, 8, 16 | 26 | 2.8 | Dataflow | [136,137] |
Intel | Loihi 2 | 0.1 | 7 | 31 | 8 | 0.3 | 3 | Neuromorphic | [9] |
Intel | Loihi | 0.11 | 14 | 60 | 1–9 | 0.03 | 0.3 | Neuromorphic | [9,138] |
* Intel | Intel® Movidius | 2 | 16 | 71.928 | 16 | 4 | 2 | Dataflow | [139] |
IBM | TrueNorth | 0.065 | 28 | 430 | 8 | 0.0581 | 0.4 | Neuromorphic | [10,138]
IBM | NorthPole | 74 | 12 | 800 | 2, 4, 8 | 200 (INT8) | 2.7 | Dataflow | [90,140] |
* Imagination | PowerVR Series3NX | FP-(8, 16) | 0.60 | Dataflow | [141,142] | ||||
* Imec | DIANA | 22 | 10.244 | 2 | 29.5 (A), 0.14 (D) | 14.4 | PIM + Digital | [143,144] | |
* Imagination | IMG 4NX MC1 | 0.417 | 4, 16 | 12.5 | 30 | Dataflow | [145] | ||
* Kalray | MPPA3 | 15 | 16 | 8, 16 | 255 | 1.67 | Dataflow | [13] | |
* Kneron | KL720 AI | 1.56 | 28 | 81 | 8, 16 | 1.4 | 0.9 | Dataflow | [47] |
* Kneron | KL530 | 0.5 | 8 | 1 | 2 | Dataflow | [47] | ||
* Koniku | Konicore | Neuromorphic | [12] | ||||||
* LeapMind | Efficiera | 0.237 | 12 | 0.422 | 1, 2, 4, 8, 16, 32 | 6.55 | 27.7 | Dataflow | [21] |
* Memryx | MX3 | 1 | -- | -- | 4, 8, 16, BF16 | 5 | 5 | Dataflow | [146] |
* Mythic | M1108 | 4 | 361 | 8 | 35 | 8.75 | PIM | [87] | |
* Mythic | M1076 | 3 | 40 | 294.5 | 8 | 25 | 8.34 | PIM | [18,86,88] |
* Mobileye | EyeQ5 | 10 | 7 | 45 | 4, 8 | 24 | 2.4 | Dataflow | [147,148,149]
* Mobileye | EyeQ6 | 40 | 7 | 4, 8 | 128 | 3.2 | Dataflow | [150] |
* Mediatek | i350 | 14 | 0.45 | Dataflow | [151] | ||||
* NVIDIA | Jetson Nano B01 | 10 | 20 | 118 | FP16 | 1.88 | 0.188 | Dataflow | [152] |
* NVIDIA | AGX Orin | 60 | 7 | -- | 8 | 275 | 3.33 | Dataflow | [153] |
* NXP | i.MX 8M+ | 14 | 196 | FP16 | 2.3 | Dataflow | [84,85] | ||
* NXP | i.MX9 | 4 × 10−6 | 12 | Dataflow | [83] | ||||
* Perceive | Ergo | 0.073 | 5 | 49 | 8 | 4 | 55 | Dataflow | [154] |
TSU & Polar Bear Tech | QM930 | 12 | 12 | 1089 | 4, 8, 16 | 20 (INT8) | 1.67 | Dataflow | [155] |
Qualcomm | QCS8250 | 7 | 157.48 | 8 | 15 | Dataflow | [156,157] | ||
Qualcomm | Snapdragon 888+ | 5 | 5 | FP32 | 32 | 6.4 | Dataflow | [158,159,160] | |
Qualcomm | Snapdragon 8 Gen2 | 4 | 4, 8, 16, FP16 | 51 | Dataflow | [161] | |||
* RockChip | rk3399Pro | 3 | 28 | 729 | 8, 16 | 3 | 1 | Dataflow | [162] |
Rokid | Amlogic A311D | 12 | 5 | Dataflow | [163] | ||||
Samsung | Exynos 2100 | 5 | 26 | Dataflow | [164,165] | ||||
Samsung | Exynos 2200 | 4 | 8, 16, FP16 | Dataflow | [166] | ||||
Samsung | HBM-PIM | 0.9 | 20 | 46.88 | 1.2 | 1.34 | PIM | [167,168] | |
Sima.ai | MLSoC | 10 | 16 | 175.55 | 8 | 50 | 5 | Dataflow | [169,170] |
Synopsys | EV7x | 16 | 8, 12, 16 | 2.7 | Dataflow | [171,172] | | |
* Syntiant | NDP100 | 0.00014 | 40 | 2.52 | 0.000256 | 20 | PIM | [173,174] | |
* Syntiant | NDP101 | 0.0002 | 40 | 25 | 1, 2, 4,8 | 0.004 | 20 | PIM | [173,175] |
* Syntiant | NDP102 | 0.0001 | 40 | 4.2921 | 1, 2, 4, 8 | 0.003 | 20 | PIM | [173,175] |
* Syntiant | NDP120 | 0.0005 | 40 | 7.75 | 1, 2, 4, 8 | 0.0019 | 3.8 | PIM | [173,176] |
* Syntiant | NDP200 | 0.001 | 40 | 1, 2, 4, 8 | 0.0064 | 6.4 | PIM | [173,177] | |
Think Silicon | NEMA® pico XS | 0.0003 | 28 | 0.11 | FP16, 32 | 0.0018 | 6 | Dataflow | [178]
Tesla/Samsung | FSD Chip | 36 | 14 | 260 | 8, FP-8 | 73.72 | 2.04 | Dataflow | [179] |
Videantis | TEMPO | Neuromorphic | [11] | | | | | |
Verisilicon | VIP9000 | 16 | 16, FP16 | 0.5–100 | Dataflow | [180,181] | |||
Untether | TsunAImi | 400 | 16 | 8 | 2008 | 8 | PIM | [182,183] | |
UPMEM | UPMEM-PIM | 700 | 20 | 32, 64 | 0.149 | PIM | [184,185,186,187] |
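To make the table above easier to work with programmatically, the hypothetical Python sketch below copies a handful of rows into a small data structure, checks them against an assumed power budget, and contrasts the reported energy efficiency with the value derived from peak TOPS and maximum power (the two can differ when vendors quote them at different operating points). The processor names and numbers are taken from the table; the budget value and helper names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class EdgeProcessor:
    name: str
    max_power_w: float
    peak_tops: float
    reported_tops_per_w: float

# A few rows copied from the table above (values as reported there).
processors = [
    EdgeProcessor("Apple M1", 10.0, 11.0, 1.1),
    EdgeProcessor("Hailo-8 M2", 2.5, 26.0, 2.8),
    EdgeProcessor("Perceive Ergo", 0.073, 4.0, 55.0),
    EdgeProcessor("NVIDIA AGX Orin", 60.0, 275.0, 3.33),
]

POWER_BUDGET_W = 5.0  # hypothetical budget for a battery-powered edge node

for p in processors:
    derived = p.peak_tops / p.max_power_w  # peak TOPS / max power, not always the quoted figure
    fits = "fits" if p.max_power_w <= POWER_BUDGET_W else "exceeds"
    print(f"{p.name}: derived {derived:.1f} TOPS/W vs. reported {p.reported_tops_per_w} TOPS/W "
          f"({fits} the {POWER_BUDGET_W} W budget)")
```

Running this shows, for example, that the Apple M1 and Perceive Ergo rows are internally consistent, while the Hailo-8 and AGX Orin rows are not, which simply reflects that their peak-throughput and peak-efficiency figures are specified at different operating points.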
Company | Product | Supported Neural Networks | Supported Frameworks | Application/Benefits |
---|---|---|---|---|
Apple | Apple A14 | DNN | TFL | iPhone12 series |
Apple | Apple A15 | DNN | TFL | iPhone13 series |
aiCTX-Synsense | Dynap-CNN | CNN, RNN, Reservoir Computing | SNN | High-speed aircraft, IoT, security, healthcare, mobile |
ARM | Ethos78 | CNN and RNN | TF, TFL, Caffe2, PyTorch, MXNet, ONNX | Automotive |
AIMotive | Apache5 IEP | GoogLeNet, VGG16/19, Inception-v4/v2, MobileNet v1, ResNet50, YOLO v2 | Caffe2 | Automotive: pedestrian detection, vehicle detection, lane detection, driver status monitoring
Blaize | El Cano | CNN, YOLO v3 | TFL | Fit for industrial, retail, smart-city, and computer-vision systems
BrainChip | Akida1000 | CNN in SNN, MobileNet | MetaTF | Online learning, data analytics, security |
BrainChip | AKD500, 1500, 2000 | DNN | MetaTF | Smart homes, smart health, smart city and smart transportation |
CEVA | Neuro-s | CNN, RNN | TFL | IoTs, smartphones, surveillance, automotive, robotics, medical |
Cadence | Tensilica DNA100 | FC, CNN, LSTM | ONNX, Caffe2, TensorFlow | IoT, smartphones, AR/VR, smart surveillance, autonomous vehicles
Deepvision | ARA-1 | DeepLab V3, ResNet-50, ResNet-152, MobileNet-SSD, YOLO v3, U-Net | Caffe2, TFL, MXNet, PyTorch | Smart retail, robotics, industrial automation, smart cities, autonomous vehicles, and more
Deepvision | ARA-2 | Models in ARA-1 plus LSTM, RNN | TFL, PyTorch | Smart retail, robotics, industrial automation, smart cities
Eta | ECM3532 | CNN, GRU, LSTM | --- | Smart homes, consumer products, medical, logistics, smart industry |
Gyrfalcon | Lightspeeur 2803S | CNN-based, VGG, ResNet, MobileNet | TFL, Caffe2 | High-performance audio and video processing
Gyrfalcon | Lightspeeur 5801 | CNN-based, ResNet, MobileNet and VGG16 | TFL, PyTorch & Caffe2 | Object detection and tracking, NLP, visual analysis
Gyrfalcon Edge Server | Janux GS31 | VGG, ResNet, MobileNet | TFL, Caffe2, PyTorch | Smart cities, surveillance, object detection, recognition
GreenWaves | GAP9 | CNN, LSTM, GRU, MobileNet | TF, Pytorch | DSP application |
Horizon | Journey 3 | CNN, MobileNet v2, EfficientNet | TFL, PyTorch, ONNX, MXNet, Caffe2 | Automotive
Horizon | Journey5/5P | ResNet18/50, MobileNet v1/v2, ShuffleNet v2, EfficientNet, Faster R-CNN, YOLOv3 | TFL, PyTorch, ONNX, MXNet, Caffe2 | Automotive
Hailo | Hailo 8 M2 | YOLO 3, YOLOv4, CenterPose, CenterNet, ResNet-50 | ONNX, TFL | Edge vision applications |
Intel | Loihi 2 | SNN-based NN | Lava, TFL, Pytorch | Online learning, sensing, robotics, healthcare |
Intel | Loihi | SNN-based NN | Nengo | Online learning, robotics, healthcare and many more |
Imagination | PowerVR Series3NX | MobileNet v3, CNN | Caffe, TFL | Smartphones, smart cameras, drones, automotives, wearables |
Imec & GF | DIANA | DNN | TFL, Pytorch | Analog computing in Edge inference |
KoniKu | Konicore | Synthetic biology + silicon | -- | Chemical detection, aviation, security |
Kalray | MPPA3 | Deep networks converted to KaNN | Kalray's KaNN | Autonomous vehicles, surveillance, robotics, industry, 5G
Kneron | KL720 AI | CNN, RNN, LSTM | ONNX, TFL, Keras, Caffe2 | Wide applications from automotive to home appliances |
Kneron | KL520 | VGG16, ResNet, GoogLeNet, YOLO, LeNet, MobileNet, FC | ONNX, TFL, Keras, Caffe2 | Automotive, homes, industry, and so on
LeapMind | Efficiera | CNN, YOLO v3, MobileNet v2, Lmnet | Blueoil, Python & C++ API | Homes, industrial machinery, surveillance cameras, robots |
Memryx | MX3 | CNN | PyTorch, ONNX, TF, Keras | Automation, surveillance, agriculture, financial
Mythic | M1108 | CNN, large complex DNN, Resnet50, YOLO v3, Body25 | Pytorch, TFL, and ONNX | Machine vision, electronics, smart homes, UAV/drones, edge servers |
Mythic | M1076 | CNN, complex DNN, Resnet50, YOLO v3 | Pytorch, TFL, and ONNX | Surveillance, vision, voice, smart homes, UAV, edge servers |
Mobileye | EyeQ5 | DNN | Autonomous driving |
Mobileye | EyeQ6 | DNN | Autonomous driving |
Mediatek | i350 | DNN | TFL | Vision and voice, biotech and bio-metric measurements |
NXP | i.MX 8M+ | DNN | TFL, Arm NN, ONNX | Edge vision |
NXP | i.MX9 | CNN, MobileNet v1 | TFL, Arm NN, ONNX | Graphics, images, display, audio |
NVIDIA | AGX Orin | DNN | TF, TFL, Caffe, Pytorch | Robotics, retail, traffic, manufacturing |
Qualcomm | QCS8250 | CNN, GAN, RNN | TFL | Smartphones, tablets, supporting 5G, video and image processing |
Qualcomm | Snapdragon 888+ | DNN | TFL | Smartphones, tablets, 5G, gaming, video upscaling, image and video processing |
RockChip | rk3399Pro | VGG16, ResNet50, Inception-v4 | TFL, Caffe, MXNet, ONNX, Darknet | Smart homes, cities, and industry, facial recognition, driving monitoring
Rokid | Amlogic A311D | Inception V3, YOLOv2, YOLOv3 | TFL, Caffe2, Darknet | High-performance multimedia
Samsung | Exynos 2100 | CNN | TFL | Smartphones, tablets, advanced image signal processing (ISP), 5G |
Samsung | HBM-PIM | DNN | Pytorch, TFL | Supercomputer and AI application |
Synopsys | EV7x | CNN, RNN, LSTM | OpenCV, OpenVX and OpenCL C, TFL, Caffe2 | Robotics, autopilot cars, vision, SLAM, and DSP algorithms
Syntiant | NDP100 | DNN | TFL | Mobile phones, hearing equipment, smartwatches, IoT, remote controls |
Syntiant | NDP101 | CNN, RNN, GRU, LSTM | TFL | Mobile phones, smart homes, remote controls, smartwatches, IoT |
Syntiant | NDP102 | CNN, RNN, GRU, LSTM | TFL | Mobile phones, smart homes, remote controls, smartwatches, IoT |
Syntiant | NDP120 | CNN, RNN, GRU, LSTM | TFL | Mobile phones, smart home, wearables, PC, IoT endpoints, media streamers, AR/VR |
Syntiant | NDP200 | FC, Conv, DSConv, RNN-GRU, LSTM | TFL | Mobile phones, smart homes, security cameras, video doorbells |
Think Silicon | Nema PicoXS | DNN | ---- | Wearable and embedded devices |
Tesla | FSD | CNN | PyTorch | Automotive
Verisilicon | VIP9000 | All modern DNN | TF, Pytorch, TFL, DarkNet, ONNX | Can perform as intelligent eyes and intelligent ears at the edge |
Untether | TsunAImi | DNN, ResNet-50, Yolo, Unet, RNN, BERT, TCNs, LSTMs | TFL, Pytorch | NLP, inference at the edge server or data center |
UPMEM | UPMEM-PIM | DNN | ----- | Sequence alignment of DNA or protein, genome assembly, metagenomic analysis |
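Most of the dataflow processors in the table above ingest models through TensorFlow Lite (TFL) after integer quantization. The sketch below is a minimal, generic example of post-training INT8 quantization using the public TFLite converter API; the MobileNetV2 model and random calibration data are placeholders only, and accelerator-specific toolchains (for example, a vendor compilation step for the Edge TPU) are typically still required after conversion.

```python
import numpy as np
import tensorflow as tf

# Placeholder model: MobileNetV2 stands in for any CNN listed in the table above.
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

def representative_data():
    # Calibration samples used to estimate activation ranges; random tensors are a
    # placeholder for a small slice of the real input distribution.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]                          # enable post-training quantization
converter.representative_dataset = representative_data                        # calibration data for INT8 scales
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]   # force integer-only kernels
converter.inference_input_type = tf.int8                                      # integer I/O for integer-only NPUs
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```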
Research Group | Name | Power (W) | Process (nm) | Area (mm2) | Precision INT/FP * | Performance (TOPS) | E. Eff. (TOPS/W) | Architecture | Reference |
---|---|---|---|---|---|---|---|---|---|
TSMC + NTHU | 0.00213 | 22 | 6 | 2, 4, 8 | 0.418 | 195.7 | PIM | [236] | |
TSMC | 0.037 | 22 | 0.202 | 4, 8, 12, 16 | 3.3 | 89 | PIM | [237] | |
TSMC | 0.00142 | 7 | 0.0032 | 4 | 0.372 | 351 | PIM | [238] | |
Samsung + GIT | FORMS | 66.36 | 32 | 89.15 | 8 | 0.0277 | PIM | [239] | |
IBM + U Patra | HERMES | 0.0961 | 14 | 0.6351 | 8 | 2.1 | 21.9 | PIM | [240] |
Samsung + ASU | PIMCA | 0.124 | 20.9 | 1, 2 | 4.9 | 588 | PIM | [189] | |
Intel + Cornell U | CAPE | 7 | 9 | 4 | PIM | [241] | |||
SK Hynix | AiM | 6.08 | 1 | PIM | [192] | ||||
TSMC | DCIM | 0.0116 | 5 | 0.0133 | 4, 8 | 2.95 | 254 | PIM | [190] |
Samsung | 0.3181 | 4 | 4.74 | 4, 8, 16, FP16 | 39.3 | 11.59 | Dataflow | [242] | |
Alibaba + FU | 0.0212 | 28 | 8.7 | 3 | 0.97 | 32.9 | Dataflow | [243] | |
Alibaba + FU | 0.072 | 65 | 8.7 | 3 | 1 | 8.6 | Dataflow | [243] | |
Alibaba | 0.978 | 55 | 602.22 | 8 | Dataflow | [244] | |||
TSMC + NTHU | 0.00227 | 22 | 18 | 2, 4, 8 | 0.91 | 960.2 | PIM | [245] | |
TSMC + NTHU | 0.00543 | 40 | 18 | 2, 4, 8 | 3.9 | 718 | PIM | [246] | |
TSMC + GIT | 0.000350 | 40 | 0.027 | 0.0092 | 26.56 | PIM | [247] | ||
TSMC + GIT | 0.131 | 40 | 25 | 1–8, 1–8, 32 | 7.989 | 60.64 | PIM | [248] | |
Intel + UC | 0.0090 | 28 | 0.033 | 1, 1 | 20 | 2219 | PIM | [249] | |
Intel + UC | 0.0194 | 28 | 0.049 | 1–4, 1 | 4.8 | 248 | PIM | [249] | |
TSMC + NTHU | nvCIM | 0.00398 | 22 | 6 | 2,4 | 5.12 | 1286.4 | PIM | [69] |
Pi2star + NTHU | 0.00841 | 65 | 12 | 1–8 | 3.16 | 75.9 | PIM | [250] | |
Pi2star + NTHU | 0.00652 | 65 | 9 | 4, 8 | 2 | 35.8 | PIM | [251] | |
Tsing + NTHU | 0.273 | 28 | 6.82 | 12 | 4.07 | 27.5 | Dataflow | [252] | |
Samsung | 0.381 | 4 | 4.74 | 4, 8, FP16 | 19.7 | 11.59 | Dataflow | [242] | |
Renesas Electronics | 4.4 | 12 | 60.4 | 13.8 | Dataflow | [253] | |||
IBM | 6.20 | 7 | 19.6 | 2, 4, FP(8,16,32) | 102.4 | 16.5 | Dataflow | [254] | |
Intel + IMTU | QNAP | 0.132 | 28 | 3.24 | 8 | 2.3 | 17.5 | Dataflow | [188] |
Samsung | 0.794 | 5 | 5.46 | 8, 16 | 29.4 | 13.6 | Dataflow | [255] | |
Sony | 0.379 | 22 | 61.91 | 8, 16, 32 | 1.21 | 4.97 | Dataflow | [256] | |
Mediatek | 1.05 | 7 | 3.04 | 3.6 | 13.32 | Dataflow | [257] | ||
Pi2star | 0.099 | 65 | 12 | 8 | 1.32 | 13.3 | Dataflow | [74] | |
Mediatek | 0.0012 | 12 | 0.102 | 86.24 | PIM | [257] | |||
TSMC + NTHU | 0.10 | 22 | 8.6 | 8, 8, 8 | 6.96 | 68.9 | PIM | [258] | |
TSMC + NTHU | 0.099 | 22 | 9.32 | 8, 8, 8 | 24.8 | 251 | PIM | [258] | |
ARM + Harvard | 0.04 | 12 | FP4 | 0.734 | 18.1 | Dataflow | [259] | ||
ARM + Harvard | 0.045 | 12 | FP8 | 0.367 | 8.24 | Dataflow | [259] | ||
TSMC + NTHU | 0.0037 | 22 | 18 | 8, 8, 22 | 0.59 | 160.1 | Dataflow | [260] | |
STMicroelectronics | 0.738 | 18 | 4.24 | 1, 1 | 229 | 310 | Dataflow | [261] |
STMicroelectronics | 0.740 | 18 | 4.19 | 4, 4 | 57 | 77 | Dataflow | [261] |
MediaTek | 0.711 | 12 | 1.37 | 12 | 16.5 | 23.2 | PIM | [262] | |
TSMC + NTHU | 16 | 8 | 98.5 | PIM | [263] | | | |
Renesas Electronics | 5.06 | 14 | 8 | 130.55 | 23.9 | Dataflow | [264] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).