The Raspberry Pi Pico, ESP32, and Arduino Nano 33 BLE Sense are the embedded systems selected for the TinyML implementation. The technical criteria considered when choosing these devices are their hardware features, which are well suited to TinyML applications, and the EmSPS available for programming the CPU SoC of each device. It is also worth noting that the Pi Pico, ESP32, and Nano 33 BLE each have a 32-bit CPU architecture, a CPU clock frequency above 60 MHz, and an internal RAM size above 200 KB, which are key hardware features. Regarding the EmSPS for programming the devices, we used the Arduino IDE, which offers numerous tools and functionally stable libraries, among other advantages, for developing optimal TinyML algorithms.
3.1. Implementation Methodology
The implementation methodology consists of the following steps:
TinyML algorithm selection on a computer.
Training the selected model on a computer.
Testing and compressing the model on a computer.
Programming, deploying, and testing the trained model on the embedded system.
We use Python version 3.9.9 for programming, and TensorFlow and Keras are the libraries that support the computer-side implementation of the training algorithm. The TensorFlow Lite library in the Arduino IDE supports deploying the DNN model on the selected embedded systems. The following paragraphs explain each step in detail.
Remark 4. The computer used to train the DNN and program the embedded systems has the following hardware: a Core i5 CPU at 2.4 GHz, 12 GB of RAM, a 1 TB solid-state drive, and no dedicated graphics card.
TinyML algorithm selection on a computer. First, the dataset is selected. We utilize the Iris dataset [48]. This dataset comprises 150 samples; each sample is a set of four measurements describing a flower's dimensions.
Figure 6 represents the information in the Iris dataset. Each entry corresponds to a set of four values that indicate the length and width of the sepal and the length and width of the petal, in centimeters. The sepal and petal measurements determine whether the flower is setosa, virginica, or versicolor; hence, there are three classes.
Based on the results obtained in [88], where the authors used four hidden layers with the following distribution: 300, 200, 100, and 16 neurons, respectively, we decided to carry out a similar implementation and use the DNN architecture illustrated in Figure 7.
Other works that support this implementation can be seen in [89,90].
The Python code that illustrates the statements containing the libraries and plugins required for loading and preparing the data, as well as for selecting and configuring the DNN, appears in Figure 8. The Iris dataset is loaded through a CSV file.
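For illustration, a minimal sketch of this loading step is given below; the file name iris.csv and the column layout are assumptions, since the exact statements appear only in Figure 8.

```python
# Minimal sketch of loading the Iris dataset from a CSV file.
# The file name and column order are assumptions for illustration.
import pandas as pd

iris = pd.read_csv("iris.csv")      # 150 samples, 5 columns (hypothetical layout)
x = iris.iloc[:, 0:4].values        # sepal length/width, petal length/width (cm)
labels = iris.iloc[:, 4].values     # class names: setosa, versicolor, virginica
```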
The selected DNN architecture is the following:
Number of inputs: 4.
Total number of Layers: 4.
Number of Hidden Layers: 2.
Number of neurons per layer: 16 (input and hidden layers) and 3 (output layer).
Activation functions: ReLU (first three layers), sigmoid (output layer).
We modified the configuration implemented in [88]; our DNN architecture consists of one input layer, two hidden layers, and one output layer. The first layer is configured with sixteen neurons, each of the two hidden layers consists of sixteen neurons, and the output layer consists of three neurons (see Figure 7). The corresponding layer-definition statements configure the DNN model; see the code in Figure 8.
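A hedged sketch of this architecture, assuming the Keras Sequential API, is shown below; the exact statements used in this work are those of Figure 8.

```python
# Sketch of the DNN from Figure 7: 4 inputs, layers of 16, 16, 16, and 3 neurons,
# with ReLU on the first three layers and sigmoid on the output layer.
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation="relu", input_shape=(4,)),  # input layer, 16 neurons
    Dense(16, activation="relu"),                    # hidden layer 1
    Dense(16, activation="relu"),                    # hidden layer 2
    Dense(3, activation="sigmoid"),                  # output layer, one neuron per class
])
```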
Neural network theory establishes many other considerations about DNN architectures, such as the correct activation functions and their dependence on the task performed (classification or regression), the number of hidden layers, and the number of neurons in each layer. The criteria for selecting a DNN architecture are based on the system type, the application, and the desired degree of prediction accuracy. This implementation does not address a detailed and specific analysis of the parameters and hyperparameters of the selected DNN or of the performance of hardware-constrained embedded systems under the selected architecture. However, the choice of an optimal DNN architecture plays a crucial role, and this analysis opens up exciting possibilities for future work addressing the optimization of TinyML algorithms.
We selected two activation functions: ReLU for the input layer and the two hidden layers, and sigmoid for the output layer.
Training the selected model on a computer. Using DL techniques, a training algorithm is programmed for the DNN. The training steps are executed on the computer, and the result is a file that contains the trained model. The training data (x_train, y_train) and testing data (x_test, y_test) are obtained using the code shown in Figure 8. As a good practice adopted from various reports in the state of the art [48,88], the "one-hot" labeling technique is used.
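Continuing the loading sketch above, the following lines illustrate one way to obtain the one-hot labels and the training and testing subsets; the concrete statements used in this work are those of Figure 8.

```python
# One-hot labelling and train/test split (sketch; x and labels come from the loading sketch).
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

codes = pd.factorize(labels)[0]               # class names -> integers 0, 1, 2
y = to_categorical(codes).astype("float32")   # e.g. "setosa" -> [1, 0, 0]
x = x.astype("float32")

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1)      # 80% training, 20% validation
```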
Figure 9 illustrates an example of the corresponding features (input data) and label matrix (output) for the DNN model obtained as a result of the code given in Figure 8. Note that the input matrix in Figure 6 is used for demonstration purposes in the explanation; for better understanding, the same matrix is presented again in Figure 9 together with the output labels in one-hot format.
Figure 10 depicts the code fragment for executing the training step, considering the architecture described in the code snippet shown in Figure 8. The hyperparameters and the number of epochs are set within the corresponding statements in Figure 10. Finally, the training parameterization is as follows:
Before launching the training process, the Iris dataset is divided into two groups. Figure 8 shows how the dataset is divided into 80% for training (x_train, y_train) and 20% for validation (x_test, y_test).
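Continuing the sketches above, the training step can be expressed as follows; the optimizer and loss shown here are assumptions, since the exact hyperparameters appear only in Figure 10.

```python
# Training sketch: hyperparameter choices are assumptions; the actual ones are in Figure 10.
model.compile(optimizer="adam",                   # hypothetical optimizer choice
              loss="categorical_crossentropy",    # matches the one-hot labels
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    epochs=100,                   # 100 training epochs, as in Figure 11
                    validation_data=(x_test, y_test))
```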
The graphical behavior of the DNN model during the training stage is shown in Figure 11, taking into account 100 training epochs. Figure 12 shows the training process on the computer using Python and TensorFlow.
An important issue when training an ANN with TensorFlow and Keras in Python is the numerical format of the dataset. In some cases, this format is float64 (64 bits), a data type that generates trained models with a large size. In this implementation, the data type of the Iris dataset was changed to float32, a 32-bit float type, with the statements x_train = x_train.astype('float32') and y_train = y_train.astype('float32'); thus, the trained model results in a smaller size without disregarding precision in the data, with the expectation that the accuracy of the model when making predictions does not decrease. Converting the Iris dataset from float64 to float32 also allows the compression process to achieve better results by reducing the size of the model file. It should be noted that performing this data type conversion directly on the Iris dataset before training does not constitute a quantization process; however, the possibility of applying quantization remains available.
Testing and compressing the model on a computer. As a first performance test, the classification processes through the DNN are tested on a computer using Python and Keras. However, the most important step is when the DNN is tested by performing the flower-type classification process within the embedded system.
Figure 13 depicts the portion of code in which the evaluation statement is executed to obtain the trained model's accuracy and the value of the loss function at the last epoch of the training process. The result in the Python IDLE output terminal is a loss of 0.6000 and an accuracy of 0.966666638.
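A minimal sketch of this evaluation step, continuing the sketches above, is shown below; the printed format and the H5 file name are assumptions.

```python
# Evaluate the trained model on the test data and save it as an H5 file (sketch).
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("loss:", loss, "accuracy:", accuracy)

model.save("exercise_model.h5")   # hypothetical file name for the trained model
```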
The final result of the training process is a file with the H5 extension that contains the trained model. Subsequently, the H5 file is compressed to a file with a tflite extension and then converted to a cc extension file. See Table 13, where the file compression steps and the resulting size in each step are summarized. The cc file is used in the methodology's next step. Obtaining the tflite file is an intermediate and necessary step, since it is not possible to directly convert the H5 file to the cc file.
However, although computers and some embedded systems with more powerful hardware, such as the Raspberry Pi 4 or FPGAs, are capable of deploying a DNN model trained with a tflite file, the tflite file is not yet suitable to be run by the three embedded systems under test (ESP32, Pi Pico, Arduino Nano BLE). For that reason, the H5-to-tflite-to-cc file conversion process is necessary. The code fragment to convert the file with the H5 extension to tflite is shown in Figure 14.
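A hedged sketch of this H5-to-tflite conversion with the TensorFlow Lite converter is given below; the file names are assumptions, and the exact statements are those of Figure 14.

```python
# Sketch: convert the trained H5 model to a tflite file (file names are assumptions).
import tensorflow as tf

model = tf.keras.models.load_model("exercise_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("exercise_model.tflite", "wb") as f:
    f.write(tflite_model)
```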
It should be noted that once the trained model file has been converted and compressed to a file format and size supported by the embedded system, a computer evaluation and testing step must be carried out again to verify that the accuracy of the model has not changed or that any change is not significant.
In addition, the model's performance is evaluated with the tflite file by introducing input features from the Iris dataset and verifying the flower-type prediction. The portion of code used to test, evaluate, and perform the prediction is shown in Figure 15.
As a result of the prediction, an output vector is obtained in the Python IDLE output terminal. The highest value of the output vector corresponds to the Virginica class; therefore, the predicted class is Virginica.
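As an illustration of this test, the sketch below loads the tflite file with the TensorFlow Lite interpreter and classifies a single feature vector; the file name and the example measurements are assumptions, and the class ordering follows the description given later for Figure 24.

```python
# Sketch: run one prediction with the tflite model (file name and sample are assumptions).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="exercise_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.array([[6.3, 2.9, 5.6, 1.8]], dtype=np.float32)  # sepal/petal measurements (cm)
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])[0]
classes = ["Setosa", "Virginica", "Versicolor"]              # order as described for Figure 24
print(output, "->", classes[int(np.argmax(output))])
```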
The following step is converting the tflite file to cc using the instruction "!xxd -i exercise model.tflite - exercise model.cc" in Python, specifically in the Colab environment. Afterward, "!cat exercise model.cc" allows us to obtain an array of data in hexadecimal memory-address representation. The array is saved in a file with a "cc" extension, which is much more compact than the tflite file and can be recorded within the embedded systems that we consider (ESP32, Pi Pico, Nano 33 BLE) and supported in RAM. This whole procedure is shown in Figure 16.
Table 13 summarizes the process of obtaining the files that contain the trained DNN models, as well as the conversion and compression steps to which the model file is subjected for its adaptation in size and compatible format for execution on ESP32, Raspberry Pi Pico, and Arduino Nano 33 BLE Sense systems.
Figure 14 shows the statements through which the conversion of the H5 file to the tflite file is carried out, together with the statement that allows one to optimize the tflite file and quantize the model already trained and converted to a tflite file.
The previous instructions enable the quantization process in this compression step. Quantization is one of the main methods for reducing model size: it consists of converting the weights and activations of the 32-bit floating-point model to lower-precision values, typically 8-bit integers (int8). This process considerably reduces the space needed to store the model weights and prepares the model for the third step of the compression process: converting the tflite file to a cc file.
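A hedged sketch of how such post-training quantization can be configured with the TensorFlow Lite converter is shown below; the representative-dataset generator, the file names, and the use of x_train from the earlier sketches are assumptions and are not the exact statements of Figure 14.

```python
# Sketch: post-training int8 quantization (names and representative data are assumptions).
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("exercise_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]          # enable weight quantization

def representative_data():
    # A few training samples let the converter calibrate the int8 ranges.
    for sample in x_train[:100]:
        yield [sample.reshape(1, 4).astype(np.float32)]

converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_quant_model = converter.convert()
with open("exercise_model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```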
An important aspect when creating a TinyML application is, as mentioned, the memory size that the trained model file will require, specifically in the RAM of the embedded system. As seen in Table 13, the tflite file has a size of less than 5000 KB; with this size, the model can be executed on a conventional computer or even on high-performance embedded systems such as the Raspberry Pi 4 or 5 and other equivalent embedded systems with RAM sizes greater than 10 MB. The ESP32, Pi Pico, and Arduino Nano 33 BLE have RAM larger than 200 KB, while the model with the cc extension has a size of 5.2 KB.
In the third step of the compression process, as in the H5-to-tflite conversion step, it is possible to carry out the conversion without quantization. This enables the use of the resulting model and its files for implementing the DNN on the selected embedded systems without the problem of a loss of prediction accuracy due to a quantization process; the statements for this compression step are shown in Figure 16. However, under another scenario that considers a larger dataset whose data type uses more memory space, a denser DNN architecture, and other training hyperparameters, there is a possibility that the file size of the trained model is not supported in the RAM of the embedded system that executes the DNN. In that case, data quantization will have to be applied after training, in the file compression stage, and an analysis of the possible loss of accuracy of the resulting model will have to be performed.
Programming, deploying, and testing the trained model on the embedded system.
The key hardware feature for a tiny embedded system to deploy ML models is mainly the RAM size, which must allocate and hold the compressed trained model, i.e., the cc file. The RAM size of the three embedded systems is smaller than 520 KB; this becomes a limitation for implementing TinyML with the tflite model file resulting from training, since in some cases the file size is up to 4000 KB. In contrast, the compressed file size in our experiment is 5.2 KB, as shown in Figure 16. Thus, it is important to show the relevant hardware features of the three embedded systems tested in our demonstrative implementation.
It should be noted that carrying out a training process with a larger dataset, with a greater amount of data and features, and with more demanding training parameters and hyperparameters will generate, as the final result of the training, a file that must be assigned a larger space in RAM in order to be executed.
However, it is reasonable to assume that training with a more complete dataset, a DNN architecture with a greater number of layers and neurons, and more demanding training parameters and hyperparameters will generate trained models with a higher degree of prediction accuracy. One of the most outstanding challenges in the field of TinyML is to achieve more accurate predictions with a given embedded system using more optimized, compact files that allow each embedded system to execute a TinyML application in real time without imposing high requirements on the size of the SRAM.
The analysis and discussion of file compression techniques for trained models used in TinyML applications is a broad topic addressed within the research line on the optimization of embedded systems in the field of artificial intelligence, with works such as [21,32,33,48,56,57,58,63,75,81,91] addressing the topic and bridging the gap for other related research in this field.
However, the main objective of the implementation in this work is not to show the results of executing the DNN with models optimized using different compression techniques on hardware-constrained embedded systems. Likewise, when presenting Table 14, Table 15, and Table 16, emphasis is placed on compliance with the SRAM size requirement of each system, so that increasingly accurate trained models can be supported without exceeding the available memory sizes and while allowing each system to operate in real time.
It is a fact that training the DNN with a greater amount of data will generate a model with a larger file size; thus, performing a compression process on the model file without compromising the degree of accuracy in the prediction was a challenge achieved and presented in this implementation.
Table 14 shows important hardware features that must be considered for implementing TinyML algorithms. Other considerations taken into account to select the embedded systems shown in Table 14 are mentioned below. Although the Pi Pico and the ESP32 are dual-core systems, it is not necessary to carry out the implementation using both cores to distribute the processing; in this way, the comparison between the three embedded systems is fair, since the TinyML deployment is performed on a single core.
On the other hand, the operating frequencies of the CPU SoCs could generate differences in the execution times in each case. This aspect is explored in Table 15. As can be seen in Table 14, the RAM size of the Pi Pico and the Arduino Nano 33 BLE does not exceed 260 KB, and that of the ESP32 does not exceed 520 KB, which is a key aspect considering that a TinyML implementation requires the embedded system to have sufficient memory to allocate and host the trained DNN model file, which can have a size of up to 20 KB.
Figure 17 and Figure 18 describe important hardware details of the ESP32, Raspberry Pi Pico, and Arduino Nano 33 BLE Sense. Figure 19 summarizes the TinyML implementation process on the Arduino IDE platform, considering the Iris dataset and the compressed hexadecimal model content in the data.h instance.
The data.h instance is a fragment of code that exclusively contains the hexadecimal data array from the cc file obtained in Figure 16 and the size in bytes that it occupies in the RAM of the embedded system.
Figure 20 indicates the libraries to be called in Arduino IDE and the invocation function for Tensor Arena.
The tensor-arena statement (see Figure 21) assigns space in the RAM of the embedded systems selected in this review. This instruction is not required in systems that do not have hardware restriction problems, such as FPGAs and computers.
The main function programmed within the ESP32, Raspberry Pi Pico, and Arduino Nano 33 BLE Sense to deploy the DNN appears in Figure 21. As mentioned above, we used the Arduino IDE, which supports the development of embedded software in C++.
3.2. Results Deploying the DNN in Embedded Systems
Our example implementation is based on the work published in [88]. To carry out the implementation test on the ESP32, Pi Pico, and Arduino Nano 33 BLE systems, it is necessary to place within the Arduino IDE a code snippet with the same data-array content as the exercice model.cc file shown in Figure 16.
Figure 22 presents a portion of exercice model.cc, particularly the first three lines, with the statements and syntax required by the Arduino IDE environment. The file is a more extensive data array of about 380 lines of hexadecimal data, which is copied directly from the cc file obtained in Figure 16 using Colab and pasted into the data.h instance. An important aspect of achieving the execution of the DNN within the ESP32, Pi Pico, and Nano 33 BLE is the size in bytes of the exercice model.cc file, which is approximately 5.2 KB. The RAM sizes of our devices are 512 KB for the ESP32, 264 KB for the Pi Pico, and 256 KB for the Nano 33 BLE. Thus, the Iris-dataset-based DNN implementation using the cc file for flower classification is supported on the three embedded systems.
An example of the input dataset to the DNN executing within our embedded systems (ESP32, Pi Pico, and Nano 33 BLE) is shown in Figure 23. Through the output terminal of the Arduino IDE, it is possible to interact with the three embedded systems, provide the input data for the DNN, and verify the result of the flower classification task performed by the DNN algorithm, whose trained model was obtained and compressed on the computer (see the following steps of our methodology: training the selected model on a computer, and programming, deploying, and testing the trained model).
Figure 24 shows the results of using the ESP32, Pi Pico, and Nano 33 BLE to predict types of flowers using the Iris dataset. The visualization of the results was performed through the output terminal of the Arduino IDE. The information in Figure 24 is interpreted as follows: when the model is deployed on the embedded system, a set of three values forming an output vector is obtained, and each value of the output vector corresponds to a weight assigned to each type of flower; the output vectors are depicted in the "Prediction-ESP32" column in Figure 24. The first component of each vector corresponds to Setosa, the second to Virginica, and the third to Versicolor. The highest weight in each output vector indicates the flower-type prediction given by the DNN model. In addition, the "Real Label" column (see Figure 24) shows the correct classification.
For example, the first prediction vector in the "Prediction-ESP32" column in Figure 24 is [−21.06, −5.13, −2.25]. The third weight (−2.25) is the highest, which indicates that the type of flower identified by the DNN from the input data is Versicolor. Note also that in the "Real Label" column, a "1" is activated in the third position, which indicates that the actual type of flower is also Versicolor. Therefore, the prediction made by the DNN for the first output vector provided a correct flower classification. The same interpretation applies to the rest of the table shown in Figure 24.
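This interpretation amounts to taking the index of the largest component of the output vector; a minimal worked example in Python, using the values quoted above, is shown below.

```python
# Worked example: interpret one output vector from Figure 24
# (class order as stated in the text: Setosa, Virginica, Versicolor).
import numpy as np

output = [-21.06, -5.13, -2.25]
classes = ["Setosa", "Virginica", "Versicolor"]
predicted = classes[int(np.argmax(output))]
print(predicted)   # "Versicolor": the third weight, -2.25, is the largest
```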
It is necessary to mention that the processing speeds of the systems differ because they have different hardware characteristics; see Table 14. Table 15 shows the time, in milliseconds, that each embedded system took to deploy the DNN using the same trained model and the same input data.
The processor clock frequency and the memory size are very important features to consider. In the case of 32- and 64-bit architectures, the resources available on the device do not represent a problem, since the implementation results have shown that they are sufficient for TinyML applications. For hardware-constrained embedded systems with an 8-bit architecture, the processor clock frequency does not represent a limitation for deploying TinyML algorithms; however, the RAM size still represents a significant problem. To face the problem of RAM size limitation, the TinyML community turns its attention to the optimization and compression of the files used in TinyML, such as trained models, to achieve optimal implementations on 8-bit architectures.
Table 15 shows, more specifically, the latency of the algorithm from the moment the trained model is invoked until a prediction is made with a vector of input features. Likewise, the energy consumed by each embedded system while running the algorithm continuously is presented, together with the RAM allocated for invoking the trained model and the results of predicting the flower type.
As described in Table 16, the Raspberry Pi Pico has lower latency and lower power consumption than the ESP32 and Arduino Nano BLE when the embedded systems deploy the DNN in real time. However, it is possible that for DNNs with more complicated and larger models, the performance of the Raspberry Pi Pico will begin to degrade compared to the ESP32 and Arduino Nano BLE, owing to the hardware resources of each system. The analysis of the performance of each embedded system when executing DNNs with more complex models for classification and regression can be addressed in depth in future work.
It should be mentioned that the implementation only shows the execution of the DNN on the embedded systems without analyzing its accuracy when performing classifications with inputs different from those contained in the Iris dataset. The optimization of the model starting from training is not analyzed, although improvements can be achieved throughout the process by choosing another dataset for training, changing the architecture of the DNN, or modifying training parameters and hyperparameters. The performance of the real-time DNN executed on hardware-constrained embedded systems when the training and the model's characteristics are modified can be analyzed in future work.