1. Introduction
The seminal work of [1] demonstrated that high-dimensional tensors can be converted into low-dimensional representations by training a multi-layer neural network (NN) with a small central layer to reconstruct the inputs accurately. Since then, artificial intelligence (AI) has been increasingly developed and applied in many ways, becoming more and more pervasive in everyone’s daily life [2].
In the AI and data science domains, machine learning (ML) plays an important role due to its ability to learn from and process large amounts of heterogeneous data, enabling ML workloads to automatically provide results that are often more accurate than those generated by hand-crafted or traditional approaches [3]. ML has proven effective in a wide range of domains, such as image recognition, facial recognition, object detection, natural language processing, and machine translation, among others. ML is also a key technology in autonomous systems like self-driving cars and humanoid robots, where ML workloads continuously process sensor data acquired in real time and provide critical decisions that are then actuated.
ML models were initially deployed on powerful computing devices, such as those typically available in Cloud infrastructures, which are located far from the data sources. This approach has proven not to scale with the ever-increasing number of users over the years. Furthermore, there are a number of other concerns when ML workloads are Cloud-centric, such as the instability of Cloud connectivity in rural, harsh, and remote areas, data privacy, high computational costs, and power consumption, to name a few. The awareness of these challenges and the need to find new approaches to decentralize ML at the Edge gave rise to the TinyML community, which was established in 2019 [4]. The TinyML Foundation, at the heart of this community, is a worldwide non-profit organization empowering a community of professionals, academic researchers, students, and policymakers focused on low-power AI at the very edge of the Cloud. TinyML is a category of ML that allows models to run on smaller, less powerful devices. It involves hardware, algorithms, tools, and embedded software that can analyze sensor data on these devices with very low power consumption (e.g., under 1 mW), making it ideal for always-on use cases and battery-operated devices.
Thanks to the advent of TinyML in several markets, tiny neural networks (tiny NNs) became increasingly popular, and they have been employed in many embedded applications running on low-power devices, such as the image classification use cases developed for the MLCommons benchmarks [5]. These devices are off-the-shelf microcontrollers (MCUs) running at a few hundred MHz (the core operating frequency) and embedding, for example, up to 1 MByte of RAM and 2 MBytes of FLASH memory. Such MCUs can run tiny NNs that process data either at floating-point precision (fp32) or at integer precision (e.g., 8 bits), with performance adequate for most of the embedded AI applications currently available on the market.
Despite the small footprint and low cost of these embedded devices, AI developers still need to invest considerable effort in hand-crafted ML design. The training processes required to obtain accurate tiny NN solutions are also costly and time-consuming, and most of the project time is spent on dataset creation, labeling, and updating.
Tiny NNs are the essential core technology of Edge AI, yet they can be easily copied. Since they represent the intellectual property (IP) of many small and medium enterprises (SMEs), protecting them against unwanted copies is of paramount importance [6] and is currently considered a top priority. The ML community is therefore required to provide the means to achieve such a level of protection.
The application of robust watermarking to neural networks extends beyond theoretical concepts and provides practical benefits for real-world users across various domains. For example, in the field of computer vision, watermarking techniques have been effectively applied to NN models trained on CIFAR-10 to protect intellectual property. A scenario can be envisioned in which a trained neural network for image classification is purchased. By embedding a robust watermark, the origin of the network can be verified, ensuring that the acquired model is genuinely the one intended and protecting against fraudulent replacements. This capability extends to the detection of unauthorized copies; if a stolen or duplicated network is used without permission, the embedded watermark can serve as a unique identifier, allowing rightful ownership to be asserted. This protective mechanism holds particular significance in industries that deploy their proprietary models on small devices, such as IoT sensors distributed over large areas. Watermarking these models ensures that their origin can be verified regardless of where they are deployed, thus providing a layer of security and fostering trust throughout the model’s lifecycle.
The possibility of forcing a unique behavior or making NN layer weights unique has been examined using two state-of-the-art methods, which led to the development of two adaptable approaches that can be applied across various Edge AI use cases, particularly within image classification neural networks. These methods were designed to enhance flexibility and robustness, enabling effective deployment in resource-constrained Edge environments. The resulting solutions were tailored to ensure compatibility with typical Edge AI scenarios, addressing challenges such as limited computational resources and data constraints while maintaining high accuracy and responsiveness in image classification tasks using only a few dozen layer weights. Overall, two NN watermarking (NNW) methods are examined in this study. The first follows a black-box approach, for which three distinct strategies for generating trigger sets are explored. The second follows a white-box approach, embedding the watermark directly into the model’s weights through the selective modification of specific bits using layer weight regularizers. All the models obtained from the different combinations of use cases and NNW methods are evaluated by computing adequate metrics and leveraging a third-party methodology, following the MPAI (https://mpai.community/ (accessed on 13 November 2024)) standardized evaluation procedure under attack simulation.
The main contributions of the paper are summarized as follows:
We define a methodology that allows developers to associate a unique, recognizable identifier with an NN model in order to avoid wasting the investments made in it;
We define multiple watermarking approaches according to a reproducible methodology;
We investigate the possibility of applying watermarking to a hypo-parametric model, which is more challenging than watermarking a model with a high number of parameters;
We define a method that requires no extra cost when deploying on the device;
Given the industrial importance of preserving IP, we define a watermarking method that meets this requirement.
Before delving deeper into the details, the paper is organized as follows:
Section 2 describes the background knowledge essential to understanding the watermarking process applied to NNs;
Section 3 describes the known approaches in the literature;
Section 4 defines the problem this paper addresses with the definition of some requirements to be fulfilled;
Section 5 describes the use cases considered for the watermarking application;
Section 6 describes the approaches that have been coded to achieve the watermarking of tiny NNs;
Section 7 explains the results achieved in this work with the related interpretations;
Section 8 shows the results achieved through deployability on the MCU;
Section 9 explains the results achieved after some attacks were applied to verify the robustness of the watermarking processes;
Section 10 concludes the paper and outlines future directions for continuing this research.
2. Background on Watermarking
A digital watermark (DW) can be defined as a marker embedded in a noise-tolerant processing algorithm, such as an NN pipeline, that can be used to identify IP ownership. More precisely, digital watermarking is the process of hiding digital information, for example, among the parameters of an NN, such that the hidden information is related to them. A DW may be used to verify the authenticity or integrity of the NN, which is treated as IP, or to prove the identity of its creator; it is typically used to help detect copyright infringements. Moreover, the DW should not negatively affect the expected performance in operative conditions, i.e., when copyright infringement does not need to be detected. A DW that distorts the NN performance is considered less effective and more intrusive performance-wise.
A DW process can be designed according to the level of access available for verification and can be divided into two main areas:
White-box watermarking;
Black-box watermarking.
In white-box watermarking, verification relies on full access to the model’s internals, so authorized individuals can directly inspect the parameters to fully understand and verify the embedded watermark. In black-box watermarking, the model’s internals are assumed to be inaccessible, and the watermark is verified only through the model’s input–output behavior, making it difficult for unauthorized individuals to detect or tamper with the watermark [7].
The required properties of a DW may depend on the use case in which it is applied. For marking parameterized models such as NNs with copyright information, a DW has to be rather robust against modifications such as knowledge distillation (KD), pruning, quantization of weights and activations, additive noise on the inputs, etc. Since a digital copy of an NN is exactly the same as the original, a DW is a passive approach: it only marks the parameters and is expected not to degrade the NN task performance with respect to the unwatermarked (unWMd) version [8]. One application of DW is source tracking. A watermark can be embedded into an NN model at several different stages so that, if an unauthorized copy of the model is found, NNW methods can help the owner trace its origin and assert ownership. This is useful when malicious individuals take advantage of unclaimed IP; it helps to protect the IP and ensures that the model cannot be used unlawfully without attribution.
NN watermarking (NNW) is composed of two main steps [8]:
1. Embedding the watermark: The process begins by selecting a DW, which could be a sequence of bits or a unique pattern. The DW is embedded into the NN model during the training phase or post-training phase. This can be achieved by subtly altering the parameters of the model, such as weights and biases, to inject the DW information. Alternatively, it can be integrated into the inputs as well;
2. Detection: Once the NN has been trained and the DW embedded, it can be deployed for its intended task. To verify the authenticity of the model, the DW needs to be extracted. This is typically performed by providing a set of inputs to the model and observing its outputs. Specialized algorithms, such as trigger set-based detection [9], steganalysis-based detection [10], and zero-bit watermarking [11], are then used to analyze the model’s behavior and extract the DW from its parameters or activations. The extracted watermark is compared against the original one to verify the authenticity and ownership of the model. Both steps are illustrated in the sketch after this list.
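As a concrete illustration of both steps, the minimal Python/Keras sketch below embeds a bit string into the kernel of one layer through a custom weight regularizer (embedding) and later recovers the bits from the trained weights (detection). The projection-based scheme, the layer choice, the regularization strength, and all function names are illustrative assumptions for a generic white-box approach, not the exact method used in this paper.

import numpy as np
import tensorflow as tf

# Illustrative white-box sketch: a regularizer nudges a projection of one layer's
# weights toward a chosen bit string; the bits are later read back from the weights.
class WatermarkRegularizer(tf.keras.regularizers.Regularizer):
    def __init__(self, secret_bits, projection, strength=0.01):
        self.bits = tf.constant(secret_bits, dtype=tf.float32)       # shape (n_bits,)
        self.projection = tf.constant(projection, dtype=tf.float32)  # shape (n_bits, n_weights)
        self.strength = strength

    def __call__(self, weights):
        w = tf.reshape(weights, [-1])                  # flatten the layer kernel
        logits = tf.linalg.matvec(self.projection, w)  # project onto n_bits values
        # penalize the distance between sigmoid(projection @ w) and the secret bits
        bce = tf.keras.losses.binary_crossentropy(self.bits, tf.sigmoid(logits))
        return self.strength * tf.reduce_mean(bce)

def extract_bits(kernel, projection):
    # Detection: the sign of the projected weights yields the embedded bits.
    return (projection @ kernel.reshape(-1) > 0).astype(np.uint8)

# Example: watermark the kernel of the first Conv2D layer of a small model.
n_bits, kernel_size = 64, 3 * 3 * 3 * 16
rng = np.random.default_rng(0)
secret = rng.integers(0, 2, n_bits).astype(np.float32)
proj = rng.standard_normal((n_bits, kernel_size)).astype(np.float32)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3),
                           kernel_regularizer=WatermarkRegularizer(secret, proj)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# ...train as usual; afterwards, verify the watermark:
recovered = extract_bits(model.layers[0].get_weights()[0], proj)
ber = np.mean(recovered != secret.astype(np.uint8))  # bit error rate vs. the inserted bits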
DW techniques face several challenges, including ensuring robustness against malicious attacks, preserving content quality, and providing security to prevent unauthorized detection and removal. Effective DW must balance these challenges while also offering sufficient capacity and computational efficiency. To address these issues, experiments can be conducted to test robustness against attacks such as Gaussian noise, pruning, and quantization, as reported in Section 9. Afterward, the watermark is evaluated to assess its effectiveness. These experiments help refine the techniques to ensure they meet the necessary requirements and perform effectively in real-world scenarios.
9. Attacks and Results
This section reports the experimental analysis of the resilience of the watermarked models through a series of simulated attacks. The goal is to evaluate how well these models can withstand adversarial conditions without prior knowledge of the model’s inner workings. As outlined in Section 6.1 and Section 6.2, both white-box and black-box watermarking techniques were implemented, each with a distinct embedding or detection strategy. The upcoming sections will focus on these models, subjecting them to a variety of simulated attack scenarios, including Gaussian noise injection, pruning, and quantization. These simulations are designed to test the robustness and security of the watermarks, providing insight into the potential vulnerabilities of these models under adversarial stress.
The MPAI (https://mpai.community/, accessed on 13 November 2024) (Moving Picture, Audio and Data Coding by Artificial Intelligence) community established the IEEE 3304-2023 standard [52] for NNW to ensure the robustness and reliability of AI models. According to this standard, NN models may be subjected to various types of attacks to evaluate their resilience and performance. This is crucial for maintaining the integrity and reliability of AI systems in real-world applications.
The models were tested under three different types of attacks, each sketched in the code example after this list:
Gaussian noise injection involves adding random noise to the data, which can disrupt the model’s ability to output accurate predictions. Gaussian noise is a common form of statistical noise that, when it affects the input data, leads to degraded performance. For these attacks, the values 0.001, 0.01, 0.1, 1, and 10 represent the standard deviation of the added noise; higher values indicate stronger noise that can disrupt the model’s performance and potentially its watermark;
Pruning selectively removes connections between neurons, effectively reducing the complexity of the NN and potentially degrading its performance. Pruning is normally used to simplify a model, but in this context, it is used to test the model’s robustness. In these attacks, the values 0.1 to 0.5 indicate the fraction of the model’s weights that are removed; higher values represent more aggressive pruning that can degrade both the model’s accuracy and the integrity of the watermark;
Quantization reduces the precision of numerical values, which can lead to a loss of information and accuracy. Quantization typically reduces the bit depth of the NN weights and, according to the MPAI documentation, is applied only to the weights as fake quantization. The reduction in bit depth can significantly impact the model’s performance. For quantization attacks, 16 bits to 2 bits denote the bit depth to which the model’s weights are reduced; lower bit depths reflect more extreme quantization, which can result in a significant loss of precision and potentially damage the watermark.
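The following minimal PyTorch sketch shows one possible implementation of the three attack families; the function names are illustrative assumptions, and this is not the MPAI reference implementation. Following the description above, Gaussian noise perturbs an input batch, while pruning and fake quantization act on the model weights.

import torch
import torch.nn.utils.prune as prune

def gaussian_attack(batch: torch.Tensor, std: float) -> torch.Tensor:
    # Add zero-mean Gaussian noise with the given standard deviation (0.001 ... 10).
    return batch + torch.randn_like(batch) * std

def pruning_attack(model: torch.nn.Module, amount: float) -> None:
    # Zero out the given fraction (0.1 ... 0.5) of the smallest-magnitude weights per layer.
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the zeros into the weight tensor

def fake_quantization_attack(model: torch.nn.Module, bits: int) -> None:
    # Uniformly quantize and dequantize the weights at the given bit depth (16 ... 2).
    levels = 2 ** bits - 1
    with torch.no_grad():
        for p in model.parameters():
            lo, hi = p.min(), p.max()
            if hi > lo:
                scale = (hi - lo) / levels
                p.copy_(torch.round((p - lo) / scale) * scale + lo)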
In Section 9.1 and Section 9.2, the results achieved under simulated attacks are presented. The values listed under the attack type in the leftmost column of Table 13, Table 14, Table 15 and Table 16 refer to the specific parameters of the attacks, which are proportional to the extent of degradation. The models were trained using the Keras AI framework v2.12.0. Since the attacks standardized by MPAI are implemented in PyTorch v2.2.0, a lossless Keras-to-PyTorch conversion was developed. Subsequently, the attacked PyTorch models were exported in the ONNX exchange format so that tests could be conducted not only with the original framework on a PC but also on the NUCLEO-H743ZI2 MCU board with STEdgeAI. For both NNW techniques, the results on the original dataset and the robustness of the previously inserted watermark are reported.
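As an illustration of this export step, the sketch below saves an attacked PyTorch model in the ONNX format; the placeholder network, input shape, file name, and opset version are assumptions rather than the exact settings used in this work.

import torch

# Placeholder model standing in for the converted, attacked network.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
    torch.nn.Flatten(), torch.nn.Linear(8 * 30 * 30, 10),
)
model.eval()
dummy_input = torch.randn(1, 3, 32, 32)  # one RGB 32x32 image; the real shape is use-case dependent
torch.onnx.export(model, dummy_input, "attacked_model.onnx",
                  input_names=["input"], output_names=["logits"], opset_version=13)
# The resulting .onnx file can then be evaluated on a PC or imported into STEdgeAI
# for deployment on the NUCLEO-H743ZI2 board.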
The watermark is resistant to pruning because of its distribution and redundancy throughout the whole model, which ensures that not all watermark information is lost even if portions of the network are removed. Similarly, the watermark is resilient to quantization through careful design, such as the use of quantization-aware training, so that it remains detectable even when the precision of the weights changes. By embedding watermarks with these considerations, their detectability is maintained despite modifications to the network.
9.1. Results for the White-Box Watermarking Method
In this subsection, the resistance to the attacks explained above is presented for the WMd and unWMd models trained on the datasets described in Section 5.1 and Section 5.2, considering the watermarking algorithm explained in Section 6.1 with the form of the watermark matrix in Section 6.1.2 set as direct. In addition to the results from the inferences with the original dataset as input, the extracted BER value for each model is reported in Table 13 and Table 14. To ensure the correct identification of watermarks, a BER threshold of 0% has been adopted; in other words, a watermark is considered correctly recognized only if every extracted bit matches exactly the corresponding inserted bit (a minimal check is sketched below). Results that met this criterion are highlighted in green.
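The 0% BER criterion can be expressed in a few lines of code; the arrays below are illustrative placeholders rather than values from the experiments.

import numpy as np

def bit_error_rate(extracted: np.ndarray, original: np.ndarray) -> float:
    # Fraction of extracted bits that differ from the inserted ones.
    return float(np.mean(extracted != original))

extracted = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)  # bits read back from the model
original = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)   # bits that were inserted
recognized = bit_error_rate(extracted, original) == 0.0   # True only if every bit matches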
The results of the experiments discussed in Section 7.1.1 and reported in Table 13 reveal varying degrees of effectiveness for each attack on the watermarked model trained on the dataset described in Section 5.1, which uses a MobileNetV1 model with the Section 6.1 and Section 6.1.2 approaches in direct mode. It has to be noted that when the model is subjected to the three types of attacks (Gaussian, pruning, and quantization), its accuracy shows a decreasing trend as the attack strength increases, dropping from 71.66% at the lowest attack values to 60.93%. Despite the decrease in accuracy, the model remains robust in terms of watermark detection: the BER remained steady at 0% in all attack cases except the Gaussian attack with an attack value of 10, for which the BER is 8.98%.
The results of the experiments discussed in Section 7.1.2 and reported in Table 14 reveal varying degrees of effectiveness for each attack on the watermarked model trained on the dataset described in Section 5.2, which uses a customized ResNet-8 with the Section 6.1 and Section 6.1.2 approaches in direct mode. In contrast to the results presented in Table 13, this model showed greater resistance to lower-intensity attacks; in fact, an accuracy above 70% can be seen at the first three attack levels for each case (Gaussian, pruning, and quantization). However, when exposed to more powerful attacks, the model’s performance dropped significantly. Regarding watermark detection, a behavior similar to that described for Table 13 was observed, with the exception of the Gaussian attack with an attack value of 10, for which a BER of 0.26% was obtained.
9.2. Results for the Black-Box Watermarking Method
In this subsection, the resistance to the attacks explained above is presented for the WMd and unWMd models trained on the datasets described in Section 5.1 and Section 5.2, considering the watermarking algorithm explained in Section 6.2, embedded using abstract images generated by the Pillow library as described in Section 6.2.5. In addition to the results from the inferences with the original dataset as input, the accuracy obtained on the selected trigger set for each model is reported in Table 15 and Table 16. To ensure the correct identification of watermarks, the accuracy on the trigger dataset should exceed 88%, corresponding to a non-recognition threshold of 12% (a minimal check is sketched below). Results that met this criterion are highlighted in green.
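Analogously, the 88% trigger-set accuracy criterion can be checked as follows; the predictions and labels are illustrative placeholders rather than values from the experiments.

import numpy as np

def trigger_accuracy(predictions: np.ndarray, trigger_labels: np.ndarray) -> float:
    # Fraction of trigger images classified with the label chosen at embedding time.
    return float(np.mean(predictions == trigger_labels))

predictions = np.array([3, 3, 7, 1, 3, 7, 1, 1, 3, 7])     # model outputs on the trigger set
trigger_labels = np.array([3, 3, 7, 1, 3, 7, 1, 1, 3, 1])  # labels assigned to the trigger images
recognized = trigger_accuracy(predictions, trigger_labels) > 0.88  # 0.9 > 0.88 -> watermark detected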
The results of the experiments discussed in Section 7.2.1 and reported in Table 15 reveal varying degrees of effectiveness for each attack on the watermarked model trained on the dataset described in Section 5.1, which uses a MobileNetV1 model with the Section 6.2 approach and a trigger set of random shapes and colors as described in Section 6.2.2. In this case, the behavior of the model subjected to the three types of attacks (Gaussian, pruning, and quantization), in terms of accuracy, was similar to that observed in the two previous tables. The attacks were not decisive at the lowest levels, but as their strength increased, the accuracy decreased, reaching a minimum of 50%; the trend of the data is also consistent with the level of attack applied to the model. According to the results in Table 15, the WMd model appeared to be more robust against the quantization attack, with the trigger set accuracy remaining above 88% down to the 6-bit attack level, while it seemed more susceptible to the pruning attack, for which the trigger dataset accuracy dropped to 77.46% already at the second attack level.
The results of the experiments discussed in Section 7.2.2 and reported in Table 16 reveal varying degrees of effectiveness for each attack on the watermarked model trained on the dataset described in Section 5.2, which uses a MobileNetV1 model with the Section 6.2 approach and a trigger set of random shapes and colors as described in Section 6.2.2. The performance of the model subjected to the three types of attacks (Gaussian, pruning, and quantization), in terms of accuracy, was also consistent with the attack level. However, a strong drop was recorded at the last attack levels, especially for the Gaussian and quantization attacks, where an accuracy of 10% was reached. The trend of the trigger dataset was very similar to that of Table 15; the WMd model remained robust at the first attack levels, starting from an accuracy of 100% or 97.27%, while further increases in attack strength led to a sharp drop, reaching values below 20%.
10. Conclusions
In this work, two NNW methods applied to two use cases have been studied. The tests conducted on the WMd and unWMd models revealed important insights about the quality of the injection performed with a regularizer and with a selected trigger dataset included in the training dataset. Multiple parameters for both techniques have been studied, and the results have been analyzed. The best models for each method and each use case were then maliciously manipulated through the application of adversarial attacks such as Gaussian noise addition, pruning, and parameter quantization. This evaluation was necessary to test the robustness of the watermark inserted in the models deployed by their owner.
These models have also been analyzed and implemented on MCUs using the ST Edge AI Unified Core technology. The WMd models maintained their efficiency in terms of computational resources and demonstrated strong resilience against various types of attacks. This ensures the security and reliability of ML applications without affecting the models’ capabilities. Following this procedure, the IP owner has an additional tool that can be used as proof of ownership of their intellectual property, for example, in the case of unauthorized use or fraudulent copying.
The results of the study indicate that different methods for generating trigger sets offer varied levels of robustness and adaptability in NNW. Unique image generation through GANs proved highly effective in creating resilient watermarks that withstand tampering, although random image generation using PIL, despite being a simpler approach, provided a better combination of original-task and trigger-set accuracy.
Considering the accuracy achieved by combining the trained models with the original and generated input data, the results confirm the robustness of these techniques even for tiny models deployed on microcontrollers.
Future work will focus on expanding the use cases to the audio domain and to GANs. To further reduce the implementation cost on tiny devices, techniques such as post-training quantization and quantization-aware training will be the subject of further studies.