Article

Probabilistic Jacobian-Based Saliency Maps Attacks

1 CentraleSupélec, Mathematics and Computer Science Department, 3 Rue Joliot-Curie, 91192 Gif-sur-Yvette, France
2 IRT SystemX, 8 Avenue de la Vauve, 91120 Palaiseau, France
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2020, 2(4), 558-578; https://doi.org/10.3390/make2040030
Submission received: 10 October 2020 / Revised: 2 November 2020 / Accepted: 5 November 2020 / Published: 13 November 2020
(This article belongs to the Section Learning)

Simple Summary

This paper introduces simple, faster and more efficient versions of the known targeted and untargeted Jacobian-based Saliency Map Attacks (JSMA). Despite creating adversarial examples with a higher average L 0 distance than the state-of-the-art Carlini-Wagner attack, the new versions of JSMA have a significant speed advantage over this attack, making them very convenient for L 0 real-time robustness testing of neural network classifiers.

Abstract

Neural network classifiers (NNCs) are known to be vulnerable to malicious adversarial perturbations of inputs including those modifying a small fraction of the input features named sparse or L 0 attacks. Effective and fast L 0 attacks, such as the widely used Jacobian-based Saliency Map Attack (JSMA) are practical to fool NNCs but also to improve their robustness. In this paper, we show that penalising saliency maps of JSMA by the output probabilities and the input features of the NNC leads to more powerful attack algorithms that better take into account each input’s characteristics. This leads us to introduce improved versions of JSMA, named Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA), and demonstrate through a variety of white-box and black-box experiments on three different datasets (MNIST, CIFAR-10 and GTSRB), that they are both significantly faster and more efficient than the original targeted and non-targeted versions of JSMA. Experiments also demonstrate, in some cases, very competitive results of our attacks in comparison with the Carlini-Wagner (CW) L 0 attack, while remaining, like JSMA, significantly faster (WJSMA and TJSMA are more than 50 times faster than CW L 0 on CIFAR-10). Therefore, our new attacks provide good trade-offs between JSMA and CW for L 0 real-time adversarial testing on datasets such as the ones previously cited.

1. Introduction

Deep learning classifiers are used in a wide variety of situations, such as vision, speech recognition, financial fraud detection, malware detection, autonomous driving, defense, and more.
The ubiquity of deep learning algorithms in many applications, especially critical ones such as autonomous driving [1,2] or those pertaining to security and privacy [3,4], makes attacking them particularly useful. Indeed, attacks first allow possible flaws of the intelligent model to be identified, and second allow a defense strategy to be set up to improve its reliability.
In this context, adversarial machine learning has appeared as a new branch that aims to thwart intelligent algorithms. Many techniques called adversarial attacks succeeded in fooling well-known architectures of neural networks, sometimes very astonishingly. Examples of these methods include for instance: Fast Gradient Sign Method [5], Basic Iterative Method [6], Projected Gradient Descent [7], JSMA [8], DeepFool [9], Universal Adversarial Perturbations [10] and CW attacks [11].
More formally, a neural network classifier (NNC) is an algorithm whose goal is to predict, through a neural network, which class an item x belongs to among a family of K possible classes. It outputs a vector of probabilities $F(x) = (F_1(x), \ldots, F_K(x))$, where the label of x is deduced by the rule $\mathrm{label}(x) = \operatorname{argmax}_k F_k(x)$.
An adversarial example constructed on the item x is an item $x^*$ specially crafted to be as close as possible to x (with respect to some distance function) and such that it is classified by the NNC as $\mathrm{label}(x^*) \neq \mathrm{label}(x)$ (Non-Targeted (NT) attack), or even such that $\mathrm{label}(x^*) = L$, with L chosen by the attacker and $L \neq \mathrm{label}(x)$ (Targeted attack).
In this paper, we focus on the following class of attacks:
   $L_0$ (or sparse) adversarial attacks. They aim at generating adversarial samples while minimising the number of modified components. Sparse perturbations can be found in many real-life situations. As motivated in [12], sparse perturbations could correspond to some raindrops that reflect the sun on a "STOP" sign but are sufficient to fool an autonomous vehicle, or to a crop field with some sparse colorful flowers that force a UAV to spray pesticide on non-affected areas. This reveals very disturbing and astonishing properties of neural networks, as it is possible to fool them by modifying only a few pixels [12,13]. Their study is therefore fundamental to mitigate their effects and take a step towards the robustness of neural networks. The first proposed example of $L_0$ attacks is JSMA, a targeted attack [8]. In a computer vision application, JSMA achieved a 97% adversarial success rate while modifying on average 4.02% of the input features per sample; [8] relates this result to the human capacity to visually detect changes. An important quality of JSMA is that it is easy to understand and set up, and it is relatively fast. For instance, relying on its cleverhans implementation [14], JSMA is able to generate 9 adversarial samples on the MNIST handwritten digits dataset [15] in only 2 s on a laptop with 2 CPU cores. Speed, as well as the ability to run adversarial attacks with limited resources, are important criteria for real-life deployment of neural networks [16,17,18]. JSMA obeys these constraints, making its use widespread beyond computer vision, for example in cybersecurity, anomaly detection and intrusion detection [19,20,21]. Later on, [11] proposed the second example of targeted $L_0$ attacks, known as CW $L_0$. The same paper shows that, unlike JSMA, CW $L_0$ scales well to large datasets by considering the IMAGENET dataset [22]. CW $L_0$ is now the state-of-the-art targeted $L_0$ attack with the lowest average $L_0$ distance. However, it is computationally expensive, much slower than JSMA on small datasets (a factor of 20 times slower is reported in [11]) and therefore, despite its efficiency, it is less convenient for real-time applications.
Let us now briefly explain how the afore-mentioned attacks concretely work:
JSMA. To fool NNCs, this attack relies on the Jacobian matrix of the outputs with respect to the inputs. By analysing this matrix, one can deduce how the output probabilities behave given a slight modification of an input feature. Consider an NNC N as before and call $Z(x) = (Z_1(x), \ldots, Z_K(x))$ the outputs of the second-to-last layer of N (no longer probabilities, but related to the final output by a softmax layer). To generate an adversarial example from x, JSMA first computes the Jacobian $\nabla Z(x)$. The next step is to build a saliency map and find the most salient component i, which will then be changed:
$$S[x,t][i] = \begin{cases} 0 & \text{if } \dfrac{\partial Z_t(x)}{\partial x_i} < 0 \ \text{ or } \ \displaystyle\sum_{k \neq t} \dfrac{\partial Z_k(x)}{\partial x_i} > 0, \\[8pt] \dfrac{\partial Z_t(x)}{\partial x_i} \cdot \left| \displaystyle\sum_{k \neq t} \dfrac{\partial Z_k(x)}{\partial x_i} \right| & \text{otherwise.} \end{cases}$$
The quantities $\frac{\partial Z_t(x)}{\partial x_i}$ and $\sum_{k \neq t} \frac{\partial Z_k(x)}{\partial x_i}$ in these maps quantify how much $Z_t(x)$ will increase and $\sum_{k \neq t} Z_k(x)$ will decrease given a modification of the input feature $x_i$. Counting on the $Z_k$'s instead of the $F_k$'s has been justified in [8] by the extreme variations induced by the softmax layer. The algorithm then selects the component $i_{\max} = \operatorname{argmax}_i S[x,t][i]$ and increases $x_{i_{\max}}$ by a default value θ before clipping to the valid domain. The same process is iterated until the class of x is changed or a maximum allowed number of iterations is reached. This version of JSMA will be called one-component JSMA. A second, more effective, variant of JSMA, recalled later, relies on doubly indexed saliency maps.
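As a concrete illustration of the one-component map above, here is a minimal NumPy sketch (not the cleverhans implementation); it assumes the Jacobian of the logits with respect to a flattened input is available as an array jacobian of shape (K, n):

```python
import numpy as np

def jsma_saliency_map(jacobian, t):
    """One-component JSMA saliency map.

    jacobian: array of shape (K, n) with jacobian[k, i] = dZ_k/dx_i.
    t: target class index.  Returns an array of shape (n,) of saliency scores.
    """
    alpha = jacobian[t]                            # dZ_t/dx_i for every feature i
    beta = jacobian.sum(axis=0) - jacobian[t]      # sum over k != t of dZ_k/dx_i
    scores = alpha * np.abs(beta)
    scores[(alpha < 0) | (beta > 0)] = 0.0         # zero out features violating the sign constraints
    return scores

# i_max = np.argmax(jsma_saliency_map(jacobian, t))
```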
CW $L_0$ attack. This method is obtained as a solution to the optimisation problem (assuming the domain of inputs is $[0,1]^n$):
$$\text{Minimise } \ \|r\|_0 + c \, f(x + r), \qquad x + r \in [0,1]^n,$$
where $\|r\|_0$ is the $L_0$ distance of the perturbation r added to x. The recommended choice of f is $f(x) = \left( \max_{i \neq t} Z_i(x) - Z_t(x) \right)^+$. Since the $L_0$ distance is not convenient for gradient descent, the authors of [11] solve this problem by making use of their $L_2$ attack and an algorithm that iteratively eliminates the components without much effect on the output classification. They finally obtain an effective $L_0$ attack that has a net advantage over JSMA. However, the main drawback of this attack is its high computational cost.
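For intuition only, here is a minimal sketch of the recommended objective f, assuming the logits are available as a NumPy vector (the actual CW attack couples this term with an L2 optimisation and an iterative component-elimination step):

```python
import numpy as np

def cw_f(logits, t):
    """f(x) = (max_{i != t} Z_i(x) - Z_t(x))^+ ; equals zero exactly when class t is predicted."""
    other = np.delete(logits, t)
    return max(float(other.max() - logits[t]), 0.0)
```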
We summarise our main contributions as follows:
For targeted misclassification, we introduce two variants of JSMA called Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA). WJSMA applies a simple weighting to saliency maps by the output probabilities, and TJSMA does the same, while additionally penalising extremal input features. Both attacks are more efficient than JSMA according to several metrics (such as speed and mean L 0 distance). We present qualitative and quantitative results on MNIST and CIFAR-10 [23] supporting our claims. Moreover, although they are less efficient than CW L 0 , our attacks have a major speed advantage over CW L 0 highlighted by measuring the execution time for each attack.
For non-targeted (NT) misclassification, we improve the known NT variants of JSMA called NT-JSMA and Maximal-JSMA (M-JSMA) [24]. We do this by introducing NT and M versions of WJSMA and TJSMA. Our attacks yield better results than NT-JSMA and M-JSMA. Also, they are as competitive but significantly much faster than NT CW L 0 . These claims are illustrated through applications to attack a deep NNC on the GTSRB dataset [25] in the white/black-box modes.
We provide a deep comparison between WJSMA and TJSMA which is of independent interest. Our study concludes that in the targeted case, TJSMA is better than WJSMA. However, in the NT case, WJSMA is preferred over TJSMA, mainly due to the simplicity of its implementation.
We provide fast and optimised implementations of the new attacks using TensorFlow and the cleverhans library [14] that might help users working in adversarial machine learning (In all our experiments, we use the original implementation of JSMA available in the cleverhans library and the original code of CW publicly available. As for the NT versions of JSMA [24], whose implementations are not available, we re-implement these attacks using TensorFlow and cleverhans . A link to the codes is provided at the end of Section 5.2)
The rest of the paper is organised as follows. In Section 2, we discuss our main motivations and then introduce WJSMA and TJSMA as targeted attacks. Section 3 is focused on NT and Maximal versions of WJSMA and TJSMA. Section 4 is dedicated to several comparisons between our attacks, JSMA and CW L 0 . We discuss attacking, defending with targeted/non-targeted attacks in both the white/black-box setups. Section 5 offers a conclusion and a summary of the main results of the paper. Finally, Appendix A and Appendix B are appendices dedicated to supplementary results and materials.

2. Targeted Attacks

The first attacks, presented in this section, are targeted and called Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA). We give a detailed exposition of WJSMA and motivate the main idea leading to its derivation through a simple example. Then, we deduce TJSMA by applying once more and in a slightly different manner the same argument. In addition to the theoretical presentation, a few preliminary figures are given to illustrate a faster convergence of the new attacks in comparison with JSMA.

2.1. Weighted Jacobian-Based Saliency Map Attack (WJSMA)

The main idea here is to penalise gradients associated with small probabilities so as to mitigate their influences in saliency maps. The goal is to obtain more balanced saliency maps than those proposed by JSMA. We first give a concrete example illustrating a concrete limitation of JSMA which motivated this work.
Motivating example. Assume a number of classes $K \geq 4$ and, for some input x: $F_1(x) = 0.5$, $F_2(x) = 0.49$, $F_3(x) = 0.01$ and $F_k(x) = 0$ for all $4 \leq k \leq K$. Consider the problem of generating an adversarial sample to x with label $t = 2$. In order to decrease $\sum_{k \neq 2} Z_k(x)$, the first iteration of JSMA relies on the gradients $\nabla Z_k(x)$, $k \neq 2$. Our main observation is that since the probabilities $F_k(x) = 0$ for $4 \leq k \leq K$ are already at their minimal values, taking into account $\nabla Z_k(x)$ for these values of k in the search of $i_{\max}$ is unnecessary. In other words, by only acting on gradients, JSMA does not consider the crucial constraints on the probabilities: $F_k(x) \geq 0$. Moreover, instead of relying equally on $\nabla Z_1(x)$ and $\nabla Z_3(x)$, for this example one would rather "bet" on $\nabla Z_1(x)$ than on $\nabla Z_3(x)$, as the possible decrease in $F_1(x)$ is high (up to 0.5) while $F_3(x)$ is already small and thus hard to decrease further.
Weighted JSMA (WJSMA). Our first solution to the previous issue is WJSMA. Its principle is to penalise each gradient $\frac{\partial Z_k(x)}{\partial x_i}$, where $k \neq t$, by the probability $F_k(x)$. Beyond the intuition behind this idea, we provide a justification of it by a classical log-softmax argument. First, we compute:
$$\frac{\partial}{\partial x_i} \log F_t(x) = (1 - F_t(x)) \frac{\partial Z_t}{\partial x_i}(x) - \sum_{k \neq t} F_k(x) \frac{\partial Z_k}{\partial x_i}(x).$$
One way to maximise this derivative with respect to i is to maximise $A = \frac{\partial Z_t}{\partial x_i}(x)$ and minimise $B = \sum_{k \neq t} F_k(x) \frac{\partial Z_k}{\partial x_i}(x)$ under the constraints $A > 0$ and $B < 0$. These constraints ensure, in particular, that $\frac{\partial F_t}{\partial x_i}(x)$ remains positive, a fact which is not necessarily guaranteed under the JSMA constraints. Accordingly, we introduce one-component weighted saliency maps as follows:
$$S^W[x,t][i] = \begin{cases} 0 & \text{if } \dfrac{\partial Z_t(x)}{\partial x_i} < 0 \ \text{ or } \ \displaystyle\sum_{k \neq t} F_k(x) \dfrac{\partial Z_k(x)}{\partial x_i} > 0, \\[8pt] \dfrac{\partial Z_t(x)}{\partial x_i} \cdot \left| \displaystyle\sum_{k \neq t} F_k(x) \dfrac{\partial Z_k(x)}{\partial x_i} \right| & \text{otherwise.} \end{cases}$$
Based on these maps, we present Algorithm 1, the first version of WJSMA, to generate targeted adversarial samples.
Algorithm 1 Generating adversarial samples by WJSMA: version 1.
  • Inputs: N: a NNC, Z: second-to-last output of N, x: input to N, t: target label ($t \neq \mathrm{class}(x)$), maxIter: maximum number of iterations, $\theta_{\min}, \theta_{\max}$: lower and upper bounds for feature values, θ: positive default increase value.
  • Output:  x * : adversarial sample to x.
  • $x^* \leftarrow x$
  • iter ← 0
  • $\Gamma \leftarrow \{1, \ldots, |x|\} \setminus \{p : x[p] = \theta_{\max}\}$
  • while $\mathrm{class}(x^*) \neq t$ and iter < maxIter and $\Gamma \neq \emptyset$ do
    • $p_{\max} = \operatorname{argmax}_{p \in \Gamma} S^W[x^*, t](p)$
    •  Modify $x^*$ by $x^*[p_{\max}] = \mathrm{Clip}_{[\theta_{\min}, \theta_{\max}]}(x^*[p_{\max}] + \theta)$ // Clip is the clipping function
    •  Remove $p_{\max}$ from Γ
    • iter++
  • end while
  • return $x^*$
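A minimal NumPy sketch of Algorithm 1 follows; predict_class and probs_and_jacobian are hypothetical helpers standing for the network's prediction and for the joint computation of F(x) and the Jacobian of Z(x):

```python
import numpy as np

def wjsma_v1(x, t, predict_class, probs_and_jacobian, max_iter,
             theta=1.0, theta_min=0.0, theta_max=1.0):
    """One-component targeted WJSMA (Algorithm 1) on a flat feature vector x."""
    x_adv = x.copy()
    gamma = {p for p in range(x_adv.size) if x_adv[p] != theta_max}    # search domain
    for _ in range(max_iter):
        if predict_class(x_adv) == t or not gamma:
            break
        F, jac = probs_and_jacobian(x_adv)          # F: (K,), jac[k, i] = dZ_k/dx_i
        alpha = jac[t]
        beta = (F[:, None] * jac).sum(axis=0) - F[t] * jac[t]   # sum_{k != t} F_k dZ_k/dx_i
        scores = alpha * np.abs(beta)
        scores[(alpha < 0) | (beta > 0)] = 0.0
        candidates = np.array(sorted(gamma))
        p_max = int(candidates[np.argmax(scores[candidates])])
        x_adv[p_max] = np.clip(x_adv[p_max] + theta, theta_min, theta_max)
        gamma.discard(p_max)
    return x_adv
```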
When the output x * of Algorithm 1 satisfies class ( x * ) = t , the attack is considered as successful.
To relax the search for salient components somewhat, and motivated by a computer vision application, ref. [8] introduces saliency maps indexed by pairs of components. The main argument is that the conditions required in (1) may be too strict for some applications, so that very few components satisfy them. Our doubly indexed versions of these maps are introduced in the same way, as follows:
$$S^W[x,t][i,j] = \begin{cases} 0 & \text{if } \displaystyle\sum_{a \in \{i,j\}} \dfrac{\partial Z_t(x)}{\partial x_a} < 0 \ \text{ or } \ \displaystyle\sum_{k \neq t} F_k(x) \sum_{a \in \{i,j\}} \dfrac{\partial Z_k(x)}{\partial x_a} > 0, \\[8pt] \displaystyle\sum_{a \in \{i,j\}} \dfrac{\partial Z_t(x)}{\partial x_a} \cdot \left| \displaystyle\sum_{k \neq t} F_k(x) \sum_{a \in \{i,j\}} \dfrac{\partial Z_k(x)}{\partial x_a} \right| & \text{otherwise.} \end{cases}$$
Algorithm 2, presented below, relies on $S^W[x,t][i,j]$ to generate targeted adversarial samples and is our second version of WJSMA.
Algorithm 2 Generating adversarial samples by WJSMA: version 2.
  Same inputs and output as Algorithm 1.
  • $x^* \leftarrow x$
  • iter ← 0
  • $\Gamma \leftarrow \{(p, q) : p, q \in \{1, \ldots, |x|\},\ x[p] \neq \theta_{\max},\ x[q] \neq \theta_{\max}\}$
  • while $\mathrm{class}(x^*) \neq t$ and iter < maxIter and $\Gamma \neq \emptyset$ do
    • $(p_{\max}, q_{\max}) = \operatorname{argmax}_{(p,q) \in \Gamma} S^W[x^*, t](p, q)$
    •  Modify $x^*$ by $x^*[a] = \mathrm{Clip}_{[\theta_{\min}, \theta_{\max}]}(x^*[a] + \theta)$, $a = p_{\max}, q_{\max}$
    •  Remove $(p_{\max}, q_{\max})$ from Γ
    • iter++
  • end while
  • return $x^*$
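The pair selection of Algorithm 2 can be sketched as follows, given the same per-feature sums as before (alpha and beta); the double loop is only for clarity, and a vectorised search over pairs is preferable in practice:

```python
def best_pair(alpha, beta, candidates):
    """Pair (p, q) maximising the two-component WJSMA map, given the per-feature
    sums alpha[i] = dZ_t/dx_i and beta[i] = sum_{k != t} F_k dZ_k/dx_i."""
    best, best_score = None, 0.0
    for idx, p in enumerate(candidates):
        for q in candidates[idx + 1:]:
            a, b = alpha[p] + alpha[q], beta[p] + beta[q]
            if a < 0 or b > 0:                      # constraints of the saliency map
                continue
            score = a * abs(b)
            if score > best_score:
                best, best_score = (p, q), score
    return best
```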
In practice, despite the fact that each iteration of Algorithm 2 is more computationally expensive than each iteration of Algorithm 1, we find that it gives better results. This agrees with the recommendations of [8] on the superiority of the two-component versions of JSMA. Finally, we note that while in the two previous algorithms the selected components are always increased by a positive default value, decreasing versions can be defined following a similar logic.

2.2. Taylor Jacobian-Based Saliency Map Attack (TJSMA)

The principle of our second attack, TJSMA, is to additionally penalise the choice of feature components that are close to the maximum value $\theta_{\max}$. Assume i and j have the same WJSMA score $S^W[x,t][i] = S^W[x,t][j]$ and that $x_i$ is very close to $\theta_{\max}$, while $x_j$ is far enough from $\theta_{\max}$. In this case, looking for more impact, TJSMA prefers $x_j$ over $x_i$. Concretely, we simultaneously maximise $S_1 = \theta_{\max} - x_i$ and $S_2 = \frac{\partial}{\partial x_i} \log F_t(x)$ by maximising the product $S = S_1 S_2$. Accordingly, we introduce new saliency maps for one- and two-component selection as follows:
$$S^T[x,t][i] = \begin{cases} 0 & \text{if } \alpha_i < 0 \text{ or } \beta_i > 0, \\ \alpha_i \, |\beta_i| & \text{otherwise,} \end{cases}$$
where
$$\alpha_i = (\theta_{\max} - x_i) \frac{\partial Z_t(x)}{\partial x_i}, \qquad \beta_i = (\theta_{\max} - x_i) \sum_{k \neq t} F_k(x) \frac{\partial Z_k(x)}{\partial x_i},$$
and
$$S^T[x,t][i,j] = \begin{cases} 0 & \text{if } \alpha_{i,j} < 0 \text{ or } \beta_{i,j} > 0, \\ \alpha_{i,j} \, |\beta_{i,j}| & \text{otherwise,} \end{cases}$$
where
$$\alpha_{i,j} = \sum_{a \in \{i,j\}} (\theta_{\max} - x_a) \frac{\partial Z_t(x)}{\partial x_a}, \qquad \beta_{i,j} = \sum_{k \neq t} \sum_{a \in \{i,j\}} F_k(x) (\theta_{\max} - x_a) \frac{\partial Z_k(x)}{\partial x_a}.$$
Due to the presence of the Taylor terms $(\theta_{\max} - x_a) \frac{\partial Z_k(x)}{\partial x_a}$, we call these maps Taylor saliency maps. We introduce one- and two-component TJSMA following exactly Algorithms 1 and 2, only replacing $S^W$ with $S^T$.
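The only change from WJSMA is the feature-wise Taylor weighting $(\theta_{\max} - x_a)$ applied before forming the scores; as a sketch (same hypothetical F and jac arrays as in the earlier snippets):

```python
import numpy as np

def tjsma_scores(x, F, jac, t, theta_max=1.0):
    """One-component Taylor saliency map S^T[x, t][i]."""
    w = theta_max - x                               # penalises features already close to theta_max
    alpha = w * jac[t]
    beta = w * ((F[:, None] * jac).sum(axis=0) - F[t] * jac[t])
    scores = alpha * np.abs(beta)
    scores[(alpha < 0) | (beta > 0)] = 0.0
    return scores
```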
Figure 1a,b offer a concrete illustration of the convergence of JSMA, WJSMA and TJSMA. In particular, we observe that WJSMA and TJSMA decrease/increase the predicted/targeted probability of the original/targeted class much sooner than JSMA. Also, we note that TJSMA behaves like WJSMA until it is able to find a more vulnerable component that makes it converge much faster.

3. Non-Targeted Attacks

NT variants of JSMA have been studied in [24]. In particular, that paper introduces NT-JSMA-F and NT-JSMA-Z, based on NT saliency maps whose role is to select the most salient pairs of components in order to decrease as much as possible the probability of the current class. The suffixes -F and -Z indicate whether the saliency maps use the $F_k$'s or the $Z_k$'s. Second, [24] proposes Maximal JSMA (M-JSMA) as a more flexible attack allowing features to be both increased and decreased, while combining targeted and non-targeted strategies at the same time.
In what follows, we again leverage the idea of penalising saliency maps to derive our own NT JSMA attacks. For a unified presentation, we use the letter X to denote either W (Weighted) or T (Taylor), and the letter Y to denote either Z (logits) or F (probabilities). We note that while the first version of JSMA uses the logits, variants that rely on the $F_k$'s have also demonstrated good performance [11,24]. Thus, for a more complete study, we give versions with both Z and F.
By a NT reasoning, similar to the previous section, we define Weighted and Taylor NT saliency maps as follows:
$$S^{X,Y}[x,t][i,j] = \begin{cases} 0 & \text{if } \alpha^{X,Y}_{i,j} > 0 \text{ or } \beta^{X,Y}_{i,j} < 0, \\ |\alpha^{X,Y}_{i,j}| \, \beta^{X,Y}_{i,j} & \text{otherwise,} \end{cases}$$
where, for X = W,
$$\alpha^{W,Y}_{i,j} = \sum_{a \in \{i,j\}} \frac{\partial Y_t(x)}{\partial x_a}, \qquad \beta^{W,Y}_{i,j} = \sum_{k \neq t} \sum_{a \in \{i,j\}} F_k(x) \frac{\partial Y_k(x)}{\partial x_a},$$
and, for X = T,
$$\alpha^{T,Y}_{i,j} = \sum_{a \in \{i,j\}} (\theta_{\max} - x_a) \frac{\partial Y_t(x)}{\partial x_a}, \qquad \beta^{T,Y}_{i,j} = \sum_{k \neq t} \sum_{a \in \{i,j\}} F_k(x) (\theta_{\max} - x_a) \frac{\partial Y_k(x)}{\partial x_a}.$$
These maps can be motivated, as in the previous section, by considering the simple one-component case $i = j$. For example, the role of the penalisation by $F_k(x)$ in $S^{W,Z}$ is to reduce the impact of high gradients $\frac{\partial Z_k(x)}{\partial x_i}$ when the probability $F_k(x)$ is very small (in this case we penalise choosing k as a new target for x).
Relying on these saliency maps, Algorithm 3 below presents our improvements of NT-JSMA-Z and NT-JSMA-F. For simplicity, we employ the unified notation NT-XJSMA-Y; for example, when we use $S^{W,Z}$, the obtained attack is NT-WJSMA-Z. Again, we only write the increasing version.
Algorithm 3 NT-WJSMA and NT-TJSMA attacks.
  • Inputs: x: input to N with label t, maxIter: maximum number of iterations, $X \in \{W, T\}$, $Y \in \{Z, F\}$
  • Output: $x^*$: adversarial sample to x.
  • $x^* \leftarrow x$
  • iter ← 0
  • $\Gamma \leftarrow \{(p, q) : p, q \in \{1, \ldots, |x|\},\ x[p] \neq \theta_{\max},\ x[q] \neq \theta_{\max}\}$
  • while $\mathrm{class}(x^*) = t$ and iter < maxIter and $|\Gamma| \geq 2$ do
    • $(p_{\max}, q_{\max}) = \operatorname{argmax}_{(p,q) \in \Gamma} S^{X,Y}[x^*, t](p, q)$
    •  Modify $x^*$ by $x^*[p_{\max}], x^*[q_{\max}] = \theta_{\max}$
    •  Remove $(p_{\max}, q_{\max})$ from Γ
    • iter++
  • end while 
  • return x *
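For intuition, a one-component analogue of the map in (5) for NT-WJSMA can be sketched as follows (the paper's attacks use the pair-indexed version); jac_Y stands for the Jacobian of either the logits Z or the probabilities F, and F is the probability vector:

```python
import numpy as np

def nt_wjsma_scores(F, jac_Y, t):
    """One-component analogue of the non-targeted weighted map S^{W,Y}[x, t]."""
    alpha = jac_Y[t]                                          # dY_t/dx_i, to be decreased
    beta = (F[:, None] * jac_Y).sum(axis=0) - F[t] * jac_Y[t] # weighted sum over k != t, to be increased
    scores = np.abs(alpha) * beta
    scores[(alpha > 0) | (beta < 0)] = 0.0
    return scores
```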
We now turn to extensions of M-JSMA [24]. This attack modifies the pairs of components achieving the best score among NT-JSMA and all possible targeted (including increasing and decreasing) JSMA. It has a greater capacity to craft adversarial samples but is relatively slower than NT-JSMA. Our extensions of M-JSMA, which we call M-WJSMA and M-TJSMA, are described in Algorithm 4.
Algorithm 4 M-WJSMA and M-TJSMA attack.
  • Inputs: x: input to N with label t, maxIter: maximum number of iterations, $X \in \{W, T\}$, $Y \in \{Z, F\}$
  • Output: $x^*$: adversarial sample to x.
  • $x^* \leftarrow x$
  • iter ← 0
  • $\Gamma \leftarrow \{(p, q) : p, q \in \{1, \ldots, |x|\},\ x[p], x[q] \notin \{\theta_{\min}, \theta_{\max}\}\}$
  • while $\mathrm{class}(x^*) = t$ and iter < maxIter and $|\Gamma| \geq 2$ do
    •  - Compute all targeted increasing/decreasing saliency map scores $S^{X,Y}[x, s](p, q)$, $s \neq t$ (Section 2), and all NT increasing/decreasing saliency map scores $S^{X,Y}[x, t](p, q)$ (5).
    •  - Choose $(p_{\max}, q_{\max})$ achieving the best score and saturate $x^*[p_{\max}], x^*[q_{\max}]$ to $\theta_{\min}$ or $\theta_{\max}$ according to the chosen saliency map.
    •  - Remove $(p_{\max}, q_{\max})$ from Γ
    •  - iter++
  • end while 
  • return x *
Note that the saliency maps $S^{W,Y}$ for targeted or non-targeted, feature-increasing or feature-decreasing attacks are exactly the same (one only needs to decide between an argmax and an argmin). This is not the case for M-TJSMA since, for example, $(\theta_{\max} - x_a)$ has to be changed to $(x_a - \theta_{\min})$ when decreasing features. As a consequence, M-WJSMA is less cumbersome to implement than M-TJSMA. To keep the paper and code as simple as possible, we choose M-WJSMA over M-TJSMA and do not include M-TJSMA in our experiments; M-WJSMA already gives satisfactory results.

4. Experiments

This section is dedicated to a variety of experiments on the proposed attacks and several comparisons with the state-of-the-art methods. The first part focuses on targeted attacks and provides intensive comparisons between JSMA, WJSMA and TJSMA on deep NNCs on MNIST and CIFAR-10 as well as comparisons with CW L 0 . In the second part, we show that our approach is still relevant for non-targeted L 0 misclassification in both the white-box and black-box modes. A particular emphasis will be put on the speed of our attacks in comparison with CW L 0 .

4.1. Experiments on Targeted Attacks

In the following, we give attack and defense applications illustrating the interest of WJSMA and TJSMA over JSMA. In doing so, we also compare WJSMA and TJSMA and report overall better results for TJSMA despite the fact that for a large part of samples WJSMA outperforms TJSMA. Finally, we provide a comparison with CW L 0 attack and comment on all the obtained results.
The datasets used in this section are:
MNIST [15]. This dataset contains 70,000 28 × 28 greyscale images in 10 classes, divided into 60,000 training images and 10,000 test images. The possible classes are digits from 0 to 9.
CIFAR-10 [23]. This dataset contains 60,000 32 × 32 × 3 RGB images. There are 50,000 training images and 10,000 test images. These images are divided into 10 different classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), with 6000 images per class.
Figure 2 and Figure 3 display one sample per class, from MNIST and CIFAR-10 respectively.
On each dataset, a deep NNC is trained and then confronted to attacks.
NNC on MNIST. Similarly to [8], we consider the LeNet-5 model on this dataset [26]. Its full architecture is described in Appendix A. We implement and train this model using a cleverhans model wrapper that facilitates crafting adversarial examples. The number of epochs is fixed to 20, the batch size to 128 and the learning rate to 0.001, and the Adam optimizer is used. Training results in 99.98% accuracy on the training dataset and 99.49% accuracy on the test dataset.
NNC on CIFAR-10. We consider a more complex NNC, trained to reach a good performance on this dataset. Its architecture is inspired by the All Convolutional model proposed in cleverhans and is fully described in Appendix A. Likewise, this model is implemented and trained using cleverhans for 10 epochs, with a batch size of 128, a learning rate of 0.001 and the Adam optimizer. Training results in a 99.96% accuracy on the training dataset and 83.81% accuracy on the test dataset.
Our first objective is to compare the performances between JSMA, WJSMA and TJSMA on the previous two NNCs. To run JSMA, we use its original implementation, available in cleverhans . We have also adapted the code to WJSMA and TJSMA thus obtaining fast implementations of these two attacks.
For testing, we consider all images in MNIST and the first 10,000 training and 10,000 test images of CIFAR-10. Moreover, we only attack images that are correctly predicted by the neural networks, since crafting adversarial examples from already misclassified inputs is of little interest. In this way, the attacks are applied to the whole MNIST training set and to the 9949 well-predicted images of the MNIST test set. Similarly, CIFAR-10 adversarial examples are crafted from the 9995 well-predicted images among the first 10,000 training images and from the 8381 well-predicted test images.
To compare the three attacks, we rely on the notion of maximum distortion of adversarial samples, defined as the ratio of altered components to the total number of components. Following [8], we choose a maximum distortion of γ = 14.5% on the adversarial samples from MNIST, corresponding to $\text{maxIter} = \lfloor \frac{784 \cdot \gamma}{2 \cdot 100} \rfloor$. On CIFAR-10, we fix γ = 3.7% in order to have the same maximum number of iterations for both experiments. This allows a comparison between the attacks in two different settings. Furthermore, for both experiments, we set θ = 1 (note that $\theta_{\min} = 0$, $\theta_{\max} = 1$).
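Under one reading of this formula (the integer truncation is an assumption of this sketch; each iteration of the two-component attacks modifies two features), the iteration budget can be computed as:

```python
def max_iterations(n_features, gamma_percent):
    # two features are modified per iteration; gamma is expressed in percent
    return int(n_features * gamma_percent / (2 * 100))

max_iterations(784, 14.5)         # 56 iterations on MNIST (up to 112 modified pixels)
max_iterations(32 * 32 * 3, 3.7)  # 56 iterations on CIFAR-10, i.e., the same budget
```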
We report the metrics:
  • Success rate: the percentage of successful adversarial examples, i.e., those crafted before reaching the maximal number of iterations maxIter,
  • Mean $L_0$ distance: the average number of altered components over the successful adversarial examples,
  • Strict dominance of an attack: the percentage of adversarial examples for which this attack performs strictly fewer iterations than the two other ones (as additional results, Appendix B gives more statistics on the pairwise dominance between attacks),
  • Run-time of an attack on a set of samples targeting every possible class.
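As a sketch of how metrics 1 and 2 might be computed, assuming lists of original inputs, attack outputs and success flags (hypothetical variable names):

```python
import numpy as np

def attack_report(originals, adversarials, success):
    """Success rate (metric 1) and mean L0 distance over successful samples (metric 2)."""
    success = np.asarray(success, dtype=bool)
    success_rate = 100.0 * success.mean()
    l0 = [int((np.asarray(x) != np.asarray(a)).sum())
          for x, a, ok in zip(originals, adversarials, success) if ok]
    mean_l0 = float(np.mean(l0)) if l0 else float("nan")
    return success_rate, mean_l0
```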
Results on the metrics 1 and 2 are shown in Table 1 for MNIST and Table 2 for CIFAR-10.
First, we observe that overall, WJSMA and TJSMA significantly outperform JSMA according to metrics 1 and 2. Here are more comments:
On MNIST. Results in terms of success rate are quite remarkable for WJSMA and TJSMA, which outperform JSMA by nearly 9.46 and 10.98 percentage points (pp), respectively, on the training set and by 9.64 and 11.34 pp on the test set. The gain in the average number of altered components exceeds 6 components for WJSMA and 9 components for TJSMA in both experiments.
On CIFAR-10. WJSMA and TJSMA outperform JSMA in success rate by nearly 9.74 and 11.23 pp on the training set and by more than 10 and 12 pp on the test set. For both the training and test sets, we report better mean $L_0$ distances, with gains exceeding 7 features in all cases and reaching 10.14 features for TJSMA on the training set.
Dominance of the attacks. Figure 4 illustrates the (strict) dominance of the attacks for the two experiments. In these statistics, we do not count the samples for which TJSMA and WJSMA perform the same number of iterations while doing strictly fewer than JSMA.
For both experiments, TJSMA has a notable advantage over WJSMA and JSMA. The benefit of WJSMA over JSMA is also considerable. This shows that, in most cases, WJSMA and TJSMA craft better adversarial examples than JSMA while being faster. The results are even clearer when directly comparing WJSMA or TJSMA with JSMA alone. As additional results, we give in the appendix the statistics for the pairwise dominance between the attacks. As might be expected, both WJSMA and TJSMA dominate JSMA; moreover, TJSMA dominates WJSMA.

4.1.1. Avoid Confusion

It is important to stress that our results do not contradict the 97% success rate obtained in [8] on LeNet-5. Indeed, we use a more accurate LeNet-5 model (the one in [8] has 98.93% and 99.41% accuracies on the training and test sets). For completeness, we also generated a second model (with 99.34% and 98.94% accuracies on the training and test sets) and evaluated the three attacks on the first 1000 MNIST test images. We obtain a 96.7% success rate for JSMA (very similar to [8]) and more than 99.5% for WJSMA and TJSMA. We preferred to work with the more accurate model, as this keeps the paper shorter and, moreover, it showcases our approach better (the advantage over JSMA is larger).

4.1.2. Run-Time Comparison

In order to have a meaningful speed comparison between the three attacks, we computed the time needed for each attack to successfully craft the first 1000 test MNIST images in the targeted mode. Results are shown in Table 3 and reveal that TJSMA/WJSMA are 1.41/1.28 times faster than JSMA. These performances were measured on a machine equipped with an Intel Xeon 6126 processor and a Nvidia Tesla P100 graphics processor. Note that for WJSMA/TJSMA, the additional computations of one iteration compared to JSMA are negligible (simple multiplications). Thus the difference in speed between the attacks is mainly due to the number of iterations for each attack.
We also notice that in this evaluation, the adversarial samples were crafted one by one. In practice, it is possible to generate samples by batch. In this case, the algorithm stops when all samples are finished. Most of the time, with a batch of large size, the three attacks approximately take the same time to converge. For example, on the same machine as previously and with a batch size equal to 1000, we were able to craft the same amount of samples in about 250 s, for all the attacks.
Defense. The objective now is to train neural networks so that the attacks fail as much as possible. One way of doing this is by adding adversarial samples crafted by JSMA, WJSMA and TJSMA to the training set. This method of training may imply a decrease in the model accuracy but adversarial examples will be more difficult to generate.
We experiment with this idea on the MNIST model in every possible configuration. To this end, 2000 adversarial samples per class (20,000 more images in total), crafted by either JSMA, WJSMA or TJSMA with distortion under 14.5%, are added to the original MNIST training set. Then, three distinct models are trained on these augmented datasets. The models roughly achieve an accuracy of 99.9% on the training set and 99.3% on the test set, showing a slight loss compared to our previous MNIST model. Nevertheless, the obtained neural networks are more robust to the attacks, as shown in Table 4. Note that each experiment is made over the well-predicted samples of the test images. For each model and image, nine adversarial examples are generated by each of the three attacks.
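A rough sketch of the augmentation step, assuming a hypothetical craft_attack(model, x, target) helper and one-hot labels; how the 2000 samples per class are selected and which label is assigned to the adversarial images are assumptions of this sketch (we keep the original label, a standard choice for adversarial training):

```python
import numpy as np

def augment_with_adversarials(model, x_train, y_train, craft_attack,
                              per_class=2000, n_classes=10):
    """Append adversarial samples to the training set (labels stay the original ones)."""
    adv_x, adv_y = [], []
    for c in range(n_classes):
        # hypothetical selection: images whose true class is not c, attacked towards c
        idx = np.where(y_train.argmax(axis=1) != c)[0][:per_class]
        for i in idx:
            adv_x.append(craft_attack(model, x_train[i], target=c))
            adv_y.append(y_train[i])       # keep the true one-hot label
    return (np.concatenate([x_train, np.stack(adv_x)]),
            np.concatenate([y_train, np.stack(adv_y)]))
```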
Overall, the attacks are less efficient on each of these models compared to Table 1. The success rates drop by about 8 pp, whereas the number of iterations increases by approximately 26%. From the defender's point of view, networks trained against WJSMA and TJSMA give the best performance. The JSMA-trained model provides the lowest success rates, while the TJSMA-trained network is more robust from the $L_0$ distance point of view. From the attacker's point of view, TJSMA remains the most efficient attack regardless of the augmented dataset used.

4.1.3. Comparison with CW L 0 Attack

Because of the complexity of this attack, comparison on a large number of images as before is very costly. For this reason, we only provide results on the first 100 well-predicted images of CIFAR-10, thus on 1000 adversarial images given in Table 5.
We report better results for CW in terms of success rate and $L_0$ distance, but a remarkable speed advantage for our attacks. Indeed, generating 9 adversarial samples (one by one) from a CIFAR-10 image with CW took on average nearly one and a half hours on GPU. The same task took 100 s for our attacks without batching (25 s with batching). This makes our attacks at least 54 times faster than CW $L_0$.

4.2. Experiments on Non-Targeted Attacks

In this part, we test the new NT attacks in the white-box and black-box modes and compare their performance with NT-JSMA and M-JSMA. In the white-box mode, we also compare with NT CW $L_0$. For experimentation, we chose the GTSRB dataset [25], widely used to challenge neural networks, especially in autonomous driving environments [27]. We recall that GTSRB contains 32 × 32 × 3 RGB traffic sign images, 86,989 in the training set and 12,630 in the test set, classified into 43 different categories. Figure 5 displays some images from this dataset.

4.2.1. White-Box Experiments

We consider a simplified NNC, described in Appendix A, whose architecture is inspired by AlexNet [28]. After training this model with cleverhans, it reaches 99.98% accuracy on the training dataset and 95.83% accuracy on the test set.
We implemented our attacks and those of [24] again in TensorFlow using cleverhans. In the first experiment, we run the attacks given in Table 6 on the first 1000 test images, taking a maximum distortion γ = 3.7% as for CIFAR-10. The obtained results are shown in the same table.
To analyse these results, it makes more sense to separately compare the NT versions (faster) and the M versions (the most effective in success rate but slower), and then discuss a global comparison with CW $L_0$. First, we notice a significant advantage of our NT versions over NT-JSMA according to all three metrics. In particular, NT-TJSMA is up to 5.4 times faster than NT-JSMA, with nearly 4.8 fewer modified pixels on average and more than 5 percentage points gained in success rate. We also notice that NT-TJSMA is the most effective attack among the three NT versions. Its notable benefit over NT-WJSMA is due to the fact that it converges in far fewer iterations than NT-WJSMA, while both attacks need approximately the same time per iteration. As for the M versions, our attack M-WJSMA outperforms M-JSMA according to the three reported metrics. Both attacks are, however, slower than the NT versions. Finally, we notice that the gap between our attack M-WJSMA and CW $L_0$ is reduced: we obtain a better success rate, are within two pixels of CW on average in $L_0$, and are more than 40 times faster.

4.2.2. Black-Box Experiments

Here, a black-box attack means that the algorithm can use the targeted model only as an oracle: when fed an image, the oracle returns a single class label. Moreover, we want to make only a limited number of queries, which in a realistic setup would mean avoiding any suspicious activity, or at least trying not to depend too much on the oracle. To work within this restriction, we train a substitute NNC using the Jacobian-based Dataset Augmentation (JBDA) method [27]. We then perform white-box attacks on the substitute NNC. If the substitute is good enough, the resulting adversarial images should transfer to the oracle, i.e., effectively fool it even though they were designed to fool the substitute. Details about the substitute NNC architecture can be found in Appendix A. The JBDA method allows us to query the oracle only during the training of the substitute network. Black-box attacks are thus closer to real-life setups, as one only needs access to the targeted model for a short period of time before being able to fool it durably. A dangerous application of such techniques is against autonomous cars and their sign recognition systems. NT black-box attacks are exactly the kind of threat that can be put into practice with relative ease while still considerably disrupting the car's behaviour. We illustrate this insecurity through the GTSRB dataset.
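As a sketch of the JBDA augmentation step [27] used to train the substitute (a hypothetical oracle_label query and a TensorFlow 2 substitute model are assumed; λ is the augmentation step size):

```python
import tensorflow as tf

def jbda_round(substitute, x_batch, oracle_label, lam=0.1):
    """One Jacobian-based Dataset Augmentation step: new synthetic inputs for the substitute."""
    labels = tf.constant([oracle_label(x) for x in x_batch.numpy()])
    with tf.GradientTape() as tape:
        tape.watch(x_batch)
        probs = substitute(x_batch)
        # probability the substitute assigns to the label returned by the oracle
        selected = tf.gather(probs, labels, axis=1, batch_dims=1)
    grads = tape.gradient(selected, x_batch)
    return tf.clip_by_value(x_batch + lam * tf.sign(grads), 0.0, 1.0)
```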
The attacks in this paragraph are NT-JSMA and M-JSMA [24], along with our contributions NT-WJSMA, NT-TJSMA and M-WJSMA. We experiment with different distortion rates. Contrary to white-box attacks, in the black-box setting the distortion rate is not an upper bound on the percentage of pixels that can be modified, but the exact percentage of pixels we want to perturb. This slight difference accounts for the imperfection of the substitute network: even if it mimics the oracle quite well, stopping as soon as the image switches class according to the substitute will often not yield good results, so it is necessary to force the algorithm to push a little further. Consequently, to evaluate the performance of the attacks, we use two metrics for each distortion value: the success rate and the transferability rate. The first measures the percentage of attacks that have been successful on the substitute NNC, while the second measures the same percentage with respect to the oracle. The obtained results are given in Table 7.
We can see in this table that, in terms of success rate, the results are consistent with the white-box results, meaning that M-WJSMA mostly outperforms NT-TJSMA, which is better than NT-WJSMA, which in turn beats NT-JSMA, at least for the F variants. This is somewhat expected, because the attacks as performed on the substitute are essentially white-box. However, one can notice that the Z variants of the Maximal attacks do not perform well on this substitute network. More interestingly, concerning the transferability of the attacks, one can notice that for the Z attacks the same hierarchy NT-TJSMA > NT-WJSMA > NT-JSMA is respected, while for the F attacks NT-WJSMA is overall inferior to all the other attacks, and NT-TJSMA outperforms NT-JSMA, M-JSMA and M-WJSMA.
Overall, NT-TJSMA is the best attack for black-box non-targeted purposes, but the best variant (F or Z) depends on the distortion rate: NT-TJSMA-Z only beats its counterpart NT-TJSMA-F for γ = 4 or 5%. Finally, if one variant were to be chosen, it would be NT-TJSMA-F due to its speed and overall best transferability.

5. Comparisons with Non- L 0 Attacks and Conclusion

5.1. Comparison with Non- L 0 Attacks

Previously, we only compared with $L_0$ attacks, as it makes more sense to consider methods that optimise the same metric. In this section, we compare our NT-TJSMA with a well-known non-targeted $L_\infty$ attack, the Fast Gradient Sign Method (FGSM) [5]. We recall that FGSM attempts to minimise the $L_\infty$ norm and is a very fast method, significantly faster than NT-TJSMA. To this end, we run both attacks, dropping the constraint on the number of modified input features for our attack and experimenting with different values of the $L_\infty$ threshold ε for FGSM, of which the results of the best threshold are kept. Then, we compute the mean $L_1$ and $L_2$ errors for each attack as alternative comparison metrics. Table 8 shows the obtained results.
As can be seen, NT-TJSMA is always successful for each model, while FGSM is far from reaching a 100% success rate. Moreover, NT-TJSMA obtains better $L_1$ and $L_2$ scores. Thus, our attack outperforms FGSM on the SR, $L_0$, $L_1$ and $L_2$ metrics, while FGSM only retains the $L_\infty$ and speed advantages. Note that for FGSM we kept the best results over different thresholds, while our attack was run only once.
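For reference, a minimal TensorFlow sketch of non-targeted FGSM (the experiments rely on the cleverhans implementation; this is only illustrative):

```python
import tensorflow as tf

def fgsm(model, x, y_true, eps=0.05):
    """Non-targeted FGSM: a single signed-gradient step on the cross-entropy loss."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)
```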

5.2. Conclusions

In this section, we summarise our main findings and also discuss the limitations of our work.
We have introduced WJSMA and TJSMA, new probabilistic adversarial variants of JSMA for targeted and non-targeted misclassification of deep neural network classifiers.
Experiments in the targeted case have demonstrated, after analysing a large number of images (more than 790,000), that our targeted attacks are more efficient and also faster than JSMA. It is worth recalling the quite natural derivation of these attacks from a simple and classical log-softmax argument, which had not been noticed before. Our attacks do not beat CW $L_0$, but they have an important speed advantage highlighted in the paper (more than 50 times faster on CIFAR-10). Therefore, for targeted $L_0$ misclassification, they offer substantial tools to test neural networks in real time. This fact is supported by the fast implementation provided with the paper.
As a second contribution, we have introduced NT and M variants of WJSMA/TJSMA and have shown that they outperform the previous NT and M versions of JSMA. Through experiments on GTSRB, we noticed that the gap between our attacks and CW in L 0 average is reduced. Moreover, we obtained better success rates, while remaining at least 40 times faster than CW L 0 .
In the NT part of the paper, we did not compare our attacks with the one-pixel attack [13]. Indeed, this approach has a high computational cost, and we only claim an advantage in speed, which is quite evident (see also the time evaluation in [12]). Also, we did not provide a comparison with SparseFool [12], an effective NT $L_0$ attack, because it would have required re-implementing this attack in TensorFlow. On CIFAR-10, [12] found that crafting an example with SparseFool takes on average 0.34 and 0.69 s on two different neural networks. Our speed performances are very competitive with these values. Indeed, on GTSRB, which has many more classes than CIFAR-10, our NT-TJSMA was able to craft an example in nearly 0.68 s on average (counting only successful images). Thus, regarding SparseFool, we first claim competitive results in speed. Second, our results obtained on LeNet-5 (more than 99.5% on a model similar to [12], see Section 4.1.1) are very close to those of [12], although we only run the attacks up to a limited maximum number of iterations, contrary to [12].
Overall, our results suggest that for adversarial purposes, TJSMA, M-WJSMA and NT-TJSMA should be preferred over the original variants of JSMA, respectively in the case of white-box targeted attacks, white-box non-targeted attacks, and black-box non-targeted attacks. We recall that although TJSMA is a more elaborate version of WJSMA, it is hardly compatible with the "Maximal" approach, which in turn proves to be very efficient for non-targeted purposes. For this reason, as we have demonstrated, M-WJSMA is indeed the right choice for this type of attack. On the other hand, because the "Maximal" approach has not proved very efficient for black-box non-targeted attacks, it is the non-targeted version of TJSMA (NT-TJSMA) that is the best in this case.
Finally, we should mention that, despite improving JSMA in different ways, our approach, like JSMA, is still not scalable to large datasets. This is because of the high computational cost of saliency maps when the dimension of the inputs becomes large. Our approach is therefore intended for "small" datasets such as those considered in the paper. Nevertheless, this kind of dataset is very common in real-life applications. Codes are publicly available through the Supplementary Materials.

Supplementary Materials

All our codes are publicly available through the link https://github.com/probabilistic-jsmas/probabilistic-jsmas.

Author Contributions

Conceptualization: T.C., A.L., M.F. and H.H.; Software: T.C., A.L., M.F. and H.H.; Data curation: T.C., A.L. and M.F.; Supervision: H.H.; Writing—original draft: T.C., A.L., M.F. and H.H.; Writing—review and editing: T.C., A.L., M.F. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This work was done in the context of an internship by T.C., A.L. and M.F. supervised by H.H. We thank Gabriel Zeller for his assistance. We are grateful to Wassila Ouerdane and Jean-Philippe Poli at CentraleSupélec for their support. We thank the mesocentre de calcul Fusion, Metz computing center of CentraleSupélec and Stéphane Vialle for providing us effective computing resources. H. Hajri is grateful to Sylvain Lamprier for useful discussions, the scientific direction and the EPI project (Évaluation des Performances de systèmes de décision à base d’Intelligence Artificielle) at IRT SystemX for their support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NNC: Neural Network Classifier
JSMA: Jacobian-based Saliency Map Attack
M-JSMA: Maximal Jacobian-based Saliency Map Attack
WJSMA: Weighted Jacobian-based Saliency Map Attack
TJSMA: Taylor Jacobian-based Saliency Map Attack
NT: Non-Targeted
M: Maximal
CW: Carlini-Wagner

Appendix A. Architectures of the Deep NNCs

Table A1. Architecture of the used NNC on MNIST (LeNet-5 inspired).

Layer | Parameters
Input Layer | size: (28 × 28)
Conv2D | kernel size: (5 × 5), 20 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (5 × 5), 50 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Flatten |
Dense | size: 500
ReLu |
Dense | size: number of classes (10 for MNIST)
Softmax |
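For illustration, Table A1 could be written as the following Keras model (a sketch under the assumption of 28 × 28 × 1 inputs; the paper's models are built with cleverhans utilities):

```python
import tensorflow as tf
from tensorflow.keras import layers

lenet5 = tf.keras.Sequential([
    layers.Conv2D(20, (5, 5), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.Conv2D(50, (5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.Flatten(),
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```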
Table A2. Architecture of the used NNC on CIFAR-10.

Layer | Parameters
Input Layer | size: (32 × 32)
Conv2D | kernel size: (3 × 3), 64 kernels, no stride
ReLu |
Conv2D | kernel size: (3 × 3), 128 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (3 × 3), 128 kernels, no stride
ReLu |
Conv2D | kernel size: (3 × 3), 256 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (3 × 3), 256 kernels, no stride
ReLu |
Conv2D | kernel size: (3 × 3), 512 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (3 × 3), 10 kernels, no stride
GlobalAveragePooling | kernel size: (2 × 2), stride: (2 × 2)
Softmax |
Table A3. Architecture of the used NNC on GTSRB (AlexNet inspired).

Layer | Parameters
Input Layer | size: (32 × 32)
Conv2D | kernel size: (5 × 5), 64 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (3 × 3), stride: (2 × 2)
Conv2D | kernel size: (5 × 5), 64 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (3 × 3), stride: (2 × 2)
Flatten |
Dense | size: 384
ReLu |
Dense | size: 192
ReLu |
Dense | size: 43
Softmax |
Table A4. Architecture of the used substitute NNC on GTSRB.

Layer | Parameters
Input Layer | size: (32 × 32 × 3)
Conv2D | kernel size: (3 × 3), 16 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (3 × 3), 32 kernels, no stride
ReLu |
MaxPooling2D | kernel size: (2 × 2), stride: (2 × 2)
Conv2D | kernel size: (3 × 3), 64 kernels, no stride
ReLu |
Flatten |
Dense | size: 43
Softmax |

Appendix B. Pairwise Dominance

Figure A1. Pairwise dominance on MNIST comparing WJSMA with JSMA (a), TJSMA with JSMA (b) and TJSMA with WJSMA (c). On these charts, “=” corresponds to samples with the same number of iterations by the attacks, including when both attacks fail.
Further analysis of the results obtained on MNIST reveals that, even for examples where JSMA is better than WJSMA or TJSMA, fewer than 10 additional components are changed on average by WJSMA or TJSMA, whereas JSMA changes more than 17 additional components on average when it is dominated by WJSMA or TJSMA. A similar gap can be noticed on CIFAR-10.
Figure A2. Pairwise dominance on CIFAR-10 comparing WJSMA with JSMA (a), TJSMA with JSMA (b) and TJSMA with WJSMA (c). “=” has the same significance as before.

References

  1. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust Physical-World Attacks on Deep Learning Visual Classification. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1625–1634. [Google Scholar]
  2. Sitawarin, C.; Bhagoji, A.N.; Mosenia, A.; Chiang, M.; Mittal, P. DARTS: Deceiving Autonomous Cars with Toxic Signs. arXiv 2018, arXiv:1802.06430. [Google Scholar]
  3. Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; Erlingsson, Ú. Scalable Private Learning with PATE. arXiv 2018, arXiv:1802.08908. [Google Scholar]
  4. Song, L.; Shokri, R.; Mittal, P. Privacy Risks of Securing Machine Learning Models against Adversarial Examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, 11–15 November 2019; pp. 241–257. [Google Scholar] [CrossRef] [Green Version]
  5. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2015, arXiv:1412.6572. [Google Scholar]
  6. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. arXiv 2017, arXiv:1607.02533. [Google Scholar]
  7. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  8. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Berkay Celik, Z.; Swami, A. The limitations of deep learning in adversarial settings. arXiv 2015, arXiv:1511.07528. [Google Scholar]
  9. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. arXiv 2015, arXiv:1511.04599. [Google Scholar]
  10. Moosavi-Dezfooli, S.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal Adversarial Perturbations. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 86–94. [Google Scholar] [CrossRef] [Green Version]
  11. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. arXiv 2017, arXiv:1608.04644. [Google Scholar]
  12. Modas, A.; Moosavi-Dezfooli, S.; Frossard, P. SparseFool: A Few Pixels Make a Big Difference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 9087–9096. [Google Scholar] [CrossRef] [Green Version]
  13. Su, J.; Vargas, D.V.; Sakurai, K. One Pixel Attack for Fooling Deep Neural Networks. IEEE Trans. Evol. Comput. 2019, 23, 828–841. [Google Scholar] [CrossRef] [Green Version]
  14. Papernot, N.; Faghri, F.; Carlini, N.; Goodfellow, I.J.; Feinman, R.; Kurakin, A.; Xie, C.; Sharma, Y.; Brown, T.H.; Roy, A.; et al. Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. arXiv 2016, arXiv:1610.00768. [Google Scholar]
  15. LeCun, Y.; Cortes, C. MNIST Handwritten Digit Database. 2010. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 10 October 2020).
  16. Erba, A.; Taormina, R.; Galelli, S.; Pogliani, M.; Carminati, M.; Zanero, S.; Tippenhauer, N.O. Real-time Evasion Attacks with Physical Constraints on Deep Learning-based Anomaly Detectors in Industrial Control Systems. arXiv 2019, arXiv:1907.07487. [Google Scholar]
  17. Gong, Y.; Li, B.; Poellabauer, C.; Shi, Y. Real-Time Adversarial Attacks. arXiv 2019, arXiv:1905.13399. [Google Scholar]
  18. Lin, J.; Dzeparoska, K.; Zhang, S.Q.; Leon-Garcia, A.; Papernot, N. On the Robustness of Cooperative Multi-Agent Reinforcement Learning. arXiv 2020, arXiv:2003.03722. [Google Scholar]
  19. Parisi, A. Hands-On Artificial Intelligence for Cybersecurity; Packt Publishing: Birmingham, UK, 2019. [Google Scholar]
  20. Chio, C.; Freeman, D. Machine Learning and Security; Oreilly: Newton, MA, USA, 2018. [Google Scholar]
  21. Sethi, K.; Edupuganti, S.; Kumar, R.; Bera, P.; Madhav, Y. A context-aware robust intrusion detection system: A reinforcement learning-based approach. Int. J. Inf. Secur. 2019, 19, 657–678. [Google Scholar] [CrossRef]
  22. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  23. Krizhevsky, A.; Nair, V.; Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). Available online: http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 10 October 2020).
  24. Wiyatno, R.; Xu, A. Maximal Jacobian-based Saliency Map Attack. arXiv 2018, arXiv:1808.07945. [Google Scholar]
  25. Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 2012, 32, 323–332. [Google Scholar] [CrossRef]
  26. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  27. Papernot, N.; McDaniel, P.D.; Goodfellow, I.J.; Jha, S.; Celik, Z.B.; Swami, A. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples. arXiv 2016, arXiv:1602.02697. [Google Scholar]
  28. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. Acm 2017, 60, 84–90. [Google Scholar] [CrossRef]
Figure 1. Evolution of the (a) origin class and (b) target class probabilities until the target class is reached for JSMA, WJSMA and TJSMA changing the image of a one into a five.
Figure 2. Examples of images from MNIST.
Figure 3. Examples of images from CIFAR-10.
Figure 4. Distribution of the (strict) dominance of JSMA, WJSMA and TJSMA over the MNIST (a) and CIFAR-10 (b) datasets (training and test sets included).
Figure 5. Examples of images from GTSRB.
Table 1. Comparison between JSMA, WJSMA and TJSMA on MNIST.

Metric | JSMA | WJSMA | TJSMA
Targeted (Training dataset: Nb of well-predicted images = 60,000)
Success rate | 87.68% | 97.14% | 98.66%
Mean L0 distance on successful samples | 44.34 | 37.86 | 35.22
Targeted (Test dataset: Nb of well-predicted images = 9949)
Success rate | 87.34% | 96.98% | 98.68%
Mean L0 distance on successful samples | 44.63 | 38.10 | 35.50
Table 2. Comparison between JSMA, WJSMA and TJSMA on CIFAR-10.

Metric | JSMA | WJSMA | TJSMA
Targeted (Training dataset: Nb of well-predicted images = 9995)
Success rate | 86.17% | 95.91% | 97.40%
Mean L0 distance on successful samples | 47 | 38.54 | 36.86
Targeted (Test dataset: Nb of well-predicted images = 8381)
Success rate | 84.91% | 94.99% | 96.96%
Mean L0 distance on successful samples | 46.13 | 38.82 | 37.45
Table 3. Time comparison between JSMA, WJSMA and TJSMA.

Attack | JSMA | WJSMA | TJSMA
Time (seconds) | 3964 | 3092 | 2797
Table 4. Metrics (1) and (2) on JSMA, WJSMA and TJSMA augmented sets.

Metric | JSMA | WJSMA | TJSMA
Model trained over the JSMA-augmented set (9940 well-predicted samples)
Success rate | 77.94% | 84.79% | 85.08%
Mean L0 distance on successful samples | 54.48 | 52.66 | 52.83
Model trained over the WJSMA-augmented set (9936 well-predicted samples)
Success rate | 77.61% | 90.05% | 92.01%
Mean L0 distance on successful samples | 56.29 | 52.72 | 52.18
Model trained over the TJSMA-augmented set (9991 well-predicted samples)
Success rate | 76.42% | 86.18% | 87.36%
Mean L0 distance on successful samples | 54.26 | 54.20 | 54.49
Table 5. Results for CW L0 on CIFAR-10.

Success Rate | Mean L0 Distance | Time
99.89% | 24.97 | On average, more than one and a half hours to generate 9 adversarial samples run one by one on GPU
Table 6. Comparison of the performances of the NT attacks over the first 1000 test images of GTSRB.

Attack (U = JSMA) | Success Rate | L0 Average | Time (s)
NT-U | 91.35% | 20.66 | 3604
NT-WU (ours) | 95.31% | 17.63 | 760
NT-TU (ours) | 96.87% | 15.86 | 660
M-U | 98.44% | 18.99 | 7470
M-WU (ours) | 99.37% | 15.52 | 6302
CW L0 | 97.81% | 13.56 | 260,751
Table 7. Success and transferability rates (SR & TR) in % for the black-box NT attacks on the 1000 to 2000 test samples of GTSRB, for a distortion γ varying from 1% to 5%. X = JSMA; the first five columns are the -Z variants and the last five the -F variants.

γ | Metric | X-Z | WX-Z | TX-Z | M-X-Z | M-WX-Z | X-F | WX-F | TX-F | M-X-F | M-WX-F
1% | SR | 65.9 | 75.4 | 78.6 | 37 | 40.7 | 75.3 | 77.1 | 80.1 | 77 | 77.4
1% | TR | 43 | 46.5 | 48.2 | 31.8 | 34.2 | 48.1 | 44.8 | 50.3 | 45.4 | 43.4
2% | SR | 81.2 | 89.4 | 93.1 | 55.4 | 56.8 | 88.6 | 91.3 | 94.5 | 94.9 | 95.3
2% | TR | 56.6 | 59.2 | 63.4 | 50.1 | 49.5 | 62.1 | 58 | 64.1 | 56.5 | 58.8
3% | SR | 88.5 | 95.5 | 97.7 | 68.9 | 67.4 | 94.4 | 95.8 | 98.6 | 99 | 99.2
3% | TR | 65.1 | 67.2 | 70.2 | 58.8 | 59.2 | 69.3 | 64.1 | 70.7 | 64.5 | 65.3
4% | SR | 92.9 | 97.1 | 99.2 | 77.7 | 74.7 | 97.4 | 98.3 | 99.5 | 100 | 100
4% | TR | 69.2 | 71 | 75.3 | 68.5 | 65.1 | 73.9 | 67.5 | 74.8 | 69.9 | 69.5
5% | SR | 96 | 98.7 | 99.9 | 83.5 | 79.5 | 98.5 | 99 | 100 | 100 | 100
5% | TR | 71.9 | 73.6 | 77.9 | 72.7 | 68.4 | 76.3 | 71.1 | 77.2 | 72.3 | 72.1
Table 8. Comparison between NT-TJSMA and FGSM.

Metric | MNIST | CIFAR-10 | GTSRB
Performances of NT-TJSMA
Success rate | 100% | 100% | 100%
Mean L1 distance | 13.58 | 13.33 | 15.01
Mean L2 distance | 3.49 | 2.93 | 2.88
Performances of FGSM
Success rate | 93.2% (ε = 0.75) | 88.2% (ε = 0.05) | 99.5% (ε = 0.9)
Mean L1 distance | 227.21 | 150.85 | 1522.31
Mean L2 distance | 12.80 | 2.74 | 32.54
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Combey, T.; Loison, A.; Faucher, M.; Hajri, H. Probabilistic Jacobian-Based Saliency Maps Attacks. Mach. Learn. Knowl. Extr. 2020, 2, 558-578. https://doi.org/10.3390/make2040030
