A Jigsaw Puzzle Solver-Based Attack on Image Encryption Using Vision Transformer for Privacy-Preserving DNNs
Round 1
Reviewer 1 Report
This paper proposes an attack to reveal visual information directly from ciphertext images. The proposed attack is based on a jigsaw puzzle solver. Although promising results are attained, there are questions/concerns about the paper:
(a) Some of the terms need clearer definitions. For example, does "privacy-preserving DNNs" in this work refer to networks trained for image classification on encrypted images?
(b) The operation "block scrambling" mentioned on line 55 of Page 2 is not clear.
(c) Can the proposed attack reveal visual information from non-block-based encrypted images?
(d) Line 138 of Page 4 - Should it be 2^8 instead of 2^555?
(e) Line 204 of Page 6 - what does PE stand for? Pixel encryption?
(f) What is the main novelty of this work? What has contributed to the improvement in SSIM (Table 2)? Why do the values for [11], [12], [13] and [27] remain low for the proposed method?
(g) The experiments were conducted mostly on the authors' previous work (4 out of 5). Is the proposed attack feasible for other block-based image encryption methods (both DNN and non-DNN cases)?
(h) The descriptions of the compared methods can be improved to include more information.
(i) More insights should be provided - one question though. From Table 2, can we conclude that PE [12] is still a relatively robust (safe) method? The same goes for ELE [13].
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
In this paper, the authors propose a novel jigsaw puzzle solver-based attack on image encryption for privacy-preserving deep neural networks (DNNs). This work presents certain novelty and significance. It performs better than other attack methods for some types of encryption schemes. The proposed scheme is verified by experimental results. The paper is clearly written and well organized. The reviewer thinks this paper can be accepted for publication after some minor revisions. The revision suggestions are given below.
(1) In Table 2, the recovered image quality is compared. How do the methods differ in computational cost?
(2) The proposed scheme has the best performance for certain types of encryption schemes but not for others. A clearer explanation should be given.
(3) In Fig. 7, the ground-truth images should be given. In addition, more than one attack example could be shown visually.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The article concerns a new method of cryptanalysis of image ciphers - Jigsaw Puzzle Solver-based attack - and its use in Vision Transformer for Privacy-preserving DNNs.
I found the article very interesting and well-written. The proposed method of cryptanalysis may be an interesting and useful tool in the field of image encryption. However, I do have a point about the novelty of the manuscript. The authors deal with very similar topics in several other works. How does the proposed approach differ from, for example, doi: 10.1117/12.2665805 ("A jigsaw puzzle solver-based attack on block-wise image encryption for privacy-preserving DNNs" by the same authors)?
Here are some minor notes:
1) It is worth indicating the organization of the article in the Introduction.
2) Figures and Tables should appear close to the text that discusses them, e.g., Figures 2 and 3 and Table 2.
3) The caption under Figure 5 is very brief and not informative.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have addressed most comments satisfactorily. The paper can now be accepted for publication.
Reviewer 3 Report
Thanks to the authors for their answers. I have no further comments. However, for the future, I would ask that corrected/new content be clearly marked in the text, as this significantly facilitates reviewing.