A New Subject-Sensitive Hashing Algorithm Based on Multi-PatchDrop and Swin-Unet for the Integrity Authentication of HRRS Image
Abstract
1. Introduction
- Inspired by PatchDropout, we propose a Multi-PatchDrop (MPD) mechanism tailored to Transformer-based subject-sensitive hashing to improve the algorithm's robustness;
- We substantially improve the Swin-Unet model to make it better suited to subject-sensitive hashing;
- We build a new subject-sensitive hashing algorithm based on Multi-PatchDrop and Swin-Unet.
2. Preliminaries
2.1. Subject-Sensitive Hashing
2.2. Transformers and PatchDropout
3. The Proposed Method
3.1. Multi-PatchDrop
3.2. Improved Swin-Unet Based on Multi-PatchDrop
- (1) The bottleneck of the original Swin-Unet contains only two Swin Transformer modules. In our improved Swin-Unet, we decompose the bottleneck into two blocks, each containing two Swin Transformer modules (four in total). We do this because the low-level Transformer layers matter more to the tampering sensitivity of subject-sensitive hashing, so additional Swin Transformer blocks are warranted;
- (2) A Layer Normalization operation is added between the encoder and decoder to keep the distribution of data features stable and to accelerate the model's convergence;
- (3) The biggest difference is that the proposed Multi-PatchDrop mechanism is integrated into our improved Swin-Unet, whereas the original Swin-Unet does not use patch dropout at all.
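As a sketch of how such a patch-dropping stage might look in PyTorch: this is a minimal illustration under the assumption that Multi-PatchDrop follows PatchDropout's token-subsampling idea and is active only during training. The class name and the `drop_ratio` parameter are hypothetical, and the paper's use of different dropout parameters at different positions in the network is not reproduced here.

```python
import torch
import torch.nn as nn

class MultiPatchDrop(nn.Module):
    """Randomly drop a fraction of patch tokens during training only.

    Illustrative sketch: the actual Multi-PatchDrop applies different
    drop parameters at different depths of the improved Swin-Unet.
    """
    def __init__(self, drop_ratio: float = 0.1):
        super().__init__()
        self.drop_ratio = drop_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim); at inference, keep all patches
        if not self.training or self.drop_ratio == 0.0:
            return x
        b, n, d = x.shape
        n_keep = max(1, int(n * (1.0 - self.drop_ratio)))
        # sample a random subset of patch indices per batch element
        idx = torch.rand(b, n, device=x.device).argsort(dim=1)[:, :n_keep]
        return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))
```

At inference time the module is an identity mapping, so feature extraction, and hence the hash sequence, stays deterministic.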
3.3. Overview of Our Proposed Subject-Sensitive Hashing Algorithm
4. Experiments and Analysis
4.1. Datasets and Training Details
- (1)
- (2) Training dataset based on the Inria Aerial dataset [38]. Each image in the Inria Aerial dataset is 5000 × 5000 pixels. To meet the needs of extracting subject-sensitive features, we obtained 11,490 images by cropping these images and then added 30 sets of hand-drawn robust edge images, yielding a training set of 11,520 samples in total.
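The cropping step can be sketched as follows. `crop_to_tiles` is an illustrative helper rather than the authors' code, and the 256 × 256 tile size is an assumption (the section does not state the crop size used).

```python
import numpy as np

def crop_to_tiles(img: np.ndarray, tile: int = 256) -> list:
    """Split an H x W (x C) image array into non-overlapping tile x tile
    crops, discarding any partial tiles at the right/bottom borders."""
    h, w = img.shape[:2]
    tiles = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            tiles.append(img[top:top + tile, left:left + tile])
    return tiles

# Under this assumed tile size, one 5000 x 5000 Inria image yields
# 19 x 19 = 361 non-overlapping 256-pixel tiles.
```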
4.2. Instances of Integrity Authentication
4.3. Algorithms’ Robustness Testing
- (1) Whether the model is trained on the Inria dataset or the WHU dataset, our algorithm's robustness to JPEG compression is the best among all comparison algorithms, especially at lower thresholds. Training the model on the two datasets separately avoids conclusions that hinge on the peculiarities of a single dataset;
- (2) Compared with algorithms based on Transformer models such as Swin-Unet and STDU-net, our algorithm improves markedly, which indicates that the proposed Multi-PatchDrop has a significant effect in improving the robustness of subject-sensitive hashing;
- (3) As the threshold T increases, every algorithm's robustness improves, but a higher threshold also reduces an algorithm's tampering sensitivity.
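The threshold test behind these observations can be sketched as follows, assuming hashes are equal-length bit strings compared by normalized Hamming distance (a common convention in perceptual hashing; the function names are illustrative): an image passes authentication when the distance stays at or below T, so raising T tolerates more incidental change (robustness) but also lets more tampering through (reduced sensitivity).

```python
def normalized_hamming(h1: str, h2: str) -> float:
    """Fraction of differing bits between two equal-length hash bit strings."""
    if len(h1) != len(h2):
        raise ValueError("hash lengths differ")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

def authenticate(stored: str, recomputed: str, T: float) -> bool:
    """Pass authentication when the normalized hash distance does not exceed T."""
    return normalized_hamming(stored, recomputed) <= T
```

For example, with T = 0.3 a 1-bit change in a 4-bit toy hash passes authentication, while a 2-bit change fails.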
4.4. Algorithms’ Tampering Sensitivity Testing
5. Discussion
5.1. Comprehensive Evaluation of Algorithm Performance
5.1.1. Robustness
5.1.2. Tampering Sensitivity
5.1.3. Digestibility
5.1.4. Security
- The Transformer network is considerably more complex than a CNN and harder to interpret, which makes a Transformer-based algorithm more resistant to analysis, and hence more secure, than a CNN-based one.
- Since the Multi-PatchDrop mechanism makes our improved Swin-Unet drop patches at random during training, it increases the difficulty of reconstructing HRRS image content from the hash sequence; our algorithm is therefore more secure than the Swin-Unet-based algorithm.
5.2. Impact of Multi-PatchDrop on Algorithm’s Performance
6. Conclusions and Future Works
- (1) Multi-PatchDrop improves the comprehensive performance of Transformer-based subject-sensitive hashing, especially its robustness;
- (2) Compared with the original Swin-Unet, our improved Swin-Unet based on Multi-PatchDrop is better suited to implementing a subject-sensitive hashing algorithm for HRRS images;
- (3) Different training datasets have some impact on the improved Swin-Unet based on Multi-PatchDrop, but our algorithm outperforms existing methods under each of the training datasets tested.
- (1) Study an adaptive mechanism for determining the patch dropout rate;
- (2) Construct a Transformer model based on an adaptive patch dropout mechanism for subject-sensitive hashing;
- (3) Explore the patch dropout mechanism of the Transformer model for the authentication of multispectral images.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, X.; Lei, L.; Kuang, G. Multilevel Adaptive-Scale Context Aggregating Network for Semantic Segmentation in High-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6003805.
- Han, R.; Fan, X.; Liu, J. EUNet: Edge-UNet for Accurate Building Extraction and Edge Emphasis in Gaofen-7 Images. Remote Sens. 2024, 16, 2397.
- Ouyang, X.; Xu, Y.; Mao, Y.; Liu, Y.; Wang, Z.; Yan, Y. Blockchain-Assisted Verifiable and Secure Remote Sensing Image Retrieval in Cloud Environment. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1378–1389.
- Islam, K.A.; Wu, H.; Xin, C.; Ning, R.; Zhu, L.; Li, J. Sub-Band Backdoor Attack in Remote Sensing Imagery. Algorithms 2024, 17, 182.
- Ren, N.; Wang, H.; Chen, Z.; Zhu, C.; Gu, J. A Multilevel Digital Watermarking Protocol for Vector Geographic Data Based on Blockchain. J. Geovisualization Spat. Anal. 2023, 7, 31.
- Ding, K.; Zeng, Y.; Wang, Y.; Lv, D.; Yan, X. AGIM-Net Based Subject-Sensitive Hashing Algorithm for Integrity Authentication of HRRS Images. Geocarto Int. 2023, 38, 2168071.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
- Marjani, M.; Mahdianpari, M.; Mohammadimanesh, F.; Gill, E.W. CVTNet: A Fusion of Convolutional Neural Networks and Vision Transformer for Wetland Mapping Using Sentinel-1 and Sentinel-2 Satellite Data. Remote Sens. 2024, 16, 2427.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 200.
- Ding, K.; Chen, S.; Zeng, Y.; Wang, Y.; Yan, X. Transformer-Based Subject-Sensitive Hashing for Integrity Authentication of High-Resolution Remote Sensing (HRRS) Images. Appl. Sci. 2023, 13, 1815.
- Liu, Y.; Matsoukas, C.; Strand, F.; Azizpour, H.; Smith, K. PatchDropout: Economizing Vision Transformers Using Patch Dropout. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 3942–3951.
- Han, J.; Li, P.; Tao, Y.; Ren, P. Encrypting Hashing Against Localization. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5607414.
- Qin, C.; Liu, E.; Feng, G.; Zhang, X. Perceptual Image Hashing for Content Authentication Based on Convolutional Neural Network With Multiple Constraints. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4523–4537.
- Samanta, P.; Jain, S. Analysis of Perceptual Hashing Algorithms in Image Manipulation Detection. Procedia Comput. Sci. 2021, 185, 203–212.
- Lv, Y.; Wang, C.; Yuan, W.; Qian, X.; Yang, W.; Zhao, W. Transformer-Based Distillation Hash Learning for Image Retrieval. Electronics 2022, 11, 2810.
- Huang, Z.; Liu, S. Perceptual Image Hashing With Texture and Invariant Vector Distance for Copy Detection. IEEE Trans. Multimedia 2021, 23, 1516–1529.
- Wang, X.; Pang, K.; Zhou, X.; Zhou, Y.; Li, L.; Xue, J. A Visual Model-Based Perceptual Image Hash for Content Authentication. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1336–1349.
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017.
- Girardi, A.M.; Cardell, E.A.; Bird, S.P. Artificial Intelligence in the Interpretation of Videofluoroscopic Swallow Studies: Implications and Advances for Speech–Language Pathologists. Big Data Cogn. Comput. 2023, 7, 178.
- Zhang, K.; Zhao, K.; Tian, Y. Temporal–Semantic Aligning and Reasoning Transformer for Audio-Visual Zero-Shot Learning. Mathematics 2024, 12, 2200.
- Liu, Q.; Wang, X. Bidirectional Feature Fusion and Enhanced Alignment Based Multimodal Semantic Segmentation for Remote Sensing Images. Remote Sens. 2024, 16, 2289.
- Zhang, G.; Hong, X.; Liu, Y.; Qian, Y.; Cai, X. Video Colorization Based on Variational Autoencoder. Electronics 2024, 13, 2412.
- Wang, X.; Guo, Z.; Feng, R. A CNN- and Transformer-Based Dual-Branch Network for Change Detection with Cross-Layer Feature Fusion and Edge Constraints. Remote Sens. 2024, 16, 2573.
- Qin, Y.; Wang, J.; Cao, S.; Zhu, M.; Sun, J.; Hao, Z.; Jiang, X. SRBPSwin: Single-Image Super-Resolution for Remote Sensing Images Using a Global Residual Multi-Attention Hybrid Back-Projection Network Based on the Swin Transformer. Remote Sens. 2024, 16, 2252.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Zhu, X.; Huang, X.; Cao, W.; Yang, X.; Zhou, Y.; Wang, S. Road Extraction from Remote Sensing Imagery with Spatial Attention Based on Swin Transformer. Remote Sens. 2024, 16, 1183.
- Chen, X.; Pan, H.; Liu, J. SwinDefNet: A Novel Surface Water Mapping Model in Mountain and Cloudy Regions Based on Sentinel-2 Imagery. Electronics 2024, 13, 2870.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Adiga, V.; Sivaswamy, J. FPD-M-net: Fingerprint Image Denoising and Inpainting Using M-Net Based Convolutional Neural Networks. In Inpainting and Denoising Challenges; Springer: Berlin/Heidelberg, Germany, 2019.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Ding, K.; Chen, S.; Zeng, Y.; Liu, Y.; Xu, B.; Wang, Y. SDTU-Net: Stepwise-Drop and Transformer-Based U-Net for Subject-Sensitive Hashing of HRRS Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3836–3849.
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408715.
- Zhang, C.; Jiang, W.; Zhang, Y.; Wang, W.; Zhao, Q.; Wang, C. Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408820.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2021.
- Ding, K.; Zhu, C.; Lu, F. An Adaptive Grid Partition Based Perceptual Hash Algorithm for Remote Sensing Image Authentication. Wuhan Daxue Xuebao 2015, 40, 716–720.
- Kokila, S.; Jayachandran, A. Hybrid Behrens-Fisher- and Gray Contrast–Based Feature Point Selection for Building Detection from Satellite Images. J. Geovisualization Spat. Anal. 2023, 7, 8.
- Ji, S.; Wei, S. Building Extraction via Convolutional Neural Networks from an Open Remote Sensing Building Dataset. Acta Geod. Cartogr. Sin. 2019, 48, 448–459.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 3226–3229.
- Ibtehaz, N.; Rahman, M. MultiResUNet: Rethinking the U-Net Architecture for Multimodal Biomedical Image Segmentation. Neural Netw. 2020, 121, 74–87.
- Li, R.; Zheng, S.; Duan, C.; Su, J.; Zhang, C. Multistage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8009205.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
- Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Xu, D.; Chen, S.; Zhu, C.; Li, H.; Hu, L.; Ren, N. Deep Subject-Sensitive Hashing Network for High-Resolution Remote Sensing Image Integrity Authentication. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6010705.
- Deng, S.; Zhan, Y.; Xiao, D.; Li, Y. Analysis and Improvement of a Hash-Based Image Encryption Algorithm. Commun. Nonlinear Sci. Numer. Simul. 2011, 16, 3269–3278.
| Algorithm | JPEG compression (Figure 4b) | Format conversion (Figure 4c) | Watermark embedding (Figure 4d) | Subject-unrelated tampering (Figure 4e) | Subject-related tampering 1 (Figure 4f) | Subject-related tampering 2 (Figure 4g) | Random tampering (Figure 4h) |
|---|---|---|---|---|---|---|---|
| U-net-based algorithm | 0.234 | 0 | 0 | 0.507 | 2.968 | 2.031 | 2.070 |
| M-net-based algorithm | 0.195 | 0 | 0.156 | 0.507 | 2.382 | 1.601 | 2.421 |
| AGIM-net-based algorithm | 0 | 0 | 0.078 | 1.523 | 2.890 | 2.539 | 3.125 |
| MultiResUnet-based algorithm | 0 | 0 | 0.078 | 0.664 | 1.718 | 1.289 | 2.070 |
| Attention U-Net-based algorithm | 0 | 0 | 0 | 0.429 | 1.835 | 1.445 | 2.343 |
| Attention ResU-Net-based algorithm | 0.234 | 0 | 0 | 0.859 | 1.641 | 1.445 | 2.539 |
| STDU-net-based algorithm | 0 | 0 | 0 | 0.546 | 2.734 | 0.976 | 1.835 |
| TransUnet-based algorithm | 0.156 | 0 | 0.039 | 0.703 | 1.992 | 1.914 | 2.265 |
| Swin-Unet-based algorithm | 0.039 | 0 | 0 | 0.585 | 1.289 | 1.367 | 1.055 |
| Our algorithm | 0 | 0 | 0 | 0.625 | 1.723 | 1.602 | 1.132 |
| PEn \ PDe | 0.0 | 0.1 | 0.2 |
|---|---|---|---|
| 0.0 | 1.2% | 3.0% | 5.2% |
| 0.2 | 1.5% | 0.8% | 3.5% |
| 0.4 | 0.5% | 0.1% | 3.3% |
| 0.6 | 0.1% | 0.4% | 2.9% |
| PEn \ PDe | 0.0 | 0.1 | 0.2 |
|---|---|---|---|
| 0.0 | 90.1% | 96.9% | 98.0% |
| 0.2 | 91.7% | 95.7% | 98.8% |
| 0.4 | 93.4% | 98.1% | 98.4% |
| 0.6 | 94.2% | 98.0% | 98.2% |
© 2024 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ding, K.; Wang, Y.; Wang, C.; Ma, J. A New Subject-Sensitive Hashing Algorithm Based on Multi-PatchDrop and Swin-Unet for the Integrity Authentication of HRRS Image. ISPRS Int. J. Geo-Inf. 2024, 13, 336. https://doi.org/10.3390/ijgi13090336