Variational Color Shift and Auto-Encoder Based on Large Separable Kernel Attention for Enhanced Text CAPTCHA Vulnerability Assessment
Abstract
1. Introduction
2. Datasets and Algorithms
3. Methods
3.1. VCS
3.2. Sim-VCS
3.3. Dilated-VCS
3.4. AE-LSKA
4. Results
4.1. Performance of VCS and Variants
4.2. Experimental Analysis of AE-LSKA
5. Discussion
5.1. Analysis of Color Shift Augmentation Techniques
5.2. Effectiveness of AE-LSKA
5.3. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset/Model | Description |
---|---|
Deep-CAPTCHA | Weak CAPTCHA recognizer |
Adaptive-CAPTCHA | Strong CAPTCHA recognizer |
P-CAPTCHA | Simple CAPTCHA dataset |
M-CAPTCHA | Complex CAPTCHA dataset |
AE-LSKA | New feature extraction module |
VCS, Sim-VCS, Dilated-VCS | New color shift methods |
Type | Input | Kernel | Dilation Rate | Stride | Padding | PARAMs | FLOPs |
---|---|---|---|---|---|---|---|
Dilated-VCS | (64, 192) | (4, 4) | (7, 21) | (22, 64) | (1, 0) | 450 | 5514 |
Dilated-VCS | (64, 192) | (8, 8) | (3, 9) | (22, 64) | (1, 0) | 1314 | 21,066 |
Dilated-VCS | (64, 192) | (22, 22) | (1, 3) | (22, 64) | (1, 0) | 8874 | 156,978 |
VCS | (64, 192) | (64, 64) | (1, 1) | (64, 64) | (0, 0) | 162 | 73,728 |
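
The geometry in this table can be checked with the standard per-axis output-size formula for a dilated convolution, out = ⌊(n + 2p − d(k − 1) − 1)/s⌋ + 1. The sketch below assumes that the four tuple columns after Input are kernel size, dilation rate, stride, and padding, each given as (height, width). The helper names are illustrative and do not come from the paper. Under that reading, every Dilated-VCS variant spans a 22 × 64 window and uses a stride equal to that span, so the 66 × 192 padded input is tiled into a 3 × 3 grid of non-overlapping windows. VCS instead tiles the unpadded input into a 1 × 3 grid of 64 × 64 windows.

```python
def effective_kernel(k: int, d: int) -> int:
    """Extent covered by a kernel of size k with dilation d."""
    return d * (k - 1) + 1


def conv_out(n: int, k: int, d: int, s: int, p: int) -> int:
    """Output length of a convolution along one axis (PyTorch Conv2d convention)."""
    return (n + 2 * p - effective_kernel(k, d)) // s + 1


# (kernel, dilation, stride, padding) per (height, width) axis, read from the table.
CONFIGS = {
    "Dilated-VCS k=4": ((4, 4), (7, 21), (22, 64), (1, 0)),
    "Dilated-VCS k=8": ((8, 8), (3, 9), (22, 64), (1, 0)),
    "Dilated-VCS k=22": ((22, 22), (1, 3), (22, 64), (1, 0)),
    "VCS": ((64, 64), (1, 1), (64, 64), (0, 0)),
}


def summarize(input_hw=(64, 192)):
    """Return {name: (effective kernel span, output size)} for each configuration."""
    result = {}
    for name, (ks, ds, ss, ps) in CONFIGS.items():
        span = tuple(effective_kernel(k, d) for k, d in zip(ks, ds))
        out = tuple(
            conv_out(n, k, d, s, p)
            for n, k, d, s, p in zip(input_hw, ks, ds, ss, ps)
        )
        result[name] = (span, out)
    return result


if __name__ == "__main__":
    for name, (span, out) in summarize().items():
        print(f"{name:17s} span={span} output={out}")
```

Running it prints a span of (22, 64) and an output of (3, 3) for all three Dilated-VCS rows, and a span of (64, 64) with an output of (1, 3) for VCS. Larger dilations therefore let a small kernel cover the same region as a much larger one.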
Configuration | Conv 5 × 5 | AE-LSKA5 | AE-LSKA7 | AE-LSKA11 | AE-LSKA15 |
---|---|---|---|---|---|
Kernel (LSKA) | / | | | | |
Dilation | / | 2 | 2 | 2 | 3 |
PARAMs | | | | | |
FLOPs | | | | | |
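
The NumPy sketch below illustrates how a large separable kernel attention block works. It follows the structure described by Lau et al. (LSKA, Expert Systems with Applications, 2024). In that design, a k × k depth-wise kernel is replaced by a (2d − 1)-tap depth-wise convolution followed by a ⌊k/d⌋-tap depth-wise convolution with dilation d. Each of these is further factored into a horizontal 1 × n and a vertical n × 1 kernel, and a final 1 × 1 convolution produces a map that gates the input by element-wise multiplication. This is an illustration only: biases are omitted, every function name is hypothetical, and nothing here describes how the authors embed LSKA in their auto-encoder. Along each axis the stacked kernels cover d⌊k/d⌋ + d − 1 pixels, which equals k whenever k mod d = d − 1 (for example, k = 7 or 11 with d = 2).

```python
import numpy as np


def dw_conv1d(x, w, axis, dilation=1):
    """Depth-wise 1-D cross-correlation of x (C, H, W) along `axis` (1 = H, 2 = W).

    w has shape (C, n): one n-tap kernel per channel. Symmetric zero padding keeps
    the spatial size, which requires dilation * (n - 1) to be even.
    """
    n = w.shape[1]
    span = dilation * (n - 1)
    if span % 2:
        raise ValueError("'same' padding needs an even dilated span")
    pad = [(0, 0)] * 3
    pad[axis] = (span // 2, span // 2)
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=float)
    length = x.shape[axis]
    for i in range(n):
        window = [slice(None)] * 3
        window[axis] = slice(i * dilation, i * dilation + length)
        out += w[:, i, None, None] * xp[tuple(window)]  # per-channel tap weight
    return out


def init_lska(channels, k, d, rng):
    """Random weights for a hypothetical bias-free LSKA block (kernel k, dilation d)."""
    n0, n1 = 2 * d - 1, k // d
    return {
        "h0": rng.standard_normal((channels, n0)),  # 1 x (2d-1) depth-wise
        "v0": rng.standard_normal((channels, n0)),  # (2d-1) x 1 depth-wise
        "h1": rng.standard_normal((channels, n1)),  # 1 x floor(k/d), dilation d
        "v1": rng.standard_normal((channels, n1)),  # floor(k/d) x 1, dilation d
        "pw": rng.standard_normal((channels, channels)),  # 1 x 1 convolution
    }


def lska_attention(x, p, d):
    """Attention map: separable DW conv, then separable dilated DW conv, then 1x1 conv."""
    a = dw_conv1d(x, p["h0"], axis=2)
    a = dw_conv1d(a, p["v0"], axis=1)
    a = dw_conv1d(a, p["h1"], axis=2, dilation=d)
    a = dw_conv1d(a, p["v1"], axis=1, dilation=d)
    return np.einsum("oc,chw->ohw", p["pw"], a)


def lska(x, p, d):
    """Gate the input with its own attention map (element-wise product)."""
    return x * lska_attention(x, p, d)


def receptive_field(k, d):
    """Extent of the stacked (2d-1)-tap and dilated floor(k/d)-tap kernels."""
    return (2 * d - 1) + d * (k // d - 1)


def dw_weights_per_channel(k, d, separable=True):
    """Depth-wise weights per channel, with or without the 1-D factorization."""
    n0, n1 = 2 * d - 1, k // d
    return 2 * (n0 + n1) if separable else n0 ** 2 + n1 ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 64, 192))
    params = init_lska(8, k=7, d=2, rng=rng)
    print(lska(x, params, d=2).shape)  # (8, 64, 192)
    print(receptive_field(7, 2), dw_weights_per_channel(7, 2),
          dw_weights_per_channel(7, 2, separable=False))  # 7 12 18
```

The factorization is what makes the kernel "separable". For k = 7 and d = 2 it needs 12 depth-wise weights per channel instead of 18, and the saving grows with k because the 1-D kernels scale linearly rather than quadratically.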
Type | Name | Specification | Version |
---|---|---|---|
Hardware | Graphics Processing Unit (GPU) | NVIDIA GeForce RTX 3060 12 GB | - |
Hardware | Central Processing Unit (CPU) | Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz 1.80 GHz | - |
Software | PyTorch | - | 2.2 |
Software | Python | - | 3.11.9 |
Software | CUDA | - | 12.2.140 |
Software | cuDNN | - | 9.1.0 |
Dataset | M-CAPTCHA | 5000 images | - |
Dataset | P-CAPTCHA | 3000 images | - |
Model | Deep-CAPTCHA | - | - |
Model | Adaptive-CAPTCHA | - | - |
Evaluation | AASR, Loss, PARAMs, FLOPs | - | - |
Setup | Dataset splitting | 80% for training and 20% for validation | - |
Setup | Epochs | 130 | - |
Setup | Batch size | 256 | - |
Setup | Metrics | AASR, PARAMs, FLOPs, Loss | - |
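
The sketch below makes this setup concrete. It computes the split sizes and the number of mini-batches per epoch implied by the table: an 80/20 split, a batch size of 256, and 130 epochs. Two details are illustrative choices, not something the authors report: shuffling with a fixed seed, and keeping the last partial batch (the default `drop_last=False` behaviour of PyTorch's `DataLoader`).

```python
import math
import random


def split_indices(n_images, train_frac=0.8, seed=0):
    """Shuffle image indices and split them into training and validation lists.

    The seed is a hypothetical choice; only the 80/20 ratio comes from the table.
    """
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    n_train = round(n_images * train_frac)
    return idx[:n_train], idx[n_train:]


def batches_per_epoch(n_samples, batch_size=256, drop_last=False):
    """Number of mini-batches per epoch for a given batch size."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    for name, n in {"M-CAPTCHA": 5000, "P-CAPTCHA": 3000}.items():
        train, val = split_indices(n)
        steps = batches_per_epoch(len(train))
        print(f"{name}: {len(train)} train / {len(val)} val, "
              f"{steps} batches per epoch, {130 * steps} steps over 130 epochs")
```

Under these assumptions, M-CAPTCHA yields 4000 training and 1000 validation images, giving 16 batches per epoch (2080 optimizer steps in total). P-CAPTCHA yields 2400 and 600 images, giving 10 batches per epoch (1300 steps).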
Algorithm | Dilated Kernel | Dropout | Model | P-CAPTCHA Train/Val AASR (%) | M-CAPTCHA Train/Val AASR (%) |
---|---|---|---|---|---|
Dilated-VCS | 4 | 0.0 | Deep-CAPTCHA | 96.9/37.0 | 95.0/26.2 |
Dilated-VCS | 4 | 0.3 | Deep-CAPTCHA | 97.2/36.1 | 95.6/26.6 |
Dilated-VCS | 8 | 0.0 | Deep-CAPTCHA | 96.4/37.8 | 94.6/27.2 |
Dilated-VCS | 8 | 0.3 | Deep-CAPTCHA | 97.3/37.9 | 94.8/25.7 |
Dilated-VCS | 22 | 0.0 | Deep-CAPTCHA | 97.2/36.8 | 93.3/25.2 |
Dilated-VCS | 22 | 0.3 | Deep-CAPTCHA | 97.0/39.1 | 93.7/24.0 |
Dilated-VCS | 4 | 0.0 | Adaptive-CAPTCHA | 99.9/73.2 | 99.9/69.2 |
Dilated-VCS | 4 | 0.3 | Adaptive-CAPTCHA | 99.9/73.1 | 99.9/72.0 |
Dilated-VCS | 8 | 0.0 | Adaptive-CAPTCHA | 99.9/71.6 | 99.9/70.1 |
Dilated-VCS | 8 | 0.3 | Adaptive-CAPTCHA | 99.9/74.0 | 99.9/69.0 |
Dilated-VCS | 22 | 0.0 | Adaptive-CAPTCHA | 99.9/67.6 | 99.9/57.6 |
Dilated-VCS | 22 | 0.3 | Adaptive-CAPTCHA | 99.9/68.2 | 99.9/55.6 |
VCS | - | - | Deep-CAPTCHA | 94.0/35.8 | 93.7/26.6 |
VCS | - | - | Adaptive-CAPTCHA | 99.9/74.5 | 99.9/78.0 |
Sim-VCS | - | - | Deep-CAPTCHA | 95.6/35.6 | 93.4/26.2 |
Sim-VCS | - | - | Adaptive-CAPTCHA | 99.9/72.4 | 99.9/76.9 |
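
Computing the gap between training and validation AASR makes this table easier to compare. The short script below transcribes the reported numbers into Python tuples. For each recognizer and dataset, it reports the configuration with the highest validation AASR together with its train-validation gap. It is only a reading aid over the published values and is not part of the authors' pipeline.

```python
# (algorithm, dilated kernel, dropout, model,
#  P-CAPTCHA train, P-CAPTCHA val, M-CAPTCHA train, M-CAPTCHA val), AASR in %.
ROWS = [
    ("Dilated-VCS", 4, 0.0, "Deep-CAPTCHA", 96.9, 37.0, 95.0, 26.2),
    ("Dilated-VCS", 4, 0.3, "Deep-CAPTCHA", 97.2, 36.1, 95.6, 26.6),
    ("Dilated-VCS", 8, 0.0, "Deep-CAPTCHA", 96.4, 37.8, 94.6, 27.2),
    ("Dilated-VCS", 8, 0.3, "Deep-CAPTCHA", 97.3, 37.9, 94.8, 25.7),
    ("Dilated-VCS", 22, 0.0, "Deep-CAPTCHA", 97.2, 36.8, 93.3, 25.2),
    ("Dilated-VCS", 22, 0.3, "Deep-CAPTCHA", 97.0, 39.1, 93.7, 24.0),
    ("Dilated-VCS", 4, 0.0, "Adaptive-CAPTCHA", 99.9, 73.2, 99.9, 69.2),
    ("Dilated-VCS", 4, 0.3, "Adaptive-CAPTCHA", 99.9, 73.1, 99.9, 72.0),
    ("Dilated-VCS", 8, 0.0, "Adaptive-CAPTCHA", 99.9, 71.6, 99.9, 70.1),
    ("Dilated-VCS", 8, 0.3, "Adaptive-CAPTCHA", 99.9, 74.0, 99.9, 69.0),
    ("Dilated-VCS", 22, 0.0, "Adaptive-CAPTCHA", 99.9, 67.6, 99.9, 57.6),
    ("Dilated-VCS", 22, 0.3, "Adaptive-CAPTCHA", 99.9, 68.2, 99.9, 55.6),
    ("VCS", None, None, "Deep-CAPTCHA", 94.0, 35.8, 93.7, 26.6),
    ("VCS", None, None, "Adaptive-CAPTCHA", 99.9, 74.5, 99.9, 78.0),
    ("Sim-VCS", None, None, "Deep-CAPTCHA", 95.6, 35.6, 93.4, 26.2),
    ("Sim-VCS", None, None, "Adaptive-CAPTCHA", 99.9, 72.4, 99.9, 76.9),
]

DATASETS = {"P-CAPTCHA": (4, 5), "M-CAPTCHA": (6, 7)}  # (train column, val column)


def best_by_validation(rows=ROWS):
    """Return {(model, dataset): (algorithm, kernel, dropout, val, train - val)}."""
    best = {}
    for row in rows:
        algo, kernel, dropout, model = row[:4]
        for dataset, (t, v) in DATASETS.items():
            key = (model, dataset)
            if key not in best or row[v] > best[key][3]:
                best[key] = (algo, kernel, dropout, row[v], round(row[t] - row[v], 1))
    return best


if __name__ == "__main__":
    for (model, dataset), (algo, k, p, val, gap) in sorted(best_by_validation().items()):
        print(f"{model:16s} {dataset}: {algo} (kernel={k}, dropout={p}) val={val} gap={gap}")
```

On Adaptive-CAPTCHA, plain VCS has the highest validation AASR on both datasets (74.5% and 78.0%, with gaps of 25.4 and 21.9 points). On Deep-CAPTCHA, the best rows are Dilated-VCS variants, and the gaps stay above 57 points.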
Algorithm | Model | PARAMs | FLOPs | AASR (%) |
---|---|---|---|---|
Conv (Baseline) | Deep-CAPTCHA | 6.46 M | 212.78 M | 28.5 |
AE | Deep-CAPTCHA | 6.40 M | 146.13 M | 28.9 |
AE + LSKA (k = 7) | Deep-CAPTCHA | 6.41 M | 176.02 M | 28.4 |
AE + LSKA (k = 11) | Deep-CAPTCHA | 6.41 M | 178.38 M | 32.9 |
AE + CBAM (ratio = 8) | Deep-CAPTCHA | 6.40 M | 148.37 M | 25.0 |
AE + CBAM (ratio = 16) | Deep-CAPTCHA | 6.40 M | 148.37 M | 31.2 |
AE + ECA (ratio = 2) | Deep-CAPTCHA | 6.40 M | 146.72 M | 26.1 |
AE + ECA (ratio = 4) | Deep-CAPTCHA | 6.40 M | 146.72 M | 21.1 |
AE + GC (ATT + ADD) | Deep-CAPTCHA | 6.41 M | 146.78 M | 28.7 |
AE + GC (AVG + MUL) | Deep-CAPTCHA | 6.41 M | 146.73 M | 23.8 |
AE + SA (groups = 8) | Deep-CAPTCHA | 6.40 M | 146.43 M | 25.4 |
AE + SA (groups = 16) | Deep-CAPTCHA | 6.40 M | 146.45 M | 22.9 |
AE + SE (ratio = 8) | Deep-CAPTCHA | 6.40 M | 146.72 M | 20.9 |
AE + SE (ratio = 16) | Deep-CAPTCHA | 6.40 M | 146.72 M | 23.0 |
AE + PNA | Deep-CAPTCHA | 6.48 M | 379.51 M | 48.9 |
Conv (Baseline) | Adaptive-CAPTCHA | 3.82 M | 259.80 M | 30.3 |
AE | Adaptive-CAPTCHA | 3.29 M | 192.97 M | 79.1 |
AE + LSKA (k = 7) | Adaptive-CAPTCHA | 3.39 M | 229.59 M | 89.8 |
AE + LSKA (k = 11) | Adaptive-CAPTCHA | 3.39 M | 232.10 M | 88.3 |
AE + CBAM (ratio = 8) | Adaptive-CAPTCHA | 3.31 M | 195.32 M | 85.4 |
AE + CBAM (ratio = 16) | Adaptive-CAPTCHA | 3.30 M | 195.29 M | 77.3 |
AE + ECA (ratio = 2) | Adaptive-CAPTCHA | 3.29 M | 193.60 M | 82.6 |
AE + ECA (ratio = 4) | Adaptive-CAPTCHA | 3.29 M | 193.60 M | 85.9 |
AE + GC (ATT + ADD) | Adaptive-CAPTCHA | 3.38 M | 193.74 M | 66.3 |
AE + GC (AVG + MUL) | Adaptive-CAPTCHA | 3.38 M | 193.69 M | 81.8 |
AE + SA (groups = 8) | Adaptive-CAPTCHA | 3.29 M | 193.29 M | 84.3 |
AE + SA (groups = 16) | Adaptive-CAPTCHA | 3.29 M | 193.31 M | 84.3 |
AE + SE (ratio = 8) | Adaptive-CAPTCHA | 3.31 M | 193.62 M | 84.0 |
AE + SE (ratio = 16) | Adaptive-CAPTCHA | 3.30 M | 193.61 M | 79.7 |
AE + PNA | Adaptive-CAPTCHA | 4.27 M | 489.68 M | 24.4 |
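
The cost side of this comparison comes down to simple ratios. The sketch below uses only numbers from the table. For the plain AE variant and the highest-scoring AE + LSKA variant of each recognizer, it computes the percentage reduction in PARAMs and FLOPs relative to the convolutional baseline, plus the AASR change in percentage points. The helper names are illustrative.

```python
# (PARAMs in millions, FLOPs in millions, AASR in %), copied from the table.
RESULTS = {
    "Deep-CAPTCHA": {
        "Conv (Baseline)": (6.46, 212.78, 28.5),
        "AE": (6.40, 146.13, 28.9),
        "AE + LSKA (k = 11)": (6.41, 178.38, 32.9),
    },
    "Adaptive-CAPTCHA": {
        "Conv (Baseline)": (3.82, 259.80, 30.3),
        "AE": (3.29, 192.97, 79.1),
        "AE + LSKA (k = 7)": (3.39, 229.59, 89.8),
    },
}


def pct_reduction(base: float, new: float) -> float:
    """Percentage saved relative to the baseline, rounded to one decimal."""
    return round(100.0 * (base - new) / base, 1)


def savings(model: str):
    """Return {variant: (PARAMs saving %, FLOPs saving %, AASR change in points)}."""
    base_params, base_flops, base_aasr = RESULTS[model]["Conv (Baseline)"]
    out = {}
    for name, (params, flops, aasr) in RESULTS[model].items():
        if name == "Conv (Baseline)":
            continue
        out[name] = (
            pct_reduction(base_params, params),
            pct_reduction(base_flops, flops),
            round(aasr - base_aasr, 1),
        )
    return out


if __name__ == "__main__":
    for model in RESULTS:
        for name, (dp, df, da) in savings(model).items():
            print(f"{model:16s} {name:18s} PARAMs -{dp}%  FLOPs -{df}%  AASR {da:+} pts")
```

For Adaptive-CAPTCHA, for example, the AE variant needs 25.7% fewer FLOPs and 13.9% fewer parameters than the convolutional baseline, and its AASR rises by 48.8 points. Adding LSKA (k = 7) gives back part of the FLOPs saving (11.6% below baseline) in exchange for a further gain in AASR.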
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wan, X.; Johari, J.; Ruslan, F.A. Variational Color Shift and Auto-Encoder Based on Large Separable Kernel Attention for Enhanced Text CAPTCHA Vulnerability Assessment. Information 2024, 15, 717. https://doi.org/10.3390/info15110717
Chicago/Turabian Style: Wan, Xing, Juliana Johari, and Fazlina Ahmat Ruslan. 2024. "Variational Color Shift and Auto-Encoder Based on Large Separable Kernel Attention for Enhanced Text CAPTCHA Vulnerability Assessment" Information 15, no. 11: 717. https://doi.org/10.3390/info15110717