Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review
Abstract
1. Introduction
2. Legal and Regulatory Perspectives on Privacy in Generative AI
2.1. Legal Definitions of Privacy and Personal Data
2.2. Anonymization and Its Limitations
2.3. Key Regulations Affecting Generative AI
2.4. The Role of Privacy-Preserving Techniques in Legal Compliance
2.5. Balancing Innovation and Legal Obligations
3. Overview of Privacy Risks in Generative AI
3.1. Data Memorization and Model Inversion Attacks
3.2. Membership Inference Attacks
3.3. Model Poisoning and Adversarial Attacks
3.4. Data Leakage from Fine-Tuning
3.5. Privacy Risks in Real-World Applications
4. Privacy-Preserving Techniques for Generative AI
4.1. Differential Privacy (DP)
4.2. Federated Learning (FL) and Privacy-Preserving Federated Learning
4.3. Homomorphic Encryption (HE)
4.4. Secure Multi-Party Computation (SMPC)
4.5. Privacy-Preserving Synthetic Data Generation
4.6. Privacy-Enhancing Technologies (PETs)
4.7. Data Masking and Anonymization
4.8. Techniques for Preventing Unintended Data Memorization
4.8.1. Selective Forgetting and Scrubbing
4.8.2. Retraining with Privacy Filters
4.8.3. Privacy-Preserving Fine-Tuning
4.8.4. Real-Time Privacy Audits
4.8.5. Use of Synthetic Data
5. Emerging Trends and Future Directions
5.1. Blockchain for Privacy in Generative AI
5.2. Advancing the Efficiency of Privacy-Enhancing Technologies (PETs) in AI
5.3. Differential Privacy and Federated Learning in Real-Time Applications
5.4. Privacy-Preserving AI in Synthetic Data Generation
5.5. Addressing Privacy Attacks in Large Language Models (LLMs)
5.6. Legal and Ethical Frameworks for Privacy in Generative AI
5.7. AI and Quantum Cryptography for Privacy Preservation
5.8. Emerging Applications in Generative AI-Enabled Networks
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AI | Artificial Intelligence |
LLMs | Large Language Models |
DP | Differential Privacy |
FL | Federated Learning |
HE | Homomorphic Encryption |
SMPC | Secure Multi-Party Computation |
AML | Adversarial Machine Learning |
PII | Personally Identifiable Information |
GDPR | General Data Protection Regulation |
HIPAA | Health Insurance Portability and Accountability Act |
GANs | Generative Adversarial Networks |
VAEs | Variational Autoencoders |
MIA | Membership Inference Attack |
SDXL | Stable Diffusion Extended Latent |
S2L | Shake-to-Leak |
TGAN | Tabular Generative Adversarial Network |
PETs | Privacy-Enhancing Technologies |
CCPA | California Consumer Privacy Act |
NNs | Neural Networks |
EW-Tune | Edgeworth-Tune |
NLP | Natural Language Processing |
References
1. Yang, Y.; Zhang, B.; Guo, D.; Du, H.; Xiong, Z.; Niyato, D.; Han, Z. Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing. arXiv 2024, arXiv:2405.10521.
2. Baig, A. Generative AI Privacy: Issues, Challenges & How to Protect? Available online: https://securiti.ai/generative-ai-privacy/ (accessed on 10 September 2024).
3. Aziz, R.; Banerjee, S.; Bouzefrane, S.; Le Vinh, T. Exploring Homomorphic Encryption and Differential Privacy Techniques towards Secure Federated Learning Paradigm. Future Internet 2023, 15, 310.
4. Carlini, N.; Nasr, M.; Choquette-Choo, C.A.; Jagielski, M.; Gao, I.; Awadalla, A.; Koh, P.W.; Ippolito, D.; Lee, K.; Tramer, F.; et al. Are Aligned Neural Networks Adversarially Aligned? Adv. Neural Inf. Process. Syst. 2024, 36.
5. Xu, R.; Baracaldo, N.; Joshi, J. Privacy-Preserving Machine Learning: Methods, Challenges and Directions. arXiv 2021, arXiv:2108.04417.
6. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks Against Machine Learning Models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 3–18.
7. Cilloni, T.; Fleming, C.; Walter, C. Privacy Threats in Stable Diffusion Models. arXiv 2023, arXiv:2311.09355.
8. Hayes, J.; Melis, L.; Danezis, G.; De Cristofaro, E. LOGAN: Membership Inference Attacks Against Generative Models. Proc. Priv. Enhancing Technol. 2019, 2019, 133–152.
9. Shan, S.; Ding, W.; Passananti, J.; Wu, S.; Zheng, H.; Zhao, B.Y. Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. arXiv 2023, arXiv:2310.13828.
10. Li, Z.; Hong, J.; Li, B.; Wang, Z. Shake to Leak: Fine-Tuning Diffusion Models Can Amplify the Generative Privacy Risk. In Proceedings of the 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Toronto, ON, Canada, 9–11 April 2024.
11. Templin, T.; Perez, M.W.; Sylvia, S.; Leek, J.; Sinnott-Armstrong, N. Addressing 6 challenges in generative AI for digital health: A scoping review. PLoS Digit. Health 2024, 3, e0000503.
12. Erlingsson, Ú.; Pihur, V.; Korolova, A. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067.
13. Su, B.; Wang, Y.; Schiavazzi, D.; Liu, F. Privacy-Preserving Data Synthesis via Differentially Private Normalizing Flows with Application to Electronic Health Records Data. In Proceedings of the Inaugural AAAI 2023 Summer Symposium, Second Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics, Singapore, 17–19 July 2023; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2023; Volume 1.
14. PySyft. Available online: https://github.com/OpenMined/PySyft (accessed on 10 October 2024).
15. Gu, X.; Sabrina, F.; Fan, Z.; Sohail, S. A Review of Privacy Enhancement Methods for Federated Learning in Healthcare Systems. Int. J. Environ. Res. Public Health 2023, 20, 6539.
16. TensorFlow Federated. Available online: https://www.tensorflow.org/federated (accessed on 12 September 2024).
17. Dhanaraj, R.K.; Suganyadevi, S.; Seethalakshmi, V.; Ouaissa, M. Introduction to Homomorphic Encryption for Financial Cryptography. In Homomorphic Encryption for Financial Cryptography; Seethalakshmi, V., Dhanaraj, R.K., Suganyadevi, S., Ouaissa, M., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 1–12. ISBN 9783031355349.
18. Chillotti, I.; Gama, N.; Georgieva, M.; Izabachène, M. TFHE: Fast Fully Homomorphic Encryption Over the Torus. J. Cryptol. 2020, 33, 34–91.
19. Yao, A. Protocols for Secure Computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, Chicago, IL, USA, 3–5 November 1982; pp. 160–164.
20. Keller, M.; Pastro, V.; Rotaru, D. Overdrive: Making SPDZ Great Again. In Proceedings of the Advances in Cryptology—EUROCRYPT 2018, Tel Aviv, Israel, 29 April–3 May 2018; Nielsen, J.B., Rijmen, V., Eds.; Springer International Publishing: Cham, Switzerland; pp. 158–189.
21. Aceto, G.; Giampaolo, F.; Guida, C.; Izzo, S.; Pescapè, A.; Piccialli, F.; Prezioso, E. Synthetic and Privacy-Preserving Traffic Trace Generation Using Generative AI Models for Training Network Intrusion Detection Systems. J. Netw. Comput. Appl. 2024, 229, 103926.
22. Microsoft Presidio. Available online: https://microsoft.github.io/presidio/ (accessed on 23 September 2024).
23. Prasser, F.; Kohlmayer, F.; Lautenschläger, R.; Kuhn, K.A. ARX—A Comprehensive Tool for Anonymizing Biomedical Data. AMIA Annu. Symp. Proc. 2014, 2014, 984–993.
24. Kua, J.; Hossain, M.B.; Natgunanathan, I.; Xiang, Y. Privacy Preservation in Smart Meters: Current Status, Challenges and Future Directions. Sensors 2023, 23, 3697.
25. Sebastian, G. Privacy and Data Protection in ChatGPT and Other AI Chatbots: Strategies for Securing User Information. Int. J. Secur. Priv. Pervasive Comput. 2023, 15, 1–14.
26. Hans, A.; Wen, Y.; Jain, N.; Kirchenbauer, J.; Kazemi, H.; Singhania, P.; Singh, S.; Somepalli, G.; Geiping, J.; Bhatele, A.; et al. Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs. arXiv 2024, arXiv:2406.10209.
27. Ginart, A.A.; Guan, M.Y.; Valiant, G.; Zou, J. Making AI Forget You: Data Deletion in Machine Learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA; pp. 3518–3531.
28. Mireshghallah, F.; Inan, H.A.; Hasegawa, M.; Rühle, V.; Berg-Kirkpatrick, T.; Sim, R. Privacy Regularization: Joint Privacy-Utility Optimization in Language Models. arXiv 2021, arXiv:2103.07567.
29. Chen, T.; Da, L.; Zhou, H.; Li, P.; Zhou, K.; Chen, T.; Wei, H. Privacy-Preserving Fine-Tuning of Large Language Models through Flatness. arXiv 2024, arXiv:2403.04124.
30. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; Association for Computing Machinery: New York, NY, USA; pp. 308–318.
31. Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. Extracting Training Data from Large Language Models. arXiv 2020, arXiv:2012.07805.
32. Goyal, M.; Mahmoud, Q.H. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. Electronics 2024, 13, 3509.
33. Song, C.; Ristenpart, T.; Shmatikov, V. Machine Learning Models That Remember Too Much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017.
34. Halevi, S.; Shoup, V. Design and Implementation of HElib: A Homomorphic Encryption Library. Cryptol. ePrint Arch. 2020; preprint.
35. Nguyen, C.T.; Liu, Y.; Du, H.; Hoang, D.T.; Niyato, D.; Nguyen, D.N.; Mao, S. Generative AI-Enabled Blockchain Networks: Fundamentals, Applications, and Case Study. arXiv 2024, arXiv:2401.15625.
36. Li, Z.; Kong, D.; Niu, Y.; Peng, H.; Li, X.; Li, W. An Overview of AI and Blockchain Integration for Privacy-Preserving. arXiv 2023, arXiv:2305.03928.
37. Li, Y.; Du, W.; Han, L.; Zhang, Z.; Liu, T. A Communication-Efficient, Privacy-Preserving Federated Learning Algorithm Based on Two-Stage Gradient Pruning and Differentiated Differential Privacy. Sensors 2023, 23, 9305.
38. Behnia, R.; Ebrahimi, M.R.; Pacheco, J.; Padmanabhan, B. EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy. In Proceedings of the 2022 IEEE International Conference on Data Mining Workshops (ICDMW), Orlando, FL, USA, 28 November–1 December 2022; pp. 560–566.
39. Li, Q.; Hong, J.; Xie, C.; Tan, J.; Xin, R.; Hou, J.; Yin, X.; Wang, Z.; Hendrycks, D.; Wang, Z.; et al. LLM-PBE: Assessing Data Privacy in Large Language Models. Proc. VLDB Endow. 2024, 17, 3201–3214.
40. Li, H.; Chen, Y.; Luo, J.; Kang, Y.; Zhang, X.; Hu, Q.; Chan, C.; Song, Y. Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv 2023, arXiv:2310.10383.
41. Feretzakis, G.; Verykios, V.S. Trustworthy AI: Securing Sensitive Data in Large Language Models. arXiv 2024, arXiv:2409.18222.
42. Al-kfairy, M.; Mustafa, D.; Kshetri, N.; Insiew, M.; Alfandi, O. Ethical Challenges and Solutions of Generative AI: An Interdisciplinary Perspective. Informatics 2024, 11, 58.
43. Radanliev, P. Artificial Intelligence and Quantum Cryptography. J. Anal. Sci. Technol. 2024, 15, 4.
44. Radanliev, P.; De Roure, D.; Santos, O. Red Teaming Generative AI/NLP, the BB84 Quantum Cryptography Protocol and the NIST-Approved Quantum-Resistant Cryptographic Algorithms. arXiv 2023, arXiv:2310.04425.
45. Zhang, R.; Du, H.; Niyato, D.; Kang, J.; Xiong, Z.; Jamalipour, A.; Zhang, P.; Kim, D.I. Generative AI for Space-Air-Ground Integrated Networks. IEEE Wirel. Commun. 2024, 1–11.
46. Zhang, R.; Du, H.; Liu, Y.; Niyato, D.; Kang, J.; Xiong, Z.; Jamalipour, A.; Kim, D.I. Generative AI Agents with Large Language Model for Satellite Networks via a Mixture of Experts Transmission. IEEE J. Sel. Areas Commun. 2024, 1.
47. Brand, M.; Pradel, G. Practical Privacy-Preserving Machine Learning Using Homomorphic Encryption. Available online: https://eprint.iacr.org/2023/1320.pdf (accessed on 20 October 2024).
48. Krasadakis, P.; Sakkopoulos, E.; Verykios, V.S. A Survey on Challenges and Advances in Natural Language Processing with a Focus on Legal Informatics and Low-Resource Languages. Electronics 2024, 13, 648.
Summary of the privacy-preserving techniques reviewed, their trade-offs, and representative open-source tools:

Technique | Key Strengths | Limitations | Best Suited For | Open-Source Tools |
---|---|---|---|---|
Differential Privacy | Strong privacy guarantees, scalable | Reduced model accuracy due to noise | General-purpose generative models | PySyft [14] |
Federated Learning | No data sharing, decentralized training | Communication overhead, vulnerable to inference attacks | Healthcare, finance | TensorFlow Federated [16] |
Homomorphic Encryption | Computation on encrypted data, strong privacy | High computational cost, scalability issues | Cloud-based generative AI | HElib [34] |
Secure MPC | Joint computation without revealing inputs | Significant computational complexity | Regulated industries (e.g., healthcare, finance) | MP-SPDZ [20] |
Adversarial Defense Mechanisms | Defense against privacy attacks, enhanced robustness | Specialized training needed, increased computational cost | High-security environments | See [4,5] |
Synthetic Data Generation | Anonymized, realistic data for model training | Data quality may be lower than real-world data | Data-rich environments requiring privacy | Presidio [22], ARX [23] |
Privacy-Enhancing Technologies | Combined methods for stronger privacy–utility trade-offs | Complexity in implementation | Critical sectors (e.g., healthcare, finance) | N/A |
Blockchain for Privacy | Decentralized, transparent audit trails | Scalability concerns, emerging technology | High-transparency sectors (e.g., healthcare, finance) | N/A |
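As a minimal illustration of the differential privacy row above, the following Python sketch applies the classical Laplace mechanism to a simple aggregate query. This example was written for this review and is not code from PySyft [14] or any other cited tool; the toy dataset, the clipping bound of 100, and the budget epsilon = 1.0 are assumptions chosen for clarity.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# Illustrative only: the dataset, clipping bound, and epsilon below are
# assumptions chosen for this example, not values from the review.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release `true_value` with Laplace noise calibrated to `sensitivity`
    (the maximum change one record can cause in the query output) and
    privacy budget `epsilon` (smaller epsilon = stronger privacy, more noise)."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy example: privately release the mean of five ages clipped to [0, 100].
ages = np.array([34.0, 29.0, 41.0, 52.0, 38.0])
sensitivity = 100.0 / len(ages)  # replacing one clipped record shifts the mean by at most 100/n
private_mean = laplace_mechanism(ages.mean(), sensitivity, epsilon=1.0)
print(f"true mean = {ages.mean():.1f}, DP estimate = {private_mean:.1f}")
```

Production systems would typically rely on maintained libraries such as PySyft [14], or on gradient-level approaches such as DP-SGD [30], rather than hand-rolled noise addition; the sketch is meant only to make the privacy-utility trade-off in the table concrete.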
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).