SSKM_DP: Differential Privacy Data Publishing Method via SFLA-Kohonen Network
Abstract
:1. Introduction
- (1)
- A clustering method based on the SFLA-Kohonen network is proposed, which improves the fitting accuracy of connection.
- (2)
- Weights to training data and the accuracy of clustering results. The validity of the SSKM_DP algorithm is proven theoretically.
- (3)
- Considering that the k-means algorithm is very sensitive to the selection of the initial point, the number of clusters needs to be carefully set empirically, and the DBSCAN algorithm does not work well on high-dimensional data; a clustering method based on the Kohonen neural network was introduced to solve the above problems. In order to initialize the Kohonen network, the single-population frog leaping algorithm (SFLA) was introduced to speed up network convergence.
- (4)
- Considering that there may be complex correlations between attributes in the data, the correlation between non-sensitive attributes and sensitive attributes is bound to lead to the inference of sensitive information from non-sensitive attributes. To solve this problem, we introduced the maximum information coefficient to measure the correlation. An appropriate amount of noise is added to the cluster of non-sensitive attributes to protect non-sensitive attributes and further prevent the private leakage of sensitive data.
- (5)
- In view of the effectiveness of the SSKM_DP algorithm, compared to the algorithms MDAV, IDP_KMENAS, and MDAV_DP on the real datasets NLTCS and UCI Adult, SSKM_DP was carried out in a lot of experiments. Experimental results show that, compared to these similar methods, SSKM_DP not only ensures the privacy of the published data, but it also greatly improves the usability of the published data.
2. Related Works
- (1)
- Noise is directly added to the original data record, and then the data with noise is released. This method has high privacy protection ability, but it leads to the poor utility of published data.
- (2)
- First, the original data is processed by using compression, transformation, and other technologies, and then noise is added to the processed data. Finally, the data with noise are released. Although this method may lead to a small part of the data information being missing, it greatly improves the effectiveness of published data.
3. Definitions
3.1. Differential Privacy
3.2. Kohonen Network
- (1)
- Initializing The Network
- (2)
- Looking For Winning Neurons
- (3)
- Adjusting And Updating The Weight
- (4)
- Iterating The Process
3.2.1. Leap Frog Algorithm
- (1)
- Population Initialization
- (2)
- Subgroup Division
- (3)
- Local Search
3.2.2. Maximum Information Coefficient
4. The Proposed Data Publishing Method
4.1. Description of Problem
4.2. SSKM_DP Multi-Sensi tive Attribute Data Publishing Mechanism
- (1)
- Attribute Clustering
- (2)
- Attribute Correlation Judgment
- (3)
- Data Noise
Algorithm 1 SSKM-DP |
Input:, the number of neurons in the input layer of Kohnen’s network t, the number of frogs , learning rate , Maximum learning times . Output:. 1: 2: 3: 4: 5: |
4.3. SFLA-Kohonen Data Clustering Algorithm
Algorithm 2 SFLA optimizes the initial weight of Kohomen network |
Input: data the number of neurons in the input layer of Kohomen’s network; the number of frogs. Output: the optimal initial weight of the SOM nwtwork. 1: R = {} 2: 3: 4: 5: 6: 7: 8: 9: 10: |
Algorithm 3 SFLA—Kohonen networks to achieve data clustering |
Input: dataset ; the learning rate is and its value range is (0,1); Maximum learning times T Output: Clusters formed by clustering . 1: 2: for do 3: 4: Obtain new winning neurons, 5: 6: 7: end for 8: return FModel |
4.4. Attribute Correlation Determination Method
Algorithm 4 Attribute correlation determation method |
Input: cluster formed by SFLA-SOM network clustering ; Connection strength threshold Output: Clusters with sensitive attributes Vs; there exists cluster with non-sensitive attributes strongly connected to sensitive attributes. 1: Mark all sensitive attributes xs in the data 2: add 3: Calculate Connection strength 4: 5: if then 6: 7: end if 8: return |
4.5. Data Noise
5. Analysis of Privacy Protection Effect of the Algorithm
6. Experimental Evaluation
6.1. Experimental Environment
6.2. Experimental Data
6.3. Experimental Evaluation Indexes
6.4. Analysis of Experimental Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Xiao, Y.; Li, H. Privacy Preserving Data Publishing for Multiple Sensitive Attributes Based on Security Level. Information 2020, 11, 166. [Google Scholar] [CrossRef] [Green Version]
- Chen, Y.; Xu, Z.; Chen, J.; Jia, S. B-DP: Dynamic Collection and Publishing of Continuous Check-In Data with Best-Effort Differential Privacy. Entropy 2022, 24, 404. [Google Scholar] [CrossRef]
- Yan, Y.; Sun, Z.; Mahmood, A.; Xu, F.; Dong, Z.; Sheng, Q.Z. Achieving Differential Privacy Publishing of Location-Based Statistical Data Using Grid Clustering. ISPRS Int. J. Geo-Inf. 2022, 11, 404. [Google Scholar] [CrossRef]
- Zhang, X.; Luo, Y.; Yu, Q.; Xu, L.; Lu, Z. Privacy-Preserving Method for Trajectory Data Publication Based on Local Preferential Anonymity. Information 2023, 14, 157. [Google Scholar] [CrossRef]
- Utaliyeva, A.; Shin, J.; Choi, Y.-H. Task-Specific Adaptive Differential Privacy Method for Structured Data. Sensors 2023, 23, 1980. [Google Scholar] [CrossRef]
- Zhuo, M.; Huang, W.; Liu, L.; Zhou, S.; Tian, Z. A High-Utility Differentially Private Mechanism for Space Information Networks. Remote Sens. 2022, 14, 5844. [Google Scholar] [CrossRef]
- Soria-Comas, J.; Domingo-Ferrer, J.; Sanchez, D.; Martínez, S. Enhancing Data Utility in Differential Privacy via Microaggregation-based K-anonymity. VLDB J. 2014, 23, 771–794. [Google Scholar] [CrossRef]
- Sweeney, L. k-ANONYMITY: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2002, 10, 557–570. [Google Scholar] [CrossRef] [Green Version]
- Zhao, X.W.; Liang, J.Y. An Attribute Weighted Clustering Algorithm for Mixed Data Based on Information Entropy. J. Comput. Res. Dev. 2016, 53, 1018–1028. [Google Scholar]
- Sanchez, D.; Domingo-Ferrer, J.; Martinez, S.; Soria-Comas, J. Utility-Preserving Differentially Private Data Releases via Individual Ranking Micro Aggregation. Inf. Fusion 2016, 30, 1–14. [Google Scholar] [CrossRef] [Green Version]
- Monedero, D.R.; Mezher, A.M.; Colome, X.C.; Forné, J.; Soriano, M. Efficient K-anonymous Micro Aggregation of Multivariate Numerical Data via Principal Component Analysis. Inf. Sci. 2019, 503, 417–443. [Google Scholar] [CrossRef]
- Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. L-diversity: Privacy beyond K-anonymity. ACM Trans. Knowl. Discov. Data 2006, 1, 3–5. [Google Scholar] [CrossRef]
- Li, Y.; Zhou, F.; Xu, Z. Privacy protection scheme for mobile social networks supporting k-nearest neighbor search. J. Comput. Sci. 2021, 44, 1481–1500. [Google Scholar]
- Parra-Arnau, J.; Domingo-Ferrer, J.; Soria-Comas, J. Differentially private data publishing via cross-moment microaggregation. Inf. Fusion 2020, 53, 269–288. [Google Scholar] [CrossRef]
- Gu, Z.; Zhang, G.; Ma, C.; Song, L. Differential privacy data publishing method based on probabilistic principal component analysis. J. Harbin Eng. Univ. 2021, 1–8. Available online: https://kns-cnki-net.wvpn.lnut.edu.cn/kcms/detail/23.1390.U.20210609.1219.004.html (accessed on 10 August 2021).
- Chen, S.; Fu, A.; Ke, H.; Su, C.; Sun, H. MCDP: Multi cluster distributed differential privacy data publishing method based on neural network. Acta Electron. Sin. 2020, 48, 2297–2303. [Google Scholar]
- Ye, Y.; Wang, L.; Han, J.; Qiu, S.; Luo, F. An Anonymization Method Combining Anatomy and Permutation for Protecting Pprivacy in Microdata with Multiple Sensitive Attributes. In Proceedings of the 2017 International Conference on Machine Learning and Cybernetics, Ningbo, China, 9–12 July 2017; pp. 404–411. [Google Scholar]
- Saraswathi, S.; Thirukumar, K. Enhancing Utility and Privacy Using T-closeness for Multiple Sensitive Attributes. Adv. Nat. Appl. Sci. 2016, 10, 6–14. [Google Scholar]
- Li, N.; Li, T.; Venkatasubramanian, S. t-Closeness: Privacy beyond k-Anonymity and l-Diversit. In Proceedings of the IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 15–20 April 2007; pp. 106–115. [Google Scholar]
- Acs, G.; Melis, L.; Castelluccia, C.; De Cristofaro, E. Differentially Private Mixture of Generative Neural Networks. IEEE Trans. Knowl. Data Eng. 2019, 31, 1109–1121. [Google Scholar] [CrossRef] [Green Version]
- Wang, H.; Ge, L.N.; Wang, S.Q.; Wang, L.; Zhang, Y.; Liang, J. Improvement of Differential Privacy Protection Algorithm Based on Optics Clustering. J. Comput. Appl. 2018, 38, 73–78. (In Chinese) [Google Scholar] [CrossRef]
- Yao, S. An Improved Differential Privacy K-Means Algorithm Based on MapReduce. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design, Hangzhou, China, 8–9 December 2018; pp. 141–145. [Google Scholar]
- Soria-Comas, J.; Domingo-Ferrer, J. Differentially Private Data Publishing via Optimal Univariate Micro-aggregation and Record perturbation. Knowl.-Based Syst. 2018, 153, 78–90. [Google Scholar] [CrossRef]
- Dwork, C. Differential Privacy. In Proceedings of the 33rd International Colloquium on Automata Languages and Programming, Venice, Italy, 10–14 July 2006; pp. 1–12. [Google Scholar]
- Ji, Z.; Lipton, Z.C.; Elkan, C. Differential Privacy and Machine Learning: A Survey and Review. arXiv 2014, arXiv:1412.7584. [Google Scholar]
- Onishi, A. Landmark Map: An Extension of the Self-organizing Map for a User-intended Nonlinear Projection. Neurocomputing 2020, 388, 228–245. [Google Scholar] [CrossRef] [Green Version]
- Eusuff, M.M.; Lansey, K.E. Optimization of Water Distribution Network Design Using the Shuffled Frog Leaping Algorithm. J. Water Resour. Plan. Manag. 2003, 129, 210–225. [Google Scholar] [CrossRef]
- Eusuff, M.; Lanmy, K.; Pasha, F. Shuffled Frog-leaping Algorithm: A Memetic Meta-heuristic for Discrete Optimization. Eng. Optim. 2006, 38, 129–154. [Google Scholar] [CrossRef]
- Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the IEEE International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
- Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting Novel Associations in Large datas. Science 2011, 334, 1518–1524. [Google Scholar] [CrossRef] [Green Version]
- Ye, Q.Q.; Meng, X.F.; Zhu, M.J.; Huo, Z. Survey on Local Differential Privacy. J. Softw. 2018, 29, 1981–2005. (In Chinese) [Google Scholar]
- Bai, L.; Liang, J.; Cao, F. A Multiple K-means Clustering Ensemble Algorithm to Find Nonlinearly Separable Clusters. Inf. Fusion 2020, 61, 36–47. [Google Scholar] [CrossRef]
- Scitovski, R.; Sabo, K. DBSCAN-like Clustering Method for Various Data Densities. Pattern Anal. Appl. 2019, 23, 541–554. [Google Scholar] [CrossRef]
Algorithm | Main Idea | Limitation |
---|---|---|
MDAV [7] | By micro-aggregating all attributes to achieve K anonymization, the amount of noise required can be effectively reduced. | Too much noise, poor utility, limited clustering effect, information loss. |
IDP_KMENAS [22] | It uses a canopy to select the initial center point and uses the Laplace mechanism to realize the differential privacy protection. | Poor clustering results, poor utility, slow convergence. |
MDAV_DP [23] | Adds noise to the micro-aggregated version ofthe original dataset, with the micro-aggregation dataset as our protection target. | Not suitable for complex data; value attribute utility is not considered. |
Scope of Application | Standardized | Computational Complexity | Robustness | |
---|---|---|---|---|
Pearson coefficient | Linear data | Yes | Low | Low |
Spearman coefficient | Linear data simple monotone nonlinear data | Yes | Low | Medium |
KNN | Linear data nonlinear data | No | High | High |
MIC | Linear data nonlinear data | Yes | Low | High |
Hardware and Software Information | Specific Configuration |
---|---|
CPU | Intel(R) Core(TM) i5-9400F CPU(2.90 GHz) |
Memory | 16 GB |
The operating system | Win10 64-bit |
The development environment | PyCharm-professional-2021 |
Programming language | Python 3 |
Datasets | Type | Number of Attributes | Date Size |
---|---|---|---|
NLTCS | Binary | 16 | 21,574 |
Adult | Non-binary | 14 | 48,842 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chu, Z.; He, J.; Li, J.; Wang, Q.; Zhang, X.; Zhu, N. SSKM_DP: Differential Privacy Data Publishing Method via SFLA-Kohonen Network. Appl. Sci. 2023, 13, 3823. https://doi.org/10.3390/app13063823
Chu Z, He J, Li J, Wang Q, Zhang X, Zhu N. SSKM_DP: Differential Privacy Data Publishing Method via SFLA-Kohonen Network. Applied Sciences. 2023; 13(6):3823. https://doi.org/10.3390/app13063823
Chicago/Turabian StyleChu, Zhiguang, Jingsha He, Juxia Li, Qingyang Wang, Xing Zhang, and Nafei Zhu. 2023. "SSKM_DP: Differential Privacy Data Publishing Method via SFLA-Kohonen Network" Applied Sciences 13, no. 6: 3823. https://doi.org/10.3390/app13063823
APA StyleChu, Z., He, J., Li, J., Wang, Q., Zhang, X., & Zhu, N. (2023). SSKM_DP: Differential Privacy Data Publishing Method via SFLA-Kohonen Network. Applied Sciences, 13(6), 3823. https://doi.org/10.3390/app13063823