Parallelly Running and Privacy-Preserving k-Nearest Neighbor Classification in Outsourced Cloud Computing Environments
Abstract
:1. Introduction
Contributions
- We propose a secure comparison (SCI) protocol to solve the information disclosure problem in existing works.
- Using the secure comparison, we propose new secure k-largest/smallest element (SkLE/SkSE) protocols, which solve the information disclosure problem and hide data access patterns.
- Using the proposed SkLE/SkSE, we implement a privacy-preserving k-nearest neighbour classification (PkNC) protocol.
- We prove the securities of the proposed protocols formally and demonstrate that the proposed protocols are practical through PkNC experiments with real datasets. In other words, PkNC is suitable for big data analysis to handle large-scale dataset and have large k of nearest neighbors, since it is executed for each dataset in parallel.
2. Related Works
3. Preliminaries
3.1. System Model
3.2. Adversary Model and Security Definitions
3.3. Paillier Cryptosystem
3.4. Performance Evaluation
3.5. Notation
3.6. Referenced Functionalities
4. Proposed Secure Comparison and SkLE/SkSE Protocols
4.1. Secure Comparison and Inequality (SCI) Protocol
Algorithm 1: Secure Comparison and Inequality (SCI) |
4.2. Proof of SCI Protocol
- (1)
- The view of DH in the real protocol is as follows.
- Simulation
- -
- The simulator chooses values , and uniformly at random, where and .
- -
- The simulator defines as the view of DH.
- -
- The simulator outputs the view of DH and halts.
- (2)
- The view of CSP in the real protocol is as follows.
- Simulation
- -
- The simulator tosses a random coin .
- -
- If a random coin c is 0, the simulator sets to 0 and chooses uniformly at random where .
- -
- If a random coin c is 1, the simulator chooses uniformly at random where .
- -
- The simulator permutes uniformly at random and sets them to .
- -
- The simulator computes .
- -
- The simulator defines as the view of CSP.
- -
- The simulator outputs the view of CSP and halts.
- (3)
- As explained earlier, the result of satisfies the condition . As shown in Table 2, when () and DH chooses a random coin (resp., ), then CSP returns (resp., ). On the contrary, when () and DH chooses a random coin (resp., ), CSP returns (resp., ). Therefore, the result of is the same as that of . As for , if ; otherwise, (line 10). If s is equal to k (i.e., all is 0), returns (line 14) and (line 39); otherwise if s is unequal to k (i.e., there is an element with 1 in a vector x), returns (line 14) and (line 39). In other words, the result of is the same as that of . Therefore, the output of is computationally indistinguishable from that of . □
4.3. Secure Version of SLE/SSE (SLE/SSE)
Algorithm 2: Secure version of SkLE (SkLE) |
4.4. Proof of SLE Protocol
- (1)
- We define DH’s view of the real protocol as follows.
- Simulation
- -
- The simulator chooses values , , , , , , , , uniformly at random where the values are in .
- -
- The simulator defines as the view of DH.
- -
- The simulator outputs DH’s view and halts.
- (2)
- Let compute auxiliary data for the j-th bit of all elements in the ()-th round (). It is clear that a predicted k-largest element will be larger than other elements for the j-th bit because, from ()-th bit to j-th bit, a predicted k-largest element has bit 1 at least once but the other elements are all 0. Predicted k-largest elements include k largest elements in the corresponding round. As shown in case 1 of Table 4, when , predicted k-largest elements are included to the set of k largest elements (). The prior k-largest elements remain the same since the predicted k-largest elements include them. When , predicted k-largest elements are included in the set of k largest elements () as shown in Case 3. In addition, the other candidate elements (i.e., unpredicted k-largest elements) are excluded from the set of candidate elements () as shown in Case 4, so that k largest elements remain the same as in case 5. Since all elements in the set of k largest elements are larger than the other elements, output of is computationally indistinguishable from that of functionality .
5. Implementation and Experimental Results of Privacy-Preserving -Nearest Neighbor Classification
5.1. Privacy-Preserving -Nearest Neighbor Classification
- Step 1: computing distances between an input query and data in a dataset.
- Step 2: selecting k smallest distances.
- Step 3: computing the majority class of k data corresponding to the k smallest distances.
5.2. Implementation and Experimental Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Secure Zero Protocol (SZP)
Algorithm A1: Secure Zero Protocol (SZP) |
Input | () | () | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
i | 1 | 2 | 3 | 4 | 5 | 6 | i | 1 | 2 | 3 | 4 | 5 | 6 | |
(Case 1) | ||||||||||||||
(Case 2) | ||||||||||||||
(Case 3) | ||||||||||||||
Appendix B. Privacy-Preserving k-Nearest Neighbor Classification (PkNC)
Algorithm A2: Privacy-preserving k-Nearest Neighbor Classification (PkNC) |
References
- Wu, X.; Zhu, X.; Wu, G.Q.; Ding, W. Data mining with big data. IEEE Trans. Knowl. Data Eng. 2013, 26, 97–107. [Google Scholar]
- Beam, A.L.; Kohane, I.S. Big data and machine learning in health care. JAMA 2018, 319, 1317–1318. [Google Scholar] [CrossRef] [PubMed]
- Hashem, I.A.T.; Yaqoob, I.; Anuar, N.B.; Mokhtar, S.; Gani, A.; Khan, S.U. The rise of “big data” on cloud computing: Review and open research issues. Inf. Syst. 2015, 47, 98–115. [Google Scholar] [CrossRef]
- Acar, A.; Aksu, H.; Uluagac, A.S.; Conti, M. A survey on homomorphic encryption schemes: Theory and implementation. ACM Comput. Surv. (CSUR) 2018, 51, 1–35. [Google Scholar] [CrossRef]
- Price, W.N.; Cohen, I.G. Privacy in the age of medical big data. Nat. Med. 2019, 25, 37–43. [Google Scholar] [CrossRef]
- Mehmood, A.; Natgunanathan, I.; Xiang, Y.; Hua, G.; Guo, S. Protection of big data privacy. IEEE Access 2016, 4, 1821–1834. [Google Scholar] [CrossRef] [Green Version]
- Botta, A.; De Donato, W.; Persico, V.; Pescapé, A. Integration of cloud computing and internet of things: A survey. Future Gener. Comput. Syst. 2016, 56, 684–700. [Google Scholar] [CrossRef]
- Li, F.; Shin, R.; Paxson, V. Exploring privacy preservation in outsourced k-nearest neighbors with multiple data owners. In Proceedings of the 2015 ACM Workshop on Cloud Computing Security Workshop, New York, NY, USA, 16 October 2015; pp. 53–64. [Google Scholar]
- Bost, R.; Popa, R.A.; Tu, S.; Goldwasser, S. Machine learning classification over encrypted data. In Proceedings of the NDSS, San Diego, CA, USA, 8–11 February 2015; Volume 4324, p. 4325. [Google Scholar]
- Park, J.; Lee, D.H. Parallelly running k-nearest neighbor classification over semantically secure encrypted data in outsourced environments. IEEE Access 2020, 8, 64617–64633. [Google Scholar] [CrossRef]
- Samanthula, B.K.; Elmehdwi, Y.; Jiang, W. K-nearest neighbor classification over semantically secure encrypted relational data. IEEE Trans. Knowl. Data Eng. 2014, 27, 1261–1273. [Google Scholar] [CrossRef] [Green Version]
- Du, J.; Bian, F. A Privacy-Preserving and Efficient k-nearest neighbor query and classification scheme based on k-dimensional tree for outsourced data. IEEE Access 2020, 8, 69333–69345. [Google Scholar] [CrossRef]
- Lian, H.; Qiu, W.; Yan, D.; Huang, Z.; Tang, P. Efficient and secure k-nearest neighbor query on outsourced data. Peer-to-Peer Netw. Appl. 2020, 13, 2324–2333. [Google Scholar] [CrossRef]
- Song, F.; Qin, Z.; Liu, Q.; Liang, J.; Ou, L. Efficient and Secure k-Nearest Neighbor Search Over Encrypted Data in Public Cloud. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6. [Google Scholar]
- Sun, M.; Yang, R. An efficient secure k nearest neighbor classification protocol with high-dimensional features. Int. J. Intell. Syst. 2020, 35, 1791–1813. [Google Scholar] [CrossRef]
- Haque, R.U.; Hasan, A.; Jiang, Q.; Qu, Q. Privacy-preserving K-nearest neighbors training over blockchain-based encrypted health data. Electronics 2020, 9, 2096. [Google Scholar]
- Elmehdwi, Y.; Samanthula, B.K.; Jiang, W. Secure k-nearest neighbor query over encrypted data in outsourced environments. In Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014; pp. 664–675. [Google Scholar]
- Rong, H.; Wang, H.M.; Liu, J.; Xian, M. Privacy-preserving k-nearest neighbor computation in multiple cloud environments. IEEE Access 2016, 4, 9589–9603. [Google Scholar] [CrossRef]
- Wu, W.; Liu, J.; Rong, H.; Wang, H.; Xian, M. Efficient k-nearest neighbor classification over semantically secure hybrid encrypted cloud database. IEEE Access 2018, 6, 41771–41784. [Google Scholar] [CrossRef]
- Wu, W.; Parampalli, U.; Liu, J.; Xian, M. Privacy preserving k-nearest neighbor classification over encrypted database in outsourced cloud environments. World Wide Web 2019, 22, 101–123. [Google Scholar] [CrossRef]
- Chen, H.; Chillotti, I.; Dong, Y.; Poburinnaya, O.; Razenshteyn, I.; Riazi, M.S. SANNS: Scaling up secure approximate k-nearest neighbors search. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Berkeley, CA, USA, 12–14 August 2020; pp. 2111–2128. [Google Scholar]
- Zhu, D.; Zhu, H.; Liu, X.; Li, H.; Wang, F.; Li, H.; Feng, D. CREDO: Efficient and privacy-preserving multi-level medical pre-diagnosis based on ML-kNN. Inf. Sci. 2020, 514, 244–262. [Google Scholar]
- Zheng, Y.; Lu, R.; Shao, J. Achieving efficient and privacy-preserving k-nn query for outsourced ehealthcare data. J. Med. Syst. 2019, 43, 1–13. [Google Scholar]
- Jagadish, H.V. Linear clustering of objects with multiple attributes. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 23–26 May 1990; pp. 332–342. [Google Scholar]
- Yang, S.; Tang, S.; Zhang, X. Privacy-preserving k nearest neighbor query with authentication on road networks. J. Parallel Distrib. Comput. 2019, 134, 25–36. [Google Scholar]
- Kolahdouzan, M.; Shahabi, C. Voronoi-based k nearest neighbor search for spatial network databases. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, ON, Canada, 31 August–3 September 2004; Volume 30, pp. 840–851. [Google Scholar]
- Wang, Y.; Tian, Z.; Sun, Y.; Du, X.; Guizani, N. LocJury: An IBN-based location privacy preserving scheme for IoCV. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5028–5037. [Google Scholar]
- Sun, Y.; Yin, L.; Sun, Z.; Tian, Z.; Du, X. An IoT data sharing privacy preserving scheme. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 984–990. [Google Scholar]
- Jia, M.; He, K.; Chen, J.; Du, R.; Chen, W.; Tian, Z.; Ji, S. PROCESS: Privacy-Preserving On-Chain Certificate Status Service. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications, Virtual, 10–13 May 2021; pp. 1–10. [Google Scholar]
- Raj, D.; Mohanasundaram, R. An efficient filter-based feature selection model to identify significant features from high-dimensional microarray data. Arab. J. Sci. Eng. 2020, 45, 2619–2630. [Google Scholar] [CrossRef]
- Goldreich, O. Foundations of Cryptography: Volume 2, Basic Applications; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
- Asharov, G.; Lindell, Y. A full proof of the BGW protocol for perfectly secure multiparty computation. J. Cryptol. 2017, 30, 58–151. [Google Scholar] [CrossRef] [Green Version]
- Canetti, R. Security and composition of multiparty cryptographic protocols. J. Cryptol. 2000, 13, 143–202. [Google Scholar] [CrossRef]
- Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; Springer: Berlin/Heidelberg, Germany, 1999; pp. 223–238. [Google Scholar]
- Samanthula, B.K.; Chun, H.; Jiang, W. An efficient and probabilistic secure bit-decomposition. In Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, Hangzhou, China, 8–10 May 2013; pp. 541–546. [Google Scholar]
- Bethencourt, J. Paillier Library. 2010. Available online: https://acsc.cs.utexas.edu/libpaillier/ (accessed on 11 December 2022).
- Intel. Intel Core i7-4790 Processor Specification. 2014. Available online: https://ark.intel.com/content/www/us/en/ark/products/80806/intel-core-i74790-processor-8m-cache-up-to-4-00-ghz.html (accessed on 11 December 2022).
- Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 11 December 2022).
- Marko Bohanec, B.Z. Car Evaluation Data Set. UCI Machine Learning Repository. 1997. Available online: https://archive.ics.uci.edu/ml/datasets/Car+Evaluation (accessed on 11 December 2022).
- Schlimmer, J. Mushroom Data Set. UCI Machine Learning Repository. 1987. Available online: https://archive.ics.uci.edu/ml/datasets/Mushroom (accessed on 11 December 2022).
Input | Functionality | j | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(Case 1) | 4 | 1 | ||||||||||||
3 | 1 | |||||||||||||
2 | 1 | |||||||||||||
1 | 0 | |||||||||||||
0 | 1 | |||||||||||||
4 | 1 | |||||||||||||
3 | 1 | |||||||||||||
2 | 1 | |||||||||||||
1 | 0 | |||||||||||||
0 | 1 | |||||||||||||
(Case 2) | 4 | 1 | ||||||||||||
3 | 1 | |||||||||||||
2 | 0 | |||||||||||||
1 | 1 | |||||||||||||
0 | 0 | |||||||||||||
4 | 1 | |||||||||||||
3 | 1 | |||||||||||||
2 | 0 | |||||||||||||
1 | 1 | |||||||||||||
0 | 0 | |||||||||||||
(Case 3) | 4 | 1 | ||||||||||||
3 | 1 | |||||||||||||
2 | 0 | |||||||||||||
1 | 1 | |||||||||||||
0 | 0 | |||||||||||||
4 | 1 | |||||||||||||
3 | 1 | |||||||||||||
2 | 0 | |||||||||||||
1 | 1 | |||||||||||||
0 | 0 |
j | 1 | ||
---|---|---|---|
· | · | · | · |
· | |||
7 | |||
6 | |||
5 | |||
4 | |||
3 | |||
2 | |||
1 | |||
0 | |||
2 |
Candidate () | Non-Candidate () | |||
---|---|---|---|---|
Predicted () | Unpredicted () | |||
(case 2) |
(case 5) | |||
(case 1) | ||||
(case 3) | (case 4) |
Step 1 | Step 2 | Step 3 | Total | ||
---|---|---|---|---|---|
SBD | SkLE | ||||
Ratio | 5% | 9% | 75% | 11% | 100% |
Communication Amount (Megabytes) | Network Delay (s) | |
---|---|---|
[11] | 154.78 | 123.82 |
54.72 | 43.77 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, J.; Lee, D.H. Parallelly Running and Privacy-Preserving k-Nearest Neighbor Classification in Outsourced Cloud Computing Environments. Electronics 2022, 11, 4132. https://doi.org/10.3390/electronics11244132
Park J, Lee DH. Parallelly Running and Privacy-Preserving k-Nearest Neighbor Classification in Outsourced Cloud Computing Environments. Electronics. 2022; 11(24):4132. https://doi.org/10.3390/electronics11244132
Chicago/Turabian StylePark, Jeongsu, and Dong Hoon Lee. 2022. "Parallelly Running and Privacy-Preserving k-Nearest Neighbor Classification in Outsourced Cloud Computing Environments" Electronics 11, no. 24: 4132. https://doi.org/10.3390/electronics11244132
APA StylePark, J., & Lee, D. H. (2022). Parallelly Running and Privacy-Preserving k-Nearest Neighbor Classification in Outsourced Cloud Computing Environments. Electronics, 11(24), 4132. https://doi.org/10.3390/electronics11244132