Scalable Multi-Party Privacy-Preserving Gradient Tree Boosting over Vertically Partitioned Dataset with Outsourced Computations
Abstract
1. Introduction
- We propose sub-protocols based on an additive HE scheme used to perform primitive secure operations during a machine learning task. The sub-protocols are collaboratively executed by two non-colluding servers.
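The additive homomorphism these sub-protocols rely on can be illustrated with a toy Paillier-style cryptosystem: multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts. This is only a sketch of the additive property; the paper's actual scheme is BCP, which additionally offers a double-trapdoor decryption for the two-server setting, and the key sizes below are far too small for any real use.

```python
import random
from math import gcd

# Toy Paillier-style additively homomorphic encryption (illustration only).
P, Q = 1789, 1861            # tiny demo primes -- never use sizes like this in practice
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # lcm(p-1, q-1)
MU = pow(LAM, -1, N)         # valid because the generator is g = N + 1

def enc(m):
    """Encrypt m < N as (1+N)^m * r^N mod N^2 with a random r coprime to N."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def dec(c):
    """Decrypt via L(c^lambda mod N^2) * mu mod N, where L(u) = (u-1)/N."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N

def add(c1, c2):
    """Homomorphic addition: ciphertext multiplication modulo N^2."""
    return (c1 * c2) % N2
```

A quick check of the additive property: `dec(add(enc(7), enc(35)))` recovers `42` without either plaintext being exposed to the party performing `add`, which is exactly the capability the servers exploit when aggregating encrypted gradients.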
- We design a novel scalable and privacy-preserving multi-party XGBoost training algorithm and a corresponding prediction algorithm. The algorithms are constructed under the semi-honest security assumption and there is no limit on the number of participants involved.
- We conduct experiments using real-world datasets to demonstrate the effectiveness and efficiency of our proposed framework.
2. Related Work
2.1. Horizontal Multi-Party Gradient Tree Boosting Frameworks
2.2. Vertical Multi-Party Gradient Tree Boosting Frameworks
2.3. Generic Multi-Party Gradient Tree Boosting Frameworks
3. Preliminaries
3.1. XGBoost
3.2. Homomorphic Encryption (HE)
3.3. BCP Sub-Protocols
4. Proposed Computation Primitives
4.1. Div
4.2. Sargmax
Algorithm 1: Secure argmax Computation Algorithm

function Sargmax(dict)
    // Input: dict is a dictionary of encrypted values
    max = None, maxKey = None
    for key in dict.keys() do
        if max == None then
            max = dict[key], maxKey = key
        else
            // Securely compare the encrypted values with the LGT sub-protocol
            if LGT(max, dict[key]) indicates dict[key] > max then
                max = dict[key], maxKey = key
            end
        end
    end
    return maxKey
end
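In plaintext form, the control flow of Sargmax can be sketched as below. Here `lgt()` is an ordinary comparison standing in for the two-server LGT sub-protocol, so no ciphertexts appear; the names are illustrative.

```python
def lgt(a, b):
    # Stand-in for the secure greater-than sub-protocol LGT(a, b):
    # in the real protocol both operands stay encrypted throughout.
    return a > b

def sargmax(d):
    """Return the key of the (secretly) largest value in dict d."""
    max_key, max_val = None, None
    for key, val in d.items():
        if max_val is None or lgt(val, max_val):
            max_key, max_val = key, val
    return max_key
```

For example, `sargmax({'a': 3, 'b': 9, 'c': 1})` returns `'b'`; in the secure version, Server C learns only this identifying key, never the encrypted gains themselves.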
5. Overview of Our Proposed Framework
5.1. Entities of the Framework
5.2. Security Assumptions
5.3. Workflow of Our Proposed Scheme
Algorithm 2: Scalable and Secure XGBoost Training

function SxgbTrain(X, Y)
    // Input: X is an aggregation of the feature sets from the n participants
    // Input: Y is the label set borne by the LBP
    for each boosting round t do
        if TreeList == Empty then
            // The first tree is built in plaintext by the LBP (Algorithm 3)
            LbpXgbTrain()
        else
            Compute ⟦G⟧ and ⟦H⟧   // encrypted first and second derivatives
            // Construct a tree using ⟦G⟧ and ⟦H⟧, and predict with it (Algorithm 4)
            SBuildSPredTree(⟦G⟧, ⟦H⟧)
            Append the new tree to TreeList
        end
    end
    return TreeList
end
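The round structure of the training loop can be sketched in plaintext as follows. Each "tree" here is reduced to a single constant fitted to the residuals (a depth-0 tree), which is enough to show the pattern: compute the derivative sums from the current predictions, fit a model, update the predictions. The function name and the initial prediction of 0.0 are simplifications for illustration; the paper's LbpXgbTrain instead starts from a computed base score.

```python
def sxgb_train_sketch(y, rounds=10, lr=0.5, lam=1.0):
    """Plaintext sketch of the boosting loop with depth-0 trees."""
    pred = [0.0] * len(y)      # simplified initial prediction
    trees = []
    for _ in range(rounds):
        # Squared-error loss: g_i = pred_i - y_i and h_i = 1
        G = sum(p - t for p, t in zip(pred, y))   # aggregate first derivative
        H = float(len(y))                         # aggregate second derivative
        w = -G / (H + lam)                        # single leaf weight, as in XGBoost
        trees.append(w)
        pred = [p + lr * w for p in pred]         # update the prediction matrix
    return trees, pred
```

Running this on `y = [0.0, 10.0]` drives the predictions toward the label mean over the rounds, mirroring how each encrypted round in Algorithm 2 refines the previous rounds' predictions.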
6. Scalable and Secure Multi-Party XGBoost Building
6.1. The First Secure Tree Building by LBP
Algorithm 3: The First Tree Building by the LBP

function LbpXgbTrain(X, Y)
    // Input: X is the feature set borne by the LBP
    // Input: Y is the label set
    Compute the base score
    TreeList = []
    Make the initial prediction from the base score
    Compute G and H   // plaintext first and second derivatives
    Construct a tree using G and H
    Predict using the tree and update the prediction matrix
    // Encrypt the base score and the tree node values, and update the model list
    Append the encrypted tree to TreeList
    // Encrypt the label information and the updated prediction matrix
    return TreeList, Enc(label information), Enc(prediction matrix)
end
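The "Compute G and H" step can be sketched for binary logistic loss, the standard XGBoost objective for classification; these per-instance values are what subsequently get encrypted as ⟦G⟧ and ⟦H⟧ for the later rounds. The function name is illustrative.

```python
import math

def grad_hess_logistic(preds, labels):
    """First and second derivatives of the logistic loss w.r.t. raw scores."""
    g, h = [], []
    for f, y in zip(preds, labels):
        p = 1.0 / (1.0 + math.exp(-f))   # sigmoid of the raw score
        g.append(p - y)                  # first derivative:  p - y
        h.append(p * (1.0 - p))          # second derivative: p(1 - p)
    return g, h
```

At a raw score of 0 the predicted probability is 0.5, so an instance with label 1 contributes g = -0.5 and h = 0.25; summing these over an instance subset gives the G and H aggregates used for splits and leaf weights.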
6.2. Secure BuildTree and PredTree
Algorithm 4: Secure Build and Pred Tree

function SBuildSPredTree(⟦G⟧, ⟦H⟧)
    // Input: ⟦G⟧ and ⟦H⟧, the encrypted first and second derivatives
    /* Computed by Server C */
    if RootNode == None then
        RootNode = CurrentNode   // register the current node as the root node
    end
    /* Computed at each Participant i */
    foreach feature j do
        Propose split candidates
    end
    foreach split candidate do
        Compute the encrypted left-branch sums ⟦G_l⟧ and ⟦H_l⟧
        Create a lookup table and record the feature and threshold in the table
        Create a tuple (⟦G_l⟧, ⟦H_l⟧)
    end
    Send the tuples and their lookup indices to Server C
    /* Computed by Server C */
    opt = SSplitNode(⟦G⟧, ⟦H⟧)   // Algorithm 5
    /* Computed at the optimal Participant */
    Receive the optimal index opt from Server C
    Check the lookup table for the feature and threshold associated with opt
    Partition the instance space I based on the threshold
    Record the instance space partition with Server C
end
6.3. Secure Node Split Decision
Algorithm 5: Securely Finding the Node Split

function SSplitNode(⟦G⟧, ⟦H⟧, …)
    /* Collaboratively computed by Server S and Server C */
    if CurrentNode == LeafNode then
        // Compute the weight of the leaf node with the Div sub-protocol
        ⟦w⟧ = Div(⟦−G⟧, ⟦H + λ⟧)
        return ⟦w⟧
    end
    Compute the CurrentNode's gain from ⟦G⟧ and ⟦H⟧
    gainDict = {}   // initialize the gain dictionary
    for each Participant p do
        for each feature j of Participant p do
            for each proposed threshold k do
                Receive ⟦G_l⟧ and ⟦H_l⟧ from the Participant
                ⟦G_r⟧ = ⟦G⟧ − ⟦G_l⟧   // first derivative sum for the right branch
                ⟦H_r⟧ = ⟦H⟧ − ⟦H_l⟧   // second derivative sum for the right branch
                Compute the gain of the candidate split
                gainDict[(p, j, k)] = encrypted gain   // update the gain dictionary
            end
        end
    end
    (p, j, k) = Sargmax(gainDict)   // Algorithm 1
    Return the optimal (j, k) to the optimal participant p
    Server C receives the instance partition from the optimal participant p
    Server C partitions its instance space into I_l and I_r
    Server C associates the CurrentNode with the optimal participant p
end
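The quantities enumerated in the three nested loops follow the standard XGBoost split-gain and leaf-weight formulas. A plaintext sketch (λ and γ are the usual regularization parameters; in the protocol these computations run over the encrypted sums ⟦G⟧, ⟦H⟧, ⟦G_l⟧, ⟦H_l⟧):

```python
def split_gain(G, H, G_l, H_l, lam=1.0, gamma=0.0):
    """XGBoost gain of splitting a node with totals (G, H) at a candidate
    whose left-branch sums are (G_l, H_l); the right branch is obtained by
    subtraction from the node totals, as in Algorithm 5."""
    G_r, H_r = G - G_l, H - H_l
    return 0.5 * (G_l**2 / (H_l + lam)
                  + G_r**2 / (H_r + lam)
                  - G**2 / (H + lam)) - gamma

def leaf_weight(G, H, lam=1.0):
    # The leaf value computed with the Div sub-protocol: w = -G / (H + lam)
    return -G / (H + lam)
```

For instance, a node with G = 0 and H = 2 split perfectly into branches with (G_l, H_l) = (-1, 1) yields a gain of 0.5 for λ = 1 and γ = 0; the server-side Sargmax picks the candidate maximizing this quantity without decrypting any of the gains.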
7. Secure Prediction
Algorithm 6: Secure Prediction Algorithm

function SPredict(TreeList, x)
    // Input: TreeList and the record x with its features
    /* Computed by all the entities */
    Client c encrypts the values of the record
    Send the encrypted record to Server C
    while True do
        // Start from the RootNode
        if CurrentNode != LeafNode then
            Server C identifies the feature j for the split at CurrentNode
            Server C identifies the Participant p bearing feature j
            // Transform the encryption to be under the public key of Participant p
            Server C sends the encrypted feature value to Server S
            Server S partially decrypts it using mDec
            Server S re-encrypts it under Participant p's public key and sends it to Server C
            Server C extracts the transformed ciphertext
            // Collaborate with the participant bearing the optimal feature for the current node
            Server C sends the ciphertext to Participant p
            Participant p decrypts the feature value and compares it with the threshold value at the node
            Based on whether the value is greater or less than the threshold, Participant p decides on the tree branch to follow
            Participant p forwards the decision to Server C to continue the process at NextNode
            CurrentNode = NextNode   // update the CurrentNode
        else
            return the encrypted weight ⟦w⟧ of the LeafNode
        end
    end
end
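The routing logic of this traversal can be sketched in plaintext as below. Each internal node is a dict naming the participant-held feature and threshold, and `compare()` stands in for the decrypt-and-compare step performed by the participant bearing that feature; the key-switching between servers is omitted, and all names are illustrative.

```python
def s_predict_sketch(tree, record, compare=lambda v, t: v < t):
    """Descend one tree; only the branch decisions drive the traversal."""
    node = tree
    while 'weight' not in node:          # descend until a leaf is reached
        v = record[node['feature']]      # value the participant would decrypt
        node = node['left'] if compare(v, node['threshold']) else node['right']
    return node['weight']                # returned encrypted as [[w]] in the paper
```

Note that Server C never sees the feature value or the threshold here, only which branch to follow, which is exactly the information Participant p forwards at each step.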
8. Analysis
8.1. Security Analysis
8.1.1. Security of BCP Sub-Protocols
8.1.2. Security of SSXGB
8.2. Communication Overhead Analysis
9. Performance Evaluation
9.1. Experimental Setup
9.1.1. Hardware and Software
9.1.2. Datasets
9.2. Evaluation of SSXGB
9.3. Privacy-Preservation Computation Overhead
9.4. Efficiency of SSXGB
10. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Details of the LGT Sub-Protocol
Appendix B. Proof of Correctness for the Div Sub-protocol
References
- Gong, M.; Feng, J.; Xie, Y. Privacy-enhanced multi-party deep learning. Neural Netw. 2020, 121, 484–496.
- Cock, M.d.; Dowsley, R.; Nascimento, A.C.; Newman, S.C. Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, Denver, CO, USA, 12–16 October 2015; pp. 3–14.
- Hall, R.; Fienberg, S.E.; Nardi, Y. Secure multiple linear regression based on homomorphic encryption. J. Off. Stat. 2011, 27, 669.
- Kim, M.; Song, Y.; Wang, S.; Xia, Y.; Jiang, X. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Med. Inform. 2018, 6, e19.
- Aono, Y.; Hayashi, T.; Phong, L.T.; Wang, L. Privacy-preserving logistic regression with distributed data sources via homomorphic encryption. IEICE Trans. Inf. Syst. 2016, 99, 2079–2089.
- Zheng, L.; Chen, C.; Liu, Y.; Wu, B.; Wu, X.; Wang, L.; Wang, L.; Zhou, J.; Yang, S. Industrial scale privacy preserving deep neural network. arXiv 2020, arXiv:2003.05198.
- Phong, L.T.; Phuong, T.T. Privacy-preserving deep learning via weight transmission. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3003–3015.
- Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Minastireanu, E.A.; Mesnita, G. LightGBM machine learning algorithm to online click fraud detection. J. Inform. Assur. Cybersecur. 2019, 2019, 263928.
- Punmiya, R.; Choe, S. Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing. IEEE Trans. Smart Grid 2019, 10, 2326–2329.
- Wang, Y.; Feng, D.; Li, D.; Chen, X.; Zhao, Y.; Niu, X. A mobile recommendation system based on logistic regression and gradient boosting decision trees. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1896–1902.
- Li, Q.; Wen, Z.; He, B. Practical federated gradient boosting decision trees. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 4642–4649.
- Ong, Y.J.; Zhou, Y.; Baracaldo, N.; Ludwig, H. Adaptive histogram-based gradient boosted trees for federated learning. arXiv 2020, arXiv:2012.06670.
- Liu, Y.; Ma, Z.; Liu, X.; Ma, S.; Nepal, S.; Deng, R. Boosting privately: Privacy-preserving federated extreme boosting for mobile crowdsensing. arXiv 2019, arXiv:1907.10218.
- Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Yang, Q. SecureBoost: A lossless federated learning framework. arXiv 2019, arXiv:1901.08755.
- Feng, Z.; Xiong, H.; Song, C.; Yang, S.; Zhao, B.; Wang, L.; Chen, Z.; Yang, S.; Liu, L.; Huan, J. SecureGBM: Secure multi-party gradient boosting. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 1312–1321.
- Fang, W.; Chen, C.; Tan, J.; Yu, C.; Lu, Y.; Wang, L.; Wang, L.; Zhou, J.; Liu, A.X. A hybrid-domain framework for secure gradient tree boosting. arXiv 2020, arXiv:2005.08479.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Phong, L.T.; Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1333–1345.
- Shokri, R.; Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1310–1321.
- Leung, C. Towards Privacy-Preserving Collaborative Gradient Boosted Decision Tree Learning. 2020. Available online: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-100.pdf (accessed on 4 August 2021).
- Yang, M.; Song, L.; Xu, J.; Li, C.; Tan, G. The tradeoff between privacy and accuracy in anomaly detection using federated XGBoost. arXiv 2019, arXiv:1907.07157.
- Wang, Z.; Yang, Y.; Liu, Y.; Liu, X.; Gupta, B.B.; Ma, J. Cloud-based federated boosting for mobile crowdsensing. arXiv 2020, arXiv:2005.05304.
- Deforth, K.; Desgroseilliers, M.; Gama, N.; Georgieva, M.; Jetchev, D.; Vuille, M. XORBoost: Tree boosting in the multiparty computation setting. Cryptol. ePrint Arch. Report 2021/432. 2021. Available online: https://eprint.iacr.org/2021/432 (accessed on 15 June 2022).
- Carpov, S.; Deforth, K.; Gama, N.; Georgieva, M.; Jetchev, D.; Katz, J.; Leontiadis, I.; Mohammadi, M.; Sae-Tang, A.; Vuille, M. Manticore: Efficient framework for scalable secure multiparty computation protocols. Cryptol. ePrint Arch. Paper 2021/200. 2021. Available online: https://eprint.iacr.org/2021/200 (accessed on 15 June 2022).
- Bresson, E.; Catalano, D.; Pointcheval, D. A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Taipei, Taiwan, 30 November–4 December 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 37–54.
- Peter, A.; Tews, E.; Katzenbeisser, S. Efficiently outsourcing multiparty computation under multiple keys. IEEE Trans. Inf. Forensics Secur. 2013, 8, 2046–2058.
- Liu, X.; Deng, R.H.; Choo, K.R.; Weng, J. An efficient privacy-preserving outsourced calculation toolkit with multiple keys. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2401–2414.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. Available online: http://proceedings.mlr.press/v54/mcmahan17a (accessed on 20 August 2021).
- Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 1–11.
- Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.
- Fisher, R.A. Iris Data Set. 1936. Available online: https://archive.ics.uci.edu/ml/datasets/iris (accessed on 7 June 2021).
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Chen, T.; Zhong, S. Privacy-preserving backpropagation neural network learning. IEEE Trans. Neural Netw. 2009, 20, 1554–1564.
- Kim, J.W.; Edemacu, K.; Kim, J.S.; Chung, Y.D.; Jang, B. A survey of differential privacy-based techniques and their applicability to location-based services. Comput. Secur. 2021, 111, 102464.
| Operation | 512-Bit Key | 1024-Bit Key | 2048-Bit Key |
|---|---|---|---|
| Enc | 10.61 ms | 12.02 ms | 16.98 ms |
| Dec | 18.27 ms | 29.13 ms | 38.64 ms |
| mDec | 29.34 ms | 40.96 ms | 63.72 ms |
| KeyProd | 0.02 ms | 0.05 ms | 0.11 ms |
| Add | 8.86 ms | 10.23 ms | 12.61 ms |
| Mult | 180.74 ms | 259.97 ms | 310.03 ms |
| TransDec | 161.20 ms | 196.69 ms | 283.74 ms |
| Exp | 2.04 ms | 2.32 ms | 4.01 ms |
| Sub | 8.93 ms | 12.06 ms | 13.02 ms |
| LGT | 120.17 ms | 143.72 ms | 171.05 ms |
| Div | 140.29 ms | 183.18 ms | 251.67 ms |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Edemacu, K.; Kim, J.W. Scalable Multi-Party Privacy-Preserving Gradient Tree Boosting over Vertically Partitioned Dataset with Outsourced Computations. Mathematics 2022, 10, 2185. https://doi.org/10.3390/math10132185