Breast Cancer Prediction Based on Differential Privacy and Logistic Regression Optimization Model
Abstract
1. Introduction
- (1) Improving the effect of breast cancer prediction. First, a new hybrid feature selection method is proposed to eliminate weakly correlated and redundant features. It proceeds in two steps: the first step retains the features whose Pearson correlation coefficient with the label has an absolute value greater than or equal to 0.3; the second step searches for the optimal feature combination with the iterative RF-OOB algorithm to obtain the final feature set (a code sketch follows this list). The BGD algorithm is then used to optimize LR, training the loss function to its minimum to improve the classification performance of the model. To verify the effectiveness of hybrid feature selection, a control-group experiment is set up for comparison.
- (2) Adding differential privacy protection to the breast cancer prediction process. In the BGD algorithm, Gaussian noise is added to the gradient at each iteration of gradient descent, so that the model retains accurate classification performance while protecting data privacy. Finally, the optimal results of the proposed model are compared with the results reported in other papers.
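The two-step hybrid feature selection in contribution (1) can be summarized in code. The following is a minimal sketch, not the authors' implementation: it assumes the features sit in a pandas DataFrame `X` with binary labels `y`, reads the iterative RF-OOB step as growing the candidate set in order of random forest importance, and scores each combination with the out-of-bag estimate. The function name `hybrid_feature_select` and all settings other than the 0.3 threshold are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def hybrid_feature_select(X: pd.DataFrame, y: pd.Series, corr_threshold: float = 0.3):
    # Step 1: keep features whose Pearson correlation with the label
    # has absolute value >= corr_threshold.
    corr = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
    kept = corr[corr >= corr_threshold].index.tolist()

    # Step 2 (iterative RF-OOB): rank the surviving features by random
    # forest importance, grow the candidate set one feature at a time,
    # and score each combination with the out-of-bag (OOB) estimate.
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X[kept], y)
    ranked = [kept[i] for i in np.argsort(rf.feature_importances_)[::-1]]

    best_score, best_subset = -np.inf, ranked[:1]
    for k in range(1, len(ranked) + 1):
        candidate = ranked[:k]
        rf_k = RandomForestClassifier(n_estimators=200, oob_score=True,
                                      random_state=0).fit(X[candidate], y)
        if rf_k.oob_score_ > best_score:
            best_score, best_subset = rf_k.oob_score_, candidate
    return best_subset, best_score
```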
2. Methods and Materials
2.1. Differential Privacy
2.2. Pearson Correlation Coefficient Test
2.3. Random Forest Algorithm Based on Out-of-Bag Estimation (RF-OOB)
2.4. Logistic Regression (LR)
2.5. Batch Gradient Descent (BGD)
3. Selection of Indicators for the Evaluation
- (1) Accuracy: the proportion of all predictions, over both the positive and negative classes, that are correct; it can be expressed as Accuracy = (TP + TN)/(TP + TN + FP + FN). For breast cancer prediction, a high accuracy indicates that the model correctly classifies both malignant and benign tumors. Accuracy is justified because it gives an overall assessment of classification performance and helps determine the model's ability to discriminate between the two types of tumors.
- (2) Precision: the fraction of samples predicted to be positive that are truly positive; it can be expressed as Precision = TP/(TP + FP), and is also known as the positive predictive value (PPV).
- (3) Recall: the fraction of truly positive samples that are predicted correctly; it can be expressed as Recall = TP/(TP + FN), and is also known as the true positive rate (TPR).
- (4) F1-score: the harmonic mean of precision and recall, expressed as F1 = 2 × Precision × Recall/(Precision + Recall). For breast cancer prediction, the F1-score is a reasonable composite indicator because it balances the model's ability to correctly classify malignant and benign tumors, evaluating precision and recall jointly, which makes it one of the most important evaluation indicators.
- (5) The receiver operating characteristic (ROC) curve: the horizontal axis of the ROC curve is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR); points closer to (0, 1) correspond to better classification performance. The AUC is the area under the ROC curve, between 0 and 1; as a single number it allows classifiers to be compared directly, the larger the better. When AUC = 1, the classifier is perfect. When 0.5 < AUC < 1, the model is better than random guessing. When AUC = 0.5, the model is equivalent to random guessing and has no predictive value. When AUC < 0.5, the model is less predictive than random guessing. (A code sketch of all five indicators follows this list.)
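All five indicators are available off the shelf in scikit-learn. A minimal sketch, assuming binary labels with the positive class encoded as 1 and `y_score` holding the predicted probabilities of the positive class:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Compute the five evaluation indicators used in this section."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),  # PPV
        "recall":    recall_score(y_true, y_pred),     # TPR
        "f1":        f1_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_score),   # area under the ROC curve
    }
```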
4. Data Preprocessing
4.1. Introduction to Data
4.2. Data Standardization
5. A Logistic Regression Optimization Model Based on Hybrid Feature Selection and Differential Privacy
5.1. Hybrid Feature Selection
5.2. Logistic Regression Optimization Model Based on Batch Gradient Descent (BGD-LR)
Algorithm 1 BGD-LR algorithm
Input: the dataset filtered by hybrid feature selection; initialize the model parameters θ.
Output: prediction results.
1. Take the partial derivative of the loss function and compute the gradient over the full training set of samples.
2. Update the model parameters according to Equation (10).
3. Repeat steps 1 and 2 until the specified number of iterations is reached, and return θ.
4. Compute the predicted classification results: calculate the predicted values from the updated θ according to Equation (8), and output the classification results.
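A minimal NumPy sketch of the loop in Algorithm 1. The learning rate and iteration count are placeholders, and the comments map the steps to the paper's Equations (8) and (10) only by position, since those equations are not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bgd_lr(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (n, d) feature matrix; y: (n,) labels in {0, 1}."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        # Step 1: gradient of the cross-entropy loss over the full batch.
        grad = X.T @ (sigmoid(X @ theta) - y) / n
        # Step 2: parameter update (the role of Equation (10)).
        theta -= lr * grad
    # Step 4: predicted labels from the sigmoid (the role of Equation (8)).
    return theta, (sigmoid(X @ theta) >= 0.5).astype(int)
```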
5.3. Logistic Regression Optimization Model for Batch Gradient Descent with Differential Privacy (BDP-LR)
Algorithm 2 BDP-LR algorithm
Input: the dataset filtered by hybrid feature selection; initialize the model parameters θ.
Output: prediction results.
1. Take the partial derivative of the loss function and compute the per-sample gradients over the full training set of samples.
2. Add Gaussian noise to a single step of gradient descent: choose a suitable privacy budget ε. According to Equations (11) and (12), use the L2 sensitivity upper bound b obtained by the gradient clipping technique to calibrate the Gaussian noise, and obtain the noisy gradient according to Equation (13).
3. Add Gaussian noise to the BGD algorithm: (a) based on step 1, add noise to the gradients and sum the noisy gradient values; (b) compute a Gaussian-noised count of the number of training samples (sensitivity of 1); (c) divide the noisy gradient sum from (a) by the noisy count from (b).
4. Compute the predicted classification results: calculate the predicted values from the updated θ according to Equation (8), and output the classification results.
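A minimal sketch of one noisy update in the style of Algorithm 2. It clips per-sample gradients to an L2 bound b, perturbs the gradient sum (sensitivity b) and the sample count (sensitivity 1) with the classic Gaussian-mechanism calibration σ = √(2 ln(1.25/δ))·Δ/ε, and divides the two noisy quantities. The paper's exact calibration in Equations (11)–(13) and the accounting of the total budget across iterations are not reproduced here, so treat this as illustrative only:

```python
import numpy as np

def gaussian_sigma(eps, delta, sensitivity):
    # Classic Gaussian mechanism: sigma = sqrt(2 ln(1.25/delta)) * sensitivity / eps.
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / eps

def bdp_lr_step(X, y, theta, lr=0.1, clip=1.0, eps=0.5, delta=1e-5, rng=None):
    """One differentially private batch gradient descent step for logistic regression."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Per-sample gradients of the logistic loss: x_i * (sigmoid(x_i . theta) - y_i).
    err = 1.0 / (1.0 + np.exp(-(X @ theta))) - y
    grads = X * err[:, None]                                   # shape (n, d)
    # Clip each per-sample gradient to L2 norm at most `clip` (the bound b).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip)
    # Noisy gradient sum (sensitivity = clip) and noisy sample count (sensitivity = 1).
    noisy_sum = grads.sum(axis=0) + rng.normal(0.0, gaussian_sigma(eps, delta, clip), d)
    noisy_count = n + rng.normal(0.0, gaussian_sigma(eps, delta, 1.0))
    # Average the noisy sum by the noisy count, then take the descent step.
    return theta - lr * noisy_sum / noisy_count
```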
5.4. Privacy Analysis of the BDP-LR Algorithm
6. Experimental Results and Analysis
6.1. Experimental Environment and Model Hyperparameters
- Firstly, select a range of candidate values for each hyperparameter.
- Then, evaluate the performance of each candidate setting with cross-validation.
- Finally, select the best-performing setting as the optimal parameter combination (a minimal sketch of this search follows).
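This is the standard grid search with cross-validation; scikit-learn's GridSearchCV is one way to express it. A minimal sketch with illustrative parameter ranges, assuming `X_train` and `y_train` already exist:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Candidate value ranges per hyperparameter (illustrative, not the paper's grid).
param_grid = {"penalty": ["l2"], "C": [0.01, 0.1, 1, 10], "solver": ["liblinear"]}

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # X_train, y_train: standardized training data
print(search.best_params_, search.best_score_)
```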
6.2. Experimental Design
- To reduce the influence of differing data magnitudes on the model, the data are subjected to Z-score standardization in this paper (a sketch follows this list).
- To eliminate weakly correlated variables and redundant features from the breast cancer data, hybrid feature selection is applied to the data.
- To test the optimization effect of the BGD algorithm on LR, the loss-function curve of BGD-LR is analyzed.
- To verify the impact of the hybrid feature selection algorithm on model performance, a control-group experiment is set up, and the results are analyzed with the four main evaluation indicators: accuracy, precision, recall, and F1-score.
- Without considering privacy protection, the breast cancer classification model proposed in this paper is compared with existing research results from other papers.
- When privacy protection is considered, the prediction results of the BDP-LR model are compared with those of other machine learning models based on differential privacy.
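For the first design point, Z-score standardization maps each feature x to z = (x − μ)/σ. A minimal sketch with scikit-learn, assuming train/test splits `X_train` and `X_test` already exist; fitting the scaler on the training split only avoids leaking test-set statistics:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                     # learns per-feature mean and std
X_train_std = scaler.fit_transform(X_train)   # fit on training data only
X_test_std = scaler.transform(X_test)         # reuse training statistics
```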
6.3. Analysis of Experimental Results
6.3.1. Results of Data Standardization
6.3.2. Results of Hybrid Feature Selection
6.3.3. Loss Function of the BGD-LR Model
6.3.4. Impact of Hybrid Feature Selection Algorithms on Model Performance
6.3.5. Comparative Analysis with Previous Studies
6.3.6. Comparative Analysis of BDP-LR Model Results with Other Models
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, Y.; Yan, Q.; Fan, C.; Mo, Y.; Wang, Y.; Li, X.; Liao, Q.; Guo, C.; Li, G.; Zeng, Z.; et al. Overview and countermeasures of cancer burden in China. Sci. China Life Sci. 2023, 66, 1–12.
- Jakkaladiki, S.P.; Maly, F. An efficient transfer learning based cross model classification (TLBCM) technique for the prediction of breast cancer. PeerJ Comput. Sci. 2023, 9, e1281.
- Chen, H.; Wang, N.; Du, X.; Mei, K.; Zhou, Y.; Cai, G. Classification Prediction of Breast Cancer Based on Machine Learning. Comput. Intell. Neurosci. 2023, 2023, 6530719.
- Xiao, X. A Study of the Correlation between the Pathologic, Ultrasound, and MRI Manifestations of Breast Cancer and Localized Intravascular Cancerous Emboli. Master's Thesis, University of South China, Hengyang, China, 2021.
- Qin, J.; Wang, T.Y.; Willmann, J.K. Sonoporation: Applications for Cancer Therapy. Adv. Exp. Med. Biol. 2016, 880, 263–291.
- Alromema, N.; Syed, A.H.; Khan, T. A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data. Diagnostics 2023, 13, 708.
- Amorim, J.P.; Abreu, P.H.; Fernández, A.; Reyes, M.; Santos, J.; Abreu, M.H. Interpreting Deep Machine Learning Models: An Easy Guide for Oncologists. IEEE Rev. Biomed. Eng. 2023, 16, 192–207.
- Arpit, B.; Harshit, B.; Aditi, S.; Ziya, U.; Maneesha, S.; Wubshet, I. Tree-Based and Machine Learning Algorithm Analysis for Breast Cancer Classification. Comput. Intell. Neurosci. 2022, 2022, 6715406.
- Ak, M.F. A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and Machine Learning Applications. Healthcare 2020, 8, 111.
- Mahesh, T.R.; Vinoth Kumar, V.; Vivek, V.; Karthick Raghunath, K.M.; Sindhu Madhuri, G. Early predictive model for breast cancer classification using blended ensemble learning. Int. J. Syst. Assur. Eng. Manag. 2022.
- Naseem, U.; Rashid, J.; Ali, L.; Kim, J.; Haq, Q.E.U.; Awan, M.J.; Imran, M. An Automatic Detection of Breast Cancer Diagnosis and Prognosis Based on Machine Learning Using Ensemble of Classifiers. IEEE Access 2022, 10, 78242–78252.
- Abdar, M.; Zomorodi-Moghadam, M.; Zhou, X.; Gururajan, R.; Tao, X.; Barua, P.D.; Gururajan, R. A new nested ensemble technique for automated diagnosis of breast cancer. Pattern Recognit. Lett. 2020, 132, 123–131.
- Wang, S.; Wang, Y.; Wang, D.; Yin, Y.; Wang, Y.; Jin, Y. An improved random forest-based rule extraction method for breast cancer diagnosis. Appl. Soft Comput. 2020, 86, 105941.
- Wang, H.; Zheng, B.; Yoon, S.W.; Ko, H.S. A support vector machine-based ensemble algorithm for breast cancer diagnosis. Eur. J. Oper. Res. 2018, 267, 687–699.
- Zheng, B.; Yoon, S.; Lam, S.S. Breast cancer diagnosis based on feature extraction using a hybrid of k-means and support vector machine algorithms. Expert Syst. Appl. 2014, 41, 1476–1482.
- Kumar, A.; Sushil, R.; Tiwari, A.K. Classification of Breast Cancer using User-Defined Weighted Ensemble Voting Scheme. In Proceedings of the TENCON 2021—2021 IEEE Region 10 Conference (TENCON), Auckland, New Zealand, 7–10 December 2021; pp. 134–139.
- Jia, X.S.; Sun, X.; Zhang, X. Breast cancer identification using machine learning. Math. Probl. Eng. 2022, 2022, 8122895.
- Chaurasia, V.; Pal, S. Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer. SN Comput. Sci. 2020, 1, 270.
- Zohaib, M.; Akbari, Y.; Shaima, S.; Adnan, K. Effective K-nearest neighbor classifications for Wisconsin breast cancer data sets. J. Chin. Inst. Eng. 2020, 43, 80–92.
- Sahebi, G.; Movahedi, P.; Ebrahimi, M.; Pahikkala, T.; Plosila, J.; Tenhunen, H. GeFeS: A generalized wrapper feature selection approach for optimizing classification performance. Comput. Biol. Med. 2020, 125, 103974.
- Agustian, F.; Lubis, M.D.I. Particle Swarm Optimization Feature Selection for Breast Cancer Prediction. In Proceedings of the 8th International Conference on Cyber and IT Service Management (CITSM), Pangkal, Indonesia, 23–24 October 2020.
- Murugesan, S.; Bhuvaneswaran, R.S.; Khanna Nehemiah, H.; Keerthana Sankari, S.; Nancy Jane, Y. Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner. Comput. Math. Methods Med. 2021, 2021, 6662420.
- Naik, A.K.; Kuppili, V.; Edla, D.R. Efficient feature selection using one-pass generalized classifier neural network and binary bat algorithm with a novel fitness function. Soft Comput. 2020, 24, 4575–4587.
- Singh, D.; Singh, B.; Kaur, M. Simultaneous feature weighting and parameter determination of Neural Networks using Ant Lion Optimization for the classification of breast cancer. Biocybern. Biomed. Eng. 2020, 40, 337–351.
- Zhang, T.; Zhu, T.; Xiong, P.; Huo, H.; Tari, Z.; Zhou, W. Correlated Differential Privacy: Feature Selection in Machine Learning. IEEE Trans. Ind. Inform. 2020, 16, 2115–2124.
- Rao, H.; Shi, X.; Rodrigue, A.K.; Feng, J.; Xia, Y.; Elhoseny, M.; Yuan, X.; Gu, L. Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 2019, 74, 634–642.
- Algherairy, A.; Almattar, W.; Bakri, E.; Albelali, S. The Impact of Feature Selection on Different Machine Learning Models for Breast Cancer Classification. In Proceedings of the 7th International Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 1–3 March 2022.
- Abdel-Basset, M.; El-Shahat, D.; El-henawy, I.; De Albuquerque, V.H.C.; Mirjalili, S. A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst. Appl. 2020, 139, 112824.
- Mahesh, T.R.; Vinoth Kumar, V.; Muthukumaran, V.; Shashikala, H.K.; Swapna, B.; Guluwadi, S. Performance Analysis of XGBoost Ensemble Methods for Survivability with the Classification of Breast Cancer. J. Sens. 2022, 2022, 4649510.
- Singh, L.K.; Khanna, M.; Singh, R. Artificial intelligence based medical decision support system for early and accurate breast cancer prediction. Adv. Eng. Softw. 2023, 175, 103338.
- Ji, S.; Du, T.; Li, J.; Shen, C.; Li, B. A Review of Machine Learning Model Security and Privacy Research. J. Softw. 2021, 32, 41–67.
- Chen, H.; Zhou, Y.; Mei, K.; Wang, N.; Cai, G. A New Density Peak Clustering Algorithm with Adaptive Clustering Center Based on Differential Privacy. IEEE Access 2023, 11, 1418–1431.
- Zhao, Y.; Yang, M. A Review of Advances in Differential Privacy Research. Comput. Sci. 2023, 50, 265–276.
- Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, Venice, Italy, 10–14 July 2006; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4052, pp. 1–12.
- Vaidya, J.; Shafiq, B.; Basu, A.; Hong, Y. Differentially Private Naive Bayes Classification. In Proceedings of the IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Atlanta, GA, USA, 17–20 November 2013.
- Fletcher, S.; Islam, M.Z. Differentially private random decision forests using smooth sensitivity. Expert Syst. Appl. 2017, 78, 16–31.
- Nori, H.; Caruana, R.; Bu, Z.; Shen, J.H.; Kulkarni, J. Accuracy, Interpretability, and Differential Privacy via Explainable Boosting. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021.
- Shen, Q.; Wu, P. Research Progress on Privacy Preserving Technologies in Big Data Computing Environments. J. Comput. 2022, 45, 669–701.
- Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 2013, 9, 211–407.
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC), New York, NY, USA, 4–7 March 2006.
- Li, Y.; Feng, Y.; Qian, Q. FDPBoost: Federated differential privacy gradient boosting decision trees. J. Inf. Secur. Appl. 2023, 74, 103468.
- Xinzhou, B. Research on Application Technologies of Differential Privacy in Machine Learning. Master's Thesis, University of Science and Technology of China, Hefei, China, 2022.
- Liu, Y.; Mu, Y.; Chen, K.; Li, Y.; Guo, J. Daily Activity Feature Selection in Smart Homes Based on Pearson Correlation Coefficient. Neural Process. Lett. 2020, 51, 1771–1787.
- Li, Y.; Chen, H.; Li, Q.; Liu, A. Random forest algorithm based on out-of-bag estimation under differential privacy. J. Harbin Inst. Technol. 2021, 53, 146–154.
- Sun, Y.; Lin, W. Application of Gradient Descent to Machine Learning. J. Suzhou Univ. Sci. Technol. Nat. Sci. Ed. 2018, 35, 26–31.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009.
- Mangasarian, O.L.; Wolberg, W.H. Cancer Diagnosis via Linear Programming; University of Wisconsin-Madison Department of Computer Sciences: Madison, WI, USA, 1990.
- Das, M.K.; Chaudhary, A.; Bryan, A.; Wener, M.H.; Fink, S.L.; Morishima, C. Rapid Screening Evaluation of SARS-CoV-2 IgG Assays Using Z-Scores to Standardize Results. Emerg. Infect. Dis. 2020, 26, 2501–2503.
- Du, Q. An Online Logistic Regression Study Based on Differential Privacy. Master's Thesis, Northwest University, Xi'an, China, 2021.
- Xie, Y.; Li, P.; Wu, C.; Wu, Q. Differential Privacy Stochastic Gradient Descent with Adaptive Privacy Budget Allocation. In Proceedings of the IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–16 January 2021.
- Kairouz, P.; Oh, S.; Viswanath, P. The Composition Theorem for Differential Privacy. IEEE Trans. Inf. Theory 2017, 63, 4037–4049.
Real Situation | Predicted Positive | Predicted Negative
---|---|---
Positive | TP (true positive) | FN (false negative)
Negative | FP (false positive) | TN (true negative)
Texture_Mean | Perimeter_Mean | Area_Mean | Smoothness_Mean | Symmetry_Mean |
---|---|---|---|---|
10.38 | 122.8 | 1001 | 0.1184 | 0.2419 |
17.77 | 132.9 | 1326 | 0.08474 | 0.1812 |
21.25 | 130 | 1203 | 0.1096 | 0.2069 |
20.38 | 77.58 | 386.1 | 0.1425 | 0.2597 |
14.34 | 135.1 | 1297 | 0.1003 | 0.1809 |
15.7 | 82.57 | 477.1 | 0.1278 | 0.2087 |
19.98 | 119.6 | 1040 | 0.09463 | 0.1794 |
Model | Hyperparameter | Meaning | Value
---|---|---|---
RF | n_estimators | Number of weak classifiers | 200
RF | oob_score | Whether to use out-of-bag samples | TRUE
LR | penalty | Penalty term | L2
LR | solver | Optimization algorithm | liblinear
LR | C | Inverse of the regularization strength | 1
GDP-EBM | min_samples_leaf | Minimum number of samples at a leaf node | 2
GDP-EBM | learning_rate | Learning rate | 0.03
DP-NB | var_smoothing | Smoothing parameter | 1 × 10⁻⁹
DP-DT | max_depth | Maximum tree depth | 9
DP-RF | max_depth | Maximum tree depth | 10
DP-RF | n_estimators | Number of weak classifiers | 100
Compactness_Mean | Concavity_Mean | Concave Points_Mean | Radius_se | Perimeter_se |
---|---|---|---|---|
3.283515 | 2.652874 | 2.532475 | 2.489734 | 2.833031 |
−0.48707 | −0.02385 | 0.548144 | 0.499255 | 0.263327 |
1.052926 | 1.363478 | 2.037231 | 1.228676 | 0.850928 |
3.402909 | 1.915897 | 1.451707 | 0.326373 | 0.286593 |
0.53934 | 1.371011 | 1.428493 | 1.270543 | 1.273189 |
1.244335 | 0.866302 | 0.824656 | −0.25507 | −0.3213 |
0.088295 | 0.300072 | 0.646935 | 0.149883 | 0.15541 |
Number | Feature | Pearson Correlation (Absolute Value) | Feature_Importance
---|---|---|---
0 | radius_worst | 0.77645 | 0.15864 |
1 | perimeter_worst | 0.78291 | 0.15337 |
2 | concave_points_worst | 0.79357 | 0.11731 |
3 | area_worst | 0.73383 | 0.11674 |
4 | concave points_mean | 0.77661 | 0.06276 |
5 | area_mean | 0.70898 | 0.05056 |
6 | perimeter_mean | 0.74264 | 0.04596 |
7 | concavity_mean | 0.69636 | 0.04008 |
8 | area_se | 0.54824 | 0.04007 |
9 | radius_mean | 0.73003 | 0.03839 |
10 | concavity_worst | 0.65961 | 0.03644 |
11 | smoothness_worst | 0.42147 | 0.01698 |
12 | texture_worst | 0.45690 | 0.01536 |
13 | texture_mean | 0.41519 | 0.01475 |
14 | perimeter_se | 0.55614 | 0.01408 |
15 | radius_se | 0.56713 | 0.01369 |
16 | symmetry_worst | 0.41629 | 0.01368 |
17 | compactness_worst | 0.59100 | 0.01291 |
18 | compactness_mean | 0.59653 | 0.00994 |
19 | concave points_se | 0.40804 | 0.00884 |
20 | fractal_dimension_worst | 0.32387 | 0.00851 |
21 | smoothness_mean | 0.35856 | 0.00646 |
22 | symmetry_mean | 0.33050 | 0.00447 |
Feature Combination | oob-Score |
---|---|
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21 | 0.96837 |
13, 15, 20, 16, 7 | 0.96662 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19 | 0.96662 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18, 5, 12, 22, 4, 8 | 0.96639 |
13, 15 | 0.96488 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9 | 0.96488 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18, 5, 12, 22 | 0.96488 |
13 | 0.96487 |
13, 15, 20, 16 | 0.96487 |
13, 15, 20, 16, 7, 3, 2, 6 | 0.96487 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18, 5, 12, 22, 4 | 0.96487 |
13, 15, 20 | 0.96313 |
13, 15, 20, 16, 7, 3, 2, 6, 11 | 0.96310 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18 | 0.96310 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17 | 0.96136 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1 | 0.96136 |
13, 15, 20, 16, 7, 3 | 0.95960 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14 | 0.95960 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0 | 0.95959 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10 | 0.95959 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18, 5, 12 | 0.95959 |
13, 15, 20, 16, 7, 3, 2, 6, 11, 0, 19, 17, 14, 1, 10, 9, 21, 18, 5 | 0.95785 |
13, 15, 20, 16, 7, 3, 2 | 0.95609 |
Group | Model | Privacy Budget ε | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---|---
Control group | BGD-LR | 0 | 0.9649 | 1.0000 | 0.9545 | 0.9767
Control group | BDP-LR | 0.2 | 0.8936 | 0.9840 | 0.8765 | 0.9256
Control group | BDP-LR | 0.4 | 0.9249 | 0.9903 | 0.9117 | 0.9490
Control group | BDP-LR | 0.6 | 0.9454 | 0.9928 | 0.9361 | 0.9634
Control group | BDP-LR | 0.8 | 0.9563 | 0.9949 | 0.9483 | 0.9709
Control group | BDP-LR | 1 | 0.9566 | 0.9954 | 0.9482 | 0.9711
Experimental group | BGD-LR | 0 | 0.9912 | 1.0000 | 0.9886 | 0.9943
Experimental group | BDP-LR | 0.2 | 0.9170 | 0.9849 | 0.9065 | 0.9431
Experimental group | BDP-LR | 0.4 | 0.9561 | 0.9933 | 0.9495 | 0.9706
Experimental group | BDP-LR | 0.6 | 0.9629 | 0.9959 | 0.9559 | 0.9753
Experimental group | BDP-LR | 0.8 | 0.9721 | 0.9975 | 0.9664 | 0.9816
Experimental group | BDP-LR | 1 | 0.9777 | 0.9981 | 0.9731 | 0.9853
Literature | Method of Feature Selection | Method of Classification | Year | Accuracy
---|---|---|---|---|
[26] | ABC | XGBoost | 2019 | 0.928 |
[21] | GA | SVM | 2020 | 0.988 |
[20] | GeFeS | KNN | 2020 | 0.985 |
[18] | χ2 test + (ET) + (RFE) + RF | ET | 2020 | 0.952 |
[19] | WCHI2 | KNN | 2020 | 0.986 |
[24] | ALO | BPNN | 2020 | 0.9842 |
[28] | GWO | KNN | 2020 | 0.948 |
[23] | BBA | OGCNN | 2020 | 0.935 |
[22] | Krill herd (KH) + SVM | BPNN | 2021 | 0.978 |
[27] | Forward selection | LR | 2022 | 0.982 |
[30] | ESO | RF | 2023 | 0.9896 |
[12] | - | SV-naïve Bayes-3-MetaClassifiers | 2020 | 0.981 |
[13] | - | IRFRE | 2020 | 0.951 |
[11] | - | (SVM + LR + NB + DT) + ANN | 2022 | 0.9883 |
[9] | - | LR | 2020 | 0.981 |
[8] | - | RF | 2022 | 0.9624 |
[10] | - | EL | 2022 | 0.9814 |
This paper | Pearson + RF-OOB | BGD + LR | 2023 | 0.9912
Evaluation Indicators | BDP-LR | GDP-EBM | DP-NB | DP-RF | DP-DT |
---|---|---|---|---|---|
Accuracy | 0.9721 | 0.9439 | 0.8927 | 0.9070 | 0.8793 |
Precision | 0.9975 | 0.9826 | 0.9786 | 0.9276 | 0.9506 |
Recall | 0.9664 | 0.9443 | 0.8825 | 0.9545 | 0.8931 |
F1-score | 0.9816 | 0.9620 | 0.9119 | 0.9402 | 0.9175 |