A Novel Ensemble Method of Divide-and-Conquer Markov Boundary Discovery for Causal Feature Selection
Abstract
1. Introduction
- (1) Enhanced performance: By combining multiple divide-and-conquer Markov boundary discovery algorithms into an ensemble framework, the proposed method can leverage the strengths of each individual algorithm. This ensemble approach often leads to improved performance compared to using a single algorithm, as it can capture a more diverse range of patterns and relationships in the data.
- (2) Balanced precision and recall: The ensemble framework based on the U-I selection strategy allows for a better balance between precision and recall. This balance is essential in ensuring that the discovered Markov boundaries are both accurate and comprehensive, which is crucial for understanding causal relationships in complex datasets.
- (3) Addressing dataset variability: Different datasets may exhibit varying characteristics and complexities. The ensemble method can adapt to these variations by integrating multiple algorithms, thereby improving the adaptability and generalizability of the Markov boundary discovery process across diverse datasets.
- (1) Enhanced robustness: By integrating multiple divide-and-conquer Markov boundary discovery algorithms into an ensemble framework, the proposed method can enhance the robustness of the overall approach. This robustness comes from the ability to leverage the strengths of different algorithms and mitigate the weaknesses of individual methods present in previous works.
- (2) Handling controversial parent–child variables: The proposed method focuses on judging controversial parent–child variables to further refine the Markov boundary discovery process. By addressing contentious relationships between variables, the method can provide more accurate and reliable results compared to approaches that might overlook or misinterpret such relationships.
- (3) Comparative analysis: By comparing the results of the ensemble method with those of individual algorithms, researchers can gain insights into the strengths and weaknesses of each algorithm. This comparative analysis can further refine the Markov boundary discovery process and enhance the overall quality of the findings.
2. Definitions and Theorems
3. Methods
3.1. Several Typical Divide-and-Conquer Markov Boundary Discovery Algorithms
3.1.1. PCMB
Algorithm 1 PCMB (T, D)
Input: Data D, target node T
Output: Markov boundary MB of T
Initialize PCD = ∅, CanPCD = V \ {T}, PC = ∅, SP = ∅
{In the GetPCD stage}
repeat
    for each X in CanPCD do
        Sep[X] = arg min_{Z ⊆ PCD} dep(T, X | Z)
        if Ind(T, X | Sep[X]) then remove X from CanPCD
    add to PCD the X in CanPCD with the largest dep(T, X | Sep[X])
    for each X in PCD do
        if Ind(T, X | Z) for some Z ⊆ PCD \ {X} then remove X from PCD
until PCD remains unchanged
return PCD
{In the GetPC stage: symmetry correction}
for each X in PCD do
    if T ∈ PCD(X, D) then add X to PC
return PC
{In the GetSP stage}
MB = PC
for each Y in PC do
    for each X in PC(Y, D) with X ∉ PC ∪ {T} do
        if ¬Ind(T, X | Sep[X] ∪ {Y}) then add X to SP
MB = MB ∪ SP
return MB
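The symmetry correction in the GetPC stage is what keeps PCMB's false-positive rate low: X enters PC(T) only if T also appears in PCD(X). The following is a minimal runnable sketch of this grow-shrink-plus-symmetry-check idea, not the authors' implementation; the Fisher-z test, the single-admission grow loop, and the toy chain network are illustrative choices.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def ind(data, x, y, z, alpha=0.01):
    """Fisher-z conditional independence test via partial correlation."""
    corr = np.corrcoef(data[:, [x, y] + list(z)], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = np.clip(r, -0.999999, 0.999999)
    fz = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(z) - 3)
    return 2 * stats.norm.sf(abs(fz)) > alpha  # True -> accept independence

def get_pcd(data, t):
    """Grow a candidate parent/child set PCD(t), shrinking as new info arrives."""
    pcd, can = set(), set(range(data.shape[1])) - {t}
    changed = True
    while changed:
        changed = False
        # drop candidates separated from t by some subset of the current PCD
        for x in list(can):
            if any(ind(data, t, x, s)
                   for k in range(len(pcd) + 1) for s in combinations(pcd, k)):
                can.discard(x)
        # admit one surviving candidate per pass
        if can:
            pcd.add(can.pop())
            changed = True
        # shrink: re-test PCD members against subsets of the rest
        for x in list(pcd):
            rest = pcd - {x}
            if any(ind(data, t, x, s)
                   for k in range(len(rest) + 1) for s in combinations(rest, k)):
                pcd.discard(x)
    return pcd

def get_pc(data, t):
    """PCMB's symmetry correction: keep X only if t is also in PCD(X)."""
    return {x for x in get_pcd(data, t) if t in get_pcd(data, x)}

# toy linear-Gaussian chain X0 -> X1 -> X2: PC(X1) should be {X0, X2}
rng = np.random.default_rng(0)
x0 = rng.normal(size=3000)
x1 = x0 + rng.normal(size=3000)
x2 = x1 + rng.normal(size=3000)
data = np.column_stack([x0, x1, x2])
print(get_pc(data, 1))  # expected: {0, 2}
```

Note how X0 and X2 are correctly excluded from each other's PC sets: they are marginally dependent through the chain, but the shrink phase separates them once X1 enters the candidate set.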
3.1.2. MBOR
Algorithm 2 MBOR (T, D)
Input: Data D, target node T
Output: Markov boundary MB of T
Initialize PC = ∅, SP = ∅
Compute a PC superset PCS and a spouse superset SPS by light filtering
{In the GetPC stage}
PC = GetPC(T) on the data restricted to PCS ∪ SPS
for each X in PCS with X ∉ PC do
    if T ∈ GetPC(X) then add X to PC   ("OR" symmetry condition)
{In the GetSP stage}
for each X in PC do
    for each Y in GetPC(X) with Y ∉ PC ∪ {T} do
        if ¬Ind(T, Y | Sep[Y] ∪ {X}) then add Y to SP
MB = PC ∪ SP
return MB
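MBOR's key departure from PCMB is its weaker "OR" symmetry condition: a node joins PC(T) if the parent-child relation is detected in either direction, trading some precision for recall. A toy sketch of the two rules over precomputed candidate sets; the dict-based representation is an illustrative assumption, not the paper's data structure.

```python
def symmetry_and(pcd, t):
    """PCMB-style 'AND': keep X only if the relation is found in both directions."""
    return {x for x in pcd[t] if t in pcd[x]}

def symmetry_or(pcd, t):
    """MBOR-style 'OR': keep X if the relation is found in either direction."""
    return set(pcd[t]) | {x for x in pcd if x != t and t in pcd[x]}

# candidate PCD sets found by some local search; the 1-2 link was only
# detected from node 1's side (e.g., a weak dependence missed from node 2)
pcd = {0: {1}, 1: {0, 2}, 2: set()}
print(symmetry_and(pcd, 1))  # {0}    -- drops the asymmetric edge
print(symmetry_or(pcd, 1))   # {0, 2} -- keeps it
```

This is exactly the precision/recall trade-off the ensemble method later exploits: edges kept by MBOR but not PCMB become the "controversial" candidates.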
3.1.3. STMB
Algorithm 3 STMB (T, D)
Input: Data D, target node T
Output: Markov boundary MB of T
Initialize PC = ∅, SP = ∅, Del = ∅
{In the GetPC stage}
PC = GetPC(T)
{In the GetSP and remove false nodes stage}
for each Y in PC do
    for each X ∉ PC ∪ {T} do
        if ¬Ind(T, X | Sep[X] ∪ {Y}) then
            if Ind(T, Y | SP ∪ PC \ {Y}) then
                move Y from PC to Del; break
            else
                add X to SP{Y}
for each Y in SP do
    for each X in SP{Y} do
        if Ind(T, X | MB \ {X}) then remove X from SP{Y}
PCM = PC
for X in PCM do
    if Ind(T, X | MB \ {X}) then remove X from PC
MB = PC ∪ SP
return MB
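The GetSP stage in Algorithms 1 and 3 exploits the collider pattern: a spouse X is independent of T given some separating set, but becomes dependent once a common child Y is conditioned on. Below is a minimal sketch with a hand-coded independence oracle for the v-structure T → Y ← X; the oracle and node numbering are illustrative assumptions.

```python
# hand-coded CI oracle for the v-structure T -> Y <- X, with T=0, Y=1, X=2
def ind(a, b, cond):
    if frozenset((a, b)) == frozenset((0, 2)):
        return 1 not in cond  # T and X: independent until the collider Y is given
    return False              # every other pair is directly connected

def find_spouses(t, pc, all_vars):
    """Flag X as a spouse of t if some child Y in PC(t) unblocks the t-X path."""
    spouses = set()
    for y in pc:
        for x in all_vars - pc - {t}:
            if ind(t, x, set()) and not ind(t, x, {y}):
                spouses.add(x)
    return spouses

print(find_spouses(0, {1}, {0, 1, 2}))  # {2}: X is T's spouse via the child Y
```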
3.2. Ensemble Method of Divide-and-Conquer Markov Boundary Discovery Algorithms
3.2.1. Controversial Parent–Child Nodes and U-I Select Strategy
3.2.2. Ensemble of Divide-and-Conquer Markov Boundary Algorithms
- Obtain the candidate PC set PC_PCMB with the PCMB algorithm and the candidate PC set PC_MBOR with the MBOR algorithm.
- From these candidate sets, obtain the intersection PC set PC_I = PC_PCMB ∩ PC_MBOR, the union PC set PC_U = PC_PCMB ∪ PC_MBOR, and the difference PC set PC_D = PC_U \ PC_I.
- Remove the non-PC nodes from PC_D using the same steps as STMB, and take the union of the result with PC_I as the final PC set.
- Find the set of spouse nodes SP through the same steps as PCMB.
- Obtain the MB set by taking the union of PC and SP.
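The U-I combination in the steps above can be sketched as follows; the independence re-test is abstracted into a caller-supplied predicate standing in for the STMB-style removal step, and the function and set names are illustrative.

```python
def edmb_pc(pc_pcmb, pc_mbor, passes_stmb_check):
    """U-I selection: trust the intersection, re-examine the disagreement."""
    pc_i = pc_pcmb & pc_mbor           # nodes both algorithms agree on
    pc_d = (pc_pcmb | pc_mbor) - pc_i  # controversial parent-child nodes
    return pc_i | {x for x in pc_d if passes_stmb_check(x)}

# PCMB found {1, 2}, MBOR found {2, 3}; suppose the STMB-style re-test
# confirms node 1 and rejects node 3
print(edmb_pc({1, 2}, {2, 3}, lambda x: x == 1))  # {1, 2}
```

The design intent is visible in the set algebra: agreed nodes are accepted without further tests, so only the (usually small) controversial set PC_D pays the extra testing cost.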
Algorithm 4 EDMB (T)
Input: Target T, Data U
Output: Markov boundary MB of T
Initialize PC = ∅, SP = ∅
{Step 1: find the candidate PC sets}
PC1 = candidate PC set of T found by PCMB; PC2 = candidate PC set of T found by MBOR
{Step 2: obtain the intersection, union, and difference of the PC sets}
PC_I = PC1 ∩ PC2; PC_U = PC1 ∪ PC2; PC_D = PC_U \ PC_I
{Step 3: remove wrong PC nodes from PC_D}
for each X in PC_D do
    for each Z ⊆ PC_U \ {X} do
        if Ind(T, X | Z) then remove X from PC_D, set Sep[X] = Z, break
PC = PC_I ∪ PC_D
{Step 4: find the spouse nodes}
for each Y in PC do
    for each X in GetPC(Y) with X ∉ PC ∪ {T} do
        if Ind(T, X | Sep[X]) and ¬Ind(T, X | Sep[X] ∪ {Y}) then add X to SP
return SP
{Step 5: obtain the MB set}
MB = PC ∪ SP
return MB
4. Results
4.1. Standard MB Discovery Datasets
4.2. Real-World Datasets
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Dataset | Alarm | Alarm3 | Alarm5 | Child | Child3 | Child5 | Insurance | Insurance3 | Insurance5 |
|---|---|---|---|---|---|---|---|---|---|
| Variables | 37 | 111 | 185 | 20 | 60 | 100 | 27 | 81 | 135 |
| Edges | 46 | 149 | 265 | 25 | 79 | 126 | 52 | 163 | 281 |
| Dataset | Metric | MBOR | PCMB | STMB | IAMB | EDMB |
|---|---|---|---|---|---|---|
| Alarm | Distance | 6.25% ± 1.68% | 5.22% ± 2.62% | 29.53% ± 2.87% | 15.63% ± 2.88% | 5.21% ± 2.13% |
| | Recall | 95.79% ± 1.06% | 95.64% ± 2.27% | 96.11% ± 1.02% | 89.70% ± 0.87% | 96.39% ± 1.79% |
| | Precision | 97.20% ± 1.24% | 98.65% ± 0.97% | 73.50% ± 2.68% | 92.44% ± 2.79% | 97.87% ± 1.14% |
| Alarm3 | Distance | 15.92% ± 1.60% | 17.64% ± 1.14% | 47.95% ± 2.01% | 31.75% ± 1.30% | 15.60% ± 1.47% |
| | Recall | 87.28% ± 0.70% | 84.57% ± 0.88% | 90.14% ± 0.88% | 83.22% ± 0.77% | 86.79% ± 0.87% |
| | Precision | 94.45% ± 1.98% | 95.31% ± 0.83% | 57.42% ± 2.19% | 80.49% ± 1.40% | 95.59% ± 1.48% |
| Alarm5 | Distance | 18.14% ± 0.84% | 21.23% ± 1.13% | 55.84% ± 1.30% | 41.59% ± 1.58% | 18.66% ± 0.96% |
| | Recall | 85.21% ± 0.71% | 80.75% ± 1.19% | 87.51% ± 0.81% | 79.20% ± 0.63% | 83.90% ± 0.94% |
| | Precision | 94.87% ± 0.73% | 96.03% ± 0.53% | 50.53% ± 1.27% | 72.16% ± 1.43% | 95.80% ± 0.83% |
| Child | Distance | 4.66% ± 3.09% | 1.51% ± 1.51% | 16.84% ± 3.33% | 20.62% ± 4.00% | 2.11% ± 2.17% |
| | Recall | 98.20% ± 0.93% | 99.29% ± 0.94% | 98.12% ± 1.65% | 90.11% ± 1.38% | 99.19% ± 1.09% |
| | Precision | 96.75% ± 2.98% | 99.21% ± 1.02% | 84.39% ± 3.34% | 86.77% ± 3.97% | 98.67% ± 1.80% |
| Child3 | Distance | 6.27% ± 1.15% | 6.14% ± 0.82% | 35.83% ± 2.58% | 34.73% ± 3.09% | 5.58% ± 1.02% |
| | Recall | 96.89% ± 0.41% | 96.14% ± 0.77% | 97.09% ± 0.48% | 89.61% ± 0.61% | 96.73% ± 0.41% |
| | Precision | 96.67% ± 1.24% | 97.47% ± 0.99% | 66.34% ± 2.31% | 73.00% ± 3.20% | 97.45% ± 1.17% |
| Child5 | Distance | 4.11% ± 1.50% | 3.14% ± 1.82% | 41.31% ± 2.17% | 39.33% ± 2.16% | 3.38% ± 1.74% |
| | Recall | 99.75% ± 0.29% | 99.34% ± 0.95% | 99.74% ± 0.33% | 92.21% ± 0.44% | 99.62% ± 0.49% |
| | Precision | 96.12% ± 1.35% | 97.48% ± 1.22% | 58.81% ± 2.13% | 66.37% ± 2.14% | 96.97% ± 1.54% |
| Insurance | Distance | 28.11% ± 3.42% | 36.79% ± 2.26% | 42.22% ± 1.57% | 35.08% ± 2.45% | 30.07% ± 2.88% |
| | Recall | 75.10% ± 2.26% | 68.04% ± 1.60% | 80.39% ± 1.09% | 68.57% ± 1.61% | 73.74% ± 2.03% |
| | Precision | 93.74% ± 3.27% | 89.03% ± 3.11% | 70.16% ± 2.19% | 92.16% ± 2.82% | 92.62% ± 2.92% |
| Insurance3 | Distance | 28.68% ± 1.68% | 27.27% ± 1.52% | 59.62% ± 1.57% | 40.61% ± 1.15% | 26.24% ± 1.53% |
| | Recall | 77.11% ± 0.72% | 78.23% ± 1.54% | 83.80% ± 0.87% | 68.74% ± 0.68% | 79.42% ± 1.23% |
| | Precision | 89.96% ± 1.65% | 90.34% ± 0.91% | 47.57% ± 2.05% | 85.38% ± 1.53% | 90.62% ± 1.20% |
| Insurance5 | Distance | 28.64% ± 0.91% | 28.81% ± 0.84% | 64.52% ± 1.95% | 46.30% ± 1.50% | 27.14% ± 0.77% |
| | Recall | 76.62% ± 0.55% | 75.22% ± 0.75% | 82.35% ± 0.61% | 66.62% ± 0.62% | 77.01% ± 0.61% |
| | Precision | 90.83% ± 1.14% | 91.89% ± 1.01% | 42.90% ± 2.22% | 81.80% ± 1.49% | 92.31% ± 1.06% |
| Friedman’s ranks | Distance rank score | 2.3333 | 2.3333 | 4.8889 | 4 | 1.4444 |
| | Rank | 2 | 2 | 5 | 4 | 1 |
| | Recall rank score | 2.4444 | 3.6667 | 1.5556 | 4.8889 | 2.4444 |
| | Rank | 2 | 4 | 1 | 5 | 2 |
| | Precision rank score | 2.7778 | 1.6667 | 5 | 3.8889 | 1.6667 |
| | Rank | 3 | 1 | 5 | 4 | 1 |
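The rank scores in the last rows follow Friedman's procedure: each algorithm is ranked within every dataset and the per-dataset ranks are averaged. A sketch of that computation, using the first three Distance rows of the table (lower distance ranks better):

```python
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

# rows: Alarm, Alarm3, Child; columns: MBOR, PCMB, STMB, IAMB, EDMB
distance = np.array([
    [6.25, 5.22, 29.53, 15.63, 5.21],
    [15.92, 17.64, 47.95, 31.75, 15.60],
    [4.66, 1.51, 16.84, 20.62, 2.11],
])
ranks = rankdata(distance, axis=1)        # rank 1 = smallest distance in that row
print(ranks.mean(axis=0))                 # average (Friedman) rank per algorithm
stat, p = friedmanchisquare(*distance.T)  # global test of rank differences
```

On these three rows EDMB (last column) gets the lowest average rank, consistent with its rank score of 1.4444 over all nine datasets.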
| Target | Metric | Real MB | MBOR | PCMB | STMB | IAMB | EDMB |
|---|---|---|---|---|---|---|---|
| 2 | MB | [23,27,29] | [22,23,27,29] | [23,27,29] | [23,27,29] | [23,27,29] | [23,27,29] |
| | Precision | | 75.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| | Recall | | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| 14 | MB | [13,15,16,18,31] | [13,15,16,18,31] | [13,16,18,31] | [7,13,15,16,18,21,29,31] | [4,13,16,18,31] | [13,15,16,18,31] |
| | Precision | | 100.00% | 100.00% | 63.00% | 80.00% | 100.00% |
| | Recall | | 100.00% | 80.00% | 100.00% | 80.00% | 100.00% |
| 15 | MB | [4,14,16,18,21,22,31] | [4,14,16,21,31] | [4,21] | [4,14,21] | [4,14,21] | [4,14,16,21,31] |
| | Precision | | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| | Recall | | 71.00% | 29.00% | 43.00% | 43.00% | 71.00% |
| 21 | MB | [15,19,20,22,29] | [4,15,19,20,22,29] | [15,19,20,22,29] | [15,18,19,22,29] | [4,19,22,29] | [4,15,19,20,22,29] |
| | Precision | | 83.00% | 100.00% | 80.00% | 75.00% | 83.00% |
| | Recall | | 100.00% | 100.00% | 80.00% | 60.00% | 100.00% |
| 31 | MB | [14,15,16,18] | [4,14,16,18] | [14,16,18] | [1,2,4,14,15,16,18,19,21,22,23,24,27,28,29,30] | [14,15,16,18] | [14,16,18] |
| | Precision | | 75.00% | 100.00% | 25.00% | 100.00% | 100.00% |
| | Recall | | 75.00% | 75.00% | 75.00% | 100.00% | 75.00% |
| Dataset | Wine | Diabetes | Statlog |
|---|---|---|---|
| Instances | 178 | 768 | 1000 |
| Features | 13 | 8 | 20 |
| Classifier | Dataset | MBOR | PCMB | STMB | IAMB | EDMB |
|---|---|---|---|---|---|---|
| Naive Bayes | Wine | 0.9608 | 0.9546 | 0.9775 | 0.9775 | 0.9490 |
| | Diabetes | 0.7618 | 0.7552 | 0.7592 | 0.7618 | 0.7618 |
| | Statlog | 0.7330 | 0.7450 | 0.7220 | 0.7230 | 0.7480 |
| Linear SVM | Wine | 0.9608 | 0.9608 | 0.9386 | 0.9386 | 0.9663 |
| | Diabetes | 0.7632 | 0.7632 | 0.7632 | 0.7606 | 0.7632 |
| | Statlog | 0.7320 | 0.7350 | 0.7410 | 0.7420 | 0.7360 |
| Decision Tree | Wine | 0.9101 | 0.9216 | 0.9271 | 0.9216 | 0.9271 |
| | Diabetes | 0.6993 | 0.6811 | 0.6928 | 0.6902 | 0.6993 |
| | Statlog | 0.7200 | 0.7120 | 0.7150 | 0.7190 | 0.7140 |
| Rank score | | 2.8333 | 3.7222 | 3.0000 | 3.0556 | 2.3889 |
| Rank | | 2 | 5 | 3 | 4 | 1 |
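The real-world evaluation trains standard classifiers on only the features each algorithm selects. A minimal sketch of that protocol on the Wine data shipped with scikit-learn; the feature indices are hypothetical placeholders for a discovered Markov boundary, not the paper's actual selection.

```python
from sklearn.datasets import load_wine
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
selected = [0, 6, 9, 12]  # hypothetical MB feature indices, illustration only
acc = cross_val_score(GaussianNB(), X[:, selected], y, cv=10).mean()
print(f"10-fold CV accuracy on the selected features: {acc:.4f}")
```

Swapping in the feature sets produced by MBOR, PCMB, STMB, IAMB, and EDMB and repeating the cross-validation yields accuracy comparisons of the kind shown in the table above.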
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, H.; Zhan, J.; Wang, H.; Zhao, Z. A Novel Ensemble Method of Divide-and-Conquer Markov Boundary Discovery for Causal Feature Selection. Mathematics 2024, 12, 2927. https://doi.org/10.3390/math12182927