Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets
Abstract
1. Introduction
2. Background
2.1. Related Work
2.2. Monte Carlo Tree Search (MCTS)
- Selection: Starting from the root node, the algorithm traverses the tree by recursively applying a child selection policy until it reaches the most urgent expandable node, i.e., a node that represents a non-terminal state and still has unvisited children.
- Expansion: The tree is expanded by adding a new child node, chosen from the set of available actions.
- Simulation: A simulation is run from the new child node according to the default policy to produce an approximate outcome (reward).
- Backpropagation: The simulation reward is backed up through the selected nodes to update the statistics of the tree (a generic sketch of these four phases follows).
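To make the four phases concrete, the following is a minimal, generic MCTS loop in Python. It is an illustrative sketch, not the authors' implementation; the problem-specific pieces (`actions`, `step`, `terminal`, `rollout`) are assumed callables supplied by the caller.

```python
import math
import random

class Node:
    """Generic search-tree node with the statistics MCTS maintains."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct(child, parent, scalar):
    # Standard UCT score (Section 2.3): mean reward plus an exploration bonus.
    return (child.total_reward / child.visits
            + scalar * math.sqrt(2.0 * math.log(parent.visits) / child.visits))

def mcts(root, budget, scalar, actions, step, terminal, rollout):
    for _ in range(budget):
        node = root
        # Selection: descend while the node is non-terminal and fully expanded.
        while not terminal(node.state) and len(node.children) == len(actions(node.state)):
            node = max(node.children, key=lambda c: uct(c, node, scalar))
        # Expansion: attach one child for an action that has not been tried yet.
        if not terminal(node.state):
            untried = [a for a in actions(node.state)
                       if a not in {c.action for c in node.children}]
            a = random.choice(untried)
            node.children.append(Node(step(node.state, a), parent=node, action=a))
            node = node.children[-1]
        # Simulation: the default (random) policy estimates the value of the new node.
        reward = rollout(node.state)
        # Backpropagation: update visit counts and rewards along the selected path.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits)
```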
2.3. Upper Confidence Bounds for Trees (UCT)
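In its standard form, the UCT rule scores a child node $v'$ of node $v$ by combining exploitation and exploration:

$$\mathrm{UCT}(v') = \frac{Q(v')}{N(v')} + C\,\sqrt{\frac{2\ln N(v)}{N(v')}}$$

where $Q(v')$ is the accumulated simulation reward of $v'$, $N(\cdot)$ is the visit count, and $C > 0$ is the exploration constant (corresponding to the SCALAR hyperparameter in Algorithm 1). During the selection phase the child with the highest UCT value is followed; the square-root term grows for rarely visited children, so the search keeps exploiting promising subtrees without starving unexplored ones.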
3. R-MOTiFS (Recursive-Monte Carlo Tree Search-Based Feature Selection)
3.1. The Recursive Procedure
3.2. Feature Selection Tree
- 1. The root represents the empty subset, i.e., no feature is selected yet.
- 2. Any node at level i has two children at level i + 1, reached by the actions that include or exclude feature f_{i+1}, respectively (sketched below).
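The binary tree described above can be sketched as a small Python class. The names (`FSNode`, `expand`) and the set-based bookkeeping are illustrative assumptions, not the paper's data structure.

```python
# Illustrative sketch of the binary feature-selection tree: a node at level i
# decides feature f_{i+1}; its two children correspond to including or
# excluding that feature.

class FSNode:
    def __init__(self, level, included, parent=None):
        self.level = level           # tree depth = number of features decided so far
        self.included = included     # frozenset of feature indices selected on the path
        self.parent = parent
        self.children = []
        self.visits = 0              # MCTS statistics
        self.reward = 0.0

    def expand(self):
        """Create the two children: one includes feature `self.level`, one excludes it."""
        if not self.children:
            self.children = [
                FSNode(self.level + 1, self.included | {self.level}, parent=self),
                FSNode(self.level + 1, self.included, parent=self),
            ]
        return self.children

root = FSNode(0, frozenset())        # root: no feature selected yet
include_child, exclude_child = root.expand()
```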
3.3. Feature Subset Generation
3.4. Reward Calculation and Backpropagation
Algorithm 1 The R-MOTiFS Algorithm
Load dataset and preprocess
Initialize SCALAR, BUDGET // scaling factor & number of MCTS simulations (hyperparameters)
function R-MOTiFS (featuresList)
    create rootNode
    maxReward, bestFeatureSubset ← UCTSEARCH (rootNode)
    if maxReward is greater than optimalReward then
        optimalReward ← maxReward
        optimalFeatureSubset ← bestFeatureSubset
        R-MOTiFS (bestFeatureSubset)
    else
        return (optimalReward, optimalFeatureSubset)
function UCTSEARCH (rootNode)
    Initialize maxReward, bestFeatureSubset
    while within computational budget do
        frontNode ← TREEPOLICY (rootNode)
        reward, featureSubset ← DEFAULTPOLICY (frontNode.state)
        BACKUP (frontNode, reward)
        if reward is greater than maxReward then
            maxReward ← reward
            bestFeatureSubset ← featureSubset
    return (maxReward, bestFeatureSubset)
function TREEPOLICY (node)
    while node is non-terminal do
        if node not fully expanded then
            return EXPAND (node)
        else
            node ← BESTCHILD (node, SCALAR)
    return node
function EXPAND (node)
    choose an untried action a from A(node.state)
    add a newChild with f(node.state, a)
    return newChild
function BESTCHILD (node, SCALAR)
    return the child of node with the highest UCT value, computed with exploration constant SCALAR (Section 2.3)
function DEFAULTPOLICY (state)
    while state is non-terminal do
        choose a ∈ A(state) uniformly at random
        state ← f(state, a)
    for each action a_i along state.path do
        if a_i is equal to f_{i+1} then
            featureSubset ← INCLUDE (f_{i+1})
    reward ← REWARD (featureSubset)
    return (reward, featureSubset)
function BACKUP (node, reward)
    while node is not null do
        node.visits ← node.visits + 1
        if reward > node.reward then
            node.reward ← reward
        node ← node.parent
    return
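The pseudocode above maps directly onto a compact Python sketch. Two assumptions are made because the listing does not fix them: the reward function is an arbitrary subset-evaluation callable (in practice a wrapper criterion such as cross-validated classifier accuracy), and BESTCHILD scores children by their best backed-up reward plus a UCT-style exploration bonus, consistent with the max-reward BACKUP.

```python
import math
import random

SCALAR = 1.0 / math.sqrt(2)   # exploration constant (hyperparameter)
BUDGET = 1000                 # MCTS simulations per recursion (hyperparameter)

class Node:
    def __init__(self, level, selected, parent=None):
        self.level = level            # number of features decided so far
        self.selected = selected      # tuple of features chosen on the path to this node
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward = 0.0             # best simulation reward backed up through this node

def r_motifs(features, reward_fn, optimal=(-1.0, ())):
    """Recursive wrapper: rerun the UCT search on the best subset until no improvement."""
    root = Node(0, ())
    max_reward, best_subset = uct_search(root, features, reward_fn)
    if max_reward > optimal[0]:
        return r_motifs(best_subset, reward_fn, (max_reward, best_subset))
    return optimal                    # (optimalReward, optimalFeatureSubset)

def uct_search(root, features, reward_fn):
    max_reward, best_subset = -1.0, ()
    for _ in range(BUDGET):
        front = tree_policy(root, features)
        reward, subset = default_policy(front, features, reward_fn)
        backup(front, reward)
        if reward > max_reward:
            max_reward, best_subset = reward, subset
    return max_reward, best_subset

def tree_policy(node, features):
    while node.level < len(features):          # non-terminal
        if len(node.children) < 2:             # not fully expanded
            return expand(node, features)
        node = best_child(node, SCALAR)
    return node

def expand(node, features):
    include = len(node.children) == 0          # first child includes f_{i+1}, second excludes it
    selected = (node.selected + (features[node.level],)) if include else node.selected
    child = Node(node.level + 1, selected, parent=node)
    node.children.append(child)
    return child

def best_child(node, scalar):
    # Best backed-up reward plus a UCT-style exploration bonus (assumed form).
    return max(node.children,
               key=lambda c: c.reward + scalar * math.sqrt(2.0 * math.log(node.visits) / c.visits))

def default_policy(node, features, reward_fn):
    selected = list(node.selected)
    for i in range(node.level, len(features)): # finish the path with random include/exclude actions
        if random.random() < 0.5:              # action a_i == f_{i+1}: include the feature
            selected.append(features[i])
    subset = tuple(selected)
    return reward_fn(subset), subset

def backup(node, reward):
    while node is not None:
        node.visits += 1
        node.reward = max(node.reward, reward) # keep the best reward seen below this node
        node = node.parent

if __name__ == "__main__":
    # Toy reward: favour five "good" features and lightly penalise subset size.
    good = {2, 3, 5, 7, 11}
    toy_reward = lambda s: len(good & set(s)) / len(good) - 0.01 * len(s)
    print(r_motifs(tuple(range(20)), toy_reward))
```

Each recursive call restarts the search on the best subset found so far, so the feature list explored by UCTSEARCH shrinks from one recursion to the next until no further improvement in reward is observed.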
4. Experiment and Results
4.1. Datasets
4.2. Experimental Setting
4.3. Results and Comparisons
4.3.1. Comparison with MOTiFS and H-MOTiFS
4.3.2. Comparison with State-Of-The-Art Methods
4.3.3. Non-Parametric Statistical Tests
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Zheng, Y.; Keong, C. A feature subset selection method based on high-dimensional mutual information. Entropy 2011, 13, 860–901.
- Sluga, D.; Lotrič, U. Quadratic mutual information feature selection. Entropy 2017, 19, 157.
- Reif, M.; Shafait, F. Efficient feature size reduction via predictive forward selection. Pattern Recognit. 2014, 47, 1664–1673.
- Saganowski, S.; Gliwa, B.; Bródka, P.; Zygmunt, A.; Kazienko, P.; Kozlak, J. Predicting community evolution in social networks. Entropy 2015, 17, 3053–3096.
- Smieja, M.; Warszycki, D. Average information content maximization-a new approach for fingerprint hybridization and reduction. PLoS ONE 2016, 11, e0146666.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009.
- Guo, Y.; Berman, M.; Gao, J. Group subset selection for linear regression. Comput. Stat. Data Anal. 2014, 75, 39–52.
- Dash, M.; Choi, K.; Scheuermann, P.; Liu, H. Feature selection for clustering—A filter solution. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi, Japan, 9–12 December 2002; pp. 115–122.
- Kim, Y.; Street, W.N.; Menczer, F. Feature selection in unsupervised learning via evolutionary search. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000; pp. 365–369.
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
- Hall, M. Correlation-based Feature Selection for Machine Learning. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999.
- Senawi, A.; Wei, H.L.; Billings, S.A. A new maximum relevance-minimum multicollinearity (MRmMC) method for feature selection and ranking. Pattern Recognit. 2017, 67, 47–61.
- Zhao, G.D.; Wu, Y.; Chen, F.Q.; Zhang, J.M.; Bai, J. Effective feature selection using feature vector graph for classification. Neurocomputing 2015, 151, 376–389.
- Gao, W.; Hu, L.; Zhang, P. Class-specific mutual information variation for feature selection. Pattern Recognit. 2018, 79, 328–339.
- Gao, W.; Hu, L.; Zhang, P.; Wang, F. Feature selection by integrating two groups of feature evaluation criteria. Expert Syst. Appl. 2018, 110, 11–19.
- Gao, W.; Hu, L.; Zhang, P.; He, J. Feature selection considering the composition of feature relevancy. Pattern Recognit. Lett. 2018, 112, 70–74.
- Huang, C.L.; Wang, C.J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240.
- Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
- Hamdani, T.M.; Won, J.-M.; Alimi, A.M.; Karray, F. Hierarchical genetic algorithm with new evaluation function and bi-coded representation for the selection of features considering their confidence rate. Appl. Soft Comput. 2011, 11, 2501–2509.
- Hong, J.H.; Cho, S.B. Efficient huge-scale feature selection with speciated genetic algorithm. Pattern Recognit. Lett. 2006, 27, 143–150.
- Unler, A.; Murat, A.; Chinnam, R.B. Mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci. 2011, 181, 4625–4641.
- Zhang, Y.; Gong, D.; Hu, Y.; Zhang, W. Feature selection algorithm based on bare bones particle swarm optimization. Neurocomputing 2015, 148, 150–157.
- Xue, B.; Zhang, M.J.; Browne, W.N. Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Trans. Cybern. 2013, 43, 1656–1671.
- Kabir, M.M.; Shahjahan, M.; Murase, K. A new hybrid ant colony optimization algorithm for feature selection. Expert Syst. Appl. 2012, 39, 3747–3763.
- Wang, H.; Meng, Y.; Yin, P.; Hua, J. A Model-Driven Method for Quality Reviews Detection: An Ensemble Model of Feature Selection. In Proceedings of the 15th Wuhan International Conference on E-Business (WHICEB 2016), Wuhan, China, 26–28 May 2016; p. 2.
- Rao, H.; Shi, X.; Rodrigue, A.K.; Feng, J.; Xia, Y.; Elhoseny, M.; Yuan, X.; Gu, L. Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. J. 2019, 74, 634–642.
- Chaudhry, M.U.; Lee, J.-H. MOTiFS: Monte Carlo Tree Search Based Feature Selection. Entropy 2018, 20, 385.
- Chaudhry, M.U.; Lee, J.-H. Feature selection for high dimensional data using Monte Carlo tree search. IEEE Access 2018, 6, 76036–76048.
- Browne, C.; Powley, E. A Survey of Monte Carlo Tree Search Methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Gaudel, R.; Sebag, M. Feature Selection as a One-Player Game. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010.
- Hazrati, F.S.M.; Hamzeh, A.; Hashemi, S. Using reinforcement learning to find an optimal set of features. Comput. Math. Appl. 2013, 66, 1892–1904.
- Zokaei Ashtiani, M.-H.; Nili Ahmadabadi, M.; Nadjar Araabi, B. Bandit-based local feature subset selection. Neurocomputing 2014, 138, 371–382.
- Zheng, J.; Zhu, H.; Chang, F.; Liu, Y. An improved relief feature selection algorithm based on Monte-Carlo tree search. Syst. Sci. Control Eng. 2019, 7, 304–310.
- Park, C.H.; Kim, S.B. Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Syst. Appl. 2015, 42, 2336–2342.
- Devroye, L.; Gyorfi, L.; Krzyzak, A.; Lugosi, G. On the Strong Universal Consistency of Nearest Neighbor Regression Function Estimates. Ann. Stat. 1994, 22, 1371–1385.
- Aha, D.W.; Kibler, D.; Albert, M.K. Instance-Based Learning Algorithms. Mach. Learn. 1991, 6, 37–66.
- UCI Machine Learning Repository. University of California, Irvine. Available online: http://archive.ics.uci.edu/ml/index.php (accessed on 10 September 2019).
- Chang, C.; Lin, C. LIBSVM: A Library for Support Vector Machines. 2001. Available online: https://www.csie.ntu.edu.tw/~cjlin/libsvm/ (accessed on 10 September 2019).
- Paul, S.; Das, S. Simultaneous feature selection and weighting—An evolutionary multi-objective optimization approach. Pattern Recognit. Lett. 2015, 65, 51–59.
- Das, A.K.; Das, S.; Ghosh, A. Ensemble feature selection using bi-objective genetic algorithm. Knowl.-Based Syst. 2017, 123, 116–127.
- Xue, B.; Zhang, M.; Browne, W.N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. J. 2014, 18, 261–276.
- Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. J. 2018, 62, 441–453.
Notation | Interpretation |
---|---|
Original feature set | |
Input feature set in recursion | |
Best feature subset in recursion | |
Node at tree level | |
Number of times node is visited | |
Simulation reward |
# | Dataset | No. of Features | No. of Instances | No. of Classes |
---|---|---|---|---|
1 | Spambase | 57 | 4701 | 2 |
2 | Ionosphere | 34 | 351 | 2 |
3 | Arrhythmia | 195 | 452 | 16 |
4 | Multiple Features | 649 | 2000 | 10 |
5 | Waveform | 40 | 5000 | 3 |
6 | WDBC | 30 | 569 | 2 |
7 | German number | 24 | 1000 | 2 |
8 | DNA | 180 | 2000 | 2 |
9 | Sonar | 60 | 208 | 2 |
10 | Hillvalley | 100 | 606 | 2 |
11 | Musk 1 | 166 | 476 | 2 |
12 | Coil20 | 1024 | 1440 | 20 |
13 | Orl | 1024 | 400 | 40 |
14 | Lung_Discrete | 325 | 73 | 7 |
15 | Kr-vs-kp | 36 | 3196 | 2 |
16 | Spect | 22 | 267 | 2 |
Dataset | R-MOTiFS: Acc. (No. of Features) | MOTiFS [25]: Acc. (No. of Features) | H-MOTiFS [26]: Acc. (No. of Features) |
---|---|---|---|
Spambase | 0.915 ± 0.003 (15.5) | 0.907 (31.5) | 0.907 (18.0) |
Ionosphere | 0.890 ± 0.008 (4.72) | 0.889 (12.3) | 0.892 (7.0) |
Arrhythmia | 0.678 ± 0.008 (12.3) | 0.650 (94.4) | 0.640 (40.0) |
Multiple features | 0.982 ± 0.002 (110.5) | 0.980 (321.8) | 0.983 (195.0) |
Waveform | 0.817 ± 0.005 (14.4) | 0.816 (19.4) | 0.823 (12.0) |
WDBC | 0.962 ± 0.002 (12.6) | 0.967 (15.4) | 0.964 (6.0) |
German Number | 0.718 ± 0.014 (8.6) | 0.725 (11.5) | 0.728 (8.0) |
DNA | 0.893 ± 0.002 (12.2) | 0.810 (89.3) | 0.905 (18.0) |
Sonar | 0.834 ± 0.003 (14.1) | 0.850 (28.9) | 0.836 (12.0) |
Hill valley | 0.552 ± 0.016 (9.5) | 0.535 (45.2) | 0.566 (10.0) |
Musk 1 | 0.853 ± 0.010 (32.7) | 0.852 (81.3) | 0.850 (50.0) |
Coil20 | 0.981 ± 0.009 (81.6) | 0.980 (505.4) | 0.989 (308.0) |
Orl | 0.862 ± 0.011 (135.3) | 0.862 (498.3) | 0.883 (308.0) |
Lung_discrete | 0.807 ± 0.006 (41.0) | 0.810 (154.8) | 0.823 (98.0) |
Kr-vs-kp | 0.964 ± 0.005 (16.2) | 0.961 (20.1) | 0.975 (8.0) |
Spect | 0.813 ± 0.008 (8.7) | 0.809 (10.3) | 0.817 (7.0) |
Dataset | R-MOTiFS FSR | MOTiFS [25] FSR | H-MOTiFS [26] FSR |
---|---|---|---|
Spambase | 0.059 | 0.029 | 0.050 |
Ionosphere | 0.188 | 0.072 | 0.127 |
Arrhythmia | 0.055 | 0.007 | 0.016 |
Multiple ft. | 0.009 | 0.003 | 0.005 |
Waveform | 0.057 | 0.042 | 0.068 |
WDBC | 0.076 | 0.063 | 0.161 |
GermanNumber | 0.083 | 0.063 | 0.091 |
DNA | 0.073 | 0.009 | 0.050 |
Sonar | 0.059 | 0.029 | 0.069 |
HillValley | 0.058 | 0.012 | 0.056 |
Musk 1 | 0.026 | 0.010 | 0.017 |
Coil20 | 0.012 | 0.002 | 0.003 |
ORL | 0.006 | 0.002 | 0.003 |
Lung_discrete | 0.020 | 0.005 | 0.008 |
Kr-vs-Kp | 0.060 | 0.048 | 0.122 |
Spect | 0.093 | 0.079 | 0.116 |
Dataset | R-MOTiFS: Acc. (No. of Features) | GA: Acc. (No. of Features) | SFSW [40]: Acc. (No. of Features) | E-FSGA [41]: Acc. | PSO (4-2) [42]: Acc. (No. of Features) | WOA [43]: Acc. (No. of Features) | WOA-T [43]: Acc. (No. of Features) |
---|---|---|---|---|---|---|---|
Spambase | 0.915 (15.5) | 0.910 (26.0) | 0.885 (26.0) | 0.922 | - | - | - |
Ionosphere | 0.891 (4.72) | 0.875 (11.0) | 0.883 (11.5) | 0.862 | 0.873 (3.3) | 0.890 (21.5) | 0.884 (20.2) |
Arrhythmia | 0.678 (12.2) | 0.635 (101.0) | 0.658 (100.0) | - | - | - | - |
Multiple Feat | 0.982 (110.5) | 0.976 (339.0) | 0.979 (270.0) | 0.945 | - | - | - |
Waveform | 0.818 (14.4) | 0.817 (18.0) | 0.837 (16.0) | - | - | 0.713 (33.2) | 0.710 (33.7) |
WDBC | 0.962 (12.62) | 0.961 (18.0) | 0.941 (13.5) | 0.969 | 0.940 (3.5) | 0.955 (20.8) | 0.950 (20.6) |
GermanNumber | 0.718 (8.62) | 0.715 (9.0) | 0.713 (10.5) | - | 0.685 (12.8) | - | - |
DNA | 0.893 (12.16) | 0.860 (87.0) | 0.831 (71.8) | - | - | - | - |
Sonar | 0.834 (14.1) | 0.856 (26.0) | 0.827 (20.0) | 0.808 | 0.782 (11.2) | 0.854 (43.4) | 0.861 (38.2) |
HillValley | 0.552 (9.52) | 0.564 (32.0) | 0.575 (40.0) | - | 0.578 (12.2) | - | - |
Musk 1 | 0.852 (32.7) | 0.840 (75.0) | 0.815 (59.3) | - | 0.849 (76.5) | - | - |
Coil20 | 0.983 (81.65) | 0.982 (462.0) | - | 0.892 | - | - | - |
ORL | 0.860 (135.32) | 0.858 (571.0) | - | 0.622 | - | - | - |
Lung discrete | 0.807 (41.0) | 0.800 (115.0) | - | 0.713 | 0.784 (6.7) | 0.730 | 0.737 |
Kr-vs-Kp | 0.964 (16.2) | 0.970 (17.0) | - | - | - | 0.915 (27.9) | 0.896 (26.7) |
Spect | 0.813 (8.72) | 0.805 (11.0) | - | - | - | 0.788 (12.1) | 0.792 (11.5) |
Dataset | R-MOTiFS FSR | GA FSR | SFSW [40] FSR | PSO (4-2) [42] FSR | WOA [43] FSR | WOA-T [43] FSR |
---|---|---|---|---|---|---|
Spambase | 0.059 | 0.035 | 0.034 | - | - | - |
Ionosphere | 0.188 | 0.079 | 0.077 | 0.264 | 0.041 | 0.044 |
Arrhythmia | 0.055 | 0.006 | 0.006 | - | - | - |
Multiple Feat. | 0.009 | 0.003 | 0.004 | - | - | - |
Waveform | 0.057 | 0.045 | 0.052 | - | 0.021 | 0.021 |
WDBC | 0.076 | 0.053 | 0.070 | 0.268 | 0.046 | 0.046 |
GermanNumber | 0.083 | 0.079 | 0.068 | 0.053 | - | - |
DNA | 0.073 | 0.010 | 0.011 | - | - | - |
Sonar | 0.059 | 0.033 | 0.041 | 0.069 | 0.020 | 0.022 |
HillValley | 0.058 | 0.017 | 0.014 | 0.047 | - | - |
Musk 1 | 0.026 | 0.011 | 0.014 | 0.011 | - | - |
Coil20 | 0.012 | 0.002 | - | - | - | - |
ORL | 0.006 | 0.002 | - | - | - | - |
Lung_discrete | 0.020 | 0.007 | - | 0.117 | 0.010 | 0.011 |
Kr-vs-Kp | 0.060 | 0.057 | - | - | 0.033 | 0.034 |
Spect | 0.093 | 0.073 | - | - | 0.065 | 0.069 |
R-MOTiFS vs. | R+ | R− | p-Value | T-Value |
---|---|---|---|---|
MOTiFS | 136 | 0 | 0.0004 | 0 |
H-MOTiFS | 72.5 | 63.5 | 0.8181 | 63.5 |
GA | 136 | 0 | 0.0004 | 0 |
SFSW | 66 | 0 | 0.0033 | 0 |
PSO (4-2) | 9 | 19 | NA | 9 |
WOA | 28 | 0 | NA | 0 |
WOA-T | 28 | 0 | NA | 0 |
Methods | Rank |
---|---|
R-MOTiFS | 1.36 |
H-MOTiFS | 1.64 |
MOTiFS | 4.64 |
SFSW | 3.45 |
GA | 3.72 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).