Model Simplification of Deep Random Forest for Real-Time Applications of Various Sensor Data
Abstract
1. Introduction
2. Related Studies
2.1. IML-Based Model Simplification
2.2. Deep Random Models
2.3. Contributions of This Study
- After an LMRF is trained and multi-layer networks are generated using several RFs (see Figure 1a), we first decompose the predictions of each decision tree in the RF into mathematically exact feature contributions (a code sketch of this decomposition follows this list).
- Individual predictions of a decision tree can be explained by breaking the decision path down into one component per feature. This procedure is applied iteratively, layer by layer, to find all rules of the entire RF, and the rules are saved to decision sets, i.e., sets of classification rules of an RF (see the example in Figure 1).
- Sequential covering then repeatedly retains or eliminates rules in the decision set of an RF based on a combination of the rule contribution and the feature pattern (the frequency of rules). This regularization keeps only a small number of refined rules that are the most discriminative.
- After sequential covering, the number of decision sets per layer is unchanged, but the numbers of rules and features are significantly reduced without degrading performance.
- Herein, we provide qualitative and quantitative results demonstrating that the proposed model simplification method is understandable and effective for real-time processing.
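As a concrete illustration of the first step above, the following is a minimal sketch of decomposing a single decision-tree prediction into a bias term plus exact per-feature contributions. It assumes scikit-learn; the function name decompose_prediction and the iris data are ours for illustration, and the sketch does not reproduce the paper's Equation (2).

```python
# A minimal sketch (not the paper's implementation) of decomposing one
# decision-tree prediction into a bias plus exact per-feature contributions,
# assuming scikit-learn; decompose_prediction is an illustrative name.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def decompose_prediction(tree, x):
    """Return (bias, {feature: contribution}) whose sum equals the prediction."""
    t = tree.tree_

    def node_proba(n):                         # class distribution at node n
        v = t.value[n][0]
        return v / v.sum()

    node, bias, contrib = 0, node_proba(0), {}
    while t.children_left[node] != -1:         # descend until a leaf
        feat = t.feature[node]
        nxt = (t.children_left[node] if x[feat] <= t.threshold[node]
               else t.children_right[node])
        # The probability shift caused by this split is credited to the
        # feature tested at this node; the decomposition is exact because
        # the per-node deltas telescope to the leaf value.
        contrib[feat] = contrib.get(feat, 0.0) + (node_proba(nxt) - node_proba(node))
        node = nxt
    return bias, contrib

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
bias, contrib = decompose_prediction(clf, X[0])
# Bias plus the summed contributions reproduces the tree's class probabilities.
assert np.allclose(bias + sum(contrib.values()), clf.predict_proba(X[:1])[0])
```

Averaging such per-tree decompositions across the trees of an RF yields the forest-level feature contributions from which the rules are scored.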
3. Simplification of Deep Random Forest
3.1. Growth Phase: Training of DRF
3.2. Sequential Covering Based on Rule Contribution
3.3. Rule Elimination Phase: Simplifying LMRF
Algorithm 1: Rule elimination based on feature contribution and feature pattern
1: Input: the number of layers, the number of RFs, the number of trees T, the trained random forests, and a list of decision sets (dSets)
2: Start with an empty list sLMRF
3: Learn the LMRF
4: For each layer:
5:   For each RF:
6:     For each tree:
7:       Split each rule from the decision tree
8:       Calculate the feature contribution of the rule, Equation (2)
9:       Calculate the rule contribution of the rule
10:      Add the rule and its contribution to the dSet
11:    End
12:    Compute the feature pattern by splitting rules in the dSet
13:    Re-compute a new rule contribution, Equation (4)
14:    Sort the rules in the dSet according to the new contribution
15:    Add the dSet to the current layer of the sLMRF
16:  End
17: End
18: Output: the sLMRF consisting of all layers
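The following is a compact sketch of the elimination loop of Algorithm 1 for a single RF, under our own stand-ins for Equations (2) and (4), which are not reproduced here: the rule contribution is approximated by the probability shift along each root-to-leaf path, and the feature pattern by the frequency of each rule's feature set within the decision set. All names (extract_rules, simplify, rule_ratio) are illustrative.

```python
# A sketch of Algorithm 1 for one RF, with hypothetical stand-ins for
# Equations (2) and (4); not the authors' implementation.
from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def extract_rules(tree):
    """Yield (conditions, contribution) for every root-to-leaf rule."""
    t = tree.tree_

    def proba(n):
        v = t.value[n][0]
        return v / v.sum()

    def walk(node, conds):
        if t.children_left[node] == -1:        # leaf: one complete rule
            # Stand-in for Equation (2): total probability shift of the path.
            yield conds, float(np.abs(proba(node) - proba(0)).sum())
        else:
            f, thr = t.feature[node], t.threshold[node]
            yield from walk(t.children_left[node], conds + ((f, "<=", thr),))
            yield from walk(t.children_right[node], conds + ((f, ">", thr),))

    yield from walk(0, ())

def simplify(forest, rule_ratio=0.8):
    """Keep only the top rule_ratio fraction of one decision set's rules."""
    dset = [r for est in forest.estimators_ for r in extract_rules(est)]
    # Feature pattern (stand-in for Equation (4)): how often each rule's
    # feature set recurs across the decision set.
    pattern = Counter(frozenset(f for f, _, _ in c) for c, _ in dset)
    # Re-weight, sort, and truncate: frequent, high-contribution rules survive.
    dset.sort(key=lambda r: r[1] * pattern[frozenset(f for f, _, _ in r[0])],
              reverse=True)
    return dset[: int(len(dset) * rule_ratio)]

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0).fit(X, y)
print(len(simplify(rf, rule_ratio=0.7)), "rules kept")
```

Applied layer by layer, the surviving decision sets form the simplified model, denoted sLMRF in the tables below.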
4. Experimental Results
4.1. Datasets
4.2. Evaluation of DRF Models
4.3. Decision Boundary Analysis
4.4. Comparison with State-of-the-Art Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Marcus, G. Deep Learning: A Critical Appraisal. arXiv 2018, arXiv:1801.00631.
- Chen, Y.; Wang, N.; Zhang, Z. DarkRank: Accelerating deep metric learning via cross sample similarities transfer. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), New Orleans, LA, USA, 2–7 February 2018; pp. 2852–2859.
- Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Wang, Y.; Xu, C.; Xu, C.; Xu, C.; Tao, D. Learning Versatile Filters for Efficient Convolutional Neural Networks. In Proceedings of the Thirty-Second Annual Conference on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 2–8 December 2018; pp. 1608–1618.
- Tung, F.; Mori, G. CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7873–7882.
- Kim, S.J.; Kwak, S.Y.; Ko, B.C. Fast Pedestrian Detection in Surveillance Video Based on Soft Target Training of Shallow Random Forest. IEEE Access 2019, 7, 12315–12326.
- Miller, K.; Hettinger, C.; Humpherys, J.; Jarvis, T.; Kartchner, D. Forward Thinking: Building Deep Random Forests. arXiv 2017, arXiv:1705.07366.
- Zhou, Z.H.; Feng, J. Deep forest: Towards an alternative to deep neural networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 3553–3559.
- Gunning, D. Explainable Artificial Intelligence (XAI). Available online: https://www.darpa.mil/program/explainable-artificial-intelligence (accessed on 8 November 2020).
- Anders, C.; Montavon, G.; Samek, W.; Müller, K.R. Understanding Patch-Based Learning of Video Data by Explaining Predictions. Lect. Notes Comput. Sci. 2019, 11700, 297–309.
- Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE 2015, 10, 1–46.
- Montavon, G.; Lapuschkin, S.; Binder, A.; Samek, W.; Müller, K.R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 2017, 65, 211–222.
- Montavon, G.; Samek, W.; Müller, K.R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 2018, 73, 1–15.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Biran, O.; Cotton, C. Explanation and Justification in Machine Learning: A Survey. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence Workshop (IJCAI-W), Melbourne, Australia, 19–25 August 2017; pp. 1–6.
- Kim, S.; Jeong, M.; Ko, B.C. Interpretation and Simplification of Deep Forest. TechRxiv 2020, techrxiv.11661246.v1.
- Si, Z.; Zhu, S.C. Learning AND-OR templates for object recognition and detection. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2189–2205.
- Liu, S.; Dissanayake, S.; Patel, S.; Dang, X.; Mlsna, T.; Chen, Y.; Wilkins, D. Learning accurate and interpretable models based on regularized random forests regression. BMC Syst. Biol. 2014, 8, 1–9.
- Letham, B.; Rudin, C.; McCormick, T.H.; Madigan, D. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Stat. 2015, 9, 1350–1371.
- Lakkaraju, H.; Bach, S.H.; Leskovec, J. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 1675–1684.
- Yang, H.; Rudin, C.; Seltzer, M. Scalable Bayesian Rule Lists. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 3921–3930.
- Yang, Z.; Zhang, A.; Sudjianto, A. Enhancing Explainability of Neural Networks through Architecture Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 1–12.
- Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable Machine Learning for Scientific Insights and Discoveries. IEEE Access 2020, 8, 42200–42216.
- Jeong, M.; Park, M.; Ko, B.C. Intelligent Driver Emotion Monitoring Based on Lightweight Multilayer Random Forests. In Proceedings of the International Conference on Industrial Informatics (INDIN), Helsinki-Espoo, Finland, 22–25 July 2019; pp. 1–4.
- Feng, J.; Yu, Y.; Zhou, Z.H. Multi-layered gradient boosting decision trees. In Proceedings of the Thirty-Second Annual Conference on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 2–8 December 2018; pp. 3551–3561.
- Utkin, L.V.; Ryabinin, M.A. A Siamese Deep Forest. Knowl. Based Syst. 2018, 139, 13–22.
- Kim, S.; Jeong, M.; Ko, B.C. Self-Supervised Keypoint Detection Based on Multi-Layer Random Forest Regressor. IEEE Access 2021, 9, 40850–40859.
- Kim, S.; Jeong, M.; Lee, D.; Ko, B.C. Deep coupling of random ferns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–20 June 2019; pp. 5–8.
- Ioannou, Y.; Robertson, D.; Zikic, D.; Kontschieder, P.; Shotton, J.; Brown, M.; Criminisi, A. Decision Forests, Convolutional Networks and the Models in-Between. arXiv 2016, arXiv:1603.01250.
- Frosst, N.; Hinton, G.E. Distilling a neural network into a soft decision tree. arXiv 2017, arXiv:1711.09784.
- Kong, Y.; Yu, T. A Deep Neural Network Model using Random Forest to Extract Feature Representation for Gene Expression Data Classification. Sci. Rep. 2018, 8, 1–9.
- Molnar, C. Interpretable Machine Learning, 1st ed.; Leanpub: Victoria, BC, Canada, 2019; pp. 90–93.
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, 13–18 June 2010; pp. 94–101.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Mangasarian, O.L.; Wolberg, W.H. Cancer diagnosis via linear programming. SIAM News 1990, 23, 1–18.
- Samaria, F.; Harter, A. Parameterisation of a Stochastic Model for Human Face Identification. In Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV), Sarasota, FL, USA, 5–7 December 1994; pp. 138–142.
- Dua, D.; Graff, C. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 8 November 2020).
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
- Liu, M.; Li, S.; Shan, S.; Wang, R.; Chen, X. Deeply learning deformable facial action parts model for dynamic expression analysis. In Proceedings of the Asian Conference on Computer Vision (ACCV), Singapore, 1–5 November 2014; pp. 143–157.
- Mollahosseini, A.; Chan, D.; Mahoor, M.H. Going deeper in facial expression recognition using deep neural networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–10.
- Hasani, B.; Mahoor, M.H. Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 30–40.
- Jeong, M.; Ko, B.C. Driver’s Facial Expression Recognition in Real-Time for Safe Driving. Sensors 2018, 18, 4270.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Le, Q.V. Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324.
| Rules (feature conditions ⇒ {rule contribution} [class distribution]) |
|---|
| Initial rules of the decision set: (… and … and … and …) ⇒ {0} [0, 0, 1]; (… and … and … and …) ⇒ {-0.277} [0, 0.5, 0.5]; (… and … and …) ⇒ {0.55} [0, 1, 0]; (… and … and …) ⇒ {0.27} [0, 0.75, 0.25]; (… and … and …) ⇒ {0.83} [0, 0.6, 0.4]; … |
| Reordered rules of the decision set: (… and … and …) ⇒ {0.83} [0, 0.6, 0.4]; (… and … and …) ⇒ {0.55} [0, 1, 0]; (… and … and … and …) ⇒ {-0.277} [0, 0.5, 0.5]; (… and … and …) ⇒ {0.27} [0, 0.75, 0.25]; (… and … and … and …) ⇒ {0} [0, 0, 1]; … |
| Rule Ratio | sLMRF Acc. (%) | gcForest Acc. (%) | FTDRF Acc. (%) | sLMRF Rules (M) | gcForest Rules (M) | FTDRF Rules (M) | sLMRF # Param. (M) | gcForest # Param. (M) | FTDRF # Param. (M) | sLMRF # Op. (M) | gcForest # Op. (M) | FTDRF # Op. (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 93.60 | 89.71 | 92.41 | 0.12 | 0.16 | 0.13 | 0.53 | 2.90 | 2.51 | 0.0060 | 0.0381 | 0.0233 |
| 0.9 | 92.86 | 90.00 | 92.15 | 0.11 | 0.15 | 0.12 | 0.51 | 2.78 | 2.39 | 0.0060 | 0.0381 | 0.0232 |
| 0.8 | 92.50 | 89.92 | 92.24 | 0.09 | 0.13 | 0.10 | 0.47 | 2.59 | 2.22 | 0.0059 | 0.0380 | 0.0231 |
| 0.7 | 91.87 | 89.92 | 92.18 | 0.08 | 0.12 | 0.09 | 0.44 | 2.38 | 2.03 | 0.0059 | 0.0379 | 0.0230 |
| 0.6 | 91.05 | 89.73 | 92.04 | 0.07 | 0.10 | 0.08 | 0.39 | 2.16 | 1.83 | 0.0058 | 0.0377 | 0.0228 |
| Rule Ratio | sLMRF Acc. (%) | gcForest Acc. (%) | FTDRF Acc. (%) | sLMRF Rules (M) | gcForest Rules (M) | FTDRF Rules (M) | sLMRF # Param. (M) | gcForest # Param. (M) | FTDRF # Param. (M) | sLMRF # Op. (M) | gcForest # Op. (M) | FTDRF # Op. (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 97.98 | 98.77 | 98.57 | 0.08 | 0.94 | 0.26 | 2.12 | 26.42 | 7.17 | 0.0089 | 0.0852 | 0.0296 |
| 0.9 | 97.77 | 98.73 | 98.57 | 0.08 | 0.85 | 0.23 | 2.00 | 24.79 | 6.76 | 0.0088 | 0.0851 | 0.0294 |
| 0.8 | 97.41 | 98.74 | 98.57 | 0.07 | 0.76 | 0.21 | 1.86 | 22.98 | 6.28 | 0.0087 | 0.0850 | 0.0293 |
| 0.7 | 96.86 | 98.76 | 98.47 | 0.06 | 0.66 | 0.18 | 1.71 | 20.98 | 5.75 | 0.0087 | 0.0849 | 0.0292 |
| 0.6 | 96.00 | 98.75 | 98.39 | 0.05 | 0.57 | 0.16 | 1.54 | 18.86 | 5.19 | 0.0086 | 0.0850 | 0.0291 |
| Rule Ratio | sLMRF Acc. (%) | gcForest Acc. (%) | FTDRF Acc. (%) | sLMRF Rules (M) | gcForest Rules (M) | FTDRF Rules (M) |
|---|---|---|---|---|---|---|
| 1.0 | 96.49 | 95.21 | 97.34 | 0.0063 | 0.0450 | 0.0256 |
| 0.9 | 96.49 | 95.21 | 97.34 | 0.0063 | 0.0450 | 0.0247 |
| 0.8 | 96.49 | 95.21 | 97.34 | 0.0057 | 0.0385 | 0.0227 |
| 0.7 | 96.49 | 95.74 | 97.34 | 0.0050 | 0.0350 | 0.0199 |
| 0.6 | 96.49 | 95.74 | 96.81 | 0.0044 | 0.0298 | 0.0168 |
| Rule Ratio | sLMRF Acc. (%) | gcForest Acc. (%) | FTDRF Acc. (%) | sLMRF Rules (M) | gcForest Rules (M) | FTDRF Rules (M) |
|---|---|---|---|---|---|---|
| 1.0 | 97.50 | 97.50 | 90.00 | 0.0595 | 0.0957 | 0.1133 |
| 0.9 | 97.50 | 97.50 | 90.00 | 0.0543 | 0.0893 | 0.1045 |
| 0.8 | 97.50 | 97.50 | 90.00 | 0.0481 | 0.0786 | 0.0925 |
| 0.7 | 87.50 | 97.50 | 90.00 | 0.0422 | 0.0702 | 0.0809 |
| 0.6 | 87.50 | 97.50 | 90.00 | 0.0362 | 0.0594 | 0.0696 |