Less Is More: Adaptive Trainable Gradient Dropout for Deep Neural Networks
Abstract
1. Introduction
2. Related Work
3. Proposed Method
4. Experiments and Results
4.1. Datasets
- CIFAR-10 [24]: The CIFAR-10 dataset contains 60,000 32 × 32 color images, divided into 10 classes of 6000 images each. The training set consists of 50,000 images, whereas the test set contains 10,000 images, with 1000 randomly selected from each class.
- USPS Handwritten Digits (USPS) [25]: USPS is a handwritten digit dataset comprising 7291 training examples and 2007 test examples of 8 × 8 images drawn from 10 classes.
- Fashion-MNIST [26]: Fashion-MNIST is structured after MNIST [27], a handwritten digit dataset now considered an essentially solved problem, and is designed to be more challenging. It consists of 28 × 28 grayscale clothing images from 10 classes, divided into a training set of 60,000 samples and a test set of 10,000 samples.
- SVHN [28]: SVHN is an image dataset of house numbers, obtained from Google Street View images. The dataset’s structure is similar to that of the MNIST dataset; each of the 10 classes consists of images of one digit. The dataset contains over 600,000 digit images, split into 73,257 digits for training, 26,032 digits for testing, and 531,131 additional training examples.
- STL-10 [29]: The STL-10 dataset is an image recognition dataset inspired by CIFAR-10 and shares its 10-class structure, with 500 training images and 800 test images per class, each of size 96 × 96. The dataset also contains 100,000 unlabeled images for unsupervised training, whose content is drawn from categories similar to, but not the same as, the original classes and acquired from ImageNet [30]. Although STL-10 was designed for developing scalable unsupervised methods, in this study it is used as a standard supervised classification dataset; an illustrative loading sketch for all five datasets is given below.
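The following is a minimal data-loading sketch for the five benchmarks above. PyTorch with torchvision, a batch size of 128, and a plain ToTensor transform are assumptions made for illustration; the paper does not prescribe a specific framework or preprocessing pipeline.

```python
# Hypothetical data-loading sketch for the five benchmarks used in the paper.
# Framework choice (PyTorch/torchvision), batch size, and transform are assumptions.
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision import datasets

root = "./data"            # hypothetical download directory
to_tensor = T.ToTensor()   # per-dataset normalization omitted for brevity

# name -> (dataset constructor, kwargs selecting the train split, kwargs selecting the test split)
splits = {
    "cifar10": (datasets.CIFAR10,      {"train": True},    {"train": False}),
    "usps":    (datasets.USPS,         {"train": True},    {"train": False}),
    "fmnist":  (datasets.FashionMNIST, {"train": True},    {"train": False}),
    "svhn":    (datasets.SVHN,         {"split": "train"}, {"split": "test"}),
    "stl10":   (datasets.STL10,        {"split": "train"}, {"split": "test"}),
}

loaders = {}
for name, (ctor, train_kw, test_kw) in splits.items():
    train_set = ctor(root, download=True, transform=to_tensor, **train_kw)
    test_set = ctor(root, download=True, transform=to_tensor, **test_kw)
    loaders[name] = (
        DataLoader(train_set, batch_size=128, shuffle=True),
        DataLoader(test_set, batch_size=128, shuffle=False),
    )
```

The per-dataset keyword arguments are needed because the SVHN and STL-10 constructors select their partitions with a split argument, whereas CIFAR-10, USPS, and Fashion-MNIST use a boolean train flag.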
4.2. Implementation Details
4.3. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Allen, D.M. The Relationship between Variable Selection and Data Augmentation and a Method for Prediction. Technometrics 1974, 16, 125–127.
- Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence—Volume 2 (IJCAI'95); Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 1137–1143.
- Freund, Y. Boosting a Weak Learning Algorithm by Majority. Inf. Comput. 1995, 121, 256–285.
- Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
- Perez, L.; Wang, J. The Effectiveness of Data Augmentation in Image Classification Using Deep Learning. arXiv 2017, arXiv:1712.04621.
- Cubuk, E.D.; Zoph, B.; Mané, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. arXiv 2018, arXiv:1805.09501.
- Ohashi, H.; Al-Naser, M.; Ahmed, S.; Akiyama, T.; Sato, T.; Nguyen, P.; Nakamura, K.; Dengel, A. Augmenting Wearable Sensor Data with Physical Constraint for DNN-Based Human-Action Recognition. In Proceedings of the ICML 2017 Time Series Workshop, PMLR, Sydney, Australia, 6–11 August 2017.
- Prechelt, L. Early Stopping-But When? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69.
- Krogh, A.; Hertz, J.A. A Simple Weight Decay Can Improve Generalization. In Proceedings of the 4th International Conference on Neural Information Processing Systems (NIPS'91); Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1991; pp. 950–957.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Wan, L.; Zeiler, M.D.; Zhang, S.; LeCun, Y.; Fergus, R. Regularization of Neural Networks Using DropConnect. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013.
- Huang, G.; Sun, Y.; Liu, Z.; Sedra, D.; Weinberger, K. Deep Networks with Stochastic Depth. arXiv 2016, arXiv:1603.09382.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. DropBlock: A Regularization Method for Convolutional Networks. arXiv 2018, arXiv:1810.12890.
- DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552.
- Larsson, G.; Maire, M.; Shakhnarovich, G. FractalNet: Ultra-Deep Neural Networks without Residuals. arXiv 2017, arXiv:1605.07648.
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. arXiv 2018, arXiv:1707.07012.
- Gastaldi, X. Shake-Shake Regularization. arXiv 2017, arXiv:1705.07485.
- Yamada, Y.; Iwamura, M.; Akiba, T.; Kise, K. ShakeDrop Regularization for Deep Residual Learning. IEEE Access 2019, 7, 186126–186136.
- Goodfellow, I.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y. Maxout Networks. In Proceedings of the 30th International Conference on Machine Learning; Dasgupta, S., McAllester, D., Eds.; PMLR: Atlanta, GA, USA, 2013; Volume 28, pp. 1319–1327.
- Tseng, H.Y.; Chen, Y.W.; Tsai, Y.H.; Liu, S.; Lin, Y.Y.; Yang, M.H. Regularizing Meta-Learning via Gradient Dropout. arXiv 2020, arXiv:2004.05859.
- Ba, J.; Frey, B. Adaptive Dropout for Training Deep Neural Networks. In Advances in Neural Information Processing Systems; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Lake Tahoe, NV, USA, 2013; Volume 26.
- Gomez, A.N.; Zhang, I.; Kamalakara, S.R.; Madaan, D.; Swersky, K.; Gal, Y.; Hinton, G.E. Learning Sparse Networks Using Targeted Dropout. arXiv 2019, arXiv:1905.13678.
- Lin, H.; Zeng, W.; Ding, X.; Huang, Y.; Huang, C.; Paisley, J. Learning Rate Dropout. arXiv 2019, arXiv:1912.00144.
- Krizhevsky, A.; Nair, V.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
- Seewald, A.K. Digits—A Dataset for Handwritten Digit Recognition; Institute for Artificial Intelligence: Vienna, Austria, 2005.
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747.
- Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Process. Mag. 2012, 29, 141–142.
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 2011.
- Coates, A.; Ng, A.; Lee, H. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 2011. Available online: https://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf (accessed on 11 December 2022).
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2014, arXiv:1312.6114.
- Lu, L. Dying ReLU and Initialization: Theory and Numerical Examples. Commun. Comput. Phys. 2020, 28, 1671–1706.
CIFAR-10

| # | acc (%) | epoch | conv1 | conv2 | conv3 | conv4 | fc1 | int | wMask |
|---|---------|-------|-------|-------|-------|-------|-----|-----|-------|
| v | 72.8 | 70 | - | - | - | - | - | - | - |
| 1 | 73.26 | 93 | 0.01 | 0.01 | 0.05 | 0.08 | 0.5 | 10 | 10 |
| 2 | 72.97 | 49 | 0.001 | 0.002 | 0.003 | 0.05 | 0.4 | 1 | 1 |
| 3 | 72.74 | 78 | 0.001 | 0.002 | 0.004 | 0.01 | 0.7 | 10 | 10 |
| 4 | 72.25 | 51 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 1 | - |
| 5 | 71.9 | 6 | 0.01 | 0.02 | 0.05 | 0.1 | 0.4 | 1 | 10 |

USPS

| # | acc (%) | epoch | conv1 | conv2 | conv3 | conv4 | fc1 | int | wMask |
|---|---------|-------|-------|-------|-------|-------|-----|-----|-------|
| v | 96.21 | 94 | - | - | - | - | - | - | - |
| 1 | 96.71 | 49 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 1 | - |
| 2 | 96.71 | 85 | 0.01 | 0.01 | 0.02 | 0.03 | 0.3 | 1 | 4 |
| 3 | 96.46 | 28 | 0.002 | 0.002 | 0.003 | 0.05 | 0.2 | 1 | 1 |
| 4 | 96.41 | 92 | 0.001 | 0.001 | 0.01 | 0.2 | 0.5 | 5 | 1 |
| 5 | 96.41 | 53 | 0.001 | 0.001 | 0.002 | 0.01 | 0.1 | 1 | 20 |

Fashion-MNIST

| # | acc (%) | epoch | conv1 | conv2 | conv3 | conv4 | fc1 | int | wMask |
|---|---------|-------|-------|-------|-------|-------|-----|-----|-------|
| v | 90.51 | 48 | - | - | - | - | - | - | - |
| 1 | 92.55 | 28 | 0.0 | 0.002 | 0.007 | 0.01 | 0.5 | 1 | 1 |
| 2 | 92.26 | 35 | 0.008 | 0.008 | 0.008 | 0.008 | 0.8 | 1 | 1 |
| 3 | 91.84 | 23 | 0.005 | 0.01 | 0.05 | 0.1 | 0.5 | 1 | 1 |
| 4 | 91.44 | 34 | 0.005 | 0.005 | 0.01 | 0.02 | 0.5 | 1 | 1 |
| 5 | 91.36 | 43 | 0.01 | 0.01 | 0.01 | 0.2 | 0.5 | 1 | 1 |

STL-10

| # | acc (%) | epoch | conv1 | conv2 | conv3 | conv4 | fc1 | int | wMask |
|---|---------|-------|-------|-------|-------|-------|-----|-----|-------|
| v | 50.08 | 99 | - | - | - | - | - | - | - |
| 1 | 52.26 | 22 | 0.001 | 0.001 | 0.002 | 0.05 | 0.2 | 5 | 1 |
| 2 | 51.76 | 100 | 0.001 | 0.002 | 0.01 | 0.01 | 0.1 | 2 | 1 |
| 3 | 50.86 | 28 | 0.1 | 0.1 | 0.15 | 0.2 | 0.4 | 5 | 1 |
| 4 | 50.79 | 12 | 0.02 | 0.02 | 0.1 | 0.2 | 0.4 | 5 | 1 |
| 5 | 50.56 | 48 | 0.001 | 0.001 | 0.002 | 0.01 | 0.3 | 5 | 5 |

SVHN

| # | acc (%) | epoch | conv1 | conv2 | conv3 | conv4 | fc1 | int | wMask |
|---|---------|-------|-------|-------|-------|-------|-----|-----|-------|
| v | 90.19 | 17 | - | - | - | - | - | - | - |
| 1 | 90.45 | 12 | 0.001 | 0.001 | 0.002 | 0.01 | 0.6 | 1 | 1 |
| 2 | 90.43 | 11 | 0.001 | 0.003 | 0.008 | 0.01 | 0.5 | 1 | 1 |
| 3 | 90.36 | 12 | 0.001 | 0.001 | 0.01 | 0.02 | 0.5 | 1 | 1 |
| 4 | 90.25 | 26 | 0.001 | 0.001 | 0.002 | 0.005 | 0.6 | 1 | 1 |
| 5 | 90.17 | 5 | 0.001 | 0.003 | 0.008 | 0.01 | 0.3 | 1 | 2 |
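The tables above list, for each dataset, the top five configurations alongside a reference row (v), with per-layer values for conv1-conv4 and fc1 and the int and wMask settings. As a rough illustration of how per-layer gradient dropout can be wired into training, the sketch below zeroes a fixed fraction of each layer's gradient entries through PyTorch parameter hooks. The SmallCNN backbone, the attach_gradient_dropout helper, and the use of fixed drop rates are assumptions made for illustration; they do not reproduce the paper's adaptive, trainable formulation.

```python
# Illustrative sketch of per-layer gradient dropout (an assumption, not the
# paper's exact adaptive/trainable method): with probability p, individual
# gradient entries of a layer are zeroed before the optimizer step.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical backbone mirroring the conv1..conv4 + fc1 naming in the tables."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(128 * 2 * 2, num_classes)  # assumes 32 x 32 inputs

    def forward(self, x):
        for conv in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = self.pool(torch.relu(conv(x)))
        return self.fc1(x.flatten(1))

def attach_gradient_dropout(model, rates):
    """Register hooks that randomly zero a fraction rates[name] of each named
    layer's gradient entries (illustrative fixed per-layer drop rates)."""
    def make_hook(p):
        def hook(grad):
            mask = (torch.rand_like(grad) >= p).to(grad.dtype)
            return grad * mask  # returned tensor replaces the gradient
        return hook
    for name, module in model.named_modules():
        if name in rates:
            for param in module.parameters():
                param.register_hook(make_hook(rates[name]))

model = SmallCNN()
# Example rates taken from the first CIFAR-10 row above (illustrative only).
attach_gradient_dropout(model, {"conv1": 0.01, "conv2": 0.01, "conv3": 0.05,
                                "conv4": 0.08, "fc1": 0.5})
```

Calling attach_gradient_dropout once before training is enough: the hooks fire on every backward pass, so any optimizer only ever sees the masked gradients.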
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).