Light Recurrent Unit: Towards an Interpretable Recurrent Neural Network for Modeling Long-Range Dependency
Abstract
1. Introduction
- The proposed LRU introduces a latent hidden state that is highly interpretable. It uses the minimal number of gates among possible gated recurrent structures: a single gate controls whether past memories are kept. This reduces the requirements for training data, model tuning, and training time while maintaining model accuracy.
- The proposed LRU leverages Stack Recurrent Cells (SRCs) to modify the activation function, improving gradient flow in deep networks. This modification accelerates network convergence and enhances the interpretability of the learned model parameters.
- Experimental results on various tasks demonstrate that LRU retains long-term memory and therefore processes long sequences more effectively. Despite its reduced model complexity, LRU achieves better overall accuracy and faster convergence than LSTM.
2. Background for RNN
2.1. RNN with Special Initialization
2.2. RNN with Structure Constraints
3. Proposed Model
3.1. RNN and LSTM
3.2. Proposed LRU
- The portion of the last state $h_{t-1}$ that is remembered, for each component;
- The portion that is added to $h_t$ from the candidate state $\tilde{h}_t$, for each component, as sketched in the update below.
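Both portions are governed by the same gate, so the LRU state update presumably takes the coupled, convex-combination form (a sketch using forget gate $f_t$, previous state $h_{t-1}$, and candidate state $\tilde{h}_t$; assumed here rather than quoted from the derivation):

$$
h_t = f_t \odot h_{t-1} + (1 - f_t) \odot \tilde{h}_t
$$

Because the two coefficients sum to one for every component, $f_t$ can be read directly as the per-component fraction of past memory that is kept, which is what makes the single gate interpretable.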
3.3. Stack Recurrent Cells
- $x_t$: Input sequence.
- $h_t^l$: Hidden state of layer $l$ at time step $t$.
- $\tilde{h}_t^l$: Candidate hidden state of layer $l$ at time step $t$.
- $f_t^l$: Forget gate of layer $l$ at time step $t$.
- $W_h^l$: Weight matrix for candidate hidden state of layer $l$.
- $U_f^l$: Weight matrix for forget gate of layer $l$.
- $W_f^l$: Weight matrix for forget gate input of layer $l$.
- $b_f^l$: Bias for forget gate of layer $l$.
- $\sigma$: Activation function (sigmoid).
- $\tanh$: Activation function (hyperbolic tangent).
- $\odot$: Element-wise multiplication.
Algorithm 1: Computation of an L-layered Light Recurrent Unit (LRU).
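To make the layered computation concrete, the following is a minimal PyTorch sketch of an L-layered LRU under the single-gate update above. The class names (`LRUCell`, `StackedLRU`) and the split of gate weights into an input path `W_f` and a recurrent path `U_f` are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal PyTorch sketch of an L-layered LRU, assuming the single-gate update
# h_t = f_t * h_{t-1} + (1 - f_t) * candidate_t sketched above. Class names and
# the parameter split (W_h for the candidate, W_f / U_f / b_f for the gate) are
# illustrative assumptions, not the authors' reference implementation.
import torch
import torch.nn as nn


class LRUCell(nn.Module):
    """One LRU layer: a single forget gate blends the previous hidden state
    with a tanh candidate computed from the layer input."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W_h = nn.Linear(input_size, hidden_size, bias=False)  # candidate weights
        self.W_f = nn.Linear(input_size, hidden_size, bias=False)  # gate, input path
        self.U_f = nn.Linear(hidden_size, hidden_size, bias=True)  # gate, recurrent path (+ b_f)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        h_tilde = torch.tanh(self.W_h(x_t))                    # candidate hidden state
        f_t = torch.sigmoid(self.W_f(x_t) + self.U_f(h_prev))  # single forget gate
        return f_t * h_prev + (1.0 - f_t) * h_tilde            # keep vs. write, per component


class StackedLRU(nn.Module):
    """L stacked LRU layers; layer l consumes the hidden states of layer l-1."""

    def __init__(self, input_size: int, hidden_size: int, num_layers: int):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.cells = nn.ModuleList(
            [LRUCell(sizes[l], sizes[l + 1]) for l in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq_len, input_size) -> top-layer hidden states,
        shaped (batch, seq_len, hidden_size)."""
        batch, seq_len, _ = x.shape
        h = [x.new_zeros(batch, cell.U_f.out_features) for cell in self.cells]
        outputs = []
        for t in range(seq_len):
            layer_input = x[:, t]
            for l, cell in enumerate(self.cells):
                h[l] = cell(layer_input, h[l])
                layer_input = h[l]
            outputs.append(layer_input)
        return torch.stack(outputs, dim=1)
```

For example, `StackedLRU(input_size=1, hidden_size=128, num_layers=1)` would consume pixel-by-pixel MNIST sequences of length 784; the sizes here are placeholders, not the experimental settings of Section 4.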
3.4. Analysis
4. Experiments
4.1. The Adding Problem
4.1.1. Task Description
4.1.2. Setup
4.1.3. Results
4.2. MNIST Handwritten Digit Classification
4.2.1. Dataset
4.2.2. Setup
4.2.3. Results
4.2.4. Weight Visualization
4.3. Language Modeling
4.3.1. Dataset
4.3.2. Setup
4.3.3. Results
4.4. Ball Bearing Health Monitoring
4.4.1. Dataset
4.4.2. Setup
4.4.3. Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Andrianandrianina Johanesa, T.V.; Equeter, L.; Mahmoudi, S.A. Survey on AI Applications for Product Quality Control and Predictive Maintenance in Industry 4.0. Electronics 2024, 13, 976.
2. Xie, Z.; Du, S.; Lv, J.; Deng, Y.; Jia, S. A hybrid prognostics deep learning model for remaining useful life prediction. Electronics 2020, 10, 39.
3. Song, H.; Choi, H. Forecasting stock market indices using the recurrent neural network based hybrid models: Cnn-lstm, gru-cnn, and ensemble models. Appl. Sci. 2023, 13, 4644.
4. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
5. Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 1998, 6, 107–116.
6. Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies; Wiley-IEEE Press: New York, NY, USA, 2001.
7. Zhao, J.; Huang, F.; Lv, J.; Duan, Y.; Qin, Z.; Li, G.; Tian, G. Do RNN and LSTM have long memory? In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 11365–11375.
8. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318.
9. Landi, F.; Baraldi, L.; Cornia, M.; Cucchiara, R. Working memory connections for LSTM. Neural Netw. 2021, 144, 334–341.
10. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
11. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
12. Yadav, H.; Thakkar, A. NOA-LSTM: An efficient LSTM cell architecture for time series forecasting. Expert Syst. Appl. 2024, 238, 122333.
13. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
14. Zhang, J.; Xie, X.; Peng, G.; Liu, L.; Yang, H.; Guo, R.; Cao, J.; Yang, J. A Real-Time and Privacy-Preserving Facial Expression Recognition System Using an AI-Powered Microcontroller. Electronics 2024, 13, 2791.
15. Al-Nader, I.; Lasebae, A.; Raheem, R.; Khoshkholghi, A. A Novel Scheduling Algorithm for Improved Performance of Multi-Objective Safety-Critical Wireless Sensor Networks Using Long Short-Term Memory. Electronics 2023, 12, 4766.
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
17. Huang, C.; Tu, Y.; Han, Z.; Jiang, F.; Wu, F.; Jiang, Y. Examining the relationship between peer feedback classified by deep learning and online learning burnout. Comput. Educ. 2023, 207, 104910.
18. Zheng, W.; Gong, G.; Tian, J.; Lu, S.; Wang, R.; Yin, Z.; Yin, L. Design of a Modified Transformer Architecture Based on Relative Position Coding. Int. J. Comput. Intell. Syst. 2023, 16, 168.
19. Pirani, M.; Thakkar, P.; Jivrani, P.; Bohara, M.H.; Garg, D. A comparative analysis of ARIMA, GRU, LSTM and BiLSTM on financial time series forecasting. In Proceedings of the 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 23–24 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
20. Lindemann, B.; Maschler, B.; Sahlab, N.; Weyrich, M. A survey on anomaly detection for technical systems using LSTM networks. Comput. Ind. 2021, 131, 103498.
21. Al Hamoud, A.; Hoenig, A.; Roy, K. Sentence subjectivity analysis of a political and ideological debate dataset using LSTM and BiLSTM with attention and GRU models. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 7974–7987.
22. Le, Q.V.; Jaitly, N.; Hinton, G.E. A simple way to initialize recurrent networks of rectified linear units. arXiv 2015, arXiv:1504.00941.
23. Wang, J.; Li, X.; Li, J.; Sun, Q.; Wang, H. NGCU: A new RNN model for time-series data prediction. Big Data Res. 2022, 27, 100296.
24. Neyshabur, B.; Wu, Y.; Salakhutdinov, R.R.; Srebro, N. Path-normalized optimization of recurrent neural networks with relu activations. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3477–3485.
25. Arjovsky, M.; Shah, A.; Bengio, Y. Unitary evolution recurrent neural networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1120–1128.
26. Talathi, S.S.; Vartak, A. Improving performance of recurrent neural network with relu nonlinearity. arXiv 2015, arXiv:1511.03771.
27. Dhruv, P.; Naskar, S. Image classification using convolutional neural network (CNN) and recurrent neural network (RNN): A review. In Machine Learning and Information Processing: Proceedings of ICMLIP 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 367–381.
28. Mikolov, T.; Joulin, A.; Chopra, S.; Mathieu, M.; Ranzato, M. Learning longer memory in recurrent neural networks. arXiv 2014, arXiv:1412.7753.
29. Hu, Y.; Huber, A.; Anumula, J.; Liu, S.C. Overcoming the vanishing gradient problem in plain recurrent networks. arXiv 2018, arXiv:1801.06105.
30. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. In Proceedings of the 1999 Ninth International Conference on Artificial Neural Networks ICANN 99. (Conf. Publ. No. 470), Edinburgh, UK, 7–10 September 1999.
31. Ali, M.H.E.; Abdellah, A.R.; Atallah, H.A.; Ahmed, G.S.; Muthanna, A.; Koucheryavy, A. Deep Learning Peephole LSTM Neural Network-Based Channel State Estimators for OFDM 5G and Beyond Networks. Mathematics 2023, 11, 3386.
32. Jozefowicz, R.; Zaremba, W.; Sutskever, I. An empirical exploration of recurrent network architectures. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 2342–2350.
33. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 13 December 2014.
34. Zhou, G.B.; Wu, J.; Zhang, C.L.; Zhou, Z.H. Minimal gated unit for recurrent neural networks. Int. J. Autom. Comput. 2016, 13, 226–234.
35. Ravanelli, M.; Brakel, P.; Omologo, M.; Bengio, Y. Light gated recurrent units for speech recognition. IEEE Trans. Emerg. Top. Comput. Intell. 2018, 2, 92–102.
36. Khan, M.; Wang, H.; Riaz, A.; Elfatyany, A.; Karim, S. Bidirectional LSTM-RNN-based hybrid deep learning frameworks for univariate time series classification. J. Supercomput. 2021, 77, 7021–7045.
37. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232.
38. Oliva, J.B.; Póczos, B.; Schneider, J. The statistical recurrent unit. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, Sydney, Australia, 6–11 August 2017; pp. 2671–2680.
39. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Training very deep networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2377–2385.
40. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
42. Merity, S.; Keskar, N.S.; Socher, R. Regularizing and optimizing LSTM language models. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
43. Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent neural network regularization. arXiv 2014, arXiv:1409.2329.
44. Gal, Y.; Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 1019–1027.
45. Inan, H.; Khosravi, K.; Socher, R. Tying word vectors and word classifiers: A loss framework for language modeling. arXiv 2016, arXiv:1611.01462.
46. Zilly, J.G.; Srivastava, R.K.; Koutník, J.; Schmidhuber, J. Recurrent highway networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, Sydney, Australia, 6–11 August 2017; pp. 4189–4198.
47. Melis, G.; Dyer, C.; Blunsom, P. On the state of the art of evaluation in neural language models. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
48. Che, C.; Wang, H.; Xiong, M.; Ni, X. Few-shot fault diagnosis of rolling bearing under variable working conditions based on ensemble meta-learning. Digit. Signal Process. 2022, 131, 103777.
Model | MNIST (%) | pMNIST (%) |
---|---|---|
SRU [38] | 89.0 | - |
IRNN [22] | 95.0 | 82.0 |
IRNN (our implementation) | 96.1 | 88.9 |
uRNN [25] | 95.1 | 91.4 |
RIN [29] | 95.4 | 86.2 |
np-RNN [24] | 96.8 | - |
LSTM (our implementation) | 97.0 | 86.9 |
LRU (our implementation) | 98.5 | 91.5 |
Model | PTB #Params | PTB Val (ppl) | PTB Test (ppl) | WT-2 #Params | WT-2 Val (ppl) | WT-2 Test (ppl)
---|---|---|---|---|---|---
LSTM [32] | 20 M | 83.3 | 79.8 | - | - | - |
LSTM+regularization [43] | 20 M | 86.2 | 82.7 | - | - | - |
LSTM+regularization [43] | 66 M | 82.2 | 78.4 | - | - | - |
Variational LSTM [44] | 20 M | 81.8 | 79.7 | - | - | - |
Variational LSTM [44] | 66 M | 77.3 | 75.0 | - | - | - |
Variational LSTM+augmented loss [45] | 24 M | 75.7 | 73.2 | 28 M | 91.5 | 87.0 |
Variational LSTM+augmented loss [45] | 51 M | 71.7 | 68.5 | - | - | - |
Variational Recurrent Highway Network [46] | 23 M | 67.9 | 65.4 | - | - | - |
4-layer skip-connection LSTM [47] | 24 M | 60.9 | 58.3 | 24 M | 69.1 | 65.9 |
AWD-LSTM [42] | 24 M | 60.7 | 58.8 | 33 M | 69.1 | 66.0 |
LRU | 8 M | 60.5 | 58.2 | 8 M | 69.4 | 66.1 |
Fault Depth | Fault Type | Fault Abbreviations
---|---|---
N/A | Healthy bearing | N
0.007 inch | Inner race | IRF_007
0.007 inch | Ball | BF_007
0.007 inch | Outer race (Centered) | ORF1_007
0.007 inch | Outer race (Orthogonal) | ORF2_007
0.007 inch | Outer race (Opposite) | ORF3_007
0.014 inch | Inner race | IRF_0014
0.014 inch | Ball | BF_0014
0.014 inch | Outer race (Centered) | ORF_0014
0.021 inch | Inner race | IRF_0021
0.021 inch | Ball | BF_0021
0.021 inch | Outer race (Centered) | ORF1_0021
0.021 inch | Outer race (Orthogonal) | ORF2_0021
0.021 inch | Outer race (Opposite) | ORF3_0021
0.028 inch | Inner race | IRF_028
References | Models | Accuracy
---|---|---
[48] | CNN | 90.46%
[48] | MAML | 92.51%
[48] | Reptile | 92.63%
[48] | Reptile with GC | 93.48%
[48] | EML | 98.78%
Ours | LRU | 97.14%
(a) Learning Rate | Accuracy (%) | (b) Batch Size | Accuracy (%)
---|---|---|---
0.05 | 97.14 | 64 | 97.06
0.01 | 96.03 | 128 | 97.14
0.005 | 80.1 | 256 | 95.90