Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications
Abstract
1. Introduction
2. Related Works
3. Fundamentals of RNNs
3.1. Basic Architecture and Working Principle of Standard RNNs
3.2. Activation Functions
3.3. The Vanishing and Exploding Gradient Problems
3.4. Bidirectional RNNs
3.5. Deep RNNs
4. Advanced Variants of RNNs
4.1. Long Short-Term Memory Networks
4.2. Bidirectional LSTM
Stacked LSTM
4.3. Gated Recurrent Units
Comparison with LSTM
4.4. Other Notable Variants
4.4.1. Peephole LSTM
4.4.2. Echo State Networks
- Deep Echo-State Networks: Recent research has extended the ESN architecture to deeper variants, known as deep echo-state networks (DeepESNs). In DeepESNs, multiple reservoir layers are stacked, allowing the network to capture hierarchical temporal features across different timescales [87]. Each layer in a DeepESN processes the output from the previous layer’s reservoir, enabling the model to learn more abstract and complex representations of the input data. The state update for a DeepESN can be generalized as follows:
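$$
\mathbf{x}^{(l)}(t) = \left(1 - a^{(l)}\right)\mathbf{x}^{(l)}(t-1) + a^{(l)}\tanh\left(\mathbf{W}_{\mathrm{in}}^{(l)}\,\mathbf{u}^{(l)}(t) + \hat{\mathbf{W}}^{(l)}\,\mathbf{x}^{(l)}(t-1)\right),
$$
where, following the leaky-integrator formulation in [87], $\mathbf{x}^{(l)}(t)$ is the state of reservoir layer $l$ at time $t$, $a^{(l)} \in (0,1]$ is the layer's leaking rate, $\mathbf{W}_{\mathrm{in}}^{(l)}$ and $\hat{\mathbf{W}}^{(l)}$ are the layer's fixed input and recurrent weight matrices, and the driving input $\mathbf{u}^{(l)}(t)$ is the external signal for the first layer, $\mathbf{u}^{(1)}(t) = \mathbf{u}(t)$, and the previous layer's state for deeper layers, $\mathbf{u}^{(l)}(t) = \mathbf{x}^{(l-1)}(t)$ for $l > 1$. A minimal code sketch of this stacked update, combined with the ensembling discussed below, follows this list.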
- Ensemble Deep ESNs: In ensemble deep ESNs, multiple DeepESNs are trained independently, and their outputs are combined to form the final prediction [89]. This ensemble approach leverages the diversity of the reservoirs and the deep architecture to improve robustness and accuracy, particularly in time series forecasting applications. For instance, Gao et al. [90] demonstrated the effectiveness of DeepESN ensembles in predicting significant wave heights, where the ensemble approach helped mitigate the impact of reservoir initialization variability and improved the model’s generalization ability.
- Input Processing with Signal Decomposition: Another critical aspect of effectively utilizing RNNs and ESNs is the preprocessing of input signals. Given the complex and often noisy nature of real-world time series data, signal decomposition techniques such as the empirical wavelet transform (EWT) have been employed to enhance the input to ESNs [91]. The EWT decomposes the input signal into different frequency components, allowing the ESN to process each component separately and improve the model’s ability to capture underlying patterns. The combination of the EWT with ESNs has shown promising results in various applications, including time series forecasting, where it helps reduce noise and enhance the predictive performance of the model.
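To make the stacked and ensembled reservoir ideas above concrete, the following is a minimal NumPy sketch, not the exact models of [87,89,90]: each reservoir's weights are fixed at initialization, deeper layers are driven by the states of the layer below, only a linear ridge-regression readout is trained, and an ensemble is formed by averaging independently initialized networks. The reservoir size, leaking rate, spectral radius, and the toy sine-wave task are all illustrative assumptions.

```python
import numpy as np

def init_reservoir(rng, n_in, n_res, spectral_radius=0.9, input_scale=0.5):
    """One reservoir layer: random weights that stay fixed after initialization."""
    W_in = rng.uniform(-input_scale, input_scale, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # heuristic echo-state scaling
    return W_in, W

def deep_esn_states(u_seq, layers, leak=0.3):
    """Run stacked reservoirs; each deeper layer is driven by the states below it."""
    T = len(u_seq)
    drive, collected = u_seq, []
    for W_in, W in layers:
        x, xs = np.zeros(W.shape[0]), np.zeros((T, W.shape[0]))
        for t in range(T):
            x = (1 - leak) * x + leak * np.tanh(W_in @ drive[t] + W @ x)  # leaky-integrator update
            xs[t] = x
        collected.append(xs)
        drive = xs  # the next layer consumes this layer's state sequence
    return np.hstack(collected)  # concatenated states of all layers feed the readout

def fit_readout(X, y, ridge=1e-6):
    """Ridge regression: the only trained component of the network."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

# Toy task: one-step-ahead prediction of a sine wave, averaged over 5 networks
T = 500
u = np.sin(0.1 * np.arange(T))[:, None]
y = np.roll(u[:, 0], -1)  # next-step target (wrapped last sample is dropped below)
preds = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    layers = [init_reservoir(rng, 1, 100), init_reservoir(rng, 100, 100)]
    X = deep_esn_states(u, layers)
    preds.append(X @ fit_readout(X[:-1], y[:-1]))
ensemble_pred = np.mean(preds, axis=0)  # ensemble output = averaged member predictions
```

In the same spirit, the EWT-based pipeline in [91] would first decompose the raw input into frequency components and feed each component to its own reservoir (or network) before recombining the component forecasts.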
4.4.3. Independently Recurrent Neural Network
5. Innovations in RNN Architectures and Training Methodologies
5.1. Hybrid Architectures
5.2. Neural Architecture Search
5.3. Advanced Optimization Techniques
5.4. RNNs with Attention Mechanisms
5.5. RNNs Integrated with Transformer Models
6. Public Datasets for RNN Research
7. Applications of RNNs in Peer-Reviewed Literature
7.1. Natural Language Processing
7.1.1. Text Generation
7.1.2. Sentiment Analysis
7.1.3. Machine Translation
7.2. Speech Recognition
7.3. Time Series Forecasting
7.4. Signal Processing
7.5. Bioinformatics
7.6. Autonomous Vehicles
7.7. Anomaly Detection
8. Challenges and Future Research Directions
8.1. Scalability and Efficiency
8.2. Interpretability and Explainability
8.3. Bias and Fairness
8.4. Data Dependency and Quality
8.5. Overfitting and Generalization
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
AI | Artificial intelligence |
ANN | Artificial neural network |
BiLSTM | Bidirectional long short-term memory |
CNN | Convolutional neural network |
DL | Deep learning |
GRU | Gated recurrent unit |
LSTM | Long short-term memory |
ML | Machine learning |
NAS | Neural architecture search |
NLP | Natural language processing |
RL | Reinforcement learning |
RNN | Recurrent neural network |
SHAP | Shapley additive explanations |
TPU | Tensor processing unit |
VAE | Variational autoencoder |
References
- O’Halloran, T.; Obaido, G.; Otegbade, B.; Mienye, I.D. A deep learning approach for Maize Lethal Necrosis and Maize Streak Virus disease detection. Mach. Learn. Appl. 2024, 16, 100556. [Google Scholar] [CrossRef]
- Peng, Y.; He, L.; Hu, D.; Liu, Y.; Yang, L.; Shang, S. Decoupling Deep Learning for Enhanced Image Recognition Interpretability. ACM Trans. Multimed. Comput. Commun. Appl. 2024. [Google Scholar] [CrossRef]
- Khan, W.; Daud, A.; Khan, K.; Muhammad, S.; Haq, R. Exploring the frontiers of deep learning and natural language processing: A comprehensive overview of key challenges and emerging trends. Nat. Lang. Process. J. 2023, 4, 100026. [Google Scholar] [CrossRef]
- Obaido, G.; Achilonu, O.; Ogbuokiri, B.; Amadi, C.S.; Habeebullahi, L.; Ohalloran, T.; Chukwu, C.W.; Mienye, E.; Aliyu, M.; Fasawe, O.; et al. An Improved Framework for Detecting Thyroid Disease Using Filter-Based Feature Selection and Stacking Ensemble. IEEE Access 2024, 12, 89098–89112. [Google Scholar] [CrossRef]
- Mienye, I.D.; Obaido, G.; Aruleba, K.; Dada, O.A. Enhanced Prediction of Chronic Kidney Disease using Feature Selection and Boosted Classifiers. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Virtual, 13–15 December 2021; pp. 527–537. [Google Scholar]
- Al-Jumaili, A.H.A.; Muniyandi, R.C.; Hasan, M.K.; Paw, J.K.S.; Singh, M.J. Big data analytics using cloud computing based frameworks for power management systems: Status, constraints, and future recommendations. Sensors 2023, 23, 2952. [Google Scholar] [CrossRef]
- Gill, S.S.; Wu, H.; Patros, P.; Ottaviani, C.; Arora, P.; Pujol, V.C.; Haunschild, D.; Parlikad, A.K.; Cetinkaya, O.; Lutfiyya, H.; et al. Modern computing: Vision and challenges. Telemat. Inform. Rep. 2024, 13, 100116. [Google Scholar] [CrossRef]
- Mienye, I.D.; Jere, N. A Survey of Decision Trees: Concepts, Algorithms, and Applications. IEEE Access 2024, 12, 86716–86727. [Google Scholar] [CrossRef]
- Aruleba, R.T.; Adekiya, T.A.; Ayawei, N.; Obaido, G.; Aruleba, K.; Mienye, I.D.; Aruleba, I.; Ogbuokiri, B. COVID-19 diagnosis: A review of rapid antigen, RT-PCR and artificial intelligence methods. Bioengineering 2022, 9, 153. [Google Scholar] [CrossRef]
- Alhajeri, M.S.; Ren, Y.M.; Ou, F.; Abdullah, F.; Christofides, P.D. Model predictive control of nonlinear processes using transfer learning-based recurrent neural networks. Chem. Eng. Res. Des. 2024, 205, 1–12. [Google Scholar] [CrossRef]
- Shahinzadeh, H.; Mahmoudi, A.; Asilian, A.; Sadrarhami, H.; Hemmati, M.; Saberi, Y. Deep Learning: A Overview of Theory and Architectures. In Proceedings of the 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), Babol, Iran, 21–22 February 2024; pp. 1–11. [Google Scholar]
- Baruah, R.D.; Organero, M.M. Explicit Context Integrated Recurrent Neural Network for applications in smart environments. Expert Syst. Appl. 2024, 255, 124752. [Google Scholar] [CrossRef]
- Werbos, P. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560. [Google Scholar] [CrossRef]
- Lalapura, V.S.; Amudha, J.; Satheesh, H.S. Recurrent neural networks for edge intelligence: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Liu, F.; Li, J.; Wang, L. PI-LSTM: Physics-informed long short-term memory network for structural response modeling. Eng. Struct. 2023, 292, 116500. [Google Scholar] [CrossRef]
- Ni, Q.; Ji, J.; Feng, K.; Zhang, Y.; Lin, D.; Zheng, J. Data-driven bearing health management using a novel multi-scale fused feature and gated recurrent unit. Reliab. Eng. Syst. Saf. 2024, 242, 109753. [Google Scholar] [CrossRef]
- Niu, Z.; Zhong, G.; Yue, G.; Wang, L.N.; Yu, H.; Ling, X.; Dong, J. Recurrent attention unit: A new gated recurrent unit for long-term memory of important parts in sequential data. Neurocomputing 2023, 517, 1–9. [Google Scholar] [CrossRef]
- Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv 2015, arXiv:1506.00019. [Google Scholar]
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef]
- Tarwani, K.M.; Edem, S. Survey on recurrent neural network in natural language processing. Int. J. Eng. Trends Technol. 2017, 48, 301–304. [Google Scholar] [CrossRef]
- Tsoi, A.C.; Back, A.D. Locally recurrent globally feedforward networks: A critical review of architectures. IEEE Trans. Neural Netw. 1994, 5, 229–239. [Google Scholar] [CrossRef] [PubMed]
- Mastorocostas, P.A.; Theocharis, J.B. A stable learning algorithm for block-diagonal recurrent neural networks: Application to the analysis of lung sounds. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2006, 36, 242–254. [Google Scholar] [CrossRef] [PubMed]
- Dutta, K.K.; Poornima, S.; Sharma, R.; Nair, D.; Ploeger, P.G. Applications of Recurrent Neural Network: Overview and Case Studies. In Recurrent Neural Networks; CRC Press: Boca Raton, FL, USA, 2022; pp. 23–41. [Google Scholar]
- Quradaa, F.H.; Shahzad, S.; Almoqbily, R.S. A systematic literature review on the applications of recurrent neural networks in code clone research. PLoS ONE 2024, 19, e0296858. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232. [Google Scholar] [CrossRef] [PubMed]
- Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From applications to modeling techniques and beyond—Systematic review. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102068. [Google Scholar] [CrossRef]
- Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent neural network regularization. arXiv 2014, arXiv:1409.2329. [Google Scholar]
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
- Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
- Badawy, M.; Ramadan, N.; Hefny, H.A. Healthcare predictive analytics using machine learning and deep learning techniques: A survey. J. Electr. Syst. Inf. Technol. 2023, 10, 40. [Google Scholar] [CrossRef]
- Ismaeel, A.G.; Janardhanan, K.; Sankar, M.; Natarajan, Y.; Mahmood, S.N.; Alani, S.; Shather, A.H. Traffic pattern classification in smart cities using deep recurrent neural network. Sustainability 2023, 15, 14522. [Google Scholar] [CrossRef]
- Mers, M.; Yang, Z.; Hsieh, Y.A.; Tsai, Y. Recurrent neural networks for pavement performance forecasting: Review and model performance comparison. Transp. Res. Rec. 2023, 2677, 610–624. [Google Scholar] [CrossRef]
- Chen, Y.; Cheng, Q.; Cheng, Y.; Yang, H.; Yu, H. Applications of recurrent neural networks in environmental factor forecasting: A review. Neural Comput. 2018, 30, 2855–2881. [Google Scholar] [CrossRef] [PubMed]
- Linardos, V.; Drakaki, M.; Tzionas, P.; Karnavas, Y.L. Machine learning in disaster management: Recent developments in methods and applications. Mach. Learn. Knowl. Extr. 2022, 4, 446–473. [Google Scholar] [CrossRef]
- Zhang, J.; Liu, H.; Chang, Q.; Wang, L.; Gao, R.X. Recurrent neural network for motion trajectory prediction in human-robot collaborative assembly. CIRP Ann. 2020, 69, 9–12. [Google Scholar] [CrossRef]
- Tsantekidis, A.; Passalis, N.; Tefas, A. Recurrent Neural Networks. In Deep Learning for Robot Perception and Cognition; Elsevier: Amsterdam, The Netherlands, 2022; pp. 101–115. [Google Scholar]
- Mienye, I.D.; Jere, N. Deep Learning for Credit Card Fraud Detection: A Review of Algorithms, Challenges, and Solutions. IEEE Access 2024, 12, 96893–96910. [Google Scholar] [CrossRef]
- Mienye, I.D.; Sun, Y. A machine learning method with hybrid feature selection for improved credit card fraud detection. Appl. Sci. 2023, 13, 7254. [Google Scholar] [CrossRef]
- Rezk, N.M.; Purnaprajna, M.; Nordström, T.; Ul-Abdin, Z. Recurrent neural networks: An embedded computing perspective. IEEE Access 2020, 8, 57967–57996. [Google Scholar] [CrossRef]
- Yu, Y.; Adu, K.; Tashi, N.; Anokye, P.; Wang, X.; Ayidzoe, M.A. Rmaf: Relu-memristor-like activation function for deep learning. IEEE Access 2020, 8, 72727–72741. [Google Scholar] [CrossRef]
- Mienye, I.D.; Ainah, P.K.; Emmanuel, I.D.; Esenogho, E. Sparse Noise Minimization in Image Classification using Genetic Algorithm and DenseNet. In Proceedings of the 2021 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 10–11 March 2021; pp. 103–108. [Google Scholar]
- Ciaburro, G.; Venkateswaran, B. Neural Networks with R: SMART Models Using CNN, RNN, Deep Learning, and Artificial Intelligence Principles; Packt Publishing Ltd.: Birmingham, UK, 2017. [Google Scholar]
- Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation functions: Comparison of trends in practice and research for deep learning. arXiv 2018, arXiv:1811.03378. [Google Scholar]
- Szandała, T. Review and comparison of commonly used activation functions for deep neural networks. In Bio-Inspired Neurocomputing; Springer: Singapore, 2021; pp. 203–224. [Google Scholar]
- Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015, arXiv:1511.07289. [Google Scholar]
- Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
- Obaido, G.; Mienye, I.D.; Egbelowo, O.F.; Emmanuel, I.D.; Ogunleye, A.; Ogbuokiri, B.; Mienye, P.; Aruleba, K. Supervised machine learning in drug discovery and development: Algorithms, applications, challenges, and prospects. Mach. Learn. Appl. 2024, 17, 100576. [Google Scholar] [CrossRef]
- Mienye, I.D.; Sun, Y. Effective Feature Selection for Improved Prediction of Heart Disease. In Proceedings of the Pan-African Artificial Intelligence and Smart Systems Conference, Durban, South Africa, 4–6 December 2021; pp. 94–107. [Google Scholar]
- Martins, A.; Astudillo, R. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1614–1623. [Google Scholar]
- Bianchi, F.M.; Maiorino, E.; Kampffmeyer, M.C.; Rizzi, A.; Jenssen, R. Properties and Training in Recurrent Neural Networks. In Recurrent Neural Networks for Short-Term Load Forecasting: An Overview and Comparative Analysis; Springer: Berlin/Heidelberg, Germany, 2017; pp. 9–21. [Google Scholar]
- Mohajerin, N.; Waslander, S.L. State Initialization for Recurrent Neural Network Modeling of Time-Series Data. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2330–2337. [Google Scholar]
- Forgione, M.; Muni, A.; Piga, D.; Gallieri, M. On the adaptation of recurrent neural networks for system identification. Automatica 2023, 155, 111092. [Google Scholar] [CrossRef]
- Zhang, J.; He, T.; Sra, S.; Jadbabaie, A. Why gradient clipping accelerates training: A theoretical justification for adaptivity. arXiv 2019, arXiv:1905.11881. [Google Scholar]
- Qian, J.; Wu, Y.; Zhuang, B.; Wang, S.; Xiao, J. Understanding Gradient Clipping in Incremental Gradient Methods. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 1504–1512. [Google Scholar]
- Fei, H.; Tan, F. Bidirectional grid long short-term memory (bigridlstm): A method to address context-sensitivity and vanishing gradient. Algorithms 2018, 11, 172. [Google Scholar] [CrossRef]
- Dong, X.; Chowdhury, S.; Qian, L.; Li, X.; Guan, Y.; Yang, J.; Yu, Q. Deep learning for named entity recognition on Chinese electronic medical records: Combining deep transfer learning with multitask bi-directional LSTM RNN. PLoS ONE 2019, 14, e0216046. [Google Scholar] [CrossRef] [PubMed]
- Chorowski, J.K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
- Zhou, M.; Duan, N.; Liu, S.; Shum, H.Y. Progress in neural NLP: Modeling, learning, and reasoning. Engineering 2020, 6, 275–290. [Google Scholar] [CrossRef]
- Naseem, U.; Razzak, I.; Khan, S.K.; Prasad, M. A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–35. [Google Scholar] [CrossRef]
- Adil, M.; Wu, J.Z.; Chakrabortty, R.K.; Alahmadi, A.; Ansari, M.F.; Ryan, M.J. Attention-based STL-BiLSTM network to forecast tourist arrival. Processes 2021, 9, 1759. [Google Scholar] [CrossRef]
- Min, S.; Park, S.; Kim, S.; Choi, H.S.; Lee, B.; Yoon, S. Pre-training of deep bidirectional protein sequence representations with structural information. IEEE Access 2021, 9, 123912–123926. [Google Scholar] [CrossRef]
- Jain, A.; Zamir, A.R.; Savarese, S.; Saxena, A. Structural-rnn: Deep Learning on Spatio-Temporal Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5308–5317. [Google Scholar]
- Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to construct deep recurrent neural networks. arXiv 2013, arXiv:1312.6026. [Google Scholar]
- Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Trans. Smart Grid 2017, 9, 5271–5280. [Google Scholar] [CrossRef]
- Gal, Y.; Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Moradi, R.; Berangi, R.; Minaei, B. A survey of regularization strategies for deep models. Artif. Intell. Rev. 2020, 53, 3947–3986. [Google Scholar] [CrossRef]
- Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics 2023, 12, 3106. [Google Scholar] [CrossRef]
- Cai, S.; Shu, Y.; Chen, G.; Ooi, B.C.; Wang, W.; Zhang, M. Effective and efficient dropout for deep convolutional neural networks. arXiv 2019, arXiv:1904.03392. [Google Scholar]
- Garbin, C.; Zhu, X.; Marques, O. Dropout vs. batch normalization: An empirical study of their impact to deep learning. Multimed. Tools Appl. 2020, 79, 12777–12815. [Google Scholar] [CrossRef]
- Borawar, L.; Kaur, R. ResNet: Solving Vanishing Gradient in Deep Networks. In Proceedings of the International Conference on Recent Trends in Computing: ICRTC 2022, Delhi, India, 3–4 June 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 235–247. [Google Scholar]
- Mienye, I.D.; Sun, Y. A deep learning ensemble with data resampling for credit card fraud detection. IEEE Access 2023, 11, 30628–30638. [Google Scholar] [CrossRef]
- Kiperwasser, E.; Goldberg, Y. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Trans. Assoc. Comput. Linguist. 2016, 4, 313–327. [Google Scholar] [CrossRef]
- Zhang, W.; Li, H.; Tang, L.; Gu, X.; Wang, L.; Wang, L. Displacement prediction of Jiuxianping landslide using gated recurrent unit (GRU) networks. Acta Geotech. 2022, 17, 1367–1382. [Google Scholar] [CrossRef]
- Cahuantzi, R.; Chen, X.; Güttel, S. A Comparison of LSTM and GRU Networks for Learning Symbolic Sequences. In Proceedings of the Science and Information Conference, Nanchang, China, 2–4 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 771–785. [Google Scholar]
- Shewalkar, A.; Nyavanandi, D.; Ludwig, S.A. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU. J. Artif. Intell. Soft Comput. Res. 2019, 9, 235–245. [Google Scholar] [CrossRef]
- Vatanchi, S.M.; Etemadfard, H.; Maghrebi, M.F.; Shad, R. A comparative study on forecasting of long-term daily streamflow using ANN, ANFIS, BiLSTM and CNN-GRU-LSTM. Water Resour. Manag. 2023, 37, 4769–4785. [Google Scholar] [CrossRef]
- Mateus, B.C.; Mendes, M.; Farinha, J.T.; Assis, R.; Cardoso, A.M. Comparing LSTM and GRU models to predict the condition of a pulp paper press. Energies 2021, 14, 6958. [Google Scholar] [CrossRef]
- Gers, F.A.; Schmidhuber, J. Recurrent Nets That Time and Count. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, IJCNN 2000, Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 24–27 July 2000; Volume 3, pp. 189–194. [Google Scholar]
- Gers, F.A.; Schraudolph, N.N.; Schmidhuber, J. Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 2002, 3, 115–143. [Google Scholar]
- Jaeger, H. Adaptive nonlinear system identification with echo state networks. Adv. Neural Inf. Process. Syst. 2002, 15, 593–600. [Google Scholar]
- Ishaq, M.; Kwon, S. A CNN-assisted deep echo state network using multiple time-scale dynamic learning reservoirs for generating short-term solar energy forecasting. Sustain. Energy Technol. Assessments 2022, 52, 102275. [Google Scholar]
- Sun, C.; Song, M.; Cai, D.; Zhang, B.; Hong, S.; Li, H. A systematic review of echo state networks from design to application. IEEE Trans. Artif. Intell. 2022, 5, 23–37. [Google Scholar] [CrossRef]
- Gallicchio, C.; Micheli, A. Deep echo state network (deepesn): A brief survey. arXiv 2017, arXiv:1712.04323. [Google Scholar]
- Gallicchio, C.; Micheli, A. Richness of Deep Echo State Network Dynamics. In Proceedings of the Advances in Computational Intelligence: 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, Gran Canaria, Spain, 12–14 June 2019, Proceedings, Part I 15; Springer: Berlin/Heidelberg, Germany, 2019; pp. 480–491. [Google Scholar]
- Hu, R.; Tang, Z.R.; Song, X.; Luo, J.; Wu, E.Q.; Chang, S. Ensemble echo network with deep architecture for time-series modeling. Neural Comput. Appl. 2021, 33, 4997–5010. [Google Scholar] [CrossRef]
- Gao, R.; Li, R.; Hu, M.; Suganthan, P.N.; Yuen, K.F. Dynamic ensemble deep echo state network for significant wave height forecasting. Appl. Energy 2023, 329, 120261. [Google Scholar] [CrossRef]
- Gao, R.; Du, L.; Duru, O.; Yuen, K.F. Time series forecasting based on echo state network and empirical wavelet transformation. Appl. Soft Comput. 2021, 102, 107111. [Google Scholar] [CrossRef]
- Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently Recurrent Neural Network (IndRNN): Building a Longer and Deeper RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5457–5466. [Google Scholar]
- Yang, J.; Qu, J.; Mi, Q.; Li, Q. A CNN-LSTM model for tailings dam risk prediction. IEEE Access 2020, 8, 206491–206502. [Google Scholar] [CrossRef]
- Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A comprehensive survey of neural architecture search: Challenges and solutions. ACM Comput. Surv. (CSUR) 2021, 54, 1–34. [Google Scholar] [CrossRef]
- Mellor, J.; Turner, J.; Storkey, A.; Crowley, E.J. Neural Architecture Search without Training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 7588–7598. [Google Scholar]
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578. [Google Scholar]
- Chen, X.; Wu, S.Z.; Hong, M. Understanding gradient clipping in private sgd: A geometric perspective. Adv. Neural Inf. Process. Syst. 2020, 33, 13773–13782. [Google Scholar]
- Zhang, Z. Improved Adam Optimizer for Deep Neural Networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [Google Scholar]
- De Santana Correia, A.; Colombini, E.L. Attention, please! A survey of neural attention models in deep learning. Artif. Intell. Rev. 2022, 55, 6037–6124. [Google Scholar] [CrossRef]
- Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818. [Google Scholar] [CrossRef]
- Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An attentive survey of attention models. ACM Trans. Intell. Syst. Technol. (TIST) 2021, 12, 1–32. [Google Scholar] [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Marcus, M.P.; Marcinkiewicz, M.A.; Santorini, B. Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 1993, 19, 313–330. [Google Scholar]
- Maas, A.L.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Garofolo, J.S.; Lamel, L.F.; Fisher, W.M.; Fiscus, J.G.; Pallett, D.S. TIMIT acoustic-phonetic continuous speech corpus. Linguist. Data Consort. 1993, 93, 27403. [Google Scholar]
- Lewis, D. Reuters-21578 Text Categorization Test Collection; Distribution 1.0; AT&T Labs-Research: Atlanta, GA, USA, 1997. [Google Scholar]
- Dua, D.; Graff, C. UCI Machine Learning Repository; School of Information and Computer Science, University of California: Irvine, CA, USA, 2017. [Google Scholar]
- Lomonaco, V.; Maltoni, D. Core50: A New Dataset and Benchmark for Continuous Object Recognition. In Proceedings of the Conference on Robot Learning. PMLR, Mountain View, CA, USA, 13–15 November 2017; pp. 17–26. [Google Scholar]
- Souri, A.; El Maazouzi, Z.; Al Achhab, M.; El Mohajir, B.E. Arabic Text Generation using Recurrent Neural Networks. In Proceedings of the Big Data, Cloud and Applications: Third International Conference, BDCA 2018, Kenitra, Morocco, 4–5 April 2018; Revised Selected Papers 3; Springer: Berlin/Heidelberg, Germany, 2018; pp. 523–533. [Google Scholar]
- Islam, M.S.; Mousumi, S.S.S.; Abujar, S.; Hossain, S.A. Sequence-to-sequence Bangla sentence generation with LSTM recurrent neural networks. Procedia Comput. Sci. 2019, 152, 51–58. [Google Scholar] [CrossRef]
- Gajendran, S.; Manjula, D.; Sugumaran, V. Character level and word level embedding with bidirectional LSTM–Dynamic recurrent neural network for biomedical named entity recognition from literature. J. Biomed. Inform. 2020, 112, 103609. [Google Scholar] [CrossRef]
- Hu, H.; Liao, M.; Mao, W.; Liu, W.; Zhang, C.; Jing, Y. Variational Auto-Encoder for Text Generation. In Proceedings of the 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 12–14 June 2020; pp. 595–598. [Google Scholar]
- Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The curious case of neural text degeneration. arXiv 2019, arXiv:1904.09751. [Google Scholar]
- Yin, W.; Schütze, H. Attentive convolution: Equipping CNNs with RNN-style attention mechanisms. Trans. Assoc. Comput. Linguist. 2018, 6, 687–702. [Google Scholar] [CrossRef]
- Hussein, M.A.H.; Savaş, S. LSTM-Based Text Generation: A Study on Historical Datasets. arXiv 2024, arXiv:2403.07087. [Google Scholar]
- Baskaran, S.; Alagarsamy, S.; S, S.; Shivam, S. Text Generation using Long Short-Term Memory. In Proceedings of the 2024 Third International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Krishnankoil, India, 14–16 March 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Keskar, N.S.; McCann, B.; Varshney, L.R.; Xiong, C.; Socher, R. Ctrl: A conditional transformer language model for controllable generation. arXiv 2019, arXiv:1909.05858. [Google Scholar]
- Guo, H. Generating text with deep reinforcement learning. arXiv 2015, arXiv:1510.09202. [Google Scholar]
- Yadav, V.; Verma, P.; Katiyar, V. Long short term memory (LSTM) model for sentiment analysis in social data for e-commerce products reviews in Hindi languages. Int. J. Inf. Technol. 2023, 15, 759–772. [Google Scholar] [CrossRef]
- Abimbola, B.; de La Cal Marin, E.; Tan, Q. Enhancing Legal Sentiment Analysis: A Convolutional Neural Network–Long Short-Term Memory Document-Level Model. Mach. Learn. Knowl. Extr. 2024, 6, 877–897. [Google Scholar] [CrossRef]
- Zulqarnain, M.; Ghazali, R.; Aamir, M.; Hassim, Y.M.M. An efficient two-state GRU based on feature attention mechanism for sentiment analysis. Multimed. Tools Appl. 2024, 83, 3085–3110. [Google Scholar] [CrossRef]
- Pujari, P.; Padalia, A.; Shah, T.; Devadkar, K. Hybrid CNN and RNN for Twitter Sentiment Analysis. In Proceedings of the International Conference on Smart Computing and Communication; Springer: Berlin/Heidelberg, Germany, 2024; pp. 297–310. [Google Scholar]
- Wankhade, M.; Annavarapu, C.S.R.; Abraham, A. CBMAFM: CNN-BiLSTM multi-attention fusion mechanism for sentiment classification. Multimed. Tools Appl. 2024, 83, 51755–51786. [Google Scholar] [CrossRef]
- Sangeetha, J.; Kumaran, U. A hybrid optimization algorithm using BiLSTM structure for sentiment analysis. Meas. Sensors 2023, 25, 100619. [Google Scholar] [CrossRef]
- He, R.; McAuley, J. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517. [Google Scholar]
- Samir, A.; Elkaffas, S.M.; Madbouly, M.M. Twitter Sentiment Analysis using BERT. In Proceedings of the 2021 31st International Conference on Computer Theory and Applications (ICCTA), Kochi, Kerala, India, 17–19 August 2021; pp. 182–186. [Google Scholar]
- Prottasha, N.J.; Sami, A.A.; Kowsher, M.; Murad, S.A.; Bairagi, A.K.; Masud, M.; Baz, M. Transfer learning for sentiment analysis using BERT based supervised fine-tuning. Sensors 2022, 22, 4157. [Google Scholar] [CrossRef]
- Mujahid, M.; Rustam, F.; Shafique, R.; Chunduri, V.; Villar, M.G.; Ballester, J.B.; Diez, I.d.l.T.; Ashraf, I. Analyzing sentiments regarding ChatGPT using novel BERT: A machine learning approach. Information 2023, 14, 474. [Google Scholar] [CrossRef]
- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909. [Google Scholar]
- Kang, L.; He, S.; Wang, M.; Long, F.; Su, J. Bilingual attention based neural machine translation. Appl. Intell. 2023, 53, 4302–4315. [Google Scholar] [CrossRef]
- Yang, Z.; Dai, Z.; Salakhutdinov, R.; Cohen, W.W. Breaking the softmax bottleneck: A high-rank RNN language model. arXiv 2017, arXiv:1711.03953. [Google Scholar]
- Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. Mass: Masked sequence to sequence pre-training for language generation. arXiv 2019, arXiv:1905.02450. [Google Scholar]
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
- Hannun, A.; Case, C.; Casper, J.; Catanzaro, B.; Diamos, G.; Elsen, E.; Prenger, R.; Satheesh, S.; Sengupta, S.; Coates, A.; et al. Deep speech: Scaling up end-to-end speech recognition. arXiv 2014, arXiv:1412.5567. [Google Scholar]
- Amodei, D.; Ananthanarayanan, S.; Anubhai, R.; Bai, J.; Battenberg, E.; Case, C.; Casper, J.; Catanzaro, B.; Cheng, Q.; Chen, G.; et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 173–182. [Google Scholar]
- Chiu, C.C.; Sainath, T.N.; Wu, Y.; Prabhavalkar, R.; Nguyen, P.; Chen, Z.; Kannan, A.; Weiss, R.J.; Rao, K.; Gonina, E.; et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 4774–4778. [Google Scholar]
- Zhang, Y.; Chan, W.; Jaitly, N. Very Deep Convolutional Networks for End-to-End Speech Recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 4845–4849. [Google Scholar]
- Dong, L.; Xu, S.; Xu, B. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5884–5888. [Google Scholar]
- Bhaskar, S.; Thasleema, T. LSTM model for visual speech recognition through facial expressions. Multimed. Tools Appl. 2023, 82, 5455–5472. [Google Scholar] [CrossRef]
- Daouad, M.; Allah, F.A.; Dadi, E.W. An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture. Int. J. Speech Technol. 2023, 26, 775–787. [Google Scholar]
- Dhanjal, A.S.; Singh, W. A comprehensive survey on automatic speech recognition using neural networks. Multimed. Tools Appl. 2024, 83, 23367–23412. [Google Scholar] [CrossRef]
- Nasr, S.; Duwairi, R.; Quwaider, M. End-to-end speech recognition for arabic dialects. Arab. J. Sci. Eng. 2023, 48, 10617–10633. [Google Scholar] [CrossRef]
- Kumar, D.; Aziz, S. Performance Evaluation of Recurrent Neural Networks-LSTM and GRU for Automatic Speech Recognition. In Proceedings of the 2023 International Conference on Computer, Electronics & Electrical Engineering & Their Applications (IC2E3), Srinagar Garhwal, India, 8–9 June 2023; pp. 1–6. [Google Scholar]
- Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef]
- Nelson, D.M.; Pereira, A.C.; De Oliveira, R.A. Stock Market’s Price Movement Prediction with LSTM Neural Networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1419–1426. [Google Scholar]
- Luo, A.; Zhong, L.; Wang, J.; Wang, Y.; Li, S.; Tai, W. Short-term stock correlation forecasting based on CNN-BiLSTM enhanced by attention mechanism. IEEE Access 2024, 12, 29617–29632. [Google Scholar] [CrossRef]
- Bao, W.; Yue, J.; Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 2017, 12, e0180944. [Google Scholar] [CrossRef] [PubMed]
- Feng, F.; Chen, H.; He, X.; Ding, J.; Sun, M.; Chua, T.S. Enhancing Stock Movement Prediction with Adversarial Training. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; Volume 19, pp. 5843–5849. [Google Scholar]
- Rundo, F. Deep LSTM with reinforcement learning layer for financial trend prediction in FX high frequency trading systems. Appl. Sci. 2019, 9, 4460. [Google Scholar] [CrossRef]
- Devi, T.; Deepa, N.; Gayathri, N.; Rakesh Kumar, S. AI-Based Weather Forecasting System for Smart Agriculture System Using a Recurrent Neural Networks (RNN) Algorithm. Sustain. Manag. Electron. Waste 2024, 97–112. [Google Scholar]
- Anshuka, A.; Chandra, R.; Buzacott, A.J.; Sanderson, D.; van Ogtrop, F.F. Spatio temporal hydrological extreme forecasting framework using LSTM deep learning model. Stoch. Environ. Res. Risk Assess. 2022, 36, 3467–3485. [Google Scholar] [CrossRef]
- Marulanda, G.; Cifuentes, J.; Bello, A.; Reneses, J. A hybrid model based on LSTM neural networks with attention mechanism for short-term wind power forecasting. Wind. Eng. 2023, 0309524X231191163. [Google Scholar] [CrossRef]
- Chen, W.; An, N.; Jiang, M.; Jia, L. An improved deep temporal convolutional network for new energy stock index prediction. Inf. Sci. 2024, 682, 121244. [Google Scholar] [CrossRef]
- Hasanat, S.M.; Younis, R.; Alahmari, S.; Ejaz, M.T.; Haris, M.; Yousaf, H.; Watara, S.; Ullah, K.; Ullah, Z. Enhancing Load Forecasting Accuracy in Smart Grids: A Novel Parallel Multichannel Network Approach Using 1D CNN and Bi-LSTM Models. Int. J. Energy Res. 2024, 2024, 2403847. [Google Scholar] [CrossRef]
- Asiri, M.M.; Aldehim, G.; Alotaibi, F.; Alnfiai, M.M.; Assiri, M.; Mahmud, A. Short-term load forecasting in smart grids using hybrid deep learning. IEEE Access 2024, 12, 23504–23513. [Google Scholar] [CrossRef]
- Yıldız Doğan, G.; Aksoy, A.; Öztürk, N. A Hybrid Deep Learning Model to Estimate the Future Electricity Demand of Sustainable Cities. Sustainability 2024, 16, 6503. [Google Scholar] [CrossRef]
- Bhambu, A.; Gao, R.; Suganthan, P.N. Recurrent ensemble random vector functional link neural network for financial time series forecasting. Appl. Soft Comput. 2024, 161, 111759. [Google Scholar] [CrossRef]
- Mienye, E.; Jere, N.; Obaido, G.; Mienye, I.D.; Aruleba, K. Deep Learning in Finance: A Survey of Applications and Techniques. Preprints 2024. [Google Scholar] [CrossRef]
- Mastoi, Q.U.A.; Wah, T.Y.; Gopal Raj, R. Reservoir computing based echo state networks for ventricular heart beat classification. Appl. Sci. 2019, 9, 702. [Google Scholar] [CrossRef]
- Valin, J.M.; Tenneti, S.; Helwani, K.; Isik, U.; Krishnaswamy, A. Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based on Percepnet. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 7133–7137. [Google Scholar]
- Li, Y.; Huang, C.; Ding, L.; Li, Z.; Pan, Y.; Gao, X. Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019, 166, 4–21. [Google Scholar] [CrossRef]
- Zhang, Y.; Qiao, S.; Ji, S.; Li, Y. DeepSite: Bidirectional LSTM and CNN models for predicting DNA–protein binding. Int. J. Mach. Learn. Cybern. 2020, 11, 841–851. [Google Scholar] [CrossRef]
- Xu, J.; Mcpartlon, M.; Li, J. Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat. Mach. Intell. 2021, 3, 601–609. [Google Scholar] [CrossRef]
- Yadav, S.; Ekbal, A.; Saha, S.; Kumar, A.; Bhattacharyya, P. Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein–protein interaction. Knowl.-Based Syst. 2019, 166, 18–29. [Google Scholar] [CrossRef]
- Aybey, E.; Gümüş, Ö. SENSDeep: An ensemble deep learning method for protein–protein interaction sites prediction. Interdiscip. Sci. Comput. Life Sci. 2023, 15, 55–87. [Google Scholar] [CrossRef] [PubMed]
- Li, Z.; Du, X.; Cao, Y. DAT-RNN: Trajectory Prediction with Diverse Attention. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; pp. 1512–1518. [Google Scholar]
- Lee, M.J.; Ha, Y.G. Autonomous Driving Control Using End-to-End Deep Learning. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Republic of Korea, 19–22 February 2020; pp. 470–473. [Google Scholar] [CrossRef]
- Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-End Driving via Conditional Imitation Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 4693–4700. [Google Scholar]
- Altché, F.; de La Fortelle, A. An LSTM Network for Highway Trajectory Prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Abu Dhabi, United Arab Emirates, 25–28 October 2017; pp. 353–359. [Google Scholar]
- Li, P.; Zhang, Y.; Yuan, L.; Xiao, H.; Lin, B.; Xu, X. Efficient long-short temporal attention network for unsupervised video object segmentation. Pattern Recognit. 2024, 146, 110078. [Google Scholar] [CrossRef]
- Li, R.; Shu, X.; Li, C. Driving Behavior Prediction Based on Combined Neural Network Model. IEEE Trans. Comput. Soc. Syst. 2024, 11, 4488–4496. [Google Scholar] [CrossRef]
- Liu, Y.; Diao, S. An automatic driving trajectory planning approach in complex traffic scenarios based on integrated driver style inference and deep reinforcement learning. PLoS ONE 2024, 19, e0297192. [Google Scholar] [CrossRef]
- Altindal, M.C.; Nivlet, P.; Tabib, M.; Rasheed, A.; Kristiansen, T.G.; Khosravanian, R. Anomaly detection in multivariate time series of drilling data. Geoenergy Sci. Eng. 2024, 237, 212778. [Google Scholar] [CrossRef]
- Matar, M.; Xia, T.; Huguenard, K.; Huston, D.; Wshah, S. Multi-Head Attention Based Bi-LSTM for Anomaly Detection in Multivariate Time-Series of WSN. In Proceedings of the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hangzhou, China, 11–13 June 2023; pp. 1–5. [Google Scholar]
- Kumaresan, S.J.; Senthilkumar, C.; Kongkham, D.; Beenarani, B.; Nirmala, P. Investigating the Effectiveness of Recurrent Neural Networks for Network Anomaly Detection. In Proceedings of the 2024 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), Bangalore, India, 24–25 January 2024; pp. 1–5. [Google Scholar]
- Li, E.; Bedi, S.; Melek, W. Anomaly detection in three-axis CNC machines using LSTM networks and transfer learning. Int. J. Adv. Manuf. Technol. 2023, 127, 5185–5198. [Google Scholar] [CrossRef]
- Minic, A.; Jovanovic, L.; Bacanin, N.; Stoean, C.; Zivkovic, M.; Spalevic, P.; Petrovic, A.; Dobrojevic, M.; Stoean, R. Applying recurrent neural networks for anomaly detection in electrocardiogram sensor data. Sensors 2023, 23, 9878. [Google Scholar] [CrossRef]
- Zhou, C.; Paffenroth, R.C. Anomaly Detection with Robust Deep Autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 665–674. [Google Scholar]
- Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-Series Anomaly Detection Service at Microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017. [Google Scholar]
- Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access 2018, 7, 1991–2005. [Google Scholar] [CrossRef]
- Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427. [Google Scholar] [CrossRef]
- Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617. [Google Scholar] [CrossRef]
- Li, X.; Qin, T.; Yang, J.; Liu, T.Y. LightRNN: Memory and computation-efficient recurrent neural networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers Are RNNs: Fast Autoregressive Transformers with Linear Attention. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; pp. 5156–5165. [Google Scholar]
- Shao, W.; Li, B.; Yu, W.; Xu, J.; Wang, H. When Is It Likely to Fail? Performance Monitor for Black-Box Trajectory Prediction Model. IEEE Trans. Autom. Sci. Eng. 2024, 4, 765–772. [Google Scholar] [CrossRef]
- Jacobs, W.R.; Kadirkamanathan, V.; Anderson, S.R. Interpretable deep learning for nonlinear system identification using frequency response functions with ensemble uncertainty quantification. IEEE Access 2024, 12, 11052–11065. [Google Scholar] [CrossRef]
- Mamalakis, M.; Mamalakis, A.; Agartz, I.; Mørch-Johnsen, L.E.; Murray, G.; Suckling, J.; Lio, P. Solving the enigma: Deriving optimal explanations of deep networks. arXiv 2024, arXiv:2405.10008. [Google Scholar]
- Shah, M.; Sureja, N. A Comprehensive Review of Bias in Deep Learning Models: Methods, Impacts, and Future Directions. Arch. Comput. Methods Eng. 2024, 1–13. [Google Scholar] [CrossRef]
- Goethals, S.; Calders, T.; Martens, D. Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics. arXiv 2024, arXiv:2401.13391. [Google Scholar]
- Weerts, H.; Pfisterer, F.; Feurer, M.; Eggensperger, K.; Bergman, E.; Awad, N.; Vanschoren, J.; Pechenizkiy, M.; Bischl, B.; Hutter, F. Can fairness be automated? Guidelines and opportunities for fairness-aware AutoML. J. Artif. Intell. Res. 2024, 79, 639–677. [Google Scholar] [CrossRef]
- Bai, Y.; Geng, X.; Mangalam, K.; Bar, A.; Yuille, A.L.; Darrell, T.; Malik, J.; Efros, A.A. Sequential Modeling Enables Scalable Learning for Large Vision Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle WA, USA, 17–21 June 2024; pp. 22861–22872. [Google Scholar]
- Taye, M.M. Understanding of machine learning with deep learning: Architectures, workflow, applications and future directions. Computers 2023, 12, 91. [Google Scholar] [CrossRef]
Reference | Year | Description |
---|---|---|
Zaremba et al. [30] | 2014 | Insights into RNNs in language modeling |
Chung et al. [33] | 2014 | Survey of advancements in RNN training, optimization, and architectures |
Goodfellow et al. [27] | 2016 | Review on deep learning, including RNNs |
Greff et al. [28] | 2016 | Extensive comparison of LSTM variants |
Tarwani et al. [22] | 2017 | In-depth analysis of RNNs in NLP |
Chen et al. [37] | 2018 | Effectiveness of RNNs in environmental monitoring and climate modeling |
Bai et al. [31] | 2018 | Comparison of RNNs with other sequence modeling techniques like CNNs and attention mechanisms |
Che et al. [32] | 2018 | Potential of RNNs in medical applications |
Zhang et al. [39] | 2020 | RNN applications in robotics, including path planning, motion control, and human–robot interaction |
Dutta et al. [25] | 2022 | Overview of RNNs, challenges in training, and advancements in LSTM and GRU for sequence learning |
Linardos et al. [38] | 2022 | RNNs for early warning systems, disaster response, and recovery planning in natural disaster prediction |
Badawy et al. [34] | 2023 | Integration of RNNs with other ML techniques for predictive analytics and patient monitoring in healthcare |
Ismaeel et al. [35] | 2023 | Application of RNNs in smart city technologies, including traffic prediction, energy management, and urban planning |
Mers et al. [36] | 2023 | Performance comparison of various RNN models in pavement performance forecasting |
Quradaa et al. [26] | 2024 | State-of-the-art review of RNNs, covering core architectures with a focus on applications in code clones
Al-Selwi et al. [29] | 2024 | Review of LSTM applications from 2018 to 2023 |
RNN Type | Key Features | Gradient Stability | Typical Applications |
---|---|---|---|
Basic RNN | Simple structure with short-term memory | High risk of vanishing gradients | Simple sequence tasks like text generation |
LSTM | Long-term memory with input, forget, and output gates | Stable, handles vanishing gradients well | Language translation, speech recognition |
GRU | Simplified LSTM with fewer gates | Stable, handles vanishing gradients effectively | Tasks requiring faster training than LSTM |
Bidirectional RNN | Processes data in both forward and backward directions for better context understanding | Medium stability, depends on depth | Speech recognition and sentiment analysis |
Deep RNN | Multiple RNN layers are stacked to learn hierarchical features | Variable, and the risk of vanishing gradients increases with depth | Complex sequence modeling like video processing |
ESN | Fixed hidden layer weights, trained only at the output | Not applicable as training bypasses typical gradient issues | Time series prediction and system control |
Peephole LSTM | Adds peephole connections to LSTM gates | Stable and similar to LSTM | Recognition of complex temporal patterns like musical notation |
IndRNN | Allows training of deeper networks by maintaining independence between time steps | Reduces risk of vanishing and exploding gradients | Very long sequences, such as in video processing or long text generation |
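As a concrete companion to the comparison above, the following is a minimal NumPy sketch contrasting the basic RNN state update with the gated GRU update (one standard formulation). The shapes, the parameter dictionary `p`, and the random initialization are illustrative assumptions.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh, b):
    """Basic RNN cell: the whole state is rewritten at every step,
    which is why gradients through many steps tend to vanish."""
    return np.tanh(Wx @ x + Wh @ h + b)

def gru_step(x, h, p):
    """GRU cell: the update gate z and reset gate r let the network keep
    parts of the old state, easing gradient flow over long sequences."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # how much of the state to overwrite
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # how much of the state feeds the candidate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1 - z) * h + z * h_cand  # interpolate between old state and candidate

# Illustrative shapes: 8-dimensional input, 16-dimensional hidden state
rng = np.random.default_rng(0)
x, h = rng.normal(size=8), np.zeros(16)
p = {k: rng.normal(size=(16, 8)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(size=(16, 16)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(16) for k in ("bz", "br", "bh")})
h_next = gru_step(x, h, p)
```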
Dataset Name | Application | Description |
---|---|---|
Penn Treebank [105] | Natural language processing | A corpus of English sentences annotated for part-of-speech tagging, parsing, and named entity recognition; widely used for language modeling with RNNs |
IMDB Reviews [106] | Sentiment analysis | A dataset of movie reviews used for binary sentiment classification; suitable for studying the effectiveness of RNNs in text sentiment classification tasks |
MNIST Sequential [107] | Image recognition | A version of the MNIST dataset formatted as sequences for studying sequence-to-sequence learning with RNNs |
TIMIT Speech Corpus [108] | Speech recognition | An annotated speech database used for automatic speech recognition systems |
Reuters-21578 Text Categorization Collection [109] | Text categorization | A collection of newswire articles that is a common benchmark for text categorization and NLP tasks with RNNs |
UCI ML Repository: Time Series Data [110] | Time series analysis | Contains various time series datasets, including stock prices and weather data, ideal for forecasting with RNNs
CORe50 Dataset [111] | Object recognition | Used for continuous object recognition, ideal for RNN models dealing with video input sequences where object persistence and temporal context are important
Application Domain | Reference | Year | Methods and Application
---|---|---|---|
Text generation | Souri et al. [112] | 2018 | RNNs for generating coherent and contextually relevant Arabic text
| | Holtzman et al. [116] | 2019 | Controlled text generation using RNNs for style and content control
| | Hu et al. [115] | 2020 | VAEs combined with RNNs to enhance creativity in text generation
| | Gajendran et al. [114] | 2020 | Character-level text generation using BiLSTM for various tasks
| | Hussein and Savaş [118] | 2024 | LSTM for text generation
| | Baskaran et al. [119] | 2024 | LSTM for text generation, achieving excellent performance
| | Islam et al. [113] | 2019 | Sequence-to-sequence framework using LSTM for improved text generation quality
| | Yin et al. [117] | 2018 | Attention mechanisms with RNNs for improved text generation quality
| | Guo [121] | 2015 | Integration of reinforcement learning with RNNs for text generation
| | Keskar et al. [120] | 2019 | Conditional Transformer Language (CTRL) for generating text in various styles
Sentiment analysis | He and McAuley [128] | 2016 | Adversarial training framework for robustness in sentiment analysis
| | Pujari et al. [125] | 2024 | Hybrid CNN-RNN model for sentiment classification
| | Wankhade et al. [126] | 2024 | Fusion of CNN and BiLSTM with attention mechanism for sentiment classification
| | Sangeetha and Kumaran [127] | 2023 | BiLSTM for sentiment analysis by processing text in both directions
| | Yadav et al. [122] | 2023 | LSTM-based models for sentiment analysis in customer reviews and social media posts
| | Zulqarnain et al. [124] | 2024 | Attention mechanisms and GRU for enhanced sentiment analysis
| | Samir et al. [129] | 2021 | Use of pre-trained models like BERT for sentiment analysis
| | Prottasha et al. [130] | 2022 | Transfer learning with BERT and GPT for sentiment analysis
| | Abimbola et al. [123] | 2024 | Hybrid LSTM-CNN model for document-level sentiment classification
| | Mujahid et al. [131] | 2023 | Analyzing sentiment with pre-trained models fine-tuned for specific tasks
Machine translation | Sennrich et al. [133] | 2015 | Byte-Pair Encoding for handling rare words in translation models
| | Wu et al. [132] | 2016 | Google Neural Machine Translation with deep RNNs for improved accuracy
| | Vaswani et al. [104] | 2017 | Fully attention-based transformer models for superior translation performance
| | Yang et al. [135] | 2017 | Hybrid model integrating RNNs into the transformer architecture
| | Song et al. [136] | 2019 | Incorporating BERT into translation models for enhanced understanding and fluency
| | Kang et al. [134] | 2023 | Bilingual attention-based machine translation model combining RNN with attention
| | Zulqarnain et al. [124] | 2024 | Multi-stage feature attention mechanism model using GRU
Application Domain | Reference | Year | Methods and Application
---|---|---|---|
Speech recognition | Hinton et al. [137] | 2012 | Deep neural networks, including RNNs, for speech-to-text systems
| | Hannun et al. [138] | 2014 | DeepSpeech: LSTM-based speech recognition system
| | Amodei et al. [139] | 2016 | DeepSpeech2: Enhanced LSTM-based speech recognition with bidirectional RNNs
| | Zhang et al. [141] | 2017 | Convolutional RNN for robust speech recognition
| | Chiu et al. [140] | 2018 | RNN-transducer models for end-to-end speech recognition
| | Dong et al. [142] | 2018 | Speech-Transformer: Leveraging self-attention for better processing of audio sequences
| | Bhaskar and Thasleema [143] | 2023 | LSTM for visual speech recognition using facial expressions
| | Daouad et al. [144] | 2023 | Various RNN variants for automatic speech recognition
| | Nasr et al. [146] | 2023 | End-to-end speech recognition using RNNs
| | Kumar and Aziz [147] | 2023 | Performance evaluation of RNNs in speech recognition tasks
| | Dhanjal and Singh [145] | 2024 | Comprehensive study of different RNN models for speech recognition
Time series forecasting | Nelson et al. [149] | 2017 | Hybrid CNN-RNN model for stock price prediction
| | Bao et al. [151] | 2017 | Combining LSTM with stacked autoencoders for financial time series forecasting
| | Fischer and Krauss [148] | 2018 | Deep RNNs for predicting stock returns, outperforming traditional ML models
| | Feng et al. [152] | 2019 | Transfer learning with RNNs for stock prediction
| | Rundo [153] | 2019 | Combining reinforcement learning with LSTM for trading strategy development
| | Devi et al. [154] | 2024 | RNN-based model for weather prediction and capturing sequential dependencies in meteorological data
| | Anshuka et al. [155] | 2022 | LSTM networks for predicting extreme weather events by learning complex temporal patterns
| | Lin et al. [100] | 2022 | Integrating attention mechanisms with LSTM for enhanced weather forecasting accuracy
| | Marulanda et al. [156] | 2023 | LSTM model for short-term wind power forecasting and improving prediction accuracy
| | Chen et al. [157] | 2024 | Bidirectional GRU with TCNs for energy time series forecasting
| | Hasanat et al. [158] | 2024 | RNNs for forecasting energy demand in smart grids and optimizing renewable energy integration
| | Asiri et al. [159] | 2024 | Short-term renewable energy predictions using RNN-based models
| | Yıldız Doğan et al. [160] | 2024 | Hybrid model of LSTM with CNN for accurate electricity demand prediction
| | Luo et al. [150] | 2024 | Attention-based CNN-BiLSTM model for improved financial forecasting
| | Gao et al. [90] | 2023 | Dynamic ensemble deep ESN for wave height forecasting
| | Bhambu et al. [161] | 2024 | Recurrent ensemble deep random vector functional link neural network for financial time series forecasting
Application Domain | Reference | Year | Methods and Application
---|---|---|---|
Signal processing | Mastoi et al. [163] | 2019 | ESNs for real-time heart rate variability monitoring
| | Valin et al. [164] | 2021 | ESNs for speech signal enhancement in noisy environments
| | Gao et al. [91] | 2021 | EWT integrated with ESNs for enhanced time series forecasting
Bioinformatics | Li et al. [165] | 2019 | RNNs for gene prediction and protein-structure prediction
| | Zhang et al. [166] | 2020 | Bidirectional LSTM for predicting DNA-binding protein sequences
| | Xu et al. [167] | 2021 | RNN-based model for predicting protein secondary structures
| | Yadav et al. [168] | 2019 | Combining BiLSTM with CNNs for protein sequence analysis
| | Aybey and Gümüş [169] | 2023 | Ensemble model for predicting protein–protein interactions
Autonomous vehicles | Altché and de La Fortelle [173] | 2017 | LSTM for predicting the future trajectories of vehicles
| | Codevilla et al. [172] | 2018 | RNNs with imitation learning for autonomous driving
| | Li et al. [170] | 2020 | RNNs for path planning and object detection
| | Lee and Ha [171] | 2020 | Integrating LSTM with CNN for end-to-end autonomous driving
| | Li et al. [174] | 2024 | Attention-based LSTM for video object tracking
| | Liu and Diao [176] | 2024 | GRU with deep reinforcement learning for decision-making
Anomaly detection | Zhou and Paffenroth [182] | 2017 | RNNs in unsupervised anomaly detection with deep autoencoders
| | Munir et al. [184] | 2018 | Hybrid CNN-RNN model for anomaly detection in time series
| | Ren et al. [183] | 2019 | Attention-based RNN model for anomaly detection
| | Li et al. [180] | 2023 | RNNs with transfer learning for anomaly detection in manufacturing
| | Minic et al. [181] | 2023 | RNNs for detecting anomalies in ECG signals
| | Matar et al. [178] | 2023 | BiLSTM for anomaly detection in multivariate time series
| | Kumaresan et al. [179] | 2024 | RNNs for detecting network traffic anomalies
| | Altindal et al. [177] | 2024 | LSTM for anomaly detection in time series data