Smart Home Automation-Based Hand Gesture Recognition Using Feature Fusion and Recurrent Neural Network
Abstract
1. Introduction
- The proposed approach differs from previous systems in that it recognizes dynamic gestures against complex backgrounds.
- Hands are detected in the input images using two-way detection: first, skin-tone pixels are extracted, and then a saliency map is applied for greater precision (a minimal sketch follows this list).
- Features are extracted with modified versions of several algorithms, namely fast marching, neural gas, and the 8-directional Freeman chain code, and the extracted features are fused into a single feature vector for recognition.
- The proposed system classifies the fused features with a deep learning model, a recurrent neural network (RNN), to achieve higher accuracy.
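To illustrate the two-way detection step above, the following is a minimal sketch of the skin-tone stage in Python with OpenCV. The YCrCb thresholds are the commonly cited ranges from the skin-detection literature, not the paper's exact values, and the saliency refinement assumes the `opencv-contrib` spectral-residual model rather than the saliency map used by the authors; the threshold 0.3 and the function names are illustrative.

```python
import cv2
import numpy as np

def skin_mask(bgr: np.ndarray) -> np.ndarray:
    """Binary mask of candidate skin pixels in YCrCb space."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)    # commonly cited Cr/Cb skin range
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # drop isolated false positives

def hand_region(bgr: np.ndarray) -> np.ndarray:
    """Refine the skin mask with a saliency map (spectral residual model,
    requires opencv-contrib-python); keeps skin pixels that are also salient."""
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal = saliency.computeSaliency(bgr)           # sal is float32 in [0, 1]
    sal_mask = (sal > 0.3).astype(np.uint8) * 255     # illustrative threshold
    return cv2.bitwise_and(skin_mask(bgr), sal_mask)

# frame = cv2.imread("frame.png")  # hypothetical input frame
# mask = hand_region(frame)
```

Combining the two cues in this way is what lets skin-colored background pixels be suppressed: a pixel must pass both the color test and the saliency test to survive.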
2. Literature Review
2.1. Hand Gesture Recognition via RGB Sensors
2.2. Hand Gesture Recognition via Marker Sensors
3. Materials and Methods
3.1. System Methodology
3.2. Image Pre-Processing
3.3. Hand Detection
3.4. Hand Skeleton
3.5. Fusion Feature Extraction
3.5.1. Joint Color Cloud
The joint color cloud is built by propagating a front over the detected hand region with the fast marching method. Suppose we are interested in the region-of-interest arrival-time function $T(x)$; it satisfies the Eikonal equation

$$ |\nabla T(x)|\, F(x) = 1, \qquad T = 0 \ \text{on the initial front}. \tag{1} $$

This leads to two types of spatial derivative operators, the forward and backward differences

$$ D_{ij}^{+x}T = \frac{T_{i+1,j} - T_{i,j}}{h}, \qquad D_{ij}^{-x}T = \frac{T_{i,j} - T_{i-1,j}}{h}, \tag{2} $$

with $D_{ij}^{+y}$ and $D_{ij}^{-y}$ defined analogously. For the difference operator, a discrete upwind function is used to calculate $T_{ij}$. For this purpose, at a specific point $(i,j)$ the speed function $F_{ij}$ enters the scheme as

$$ \left[ \max\!\left( D_{ij}^{-x}T,\ -D_{ij}^{+x}T,\ 0 \right) \right]^2 + \left[ \max\!\left( D_{ij}^{-y}T,\ -D_{ij}^{+y}T,\ 0 \right) \right]^2 = \frac{1}{F_{ij}^{2}}, \tag{3} $$

where $T_{ij}$ is the arrival time of the front at pixel $(i,j)$. For the neighbor pixel value calculation, only points already included in the accepted (frozen) set can be used. Writing $T_1 = \min(T_{i-1,j}, T_{i+1,j})$ and $T_2 = \min(T_{i,j-1}, T_{i,j+1})$, the value computation reduces to the quadratic equation formulated for $T_{ij}$:

$$ (T_{ij} - T_1)^2 + (T_{ij} - T_2)^2 = \frac{h^2}{F_{ij}^{2}}, \tag{4} $$

which leads to the following closed-form update:

$$ T_{ij} = \frac{T_1 + T_2}{2} + \frac{1}{2} \sqrt{ \frac{2h^2}{F_{ij}^{2}} - (T_1 - T_2)^2 } $$

whenever the term under the root is non-negative, and $T_{ij} = \min(T_1, T_2) + h/F_{ij}$ otherwise.
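As a concreteness check on Equations (3) and (4), here is a minimal sketch of the single-pixel fast marching update in Python. It assumes an interior pixel and the notation above; the helper name `fmm_update` is illustrative, not from the paper.

```python
import math
import numpy as np

def fmm_update(T: np.ndarray, F: np.ndarray, i: int, j: int, h: float = 1.0) -> float:
    """Trial arrival time at interior pixel (i, j), solving Equation (4).

    T holds current arrival times (np.inf where the front has not arrived),
    F holds the positive speed function.
    """
    t1 = min(T[i - 1, j], T[i + 1, j])  # best accepted x-neighbor
    t2 = min(T[i, j - 1], T[i, j + 1])  # best accepted y-neighbor
    rhs = (h / F[i, j]) ** 2
    disc = 2.0 * rhs - (t1 - t2) ** 2
    if disc >= 0.0:
        # Both directions usable: closed-form root of the quadratic (4)
        return 0.5 * (t1 + t2) + 0.5 * math.sqrt(disc)
    # Only one upwind direction usable: fall back to the 1D update
    return min(t1, t2) + h / F[i, j]
```

In the full method, these updates are applied in increasing order of $T$ via a priority queue seeded on the hand boundary, which is what makes the marching fast.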
3.5.2. Neural Gas
Algorithm 1: Pseudocode for Neural Gas Formation
Input: the input space P;
Output: the map M;
M ← [ ]
Method:
M ← {n1, n2}, where n1 represents the first node and n2 represents the second node
t ← 0;
N ← 100; (maximum number of nodes)
while |M| < N do
    draw an input signal Φ from P
    calculate the winning nodes s1, s2 nearest to Φ
    adjust s1 towards Φ
    adjust the neighbors of s1 towards Φ
    periodically insert a new node: M ← M ∪ [n_new]
    t ← t + 1
end while
return M
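To make the growth loop of Algorithm 1 concrete, the following is a compact Python sketch. Only the 100-node cap comes from the pseudocode; the learning rates `eps_b` and `eps_n`, the insertion interval, and the step budget are illustrative assumptions, and the edge/age bookkeeping of full growing neural gas is omitted (the second winner stands in for the winner's topological neighbors).

```python
import numpy as np

def neural_gas_map(points: np.ndarray, max_nodes: int = 100,
                   eps_b: float = 0.2, eps_n: float = 0.006,
                   insert_every: int = 100, steps: int = 20000,
                   seed: int = 0) -> np.ndarray:
    """Grow prototype nodes over `points` (K x d samples of the hand shape)."""
    rng = np.random.default_rng(seed)
    # Start from two nodes placed on random input samples
    nodes = points[rng.choice(len(points), size=2, replace=False)].astype(float)
    error = np.zeros(2)                            # accumulated quantization error
    for t in range(1, steps + 1):
        phi = points[rng.integers(len(points))]    # input signal Φ
        d = np.linalg.norm(nodes - phi, axis=1)
        s1, s2 = np.argsort(d)[:2]                 # two winning nodes
        error[s1] += d[s1] ** 2
        nodes[s1] += eps_b * (phi - nodes[s1])     # adjust the winner
        nodes[s2] += eps_n * (phi - nodes[s2])     # adjust its neighbor
        if t % insert_every == 0 and len(nodes) < max_nodes:
            q = int(np.argmax(error))              # node with the largest error
            nodes = np.vstack([nodes, 0.5 * (nodes[q] + phi)])
            error[q] *= 0.5
            error = np.append(error, error[q])     # new node inherits half the error
        if len(nodes) == max_nodes:                # Algorithm 1 stops at |M| = N
            break
    return nodes
```

The returned nodes form a topology-preserving point set over the hand region, from which the neural gas features of this subsection can then be read off.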
3.5.3. Directional Active Model
3.6. Feature Analysis and Optimization
3.7. Gesture Classification Using RNN
4. Experimental Setup and Evaluation
4.1. Dataset Descriptions
4.1.1. HaGRID Dataset
4.1.2. EgoGesture Dataset
4.1.3. Jester Dataset
4.1.4. WLASL Dataset
4.2. Evaluation via Experimental Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Panwar, M.; Mehra, P.S. Hand gesture recognition for human computer interaction. In Proceedings of the IEEE 2011 International Conference on Image Information Processing, Shimla, India, 3–5 November 2011. [Google Scholar]
- Khan, R.Z.; Ibraheem, N.A. Hand gesture recognition: A literature review. Int. J. Artif. Intell. Appl. 2012, 3, 161. [Google Scholar] [CrossRef]
- Wu, C.H.; Lin, C.H. Depth-based hand gesture recognition for home appliance control. In Proceedings of the 2013 IEEE International Symposium on Consumer Electronics (ISCE), Hsinchu, Taiwan, 3–6 June 2013. [Google Scholar]
- Solanki, U.V.; Desai, N.H. Hand gesture based remote control for home appliances: Handmote. In Proceedings of the 2011 IEEE World Congress on Information and Communication Technologies, Mumbai, India, 11–14 December 2011. [Google Scholar]
- Hsieh, C.C.; Liou, D.H.; Lee, D. A real time hand gesture recognition system using motion history image. In Proceedings of the IEEE 2010 2nd International Conference on Signal Processing Systems, Dalian, China, 5–7 July 2010. [Google Scholar]
- Chung, H.Y.; Chung, Y.L.; Tsai, W.F. An efficient hand gesture recognition system based on deep CNN. In Proceedings of the 2019 IEEE International Conference on Industrial Technology (ICIT), Melbourne, VIC, Australia, 13–15 February 2019. [Google Scholar]
- Wang, M.; Yan, Z.; Wang, T.; Cai, P.; Gao, S.; Zeng, Y.; Wan, C.; Wang, H.; Pan, L.; Yu, J.; et al. Gesture recognition using a bioinspired learning architecture that integrates visual data with somatosensory data from stretchable sensors. Nat. Electron. 2020, 3, 563–570. [Google Scholar] [CrossRef]
- Moin, A.; Zhou, A.; Rahimi, A.; Menon, A.; Benatti, S.; Alexandrov, G.; Tamakloe, S.; Ting, J.; Yamamoto, N.; Khan, Y.; et al. A wearable biosensing system with in-sensor adaptive machine learning for hand gesture recognition. Nat. Electron. 2021, 4, 54–63. [Google Scholar] [CrossRef]
- Dang, L.M.; Min, K.; Wang, H.; Piran, M.J.; Lee, C.H.; Moon, H. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognit. 2020, 108, 107561. [Google Scholar] [CrossRef]
- Mujahid, A.; Awan, M.J.; Yasin, A.; Mohammed, M.A.; Damaševičius, R.; Maskeliūnas, R.; Abdulkareem, K.H. Real-time hand gesture recognition based on deep learning YOLOv3 model. Appl. Sci. 2021, 11, 4164. [Google Scholar] [CrossRef]
- Al-Hammadi, M.; Muhammad, G.; Abdul, W.; Alsulaiman, M.; Bencherif, M.A.; Alrayes, T.S.; Mathkour, H.; Mekhtiche, M.A. Deep learning-based approach for sign language gesture recognition with efficient hand gesture representation. IEEE Access 2020, 8, 192527–192542. [Google Scholar] [CrossRef]
- Pinto, R.F.; Borges, C.D.; Almeida, A.M.; Paula, I.C. Static hand gesture recognition based on convolutional neural networks. J. Electr. Comput. Eng. 2019, 2019, 4167890. [Google Scholar] [CrossRef]
- Tolentino, L.K.S.; Juan, R.O.S.; Thio-ac, A.C.; Pamahoy, M.A.B.; Forteza, J.R.R.; Garcia, X.J.O. Static sign language recognition using deep learning. Int. J. Mach. Learn. Comput. 2019, 9, 821–827. [Google Scholar] [CrossRef]
- Beddiar, D.R.; Nini, B.; Sabokrou, M.; Hadid, A. Vision-based human activity recognition: A survey. Multimed. Tools Appl. 2020, 79, 30509–30555. [Google Scholar] [CrossRef]
- Skaria, S.; Al-Hourani, A.; Lech, M.; Evans, R.J. Hand-gesture recognition using two-antenna Doppler radar with deep convolutional neural networks. IEEE Sens. J. 2019, 19, 3041–3048. [Google Scholar] [CrossRef]
- Tabernik, D.; Skočaj, D. Deep learning for large-scale traffic-sign detection and recognition. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1427–1440. [Google Scholar] [CrossRef]
- Zheng, Y.; Lv, X.; Qian, L.; Liu, X. An Optimal BP Neural Network Track Prediction Method Based on a GA–ACO Hybrid Algorithm. J. Mar. Sci. Eng. 2022, 10, 1399. [Google Scholar] [CrossRef]
- Qian, L.; Zheng, Y.; Li, L.; Ma, Y.; Zhou, C.; Zhang, D. A New Method of Inland Water Ship Trajectory Prediction Based on Long Short-Term Memory Network Optimized by Genetic Algorithm. Appl. Sci. 2022, 12, 4073. [Google Scholar] [CrossRef]
- Dinh, D.L.; Kim, J.T.; Kim, T.S. Hand gesture recognition and interface via a depth imaging sensor for smart home appliances. Energy Procedia 2014, 62, 576–582. [Google Scholar] [CrossRef]
- Kim, M.; Cho, J.; Lee, S.; Jung, Y. IMU sensor-based hand gesture recognition for human-machine interfaces. Sensors 2019, 19, 3827. [Google Scholar] [CrossRef] [PubMed]
- Rautaray, S.S.; Agrawal, A. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev. 2015, 43, 1–54. [Google Scholar] [CrossRef]
- Pisharady, P.K.; Saerbeck, M. Recent methods and databases in vision-based hand gesture recognition: A review. Comput. Vis. Image Underst. 2015, 141, 152–165. [Google Scholar] [CrossRef]
- Irie, K.; Wakamura, N.; Umeda, K. Construction of an intelligent room based on gesture recognition: Operation of electric appliances with hand gestures. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, 28 September–2 October 2004. [Google Scholar]
- Lone, M.R.; Khan, E. A good neighbor is a great blessing: Nearest neighbor filtering method to remove impulse noise. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 9942–9952. [Google Scholar] [CrossRef]
- Ren, Z.; Meng, J.; Yuan, J. Depth camera based hand gesture recognition and its applications in human-computer-interaction. In Proceedings of the 2011 8th International Conference on Information, Communications & Signal Processing, Singapore, 13–16 December 2011. [Google Scholar]
- Sahoo, J.P.; Prakash, A.J.; Pławiak, P.; Samantray, S. Real-time hand gesture recognition using fine-tuned convolutional neural network. Sensors 2022, 22, 706. [Google Scholar] [CrossRef] [PubMed]
- Ding, J.; Zheng, N.W. RGB-D Depth-sensor-based Hand Gesture Recognition Using Deep Learning of Depth Images with Shadow Effect Removal for Smart Gesture Communication. Sens. Mater. 2022, 34, 203–216. [Google Scholar] [CrossRef]
- Li, J.; Wei, L.; Wen, Y.; Liu, X.; Wang, H. An approach to continuous hand movement recognition using SEMG based on features fusion. Vis. Comput. 2023, 39, 2065–2079. [Google Scholar] [CrossRef]
- Alam, M.M.; Islam, M.T.; Rahman, S.M. A Unified Learning Approach for Hand Gesture Recognition and Fingertip Detection; UMBC Student Collection; University of Maryland, Baltimore County: Baltimore, MD, USA, 2021. [Google Scholar]
- Ameur, S.; Khalifa, A.B.; Bouhlel, M.S. A novel hybrid bidirectional unidirectional LSTM network for dynamic hand gesture recognition with leap motion. Entertain. Comput. 2020, 35, 100373. [Google Scholar] [CrossRef]
- Zhang, X.; Yang, Z.; Chen, T.; Chen, D.; Huang, M.C. Cooperative sensing and wearable computing for sequential hand gesture recognition. IEEE Sens. J. 2019, 19, 5775–5783. [Google Scholar] [CrossRef]
- Hakim, N.L.; Shih, T.K.; Arachchi, S.P.K.; Aditya, W.; Chen, Y.-C.; Lin, C.-Y. Dynamic hand gesture recognition using 3DCNN and LSTM with FSM context-aware model. Sensors 2019, 19, 5429. [Google Scholar] [CrossRef]
- Dong, B.; Shi, Q.; Yang, Y.; Wen, F.; Zhang, Z.; Lee, C. Technology evolution from self-powered sensors to AIoT enabled smart homes. Nano Energy 2021, 79, 105414. [Google Scholar] [CrossRef]
- Muneeb, M.; Rustam, H.; Jalal, A. Automate appliances via gestures recognition for elderly living assistance. In Proceedings of the IEEE 2023 4th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 20–22 February 2023. [Google Scholar]
- Hung, C.H.; Bai, Y.W.; Wu, H.Y. Home appliance control by a hand gesture recognition belt in LED array lamp case. In Proceedings of the 2015 IEEE 4th Global Conference on Consumer Electronics (GCCE), Osaka, Japan, 27–30 October 2015. [Google Scholar]
- Deng, Z.; Gao, Q.; Ju, Z.; Leng, Y. A Self-Distillation Multi-Feature Learning Method for Skeleton-Based Sign Language Recognition. Pattern Recognit. 2023, 144, 1–33. [Google Scholar]
- Rahimian, E.; Zabihi, S.; Asif, A.; Farina, D.; Atashzar, S.F.; Mohammadi, A. Hand gesture recognition using temporal convolutions and attention mechanism. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022. [Google Scholar]
- Zhang, X.; Huang, D.; Li, H.; Zhang, Y.; Xia, Y.; Liu, J. Self-training maximum classifier discrepancy for EEG emotion recognition. CAAI Trans. Intell. Technol. 2023. [Google Scholar] [CrossRef]
- Khandizod, A.G.; Deshmukh, R.R. Comparative analysis of image enhancement technique for hyperspectral palmprint images. Int. J. Comput. Appl. 2015, 121, 30–35. [Google Scholar]
- Soni, H.; Sankhe, D. Image restoration using adaptive median filtering. IEEE Int. Res. J. Eng. Technol. 2019, 6, 841–844. [Google Scholar]
- Balasamy, K.; Shamia, D. Feature extraction-based medical image watermarking using fuzzy-based median filter. IETE J. Res. 2023, 69, 83–91. [Google Scholar] [CrossRef]
- Veluchamy, M.; Subramani, B. Image contrast and color enhancement using adaptive gamma correction and histogram equalization. Optik 2019, 183, 329–337. [Google Scholar] [CrossRef]
- Veluchamy, M.; Subramani, B. Fuzzy dissimilarity color histogram equalization for contrast enhancement and color correction. Appl. Soft Comput. 2020, 89, 106077. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, W.; Li, X.; Rao, Q.; Jiang, T.; Han, M.; Fan, H.; Sun, J.; Liu, S. ADNet: Attention-guided deformable convolutional network for high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Kashem, S.B.A.; Islam, M.T.; Al Maadeed, S.; Zughaier, S.M.; Khan, M.S.; et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput. Biol. Med. 2021, 132, 104319. [Google Scholar] [CrossRef]
- Ghose, D.; Desai, S.M.; Bhattacharya, S.; Chakraborty, D.; Fiterau, M.; Rahman, T. Pedestrian detection in thermal images using saliency maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Ye, D.; Chen, C.; Liu, C.; Wang, H.; Jiang, S. Detection defense against adversarial attacks with saliency map. Int. J. Intell. Syst. 2021, 37, 10193–10210. [Google Scholar] [CrossRef]
- Etmann, C.; Lunz, S.; Maass, P.; Schönlieb, C.B. On the connection between adversarial robustness and saliency map interpretability. arXiv 2019, arXiv:1905.04172. [Google Scholar]
- Zhao, Y.; Po, L.-M.; Cheung, K.-W.; Yu, W.-Y.; Rehman, Y.A.U. SCGAN: Saliency map-guided colorization with generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3062–3077. [Google Scholar] [CrossRef]
- Li, H.; Li, C.; Ding, Y. Fall detection based on fused saliency maps. Multimed. Tools Appl. 2020, 80, 1883–1900. [Google Scholar] [CrossRef]
- Rastgoo, R.; Kiani, K.; Escalera, S. Video-based isolated hand sign language recognition using a deep cascaded model. Multimed. Tools Appl. 2020, 79, 22965–22987. [Google Scholar] [CrossRef]
- Yang, L.; Qi, Z.; Liu, Z.; Liu, H.; Ling, M.; Shi, L.; Liu, X. An embedded implementation of CNN-based hand detection and orientation estimation algorithm. Mach. Vis. Appl. 2019, 30, 1071–1082. [Google Scholar] [CrossRef]
- Gao, Q.; Liu, J.; Ju, Z. Robust real-time hand detection and localization for space human–robot interaction based on deep learning. Neurocomputing 2020, 390, 198–206. [Google Scholar] [CrossRef]
- Tang, J.; Yao, X.; Kang, X.; Nishide, S.; Ren, F. Position-free hand gesture recognition using single shot multibox detector based neural network. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019. [Google Scholar]
- Tang, H.; Liu, H.; Xiao, W.; Sebe, N. Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion. Neurocomputing 2018, 331, 424–433. [Google Scholar] [CrossRef]
- Tan, G.; Zou, J.; Zhuang, J.; Wan, L.; Sun, H.; Sun, Z. Fast marching square method based intelligent navigation of the unmanned surface vehicle swarm in restricted waters. Appl. Ocean Res. 2020, 95, 102018. [Google Scholar] [CrossRef]
- Xia, J.; Jiang, Z.; Zhang, H.; Zhu, R.; Tian, H. Dual fast marching tree algorithm for human-like motion planning of anthropomorphic arms with task constraints. IEEE/ASME Trans. Mechatron. 2020, 26, 2803–2813. [Google Scholar] [CrossRef]
- Muñoz, J.; López, B.; Quevedo, F.; Barber, R.; Garrido, S.; Moreno, L. Geometrically constrained path planning for robotic grasping with Differential Evolution and Fast Marching Square. Robotica 2023, 41, 414–432. [Google Scholar] [CrossRef]
- Liu, Y.; Nedo, A.; Seward, K.; Caplan, J.; Kambhamettu, C. Quantifying actin filaments in microscopic images using keypoint detection techniques and a fast marching algorithm. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020. [Google Scholar]
- Gadekallu, T.R.; Alazab, M.; Kaluri, R.; Maddikunta, P.K.; Bhattacharya, S.; Lakshmanna, K. Hand gesture classification using a novel CNN-crow search algorithm. Complex Intell. Syst. 2021, 7, 1855–1868. [Google Scholar] [CrossRef]
- Qi, J.; Jiang, G.; Li, G.; Sun, Y.; Tao, B. Surface EMG hand gesture recognition system based on PCA and GRNN. Neural Comput. Appl. 2020, 32, 6343–6351. [Google Scholar] [CrossRef]
- Todorov, H.; Cannoodt, R.; Saelens, W.; Saeys, Y. TinGa: Fast and flexible trajectory inference with Growing Neural Gas. Bioinformatics 2020, 36, i66–i74. [Google Scholar] [CrossRef]
- Hahn, C.; Feld, S.; Zierl, M.; Linnhoff-Popien, C. Dynamic Path Planning with Stable Growing Neural Gas. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019); pp. 138–145. [Google Scholar] [CrossRef]
- Mirehi, N.; Tahmasbi, M.; Targhi, A.T. Hand gesture recognition using topological features. Multimed. Tools Appl. 2019, 78, 13361–13386. [Google Scholar] [CrossRef]
- Ansar, H.; Jalal, A.; Gochoo, M.; Kim, K. Hand gesture recognition based on auto-landmark localization and reweighted genetic algorithm for healthcare muscle activities. Sustainability 2021, 13, 2961. [Google Scholar] [CrossRef]
- Zaaraoui, H.; El Kaddouhi, S.; Abarkan, M. A novel approach to face recognition using freeman chain code and nearest neighbor classifier. In Proceedings of the 2019 International Conference on Intelligent Systems and Advanced Computing Sciences (ISACS), Taza, Morocco, 26–27 December 2019. [Google Scholar]
- Jalal, A.; Khalid, N.; Kim, K. Automatic recognition of human interaction via hybrid descriptors and maximum entropy markov model using depth sensors. Entropy 2020, 22, 817. [Google Scholar] [CrossRef] [PubMed]
- Zahra, A.K.A.; Abdalla, T.Y. Design of fuzzy super twisting sliding mode control scheme for unknown full vehicle active suspension systems using an artificial bee colony optimization algorithm. Asian J. Control 2020, 23, 1966–1981. [Google Scholar] [CrossRef]
- Dhruv, P.; Naskar, S. Image classification using convolutional neural network (CNN) and recurrent neural network (RNN): A review. In Machine Learning and Information Processing: Proceedings of ICMLIP; Springer: Singapore, 2019. [Google Scholar]
- Kapitanov, A.; Makhlyarchuk, A.; Kvanchiani, K. HaGRID-HAnd Gesture Recognition Image Dataset. arXiv 2022, arXiv:2206.08219. [Google Scholar]
- Chalasani, T.; Smolic, A. Simultaneous segmentation and recognition: Towards more accurate ego gesture recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Materzynska, J.; Berger, G.; Bax, I.; Memisevic, R. The jester dataset: A large-scale video dataset of human gestures. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Naz, N.; Sajid, H.; Ali, S.; Hasan, O.; Ehsan, M.K. Signgraph: An Efficient and Accurate Pose-Based Graph Convolution Approach Toward Sign Language Recognition. IEEE Access 2023, 11, 19135–19147. [Google Scholar] [CrossRef]
- Molchanov, P.; Yang, X.; Gupta, S.; Kim, K.; Tyree, S.; Kautz, J. Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Cutura, R.; Morariu, C.; Cheng, Z.; Wang, Y.; Weiskopf, D.; Sedlmair, M. Hagrid—Gridify scatterplots with hilbert and gosper curves. In Proceedings of the 14th International Symposium on Visual Information Communication and Interaction, Potsdam, Germany, 6–7 September 2021. [Google Scholar]
- Padhi, P.; Das, M. Hand Gesture Recognition using DenseNet201-Mediapipe Hybrid Modelling. In Proceedings of the 2022 International Conference on Automation, Computing and Renewable Systems (ICACRS), Pudukkottai, India, 13–15 December 2022. [Google Scholar]
- Li, S.; Aich, A.; Zhu, S.; Asif, S.; Song, C.; Roy-Chowdhury, A.; Krishnamurthy, S. Adversarial attacks on black box video classifiers: Leveraging the power of geometric transformations. Adv. Neural Inf. Process. Syst. 2021, 34, 2085–2096. [Google Scholar]
- Zhao, B.; Hua, X.; Yu, K.; Xuan, W.; Chen, X.; Tao, W. Indoor Point Cloud Segmentation Using Iterative Gaussian Mapping and Improved Model Fitting. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7890–7907. [Google Scholar] [CrossRef]
Authors | Methodology |
---|---|
S. Nagarajan et al. [19] | The system captures American Sign Language gestures and filters the images using Canny edge detection. Edge Orientation Histogram (EOH) features were extracted and classified by a multiclass SVM; however, some signs were not detected due to hand orientation and gesture similarity. |
Mandeep et al. [20] | The hand gesture system used a skin color model and thresholding: the hand region was segmented in YCbCr space, skin color segmentation extracted the skin pixels, and Otsu thresholding removed the image background. Finally, PCA-based template matching was used to recognize a total of twenty images per gesture, taken from five different poses across four gesture captures. The system is limited when skin color varies under different lighting or when the background contains skin-colored pixels. |
Thanh et al. [21] | Multimodal streams combining depth, RGB, and optical flow are used to increase hand recognition performance. A deep learning model extracts features from each stream, and these features are combined with different fusion methods for the final classification. The system achieves strong results with multimodal streams from different viewpoints, collected over twelve gestures. |
Noorkholis et al. [22] | For dynamic hand gesture recognition, RGB and depth images are preprocessed to a Region of Interest (ROI) so that only the hand pixels, rather than unnecessary background points, are retained. A combination of a three-dimensional convolutional neural network (3DCNN) and long short-term memory (LSTM) extracts spatio-temporal features, which are then classified with a finite state machine (FSM) context-aware model to disambiguate gestures that are reused across applications. The system is designed for a smart TV environment; eight of 24 gestures performed robustly in real-time testing. |
K. Irie et al. [23] | In this paper, hand gestures are detected from the motion of the hand in front of a camera, and the detected motion controls the electronic appliances of an intelligent room. The cameras can zoom in and focus on the user to capture the gesture; the hand is detected via color information and the motion direction of the fingers. |
Chen-Chiung Hsieh et al. [24] | This research addressed issues such as hand gesture detection against complex backgrounds and varying light intensity. The hand was detected with a body skin detection method, and gestures were classified with a new motion history image-based recognition model. A total of six hand gestures at different distances from the camera formed the dataset, trained with Haar-like features for up, down, right, and left movements. The home automation-based system achieved 94.1% accuracy with the proposed method. |
Zhou Ren et al. [25] | A study of hand gesture recognition using the finger-earth mover's distance (FEMD) approach. The authors compared the speed and accuracy of FEMD with shape context and shape-matching algorithms. The dataset was collected with a Kinect camera, so it contained both depth and RGB images. |
Jaya Prakash Sahoo [26] | Convolutional neural networks (CNNs) currently exhibit good recognition rates for image classification, but deep CNNs such as AlexNet, VGG-16, and ResNet are difficult to train from scratch because large labelled sets of static hand gesture images are scarce. To recognize gestures in datasets with few images, the authors fine-tuned a pre-trained CNN end-to-end with score-level fusion. The efficacy of the approach was assessed on two benchmark datasets using leave-one-subject-out cross-validation (LOO CV) and conventional CV tests, and a real-time American Sign Language (ASL) recognition system was also built and evaluated. |
Ing-Jr Ding [27] | The suggested method consists of two sequential computation phases. A visual geometry group (VGG)-type convolutional neural network, the VGG-CNN, is used to assess the recognition rate. The experiments proved that the extraction step efficiently eliminates the undesirable shadow region in hand gesture depth images and greatly improves identification accuracy. |
Jun Li [28] | The authors proposed MFFCNN-LSTM for forearm sEMG signal recognition using time-domain and time-frequency-spectrum features. Hand movements were first extracted from the NinaPro DB8 dataset, and the signals were denoised via empirical Fourier decomposition. The data were passed through different CNN channels to collect the time-domain and time-frequency-spectrum features, which were fused and passed to the LSTM. The proposed system achieved 98.5% accuracy. |
Authors | Methodology |
---|---|
Safa et al. [29] | Hand gesture systems increasingly pair recognition algorithms with sensors to capture the correct motion of the hand without distortion. Combining machine learning with sensors raises the potential of touchless and touch-dynamic hand motion in digital entertainment. In a recent study, a Leap Motion device captured dynamic hand motion without contact, and long short-term memory (LSTM) networks analysed the sequential time-series data, comparing unidirectional and bidirectional variants. The novel model, named Hybrid Bidirectional Unidirectional LSTM (HBU-LSTM), improves performance by considering spatial and temporal dependencies between the Leap Motion data and the network layers. |
Xiaoliang et al. [30] | The hand gesture system takes a novel approach, combining a wearable armband with customized pressure-sensor smart gloves for sequential hand motion. Data collected from an inertial measurement unit (IMU), finger and palm pressure sensors, and electromyography were processed with deep learning, applying long short-term memory (LSTM) models for training and testing. The experiments showed outstanding results on dynamic and in-air gestures collected from ten different participants. |
Muhammad et al. [31] | For elderly care in a smart home, an automated home system controlled the appliances of daily use through gestures, using embedded hand gloves to detect the motion of the hand. Wearable sensors, namely an accelerometer and a gyroscope, provided a combined feature set, and a random forest classifier recognized the nine different gestures. |
Dong-Luong Dinh et al. [32] | This paper provides a novel approach to hand gesture recognition for home appliances: hands are detected, and control commands are generated from the recognized gestures. A hand gesture database was created via labelled part maps and classified using random forests. The system controls TVs, lights, doors, fans, channel changing, temperature, and volume through hand gestures. |
Muhammad Muneeb et al. [33] | Smart homes for elderly and disabled people need special attention, as awareness of geriatric problems is necessary to resolve these issues. Many gesture recognition systems exist across domains, but this paper addresses elderly issues in particular. Gloves recorded the rotation, tilt, and acceleration of the hand, and nine gestures were classified using random forest, attaining an accuracy of 94% on the benchmark dataset. |
Chi-Huang Hung et al. [34] | They proposed a system for an array lamp that performs ON/OFF actions and dims the light, using a gyroscope and an accelerometer for hand detection. Noise was removed with a Kalman filter, and the signals received from the devices were decoded into the desired gestures. |
Marvin S. Verdadero et al. [35] | Remote control devices are common, but their setup is expensive. Static hand gestures are captured on an Android mobile device, and the signals are passed to the electronic appliances; for the signals to be passed accurately, the user should be within 6 m of the device. |
Zhiwen Deng [36] | Sign language recognition (SLR) is an efficient way to bridge communication gaps. SLR can additionally be used for human–computer interaction (HCI), virtual reality (VR), and augmented reality (AR). To enhance the research study, they proposed a skeleton-based self-distillation multi-feature learning method (SML). They constructed a multi-feature aggregation module (MFA) for the fusion of the features. For feature extraction and recognition, a self-distillation-guided adaptive residual graph convolutional network (SGA-ResGCN) was used. They tested the system on two benchmark datasets, WLASL and AUTSL, attaining accuracies of 55.85% and 96.85%, respectively. |
Elahe Rahimian [37] | To reduce the computation cost of complex architectures when training larger datasets, they proposed a temporal convolution-based hand gesture recognition system (TC-HGR). Seventeen gestures were trained using attention mechanisms and temporal convolutions, attaining 81.65% and 80.72% classification accuracy for window sizes of 300 ms and 200 ms, respectively. |
Gesture Classes | Call | Dislike | Like | Mute | Ok | Stop | Two Up |
---|---|---|---|---|---|---|---|
call | 0.93 | 0 | 0 | 0.03 | 0 | 0 | 0.04 |
dislike | 0 | 0.92 | 0 | 0 | 0.05 | 0 | 0.03 |
like | 0.05 | 0 | 0.95 | 0 | 0 | 0 | 0 |
mute | 0 | 0.04 | 0 | 0.94 | 0 | 0.02 | 0 |
ok | 0 | 0 | 0.07 | 0 | 0.93 | 0 | 0 |
stop | 0 | 0.05 | 0 | 0.05 | 0 | 0.90 | 0 |
two up | 0 | 0 | 0.04 | 0 | 0 | 0.05 | 0.91 |
Mean Accuracy = 92.57%
Gesture Classes | Scroll Hand towards Right | Scroll Hand Downward | Scroll Hand Backward | Zoom in with Fists | Zoom Out with Fists | Rotate Finger Clockwise | Zoom in with Fingers |
---|---|---|---|---|---|---|---|
scroll hand towards the right | 0.90 | 0 | 0 | 0.03 | 0 | 0.07 | 0 |
scroll hand downward | 0 | 0.93 | 0.07 | 0 | 0 | 0 | 0 |
scroll hand backward | 0 | 0 | 0.92 | 0 | 0.05 | 0 | 0.03 |
zoom in with fists | 0 | 0.03 | 0 | 0.93 | 0 | 0.04 | 0 |
zoom out with fists | 0.04 | 0 | 0 | 0 | 0.94 | 0 | 0.02 |
rotate finger clockwise | 0 | 0.07 | 0 | 0 | 0.02 | 0.91 | 0 |
zoom in with fingers | 0.04 | 0 | 0 | 0.06 | 0 | 0 | 0.90 |
Mean Accuracy = 91.86%
Gesture Classes | Sliding Two Fingers Down | Stop Sign | Swiping Left | Swiping Right | Turning Hand Clockwise | Turning Hand Counterclockwise | Zoom in with Two Fingers |
---|---|---|---|---|---|---|---|
sliding two fingers down | 0.91 | 0 | 0 | 0 | 0 | 0.09 | 0
stop sign | 0 | 0.92 | 0 | 0.05 | 0 | 0 | 0.03 |
swiping left | 0.01 | 0 | 0.93 | 0 | 0.06 | 0 | 0 |
swiping right | 0.06 | 0 | 0 | 0.92 | 0 | 0.02 | 0 |
turning hand clockwise | 0 | 0.04 | 0 | 0 | 0.92 | 0 | 0.04 |
turning hand counterclockwise | 0 | 0 | 0.08 | 0 | 0 | 0.92 | 0 |
zoom in with two fingers | 0.06 | 0 | 0 | 0 | 0.05 | 0 | 0.89 |
Mean Accuracy = 91.57%
Gesture Classes | Hungry | Wish | Scream | Forgive | Attention | Appreciate | Abuse |
---|---|---|---|---|---|---|---|
hungry | 0.91 | 0 | 0 | 0.08 | 0 | 0 | 0.01 |
wish | 0 | 0.90 | 0.09 | 0 | 0.01 | 0 | 0 |
scream | 0.02 | 0.06 | 0.92 | 0 | 0 | 0 | 0 |
forgive | 0 | 0 | 0 | 0.90 | 0.07 | 0.03 | 0 |
attention | 0 | 0 | 0.04 | 0 | 0.89 | 0 | 0.07 |
appreciate | 0 | 0.09 | 0 | 0.01 | 0 | 0.90 | 0 |
abuse | 0.01 | 0 | 0 | 0 | 0 | 0.08 | 0.91 |
Mean Accuracy = 90.43%
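For reference, the mean accuracies quoted under the confusion matrices above and the per-class scores tabulated below can be reproduced with a few lines of NumPy. This is a generic sketch (assuming rows are ground truth, columns are predictions, and a count-valued matrix for the precision/recall/F1 case), not the authors' evaluation code.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Per-class accuracy, precision, recall, and F1 from a count-valued
    confusion matrix `cm` (rows = ground truth, columns = predictions)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp              # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp              # instances of the class that were missed
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / cm.sum()       # one-vs-rest accuracy per class
    return accuracy, precision, recall, f1

# Mean accuracy as quoted under each matrix: the average of the diagonal of
# the row-normalized confusion matrix, e.g. for the first matrix above:
diag = np.array([0.93, 0.92, 0.95, 0.94, 0.93, 0.90, 0.91])
print(f"Mean Accuracy = {diag.mean():.2%}")  # -> 92.57%
```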
Gesture Classes | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
call | 0.98 | 0.93 | 0.95 | 0.94 |
dislike | 0.97 | 0.92 | 0.91 | 0.92 |
like | 0.97 | 0.95 | 0.90 | 0.92 |
mute | 0.98 | 0.94 | 0.92 | 0.93 |
ok | 0.98 | 0.93 | 0.95 | 0.94 |
stop | 0.97 | 0.90 | 0.93 | 0.91 |
two up | 0.97 | 0.91 | 0.93 | 0.92 |
Gesture Classes | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
scroll hand towards the right | 0.97 | 0.90 | 0.92 | 0.91 |
scroll hand downward | 0.97 | 0.93 | 0.90 | 0.92 |
scroll hand backward | 0.98 | 0.92 | 0.93 | 0.92 |
zoom in with fists | 0.98 | 0.93 | 0.91 | 0.94 |
zoom out with fists | 0.98 | 0.94 | 0.93 | 0.94 |
rotate finger clockwise | 0.97 | 0.91 | 0.89 | 0.90 |
zoom in with fingers | 0.98 | 0.90 | 0.95 | 0.92 |
Gesture Classes | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
sliding two fingers down | 0.96 | 0.91 | 0.88 | 0.89
stop sign | 0.98 | 0.92 | 0.96 | 0.94 |
swiping left | 0.97 | 0.93 | 0.92 | 0.93 |
swiping right | 0.98 | 0.92 | 0.95 | 0.93 |
turning hand clockwise | 0.97 | 0.92 | 0.89 | 0.91 |
turning hand counterclockwise | 0.97 | 0.92 | 0.89 | 0.91 |
zoom in with two fingers | 0.97 | 0.89 | 0.93 | 0.91 |
Gesture Classes | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
hungry | 0.98 | 0.91 | 0.97 | 0.94 |
wish | 0.96 | 0.90 | 0.86 | 0.88 |
scream | 0.97 | 0.92 | 0.88 | 0.90 |
forgive | 0.97 | 0.90 | 0.91 | 0.90 |
attention | 0.97 | 0.89 | 0.92 | 0.90 |
appreciate | 0.97 | 0.90 | 0.89 | 0.90 |
abuse | 0.97 | 0.91 | 0.92 | 0.91 |