Sensing-HH: A Deep Hybrid Attention Model for Footwear Recognition
Abstract
1. Introduction
- (1) To the best of our knowledge, we are the first to recognize a subject's footwear from the dynamics of gait changes captured by smartphone sensors in daily life. We categorize shoes into three classes by heel height (flat, mid HH, and ultra HH). We propose Sensing-HH, a novel deep attention model that automatically learns a hierarchical feature representation and long-range temporal context from the raw signals through its hybrid network structure. With the added attention mechanism, it also learns implicitly to suppress irrelevant parts of the raw signals and to highlight the features salient to this specific task.
- (2) We established a dataset of 35 young females wearing the three kinds of shoes. Each participant was asked to walk for 4 min on a flat surface while three smartphones recorded simultaneously: one held in the hand, one attached to the waist, and one placed in a handbag.
- (3) We conducted comprehensive experiments on this dataset to evaluate the proposed Sensing-HH model. Under cross-validation, the model achieved competitive performance across the three classes, with a mean F1-score (F_m) of 0.827 when the smartphone was attached to the waist; the F1-score of Ultra HH alone was as high as 0.91.
2. Related Work
2.1. General Footwear Related Gait Analysis Using Motion Sensors
2.2. Previous Deep Learning Approaches for Motion Sensors-Based Recognition
2.3. Attention Mechanism for Sensor Data Processing
3. Dataset
3.1. Participants Selection
3.2. Data Collection
4. Methodology
4.1. Notation and Definitions
4.1.1. Sequence
4.1.2. Sub-Sequence
4.1.3. Instance
4.2. Sensing-HH: A Deep Attention Model
4.2.1. Data Preprocessing Module
- Step 1: Resampling and Interpolating
- Step 2: Gravity Filtering
- Step 3: Normalization (a minimal sketch of the full three-step pipeline follows below)
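To make the three steps concrete, here is a minimal Python sketch of such a pipeline. The 50 Hz target rate, the 0.3 Hz low-pass cutoff for gravity estimation, and the per-axis z-score are our illustrative assumptions, not parameter values confirmed by the paper.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt

def preprocess(t, acc, target_hz=50.0, gravity_cutoff_hz=0.3):
    """Resample, gravity-filter, and normalize a tri-axial accelerometer sequence.

    t   : (N,) timestamps in seconds (possibly irregular).
    acc : (N, 3) raw accelerometer readings.
    """
    # Step 1: resample onto a uniform grid by linear interpolation.
    uniform_t = np.arange(t[0], t[-1], 1.0 / target_hz)
    acc_u = interp1d(t, acc, axis=0)(uniform_t)

    # Step 2: estimate gravity with a low-pass Butterworth filter and subtract
    # it, keeping only the dynamic (body-motion) component.
    b, a = butter(3, gravity_cutoff_hz / (target_hz / 2.0), btype="low")
    gravity = filtfilt(b, a, acc_u, axis=0)
    linear_acc = acc_u - gravity

    # Step 3: z-score normalization per axis.
    mu = linear_acc.mean(axis=0)
    sigma = linear_acc.std(axis=0) + 1e-8
    return (linear_acc - mu) / sigma
```

Resampling first matters here because the three handsets sample at different native rates (200–416 Hz, see the sensor table below), so a shared uniform grid is needed before windows from different devices can be compared.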
4.2.2. Deep Hybrid Connection Network Module
4.2.3. Attention Network Module
4.2.4. Fusion Module
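Sections 4.2.2–4.2.4 name the learned components: a hybrid convolutional/recurrent backbone, an attention network, and a fusion module. As a schematic illustration only (layer sizes, kernel widths, and the exact attention form are our assumptions, not the authors' configuration), a Keras sketch of such a Conv1D -> BiLSTM -> attention -> softmax pipeline could look like:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sensing_hh_like(timesteps, channels, n_classes=3):
    """A schematic stand-in for the Sensing-HH pipeline: convolutional feature
    extraction -> BiLSTM temporal modeling -> soft attention -> fused classifier.
    All sizes are illustrative assumptions."""
    x_in = layers.Input(shape=(timesteps, channels))

    # Hierarchical local features from the raw multi-channel signal (Section 4.2.2).
    h = layers.Conv1D(64, 5, padding="same", activation="relu")(x_in)
    h = layers.MaxPooling1D(2)(h)
    h = layers.Conv1D(128, 5, padding="same", activation="relu")(h)

    # Bidirectional temporal context (the BiLSTM part of the hybrid structure).
    h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(h)

    # Soft attention (Section 4.2.3): one weight per time step, suppressing
    # irrelevant parts of the signal and highlighting salient ones.
    scores = layers.Dense(1, activation="tanh")(h)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Lambda(lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([h, weights])

    # Fusion module (Section 4.2.4): combine attended features and classify
    # into the three heel-height classes.
    out = layers.Dense(n_classes, activation="softmax")(layers.Dropout(0.5)(context))
    return models.Model(x_in, out)
```

The softmax over the time axis is what implements the behavior claimed in the contributions: the weighted sum lets the network down-weight uninformative portions of a window and concentrate on the salient ones.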
5. Experiments
5.1. Experimental Settings
5.1.1. Baselines
5.1.2. Setup
5.1.3. Cross-Validation Strategies
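A subject-wise split is one plausible reading of the paper's cross-validation strategy (our assumption; the section body is not reproduced here): instances from one participant never appear in both the training and test folds, so the model cannot memorize individual gaits. A sketch with scikit-learn's GroupKFold:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholder arrays for illustration; real inputs are the preprocessed instances.
X = np.zeros((100, 128, 6), dtype="float32")
y = np.random.randint(0, 3, size=100)
subjects = np.random.randint(0, 35, size=100)  # 35 participants, as in Section 3

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subjects):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ...train a model on X_train and evaluate F_m on the held-out subjects...
```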
5.1.4. Evaluation Criteria
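The paper reports performance as a mean F1-score (F_m) over the three classes. Assuming this is the macro-averaged F1 (an assumption on our part), it can be computed as follows; the labels below are toy values for illustration:

```python
from sklearn.metrics import f1_score

# Toy labels for illustration only.
y_true = ["Flat", "Mid HH", "Ultra HH", "Mid HH", "Flat", "Ultra HH"]
y_pred = ["Flat", "Ultra HH", "Ultra HH", "Mid HH", "Flat", "Ultra HH"]

# Per-class F1 first, then F_m as the unweighted (macro) average over classes.
per_class = f1_score(y_true, y_pred, average=None,
                     labels=["Flat", "Mid HH", "Ultra HH"])
f_m = f1_score(y_true, y_pred, average="macro")
print(per_class, f_m)
```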
5.2. Experimental Results and Analysis
5.2.1. Comparison with Baselines
5.2.2. Ablation Study
- (1) HHw/oATT: the Sensing-HH model without the attention component.
- (2) HHw/oLSTM: the Sensing-HH model without the BiLSTM component (a schematic of how both variants can be built is sketched below).
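A hypothetical sketch of producing both ablated variants from a single builder, so that each run differs from the full model in exactly one component. The global-average-pooling fallback for the attention-free variant is our assumption, not a detail confirmed by the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_variant(timesteps, channels, use_attention=True, use_bilstm=True, n_classes=3):
    """Hypothetical ablation helper: HHw/oATT = use_attention=False,
    HHw/oLSTM = use_bilstm=False; everything else stays identical."""
    x_in = layers.Input(shape=(timesteps, channels))
    h = layers.Conv1D(64, 5, padding="same", activation="relu")(x_in)
    if use_bilstm:
        h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(h)
    if use_attention:
        w = layers.Softmax(axis=1)(layers.Dense(1, activation="tanh")(h))
        h = layers.Lambda(lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([h, w])
    else:
        h = layers.GlobalAveragePooling1D()(h)  # plain pooling stands in for attention
    out = layers.Dense(n_classes, activation="softmax")(h)
    return models.Model(x_in, out)

hh_wo_att = build_variant(128, 6, use_attention=False)   # HHw/oATT
hh_wo_lstm = build_variant(128, 6, use_bilstm=False)     # HHw/oLSTM
```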
- (1) The best recognition performance was obtained with the smartphone attached to the waist, where Sensing-HH significantly outperformed the other deep models. The differences among the deep models in the other scenes, i.e., held in the hand or carried in the bag, were not significant.
- (2) Removing the attention component (HHw/oATT) caused the largest performance drop in the waist scene, a decline of nearly 14.6%. This underlines the importance of the attention component for this placement.
- (3) Removing the BiLSTM component (HHw/oLSTM) caused a performance drop of roughly 4–6% in most scenes.
5.2.3. Failure Cases
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Guan, Y.; Plötz, T. Ensembles of deep LSTM learners for activity recognition using wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–28.
2. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor-based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11.
3. Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional neural networks for human activity recognition using mobile sensors. In Proceedings of the 6th International Conference on Mobile Computing, Applications and Services, Austin, TX, USA, 6–7 November 2014; pp. 197–205.
4. Jiang, W.; Yin, Z. Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; ACM: New York, NY, USA, 2015; pp. 1307–1310.
5. Ha, S.; Choi, S. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 381–388.
6. Yang, J.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
7. Zhu, G.; Zhang, L.; Shen, P.; Song, J. Multimodal gesture recognition using 3-D convolution and convolutional LSTM. IEEE Access 2017, 5, 4517–4524.
8. Um, T.T.; Pfister, F.M.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 216–220.
9. Yao, Y.; Song, L.; Ye, J. Motion-to-BMI: Using motion sensors to predict the body mass index of smartphone users. Sensors 2020, 20, 1134.
10. Nickel, C.; Brandt, H.; Busch, C. Classification of acceleration data for biometric gait recognition on mobile devices. In Proceedings of BIOSIG 2011, Biometrics Special Interest Group, Darmstadt, Germany, 8–9 September 2011.
11. Zhao, Y.; Zhou, S. Wearable device-based gait recognition using angle embedded gait dynamic images and a convolutional neural network. Sensors 2017, 17, 478.
12. Gadaleta, M.; Rossi, M. IDNet: Smartphone-based gait recognition with convolutional neural networks. Pattern Recognit. 2018, 74, 25–37.
13. Zou, Q.; Wang, Y.; Wang, Q.; Zhao, Y.; Li, Q. Deep learning-based gait recognition using smartphones in the wild. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3197–3212.
14. Gafurov, D.; Snekkenes, E. Towards understanding the uniqueness of gait biometric. In Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, The Netherlands, 17–19 September 2008; pp. 1–8.
15. Gafurov, D.; Snekkenes, E.; Bours, P. Improved gait recognition performance using cycle matching. In Proceedings of the 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, Perth, Australia, 20–23 April 2010; pp. 836–841.
16. Marsico, M.D.; Mecca, A. A survey on gait recognition via wearable sensors. ACM Comput. Surv. 2019, 52, 1–39.
17. Cronin, N.J.; Barrett, R.S.; Carty, C.P. Long-term use of high-heeled shoes alters the neuromechanics of human walking. J. Appl. Physiol. 2012, 112, 1054–1058.
18. Sarkar, S.; Phillips, P.J.; Liu, Z.; Vega, I.R.; Grother, P.; Bowyer, K.W. The HumanID gait challenge problem: Data sets, performance, and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 162–177.
19. Kim, M.; Kim, M.; Park, S.; Kwon, J.; Park, J. Feasibility study of gait recognition using points in three-dimensional space. Int. J. Fuzzy Log. Intell. Syst. 2013, 13, 124–132.
20. Marcin, D. Human gait recognition based on ground reaction forces in case of sport shoes and high heels. In Proceedings of the 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Gdynia, Poland, 3–5 July 2017; pp. 247–252.
21. Derlatka, M.; Bogdan, M. Recognition of a person wearing sport shoes or high heels through gait using two types of sensors. Sensors 2018, 18, 1639.
22. Frey, C.; Thompson, F.; Smith, J.; Sanders, M.; Horstman, H. American Orthopaedic Foot and Ankle Society women's shoe survey. Foot Ankle 1993, 14, 78–81.
23. Bevilacqua, A.; MacDonald, K.; Rangarej, A.; Widjaya, V.; Caulfield, B.; Kechadi, T. Human activity recognition with convolutional neural networks. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Dublin, Ireland, 10–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 541–552.
24. Sabir, A.T.; Maghdid, H.S.; Asaad, S.M.; Ahmed, M.H.; Asaad, A.T. Gait-based gender classification using smartphone accelerometer sensor. In Proceedings of the 2019 5th International Conference on Frontiers of Signal Processing (ICFSP), Marseille, France, 18–20 September 2019; pp. 12–20.
25. Steven Eyobu, O.; Han, D. Feature representation and data augmentation for human activity classification based on wearable IMU sensor data using a deep LSTM neural network. Sensors 2018, 18, 2892.
26. Sprager, S.; Juric, M.B. Inertial sensor-based gait recognition: A review. Sensors 2015, 15, 22089–22127.
27. Yun, X.; Bachmann, E.R. Design, implementation, and experimental results of a quaternion-based Kalman filter for human body motion tracking. IEEE Trans. Robot. 2006, 22, 1216–1227.
28. Liu, T.; Inoue, Y.; Shibata, K. Development of a wearable sensor system for quantitative gait analysis. Measurement 2009, 42, 978–988.
29. Renaudin, V.; Susi, M.; Lachapelle, G. Step length estimation using handheld inertial sensors. Sensors 2012, 12, 8507–8525.
30. Schepers, H.M.; Koopman, H.F.; Veltink, P.H. Ambulatory assessment of ankle and foot dynamics. IEEE Trans. Biomed. Eng. 2007, 54, 895–902.
31. Sabatini, A.M.; Martelloni, C.; Scapellato, S.; Cavallo, F. Assessment of walking features from foot inertial sensing. IEEE Trans. Biomed. Eng. 2005, 52, 486–494.
32. Favre, J.; Aissaoui, R.; Jolles, B.M.; de Guise, J.A.; Aminian, K. Functional calibration procedure for 3D knee joint angle description using inertial sensors. J. Biomech. 2009, 42, 2330–2335.
33. Seel, T.; Raisch, J.; Schauer, T. IMU-based joint angle measurement for gait analysis. Sensors 2014, 14, 6891–6909.
34. Rucco, R.; Sorriso, A.; Liparoti, M.; Ferraioli, G.; Sorrentino, P.; Ambrosanio, M.; Baselice, F. Type and location of wearable sensors for monitoring falls during static and dynamic tasks in healthy elderly: A review. Sensors 2018, 18, 1613.
35. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks; MIT Press: Cambridge, MA, USA, 1995; Volume 3361.
36. Abdel-Hamid, O.; Mohamed, A.-R.; Jiang, H.; Penn, G. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 4277–4280.
37. Abdel-Hamid, O.; Mohamed, A.-R.; Jiang, H.; Deng, L.; Penn, G.; Yu, D. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1533–1545.
38. Liu, J.; Pan, Y.; Li, M.; Chen, Z.; Tang, L.; Lu, C.; Wang, J. Applications of deep learning to MRI images: A survey. Big Data Min. Anal. 2018, 1, 1–18.
39. Kong, Y.; Gao, J.; Xu, Y.; Pan, Y.; Wang, J.; Liu, J. Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier. Neurocomputing 2019, 324, 63–68.
40. Zeng, M.; Li, M.; Fei, Z.; Yu, Y.; Pan, Y.; Wang, J. Automatic ICD-9 coding via deep transfer learning. Neurocomputing 2019, 324, 43–50.
41. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? arXiv 2014, arXiv:1411.1792.
42. Lee, J.B.; Rossi, R.A.; Kim, S.; Ahmed, N.K.; Koh, E. Attention models in graphs: A survey. ACM Trans. Knowl. Discov. Data 2019, 13, 1–25.
43. Ba, J.; Mnih, V.; Kavukcuoglu, K. Multiple object recognition with visual attention. arXiv 2014, arXiv:1412.7755.
44. Peng, Y.; He, X.; Zhao, J. Object-part attention model for fine-grained image classification. IEEE Trans. Image Process. 2017, 27, 1487–1500.
45. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 7–11 September 2015; pp. 1412–1421.
46. Yin, W.; Schütze, H.; Xiang, B.; Zhou, B. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 2016, 4, 259–272.
47. Zeyer, A.; Irie, K.; Schlüter, R.; Ney, H. Improved training of end-to-end attention models for speech recognition. arXiv 2018, arXiv:1805.03294.
48. Zhang, D.; Yao, L.; Chen, K.; Wang, S. Ready for use: Subject-independent movement intention recognition via a convolutional attention model. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 1763–1766.
49. Zhang, X.; Yao, L.; Huang, C.; Wang, S.; Tan, M.; Long, G.; Wang, C. Multi-modality sensor data classification with selective attention. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3111–3117.
50. Zeng, M.; Gao, H.; Yu, T.; Mengshoel, O.J.; Langseth, H.; Lane, I.; Liu, X. Understanding and improving recurrent networks for human activity recognition by continuous attention. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 56–63.
51. Wang, K.; He, J.; Zhang, L. Attention-based convolutional neural network for weakly labeled human activities' recognition with wearable sensors. IEEE Sens. J. 2019, 19, 7598–7604.
52. Chiewchanwattana, S.; Lursinsap, C. FI-GEM networks for incomplete time-series prediction. In Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), Honolulu, HI, USA, 12–17 May 2002; Volume 2, pp. 1757–1762.
53. Lu, Q.; Pang, L.; Huang, H.; Shen, C.; Cao, H.; Shi, Y.; Liu, J. High-G calibration denoising method for high-G MEMS accelerometer based on EMD and wavelet threshold. Micromachines 2019, 10, 134.
54. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
55. Casale, P.; Pujol, O.; Radeva, P. Human activity recognition from accelerometer data using a wearable device. In Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Las Palmas de Gran Canaria, Spain, 8–10 June 2011; pp. 289–296.
56. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In International Workshop on Ambient Assisted Living; Springer: Berlin/Heidelberg, Germany, 2012; pp. 216–223.
57. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1533–1540.
58. Edel, M.; Köppe, E. Binarized-BLSTM-RNN based human activity recognition. In Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Sapporo, Japan, 18–21 September 2016; pp. 1–7.
59. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18.
60. Hoos, H.; Leyton-Brown, K. An efficient approach for assessing hyperparameter importance. In Proceedings of the International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 754–762.
| Heel Height | Shoe Category |
|---|---|
| 0–2.54 cm (0–1 inch) | Flat |
| 2.54–7.62 cm (1–3 inches) | Mid HH |
| >7.62 cm (>3 inches) | Ultra HH |
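Expressed as code, this mapping is a simple threshold function. Treating the 2.54 cm boundary as the start of Mid HH is our assumption; the table itself leaves the shared endpoint ambiguous:

```python
def shoe_category(heel_height_cm: float) -> str:
    """Map a heel height to the paper's three classes, per the table above."""
    if heel_height_cm < 2.54:    # 0-1 inch
        return "Flat"
    if heel_height_cm <= 7.62:   # 1-3 inches
        return "Mid HH"
    return "Ultra HH"            # over 3 inches

assert shoe_category(7.0) == "Mid HH"    # e.g., failure-case subject No. 2
assert shoe_category(8.1) == "Ultra HH"  # e.g., failure-case subject No. 31
```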
| Smartphone | Accelerometer | Gyroscope | Magnetometer |
|---|---|---|---|
| Samsung S10 | STMicro LSM6DSO (416 Hz) | STMicro LSM6DSO (416 Hz) | AK09918 (50 Hz) |
| Samsung Note8 | STMicro LSM6DSL (400 Hz) | STMicro LSM6DSL (400 Hz) | AK09916C (50 Hz) |
| Smartisan Pro2 | Bosch BMI160 (200 Hz) | Bosch BMI160 (200 Hz) | AK09918 (50 Hz) |
Hyperparameter search space and the optimal values selected for each deep model:

| Parameter | Search Space | Sensing-HH | CNN [57] | BiLSTM [58] | CNN-LSTM [9] |
|---|---|---|---|---|---|
| Optimizer | (RMSprop, Adam, SGD) | Adam | SGD | RMSprop | Adam |
| Learning Rate | (0.001, 0.01) | 0.001 | 0.001 | 0.001 | 0.001 |
| Epochs | (20, 50, 100) | 50 | 50 | 20 | 100 |
| Batch Size | (20, 30, 40, 50) | 10 | 10 | 30 | 10 |
| Dropout Rate | (0, 0.3, 0.5) | 0.5 | 0.3 | 0.5 | 0.5 |
| Regularizer | (L1, L2) | L2 | L2 | L2 | L2 |
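As a usage sketch, the tuned values for Sensing-HH from this table plug into a standard Keras training loop as follows. The input shape and dummy data are placeholders, and `build_sensing_hh_like` is the illustrative builder sketched after Section 4.2.4, not the authors' code:

```python
import numpy as np
from tensorflow.keras import optimizers

# Placeholder tensors so the snippet runs standalone; real inputs are the
# preprocessed sub-sequences (these shapes are not the paper's).
x_train = np.random.randn(256, 128, 6).astype("float32")
y_train = np.random.randint(0, 3, size=256)

model = build_sensing_hh_like(timesteps=128, channels=6)

# Optimal values from the table: Adam optimizer, learning rate 0.001,
# 50 epochs, batch size 10; dropout 0.5 and L2 live inside the model's layers.
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=50, batch_size=10, validation_split=0.2)
```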
Recognition performance of each model per phone placement (H = hand, W = waist, B = handbag) and sub-sequence length (1 s, 2 s, 5 s):

| Model | H (1 s) | W (1 s) | B (1 s) | H (2 s) | W (2 s) | B (2 s) | H (5 s) | W (5 s) | B (5 s) |
|---|---|---|---|---|---|---|---|---|---|
| RF [55] | 0.601 | 0.649 | 0.521 | 0.613 | 0.671 | 0.557 | 0.582 | 0.648 | 0.428 |
| SVM [56] | 0.637 | 0.688 | 0.582 | 0.672 | 0.701 | 0.564 | 0.601 | 0.652 | 0.536 |
| CNN [57] | 0.682 | 0.715 | 0.604 | 0.707 | 0.723 | 0.641 | 0.653 | 0.694 | 0.572 |
| BiLSTM [58] | 0.676 | 0.693 | 0.609 | 0.662 | 0.709 | 0.583 | 0.699 | 0.714 | 0.565 |
| CNN-LSTM [9] | 0.683 | 0.711 | 0.614 | 0.703 | 0.716 | 0.659 | 0.704 | 0.723 | 0.534 |
| Sensing-HH | 0.716 | 0.759 | 0.627 | 0.743 | 0.826 | 0.636 | 0.721 | 0.786 | 0.617 |
Number of instances per shoe category and phone placement:

| Shoe Category | Hand | Waist | Handbag |
|---|---|---|---|
| Flat | 9937 | 9451 | 9703 |
| Mid HH | 4391 | 4479 | 4532 |
| Ultra HH | 6591 | 6902 | 6811 |
| No. | Heel Height (cm) | Height (cm) | Weight (kg) | True Label | Predicted Label |
|---|---|---|---|---|---|
| 2 | 7.0 | 152 | 46.3 | Mid HH | Ultra HH |
| 22 | 7.1 | 154 | 43.8 | Mid HH | Ultra HH |
| 12 | 7.7 | 170 | 58.4 | Ultra HH | Flat |
| 31 | 8.1 | 176 | 51.6 | Ultra HH | Mid HH |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yao, Y.; Wen, Y.; Wang, J. Sensing-HH: A Deep Hybrid Attention Model for Footwear Recognition. Electronics 2020, 9, 1552. https://doi.org/10.3390/electronics9091552