MLPs Are All You Need for Human Activity Recognition
Abstract
1. Introduction
- We investigate the performance of the MLP-Mixer in multi-sensor HAR, achieving competitive, and in some cases state-of-the-art, performance without any convolutional, recurrent, or attention-based mechanisms in the model. The accompanying code can be found at https://github.com/KMC07/MLPMixerHAR (accessed on 6 October 2023).
- We analyse the impact of each layer in the Mixer for HAR.
- We analyse the effect of sliding-window parameters on the Mixer’s performance in HAR.
- We perform a visual analysis of the Mixer’s weights to validate that the Mixer is successfully recognising different human activities.
2. Related Work
MLP Architectures
3. Methodology
3.1. MLP-Mixer
- The first block is the token-mixing MLP. The input matrix is normalised and transposed so that data can be mixed across patches: one MLP (MLP1) acts on each column of the transposed matrix, sharing its weights across columns, and the result is transposed back to its original shape. Because every patch’s data passes through the same MLP, the block captures the overall context of the input; in effect, it allows different patches within the same channel to communicate.
- The second block is the channel-mixing MLP. It receives a residual connection from the pre-normalised original input, which prevents information from being lost during training. The result is normalised, and a second MLP (MLP2) with a separate set of weights performs the channel mixing: it acts on each row of the input matrix and shares its weights across rows, so each patch’s MLP receives data from every channel, enabling communication between the information from different channels. A minimal code sketch of one Mixer block follows this list.
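The following is a minimal PyTorch sketch of a single Mixer block as described above. Class and argument names are illustrative assumptions rather than the authors’ exact code; the linked repository is the authoritative implementation.

```python
# Minimal sketch of one Mixer block in PyTorch. Names are illustrative,
# not the authors' exact implementation.
import torch.nn as nn


class MlpBlock(nn.Module):
    """Two fully connected layers with a GELU non-linearity."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class MixerBlock(nn.Module):
    """Token-mixing MLP (MLP1) followed by channel-mixing MLP (MLP2)."""

    def __init__(self, num_patches, embed_dim, token_hidden_dim, channel_hidden_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.token_mlp = MlpBlock(num_patches, token_hidden_dim)    # MLP1
        self.norm2 = nn.LayerNorm(embed_dim)
        self.channel_mlp = MlpBlock(embed_dim, channel_hidden_dim)  # MLP2

    def forward(self, x):                      # x: (batch, patches, channels)
        y = self.norm1(x).transpose(1, 2)      # normalise, then mix across patches
        y = self.token_mlp(y).transpose(1, 2)  # back to (batch, patches, channels)
        x = x + y                              # residual connection
        y = self.norm2(x)
        return x + self.channel_mlp(y)         # mix across channels + residual
```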
4. Datasets
4.1. Opportunity
- Opportunity Gestures: this involves correctly classifying the gestures performed by the subjects, using the sensors on both arms. There are 18 different gesture classes.
- Opportunity Locomotion: this involves correctly classifying the subjects’ locomotion, using the full-body sensors. There are five different locomotion classes.
4.2. PAMAP2
4.3. Daphnet Gait
4.4. Sliding Windows
- Opportunity: the dataset was segmented with a sliding window of 2.57 s, which corresponds to 77 samples. This matches the 77 sensor channels, so the input is square and the patch resolution can be any factor of 77. The data were normalised to account for the wide range of sensors used in the dataset. After preprocessing, no labels of the “close drawer 2” activity remained in the test set (ADL4 and ADL5 from subjects 2 and 3).
- PAMAP2: before downsampling, the dataset was segmented with a sliding window of 0.84 s, which corresponds to 84 samples. The “rope jumping” activity of subject 6 had very few samples, and after preprocessing no labels of this activity remained in the test set (subject 6).
- Daphnet Gait: before downsampling, a sliding window of 2.1 s was used, corresponding to 126 samples. Daphnet Gait contains many long-duration activities, so a wider window was chosen to provide the Mixer with more context. A sketch of the windowing procedure follows this list.
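As an illustration, a simple NumPy implementation of this windowing step might look as follows. Function and variable names are hypothetical, and assigning each window the label of its last sample is an assumed convention, not one stated above.

```python
# Illustrative sliding-window segmentation with NumPy.
import numpy as np


def sliding_windows(data, labels, window, step):
    """Cut a (timesteps, features) array into overlapping windows."""
    segments, segment_labels = [], []
    for start in range(0, len(data) - window + 1, step):
        end = start + window
        segments.append(data[start:end])
        segment_labels.append(labels[end - 1])  # label taken from the last sample (assumption)
    return np.stack(segments), np.asarray(segment_labels)


# Example: Opportunity uses a 77-sample window (2.57 s at 30 Hz) with step size 3.
# windows, window_labels = sliding_windows(sensor_data, sensor_labels, window=77, step=3)
```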
4.5. Data Sampler and Generation
4.6. Patches
5. Experimental Setup
5.1. Ablation Study
5.2. Measuring Performance
- True Positive (TP): the model correctly predicts that a sample belongs to the activity class.
- True Negative (TN): the model correctly predicts that a sample does not belong to the activity class.
- False Positive (FP): the model incorrectly predicts that a sample belongs to the activity class.
- False Negative (FN): the model incorrectly predicts that a sample does not belong to the activity class.
5.2.1. Precision
5.2.2. Recall
5.2.3. F1-Score
5.2.4. Macro F1-Score
5.2.5. Weighted F1-Score
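For reference, the metrics in Sections 5.2.1–5.2.5 follow their standard definitions in terms of the counts above, with C the number of classes, n_c the number of test samples of class c, and N the total number of test samples:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

F_1^{\mathrm{macro}} = \frac{1}{C} \sum_{c=1}^{C} F_{1,c}, \qquad
F_1^{\mathrm{weighted}} = \sum_{c=1}^{C} \frac{n_c}{N} \, F_{1,c}
```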
6. Results
- Ensemble LSTMs [32]: combines multiple LSTM learners into a single model using ensemble techniques.
- CNN-BiGRU [37]: a CNN connected to a bidirectional GRU (BiGRU).
- AttnSense [22]: a CNN and a GRU combined via an attention mechanism to learn spatial and temporal patterns.
- Multi-Agent Attention [38]: combines multi-agent collaboration with attention-based selection.
- DeepConvLSTM [35]: combines an LSTM to learn temporal information with a CNN to learn spatial features.
- BLSTM-RNN [33]: a bi-LSTM, with its weights and activation functions binarized.
- Triple Attention [39]: a ResNet, using a triple-attention mechanism.
- Self-Attention [40]: a self-attention-based model without any recurrent architectures.
- CNN [18]: a CNN with three layers and max pooling.
- b-LSTM-S [18]: a bidirectional LSTM that also has access to future timesteps within the sequence.
7. Discussion
7.1. Performance of Sliding Window Parameters
7.2. Weight Visualisation
8. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Parker, S.J.; Strath, S.J.; Swartz, A.M. Physical Activity Measurement in Older Adults: Relationships With Mental Health. J. Aging Phys. Act. 2008, 16, 369–380.
- Kranz, M.; Möller, A.; Hammerla, N.; Diewald, S.; Plötz, T.; Olivier, P.; Roalter, L. The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices. Pervasive Mob. Comput. 2013, 9, 203–215.
- Patel, S.; Park, H.S.; Bonato, P.; Chan, L.; Rodgers, M. A Review of Wearable Sensors and Systems with Application in Rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21.
- Cedillo, P.; Sanchez-Zhunio, C.; Bermeo, A.; Campos, K. A Systematic Literature Review on Devices and Systems for Ambient Assisted Living: Solutions and Trends from Different User Perspectives. In Proceedings of the 2018 International Conference on eDemocracy & eGovernment (ICEDEG); IEEE: New York, NY, USA, 2018.
- De Leonardis, G.; Rosati, S.; Balestra, G.; Agostini, V.; Panero, E.; Gastaldi, L.; Knaflitz, M. Human Activity Recognition by Wearable Sensors: Comparison of different classifiers for real-time applications. In Proceedings of the 2018 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Rome, Italy, 11–13 June 2018; pp. 1–6.
- Park, S.; Jayaraman, S. Enhancing the quality of life through wearable technology. IEEE Eng. Med. Biol. Mag. 2003, 22, 41–48.
- Lara, O.D.; Labrador, M.A. A Survey on Human Activity Recognition using Wearable Sensors. IEEE Commun. Surv. Tutorials 2013, 15, 1192–1209.
- Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. MLP-Mixer: An all-MLP architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
- Le, V.T.; Tran-Trung, K.; Hoang, V.T. A comprehensive review of recent deep learning techniques for human activity recognition. Comput. Intell. Neurosci. 2022, 2022, 8323962.
- Roggen, D.; Calatroni, A.; Rossi, M.; Holleczek, T.; Förster, K.; Tröster, G.; Lukowicz, P.; Bannach, D.; Pirkl, G.; Ferscha, A.; et al. Collecting complex activity datasets in highly rich networked sensor environments. In Proceedings of the 2010 Seventh International Conference on Networked Sensing Systems (INSS), Kassel, Germany, 15–18 June 2010; pp. 233–240.
- Bächlin, M.; Plotnik, M.; Roggen, D.; Maidan, I.; Hausdorff, J.; Giladi, N.; Troster, G. Wearable Assistant for Parkinson’s Disease Patients with the Freezing of Gait Symptom. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 436–446.
- Reiss, A.; Stricker, D. Introducing a New Benchmarked Dataset for Activity Monitoring. In Proceedings of the 2012 16th International Symposium on Wearable Computers, Newcastle, UK, 18–22 June 2012; pp. 108–109.
- Zappi, P.; Lombriser, C.; Stiefmeier, T.; Farella, E.; Roggen, D.; Benini, L.; Tröster, G. Activity Recognition from On-Body Sensors: Accuracy-Power Trade-Off by Dynamic Sensor Selection. In Proceedings of the Wireless Sensor Networks; Verdone, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 17–33.
- Weiss, G.M.; Yoneda, K.; Hayajneh, T. Smartphone and Smartwatch-Based Biometrics Using Activities of Daily Living. IEEE Access 2019, 7, 133190–133202.
- Banos, O.; García, R.; Holgado-Terriza, J.; Damas, M.; Pomares, H.; Rojas, I.; Saez, A.; Villalonga, C. mHealthDroid: A Novel Framework for Agile Development of Mobile Health Applications; Proceedings 6; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; Volume 8868, pp. 91–98.
- Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human Activity Recognition using Smartphones. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Computational Intelligence and Machine Learning, Bruges, Belgium, 24–26 April 2013.
- Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional Neural Networks for human activity recognition using mobile sensors. In Proceedings of the 6th International Conference on Mobile Computing, Applications and Services, Austin, TX, USA, 6–7 November 2014; pp. 197–205.
- Hammerla, N.Y.; Halloran, S.; Ploetz, T. Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables. arXiv 2016, arXiv:1604.08880.
- Tang, Y.; Teng, Q.; Zhang, L.; Min, F.; He, J. Layer-Wise Training Convolutional Neural Networks with Smaller Filters for Human Activity Recognition Using Wearable Sensors. IEEE Sens. J. 2021, 21, 581–592.
- Yang, Z.; Wang, Y.; Liu, C.; Chen, H.; Xu, C.; Shi, B.; Xu, C.; Xu, C. LegoNet: Efficient convolutional neural networks with Lego filters. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 7005–7014.
- Murad, A.; Pyun, J.Y. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 2017, 17, 2556.
- Ma, H.; Li, W.; Zhang, X.; Gao, S.; Lu, S. AttnSense: Multi-level Attention Mechanism For Multimodal Human Activity Recognition. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 3109–3115.
- Gao, W.; Zhang, L.; Teng, Q.; He, J.; Wu, H. DanHAR: Dual Attention Network for multimodal human activity recognition using wearable sensors. Appl. Soft Comput. 2021, 111, 107728.
- Liu, R.; Li, Y.; Tao, L.; Liang, D.; Zheng, H.T. Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 2022, 3, 100520.
- Liu, H.; Dai, Z.; So, D.R.; Le, Q.V. Pay Attention to MLPs. Adv. Neural Inf. Process. Syst. 2021, 34, 9204–9215.
- Yu, T.; Li, X.; Cai, Y.; Sun, M.; Li, P. S2-MLP: Spatial-Shift MLP Architecture for Vision. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 3615–3624.
- Wei, G.; Zhang, Z.; Lan, C.; Lu, Y.; Chen, Z. ActiveMLP: An MLP-like Architecture with Active Token Mixer. arXiv 2022, arXiv:2203.06108.
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Li, Y.; Xu, C.; Wang, Y. An Image Patch is a Wave: Phase-Aware Vision MLP. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10935–10944.
- Wang, Z.; Jiang, W.; Zhu, Y.; Yuan, L.; Song, Y.; Liu, W. DynaMixer: A Vision MLP Architecture with Dynamic Mixing. In Proceedings of the 39th International Conference on Machine Learning (ICML), PMLR, Baltimore, MD, USA, 17–23 July 2022; Volume 162, pp. 22691–22701.
- Hendrycks, D.; Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv 2016, arXiv:1610.02136.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Guan, Y.; Ploetz, T. Ensembles of Deep LSTM Learners for Activity Recognition using Wearables. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1–28.
- Edel, M.; Köppe, E. Binarized-BLSTM-RNN based Human Activity Recognition. In Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Alcala de Henares, Spain, 18–21 September 2016; pp. 1–7.
- Moya Rueda, F.; Grzeszick, R.; Fink, G.A.; Feldhorst, S.; Ten Hompel, M. Convolutional Neural Networks for Human Activity Recognition Using Body-Worn Sensors. Informatics 2018, 5, 26.
- Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Mekruksavanich, S.; Jitpattanakul, A. Deep Convolutional Neural Network with RNNs for Complex Activity Recognition Using Wrist-Worn Wearable Sensor Data. Electronics 2021, 10, 1685.
- Chen, K.; Yao, L.; Zhang, D.; Guo, B.; Yu, Z. Multi-agent Attentional Activity Recognition. arXiv 2019, arXiv:1905.08948.
- Tang, Y.; Zhang, L.; Teng, Q.; Min, F.; Song, A. Triple Cross-Domain Attention on Human Activity Recognition Using Wearable Sensors. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 1–10.
- Mahmud, S.; Tonmoy, M.T.H.; Bhaumik, K.K.; Rahman, A.K.M.M.; Amin, M.A.; Shoyaib, M.; Khan, M.A.H.; Ali, A.A. Human Activity Recognition from Wearable Sensor Data Using Self-Attention. arXiv 2020, arXiv:2003.09018.
- Li, B.; Yao, Z.; Wang, J.; Wang, S.; Yang, X.; Sun, Y. Improved Deep Learning Technique to Detect Freezing of Gait in Parkinson’s Disease Based on Wearable Sensors. Electronics 2020, 9, 1919.
- Thu, N.T.H.; Han, D.S. Freezing of Gait Detection Using Discrete Wavelet Transform and Hybrid Deep Learning Architecture. In Proceedings of the 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN), Jeju Island, Republic of Korea, 17–20 August 2021; pp. 448–451.
- El-ziaat, H.; El-Bendary, N.; Moawad, R. A Hybrid Deep Learning Approach for Freezing of Gait Prediction in Patients with Parkinson’s Disease. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 766–776.
| Parameters | Opportunity | PAMAP2 | Daphnet Gait |
|---|---|---|---|
| Number of Activities | 18 | 19 | 2 |
| Number of Features | 77 | 40 | 9 |
| Sliding Window Length | 77 | 84 | 126 |
| Sampling Rate | 30 Hz | 100 Hz | 64 Hz |
| Downsampling | 1 | 3 | 2 |
| Step Size | 3 | 3 | 3 |
| Normalisation | True | False | False |
| Interpolation | False | True | False |
| Includes Null Activities | True | False | False |
| Specifications | Opportunity | PAMAP2 | Daphnet Gait |
|---|---|---|---|
| Number of Layers | 10 | 10 | 10 |
| Patch Resolution | 11 | 4 | 9 |
| Input Sequence Length | 49 | 210 | 14 |
| Patch-Embedding Size | 512 | 512 | 512 |
| Token Dimension | 256 | 256 | 256 |
| Channel Dimension | 2048 | 2048 | 512 |
| Learnable Parameters (M) | 21 | 21 | 5 |
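As a rough guide to how these specifications fit together, the sketch below assembles a full model by reusing the MixerBlock sketched in Section 3.1; the default arguments correspond to the Opportunity configuration (77 × 77 input, patch resolution 11, hence 49 patches). The class name, argument names, and the single-channel input are assumptions for illustration (the ablation table refers to an RGB-style embedding, so in_channels is left as a parameter); the linked repository is authoritative.

```python
# Assembling the complete Mixer from the table above (Opportunity defaults:
# 10 layers, patch resolution 11, embedding 512, token dim 256, channel dim 2048).
# Reuses the MixerBlock sketched earlier; all names here are illustrative.
import torch.nn as nn


class MLPMixerHAR(nn.Module):
    def __init__(self, input_size=(77, 77), patch_size=11, in_channels=1,
                 embed_dim=512, depth=10, token_dim=256, channel_dim=2048,
                 num_classes=18):
        super().__init__()
        num_patches = (input_size[0] // patch_size) * (input_size[1] // patch_size)  # 7 * 7 = 49
        # Patch embedding: a strided convolution that cuts the sensor window into patches
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.blocks = nn.Sequential(*[
            MixerBlock(num_patches, embed_dim, token_dim, channel_dim)
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                    # x: (batch, in_channels, window, features)
        x = self.patch_embed(x)              # (batch, embed_dim, 7, 7)
        x = x.flatten(2).transpose(1, 2)     # (batch, 49, embed_dim)
        x = self.blocks(x)
        x = self.norm(x).mean(dim=1)         # global average pooling over patches
        return self.head(x)
```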
| Metric | Opportunity | PAMAP2 | Daphnet |
|---|---|---|---|
| Base Mixer | 0.68 | 0.971 | 0.85 |
| Mixer with no RGB Embedding | 0.63 | 0.940 | 0.79 |
| Mixer with no Token-Mixing | 0.05 | 0.165 | 0.12 |
| Mixer with no Channel-Mixing | 0.569 | 0.82 | 0.795 |
| Metric | Opportunity Locomotion | Opportunity Gestures | PAMAP2 | Daphnet Gait |
|---|---|---|---|---|
| Ensemble LSTMs [32] | - | 0.726 | 0.854 | - |
| CNN-BiGRU [37] | - | - | 0.855 | - |
| AttnSense [22] | - | - | 0.893 | - |
| Multi-Agent Attention [38] | - | - | 0.899 | - |
| DeepConvLSTM [35] | 0.895 | 0.917 | - | - |
| BLSTM-RNN [33] | - | - | 0.93 | - |
| Triple Attention [39] | - | - | 0.932 | - |
| Self-Attention [40] | - | - | 0.96 | - |
| CNN [18] | - | 0.894 | 0.937 | 0.684 |
| b-LSTM-S [18] | - | 0.927 | 0.868 | 0.741 |
| MLP-Mixer | 0.90 ± 0.005 | 0.912 ± 0.002 | 0.97 ± 0.002 | 0.842 ± 0.007 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).