CLEAR: Multimodal Human Activity Recognition via Contrastive Learning Based Feature Extraction Refinement
Abstract
1. Introduction
The main contributions of this work are summarized as follows:
- Extensive augmented data are obtained from multiple perspectives by perturbing the amplitude, frequency, and phase of the signals in the frequency domain. Combined with time-domain augmentation techniques, this substantially enriches the training data and notably improves the model's ability to generalize across diverse activity patterns.
- We effectively leverage the multimodal characteristics of sensor data through convolutional subnetworks for localized feature extraction and attention mechanisms for adaptive feature fusion, improving the representativeness of the learned features and performance on the downstream classification task.
- We introduce an enhanced contrastive learning method that improves feature discriminability by strategically selecting contrastive samples near the classification boundary. This reduces computational cost while facilitating the learning of discriminative inter-class features.
- We conduct extensive experiments on three publicly available datasets. The results demonstrate that CLEAR generalizes well under distribution shift and achieves significant improvements in accuracy.
2. Related Work
2.1. Human Activity Recognition
2.2. Multimodal Human Activity Recognition
2.3. Data Augmentation Improves Generalization by Enriching Data Diversity
2.4. Contrastive Learning Increases Feature Distinguishability to Optimize Model Performance
3. Methodology
3.1. Problem Definitions
3.2. The Overall Structure of CLEAR
3.3. Data Augmentation Enriches the Data Distribution
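As a minimal illustrative sketch of the frequency-domain augmentation summarized in the contributions: each window's spectrum is perturbed in amplitude and phase before transforming back to the time domain. The function name and scale parameters below are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def freq_augment(x, amp_scale=0.1, phase_scale=0.1, rng=None):
    """Perturb a (time, channels) window in the frequency domain:
    randomly rescale the spectrum magnitude and jitter the phase,
    then transform back to the time domain."""
    rng = rng or np.random.default_rng()
    X = np.fft.rfft(x, axis=0)                         # per-channel spectrum
    amp = np.abs(X) * (1 + rng.uniform(-amp_scale, amp_scale, X.shape))
    phase = np.angle(X) + rng.uniform(-phase_scale, phase_scale, X.shape)
    return np.fft.irfft(amp * np.exp(1j * phase), n=x.shape[0], axis=0)

# e.g., a 128-sample window with 6 sensor channels:
# augmented = freq_augment(np.random.randn(128, 6))
```

Frequency shifts can be realized in the same spirit (e.g., by rolling spectral bins), and the result is combined with standard time-domain transforms such as jittering and scaling.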
3.4. Multimodal Feature Fusion
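A compact sketch of the convolutional-subnetwork-plus-attention fusion described in the contributions: one small Conv1d branch per sensor modality extracts localized features, and a learned attention weight decides how much each modality contributes to the fused vector. Module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    """One conv subnetwork per modality, then attention-weighted fusion."""
    def __init__(self, n_modalities, in_channels, feat_dim=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, feat_dim, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            for _ in range(n_modalities)
        ])
        self.attn = nn.Linear(feat_dim, 1)    # scores each modality's feature

    def forward(self, xs):                    # xs: list of (B, C, T) tensors
        feats = torch.stack(
            [b(x).squeeze(-1) for b, x in zip(self.branches, xs)], dim=1
        )                                     # (B, n_modalities, feat_dim)
        weights = torch.softmax(self.attn(feats), dim=1)  # adaptive weights
        return (weights * feats).sum(dim=1)   # fused (B, feat_dim) feature

# e.g., fuse accelerometer and gyroscope windows of shape (batch, 3, 128):
# fused = AttnFusion(n_modalities=2, in_channels=3)([acc, gyro])
```

The softmax over modalities lets the network emphasize, say, the gyroscope for rotational activities and the accelerometer for translational ones.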
3.5. Contrastive Learning Increases Discrimination
The contrastive loss is expressed in terms of the following quantities:
- $\mathrm{sim}(\cdot,\cdot)$ is a function representing the cosine similarity between two samples;
- $\tau$ is the temperature parameter;
- $z$ is the feature representation of the anchor sample;
- $P$ is the set containing the feature representations of the $m$ positive samples for the anchor sample;
- $Q$ is the set containing the feature representations of the $q$ negative samples for the anchor sample.
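These components fit the standard supervised contrastive objective; as a reconstruction consistent with the definitions above (not necessarily the paper's exact equation), the per-anchor loss can be written as

$$
\mathcal{L}_{\mathrm{con}} = -\frac{1}{m} \sum_{z^{+} \in P} \log \frac{\exp\!\left(\mathrm{sim}(z, z^{+})/\tau\right)}{\sum_{z' \in P \cup Q} \exp\!\left(\mathrm{sim}(z, z')/\tau\right)}.
$$

Minimizing this pulls the anchor toward its $m$ positives and away from its $q$ negatives, with $\tau$ controlling how sharply similarity differences are weighted.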
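The boundary-near sample selection can also be sketched in code. One simple criterion, an illustrative assumption rather than the paper's exact rule, is to prefer samples whose top-1/top-2 classifier-probability margin is small, i.e., samples close to the decision boundary:

```python
import torch
import torch.nn.functional as F

def select_boundary_samples(features, logits, labels, anchor_idx, m=4, q=8):
    """Pick m positives and q negatives for one anchor, preferring
    samples whose classification margin is small (near the boundary)."""
    probs = F.softmax(logits, dim=1)
    top2 = probs.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]           # small margin = near boundary

    same = labels == labels[anchor_idx]
    same[anchor_idx] = False                   # exclude the anchor itself
    pos_pool = torch.nonzero(same).squeeze(1)
    neg_pool = torch.nonzero(labels != labels[anchor_idx]).squeeze(1)

    pos = pos_pool[margin[pos_pool].argsort()[:m]]
    neg = neg_pool[margin[neg_pool].argsort()[:q]]
    return features[pos], features[neg]
```

Restricting $P$ and $Q$ to these hard, boundary-adjacent samples keeps $m$ and $q$ small, which is where the computational saving over contrasting against all available samples comes from.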
3.6. Activity Classification
4. Experiments
4.1. Datasets and Experimental Setting
4.1.1. Datasets
4.1.2. Implementation Details
4.2. Experiment Results
4.3. Robustness to Training Data Volume
4.4. Visualization Experiment
4.5. Ablation Study
4.6. Comparison Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Dataset | Subjects | Sensors | Positions | Data Size | Sampling Rate (Hz) |
|---|---|---|---|---|---|
| USC-HAD | 14 (7 F, 7 M) | 1 MotionNode (acc, gyro) | front right hip | 2.81 M | 100 |
| DSADS | 8 (4 F, 4 M) | 5 Xsens MTx units (acc, gyro, mag) | torso, arms, legs | 1.14 M | 25 |
| PAMAP2 | 9 (1 F, 8 M) | 3 IMUs + HRM (acc, gyro, mag) | arm, chest, ankle | 2.84 M | 100 (IMU) / 9 (HRM) |
| Dataset | Accuracy (%) | Precision (%) | Recall (%) | F-Measure (%) |
|---|---|---|---|---|
| PAMAP2 | 82.76 | 85.50 | 82.87 | 82.25 |
| USC-HAD | 81.09 | 84.48 | 76.74 | 79.88 |
| DSADS | 90.45 | 91.57 | 90.84 | 91.13 |