Data Augmentation Method for Pedestrian Dress Recognition in Road Monitoring and Pedestrian Multiple Information Recognition Model
Abstract
1. Introduction
- Mask-Mosaic and Mask-Mosaic++ data augmentation methods are proposed. Both are mixed-sample augmentations that compose four pictures into one picture according to a fixed placement scheme; the former speeds up network training, while the latter improves the generalization ability of the network. Mask-Mosaic++ in particular greatly improves recognition of small and occluded targets (a minimal sketch of the composition step follows this list).
- An integrated system for clothing type recognition and color recognition is proposed. The Mask R-CNN [20] instance segmentation model predicts the clothing type, the resulting instance mask is passed to k-means clustering, and the dominant color information is obtained by clustering the colors of the masked pixels.
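For concreteness, the following is a minimal sketch of the four-image composition step common to mosaic-style mixed-sample augmentation. It is not the authors' exact Mask-Mosaic/Mask-Mosaic++ procedure, whose placement rules and mask handling are specific to the paper; the 2 × 2 grid, the `out_size` parameter, and the nearest-neighbor mask resizing are illustrative assumptions.

```python
import cv2
import numpy as np

def mosaic4(images, masks, out_size=512):
    """Compose four images and their instance masks into one 2x2 training sample."""
    s = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    canvas_mask = np.zeros((out_size, out_size), dtype=np.uint8)
    for i, (img, msk) in enumerate(zip(images, masks)):
        y, x = (i // 2) * s, (i % 2) * s
        # Resize each image into its quadrant; masks use nearest-neighbor
        # interpolation so label values are not blended.
        canvas[y:y + s, x:x + s] = cv2.resize(img, (s, s))
        canvas_mask[y:y + s, x:x + s] = cv2.resize(msk, (s, s), interpolation=cv2.INTER_NEAREST)
    return canvas, canvas_mask
```

Because four down-scaled images share one canvas, each composed sample carries four times as many instances, many of them small, which is consistent with the method's emphasis on small and occluded targets.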
1.1. Instance Segmentation
1.2. Data Augmentation
1.3. Clothing Recognition Task
2. Model
2.1. Mask-Mosaic++ Data Augmentation
2.2. Mask R-CNN
In the feature pyramid network, each proposal is assigned to a pyramid level according to

$$k = \left\lfloor k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor$$

where:
- $w$ and $h$ correspond to the width and height of the proposal box, respectively;
- $k$ represents the feature pyramid level to which the proposal box is assigned; and
- $k_0$ generally takes 4.
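As a quick illustration of this level-assignment rule (a sketch using the standard FPN formulation [37]; the clamping range P2 to P5 is an assumption):

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    # k = floor(k0 + log2(sqrt(w*h) / 224)); clamp to the available levels P2..P5.
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))

# A 224x224 proposal maps to the base level k0 = 4; a 112x112 proposal to level 3.
assert fpn_level(224, 224) == 4
assert fpn_level(112, 112) == 3
```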
The overall loss of Mask R-CNN is the sum of three branch losses,

$$L = L_{cls} + L_{box} + L_{mask}$$

where:
- $L_{cls}$ represents the loss of the classification branch;
- $L_{box}$ represents the loss of the localization branch, which is the bounding-box regression loss; and
- $L_{mask}$ represents the loss of the segmentation branch, which is the loss of the mask.
The RPN is trained with the two-term loss

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where:
- $p_i$ represents the probability that the $i$th anchor is predicted to be a true box, and $p_i^*$ is its ground-truth label;
- $\lambda$ is a constant, generally 10 in the experiment;
- $t_i$ represents the predicted bounding-box regression parameters of the $i$th anchor;
- $t_i^*$ represents the bounding-box regression parameters of the real label corresponding to the $i$th anchor;
- $N_{cls}$ represents the number of samples; and
- $N_{reg}$ represents the number of anchor positions.
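A compact NumPy sketch of this two-term loss, assuming binary cross-entropy for $L_{cls}$ and smooth-L1 for $L_{reg}$ (standard choices for an RPN, though the paper does not restate them here):

```python
import numpy as np

def rpn_loss(p, p_star, t, t_star, n_reg, lam=10.0, eps=1e-7):
    """Objectness cross-entropy plus smooth-L1 box regression on positive anchors.

    p:      (N,) predicted objectness probabilities for the N sampled anchors
    p_star: (N,) ground-truth labels (1 = object, 0 = background)
    t:      (N, 4) predicted box regression parameters
    t_star: (N, 4) ground-truth box regression parameters
    n_reg:  number of anchor positions (the regression normalizer)
    """
    p = np.clip(p, eps, 1 - eps)
    l_cls = -np.mean(p_star * np.log(p) + (1 - p_star) * np.log(1 - p))
    diff = np.abs(t - t_star)
    smooth_l1 = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum(axis=1)
    l_reg = np.sum(p_star * smooth_l1) / n_reg   # only positive anchors contribute
    return l_cls + lam * l_reg
```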
The mask branch is trained with the average binary cross-entropy loss

$$L_{mask} = -\frac{1}{m^2} \sum_{1 \le i,j \le m} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}_{ij}\right) \right]$$

where:
- $y$ represents the true value after binarization; and
- $\hat{y}$ represents the predicted value.
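Equivalently, in NumPy (a sketch; the `eps` clipping is added for numerical stability):

```python
import numpy as np

def mask_bce(y_true, y_pred, eps=1e-7):
    # y_true: (m, m) binarized ground-truth mask (0/1)
    # y_pred: (m, m) predicted per-pixel foreground probabilities from the mask branch
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```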
2.3. k-Means Clustering Algorithm
The clustering objective is the within-cluster sum of squared errors

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \left\| x_i - \mu_k \right\|^2 \quad (9)$$

where:
- $K$ is the total number of clusters, and $\mu_k$ is the average value (centroid) of cluster $C_k$.

Iteration stops when $\left| J^{(t)} - J^{(t-1)} \right| < \varepsilon$, where $\varepsilon$ is a small number, and $J^{(t)}$ and $J^{(t-1)}$ respectively represent the results of the current and previous iterations of Equation (9).
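Putting Sections 2.2 and 2.3 together, the color pipeline clusters the pixels selected by a predicted instance mask and reports the centroid of the largest cluster as the dominant color. A sketch under assumed interfaces (scikit-learn's `KMeans` and OpenCV color conversion; the `k` value and the HSV option mirror the combinations compared in Section 3.3):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def dominant_color(image_bgr, instance_mask, k=3, use_hsv=True):
    """Return the dominant color of the pixels inside a binary instance mask."""
    img = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV) if use_hsv else image_bgr
    pixels = img[instance_mask.astype(bool)].astype(np.float32)  # (N, 3) masked pixels
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_)              # cluster sizes
    return km.cluster_centers_[counts.argmax()]   # centroid of the largest cluster
```

Clustering in HSV is one of the four combinations evaluated in Section 3.3, where the HSV_image + HSV_k-means pipeline achieves the best accuracy (76.66%).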
2.4. Network Training
2.5. Model Assessment
3. Results
3.1. Dataset
3.2. Data Augmentation Experiment
3.3. Clothing Color Recognition
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Huang, X.; Ge, Z.; Jie, Z.; Yoshie, O. NMS by representative region: Towards crowded pedestrian detection by proposal pairing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 10750–10759.
2. Chu, X.; Zheng, A.; Zhang, X.; Sun, J. Detection in crowded scenes: One proposal, multiple predictions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 12214–12223.
3. Wu, J.; Zhou, C.; Yang, M.; Zhang, Q.; Li, Y.; Yuan, J. Temporal-context enhanced detection of heavily occluded pedestrians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 13430–13439.
4. Zhang, Z.; Gao, J.; Mao, J.; Liu, Y.; Anguelov, D.; Li, C. STINet: Spatio-temporal-interactive network for pedestrian detection and trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 11346–11355.
5. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022.
6. Yuan, J.; Panagiotis, B.; Stathaki, T. Effectiveness of Vision Transformer for fast and accurate single-stage pedestrian detection. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
7. Zhang, Y.; Zhou, A.; Zhao, F.; Wu, H. A lightweight vehicle-pedestrian detection algorithm based on attention mechanism in traffic scenarios. Sensors 2022, 22, 8480.
8. Zoph, B.; Cubuk, E.D.; Ghiasi, G.; Lin, T.Y.; Shlens, J.; Le, Q.V. Learning data augmentation strategies for object detection. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 566–583.
9. Zhou, K.; Zhao, W.X.; Wang, S.; Zhang, F.; Wu, W.; Wen, J.R. Virtual data augmentation: A robust and general framework for fine-tuning pre-trained models. arXiv 2021, arXiv:2109.05793.
10. Luo, C.; Zhu, Y.; Jin, L.; Wang, Y. Learn to augment: Joint data augmentation and network optimization for text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 13746–13755.
11. Yuan, J.; Liu, Y.; Shen, C.; Wang, Z.; Li, H. A simple baseline for semi-supervised semantic segmentation with strong data augmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 8229–8238.
12. Bosquet, B.; Cores, D.; Seidenari, L.; Brea, V.M.; Mucientes, M.; Del Bimbo, A. A full data augmentation pipeline for small object detection based on generative adversarial networks. Pattern Recognit. 2023, 133, 108998.
13. Liu, Z.; Luo, P.; Qiu, S.; Wang, X.; Tang, X. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1096–1104.
14. Zheng, S.; Yang, F.; Kiapour, M.H.; Piramuthu, R. ModaNet: A large-scale street fashion dataset with polygon annotations. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 1670–1678.
15. Aulia, N.; Arnia, F.; Munadi, K. HOG of region of interest for improving clothing retrieval performance. In Proceedings of the 2019 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), Banda Aceh, Indonesia, 22–24 August 2019; pp. 7–12.
16. Hussain, T.; Ahmad, M.; Ali, S.; Khan, S.; Rahman, A.; Haider, A. An intelligent dress uniform identification system. In Proceedings of the 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 30–31 January 2019; pp. 1–4.
17. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
18. Sidnev, A.; Trushkov, A.; Kazakov, M.; Korolev, I.; Sorokin, V. DeepMark: One-shot clothing detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
19. Prinosil, J. Clothing color based de-identification. In Proceedings of the 2018 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece, 4–6 July 2018; pp. 1–5.
20. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
21. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
22. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
24. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
25. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008.
26. Hataya, R.; Zdenek, J.; Yoshizoe, K.; Nakayama, H. Meta approach to data augmentation optimization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 2574–2583.
27. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
28. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032.
29. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
30. Zhang, X.; Wang, Z.; Liu, D.; Lin, Q.; Ling, Q. Deep adversarial data augmentation for extremely low data regimes. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 15–28.
31. Mansourifar, H.; Chen, L.; Shi, W. Virtual big data for GAN based data augmentation. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 1478–1487.
32. Kora Venu, S.; Ravula, S. Evaluation of deep convolutional generative adversarial networks for data augmentation of chest X-ray images. Future Internet 2020, 13, 8.
33. Algabri, R.; Choi, M.T. Deep-learning-based indoor human following of mobile robot using color feature. Sensors 2020, 20, 2699.
34. Patel, C.; Liao, Z.; Pons-Moll, G. TailorNet: Predicting clothing in 3D as a function of human pose, shape and garment style. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 7365–7375.
35. Hidayati, S.C.; Goh, T.W.; Chan, J.S.G.; Hsu, C.C.; See, J.; Wong, L.K.; Hua, K.L.; Tsao, Y.; Cheng, W.H. Dress with style: Learning style from joint deep embedding of clothing styles and body shapes. IEEE Trans. Multimed. 2020, 23, 365–377.
36. Zoph, B.; Ghiasi, G.; Lin, T.Y.; Cui, Y.; Liu, H.; Cubuk, E.D.; Le, Q. Rethinking pre-training and self-training. Adv. Neural Inf. Process. Syst. 2020, 33, 3833–3845.
37. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
38. Ge, Y.; Zhang, R.; Wang, X.; Tang, X.; Luo, P. DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5337–5345.
39. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT++: Better real-time instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1108–1121.
| Dataset | Train | Valid | Test: degree of occlusion (small / moderate / serious) | Test: object size (small / medium / large) | Normal test |
|---|---|---|---|---|---|
| DeepFashion2 [38] (small subset) | 1044 | 456 | 269 / 265 / 193 | 191 / 253 / 192 | 418 |
| ResNet101 (512) | Occlusion: Small | Occlusion: Moderate | Occlusion: Serious | Size: Small | Size: Medium | Size: Large |
|---|---|---|---|---|---|---|
| Original | 34.89 | 32.33 | 31.11 | 30.37 | 35.75 | 34.59 |
| Mask-Mosaic | 32.60 | 32.63 | 25.55 | 36.05 | 30.35 | 26.89 |
| Mask-Mosaic++ | 39.81 | 39.90 | 37.13 | 42.74 | 39.65 | 38.52 |
| Original | 25.62 | 23.43 | 17.62 | 21.67 | 24.35 | 22.56 |
| Mask-Mosaic | 22.73 | 23.34 | 15.23 | 25.36 | 20.57 | 16.39 |
| Mask-Mosaic++ | 29.63 | 29.47 | 26.35 | 30.28 | 28.37 | 28.88 |
| Original | 17.26 | 13.68 | 13.33 | 12.19 | 16.58 | 14.36 |
| Mask-Mosaic | 14.75 | 14.26 | 10.38 | 16.48 | 11.78 | 8.37 |
| Mask-Mosaic++ | 20.75 | 19.89 | 18.38 | 21.45 | 17.45 | 18.23 |
| Model | #Params | AP |
|---|---|---|
| ResNet101 FPN (512) | 250 M | 33.49 |
| w/ Mask-Mosaic | 250 M | 29.38 |
| w/ Mask-Mosaic++ | 250 M | 37.65 |
| ResNet50 FPN (512) | 175 M | 24.82 |
| w/ Mask-Mosaic | 175 M | 12.34 |
| w/ Mask-Mosaic++ | 175 M | 27.49 |
| Model | | |
|---|---|---|
| YOLOv5n + ResNet101 | 31.79 | 30.36 |
| YOLOv5n + Mask-Mosaic++ | 33.15 | 31.65 |
| YOLACT++ + ResNet101 | 32.58 | 31.22 |
| YOLACT++ + Mask-Mosaic++ | 34.05 | 32.52 |
| Pipeline | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| RGB_image + RGB_k-means | 73.33% | 75% | 75% | 73.33% |
| RGB_image + HSV_k-means | 71.66% | 73.33% | 73.33% | 75% |
| HSV_image + RGB_k-means | 63.33% | 65% | 62% | 65% |
| HSV_image + HSV_k-means | 71.66% | 75% | 76.66% | 73.33% |