1. Introduction
With the rapid advancement of e-commerce, online shopping has surged in popularity. In this evolving landscape, consumers often find themselves inundated with information about the available clothing options, while retailers, in turn, aim to assist customers in efficiently selecting the items that best suit their needs. Furthermore, customers have individual preferences and may adapt their purchases to align with their clothing requirements [1].
The fashion and online retail sectors stand to benefit significantly from exploiting image data directly rather than overrelying on manual labeling. Instead of depending on human annotation, the categories, attributes, and virtually any other data associated with the photos used to showcase digital inventory could be labeled automatically. This shift spares individuals the labor-intensive task of physically and mentally categorizing images [2].
The significance of fashion and clothing in contemporary society has grown considerably, influencing people’s daily lives. The fashion industry has become a major contributor to economic growth, largely because the Internet now hosts a substantial share of the industry’s transactions. Technological advancements have led to the proliferation of new online fashion websites. As mentioned, consumers are increasingly drawn to online shopping because it spares them hours of traveling to physical stores in search of desired clothing. With the convenience of Internet shopping platforms, customers can effortlessly browse and purchase their preferred attire from anywhere, using a mobile device or their preferred platform. Online retailers must therefore provide user-friendly search functionality and high-quality results that meet customers’ requirements [3]. However, clothing suppliers struggle to provide online marketplaces with precise, systematic, and comprehensive product descriptions [4], as doing so requires automated tools they may not currently possess. Moreover, the online retail landscape is highly diverse, with numerous retailers offering various clothing types and subcategories, each characterized by unique attributes.
In the fashion industry, the concept of “fast fashion”, characterized by the efficient production of a diverse range of clothing at lower costs and faster turnaround times, has become increasingly prevalent. To meet this objective, fashion companies strive to offer a broad selection of trendy products promptly. Furthermore, these businesses are leveraging technology to tailor their products to meet the specific preferences of their customers and stay attuned to the latest fashion trends. Customers can easily explore clothing items from their favorite brands, including exclusive offerings. By visualizing these brand-aligned apparel products, consumers can gain a better understanding of how new items might complement their wardrobe, thereby reducing the likelihood of unsatisfactory online purchases.
Additionally, fashion brand recommendation systems have the potential to streamline the shopping experience by minimizing the number of clothing items a shopper needs to browse before making a decision. With such guidance, customers can try on a select few outfits, simplifying the final selection process. Image classification is a classic challenge in computer vision, machine learning, and image processing. Deep learning algorithms [5], particularly those used for image and video recognition, have significantly improved accuracy across various domains. Deep learning techniques are also being applied in information recommendation, with numerous recommendation systems designed to offer tailored suggestions for information retrieval [6].
Our research encompasses the use of a clothing dataset [7] for classification, the formulation of an enhanced hybrid model for accurate attire recognition, the fine-tuning of hyperparameters to optimize model performance, and the selection of the most effective models for ensemble integration. Recommendation systems [8] have employed a diverse array of methods over the years, including deep learning, machine learning, and image processing. The primary objective of this study is to strategically fine-tune the hyperparameters and then identify the optimal configuration for superior performance in attire recognition tasks.
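To make the hyperparameter fine-tuning step concrete, the following is a minimal sketch of such a sweep, assuming TensorFlow/Keras, unbatched `tf.data` training and test splits, and a hypothetical `build_model` factory that returns an uncompiled classifier; the grid values mirror those studied later (Tables 6, 7, 8 and 9), while the helper names are ours.

```python
# Minimal sketch (assumed helper names) of the hyperparameter fine-tuning step:
# each learning rate / batch size setting is trained for ten epochs with Adam
# and categorical cross-entropy, and the configuration with the best test
# accuracy is kept. The paper varies one factor at a time; a full grid is
# shown here only for brevity.
import itertools
import tensorflow as tf

LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001]
BATCH_SIZES = [8, 16, 32, 64]

def score_configuration(build_model, train_ds, test_ds, lr, batch_size):
    """Train one backbone for ten epochs and return its test accuracy."""
    model = build_model()  # e.g. a ResNet152- or EfficientNetB7-based classifier
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds.batch(batch_size), epochs=10, verbose=0)
    _, accuracy = model.evaluate(test_ds.batch(batch_size), verbose=0)
    return accuracy

def find_best_configuration(build_model, train_ds, test_ds):
    """Return the (learning rate, batch size) pair with the highest accuracy."""
    scores = {
        (lr, bs): score_configuration(build_model, train_ds, test_ds, lr, bs)
        for lr, bs in itertools.product(LEARNING_RATES, BATCH_SIZES)
    }
    return max(scores, key=scores.get), scores
```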
The contributions of this study are:
- Our methodology presents an improved procedure for developing hybrid models, specifically Two-Objective Learning, which exposes crucial connections within deep feature correlations through transfer learning. Incorporating ensemble learning additionally improves robustness and performance compared to relying on a single model, yielding more reliable predictions and insights.
- After initial testing, ResNet152 and EfficientNetB7 outperform the other models, making them the preferred backbone models in our proposed methodology for accurately detecting attire items. This superiority is attributed to the combined strengths of the ResNet152 and EfficientNetB7 deep features, which raise performance even without data augmentation (a minimal sketch of this two-backbone fusion follows this list).
- For clothing product recognition, this direct classification approach outperforms traditional CNN architectures in terms of accuracy and dependability. Attire prediction nonetheless remains challenging because of variations in style, texture, and color, which make it difficult for a single model to classify all items accurately. The proposed hybrid learning method combines the strengths of the backbone models to improve prediction accuracy and robustness in attire recognition tasks.
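As referenced above, the following is a hedged Keras sketch of the two-backbone fusion idea, assuming ImageNet-pretrained backbones, 224 × 224 inputs, and the ten clothing categories of Table 2; the dense head size and dropout rate are our assumptions, not the published architecture.

```python
# Hedged sketch of the two-backbone fusion behind the proposed Two-Objective
# Learning model: frozen ImageNet-pretrained ResNet152 and EfficientNetB7
# feature extractors are concatenated and fed to a small softmax head.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet152, EfficientNetB7

NUM_CLASSES = 10             # clothing categories listed in Table 2
INPUT_SHAPE = (224, 224, 3)  # image size used throughout the experiments

def build_fusion_model() -> Model:
    inputs = layers.Input(shape=INPUT_SHAPE)

    # Each backbone gets its own preprocessing and acts as a frozen
    # transfer-learning feature extractor with global average pooling.
    resnet = ResNet152(include_top=False, weights="imagenet", pooling="avg")
    effnet = EfficientNetB7(include_top=False, weights="imagenet", pooling="avg")
    resnet.trainable = False
    effnet.trainable = False

    resnet_feats = resnet(tf.keras.applications.resnet.preprocess_input(inputs))
    effnet_feats = effnet(tf.keras.applications.efficientnet.preprocess_input(inputs))

    # Fuse the two deep feature vectors and classify with a softmax head
    # (assumed head: 256-unit dense layer plus dropout).
    fused = layers.Concatenate()([resnet_feats, effnet_feats])
    x = layers.Dense(256, activation="relu")(fused)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return Model(inputs, outputs, name="two_objective_learning_sketch")
```

Freezing both backbones keeps the fused classifier lightweight to train while still combining two complementary sets of deep features, which is the motivation stated in the contributions above.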
Author Contributions
Conceptualization, W.A., Z.Z. and M.A.; methodology, W.A. and M.A.; software, W.A.; validation, W.A., M.A., J.C. and S.A.; formal analysis, W.A. and M.A.; investigation, J.C. and S.A.; resources, Z.Z., J.C. and S.A.; data curation, W.A., Z.Z. and M.A.; writing—original draft preparation, W.A. and M.A.; writing—review and editing, Z.Z., M.A., J.C. and S.A.; visualization, W.A. and M.A.; supervision, Z.Z.; project administration, Z.Z.; funding acquisition, M.A. and S.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the EIAS Data Science & Blockchain Lab, Prince Sultan University. The authors would like to thank Prince Sultan University for paying the APC of this article.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data can be shared upon reasonable request from the corresponding author.
Acknowledgments
The authors would like to thank Prince Sultan University for their support.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Agarwal, A.; Das, A. Facial Gestures-Based Recommender System for Evaluating Online Classes. In Recommender Systems; CRC Press: Boca Raton, FL, USA, 2023; pp. 173–189. [Google Scholar]
- Yousuf, S.B.; Sajid, H.; Poon, S.; Khushi, M. IMDB-Attire: A Novel Dataset for Attire Detection and Localization. In Proceedings of the Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, 12–15 December 2019; Proceedings, Part II 26; Springer International Publishing: New York, NY, USA, 2019. [Google Scholar]
- Ma, W.; Guan, Z.; Wang, X.; Yang, C.; Cao, J. YOLO-FL: A target detection algorithm for reflective clothing wearing inspection. Displays 2023, 80, 102561. [Google Scholar] [CrossRef]
- Park, Y.E. Research evidence for reshaping global energy strategy based on trend-based approach of big data analytics in the corona era. Energy Strategy Rev. 2022, 41, 100835. [Google Scholar] [CrossRef]
- Amin, M.S.; Wang, C.; Jabeen, S. Fashion sub-categories and attributes prediction model using deep learning. Vis. Comput. 2023, 39, 3851–3864. [Google Scholar] [CrossRef]
- Cosma, A.C.; Simha, R. Machine learning method for real-time non-invasive prediction of individual thermal preference in transient conditions. Build. Environ. 2019, 148, 372–383. [Google Scholar] [CrossRef]
- Liu, K.H.; Chen, T.Y.; Chen, C.S. MVC: A dataset for view-invariant clothing retrieval and attribute prediction. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, NY, USA, 6–9 June 2016. [Google Scholar]
- Gharaei, N.Y.; Dadkhah, C.; Daryoush, L. Content-based clothing recommender system using deep neural network. In Proceedings of the 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 3–4 March 2021. [Google Scholar]
- Guan, C.; Qin, S.; Ling, W.; Ding, G. Apparel recommendation system evolution: An empirical review. Int. J. Cloth. Sci. Technol. 2016, 28, 854–879. [Google Scholar] [CrossRef]
- Schafer, J.B.; Konstan, J.; Riedl, J. Recommender systems in e-commerce. In Proceedings of the 1st ACM conference on Electronic Commerce, Denver, CO, USA, 3–5 November 1999. [Google Scholar]
- Sulthana, R. A review on the literature of fashion recommender system using deep learning. Int. J. Perform. Eng. 2021, 17, 695. [Google Scholar]
- Raza, A.; Meeran, M.T.; Bilhaj, U. Enhancing Breast Cancer Detection through Thermal Imaging and Customized 2D CNN Classifiers. VFAST Trans. Softw. Eng. 2023, 11, 80–92. [Google Scholar]
- Khan, S.U.R.; Asif, S.; Bilal, O.; Ali, S. Deep hybrid model for Mpox disease diagnosis from skin lesion images. Int. J. Imaging Syst. Technol. 2024, 34, e23044. [Google Scholar] [CrossRef]
- Jain, K. Transfer Learning-Based Machine Learning Approach to Solve Problems of E-commerce: Image Search. In International Joint Conference on Advances in Computational Intelligence; Springer: Singapore, 2022. [Google Scholar]
- Yamamoto, T.; Nakazawa, A. Fashion style recognition using component-dependent convolutional neural networks. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019. [Google Scholar]
- Villar-Martinez, A.; Rodriguez-Gil, L.; Angulo, I.; Orduña, P.; García-Zubía, J.; López-De-Ipiña, D. Improving the scalability and replicability of embedded systems remote laboratories through a cost-effective architecture. IEEE Access 2019, 7, 164164–164185. [Google Scholar] [CrossRef]
- Pereira, A.M.; Moura, J.A.B.; Costa, E.D.B.; Vieira, T.; Landim, A.R.; Bazaki, E.; Wanick, V. Customer models for artificial intelligence-based decision support in fashion online retail supply chains. Decis. Support Syst. 2022, 158, 113795. [Google Scholar] [CrossRef]
- Xhaferra, E.; Cina, E.; Toti, L. Classification of Standard FASHION MNIST Dataset Using Deep Learning Based CNN Algorithms. In Proceedings of the 2022 International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 20–22 October 2022. [Google Scholar]
- Tseng, Y.H.; Wen, C.Y. Hybrid Learning Models for IMU-Based HAR with Feature Analysis and Data Correction. Sensors 2023, 23, 7802. [Google Scholar] [CrossRef]
- Albattah, W.; Albahli, S. Intelligent arabic handwriting recognition using different standalone and hybrid CNN architectures. Appl. Sci. 2022, 12, 10155. [Google Scholar] [CrossRef]
- Sakib, S.; Fahad, N.M.; Raiaan, M.A.K.; Rahman, M.A.; Al Mamun, A.; Islam, S.; Mukta, M.S.H. Predicting gender from human or non-human social media profile photos by using transfer learning. In Proceedings of the 2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India, 20–21 January 2023. [Google Scholar]
- Ye, S.; Wang, S.; Chen, N.; Xu, A.; Shi, X. OMNet: Outfit Memory Net for clothing parsing. Int. J. Cloth. Sci. Technol. 2023, 35, 493–505. [Google Scholar] [CrossRef]
- Shi, H.; Zhao, D. License Plate Localization in Complex Environments Based on Improved GrabCut Algorithm. IEEE Access 2022, 10, 88495–88503. [Google Scholar] [CrossRef]
- Wang, H.; Li, J.; Wu, H.; Hovy, E.; Sun, Y. Pre-trained language models and their applications. Engineering 2022, 25, 51–65. [Google Scholar] [CrossRef]
- Zhang, L.; Li, H.; Zhu, R.; Du, P. An infrared and visible image fusion algorithm based on ResNet-152. Multimed. Tools Appl. 2022, 81, 9277–9287. [Google Scholar] [CrossRef]
- Khan, S.U.R.; Zhao, M.; Asif, S.; Chen, X.; Zhu, Y. GLNET: Global–local CNN’s-based informed model for detection of breast cancer categories from histopathological slides. J. Supercomput. 2023, 80, 7316–7348. [Google Scholar] [CrossRef]
- Khan, S.U.R.; Zhao, M.; Asif, S.; Chen, X. Hybrid-NET: A fusion of DenseNet169 and advanced machine learning classifiers for enhanced brain tumor diagnosis. Int. J. Imaging Syst. Technol. 2024, 34, e22975. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
- Torrey, L.; Shavlik, J. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques; IGI Global: Hershey, PA, USA, 2010; pp. 242–264. [Google Scholar]
- Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
- Niu, S.; Liu, Y.; Wang, J.; Song, H. A decade survey of transfer learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1, 151–166. [Google Scholar] [CrossRef]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Farooq, M.U.; Beg, M.O. Bigdata analysis of stack overflow for energy consumption of android framework. In Proceedings of the 2019 International Conference on Innovative Computing (ICIC), Lahore, Pakistan, 1–2 November 2019; pp. 1–9. [Google Scholar]
- Khan, S.U.R.; Raza, A.; Waqas, M.; Zia, M.A.R. Efficient and Accurate Image Classification Via Spatial Pyramid Matching and SURF Sparse Coding. Lahore Garrison Univ. Res. J. Comput. Sci. Inf. Technol. 2023, 7, 10–23. [Google Scholar]
- Mohsen, S.; Ali, A.M.; Emam, A. Automatic modulation recognition using CNN deep learning models. Multimed. Tools Appl. 2023, 83, 7035–7056. [Google Scholar] [CrossRef]
- Boulila, W.; Khlifi, M.K.; Ammar, A.; Koubaa, A.; Benjdira, B.; Farah, I.R. A Hybrid Privacy-Preserving Deep Learning Approach for Object Classification in Very High-Resolution Satellite Images. Remote Sens. 2022, 14, 4631. [Google Scholar] [CrossRef]
Figure 1. Sample images of clothing products.
Figure 2. Preprocessing and augmented images.
Figure 3. Background removal via the GrabCut algorithm.
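The background-removal step illustrated in Figure 3 can be approximated with OpenCV’s GrabCut; the sketch below is a generic, assumed implementation (the rectangle margin and iteration count are ours), not the authors’ exact preprocessing code.

```python
# Minimal OpenCV GrabCut sketch for background removal (cf. Figure 3).
import cv2
import numpy as np

def remove_background(image_bgr, margin=10, iterations=5):
    """Return the image with pixels labelled as background zeroed out."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # internal GrabCut state
    fgd_model = np.zeros((1, 65), np.float64)

    h, w = image_bgr.shape[:2]
    # Assumed region of interest: the whole image minus a thin border.
    rect = (margin, margin, w - 2 * margin, h - 2 * margin)

    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)

    # Keep definite and probable foreground; zero out everything else.
    fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                       1, 0).astype("uint8")
    return image_bgr * fg_mask[:, :, np.newaxis]
```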
Figure 4. Proposed Two-Objective Learning architecture.
Figure 5. Basic architecture of ResNet152.
Figure 6. Basic architecture of EfficientNetB7.
Figure 7. Confusion matrix analysis of the proposed Two-Objective Learning model.
Figure 8. Training-accuracy comparison of the baseline models and the proposed Two-Objective Learning model.
Figure 9. Training-loss comparison of the baseline models and the proposed Two-Objective Learning model.
Table 1. Literature on machine learning techniques for attire detection in different areas.

| Reference | Publication Year | Purpose | Method | Accuracy | Precision |
|---|---|---|---|---|---|
| Agarwal et al. [1] | 2023 | Automatic system to save the time of manual dress-code reading | CNN + YOLOv4 | - | 81.82% |
| Yousuf et al. [2] | 2019 | Address the difficulty of clothes detection in the real world | Modified YOLO and SSD | - | 91.14% |
| Ma et al. [3] | 2023 | Balance between accuracy and speed | YOLO-FL | 91.14% | - |
Table 2. Dataset details.

| Product | Samples | Product | Samples |
|---|---|---|---|
| Dress | 288 | Hat | 149 |
| Long Sleeve | 576 | Out Wear | 246 |
| Pants | 559 | Shirt | 345 |
| Shoes | 297 | Shorts | 257 |
| Skirt | 136 | T-Shirt | 928 |
Table 3. Hyperparameter details for the base and proposed models.

| Model | Image Size | Epochs | Loss Function | Optimizer | Activation | Learning Rate | Batch Size |
|---|---|---|---|---|---|---|---|
| ResNet152 (Base) | 224 × 224 | 10 | Categorical cross-entropy | Adam | Softmax | 0.0001 | 16 |
| EfficientNetB7 (Base) | 224 × 224 | 10 | Categorical cross-entropy | Adam | Softmax | 0.0001 | 16 |
| Two-Objective Learning Model | 224 × 224 | 10 | Categorical cross-entropy | Adam | Softmax | 0.0001 | 16 |
Table 4. Classification performance of the baselines and the proposed model.

| Models | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| EfficientNetB7 (Base) | 0.91 | 0.92 | 0.91 | 0.91 |
| ResNet152 (Base) | 0.86 | 0.86 | 0.86 | 0.86 |
| Two-Objective Learning Model | 0.94 | 0.94 | 0.94 | 0.94 |
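As a reference for how the accuracy, precision, recall, and F1-score columns in Tables 4, 5, 6, 7, 8, 9 and 10 can be reproduced, the following is a minimal evaluation sketch, assuming a Keras model, a batched `tf.data` test set with one-hot labels, and macro-averaged precision/recall/F1; the averaging scheme and helper name are our assumptions.

```python
# Hedged sketch of computing Ac/Pr/Re/Fs from model predictions on a test set.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(model, test_ds):
    """Return accuracy, precision, recall, and F1-score for a batched test set."""
    y_true, y_pred = [], []
    for images, labels in test_ds:
        probs = model.predict(images, verbose=0)
        y_pred.extend(np.argmax(probs, axis=1))
        y_true.extend(np.argmax(labels.numpy(), axis=1))  # labels assumed one-hot

    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"Ac": accuracy, "Pr": precision, "Re": recall, "Fs": f1}
```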
Table 5. Performance comparison of the pretrained models with the proposed Two-Objective Learning model on the testing set. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac | Pr | Re | Fs |
|---|---|---|---|---|
| EfficientNetB7 (Base) | 91.3% | 0.92 | 0.91 | 0.91 |
| MobileNet (Base) | 66.3% | 0.72 | 0.66 | 0.68 |
| MobileNetV2 (Base) | 60.2% | 0.64 | 0.60 | 0.61 |
| ResNet50 (Base) | 84.9% | 0.87 | 0.85 | 0.85 |
| ResNet152 (Base) | 85.7% | 0.86 | 0.86 | 0.86 |
| VGG16 (Base) | 79.3% | 0.81 | 0.79 | 0.80 |
| VGG19 (Base) | 75.2% | 0.81 | 0.75 | 0.76 |
| Proposed Two-Objective Learning Model | 94% | 0.94 | 0.94 | 0.94 |
Table 6. Model performance on the testing set using the Adam optimizer at learning rates (LR) of 0.1 and 0.01, with ten epochs and a batch size of 64. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac (LR 0.1) | Pr (LR 0.1) | Re (LR 0.1) | Fs (LR 0.1) | Ac (LR 0.01) | Pr (LR 0.01) | Re (LR 0.01) | Fs (LR 0.01) |
|---|---|---|---|---|---|---|---|---|
| EfficientNetB7 (Base) | 87.9% | 0.88 | 0.88 | 0.88 | 82.2% | 0.86 | 0.82 | 0.82 |
| MobileNet (Base) | 67.0% | 0.72 | 0.67 | 0.69 | 84.6% | 0.85 | 0.85 | 0.83 |
| MobileNetV2 (Base) | 52.0% | 0.59 | 0.52 | 0.50 | 48.1% | 0.57 | 0.48 | 0.47 |
| ResNet50 (Base) | 82.2% | 0.84 | 0.82 | 0.82 | 81.9% | 0.85 | 0.82 | 0.82 |
| ResNet152 (Base) | 84.4% | 0.86 | 0.84 | 0.84 | 83% | 0.85 | 0.83 | 0.83 |
| VGG16 (Base) | 79.8% | 0.84 | 0.80 | 0.80 | 59.6% | 0.61 | 0.60 | 0.57 |
| VGG19 (Base) | 85.2% | 0.86 | 0.85 | 0.85 | 76.3% | 0.81 | 0.76 | 0.76 |
Table 7. Model performance on the testing set using the Adam optimizer at learning rates (LR) of 0.001 and 0.0001, with ten epochs and a batch size of 64. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac (LR 0.001) | Pr (LR 0.001) | Re (LR 0.001) | Fs (LR 0.001) | Ac (LR 0.0001) | Pr (LR 0.0001) | Re (LR 0.0001) | Fs (LR 0.0001) |
|---|---|---|---|---|---|---|---|---|
| EfficientNetB7 (Base) | 83.6% | 0.88 | 0.84 | 0.84 | 86.2% | 0.90 | 0.86 | 0.87 |
| MobileNet (Base) | 65.5% | 0.71 | 0.66 | 0.67 | 70.6% | 0.72 | 0.71 | 0.71 |
| MobileNetV2 (Base) | 48.1% | 0.57 | 0.48 | 0.50 | 60.2% | 0.61 | 0.60 | 0.58 |
| ResNet50 (Base) | 84.1% | 0.85 | 0.84 | 0.84 | 83% | 0.84 | 0.83 | 0.83 |
| ResNet152 (Base) | 84.9% | 0.87 | 0.85 | 0.85 | 87% | 0.88 | 0.87 | 0.87 |
| VGG16 (Base) | 81.1% | 0.85 | 0.81 | 0.82 | 79.5% | 0.81 | 0.80 | 0.80 |
| VGG19 (Base) | 76.6% | 0.80 | 0.77 | 0.77 | 80.3% | 0.83 | 0.80 | 0.80 |
Table 8. Model performance on the testing set using the Adam optimizer at batch sizes (BS) of 8 and 16, with ten epochs and LR = 0.001. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac (BS 8) | Pr (BS 8) | Re (BS 8) | Fs (BS 8) | Ac (BS 16) | Pr (BS 16) | Re (BS 16) | Fs (BS 16) |
|---|---|---|---|---|---|---|---|---|
| EfficientNetB7 (Base) | 87.6% | 0.90 | 0.88 | 0.88 | 91.1% | 0.92 | 0.91 | 0.91 |
| MobileNet (Base) | 69.6% | 0.71 | 0.70 | 0.70 | 66.3% | 0.72 | 0.66 | 0.68 |
| MobileNetV2 (Base) | 57.2% | 0.61 | 0.57 | 0.56 | 60.2% | 0.64 | 0.60 | 0.61 |
| ResNet50 (Base) | 86.2% | 0.87 | 0.86 | 0.86 | 84.9% | 0.87 | 0.85 | 0.85 |
| ResNet152 (Base) | 80.9% | 0.84 | 0.81 | 0.80 | 85.7% | 0.86 | 0.86 | 0.86 |
| VGG16 (Base) | 79.3% | 0.80 | 0.79 | 0.79 | 79.3% | 0.81 | 0.79 | 0.80 |
| VGG19 (Base) | 74.7% | 0.78 | 0.75 | 0.75 | 75.2% | 0.81 | 0.75 | 0.76 |
Table 9. Model performance on the testing set using the Adam optimizer at batch sizes (BS) of 32 and 64, with ten epochs and LR = 0.0001. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac (BS 32) | Pr (BS 32) | Re (BS 32) | Fs (BS 32) | Ac (BS 64) | Pr (BS 64) | Re (BS 64) | Fs (BS 64) |
|---|---|---|---|---|---|---|---|---|
| EfficientNetB7 (Base) | 87.9% | 0.90 | 0.88 | 0.88 | 86.2% | 0.90 | 0.86 | 0.87 |
| MobileNet (Base) | 62.9% | 0.67 | 0.63 | 0.63 | 70.6% | 0.72 | 0.71 | 0.71 |
| MobileNetV2 (Base) | 56.9% | 0.62 | 0.57 | 0.57 | 60.2% | 0.61 | 0.60 | 0.58 |
| ResNet50 (Base) | 83.8% | 0.86 | 0.84 | 0.84 | 83% | 0.84 | 0.83 | 0.83 |
| ResNet152 (Base) | 83% | 0.83 | 0.83 | 0.82 | 87% | 0.88 | 0.87 | 0.87 |
| VGG16 (Base) | 81.4% | 0.83 | 0.81 | 0.82 | 79.5% | 0.81 | 0.80 | 0.80 |
| VGG19 (Base) | 76.6% | 0.82 | 0.77 | 0.77 | 80.3% | 0.83 | 0.80 | 0.80 |
Table 10. Model performance on the testing set using the Adam optimizer with the optimal hyperparameters (batch size = 16, LR = 0.0001) and ten epochs. Ac—accuracy, Pr—precision, Re—recall, Fs—F1-score.

| Algorithm | Ac | Pr | Re | Fs |
|---|---|---|---|---|
| EfficientNetB7 (Base) | 91.1% | 0.92 | 0.91 | 0.91 |
| MobileNet (Base) | 66.3% | 0.72 | 0.66 | 0.68 |
| MobileNetV2 (Base) | 60.2% | 0.64 | 0.60 | 0.61 |
| ResNet50 (Base) | 84.9% | 0.87 | 0.85 | 0.85 |
| ResNet152 (Base) | 85.7% | 0.86 | 0.86 | 0.86 |
| VGG16 (Base) | 79.3% | 0.81 | 0.79 | 0.80 |
| VGG19 (Base) | 75.2% | 0.81 | 0.75 | 0.76 |