Multi-Scale Feature Fusion for Interior Style Detection
Abstract
1. Introduction
- (1) We propose a new multi-scale feature fusion method for interior style detection that combines BoVW, color information, SPM, and object detection. The proposed method outperforms conventional BoVW methods and a residual network (ResNet) in terms of accuracy.
- (2) Our method confirms that combining texture and color information improves interior style detection. Specifically, CIELAB is more effective as color information than red-green-blue (RGB); a conversion sketch follows this list.
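Since contribution (2) rests on the choice of color space, the following minimal sketch shows how an image can be converted from RGB to CIELAB with OpenCV. The file name is illustrative, and the snippet is not taken from the authors' code.

```python
import cv2

# Load an interior image (file name is illustrative).
img_bgr = cv2.imread("living_room.jpg")

# OpenCV stores images in BGR order; convert to CIELAB, which
# separates lightness (L) from the chromatic a/b axes, so color
# clusters are less entangled with illumination than in RGB.
img_lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
L, a, b = cv2.split(img_lab)
```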
2. Related Works
3. Proposed Method
- Step-1: Preprocessing. Resize each training image to $256 \times 256$ pixels to create the training images $I_n$ $(n = 1, \dots, N)$. Note that there are $P$ images in each of the $S$ styles.
- Step-2: Extract local features. Extract 64-dimensional SURF features $x_{n,i}$ for each $I_n$.
- Step-3: Create VWs. Cluster the local features of all training images to create the VWs $V = \{v_1, \dots, v_k\}$ of the training images.
- Step-4: Create histograms. Create a histogram $h_n$ for each training image by assigning each local feature $x_{n,i}$ to its nearest VW in $V$. Then, normalize the histogram to calculate the image feature $f_n$ for each training image.
- Step-5: Extract representative colors. Cluster the color information of each training image into $k_c$ clusters and extract the representative color vector $c_n$. Then, normalize $c_n$ by the maximum value of the color space to calculate the normalized representative color vector $\hat{c}_n$.
- Step-6: Create color histograms. Create a color histogram $h^{c}_n$ from the color information of each training image $I_n$ based on the representative color vector $\hat{c}_n$. Then, normalize $h^{c}_n$ to calculate the color feature $f^{c}_n$.
- Step-7: Create low-level regions. Detect $M$ objects in each training image $I_n$ using YOLO to create the low-level regions $r_{n,j}$ $(j = 1, \dots, M)$.
- Step-8: Create features for low-level regions. Create the histogram $f_{n,j}$, representative color vector $\hat{c}_{n,j}$, and color histogram $f^{c}_{n,j}$ for each region $r_{n,j}$ as in Steps 2–6.
- Step-9: Create image features for low-level regions. Concatenate the histogram $f_{n,j}$, representative color vector $\hat{c}_{n,j}$, and color histogram $f^{c}_{n,j}$ of each low-level region, multiplied by the region weight $w_j$, to create the region feature $g_{n,j} = w_j\,[f_{n,j},\ \hat{c}_{n,j},\ f^{c}_{n,j}]$.
- Step-10: Create image features for scene images. Create the image feature $F_n$ for each training image $I_n$ based on the image features $g_{n,j}$ of its low-level regions (see the sketches following this list).
4. Evaluation Experiment
4.1. Experimental Settings
4.2. Comparison Method
4.3. Parameter Settings
4.4. Performance Evaluation
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Methods | Abbreviation | − (Removed Steps) | + (Modified Steps) |
|---|---|---|---|
| Conventional | – | Steps 5–10 | – |
| Histogram + Color | HC | Steps 5–10 | Step 2 → Extract 64-dimensional SURF features $x_{n,i}$ for $I_n$. Then, extract the color feature corresponding to the pixels of each SURF keypoint from the color information of each training image $I_n$. Finally, concatenate the SURF feature with the color feature to create a 65-dimensional local feature $x'_{n,i}$. |
| Histogram + Color Vector | H+CV | Steps 6–10 | Step 10 → Create image features by concatenating the histogram $f_n$ with the representative color vector $\hat{c}_n$. |
| Histogram + Histogram | H+H | Steps 7–10 | Step 10 → Create image features by concatenating the two histograms. |
| Histogram + Color Histogram | H+CH | Steps 7–10 | Step 10 → Create image features by concatenating the histogram $f_n$ with the color histogram $f^{c}_n$. |
| SPM + Color Histogram | S+CH | Step 7 | Step 7 → Based on the number of hierarchical levels $l$, the number of divisions per side $d$ is calculated, and each training image is divided into blocks of $256/d \times 256/d$ pixels to create the subdivided images $I_{n,j}$ (see the sketch after this table). |
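For the S+CH variant, Step 7's YOLO regions are replaced by an SPM-style grid. The following sketch shows one standard way to generate the subdivided images, assuming $d = 2^l$ divisions per side at pyramid level $l$; the paper's exact division rule may differ.

```python
def spm_cells(img, levels=1):
    """Subdivide a 256x256 image (NumPy array) into SPM grid cells.

    At level l the image is split into 2**l x 2**l blocks of
    (256 / 2**l) x (256 / 2**l) pixels, for levels 0..levels.
    """
    cells = []
    h, w = img.shape[:2]
    for l in range(levels + 1):
        d = 2 ** l                      # divisions per side
        bh, bw = h // d, w // d         # block size: 256/d x 256/d
        for i in range(d):
            for j in range(d):
                cells.append(img[i * bh:(i + 1) * bh,
                                 j * bw:(j + 1) * bw])
    return cells
```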
| Methods | H+CV | H+H | H+CH | S+CH | Proposed |
|---|---|---|---|---|---|
| Optimal number | 45 | 50 | 25 | 50 | 20 |
| Methods \ Label | Japanese | Modern | Rustic | Scandinavian | Traditional |
|---|---|---|---|---|---|
| ResNet | 0.567 | 0.686 | 0.747 | 0.549 | 0.861 |
| Conventional | 0.721 | 0.358 | 0.725 | 0.591 | 0.464 |
| HC | 0.774 | 0.523 | 0.751 | 0.713 | 0.633 |
| H+CV | 0.830 | 0.537 | 0.714 | 0.704 | 0.552 |
| H+H | 0.726 | 0.391 | 0.721 | 0.676 | 0.568 |
| H+CH | 0.846 | 0.592 | 0.755 | 0.707 | 0.569 |
| S+CH | 0.847 | 0.603 | 0.799 | 0.758 | 0.599 |
| Proposed | 0.909 | 0.623 | 0.789 | 0.784 | 0.553 |
| Styles | All | Japanese | Modern | Rustic | Scandinavian | Traditional |
|---|---|---|---|---|---|---|
| p-values | 0.006 | 0.286 | 0.072 | 0.938 | | |
| Label \ Estimated Label | Modern FP proportion | Modern FN ratio | Rustic FP proportion | Rustic FN ratio | Traditional FP proportion | Traditional FN ratio |
|---|---|---|---|---|---|---|
| Japanese | 0.014 | 0.028 | 0.013 | 0.039 | 0.007 | 0.000 |
| Modern | – | – | 0.039 | 0.045 | | |
| Rustic | 0.048 | 0.041 | – | – | 0.040 | |
| Scandinavian | 0.110 | 0.006 | 0.006 | 0.040 | 0.033 | |
| Traditional | 0.090 | 0.039 | | | – | – |
| Label \ Estimated Label | Modern FP proportion | Modern FN ratio | Rustic FP proportion | Rustic FN ratio | Traditional FP proportion | Traditional FN ratio |
|---|---|---|---|---|---|---|
| Japanese | 0.020 | 0.020 | 0.045 | 0.026 | 0.029 | 0.022 |
| Modern | – | – | 0.006 | 0.065 | 0.223 | |
| Rustic | 0.067 | 0.067 | – | – | 0.101 | 0.187 |
| Scandinavian | 0.107 | 0.148 | 0.000 | 0.013 | 0.072 | 0.101 |
| Traditional | | | | | – | – |