Interior Design Evaluation Based on Deep Learning: A Multi-Modal Fusion Evaluation Mechanism
Abstract
1. Introduction
2. Related Work
2.1. Visual Question Answering (VQA)
2.2. Question Answering in 3D Scene
2.3. Aspect-Based Sentiment Analysis Model (ABSA)
3. Method
3.1. 3D Question Answering
3.2. Text Sentiment Analysis Model
Regularization and Loss Function
3.3. Proposed 3D Visual Question-Answering Model
3.4. Procedure of Interior Design Evaluation Model
4. Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wei, Z.; Zhang, J.; Shen, X.; Lin, Z.; Mech, R.; Hoai, M.; Samaras, D. Good view hunting: Learning photo composition from dense view pairs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5437–5446. [Google Scholar]
- Xie, D.; Hu, P.; Sun, X.; Pirk, S.; Zhang, J.; Mech, R.; Kaufman, A.E. GAIT: Generating aesthetic indoor tours with deep reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 7409–7419. [Google Scholar]
- Shao, Z.; Yu, Z.; Wang, M.; Yu, J. Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14974–14983. [Google Scholar]
- Azuma, D.; Miyanishi, T.; Kurita, S.; Kawanabe, M. ScanQA: 3D question answering for spatial scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 19129–19139. [Google Scholar]
- Parelli, M.; Delitzas, A.; Hars, N.; Vlassis, G.; Anagnostidis, S.; Bachmann, G.; Hofmann, T. CLIP-guided vision-language pre-training for question answering in 3D scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5606–5611. [Google Scholar]
- Dwedari, M.M.; Nießner, M.; Chen, Z. Generating context-aware natural answers for questions in 3D scenes. arXiv 2023, arXiv:2310.19516. [Google Scholar]
- Zhu, L.; Xu, M.; Bao, Y.; Xu, Y.; Kong, X. Deep learning for aspect-based sentiment analysis: A review. PeerJ Comput. Sci. 2022, 8, e1044. [Google Scholar] [CrossRef] [PubMed]
- Cambria, E.; Das, D.; Bandyopadhyay, S.; Feraco, A. A Practical Guide to Sentiment Analysis; Springer: Cham, Switzerland, 2017; Volume 5. [Google Scholar]
- Malinowski, M.; Fritz, M. A multi-world approach to question answering about real-world scenes based on uncertain input. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
- Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; Zhang, L. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6077–6086. [Google Scholar]
- Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C.L.; Parikh, D. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 2425–2433. [Google Scholar]
- Yu, Z.; Yu, J.; Cui, Y.; Tao, D.; Tian, Q. Deep modular co-attention networks for visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6281–6290. [Google Scholar]
- Jain, U.; Zhang, Z.; Schwing, A.G. Creativity: Generating diverse questions using variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5415–5424. [Google Scholar]
- Li, Y.; Duan, N.; Zhou, B.; Chu, X.R.; Ouyang, W.; Wang, X. Visual question generation as dual task of visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6116–6124. [Google Scholar]
- Jang, Y.; Song, Y.; Kim, C.D.; Yu, Y.; Kim, Y.; Kim, G. Video question answering with spatio-temporal reasoning. Int. J. Comput. Vis. 2019, 127, 1385–1412. [Google Scholar] [CrossRef]
- Lei, J.; Li, L.; Zhou, L.; Gan, Z.; Berg, T.L.; Bansal, M.; Liu, J. Less is more: ClipBERT for video-and-language learning via sparse sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7331–7341. [Google Scholar]
- Chou, S.-H.; Chao, W.-L.; Lai, W.-S.; Sun, M.; Yang, M.-H. Visual question answering on 360° images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1607–1616. [Google Scholar]
- Bao, H.; Wang, W.; Dong, L.; Liu, Q.; Mohammed, O.K.; Aggarwal, K.; Som, S.; Piao, S.; Wei, F. VLMo: Unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst. 2022, 35, 32897–32912. [Google Scholar]
- Herrera, D.A.B. Towards an Image-Term Co-Occurrence Model for Multilingual Terminology Alignment and Cross-Language Image Indexing. Ph.D. Thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2014. [Google Scholar]
- Zitkovich, B.; Yu, T.; Xu, S.; Xu, P.; Xiao, T.; Xia, F.; Wu, J.; Wohlhart, P.; Welker, S.; Wahid, A.; et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. In Proceedings of the Conference on Robot Learning, PMLR, Atlanta, GA, USA, 6–9 November 2023; pp. 2165–2183. [Google Scholar]
- Lu, J.; Batra, D.; Parikh, D.; Lee, S. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 13–23. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5105–5114. [Google Scholar]
- Cao, A.-Q.; De Charette, R. MonoScene: Monocular 3D semantic scene completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3991–4001. [Google Scholar]
- Chauhan, G.S.; Nahta, R.; Meena, Y.K.; Gopalani, D. Aspect-based sentiment analysis using deep learning approaches: A survey. Comput. Sci. Rev. 2023, 49, 100576. [Google Scholar] [CrossRef]
- Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
- Tang, D.; Qin, B.; Liu, T. Aspect-level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 214–224. [Google Scholar]
- Zeng, J.; Liu, T.; Jia, W.; Zhou, J. Relation construction for aspect-level sentiment classification. Inf. Sci. 2022, 586, 209–223. [Google Scholar] [CrossRef]
- Huang, B.; Ou, Y.; Carley, K.M. Aspect-level sentiment classification with attention-over-attention neural networks. In Social, Cultural, and Behavioral Modeling: 11th International Conference, SBP-BRiMS 2018, Washington, DC, USA, 10–13 July 2018; Springer: Cham, Switzerland, 2018; pp. 197–206. [Google Scholar]
- Tan, X.; Cai, Y.; Zhu, C. Recognizing conflict opinions in aspect-level sentiment classification with dual attention networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3426–3431. [Google Scholar]
- Vu, T.; Kim, K.; Luu, T.M.; Nguyen, T.; Yoo, C.D. SoftGroup for 3D instance segmentation on point clouds. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 2698–2707. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual graph convolutional networks for aspect-based sentiment analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 6319–6329. [Google Scholar]
- Ma, L.; Zhang, P.; Luo, D.; Zhu, X.; Zhou, M.; Liang, Q.; Wang, B. Syntax-based graph matching for knowledge base question answering. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 8227–8231. [Google Scholar]
- Chen, D.Z.; Chang, A.X.; Nießner, M. ScanRefer: 3D object localization in RGB-D scans using natural language. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 202–221. [Google Scholar]
- Zhang, Q.; Zheng, S. Interior Design Materials Collection; China Architecture & Building Press: Beijing, China, 1991. [Google Scholar]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5828–5839. [Google Scholar]
Model | BLEU-1 | BLEU-4 | Rouge | Meteor |
---|---|---|---|---|
Gen3DQA+DGCN (single object) | 36.22 | 10.31 | 34.15 | 63.82 |
Gen3DQA+DGCN (multiple object) | 36.71 | 10.17 | 33.29 | 63.17 |
Gen3DQA+DGCN (w/o object localization) | 35.83 | 9.89 | 31.85 | 60.85 |
Model | BLEU-1 | BLEU-4 | Rouge | Meteor |
---|---|---|---|---|
Gen3DQA+DGCN | 39.72 | 12.41 | 36.32 | 72.41 |
Baseline1+DGCN | 34.14 | 9.37 | 31.36 | 66.37 |
Baseline2+DGCN | 29.78 | 8.47 | 28.46 | 61.48 |
Baseline3+DGCN | 32.58 | 8.92 | 29.33 | 66.92 |
VoteNet+MCAN | 29.46 | 6.08 | 30.97 | 12.07 |
ScanRefer+MCAN | 27.85 | 7.46 | 30.68 | 11.97 |
Gen3DQA+ATAE-LSTM | 38.72 | 10.46 | 33.74 | 71.39 |
Gen3DQA+DGEDT | 39.43 | 11.18 | 33.98 | 70.38 |
Gen3DQA+InterGCN | 39.15 | 11.84 | 35.17 | 70.93 |
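For readers who want to reproduce scores of this kind, the snippet below is a minimal sketch of sentence-level BLEU-1/BLEU-4, ROUGE, and METEOR scoring using the open-source nltk and rouge-score packages. The question-answer pair is hypothetical, the Rouge column is assumed to report ROUGE-L (as is common for QA evaluation), and the paper's own evaluation pipeline is not reproduced here.

```python
# Minimal sketch: sentence-level BLEU-1/BLEU-4, ROUGE-L, and METEOR for one
# hypothetical 3D-QA answer pair. Assumes `pip install nltk rouge-score`.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)   # METEOR matches synonyms via WordNet
nltk.download("omw-1.4", quiet=True)

# Hypothetical ground-truth answer and model answer for one question.
reference = "the brown wooden chair is next to the round dining table"
candidate = "a wooden chair stands beside the dining table"
ref_tok, cand_tok = reference.split(), candidate.split()

smooth = SmoothingFunction().method1   # avoids zero BLEU on short answers
bleu1 = sentence_bleu([ref_tok], cand_tok, weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
bleu4 = sentence_bleu([ref_tok], cand_tok, weights=(0.25,) * 4,
                      smoothing_function=smooth)

# rouge-score reports precision/recall/F1; papers usually quote the F-measure.
rouge_l = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(
    reference, candidate)["rougeL"].fmeasure

meteor = meteor_score([ref_tok], cand_tok)  # expects pre-tokenized input

print(f"BLEU-1 {100*bleu1:.2f} | BLEU-4 {100*bleu4:.2f} | "
      f"ROUGE-L {100*rouge_l:.2f} | METEOR {100*meteor:.2f}")
```

Corpus-level numbers such as those in the tables would aggregate these per-answer scores over the full test split.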
Model | Accuracy (%) |
---|---|
Gen3DQA+DGCN | 91.53 |
Baseline1+DGCN | 74.82 |
Baseline2+DGCN | 69.06 |
Baseline3+DGCN | 71.92 |
VoteNet+MCAN | 70.38 |
ScanRefer+MCAN | 72.49 |
Gen3DQA+ATAE-LSTM | 76.39 |
Gen3DQA+DGEDT | 84.29 |
Gen3DQA+InterGCN | 81.53 |
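The accuracies above reduce to the fraction of aspect-sentiment predictions that match the gold labels; as a minimal sketch with hypothetical labels (assuming scikit-learn is available):

```python
# Minimal sketch: aspect-level sentiment accuracy over hypothetical predictions.
from sklearn.metrics import accuracy_score

gold = ["positive", "negative", "neutral", "positive", "positive"]
pred = ["positive", "negative", "positive", "positive", "positive"]

print(f"Accuracy: {100 * accuracy_score(gold, pred):.2f}%")  # -> 80.00%
```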
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).