Indoor Scene Recognition Mechanism Based on Direction-Driven Convolutional Neural Networks
Abstract
1. Introduction
- A novel direction-driven architecture of CNNs is introduced to improve indoor scene recognition accuracy. Off-the-shelf pretrained CNNs have predefined architectures with a fixed input size, which limits the additional information that can be provided as input. We propose an image classification system guided by supplementary information: the magnetic heading direction of the smartphone assists vision-based indoor scene recognition, helping the system identify specific indoor rooms while taking multiple viewpoints into account.
- A hybrid computing approach is proposed to address latency, scalability, and privacy challenges. In general, meeting the computational requirements of DL with the limited resources of handheld devices is not feasible. Several works have combined on-device computing with edge computing and/or cloud computing, resulting in hybrid architectures [20]. We take advantage of these new computing techniques to propose a global system computing strategy that meets users’ needs.
- While several indoor and/or outdoor localization datasets exist in the literature, none of them integrates information other than images. To overcome this issue, we provide a dataset containing images with their respective magnetic heading direction in the metadata.
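As an illustration, such a dataset can be handled as images paired with the magnetic heading recorded at capture time. The CSV layout and field names below are assumptions for the sketch, not the dataset's actual format:

```python
import csv
import io

# Hypothetical metadata file: one row per image, with the magnetic
# heading (degrees clockwise from magnetic north) captured at shot time.
metadata_csv = io.StringIO(
    "image,heading_deg\n"
    "room_a_001.jpg,12.5\n"
    "room_b_017.jpg,203.0\n"
)

def load_samples(fp):
    """Yield (image_filename, heading) pairs from a metadata CSV."""
    for row in csv.DictReader(fp):
        yield row["image"], float(row["heading_deg"])

samples = list(load_samples(metadata_csv))
```

In practice the heading could equally live in each image's EXIF metadata; a sidecar CSV is used here only to keep the sketch self-contained.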
2. Background
2.1. Convolutional Neural Networks with Transfer Learning
2.2. Lightweight Convolutional Neural Networks
2.3. Visual Place Recognition
2.4. Magnetic Field in Localization Applications
- The device: different sensors have varying precision, sensitivity, and stability. Various built-in sensors and algorithms utilized by smartphone manufacturers lead to different magnetic field measurements.
- The user’s surroundings: other electronic devices commonly found inside buildings cause interference and magnetic perturbation. The omnipresent magnetic field is disrupted by ferromagnetic materials used in buildings, affecting magnetic field measurements and causing inaccurate direction and position information.
2.5. Computing Strategies
- Bandwidth and scalability: Bandwidth is a main issue in cloud-based computation, exacerbated by the increasing number of connected mobile devices and growing data transfer volumes. Likewise, as the number of connected devices increases, sending data from mobile devices to the cloud introduces scalability problems, since the cloud entry point can become a bottleneck.
- Latency: Cloud-based computation may not always be a suitable solution when working on real-time applications, as data transfer to the cloud may suffer from extra network queuing and transmission delays, leading to high latency.
- Service availability: Due to wireless bandwidth limitations, network disconnection, and signal attenuation, a connection to the cloud might not always be possible. A sudden internet outage stops application functionalities, as cloud-assisted systems rely on the network to transfer data from users’ mobile devices to the cloud server and vice versa.
- Privacy: The data sent from end-user devices to the cloud may contain sensitive information, leading to privacy concerns. Data sharing and storage in the European Union and the European Economic Area must comply with the General Data Protection Regulation (GDPR), an EU regulation on data protection and privacy.
- Scalability: Cloud and edge servers offer high-performance computing capabilities, which enables the efficient execution of challenging tasks. The ability to scale resources dynamically based on the workload or demand ensures that the computational requirements of the tasks can be met effectively.
- Network bandwidth: Offloading computationally intensive tasks to servers minimizes the quantity of data that needs to be transferred across the network, which is important when bandwidth is limited. The overall network traffic can be reduced by transmitting only the necessary inputs and receiving only the processed results.
- Latency: Determining which tasks to offload to servers is critical to reduce latency. Offloading computationally complex operations that benefit from server-side processing can improve real-time performance and reduce the total response time, while lightweight tasks can be kept on the mobile device for faster execution.
- Centralized maintenance and updates: When computationally intensive tasks are offloaded to servers, the server infrastructure carries the main responsibility of maintaining and updating the system. This decreases the complexity and effort necessary for maintenance and updating of each mobile device, simplifying overall system management.
- Energy: Hybrid computing architectures can help with energy efficiency. Energy consumption can be lowered by executing lightweight tasks or initial processing on-device. Edge computing lowers the requirement for long-distance data transmission, saving even more energy. Using cloud servers for resource-intensive tasks allows for more efficient server infrastructure use and potentially reduced power consumption.
- Privacy: When employing hybrid computing architectures, privacy is a crucial factor, especially when external servers are involved. To guarantee that privacy requirements are respected, task offloading policies should be carefully considered. Offloading only non-sensitive data to servers while retaining sensitive data on the mobile device can help to preserve user privacy.
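A task-offloading policy along these lines can be sketched as a simple rule: keep sensitive or lightweight tasks on-device, and send only heavy, non-sensitive ones to a server. The task fields and the compute threshold below are illustrative assumptions, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    flops: float       # estimated compute cost of the task
    sensitive: bool    # does the task input carry user-identifiable data?

# Illustrative threshold: offload only when the task is heavy enough
# to be worth the network round trip (assumed value).
OFFLOAD_FLOPS_THRESHOLD = 1e9

def placement(task: Task) -> str:
    """Decide where a task runs under a privacy-first offloading policy."""
    if task.sensitive:
        return "on-device"   # sensitive data never leaves the phone
    if task.flops < OFFLOAD_FLOPS_THRESHOLD:
        return "on-device"   # cheap tasks run locally for low latency
    return "server"          # heavy, non-sensitive work is offloaded
```

A real policy would also weigh battery level, link quality, and server load, but the sensitivity-first ordering captures the privacy constraint discussed above.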
3. Proposed Approach
3.1. Localization System Architecture
Algorithm 1: Inference classification methodology
Input: Query image, magnetic heading
Output: Prediction of the specific indoor room
3.1.1. Selection Block
- Between north and east (i.e., heading in [0°, 90°)): select CNN A and CNN B;
- Between east and south (i.e., heading in [90°, 180°)): select CNN B and CNN C;
- Between south and west (i.e., heading in [180°, 270°)): select CNN C and CNN D;
- Between west and north (i.e., heading in [270°, 360°)): select CNN D and CNN A.
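The four rules above amount to picking the two CNNs whose quadrants border the measured heading. A minimal sketch, assuming headings in degrees with 0° = north, 90° = east, and half-open quadrant intervals:

```python
def select_cnns(heading_deg: float) -> tuple:
    """Return the pair of quadrant CNNs to query for a given magnetic heading."""
    h = heading_deg % 360.0          # normalize into [0, 360)
    if h < 90.0:                     # between north and east
        return ("CNN A", "CNN B")
    if h < 180.0:                    # between east and south
        return ("CNN B", "CNN C")
    if h < 270.0:                    # between south and west
        return ("CNN C", "CNN D")
    return ("CNN D", "CNN A")        # between west and north
```

The modulo normalization also handles negative or >360° readings that can come out of sensor fusion on the device.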
3.1.2. Image Classification Models
3.1.3. Fusion and Decision Block
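A weighted fusion of the two selected CNNs' outputs can be sketched as a convex combination of their probability vectors. The linear weighting below, which trusts the first CNN more at the start of its quadrant and the second more toward the end, illustrates the idea rather than reproducing the paper's exact formula:

```python
def linear_weight(heading_deg: float) -> float:
    """Weight for the first selected CNN, falling linearly from 1 to 0
    as the heading sweeps across its 90-degree quadrant (illustrative)."""
    pos = (heading_deg % 360.0) % 90.0   # position inside the quadrant
    return 1.0 - pos / 90.0

def fuse(p1, p2, heading_deg: float):
    """Convex combination of two per-class probability vectors."""
    w = linear_weight(heading_deg)
    return [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
```

The decision step is then a simple argmax over the fused vector; a cosinusoidal variant would replace the linear ramp with a smooth cosine-shaped weight.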
4. Global System Architecture
4.1. Computing and Partitioning Deep Learning Tasks
4.2. Partitioning of the Proposed Model
5. Experiments and Results
5.1. Dataset Preparation
5.2. CNN Training and System Testing
5.3. Performance Evaluation
5.4. Stability Analysis: Effect of Sensor Accuracy on the System
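The effect of sensor accuracy can be probed by perturbing the measured heading with a Gaussian error (the heading error e, with a given mean and standard deviation, as in the symbols table) and measuring how often the perturbed heading still selects the same CNN pair. The error model and parameter values here are assumptions for the sketch:

```python
import random

def quadrant(h: float) -> int:
    """Quadrant index for a heading in degrees (0=N..E, 1=E..S, 2=S..W, 3=W..N)."""
    return int((h % 360.0) // 90.0)

def noisy_heading(heading_deg: float, mu: float = 0.0, sigma: float = 10.0) -> float:
    """Perturb a heading with a Gaussian error e ~ N(mu, sigma), wrapped to [0, 360)."""
    return (heading_deg + random.gauss(mu, sigma)) % 360.0

def stability(heading_deg: float, sigma: float, trials: int = 10_000) -> float:
    """Fraction of noisy measurements that keep the same quadrant
    (and therefore the same selected CNN pair)."""
    q = quadrant(heading_deg)
    hits = sum(quadrant(noisy_heading(heading_deg, sigma=sigma)) == q
               for _ in range(trials))
    return hits / trials
```

As expected, selection is far more stable for headings near a quadrant center than for headings close to a quadrant boundary, which is where a two-CNN selection rule pays off.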
5.5. Model Analysis and Partitioning for Inference
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Symbol | Definition |
|---|---|
| CNN | Convolutional neural network |
| DL | Deep learning |
| VPR | Visual place recognition |
| | Smartphone camera magnetic heading |
| p | Probability vector corresponding to the image inference output of a CNN |
| | Modified smartphone camera magnetic heading |
| | Weighted parameter of the fusion method |
| | Hyperparameter for piecewise linear weighted fusion |
| e | Magnetic heading error |
| | Mean |
| | Standard deviation |
| | Smartphone camera magnetic heading subject to error |
| ONNX | Open Neural Network Exchange |
References
- Asaad, S.M.; Maghdid, H.S. A Comprehensive Review of Indoor/Outdoor Localization Solutions in IoT era: Research Challenges and Future Perspectives. Comput. Netw. 2022, 212, 109041.
- Bulusu, N.; Heidemann, J.; Estrin, D. GPS-less low-cost outdoor localization for very small devices. IEEE Pers. Commun. 2000, 7, 28–34.
- Low, R.; Tekler, Z.D.; Cheah, L. An end-to-end point of interest (POI) conflation framework. ISPRS Int. J. Geo-Inf. 2021, 10, 779.
- Liu, F.; Liu, J.; Yin, Y.; Wang, W.; Hu, D.; Chen, P.; Niu, Q. Survey on WiFi-based indoor positioning techniques. IET Commun. 2020, 14, 1372–1383.
- Tekler, Z.D.; Low, R.; Gunay, B.; Andersen, R.K.; Blessing, L. A scalable Bluetooth Low Energy approach to identify occupancy patterns and profiles in office spaces. Build. Environ. 2020, 171, 106681.
- Cheng, S.; Wang, S.; Guan, W.; Xu, H.; Li, P. 3DLRA: An RFID 3D indoor localization method based on deep learning. Sensors 2020, 20, 2731.
- Tekler, Z.D.; Chong, A. Occupancy prediction using deep learning approaches across multiple space types: A minimum sensing strategy. Build. Environ. 2022, 226, 109689.
- Vogel, J.; Schiele, B. Semantic modeling of natural scenes for content-based image retrieval. Int. J. Comput. Vis. 2007, 72, 133–157.
- Liu, S.; Tian, G. An indoor scene classification method for service robot based on CNN feature. J. Robot. 2019, 2019.
- Sreenu, G.; Durai, M.S. Intelligent video surveillance: A review through deep learning techniques for crowd analysis. J. Big Data 2019, 6, 48.
- Ma, W.; Xiong, H.; Dai, X.; Zheng, X.; Zhou, Y. An indoor scene recognition-based 3D registration mechanism for real-time AR-GIS visualization in mobile applications. ISPRS Int. J. Geo-Inf. 2018, 7, 112.
- Morar, A.; Moldoveanu, A.; Mocanu, I.; Moldoveanu, F.; Radoi, I.E.; Asavei, V.; Gradinaru, A.; Butean, A. A comprehensive survey of indoor localization methods based on computer vision. Sensors 2020, 20, 2641.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS’12, Lake Tahoe, NV, USA, 3–8 December 2012; Volume 1, pp. 1097–1105.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- Tang, P.; Wang, H.; Kwong, S. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing 2017, 225, 188–197.
- Zeng, D.; Liao, M.; Tavakolian, M.; Guo, Y.; Zhou, B.; Hu, D.; Pietikäinen, M.; Liu, L. Deep Learning for Scene Classification: A Survey. arXiv 2021, arXiv:2101.10531.
- AlShamaa, D.; Chehade, F.; Honeine, P.; Chkeir, A. An Evidential Framework for Localization of Sensors in Indoor Environments. Sensors 2020, 20, 318.
- AlShamaa, D.; Chehade, F.; Honeine, P. Tracking of Mobile Sensors Using Belief Functions in Indoor Wireless Networks. IEEE Sens. J. 2018, 18, 310–319.
- Guo, W.; Wu, R.; Chen, Y.; Zhu, X. Deep learning scene recognition method based on localization enhancement. Sensors 2018, 18, 3376.
- Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674.
- Zheng, L.; Yang, Y.; Tian, Q. SIFT meets CNN: A decade survey of instance retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1224–1244.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Hussain, M.; Bird, J.J.; Faria, D.R. A study on CNN transfer learning for image classification. In Advances in Computational Intelligence Systems: Contributions Presented at the 18th UK Workshop on Computational Intelligence, Nottingham, UK, 5–7 September 2018; Springer: Cham, Switzerland, 2019; pp. 191–202.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–25 September 1999; Volume 2, pp. 1150–1157.
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
- Konlambigue, S.; Pothin, J.B.; Honeine, P.; Bensrhair, A. Performance Evaluation of State-of-the-art Filtering Criteria Applied to SIFT Features. In Proceedings of the 19th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Ajman, United Arab Emirates, 10–12 December 2019.
- Garg, S.; Fischer, T.; Milford, M. Where Is Your Place, Visual Place Recognition? In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Montreal, QC, Canada, 21 August 2021; pp. 4416–4425.
- Quattoni, A.; Torralba, A. Recognizing indoor scenes. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 413–420.
- Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2169–2178.
- Xiao, J.; Hays, J.; Ehinger, K.A.; Oliva, A.; Torralba, A. SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3485–3492.
- Patterson, G.; Hays, J. SUN attribute database: Discovering, annotating, and recognizing scene attributes. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2751–2758.
- Schubert, S.; Neubert, P. What makes visual place recognition easy or hard? arXiv 2021, arXiv:2106.12671.
- Ashraf, I.; Hur, S.; Park, Y. Smartphone sensor based indoor positioning: Current status, opportunities, and future challenges. Electronics 2020, 9, 891.
- Tiglao, N.M.; Alipio, M.; Cruz, R.D.; Bokhari, F.; Rauf, S.; Khan, S.A. Smartphone-based indoor localization techniques: State-of-the-art and classification. Measurement 2021, 179, 109349.
- Liu, Z.; Zhang, L.; Liu, Q.; Yin, Y.; Cheng, L.; Zimmermann, R. Fusion of magnetic and visual sensors for indoor localization: Infrastructure-free and more effective. IEEE Trans. Multimed. 2016, 19, 874–888.
- Ashraf, I.; Hur, S.; Park, Y. Application of deep convolutional neural networks and smartphone sensors for indoor localization. Appl. Sci. 2019, 9, 2337.
- Reyes Leiva, K.M.; Jaén-Vargas, M.; Codina, B.; Serrano Olmedo, J.J. Inertial measurement unit sensors in assistive technologies for visually impaired people, a review. Sensors 2021, 21, 4767.
- Chien, C. The Hall Effect and Its Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
- Liu, D. Mobile Data and Computation Offloading in Mobile Cloud Computing. Ph.D. Thesis, Université de Technologie de Troyes, Troyes, France, Université de Montréal, Montréal, QC, Canada, 2019.
- Wang, X.; Han, Y.; Leung, V.C.; Niyato, D.; Yan, X.; Chen, X. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 869–904.
- Murshed, M.S.; Murphy, C.; Hou, D.; Khan, N.; Ananthanarayanan, G.; Hussain, F. Machine learning at the network edge: A survey. ACM Comput. Surv. 2021, 54, 170.
- Kang, Y.; Hauswald, J.; Gao, C.; Rovinski, A.; Mudge, T.; Mars, J.; Tang, L. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGARCH Comput. Archit. News 2017, 45, 615–629.
- Xia, C.; Zhao, J.; Cui, H.; Feng, X.; Xue, J. DNNTune: Automatic benchmarking DNN models for mobile-cloud computing. ACM Trans. Archit. Code Optim. 2019, 16, 49.
- Hölzl, M.; Neumeier, R.; Ostermayer, G. Analysis of compass sensor accuracy on several mobile devices in an industrial environment. In Proceedings of the International Conference on Computer Aided Systems Theory, Las Palmas de Gran Canaria, Spain, 10–15 February 2013; pp. 381–389.
| Pretrained Model | Baseline | Proposed Approach | | | |
|---|---|---|---|---|---|
| | | Linear Fusion | Cosinusoidal Fusion | | |
| SqueezeNet | 67.52 ± 1.95 | 81.02 ± 2.75 | 77.50 ± 3.30 | 77.02 ± 3.30 | 79.52 ± 2.62 |
| ShuffleNet | 88.98 ± 2.03 | 92.22 ± 0.41 | 91.40 ± 0.54 | 91.34 ± 0.44 | 91.94 ± 0.69 |
| MobileNet-v2 | 90.66 ± 1.80 | 93.10 ± 0.56 | 92.62 ± 1.03 | 92.50 ± 0.82 | 92.44 ± 0.82 |
| Pre-Trained Model | Baseline | Different Selection Rules with Linear Fusion () | | |
|---|---|---|---|---|
| | | Opposite Quadrant Selection | One Random CNN | One Specific CNN |
| SqueezeNet | 67.52 | 32.96 | 52.96 | 51.91 |
| ShuffleNet | 88.98 | 44.86 | 63.38 | 63.45 |
| MobileNet-v2 | 90.66 | 46.30 | 63.92 | 64.21 |
| Pre-Trained Model | Baseline | Proposed Approach with Linear Fusion () | | | | |
|---|---|---|---|---|---|---|
| SqueezeNet | 67.52 | 81.02 | 80.18 | 79.26 | 77.84 | 75.78 |
| ShuffleNet | 88.98 | 92.22 | 91.46 | 91.24 | 89.12 | 88.10 |
| MobileNet-v2 | 90.66 | 93.10 | 92.86 | 92.60 | 91.30 | 89.60 |
| Framework | Model | Model File Size |
|---|---|---|
| ONNX | Four complete CNNs | 35 MB |
| | Proposed computing strategy | 19.07 MB |
| Submodels | Sub-Model File Size (MB) | Output Data Size (MB) | Computing Strategy |
|---|---|---|---|
| Frozen Layers | 5.31 | | On-device (user side) |
| A (Trained Layers) | | | Cloud or edge server |
| B (Trained Layers) | | | |
| C (Trained Layers) | | | |
| D (Trained Layers) | | | |
Daou, A.; Pothin, J.-B.; Honeine, P.; Bensrhair, A. Indoor Scene Recognition Mechanism Based on Direction-Driven Convolutional Neural Networks. Sensors 2023, 23, 5672. https://doi.org/10.3390/s23125672