1. Introduction
Positioning is one of the key technologies underpinning location-based services such as augmented reality (AR), the Internet of Things (IoT), artificial intelligence (AI), robot navigation, and consumer analytics [1,2]. Outdoor GNSS-based smartphone positioning services can now achieve centimeter-level precision. Indoors, however, GNSS signals are unavailable, and attaining low-cost, reliable, and robust positioning with current indoor systems remains challenging [3]. A key technique in this field is the vision-based indoor positioning of smartphones, which exploits the existing decorative texture of a space in real time and requires no extra resources to modify the indoor environment. Visual localization has therefore attracted considerable attention in the realm of indoor navigation [4]. To address image-based localization, most state-of-the-art techniques rely on local features such as SIFT or SURF [5]. These techniques typically involve two steps: descriptor matching, which establishes 2D–3D correspondences between features extracted from the query image and 3D points in the scene model, and perspective-n-point (PnP) estimation, which recovers the camera's extrinsic parameters from those correspondences.
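To make the second step concrete, the following is a minimal Kotlin sketch of pose recovery from 2D–3D matches, assuming OpenCV's Java bindings; the function name and the zero-distortion assumption are ours, not taken from the cited works.

```kotlin
import org.opencv.calib3d.Calib3d
import org.opencv.core.*

// Sketch: recover camera extrinsics from the 2D-3D correspondences that
// descriptor matching has already produced. The RANSAC variant makes the
// estimate robust to mismatched correspondences.
fun estimatePose(
    points3d: List<Point3>,   // 3D points from the prebuilt scene model
    points2d: List<Point>,    // matching 2D features in the query image
    cameraMatrix: Mat         // 3x3 intrinsics from camera calibration
): Pair<Mat, Mat> {
    val rvec = Mat()          // output rotation (Rodrigues vector)
    val tvec = Mat()          // output translation
    Calib3d.solvePnPRansac(
        MatOfPoint3f(*points3d.toTypedArray()),
        MatOfPoint2f(*points2d.toTypedArray()),
        cameraMatrix,
        MatOfDouble(),        // assume negligible lens distortion
        rvec, tvec
    )
    return rvec to tvec       // pose of the world in the camera frame
}
```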
With the advancement of intelligent system architectures, indoor navigation systems are becoming increasingly important [6]. In any modern society, a navigation system is needed to help people reach their destinations or accomplish their goals quickly and without hassle. To help people navigate large buildings and complex infrastructure, multiple systems have been developed based on ultrasonic sensor positioning, WiFi, RFID localization, etc. [7]. Most existing systems target academic buildings, while some are designed for specific buildings such as shopping malls or grocery shops using indoor mapping techniques. Their main drawback is that they are not generalized and cannot handle the varying situations that occur in different buildings [8]. To cope with current indoor navigation problems, researchers are now focusing on image processing-based intelligent solutions that can automatically track a person and guide him/her along specific paths to the target area [9,10]. There are numerous research works on indoor navigation, but the majority focus on a single area or feature rather than incorporating all of the features necessary for ease of access.
In this paper, we present a novel idea for an intelligent multi-floor navigation system that can be implemented in a variety of scenarios depending on the environment and adapted to the location. We mainly focused on deploying multiple robots, one on each floor of a building, that communicate with each other and with the person who needs navigational assistance. Once a person requests navigation from any one robot, all robots are updated and give directions automatically after recognizing the person using facial recognition algorithms. The proposed system has three main features. The first is speech input, which captures the request of a person who wants to navigate the environment and converts it into text so the algorithm can process it. The second is facial recognition, which automatically identifies a person, once registered, without any further commands. The third is voice broadcasting, which communicates with the person according to their navigational requirements. All of these features are merged into a single Android application that can later be mounted on a mobile or stationary robot for more interactive communication and navigation. Before finalizing the application, we surveyed users about the need for multi-story building navigation. The first step was to find out the percentage of people who face problems or need assistance in a multistory building, as shown in Figure 1. The questionnaire was administered to 37 randomly selected respondents, and Figure 1 shows that more than 85% of them had faced problems.
Furthermore, according to the responses of the 37 participants from different groups, in multistory buildings they mostly have difficulty finding a specific location, and they sometimes need assistance with entry and exit points, as shown in Figure 2 and Figure 3.
The main drawback of earlier indoor navigation systems is that they were developed for one particular location or building and are not applicable to others. To overcome this issue, we developed a generalized indoor navigation system that can be implemented in different environments or adapted to a given environment. The main contributions of this study are as follows.
We propose a multi-story library navigation model based on multiple robots that use Android phones and tablets as communication platforms; once a user enters his or her required book or library area, he or she is guided automatically on each floor of the library;
The proposed system consists of three basic modules: image recognition, speech recognition, and voice broadcasting. It uses image processing techniques to recognize a person, speech recognition to capture the user's requirements, and voice broadcasting to help the user navigate each floor of the building. In the proposed system model, a user registers through an Android app, which collects basic personal data along with real-time images. The real-time images are then used for face recognition to authenticate the user on each floor of the building. After registering, the user is directed by the robots to the required book and receives directions on each floor until the task is completed. The proposed system model is generalized and can be implemented in any indoor environment; and
The results were obtained by surveying the application users at the university library. For this purpose, we added 37 students' data to the application and then recognized each of them separately to measure the accuracy of the image recognition algorithm. For speech recognition, Google voice-to-text is used, and voice broadcasting is used to give appropriate suggestions/guidelines as users enter multiple inputs and receive suggestions from the robots.
The remainder of the paper is organized as follows: Section 2 reviews the literature on indoor navigation systems. Section 3 describes the methodology adopted in this research study. Section 4 discusses the experimental and simulation results in detail. Finally, Section 5 concludes with the overall theme and findings of this research study.
2. Literature Review
With the rapid advancement of photogrammetry, object recognition, and optical imaging technology, it is now possible to acquire images quickly and affordably, extract and match precise and effective image features, and quickly solve for the projection matrix and other exterior orientation parameters [11]. Image-based visual positioning offers improved precision, context-rich knowledge, and excellent visual impact, and it has the potential to provide an affordable and precise active indoor positioning solution. As a result, academics worldwide have extensively examined visual positioning technologies. Besides the advantages and extensive use of different indoor positioning techniques, these techniques also have limitations, which various researchers have discussed in their studies. For example, Lluvia et al. [12] state that indoor navigation faces problems such as mapping the environment, indoor positioning, and trajectory planning. Further, ISO 17438-1:2016 treats indoor positioning and mapping as subcategories of navigational applications [13]. In general terms, navigation can also be regarded as a multi-field application alongside asset management, mapping, tracking, localization, and so on. From our literature review, we conclude that indoor navigation, mapping, and indoor positioning are the most fundamental technologies, and that their requirements change with the circumstances. We divided the literature for this study into several subcategories: indoor navigation systems, indoor mapping, face detection and tracking, speech recognition, and voice broadcasting.
Nowadays, most research in indoor navigation is based on image processing technologies, whose use in numerous fields has produced many methods and systems. For example, in reference [14], an indoor navigation system for blind people was developed as a mobile application in which color patterns around the user are detected using image processing techniques, thereby assisting indoor navigation. Researchers classify indoor navigation into several categories, such as computer vision, which uses omnidirectional cameras, 3D cameras, or built-in smartphone cameras for face and environment detection [15]. Wi-Fi, Bluetooth, and RFID are widely used communication technologies, and the most widely used navigation method is pedestrian dead reckoning, which relies on a smartphone's built-in sensors and requires no external hardware [16]. B. Li et al. [17] took indoor navigation for blind people to the next level by developing a solution for navigating dynamic environments using image processing techniques; their prototype comprised a Google Tango mobile device, a smart cane with a keypad, and two vibration motors. SLAM-based solutions for indoor navigation are presented in [18,19]. Similarly, another study used a communication technology, Wi-Fi, for indoor navigation and localization, with a mobile device communicating with different nodes and inferring its location from the response times [20]. Machine learning algorithms are also used for Wi-Fi fingerprint-based tracking, with promising results reported for SVM [21], neural networks, and KNN [22]; the latter work presented a fingerprint-matching technique specifically for navigation in mines.
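As a rough illustration of fingerprint matching with KNN, the sketch below matches a live RSSI scan against stored reference fingerprints; the data layout and all names are our assumptions, not those of the cited works.

```kotlin
// Each reference point stores the RSSI readings observed from a fixed set
// of access points; a live scan is matched to its k nearest fingerprints
// by Euclidean distance, then a majority vote picks the location.
data class Fingerprint(val location: String, val rssi: DoubleArray)

fun squaredDistance(a: DoubleArray, b: DoubleArray): Double =
    a.indices.sumOf { i -> (a[i] - b[i]) * (a[i] - b[i]) }

fun locate(scan: DoubleArray, db: List<Fingerprint>, k: Int = 3): String =
    db.sortedBy { squaredDistance(it.rssi, scan) }  // nearest fingerprints first
        .take(k)
        .groupingBy { it.location }                 // vote by reference location
        .eachCount()
        .maxByOrNull { it.value }!!
        .key
```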
Much of this work also uses computer vision technology, which can be divided into two sub-fields: one uses an infrastructure of static cameras to track mobile entities (e.g., people, robots), and the other attaches cameras to the mobile entities themselves [23]. With advances in camera technology and algorithms, the use of a mobile phone camera (with image processing techniques, such as different filters, embedded in the camera) is being considered. For example, M. Li et al. [24] proposed multiple smartphone-based solutions, including a precise single-image indoor visual positioning method in which color patterns from the mobile camera are matched to determine the current position. Similarly, another approach was proposed in [5] for spatial visual self-localization on mobile platforms in urban settings, which showed promising performance for the high-precision visual positioning of cellphones outdoors. References [25,26,27,28,29] investigated the most recent methods for image position tracking, using deep learning and visual location-based techniques for state-of-the-art image features, as well as cutting-edge technology for extracting image features and retrieving the required images based on the extracted features.
Furthermore, related research on mobile application-based library management for multi-story buildings has been carried out at multiple levels; for example, [25] developed an IoT-based smart voice assistant for the library using a Raspberry Pi and a speaker module to assist people. The main problem with these earlier approaches is their single point of concern: they are not generic and serve only a specific point of interest. They are designed primarily for one environment and cannot be used in another. This is a serious issue that needs to be addressed by developing a system that can be used in multiple environments under different circumstances. To solve this problem, we propose a system that can be used in different environments. We combined multiple existing solutions into a single system that can search, give directions using voice broadcasting, recognize people using the TensorFlow library for Android, convert speech to text using Google Voice-to-Text, and then retrieve the required information. The proposed system was tested in the university's multi-story library, with the new idea of multiple robots communicating with each other and with a person trying to find a particular book in the library. The proposed system is anticipated to be of great assistance to people who are new to the library and do not know where books of a particular discipline are located.
3. Proposed Methodology
This section describes the approaches adopted and the materials used in this research study. For developing an efficient mobile or computer-based application, third-party libraries play a vital role in minimizing development time, providing an efficient interface, managing data, developing algorithms, and enabling easy communication between modules [26]. The developed intelligent multi-floor navigation system is based on multiple recognition subsystems, i.e., speech recognition, face recognition, and voice broadcasting, within an Android application. To serve multistory buildings, we focused on a single, configurable application that can be adapted as needed. The proposed system comprises two main Android applications: the indoor robot system, an administrative app, and the indoor robot navigation system, which enables the robots to interact with users. The methodology adopted in this study is shown in Figure 4.
The two main Android applications developed in this study are discussed in more detail in the following subsections.
3.1. Indoor Robot System Administration
The main function of the administrative application is to gather data from different sources. The first step is to sign in to the administrative application; multiple options then appear that allow the admin to configure the whole system, which can be changed according to the requirements. The administrative Android application (the indoor robot system app) consists of several sub-modules, namely robot management, floor management, shelf management, book management, and member management, as shown in Figure 5.
Robot management is used to add or remove a robot from the system. There is one robot for every floor, so the number of robots equals the number of floors. Every robot in the robot management module has a unique name and ID, which are used to guide the user on each floor of the building. Figure 6a shows the backend of the robot management module.
The floor management module is used to manage the floors, i.e., to record how many floors the building has and to place a robot on each floor. In this module, we add floors to the system and then assign a robot to each floor. Its two main attributes are the floor name and the robot selection. Figure 6b illustrates the floor management module of the indoor robot system application.
The shelf management module of the indoor robot system is used to arrange the shelves on each floor of the building. The number of shelves on each floor can be increased according to the needs and requirements of the library. The main attributes of this module are the shelf number and the floor selection. The backend of this module is shown in Figure 7a.
The book management module of the indoor robot system is used to arrange the books on the shelves of each floor. Its main attributes are the book name, author name, floor selection, and shelf number. The backend of this module is shown in Figure 7b.
Similarly, the last module of this application is the member management module, which is used to register the members who will have access to the books in the library. Its attributes are the member's first name, last name, email address, phone number, and date of birth. The backend of this module is illustrated in Figure 7c.
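Taken together, the five modules imply a simple relational data model. The Kotlin sketch below is purely illustrative; the class and field names are our assumptions, not the app's actual schema.

```kotlin
// Illustrative data model implied by the five administrative modules.
// One robot is assigned per floor; books reference a floor and a shelf.
data class Robot(val id: String, val name: String)
data class Floor(val name: String, val robotId: String)       // robot selection
data class Shelf(val shelfNumber: Int, val floorName: String) // floor selection
data class Book(
    val title: String,
    val author: String,
    val floorName: String,
    val shelfNumber: Int
)
data class Member(
    val firstName: String,
    val lastName: String,
    val email: String,
    val phone: String,
    val dateOfBirth: String
)
```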
3.2. Indoor Robot Navigation System
The second Android application of the proposed system, the indoor robot navigation application, is deployed on the robots and consists of a monitoring screen that enables interaction between the robots and the users, as shown in Figure 8. The main purpose of this application is to provide connectionless, easy access to the information a person needs. When the robot application is opened, the user has two options: the monitoring screen, which is used for navigational help, and logout, as shown in Figure 8.
On the monitoring screen, the front or back camera of the phone (depending on the place of use and the administration's requirements) stays on until a face is detected and recognized using the TensorFlow Lite library for Android [27]. The first step of the monitoring screen module is to recognize the user with the face recognition algorithm. If the user is already registered in the system, the robot announces his or her name, as shown in Figure 9a. The monitoring system also recognizes the user's location, i.e., which floor he or she is on. Once the person is identified, the next step is to obtain input regarding the required book, in the case of a library. The user's input can be voice (Figure 9b), which is converted to text using Google's voice-to-text API services, or, if the person is non-vocal, he or she can simply type the book name on the on-screen keypad (Figure 9c). The robot then assists the user in locating the book, i.e., on which floor and on which shelf it is placed, as shown in Figure 9d.
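As a rough sketch of this final guidance step, the following Kotlin fragment looks up the requested title and speaks the directions with Android's TextToSpeech API; it reuses the illustrative Book class from the data-model sketch above, and the lookup logic and names are our assumptions.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Looks up a requested title and broadcasts its floor and shelf.
class DirectionSpeaker(context: Context) : TextToSpeech.OnInitListener {

    private val tts = TextToSpeech(context, this)

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) tts.setLanguage(Locale.US)
    }

    fun announce(books: List<Book>, query: String) {
        val book = books.firstOrNull { it.title.equals(query, ignoreCase = true) }
        val message = if (book != null)
            "${book.title} is on floor ${book.floorName}, shelf ${book.shelfNumber}."
        else
            "Sorry, that book is not in the library list."
        // QUEUE_FLUSH interrupts any announcement already in progress
        tts.speak(message, TextToSpeech.QUEUE_FLUSH, null, "announce-book")
    }
}
```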
After the application layout and design, the next steps are the integration of the algorithms for face detection and recognition and for speech recognition and broadcasting, along with database development and maintenance (adding new users, updating the information of existing users, and deleting users from the system database).
3.2.1. Image Recognition Module of Screen Monitoring
This module is used to determine whether or not the user is an authorized person. After recognizing his or her face, the module grants him or her access to different books in the library.
Figure 10 illustrates the workflow of the image recognition module.
The user's image is acquired with an RGB camera and passed to the Viola-Jones algorithm, which detects different points on the face. Different features are then extracted from the acquired image and passed to the classification algorithm, i.e., a CNN. The CNN classifies the image and determines whose it is, and the user's registration status is then checked. If the user is registered, he or she is authorized to access each floor and book of the library; if not, the request to use the library is denied.
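The sketch below illustrates this pipeline in Kotlin, assuming OpenCV's Java bindings for Viola-Jones detection and a TensorFlow Lite interpreter for the CNN; the model's input size, the label list, and all names are our assumptions.

```kotlin
import org.opencv.core.Mat
import org.opencv.core.MatOfRect
import org.opencv.core.Size
import org.opencv.imgproc.Imgproc
import org.opencv.objdetect.CascadeClassifier
import org.tensorflow.lite.Interpreter

// Viola-Jones face detection (OpenCV Haar cascade) followed by CNN
// classification with TensorFlow Lite. Illustrative only.
class FaceRecognizer(
    cascadePath: String,             // e.g., a frontal-face Haar cascade file
    private val cnn: Interpreter,    // TFLite interpreter wrapping the CNN
    private val labels: List<String> // one label per registered member
) {
    private val detector = CascadeClassifier(cascadePath)

    fun recognize(frame: Mat): String? {
        // Step 1: Viola-Jones detection of candidate face regions
        val faces = MatOfRect()
        detector.detectMultiScale(frame, faces)
        val face = faces.toArray().firstOrNull() ?: return null

        // Step 2: crop and normalize the face, then classify with the CNN
        val input = preprocess(frame.submat(face))
        val scores = Array(1) { FloatArray(labels.size) }
        cnn.run(input, scores)

        // Return the registered member with the highest score
        val best = scores[0].indices.maxByOrNull { scores[0][it] } ?: return null
        return labels[best]
    }

    // Resize to an assumed 112x112 RGB input and scale pixels to [0, 1]
    private fun preprocess(face: Mat): Array<Array<Array<FloatArray>>> {
        val resized = Mat()
        Imgproc.resize(face, resized, Size(112.0, 112.0))
        return Array(1) { Array(112) { y -> Array(112) { x ->
            FloatArray(3) { c -> (resized.get(y, x)[c] / 255.0).toFloat() }
        } } }
    }
}
```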
3.2.2. Voice Recognition Module of Screen Monitoring
After user recognition, the voice recognition module allows the user to search for a particular book. If the book is in the library, the user is given complete information about it, i.e., the book name, the author name, the floor number on which the book is placed, and the shelf number on which it is placed.
Figure 11 shows the workflow of the voice recognition system.
In this module, the user is asked to record their voice to search for a particular book. Because the voice is captured in real time, it contains some noise, which is removed by applying noise-removal filters. The denoised voice is then passed to the MFCC algorithm to extract useful features, which are in turn passed to the classification algorithm, i.e., a CNN. The CNN classifies the input voice and decides whether the book name spoken by the user is in the library. If the book name is in the library's book list, the person is given complete information about the book, i.e., which floor it is on and which shelf it is placed on. If not, the person is informed that the book is not in the library list.
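A minimal Kotlin sketch of this flow is given below, assuming a TensorFlow Lite CNN over MFCC features; denoise() and mfcc() are hypothetical helpers standing in for the noise filter and feature extractor, and the 0.5 confidence threshold is our choice.

```kotlin
import org.tensorflow.lite.Interpreter

// Noise removal -> MFCC features -> CNN classification over known titles.
// denoise() and mfcc() are hypothetical placeholders; only the overall
// flow follows the description in the text.
class BookNameClassifier(
    private val cnn: Interpreter,         // TFLite CNN over MFCC features
    private val bookTitles: List<String>  // one output class per known title
) {
    fun classify(pcm: FloatArray): String? {
        val clean = denoise(pcm)                 // noise-removal filter
        val features = mfcc(clean)               // [frames][coefficients]
        val scores = Array(1) { FloatArray(bookTitles.size) }
        cnn.run(arrayOf(features), scores)       // batch of one utterance
        val best = scores[0].indices.maxByOrNull { scores[0][it] } ?: return null
        // Low-confidence predictions are treated as "not in the list"
        return if (scores[0][best] > 0.5f) bookTitles[best] else null
    }

    private fun denoise(pcm: FloatArray): FloatArray =
        TODO("e.g., a spectral-gating noise filter")

    private fun mfcc(pcm: FloatArray): Array<FloatArray> =
        TODO("MFCC feature extraction")
}
```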
In this study, different deep learning and machine learning libraries are used for face detection and voice recognition. For face detection and recognition, the TensorFlow Lite library for Android is used; it is a lightweight version of TensorFlow that plays a vital role in embedded and mobile systems. Its main feature is enabling on-device machine learning inference with small binary sizes and low latency. TensorFlow Lite also supports hardware acceleration through the Android Neural Networks API, and it achieves low latency with techniques such as kernels optimized for mobile apps, pre-fused activations, and quantized kernels that allow smaller and faster (fixed-point math) models. Another API used in our application is voice-to-text conversion [28] for entering the required item (in our case, a book), together with text-to-voice conversion for giving specific directions in response to the user's queries. To search for the required book, we used another function, an event change listener [29].
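For reference, voice input of this kind is commonly obtained through Android's built-in speech recognizer intent, as in the hedged Kotlin sketch below; the request code, prompt text, and searchBook() hook are our assumptions, not necessarily the app's implementation.

```kotlin
import android.app.Activity
import android.content.Intent
import android.speech.RecognizerIntent

// Launches the platform speech recognizer and receives the best
// transcription back as an activity result.
class VoiceInputActivity : Activity() {

    private val voiceRequestCode = 100

    fun startListening() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
            )
            putExtra(RecognizerIntent.EXTRA_PROMPT, "Say the book name")
        }
        startActivityForResult(intent, voiceRequestCode)
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        if (requestCode == voiceRequestCode && resultCode == Activity.RESULT_OK) {
            // The recognizer returns candidate transcriptions, best first
            val text = data
                ?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
                ?.firstOrNull()
            if (text != null) searchBook(text)  // hypothetical search hook
        }
    }

    private fun searchBook(title: String) { /* query the book database */ }
}
```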
5. Conclusions and Future Work
With the continuous development of advanced technologies such as IoT, ML, and DL, an intelligent multi-floor navigation system using advanced identification techniques such as speech recognition, face recognition, and voice broadcasting, based on Android applications, is a new and interesting topic that needs to be investigated. In this study, we proposed an indoor navigation system that guides users to a particular book on the different floors of a library. The proposed system consists mainly of two Android apps: the administrative app (robot indoor system administration) and the navigation app (robot indoor navigation system). During the experiments, facial and voice recognition were mostly accurate, although errors occasionally occurred due to environmental factors; these can be further reduced. The proposed system was successfully tested for navigation in a multistory library, and with a few changes, the same work can be implemented in other indoor multistory buildings such as grocery stores and shopping malls. Future work includes implementing the proposed system on more than five floors and deploying more robots on each floor of the building. The proposed solution also needs to be tested in other multi-story buildings for navigational help. In the future, the mobile phones can be replaced by mobile robots, such as Pepper, which will make navigation more interactive.