1. Introduction
Nowadays, most large retail companies focus their marketing strategies on knowledge of customer behavior in all the commercial channels. These marketing techniques are known as omni-channel strategies [1]. The overlapping of customers’ online and offline behavior requires awareness [2] to create successful, specific marketing campaigns targeting particular segments of the population. The key is to study both types of behavior before launching any marketing strategy. Online customer behavior is studied in detail through the use of web/app analytics and cookies. Currently, research efforts are mainly aimed at learning about offline behavior, that is, how customers behave once they arrive at points-of-sale. Therefore, aspects such as the time that customers stay inside a given shop, the frequency of these visits, or the routes followed by customers are some of the inputs employed to increase knowledge about potential buyers [3]. Suitable analysis of this information provides marketing departments with the opportunity to optimize efforts which could influence purchasing decisions and define more effective contextual marketing campaigns. However, difficulties arise when these campaigns are carried out in indoor environments in real time. In this case, an indoor positioning system (IPS) is required to learn about consumers’ offline behavior. In order to increase the success of a contextual marketing platform, the system employed to plan and manage an omni-channel marketing campaign should include an IPS module.
Malls are essential locations for retailers. Mall owners often have marketing teams in charge of increasing the frequency of visits, and once they have managed to increase traffic within the mall, they try to influence purchasing decisions to maximize spending. Marketing teams, with a marketing manager at the forefront, are responsible for defining, planning, and managing both online and offline marketing campaigns using contextual marketing platforms. Consequently, malls are obvious environments in which to apply IPSs. As we describe in this work, an IPS is one of the most distinguishing elements of a contextual marketing platform, although we must not forget other technological tools that are necessary to implement a complete and useful solution. The components that make up a complete platform for learning about online/offline behaviors are the following: (i) a customer relationship management (CRM) system; (ii) a content management system (CMS); (iii) a key performance indicator (KPI) panel; (iv) a mailing and push notification service; (v) the IPS itself; and (vi) a mobile app. As the first contribution of this work, we delve into the implementation of all the aforementioned elements of a contextual marketing platform, their interconnections, and how their use can increase knowledge about potential customers. The IPS is the module which obtains the location of users inside the mall. Once the position of each visitor is determined, it becomes an input parameter of the contextual marketing platform. In other words, the output of the IPS is employed as input in the contextual marketing platform, providing so-called location-as-a-service functionalities. To this end, customer location and tracking through the use of smartphones inside infrastructures are studied in this work.
As can be observed, the IPS is the engine of the contextual marketing platform. Thus, one of the most important aspects is how to accomplish its implementation. In this sense, there is no standard solution that determines position in indoor environments, and this situation provides new engineering challenges. Different alternatives have utilized technologies like Wi-Fi [4], Bluetooth [5], ultrawide band (UWB) [6], or radio frequency identification (RFID) [7]. As Wi-Fi technology is widespread in malls, we discard others, such as Bluetooth, UWB, and RFID technologies. These technologies require the deployment of new infrastructure in the mall with high operational and maintenance costs, increased by the use of beacons and other ad hoc hardware. Moreover, their coverage range is insufficient to locate people through a mobile phone.
With respect to the inputs of the location algorithm to implement the IPS module, diverse Wi-Fi smartphone IPS solutions can be found in the scientific literature, such as [8,9], where the so-called inertial measurement unit (IMU) sensors, such as barometers, accelerometers, gyroscopes, or magnetometers, are employed. By merging the data from these sensors with Wi-Fi fingerprint information, precise pedestrian dead reckoning (PDR) can be generated. This system is capable of detecting the direction of movement [8], stop/walk detection, and step counting [9], among other things. The IPS obtained by these sensors along with Wi-Fi fingerprint data can be extremely precise. However, these solutions have a definite disadvantage: they consume a lot of battery power. A system conceived to operate in a large infrastructure during extended time periods must guarantee robust smartphone functionality. Under these premises and prioritizing sustainable battery consumption, one of the design principles of our system is to employ only Wi-Fi received signal strength indicator (RSSI) fingerprint information. This choice ensures better battery performance and sustainability than solutions that employ IMU sensors; however, location accuracy is compromised. In our use case, a mall, the automated actions of the contextual marketing platform do not require centimetric accuracy; marketing activities are effective at shop-level granularity. Consequently, we established shop-level precision as a design requirement for our IPS solution.
The drawbacks associated with our proposal are related to the limitations of employing only a Wi-Fi signal: (i) the IPS will be affected by access point (AP) density variations because it depends on the network infrastructure, (ii) smartphone heterogeneity (different manufacturers, models, etc.) implies different RSSI values, even when using the same chipset [10], (iii) the variability of mall occupancy impacts aspects such as signal propagation [11], and (iv) the Wi-Fi fingerprint frequently changes due to the variation in the number of active APs as time goes by [12]. To the best of our knowledge, there are no studies in the scientific literature comparing Wi-Fi fingerprint algorithms considering these premises in a large, real infrastructure. With the aim of filling this gap, we evaluate three well-known algorithms to be employed as Wi-Fi inputs for our IPS module. These algorithms are the following: K-nearest neighbors (KNN) [13], support vector machines (SVMs) [14], and Freeloc [15]. In a recent study [16], different RSSI fingerprinting indoor location algorithms were compared. The results show that KNN offers appropriate performance on a dataset composed of 4648 fingerprints acquired by using a static procedure during one day in a university building. SVM has been evaluated, showing reasonable accuracy in several indoor scenarios at the shop level [17]. Freeloc can face the problem of dealing with heterogeneous devices and has been extensively evaluated in [18]. Having these inconveniences in mind, as a second contribution, we have conducted an analysis and comparison of IPSs implemented with these three algorithms. Moreover, we have considered the aforementioned problems in malls, such as hardware heterogeneity, variations in AP density over time, and how different occupancy levels affect signal propagation.
The accuracy of the algorithm employed in indoor locations is crucial, but the rest of the processes and tools required for the creation of the entire IPS are no less important. In general, the stages deployed to achieve an accurate indoor location solution are defined in [19] and are the following: (i) digital map generation, (ii) the offline calibration phase, (iii) the system build-up/positioning algorithm for localization tasks, and (iv) launching and testing. In this work, all the resources and methodology employed in each phase of our IPS, together with every one of the specific processes and tools used to obtain a practical and scalable solution from the perspective of a small–medium enterprise (SME) IT company, are described in detail. Therefore, as our third contribution we have designed, implemented, and thoroughly evaluated a complete IPS module, taking into account the following premises: (i) it employs the already deployed Wi-Fi access points in malls for connectivity purposes, (ii) it considers current market smartphone heterogeneity, (iii) our solution is deployed by non-trained personnel, (iv) it is scalable in all the aspects of the deployment as will be further detailed, and (v) it provides information referring to user location to the contextual marketing platform, resulting in location-as-a-service functionalities.
Concerning the already published works about the implementation of IPSs in large infrastructures, there are studies which provide positioning services in scenarios similar to ours. The COEX mall [19] was the first to be supplied with an indoor location and navigation service based on the analysis of RSSIs. However, several dozens of calibrators along with some weeks of work were required for its implementation. Similar to this work, Li et al. [20] tested their technology in large malls, considering the heterogeneous mobile device problem in these scenarios, offering an accuracy below 2 m. However, ad hoc APs have to be installed in the mall. Mathisen et al. [21] showed a comparative evaluation of several indoor positioning techniques in a large university hospital. In this work, the authors evaluated several IPS solutions from data gathered on different days and at different times, both in static locations and moving throughout the building. The results illustrated the challenges involved in transferring data from a small, controlled environment, such as a small office environment, to a large, dynamic building complex. Furthermore, this work employed samples acquired only on two days and, therefore, did not evaluate the evolution of the different IPSs over time. Another interesting approach is in [22]. The authors proposed a zone-level IPS by applying neural network techniques, and they compared it against the KNN algorithm, achieving a slight change in accuracy. However, it was deployed in a university building with only 24 APs, and the evolution of the IPS over time was never tested. There are also commercial products for the generation of IPSs employing Wi-Fi signals, like Ekahau RTLS (now renamed to AiristaFlow) [23]. However, these IPSs require an exhaustive calibration and the use of specific and complex calibration tools. Dealing with the drawbacks of the aforementioned IPS solutions is our fourth contribution.
Finally, as a fifth contribution of this paper, we publish a complete fingerprint dataset which provides details of our proposed solution. The database is open to the scientific community, including all the information collected during the calibration phase together with the fingerprints acquired in the validation stage. Note that other datasets available for indoor positioning have also been published in the past. In [24], datasets and supporting material employed during the Indoor Positioning and Indoor Navigation (IPIN2017) competition can be found. Another interesting work is by Lohan et al. [16], which provided a full measurement package composed of crowdsourced Wi-Fi fingerprints as well as an intensive review of other indoor Wi-Fi datasets, pointing out their availability, features, and limitations. Mendoza-Silva et al. [25] created a dataset containing measurements collected in a large building for an extended period of time. In this work, the results were obtained by employing a single device. Although this is an important drawback, the time dedicated to this experiment was long enough to detect variations in the Wi-Fi environment and to validate other algorithms. The dataset provided by [26] is also worth mentioning. It includes the following two aspects: (i) RSSI values obtained in an office environment of around 200 m², and (ii) a fingerprint database of magnetic measurements. The main disadvantages of this dataset are the reduced size of the modeled space and the fact that only a single device was used to perform the tests. Some of the authors of this paper also published another dataset [27] which contains RSSI data collected in a large, multi-floor building, by employing several devices carried by different users. This implies collecting geotagged samples which are useful to validate other IPS solutions.
The dataset included in the current paper is available at Zenodo [28], a general-purpose open access repository developed under the European OpenAIRE program. This repository allows researchers to store and publish datasets, software, reports, and any other research-related digital artifacts with the main aim of presenting this information to the rest of the community. Our dataset allows other researchers to carry out their own validation tests since it includes the RSSI measurements collected during the offline phase to build fingerprint models at the shop level. Additional information about our dataset and best practices is provided in the rest of the paper.
To summarize, aside from the extensive comparison and discussion of three of the most well-known algorithms employed in IPS generation (SVM, Freeloc, and KNN), we detail all the considerations necessary to provide a complete and effective contextual marketing platform. The interactions between modules, their architecture, and functionalities are also detailed. This work is structured as follows. Section 2 details the methodology followed for: (i) the architecture of the contextual marketing platform including all the information related to the generation of the IPS, (ii) the malls, smartphones, and validation routes employed in the comparison of the different algorithms, and (iii) the Wi-Fi fingerprints generated and shared in an open access repository. Section 3 illustrates the results of the comparison among the aforementioned algorithms, which are later discussed in Section 4. Finally, Section 5 concludes the paper.
3. Results
We have evaluated and compared, in terms of stability over time, device heterogeneity, and the influence of occupancy on shop-level accuracy, different IPSs generated following the procedure depicted in Section 2.1.5 by means of the three indoor positioning algorithms: SVM, KNN, and Freeloc. They are described in Section 2.1.5. The evaluation was carried out in two malls, employing the validation routes and smartphones described in Section 2.3. To this end, we have generated five different IPSs with each algorithm using the labeled samples acquired in the offline calibration phase with each one of the #1–#5 smartphones. The validation routes were then processed with each of the generated IPSs: the inputs are the fingerprint samples to be located, and the output is an estimate of the place where each sample was acquired. In our case, the output was the shop-level area where the samples were collected.
The performance metric employed to search for the best optimization value is the Euclidean distance from the centroid of the area estimated by the IPS to the centroid of the area labeled in the sample. In the case of a sample for which the target zone is correctly estimated, the Euclidean error is zero. Thus, the shorter the Euclidean distance, the better the performance of the IPS. Moreover, we have employed the cross-validation technique. This means that to verify the performance of the IPS generated from the fingerprints collected by smartphone #1, the validation routes obtained from this smartphone were discarded. This procedure was applied for the remaining smartphones.
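The metric and the device-exclusion rule described above can be illustrated with a minimal sketch. The zone names, centroid coordinates, and the `phone` field are hypothetical and chosen only for illustration; they are not taken from the published dataset.

```python
import math

# Hypothetical zone centroids in map coordinates (meters); illustrative values only.
CENTROIDS = {
    "shop_a": (0.0, 0.0),
    "shop_b": (30.0, 40.0),
    "corridor_1": (6.0, 8.0),
}


def zone_error(estimated, labeled, centroids=CENTROIDS):
    """Euclidean distance (m) between the centroids of the estimated and labeled zones.

    A correctly estimated zone yields an error of zero.
    """
    if estimated == labeled:
        return 0.0
    (x1, y1), (x2, y2) = centroids[estimated], centroids[labeled]
    return math.hypot(x2 - x1, y2 - y1)


def mean_error(pairs, centroids=CENTROIDS):
    """Mean zone error over (estimated, labeled) pairs."""
    errors = [zone_error(est, lab, centroids) for est, lab in pairs]
    return sum(errors) / len(errors)


def routes_for_validation(routes, calibration_phone):
    """Discard validation routes recorded by the smartphone that calibrated the IPS."""
    return [r for r in routes if r["phone"] != calibration_phone]
```

For example, an IPS calibrated with smartphone #1 would be scored only on the routes returned by `routes_for_validation(routes, 1)`.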
The parameters to optimize differ for each of the algorithms. In the case of KNN, these parameters are the number of neighbors (APs) employed and the value of the RSSI when an AP is not found in the Wi-Fi scan. In the case of Freeloc, the optimization parameters are the δ and RSSpeak [15] employed for the generation of the vector. RSSpeak is the RSSI value of the highest frequency during the measurement, while δ is a parameter that must be computed. Finally, in the case of SVM, the parameters are the kernel [17] employed as a classification function and the regularization values (coefficient assigned in the misclassification). For each of the offline fingerprints using smartphones #1–#5 in both malls, we carried out a sweep of the optimization parameters with the aim of finding those with the best performance. For KNN, we tested from 1 to 15 neighbors and 0, 100, and 200 as RSSI values when an AP is not found. In the case of Freeloc, we tested δ from 1 to 10 and RSSpeak from 0 to 16. Regarding SVM, we checked the following kernels: linear, poly, rbf, and sigmoid, with regularizations valued at 0.1, 1, 2, and 3. The values with the lowest errors for the KNN algorithm in both malls and for all the smartphones are three neighbors and 100 for the RSSI when there is no AP. Concerning Freeloc, the optimal values for δ and RSSpeak are shown in Table 5. Regarding the kernel and the regularization parameters for SVM, the optimal values are illustrated in Table 6.
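As an illustration of the KNN part of this sweep, the following sketch classifies a scan by majority vote among its nearest calibration fingerprints and searches a small grid of neighbor counts and fill values. It is a simplified stand-in, not the implementation evaluated in this work: the data layout (a scan as a dictionary from AP identifier to RSSI), the function names, and the sign convention of the fill value (negative dBm here) are all assumptions.

```python
import math
from collections import Counter


def to_vector(scan, ap_ids, fill):
    """Map a scan {ap_id: rssi} onto a fixed AP order, using `fill` for APs not found."""
    return [scan.get(ap, fill) for ap in ap_ids]


def knn_predict(train, scan, k, fill):
    """Majority label among the k calibration fingerprints closest to `scan`.

    train: list of (scan_dict, zone_label) pairs from the calibration phase.
    """
    ap_ids = sorted({ap for fp, _ in train for ap in fp} | set(scan))
    target = to_vector(scan, ap_ids, fill)
    ranked = sorted(
        train,
        key=lambda item: math.dist(to_vector(item[0], ap_ids, fill), target),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def sweep(train, validation, ks, fills, error_fn):
    """Return (mean_error, k, fill) for the grid point with the lowest mean error."""
    best = None
    for k in ks:
        for fill in fills:
            errors = [
                error_fn(knn_predict(train, scan, k, fill), label)
                for scan, label in validation
            ]
            score = sum(errors) / len(errors)
            if best is None or score < best[0]:
                best = (score, k, fill)
    return best
```

Here `error_fn` would be the centroid-to-centroid distance used as the performance metric; ties in the grid are resolved in favor of the first combination tested.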
In the rest of the section, we compare the IPSs generated by means of the three indoor positioning algorithms, KNN, Freeloc, and SVM, employing the parameters previously optimized, with the premises of stability over time, device heterogeneity, and the influence of occupancy rates.
3.1. Variability of Performance over Time
To evaluate the three IPSs, the first comparison we conducted was to observe their evolution and performance over time. Previous works [12,46] showed great variability in the evolution of the Wi-Fi fingerprint in a large environment. This occurs because some APs disappear and others appear. Since offline fingerprint acquisition is one of the most time-consuming tasks and is performed once every several months, the stability of the IPS as time goes by is one of the most important performance indicators. For this purpose, with the samples generated in the offline calibration phase, we generated five different IPSs for each mall with the samples from smartphones #1–#5, to later test the routes labeled #1–#4 over days 1–74 for mall 1, and days 1–60 for mall 2. As has been previously mentioned, we employed the cross-validation procedure and the Euclidean distance error calculated as the distance between the shop estimated by the algorithm and the real labeled shop in the sample. The zero-error value is set when a sample is estimated in the area where it was acquired. The evolution of the IPS over time can be observed in Figure 6a for Freeloc, Figure 6b for KNN, and Figure 6c for SVM.
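One simple way to summarize such an evolution is to average the errors per day and fit a straight line to the daily means. The sketch below is only an illustration under that assumption; the use of an ordinary least-squares slope and the function names are ours, not a procedure stated in this work.

```python
from collections import defaultdict


def mean_error_by_day(samples):
    """Average error (m) per day from an iterable of (day, error_m) pairs."""
    buckets = defaultdict(list)
    for day, err in samples:
        buckets[day].append(err)
    return {day: sum(errs) / len(errs) for day, errs in sorted(buckets.items())}


def trend_slope(daily_means):
    """Ordinary least-squares slope (meters per day) of mean error against day.

    Requires at least two distinct days. A slope near zero indicates stable
    performance; a positive slope indicates degradation over time.
    """
    days = list(daily_means)
    values = [daily_means[d] for d in days]
    n = len(days)
    mean_day = sum(days) / n
    mean_val = sum(values) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (v - mean_val) for d, v in zip(days, values))
    return sxy / sxx
```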
3.2. Heterogeneity Diversity Performance
As described in [10] and validated in Section 2.4, each smartphone, even those with the same chipset, obtains a very different fingerprint. That is, the data generated by each smartphone in the offline calibration phase differ in RSSI levels, scan rate, and the number of APs detected. Therefore, it is crucial to explore whether the IPSs generated from the fingerprint samples gathered by different smartphones achieve similar accuracy. To this aim, we tested our IPS solution with the routes labeled #1–#4 generated by different devices on day 74 for mall 1 and day 60 for mall 2 after calibration. Note that these days are the last days with evaluation routes. As we have established, ten different smartphones were employed in the routes defined for mall 1 and nine for mall 2. With this process, we can find the indoor location algorithm which has the greatest accuracy regardless of the device employed in the calibration procedure. The results are illustrated in Figure 7 (Freeloc), Figure 8 (KNN), and Figure 9 (SVM) by means of the cumulative distribution function (CDF) metric. For each error value on the x-axis (measured in meters), the CDF gives the percentage of samples whose error is lower than or equal to that value. As in the rest of the Results section, the cross-validation procedure is employed with Euclidean distance as the error metric. In the CDF, the percentage of samples with 0 m errors is the percentage of samples for which the right shop has been successfully estimated.
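How these CDF quantities can be read from a list of per-sample errors is sketched below. This is a minimal illustration; the nearest-rank percentile definition and the function names are assumptions made for the example.

```python
import math


def empirical_cdf(errors):
    """Sorted (error_m, fraction of samples with error <= error_m) points."""
    ordered = sorted(errors)
    n = len(ordered)
    points = []
    for i, err in enumerate(ordered, start=1):
        if points and points[-1][0] == err:
            points[-1] = (err, i / n)  # repeated value: keep the highest fraction
        else:
            points.append((err, i / n))
    return points


def hit_rate(errors):
    """Fraction of samples located in the right shop (error of 0 m)."""
    return sum(1 for err in errors if err == 0) / len(errors)


def percentile_error(errors, pct):
    """Smallest error such that at least pct percent of samples do not exceed it."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With this definition, the CDF value at 0 m coincides with `hit_rate`, and `percentile_error(errors, 80)` corresponds to the 80th percentile error.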
3.3. Influence of Occupancy Rates
Another factor to be carefully considered is the influence of the fading caused by the number of people in the mall. It is well known that people’s movements affect RSSI levels [
11]. In a mall, occupancy changes noticeably according to the day of the week or even the hour of the day. Typically, there are more people at the weekend or on holidays than on working days. As the calibration phase must not disturb the regular operation of the mall, we cannot control the number of visitors and their frequency. We have studied the performance of Freeloc, KNN, and SVM in terms of mall occupancy by considering a single calibration carried out over a few hours.
Therefore, to test the influence of fading on the algorithms, several pre-established routes were followed on the same day but at different times. Specifically, we took routes #2 and #3 for mall 1 on days 11 and 18 after the calibration day, and routes #1 and #4 for mall 2 on day 74. In both malls, the process was completed in the morning when the number of visitors was small, and in the afternoon, when there was a larger number of potential clients. Results employing the CDF metric are illustrated in Figure 10a for Freeloc, Figure 10b for KNN, and Figure 10c for SVM.
3.4. Processing Time
Another interesting feature of the IPS involving scalability and the ability to precisely locate a large number of visitors is the processing time of the three different algorithms. To this end, we have evaluated the time taken to process one of the validation routes generated. The time needed to generate the IPS has been ignored because it is carried out every few months and, therefore, does not affect the overall process. The most critical issue for the appropriate operation of the IPS is to compute the precise location of day-to-day visitors, which can be in the range of a few thousand. Processing time was measured on a PC with an AMD Ryzen 5 and 8 GB of RAM. We selected the route labeled #1, day 1 with smartphone #2, and employed the different IPSs (one per algorithm) generated with smartphone #1. It is a route with 701 fingerprint samples acquired over 1 h and 20 min. The processing time also includes all the routines/code to feed the IPS, like the database extraction of the validation route data, the computation time of the algorithm, and the time spent saving the results in a database. Therefore, the processing time depicted in Table 7 is the amount of time required by the IPS module to compute a regular user visit for each one of the algorithms.
4. Discussion
Figure 6 shows the evolution of the performance of the IPS generated by the three algorithms for both malls using smartphones #1–#5. In the case of Freeloc, the results illustrate a flat slope referring to the mean errors (in meters) as time passes, showing an error of around 5 m in the best case (smartphone #2 in mall 2) and 11 m in the worst case (smartphone #4 in mall 1). This means that for most of the IPSs generated using Freeloc, the performance of our IPS on day 74 for mall 1 and on day 60 for mall 2 (that is, the last day of measurement for each mall) is similar to the performance on day 1. In the cases of KNN and SVM, the slope of the mean errors increases as time passes for the entire IPS. Moreover, the mean values of the errors on day 1 for the best performance are, in both cases, around 10 m for smartphone #2 in mall 1. These values are worse than in the case of Freeloc. Therefore, we can conclude that in the case of Freeloc, the error does not increase as time goes by and the system is stable and robust over time. In contrast, in the cases of KNN and SVM, the errors increase over time for the five IPSs generated in both malls. Additionally, it should be noted that in the case of Freeloc, the engines generated for each smartphone in mall 2 present lower errors than the ones generated for mall 1, although mall 2 has a larger surface and a greater number of shops. This is due to the larger number of APs detected in mall 2, which involves a greater number of representative fingerprints for each area.
Regarding the results for device heterogeneity, Figure 7 (Freeloc), Figure 8 (KNN), and Figure 9 (SVM) illustrate the CDF outcomes of the last day of the validation routes for each mall/smartphones #1–#5. It is worth mentioning that the CDF value at an error of 0 m on the x-axis is the percentage of samples assigned to the right zone, read on the y-axis. In the case of Freeloc, this percentage is around 60% for smartphone #1 and around 40% in the cases of smartphones #3, #4, and #5 (best cases). Concerning KNN, more irregular results are observed than in the case of Freeloc. Across the IPSs generated using the five smartphones, the worst performance is around 10% of the samples determined correctly in the right zone. However, 90% of the samples are properly determined in the case of the IPS generated with smartphone #1 for the test device smartphone #2 in mall 2. Similar results can be found for SVM and the five IPSs generated, with values of 0 m errors from 5% to 80% in the case of smartphone #1. Another interesting result to analyze in the case of the CDF is the 80th and 90th percentile. These values illustrate the errors that 80% and 90% of the samples, respectively, do not exceed. In the case of Freeloc, the calculated CDFs show an error of around 30 m in the 80th percentile and 40 m in the 90th percentile, regardless of the smartphone used. It is worth mentioning that misclassification of a sample in our solution entails an error in meters from the centroid of one area to the centroid of another, and the larger the areas involved, the greater the distance of the error. In the case of KNN, the values obtained show a greater error than in the case of Freeloc, with an 80th percentile of around 60 m for smartphone #3 but very poor performance for smartphones #1, #2, #4, and #5. A similar result to that of KNN can be observed with SVM. Only for the Freeloc algorithm do the results show a similar percentile of error regardless of the model of smartphone used in the calibration phase. This demonstrates that any smartphone can perform the calibration process, resulting in a solution that is not dependent on hardware if this algorithm is employed in IPS generation.
With respect to the influence of occupancy levels, Figure 10 illustrates that there are no significant variations in the mean metric error between high and low occupancy once the validation routes have been executed. An exception is smartphone #3, which shows greater error in low occupancy than in high occupancy for samples with errors below 20 m in both malls; however, for errors greater than 30 m (x-axis), a similar result is attained. In the case of KNN, some differences can be found for all the smartphones in both malls and for low/high occupancy. The cases of smartphones #2 and #5 for mall 1 are remarkable as they present better accuracy with low occupancy. In the case of SVM, the most remarkable situation is with smartphone #2 for mall 2, with better results for high rather than low occupancy and without differences in high/low occupancy in other scenarios. Therefore, we can conclude that the IPSs generated by the three algorithms are not significantly affected by occupancy rates.
Finally, the values obtained for the processing time of a route simulating a 1 h 20 min visit to the mall show the worst results in the case of Freeloc, at 11.93 s. In the case of KNN, the processing time is 7.97 s, and for SVM, 8.51 s. As previously mentioned, the computing of the validation route was performed on a non-optimized end-user PC, not on a dedicated computing server. This means that processing times could likely be reduced considerably. It should be kept in mind that localizing a few thousand users per day is an excellent result given the mean number of visitors in a large mall. We can conclude that the three algorithms are scalable from a processing time perspective.
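As a rough back-of-the-envelope check of this scalability claim, the measured times can be converted into the number of visits that could be processed per day. The sketch below assumes that visits are processed one after another on a single machine running 24 h a day, which is a simplification introduced here and not a deployment described in this work.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Processing time (s) of the 701-sample validation route, as reported in the text.
PROCESSING_TIME_S = {"Freeloc": 11.93, "KNN": 7.97, "SVM": 8.51}
SAMPLES_IN_ROUTE = 701


def visits_per_day(seconds_per_visit, workers=1):
    """Whole visits processed in one day, assuming sequential processing per worker."""
    return int(workers * SECONDS_PER_DAY // seconds_per_visit)


def ms_per_sample(seconds_per_visit, samples=SAMPLES_IN_ROUTE):
    """Average processing time per fingerprint sample, in milliseconds."""
    return 1000 * seconds_per_visit / samples


if __name__ == "__main__":
    for name, seconds in PROCESSING_TIME_S.items():
        print(name, visits_per_day(seconds), round(ms_per_sample(seconds), 1))
```

Under these assumptions, even the slowest algorithm (Freeloc) would handle roughly seven thousand visits of this length per day on the test PC.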
In summary, IPS solutions implementing Freeloc perform better over time than those based on KNN and SVM, regardless of the type of device used. In terms of occupancy levels, the results show similar accuracy no matter the location algorithm employed. Finally, keeping processing time in mind, the best results are for KNN, although SVM and Freeloc do not show significant differences.
5. Conclusions
In this work, a complete contextual marketing platform including the capacity to automate marketing campaigns, merging online and offline customer behavior, is proposed. The purpose of the system is achieved with a scalable and straightforward Wi-Fi-based indoor positioning system (IPS) at the shop level. Concerning the generation phases of the IPS, one of the most important challenges of the deployment of our IPS-based solution, the fingerprint calibration phase, has been designed and developed using (i) the random walking survey method, (ii) an easy-to-use ad hoc Android app, and (iii) a tool based on Jineditor to generate digital maps of the malls. This combination of technologies and procedures could allow us to extend our system to any mall by simply employing the existing APs for connectivity purposes.
Regarding IPS performance, three different indoor positioning algorithms, Freeloc, KNN, and SVM, have been evaluated and compared under the premises of only using Wi-Fi fingerprints as input parameters and without adding new hardware. Regardless of the smartphone employed in the calibration phase, it has been demonstrated that Freeloc has enough accuracy and stability to be employed as a KPI provider in a marketing campaign. Moreover, our solution has been validated under different occupancy scenarios, demonstrating its feasibility as time passes. The performance of our solution remains stable during a substantial time period, with occupancy levels not affecting the results.
The Wi-Fi fingerprint dataset is open access and was generated by means of 10 different smartphones over several weeks in two malls. Furthermore, our dataset provides the RSSI data of movement in shops/corridors of two malls, simulating the movement of visitors during a shopping trip.
Finally, as future work, the implementation of some preprocessing techniques related to calibration fingerprints, together with the enforcement of realistic movement models between shops/corridors, such as the hidden Markov model, could improve the performance of the system without increasing computational requirements. Furthermore, thorough studies to discern the preferences of customers, split into age groups, as well as their location privacy concerns, should be included in future studies.