1. Introduction
Sun-and-beach hotel performance has been widely studied in tourism literature [
1]. However, of the two possible profit growth strategies, researchers have focused on expansion, leaving aside diversification [
2]. Expansion strategies imply income growth through the addition of hotel establishments or rooms. Different expansion strategies, which involve the use of property, leasing, and franchise and management contracts, represent different levels of effort in terms of management and investment [
2]. Diversification takes advantage of underused resources and economies of scope to obtain resources and create synergies between departments [
3]. However, diversification strategies have received less interest. A chief reason may be that the industry growth model is based on Fordism (see
Section 2) until the destination reaches the maturity stage [
4]. Nevertheless, as destinations reach a certain degree of maturity, these growth strategies are no longer viable [
5]. Additionally, the sun-and-sea Fordism model assumes that tourists traveling to sun-and-beach destinations only look for sunny weather and idyllic beaches ([
6,
7]).
Motivation is the starting point of consumption and the basis of consumer behavioral analysis, that is, the research field that studies how and why different groups of consumers behave as they do [
8]. The other cornerstone of behavioral analysis is the characteristics of individuals. Accordingly, the different consumption behaviors of different segments are referred to in tourism literature as tourism consumption patterns [
9]. Tourist consumption patterns are analyzed from different scales and perspectives: macro-, micro-, and nano-scale [
7]. Although the border between scales is vague, it can be stated that the macro-scale comprises the consumption choices that tourists make in their country of origin before traveling. The micro-scale focuses on the different tourist choices between destinations or within a destination [
10]. Finally, the nano-scale comprises the consumption patterns of tourists at a specific attraction or local business, such as a beach or hotel. Thus, hotel managers must focus on the behavior of the tourists lodged at their hotels in order to allocate resources to satisfy the needs of the most profitable segment. Because tourism consumption patterns were assumed to be stable, their study has been set aside. Recently, however, tourists’ preferences and demands have become more complex [
11]. In addition, Mediterranean sun-and-beach destinations are in an advanced state of maturity which, among other factors, implies a high degree of competition between hotels. These two factors may affect the hotels’ performance ([
5,
12]). Tourists no longer want only to spend all day at the beach and then go back to the hotel room; they also seek activities beyond sunbathing and hotels offering rooftops, spas, restaurants, bars, or sky bars. Therefore, via nano-scale analysis, hotel managers may develop strategies focused on differentiating their product from those of the large number of competitors in mature destinations. In fact, there is a high degree of competition derived from the number of establishments that compete for the same segment with the same productive model [
4,
13].
In this context, at mature destinations the number of luxury hotels has increased ([
14,
15]). These kinds of establishments can satisfy these needs and demands. However, their wide range of services, from a spa to a gastronomic restaurant, entails different costs and contributions to the profitability of the hotel establishment, which is why proper customer segmentation becomes essential to increase profitability. In [
15], it is stated that luxury hotels have unique operational characteristics such as person-to-person interactions, diverse amenities, and high staff–customer ratios. Therefore, to maintain service quality while maintaining performance, luxury hotels may focus on operational efficiency and diversification strategies [
16]. To set the proper diversification, it is essential for hotels to implement an accurate segmentation technique that identifies those services that can be more profitable. In this way, following [
17], in superior hotel establishments, Food and Beverage (F&B) services, beyond contributing to their Gross Operating Profit (GOP), favor room sales, which translates into improvements in occupancy ratios, Average Daily Rate (ADR), Revenue Per Available Room (RevPAR), and Gross Operating Profit Per Available Room (GOPPAR). Therefore, understanding the new consumption patterns of tourists entails the incorporation of new services aimed at improving the service offered to the hotel establishment’s customers, with the aim of improving its profitability ([
18,
19,
20]). To this end, a classical segmentation technique widely used by hotel companies to analyze tourist consumption patterns is the K-means algorithm. However, classical segmentation techniques, K-means among them, seem to be limited in their ability to segment luxury hotel consumption patterns. Among other reasons, K-means uses the Euclidean distance as a dissimilarity measure, which can be a major handicap. Concretely, in many situations the Euclidean distance is insensitive to coordinate-to-coordinate differences between the variables involved, because differences in one coordinate can be compensated for by differences in others, even when the data are normalized. In other words, the Euclidean distance provides only a global measure of the difference between the values of the descriptive variables when dissimilarities between centroids and objects are measured. This can blur the relative differences in each component and, hence, the clustering technique can assign an object (a tourist) to an incorrect cluster. In order to avoid this drawback, in [
21] the Euclidean distance has been replaced by a new distance constructed by means of the use of an Ordered Weighted Averaging (OWA) operator in the sense of [
22]. It must be stressed that such a distance does not require normalization of the data because it calculates relative distances and, in addition, it is sensitive to the coordinate-to-coordinate differences (see
Section 3).
The aim of this paper is to apply the OWA-based K-means in order to cluster customers staying at a real five-star hotel, located in a mature sun-and-beach area, according to their propensity to spend. The experimental results are obtained from real data provided by a luxury hotel located in the city of Palma in Mallorca. The obtained results show that the use of the OWA operator provides better segments than classical K-means, improving its performance up to , and reduces the number of convergence iterations up to . This improvement has been tested against a ground truth, designed by the marketing department of the firm, which states the cluster to which each tourist belongs. Moreover, the customer classification is achieved regardless of the season in which the customer stays at the hotel. All these facts indicate that the OWA-based K-means could be used as an appropriate tool for classifying tourists in purely exploratory and predictive stages. Furthermore, the novelty of the OWA-based methodology lies in the fact that it requires no radical changes either in the implementation of the classical methodology (it is an easy modification of the classical K-means) or in data processing. This is crucial, because it means that the methodology can be incorporated into the control panel of a real hotel without additional implementation costs, which can significantly improve the performance and profitability of the hotel establishment in both the short and medium term. In addition, this is achieved without having to create separate models for the low, mid, and high seasons.
The remainder of the paper is organized as follows. In
Section 2 the notion of the nano-scale is introduced and the need for such a scale in the analysis of consumption patterns at the hotel level is justified.
Section 3 is devoted to recalling the basics about aggregation functions and the OWA operators. Moreover, the construction of those distances based on OWAs that will play a central role for our target is described. Furthermore, the OWA-based K-means algorithm is also shown. In
Section 4, the data description is provided, i.e., the variables considered in order to describe the tourists to be classified. In
Section 5, the obtained experimental results are described in detail. In addition, an illustrative numerical example that allows us to show the functionality of the OWA-based clustering technique and its advantage with respect to that based on the use of the Euclidean distance is given. Finally, conclusions and further work are given in
Section 6.
2. The Need for Consumption Patterns at the Nano-Scale: The Hotel-Level Case
Hotel establishments, like destinations, go through different product phases [
4]. Up to the maturity stage, they base their revenues and profitability on the Fordist model. Under this approach, performance efficiency is measured by comparing observed and optimal costs and revenues subject to price and quality constraints [
23]. Since the 1960s, this has allowed the creation of multinational hotel chains that are extremely efficient in cost and quality management [
2]. However, in mature destinations, where cost reduction is difficult and tourist preferences have evolved towards multipurpose travel, expansion and cost control strategies can be combined with service diversification. In this scenario, hotels may choose to develop new businesses that are to a greater or lesser extent related to their existing lines of business. However, few studies consider product diversification strategies in a hotel [
24].
In [
25], diversification of F&B services was identified as a key variable of hotel performance. In [
26], it was highlighted that the effect of diversification on performance is based on the combined effect of synergies and the possibility of sharing resources and knowledge between different business units that can lead to higher performance. However, the costs may outweigh the benefits generated by synergies at some level of diversification (see [
27]). In [
18], the effects of Taiwanese hotels’ diversification in F&B strategies on their growth and earnings stability were examined. In particular, it was found that hotels with total revenue generated mainly from F&B service tend to have higher profit margin growth, but also suffer from higher instability. Along these lines, in [
28] a trend towards service diversification was also found when examining data from the hotel sector in Turkey, an example of a mature sun-and-sea destination. Concretely, the firm size and sector-specific knowledge (intra-industry investments and experience of hotel workers) are shown to be important variables in determining the success of diversification strategies.
Taking a holistic view, in [
29] it was evidenced using stochastic frontier analysis that revenue diversification in the rooms and F&B department and the efficiency of other services are explained by the overall structure, technological efficiency, workers’ capabilities, and hotel characteristics. One step further, in [
30] non-linearity in the profitability analysis of diversification was introduced. In particular, it was found that unrelated diversification increases profitability up to a certain level. However, beyond that level, unrelated diversification decreases profitability, implying that at high levels of unrelated internal diversification there is a loss of control and effort due to distance from the core business. They also found that at low levels of related diversification, the synthetic related business risk is larger than the risk reduction effect.
A better understanding of customer preferences and behavior can be key for a hotel when implementing diversification strategies. Therefore, fuzzy segmentation techniques can provide a better understanding of customers’ consumption patterns, which helps to prevent internal transaction costs from exceeding the synergies created between departments [
30]. This is especially important in the initial aspects of diversification as the learning curve appears to exhibit diminishing marginal returns [
19].
However, there is a gap in the literature regarding the analysis of tourists’ consumption patterns at the hotel where they stay. The existing literature has focused on tourism product choice, routes, itinerant cognition, spatio-temporal distribution, and mental maps. This may be due to the fact that in order to obtain meaningful results it is necessary to obtain large and precise datasets on tourist behavior. In this sense, in [
31] tourists were segmented in terms of their behavior. The authors combined traditional interviews with socio-demographic questions (age, gender, size of travel group, etc.) with information from Global Positioning Systems (GPSs) (length of trip, duration of trip, number of attractions visited, average speed, etc.). By delivering a GPS device that tracks tourists’ movements during their visit to the city, researchers obtain a higher response rate than using a travel diary, as well as more accurate data.
New technologies allow hotel managers to interact with their customers almost immediately. This is why the aggregated use of information on tourist preferences and characteristics at the hotel level allows for a better understanding of how the customer interacts with the hotel establishment [
32], allowing the company to develop and communicate targeted and differentiated diversification strategies for each customer typology ([
28,
32]). These more precise segmentation techniques make it possible to adapt the classic segmentation models, resulting in more precise models, without having to make radical changes in data processing (such as normalization processes). All this will result in an improvement in the performance of hotel establishments, while allowing them to better understand their customers and diversify their sources of revenue. In fact, in [
33] it was already indicated, in terms of the profitability that the restaurant service of a hotel establishment can provide, that the average satisfaction with the service provided to these customers positively influenced the performance of the hotel establishment.
Based on all of the above and given the few works that analyze consumption patterns within a hotel, this paper focuses on advancing the analysis on this scale (nano-scale). As already indicated, the nano-scale can be defined as the interaction between the tourist and a local business offering different services. In this sense, new technologies currently allow hotel managers to interact with their customers almost immediately. This is why the aggregate use of information on tourist preferences and characteristics at this level allows for a better understanding of the way in which a customer interacts with the hotel establishment [
32], enabling the company to develop and communicate differentiated strategies focused on each type of customer ([
28,
34]). Regarding the use of aggregation methodologies applied to segmentation and classification, and taking into account that companies prefer to advance by adapting classic segmentation models without making radical changes in data processing and, thus, without incurring high implementation costs, we introduce the OWA-based K-means, which adapts the classical K-means while avoiding some of its classification shortcomings mentioned in
Section 1 (see [
21]). This adaptation requires no radical changes either in the implementation of the classical methodology or in data processing (in particular, there is no need to normalize incoming data).
In the light of the information above, the objective is to apply the aforementioned methodology to the nano-scale analysis of a hotel establishment of a superior category in a mature sun-and-beach destination, for which it is desired to segment customers according to their consumption potential regardless of the time of year in which they visit the establishment. On the one hand, we aim to achieve an improvement in the performance of hotel establishments and, on the other hand, allow them to get to know their customers better and, thus, diversify their sources of income.
3. The OWA-Based K-Means
In the market segmentation literature, partitional clustering algorithms are used to find the patterns of customers in such a way that those assigned to the same group (cluster) are more similar to each other than the patterns of those customers contained in the other clusters ([
35,
36]). Among these algorithms, K-means is one of the most popular in the social sciences (see, for instance, [
35,
37,
38]). However, K-means may converge to a local optimum that depends on the initial partition rather than to the best solution. Moreover, it is very sensitive to outliers and statistical noise ([
39]). Furthermore, another disadvantage of the aforementioned algorithm was shown in [
21]. Specifically, the Euclidean distance does not allow, in general, one to obtain a dissimilarity measure that takes into account the information provided by the explanatory variables coordinate-to-coordinate. In fact, this distance dilutes the aforementioned information by providing, in a way, a measure that is insensitive to the coordinate-to-coordinate differences of the variables involved in the measure because it is able to produce a compensation between the differences in different coordinates. This may result in objects (individuals/customers) being identified with the wrong centroid (for a deeper discussion, we refer the reader to [
21]).
According to [
40], a function $A : [0,1]^n \to [0,1]$ is an aggregation function provided that it is monotone ($A(x) \leq A(y)$ if $x, y \in [0,1]^n$ and $x_i \leq y_i$ for all $i = 1, \ldots, n$) and it satisfies the so-called boundary conditions $A(0, \ldots, 0) = 0$ and $A(1, \ldots, 1) = 1$. Of course, Euclidean distance can be understood as a measure of dissimilarity obtained by means of the aggregation of the information coming from each coordinate whose numerical value is normalized. In fact, such a measure aggregates a collection of squared distances computed coordinate-to-coordinate. Following [
40], aggregation functions play a crucial role in decision-making processes. Thus, if
is an aggregation function and
is the
ith-coordinate function of
A with
, then each
can be interpreted as the different criteria to be taken into account for decision making. Indeed, if
represents the (non-empty) set of alternatives, then for every
the value of
can be interpreted as the degree to which
x satisfies the criterion represented by
. Thus, the aggregation function
A can be understood as a tool to produce an overall degree to which alternative
x satisfies at the same time the
n criteria under consideration.
A special case of aggregation function is the so-called Ordered Weighted Averaging (OWA) operator which was introduced in [
22] (see also [
40]). As mentioned before, in [
21] a new distance was introduced as a possible replacement for the Euclidean distance in the K-means algorithm. It is obtained by aggregating distances computed coordinate-to-coordinate, which are merged via an OWA operator, and it is called Ordered Weight Distance Relative (OWDr for short). It must be stressed that such a distance generalizes the ordered weighted distance introduced in [
41]. In order to introduce its construction, let us recall that, given a weighting vector $W = (w_1, \ldots, w_n) \in [0,1]^n$ such that $\sum_{i=1}^{n} w_i = 1$, an OWA aggregation operator of dimension $n$ is an aggregation function $OWA_W : [0,1]^n \to [0,1]$ such that
$$OWA_W(a_1, \ldots, a_n) = \sum_{i=1}^{n} w_i \, a_{(i)},$$
where $a_{(i)}$ denotes the $i$th largest element in the collection $a_1, \ldots, a_n$. Based on the notion of the OWA operator, given a weighting vector
such that
, the OWDr
is given as follows ([
21]):
where
(features vectors or centroids) with
,
,
denotes the
ith largest element in the collection of distances
and
is the distance given by
Observe that the input vectors represent the data instances involved in the clustering process (centroids/features vectors). Moreover, given input vectors and assigning specific weighting vector W, the associated OWDr generates an overall degree of dissimilarity in such a way that the information from the different scales is all incorporated into that measure via the values , that measure differences only coordinate-to-coordinate, in such a way that the weighting vector is able to intensify (or diminish as appropriate) the most notable differences.
The OWDr was shown to be sensitive, coordinate-to-coordinate, to differences in the scales of the variables involved in the measurement, avoiding the aforementioned compensation between these differences. The most salient feature of the OWDr is that it can diminish (or intensify) the influence of excessively large or excessively small deviations in the data to be aggregated by assigning them low (or high) weights.
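As a minimal sketch of how an OWA operator aggregates values by rank rather than by position, the following Python function implements the standard definition: the inputs are sorted in decreasing order and the ith weight multiplies the ith largest value. The function name `owa` and the sample weights are illustrative assumptions and are not taken from [21] or [22].

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: the ith weight multiplies the ith largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # position i holds the ith largest value
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical inputs, chosen only to show how the weights shape the result.
sample = [0.2, 0.9, 0.5]
print(owa(sample, [1.0, 0.0, 0.0]))        # all weight on the largest value
print(owa(sample, [0.0, 0.0, 1.0]))        # all weight on the smallest value
print(owa(sample, [1 / 3, 1 / 3, 1 / 3]))  # equal weights give the arithmetic mean
```

Placing more weight on the first positions emphasizes the largest inputs, which is the mechanism the OWDr relies on to intensify the most notable coordinate-wise differences.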
Consider a dataset of n-dimensional points to be divided into k clusters. The objective of K-means is to obtain a partition of the data in which the mean square error between the cluster centroid and the cluster points is minimum. The process of the OWA-based K-means is as follows:
Step 1. An initial partition with k clusters is selected and k initial centroids are set.
Step 2. At each step t, each data point is assigned to the cluster whose centroid is at minimum distance from it, measured via the OWDr.
Step 3. The centroid of each cluster is recalculated for the next step as the arithmetic mean of the points assigned to that cluster at step t.
Step 4. If the algorithm has not converged, repeat from Step 2.
It must be stressed that the algorithm is considered convergent when in one step the centroids of the clusters, after recalculation, remain unchanged. Thus, such a fact is considered as a stopping criterion.
In the light of the exposed facts, it is worth mentioning that the OWDr allows greater weights to be assigned to the larger coordinate-to-coordinate distances, so that they contribute more to the overall measure. This provides enough quantitative information to decide whether a datum differs enough from the centroid to be discarded from the cluster.
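The procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions and not the implementation of [21]: `relative_difference` is a hypothetical stand-in for the coordinate-wise relative distance, the initial centroids are simply sampled from the data, and an iteration cap is added as a safeguard. All function names, the sample bookings, and the weights are invented for the example.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def relative_difference(a: float, b: float) -> float:
    """ASSUMPTION: stand-in relative distance, in [0, 1] for non-negative inputs."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom


def owa_distance(x: Point, y: Point, weights: Sequence[float]) -> float:
    """Aggregate the coordinate-wise differences, largest first, with OWA weights."""
    diffs = sorted((relative_difference(a, b) for a, b in zip(x, y)), reverse=True)
    return sum(w * d for w, d in zip(weights, diffs))


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Arithmetic mean of a non-empty list of points, coordinate by coordinate."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def owa_kmeans(data: List[Point], k: int, weights: Sequence[float],
               max_iter: int = 100, seed: int = 0):
    """Return (labels, centroids) after alternating assignment and update steps."""
    rng = random.Random(seed)
    # Step 1 (assumption): k sampled data points serve as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: assign each point to the closest centroid under the OWA distance.
        labels = [min(range(k), key=lambda j: owa_distance(p, centroids[j], weights))
                  for p in data]
        # Step 3: recompute each centroid as the mean of its assigned points.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            updated.append(centroid(members) if members else centroids[j])
        # Step 4: stop when the centroids remain unchanged after recalculation.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical bookings: (days of stay, cost of stay, expenditure at the bar).
bookings = [(2, 400.0, 30.0), (3, 650.0, 45.0), (2, 500.0, 300.0), (7, 2100.0, 350.0)]
labels, centers = owa_kmeans(bookings, k=2, weights=(0.6, 0.3, 0.1))
print(labels)
```

Only the distance function differs from a textbook K-means loop, which reflects the point made in the text that the adaptation does not require radical changes to the classical implementation.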
4. Data Description
As mentioned before, the objective is to segment the customers staying at a hotel according to their consumption potential regardless of the time of the year when they visit the establishment. Following the clustering made by the marketing department of the firm, the bookings were divided into three segments depending on their propensity to spend: low, 780 (
%), medium, 946 (42%), and high, 529 (
%). Under this approach, the company takes into account the vagueness of the subjective opinion of the reception managers and the F&B manager. The preceding segmentation is used as ground truth in order to confirm or reject the tourists’ classification that both algorithms provide and, thus, to check their performance by means of the accuracy metric detailed in
Section 5.
As can be seen in
Table 1, the variables considered in order to describe the tourists are days of stay, the price paid by the customer for the accommodation (cost of stay), number of customers per visit to the rooftop bar (number of diners), total amount spent at the bar for each booking (expenditure per reserve), and total number of visits made to the rooftop bar during the stay (number of visits).
Tourism literature has recognized that length (days) of stay arguably is a key determinant of the success of a destination as well as its firm’s success [
42]. The price paid by the guest for the room (cost of stay) is a primary filter as well as the main source of income of hotels. Higher room prices exclude low purchasing power segments [
43,
44]. The expenditure per reserve and the number of diners and the number of visits determine the success of the diversification strategy because no diversification can be conducted if there is not enough revenue to, at least, cover its costs [
17,
45]. The understanding of the interaction of the customer with the different outlets is the core of the analysis of consumption patterns at the nano-scale.
In this way, we have a database with information on 2256 bookings that spent at least one night in the hotel between 1 March 2019 and 31 October 2019. This information has been obtained directly from the database of the hotel chain.
As can be seen in
Table 2, the average stay per booking is
nights, which usually coincides with the weekend. Some tourists spend a single night, whereas others spend their entire holiday in the hotel, reaching a maximum of 33 nights. Similar to the average cost provided by the Spanish Instituto Nacional de Estadística (INE), the average cost of the stay for this period is just over a thousand euros, with some cases where the cost is zero due to commercial or operational reasons (see [
46]). The number of diners per visit at the rooftop bar is about five people and they spend an average of EUR
. Moreover, they visit the bar about 3 times during their stay.
5. Experimental Results
The segmentation has been carried out independently of classical categories such as seasonality and nationality which add complexity to the process and do not always provide useful information.
In order to make the results meaningful, 1500 experiments have been executed for both the Euclidean distance and each choice of weights of the OWDr measure. All experiments have been executed in Python. As a measure to compare the goodness-of-fit between the two distances applied to the K-means, we used the accuracy, a standard measure in the literature ([
47]). Observe that, as pointed out before, we rely on a ground truth and, thus, each tourist has been previously classified by the marketing department of the firm and, hence, after the execution of both algorithms we can analyze whether the K-means can be useful for the segmentation that we want to do by comparing the marketing department classification against that given by the algorithms. Hence, following [
47], in order to evaluate the performance of the algorithms we have used a confusion matrix. Such a matrix includes, on the one hand, the True Positive (TP) and the True Negative (TN) counts, which correspond to correct classifications and, on the other hand, the False Negative (FN) and the False Positive (FP) counts, which correspond to incorrect classifications. Therefore, the accuracy has been calculated using
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
In addition, in order to assess applicability, the average number of iterations that each measure needed to converge to a solution (i.e., until the centroids of the clusters remain unchanged after recalculation) was calculated. The weighting vectors in all experiments have been selected heuristically.
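To make the evaluation step concrete, the sketch below counts TP, TN, FP, and FN for one segment and computes the accuracy with the usual definition, (TP + TN) / (TP + TN + FP + FN). It assumes that the cluster identifiers returned by the algorithm have already been mapped to segment names; the function names and the labels are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def confusion_counts(truth: Sequence[Hashable], predicted: Sequence[Hashable],
                     positive: Hashable) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correct classifications among all classifications."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical ground truth and algorithm output for four bookings.
ground_truth = ["low", "high", "medium", "high"]
assigned = ["low", "high", "high", "medium"]
counts = confusion_counts(ground_truth, assigned, positive="high")
print(counts, accuracy(*counts))
```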
As can be seen in
Table 3, the K-means with the OWDr measure outperforms the K-means with the Euclidean distance in all cases. The accuracy obtained with the Euclidean distance is the same in all rows because the weights affect only the OWDrs considered; therefore, the K-means with the Euclidean distance remains constant both in the average number of iterations and in effectiveness. Additionally, except for the vector weight
(last row), in all combinations (vector weights chosen), it can be seen that apart from having better effectiveness, the OWDr also has a faster convergence.
Table 3 shows the average accuracy achieved and the average number of iterations needed to converge after 1500 experiments for each weight choice. As can be seen in
Table 3, the weight vector that works best is:
. Concretely, it provides both the best effectiveness and at the same time the best convergence. When the weight vector chosen is
(last row of
Table 3), we can see that the performance of the OWA-based K-means algorithm is far from that of the weight vectors that heuristically obtained the best goodness-of-fit, although in this case it still provides a better performance than the classical K-means.
We end this section with an illustrative numerical example that allows us to show the functionality of the two segmentation techniques used and their differences. For this purpose, three reservations have been taken into consideration (see
Table 4). Although the cost of stay differs more between the first two reservations than between the second and the third, the customers of the first two behave more similarly from the point of view of propensity to consume. Capturing this is precisely the aim of this article, that is, to incorporate in K-means a measure that takes into account how far apart the values of the descriptive variables are from each other coordinate-to-coordinate (not only globally, as the Euclidean distance does), in such a way that large coordinate-to-coordinate differences cannot be compensated for by other differences when the measure aggregates them into the global difference. This is achieved without normalizing the data beforehand.
In the following, we show that in effect, the new proposed distance, the OWDr, on the one hand achieves this and, on the other hand, it also has a relativizing effect, since it takes into account the scale of each of the descriptive variables in order to know whether the differences between them are significant or not. At the same time, it is illustrated that this is not the case for the Euclidean distance.
According to
Table 5, when the Euclidean distance is applied to measure we obtain that the most similar reservations are
and
. Indeed,
,
and
. This could imply that the K-means classifies them in the same cluster. However, it is clear that the spending patterns of these customers are well differentiated (they should not belong to the same cluster). It must be pointed out that the values of the variables that describe the reservations were normalized before computing the Euclidean distance. This is done because the classical implementation of K-means with the Euclidean distance operates on previously normalized data in order to minimize scale differences.
However, using the OWDr with vector weight, for instance,
, the most similar reservations are
and
which is in line with expectations as the expenditure patterns are similar. Indeed, as shown in
Table 6,
,
, and
.
In order to help the reader, we compute one of these distances step by step. The remaining distances can be computed analogously.
Step 1. Given
and
, we compute, applying (
2), the collection of relative distances
, where
,
.
Step 2. We sort the elements of the collection obtained in Step 1 in decreasing order, so that the first component is the largest element of the collection, the second component is the second largest, and so on. This gives a new, ordered vector.
Step 3. Applying (
1), we obtain the global value
Note that the computing methodology of OWDr itself calculates relative distances (normalized distances) and therefore does not require preprocessing (normalizing) of the data to be treated. Moreover, unlike the Euclidean distance, the new methodology prioritizes those larger differences in the overall computation of the dissimilarity measure.
In the light of the exposed computations, the two procedures detect significantly different patterns in the data and, in addition, the normalization of the data does not manage to avoid the aforementioned problems in relation to the use of the Euclidean distance.
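The compensation effect can also be seen on a toy case. In the sketch below, two points lie at the same Euclidean distance from a reference point, yet one differs moderately in every coordinate while the other differs strongly in a single coordinate. An OWA aggregation that puts most of its weight on the largest coordinate-wise difference tells them apart. The values and weights are hypothetical, and for simplicity the example aggregates absolute differences of already normalized values rather than the relative distances used by the OWDr.

```python
import math
from typing import Sequence


def owa_of_differences(x: Sequence[float], y: Sequence[float],
                       weights: Sequence[float]) -> float:
    """OWA aggregation of the absolute coordinate-wise differences, largest first."""
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)
    return sum(w * d for w, d in zip(weights, diffs))


reference = (0.0, 0.0, 0.0, 0.0)
spread_out = (0.5, 0.5, 0.5, 0.5)    # moderate difference in every coordinate
concentrated = (1.0, 0.0, 0.0, 0.0)  # one large difference, the rest identical
weights = (0.7, 0.1, 0.1, 0.1)       # hypothetical: emphasize the largest difference

# The Euclidean distance gives the same global value for both points.
print(math.dist(reference, spread_out), math.dist(reference, concentrated))

# The OWA-based aggregation ranks the concentrated difference as larger.
print(owa_of_differences(reference, spread_out, weights),
      owa_of_differences(reference, concentrated, weights))
```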
6. Conclusions and Further Work
The main objective of this study was to understand the consumption pattern of tourists staying in a five-star hotel located in a mature sun-and-beach area, as a good understanding of customer demands allows for improving hotel performance [
25]. Specifically, this involves being able to classify customers staying at the hotel according to their propensity to spend. The objective is achieved regardless of the season, and the methodology used requires no radical changes either in the implementation of classical methodologies or in data processing.
From a theoretical point of view, this study contributes to the literature by providing a method for categorizing the expenditure of tourists staying at the hotel. Moreover, it treats the hotel as a consumption center rather than merely a place where tourists go to rest, introducing the concept of the nano-scale. Thus, the analysis of the consumption pattern at the hotel level is essential for managers of establishments located in mature sun-and-beach destinations that are in an advanced stage of consolidation. In this sense, it is more important for entrepreneurs to understand how tourists interact with hotel offers than to know the number of tourist arrivals, which is more or less stable depending on the consolidation stage of the destination. Methodologically, this article shows how the introduction of a distance based on the use of OWDr improves the performance (accuracy) of classical K-means up to % and reduces the number of iterations needed for the algorithm to converge up to %. Moreover, OWDr provides further advantages over the Euclidean distance. It does not require prior normalization of the data, because it computes relative distances itself, and it is sensitive to coordinate-to-coordinate differences. These improvements in customer segmentation and in the cost of implementation directly improve the profitability of the establishment by adapting prices and services to each segment in order to increase prices. In addition, there are indirect benefits in terms of knowing the consumption pattern of tourists, optimizing spaces and services such as terraces, and generating synergies between departments, all of which increase customer satisfaction.
For hotel managers, a better understanding of customer consumption patterns can facilitate the implementation of strategies that increase hotel performance. This can be achieved by increasing revenue or by improving efficiency through synergies between departments [
48,
49,
50]. However, the cost of implementing new technologies can be high and time-consuming, making it difficult to recover the investment made [
51]. For this reason, incorporating the OWA operators and the aforementioned distances based on them into widely used algorithms not only improves their segmentation capacity but also reduces implementation costs with respect to other machine learning techniques, since the gain in profitability is obtained with only a modification of K-means, a technique used in most hotel companies. Although the method has been applied to a higher-category hotel due to the availability of data, the technique is applicable to any tourist establishment.
All experiments were run in Python, and they showed that the OWA-based K-means generally outperforms the classical K-means endowed with the Euclidean distance. This improvement was assessed against a ground truth, designed by the marketing department of the firm, which states the cluster to which each tourist belongs. Therefore, the OWA-based K-means appears to be an appropriate tool for classifying tourists in purely exploratory and predictive stages. However, the weight vectors defining the OWDrs were selected heuristically. A future line of research is the determination of the optimal weight vectors that define the OWDrs for the customer data used in the experiments. In this direction, OWA-based K-means and K-means will be compared in terms of running time, also taking into account the time the former needs to select the aforementioned optimal weights. Furthermore, the OWA-based algorithm will be tested on a wider selection of datasets from different hotels with characteristics similar to those of the hotel considered in the present work. However, in an early rejuvenation phase of a mature destination, few hotels are conceptualized to offer complementary services other than breakfast.