Leveraging Azure Automated Machine Learning and CatBoost Gradient Boosting Algorithm for Service Quality Prediction in Hospitality

Kundu, Avisek; Kundu, Seeboli Ghosh; Sahu, Santosh Kumar; Badgayan, Nitesh Dhar

doi:10.3390/computers14020032

Open Access Article


by Avisek Kundu ^1,2, Seeboli Ghosh Kundu ^3, Santosh Kumar Sahu ^4,* and Nitesh Dhar Badgayan ^5,*

^1 Technology Consulting (Data Science, ML & AI), Ernst & Young LLP, Gurgaon 122002, India
^2 Department of Operations and IT, IBS, Hyderabad (A Constituent of ICFAI Foundation for Higher Education), Hyderabad 501203, India
^3 Symbiosis Centre for Management Studies, Bengaluru Campus, Symbiosis International (Deemed University), Pune 560100, India
^4 School of Mechanical Engineering, VIT-AP University, Besides A.P. Secretariat, Amaravati 522237, India
^5 KPMG, Mumbai 400011, India
^* Authors to whom correspondence should be addressed.

Computers 2025, 14(2), 32; https://doi.org/10.3390/computers14020032

Submission received: 23 November 2024 / Revised: 15 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025

(This article belongs to the Special Issue AI in Its Ecosystem)


Abstract:

The importance of measuring service quality for business performance has been widely recognized in the service marketing literature because of its pivotal influence on customer satisfaction and its long-term impact on customer loyalty. The SERVQUAL model, comprising five dimensions (reliability, assurance, tangibility, empathy, and responsiveness), provides a measurable framework for evaluating overall customer satisfaction. This study aims to determine whether all SERVQUAL dimensions carry equal weight in their effect on overall service quality and to estimate service quality from various input features. To this end, questions were framed to assess the impact of variables such as gender, age, marital status, highest level of education, and frequency of hotel stays. The importance of each feature relative to the five SERVQUAL dimensions was investigated using machine learning models, specifically CatBoost and Microsoft Azure Automated Machine Learning (AutoML) studio. Both CatBoost and Azure AutoML identified the frequency of hotel stays and age group as the dominant predictors of service quality. In addition, Azure AutoML ranked marital status as a more significant factor, suggesting its potential influence on customer preferences. The comparative modeling results showed strong alignment between the feature importances derived from CatBoost and Azure AutoML, enabling decision-makers to identify which dimensions are influenced by specific predictors and to focus on targeted improvements.

Keywords:

service quality; SERVQUAL model; CatBoost; Azure Automated Machine Learning

1. Introduction

SERVQUAL is a well-established framework for assessing service quality across five dimensions: reliability, assurance, tangibility, empathy, and responsiveness. These dimensions are critical to understanding and improving customer satisfaction in service industries. Traditional methods of measuring service quality often rely on subjective assessments or manual data collection, which can be time-consuming and prone to bias. Machine learning (ML) techniques offer a more objective, data-driven, scalable, and efficient approach to modeling and predicting service quality across the SERVQUAL dimensions. Cloud computing platforms have recently gained popularity because of their scalability, flexibility, accessibility, and support for collaboration. One such platform is Microsoft Azure, which offers various cloud services, including automated machine learning (AutoML). AutoML simplifies and accelerates the building, training, and deployment of ML models: it automatically selects features and tunes hyperparameters without extensive manual effort, which broadens its applicability across domains, from engineering to business [1]. Because it generally requires little or no coding, users can explore multiple algorithms and configurations efficiently, and the process often yields high-performing models that follow machine learning best practices [2]. Complementing this automated modeling process, gradient-boosting algorithms such as CatBoost can handle categorical data directly, without manual preprocessing such as one-hot encoding; CatBoost converts categorical features internally using ordered target statistics. CatBoost also reduces overfitting through techniques such as ordered boosting and oblivious trees [3]. In addition, it can capture complex nonlinear relationships in data, which improves accuracy, and it supports GPU acceleration, which makes it practical for large datasets [4]. Figure 1a,b show the architectures of Azure AutoML and CatBoost.
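CatBoost's direct handling of categorical features rests on ordered target statistics: each row's category is replaced by a smoothed average of the target over the rows that precede it in a random permutation, so a row's own target never leaks into its encoding. The minimal sketch below illustrates the idea for a single, already-shuffled permutation. The function name, the prior, and the smoothing weight `a` are assumptions chosen for illustration; CatBoost's own implementation is more elaborate (it uses several permutations and combinations of categorical features).

```python
def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode a categorical column with ordered target statistics (simplified).

    Rows are assumed to be already shuffled into one random permutation.
    Row i is encoded as (sum of targets of EARLIER rows with the same category
    + a * prior) / (count of those earlier rows + a), so a row's own target
    is never used to encode it.
    """
    running_sum = {}    # category -> sum of targets seen so far
    running_count = {}  # category -> number of rows seen so far
    encoded = []
    for category, target in zip(categories, targets):
        s = running_sum.get(category, 0.0)
        n = running_count.get(category, 0)
        encoded.append((s + a * prior) / (n + a))
        # Update the history only after encoding the current row
        running_sum[category] = s + target
        running_count[category] = n + 1
    return encoded


# Hypothetical marital-status values and total scores for five respondents
marital = ["Married", "Single", "Married", "Married", "Single"]
scores = [40, 30, 44, 38, 32]
print(ordered_target_statistics(marital, scores, prior=36.0))
# [36.0, 36.0, 38.0, 40.0, 33.0]
```

The first occurrence of each category falls back to the prior, and later occurrences drift toward that category's observed mean, which is how the encoding stays informative without using the current row's target.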

The SERVQUAL model comprises five distinct dimensions, as shown in Figure 2, which together reflect an organization's capacity to meet customer requirements [7,8,9]. Reliability relates to an organization's ability to deliver consistent and accurate services [10]. Assurance relates to an organization's positive declaration intended to give customers confidence, in other words, a promise [11,12]. Tangibility refers to the physical aspects of the service, such as the appearance of the facilities, equipment, and personnel [13]. Empathy relates to the ability to understand and share the feelings of another [14,15]. Responsiveness pertains to reacting quickly and positively to customers [12,15].

The following section summarizes the literature on the application of SERVQUAL in different service industries and on the use of various ML algorithms for predictive modeling. Park et al. [16] proposed a decision-support framework that evaluates hotel service quality by analyzing online reviews. The framework identifies critical service attributes, conducts sentiment analysis to gauge guest satisfaction, benchmarks service quality across hotels, and assesses attribute-specific quality associations. German et al. [17] examined the factors that influenced consumers in the Philippines when selecting package delivery services during the COVID-19 pandemic, using pro-environmental planned behavior theory and SERVQUAL. Lizarelli et al. [18] proposed an integrative framework combining SERVQUAL, Analytical Kano (A-Kano), and Quality Function Deployment (QFD) with fuzzy techniques to address imprecision and uncertainty in customer perception data. The framework involves four phases: identifying quality attributes, integrating SERVQUAL and A-Kano with a Fuzzy Inference System (FIS), linking the outputs with QFD using a 2-tuple fuzzy linguistic representation, and prioritizing improvement projects. Tested in an entrepreneurial education firm, the approach improved service quality assessment and the prioritization of technical requirements, offering a novel methodology for integrating SERVQUAL, A-Kano, and QFD. Adler et al. [19] examined a fundamental limitation of Gradient Boosting Machines (GBMs): their base learners (usually decision trees) bias feature importance (FI) toward high-cardinality categorical variables. Despite the strong predictive performance of GBMs, this bias distorts FI measures. The study introduced a cross-validated, unbiased base learner that mitigated the issue with minimal computational overhead, improving FI accuracy while maintaining the GBM's predictive power on both synthetic and real-world datasets.
Rivero et al. [20] examined how customer satisfaction (CS) is influenced during disruptive events, using the SERVQUAL model as a mediator of the effects of service innovation (SI) and service recovery (SR) on CS. In the context of Typhoon Odette in the Philippines, the study analyzed responses from 584 participants and tested seven hypotheses with Partial Least Squares Structural Equation Modeling. The findings revealed that SERVQUAL partially mediates the relationship between SI and CS and fully mediates the relationship between SR and CS, whereas the direct SR-CS relationship was not supported. The results indicated that, during massive disruptions, CS depends more on human-centric SERVQUAL dimensions than on product restoration alone, so SR efforts should focus on SERVQUAL aspects to improve CS. Stefano et al. [21] discussed the importance of quality in determining both product or service performance and customer satisfaction, emphasizing how consumers' perceptions shape their overall assessment of a service. Because service quality is abstract, owing to intangibility, heterogeneity, and inseparability, the study defined it as perceived by the customer and highlighted the gap between expectations and actual perceptions. Customer expectations, rooted in perceived needs, can differ widely from real needs and thus shape satisfaction. The paper evaluated the service quality of a large hotel using fuzzy SERVQUAL and fuzzy AHP, revealing significant areas for improvement in customer satisfaction. Rosário et al. [22] reviewed 74 studies on AutoML and highlighted its benefits for businesses: AutoML reduces the time and resources needed to develop models, accelerates decision-making, and enables accurate predictive models to be built. It also improves model performance, enhances accessibility, and democratizes innovation; as businesses grow, AutoML scales to larger datasets, driving efficiency, accuracy, and innovation. Paladino et al. [23] assessed the performance of three AutoML tools, PyCaret, AutoGluon, and AutoKeras, on heart disease datasets. Compared with traditional machine learning models, the AutoML tools performed better, with AutoGluon achieving the highest accuracy (78–86%). Although AutoML simplified model creation, the study highlighted the need to address its limitations, and the findings suggested that AutoML could significantly improve heart disease diagnosis and prevention. Abdulrab et al. [24] emphasized the significance of each SERVQUAL dimension and discussed the challenges and technological integrations faced by the hospitality industry. Table 1 summarizes the major contributions drawn from these references.

The literature review shows extensive research on assessing SERVQUAL dimensions, using methods that range from machine learning (ML) algorithms to fuzzy techniques. However, a significant gap remains in the comparative modeling of SERVQUAL dimensions using cloud platforms such as Azure AutoML together with advanced boosting algorithms such as CatBoost. Existing studies have relied primarily on traditional ML approaches or fuzzy-based techniques, leaving the potential of combining these tools for a more streamlined and accurate analysis unexplored. This study addresses the gap by modeling SERVQUAL parameters from hotel customer data with both Azure AutoML and CatBoost. Azure AutoML automates feature selection, hyperparameter tuning, and model evaluation, offering a coding-free and efficient route to high-performing models. CatBoost complements it by handling categorical data without manual encoding and by mitigating overfitting through ordered boosting and oblivious trees. Using both tools allows the findings of one to be cross-checked against the other, which strengthens the identification of key predictors of service quality. The novelty of this study lies in combining Azure AutoML and CatBoost, an approach that, to the authors' knowledge, has not previously been explored in the literature: whereas prior research evaluated SERVQUAL parameters with traditional ML algorithms or fuzzy-based techniques, this integration pairs the automation and scalability of a cloud platform with the precision of an advanced gradient-boosting algorithm. By bridging this gap, the study establishes a comparative dimension-modeling framework that paves the way for scalable, efficient, and accurate service quality assessment in the hospitality industry.

2. Methods

2.1. Survey Methodology

Using an online survey that yielded 617 complete responses, this study applies Azure AutoML and CatBoost regression ensemble models to capture complex patterns across the SERVQUAL dimensions and demographic features, and it uses these predictors to estimate customer satisfaction scores. The response rate was calculated as the percentage of completed responses relative to the total sample contacted: 617 complete responses out of 2200 targeted respondents contacted via the online survey gives a response rate of 28.05%. Response rates in similar studies range between 25% and 40%. To improve response rates and minimize non-response bias, a structured follow-up process was implemented. Participants who did not respond received up to 3 reminders at 72 h (3-day) intervals after the survey was launched; in addition, respondents who did not respond to the initial invitation received a weekly reminder for one month (4 additional reminders), for a total of 7 reminders. The reminders were intended to encourage participation while respecting respondent burden, so as to maximize complete and reliable data collection. Partial responses, in which respondents answered only a portion of the survey's questions, were also tracked. Of the 825 total respondents (complete and partial), 208 provided only partial responses after all 7 reminders, a partial response rate of 25.21% (208 of 825). Partial responses were not imputed, to avoid introducing artificial consistency; only complete cases were used in the primary analysis. The observed statistical power was 0.98 (Table 2), and the significance level was set at the conventional 5%. This approach to sampling and data quality monitoring provided a comprehensive and reliable dataset that supports the validity of our findings.
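The two rates above follow directly from the raw counts. The short sketch below recomputes them; only the counts come from the survey description, and the variable names are illustrative.

```python
# Counts from the survey description
contacted = 2200   # respondents invited via the online survey
complete = 617     # fully completed questionnaires
partial = 208      # respondents who started but did not finish

started = complete + partial                 # all respondents who opened the survey
response_rate = 100 * complete / contacted   # complete responses / contacted
partial_rate = 100 * partial / started       # partial responses / all who started

print(f"respondents: {started}")
print(f"response rate: {response_rate:.2f}%")
print(f"partial response rate: {partial_rate:.2f}%")
```

Running it prints 825 respondents, a response rate of 28.05% (617/2200 = 0.28045...), and a partial response rate of 25.21%.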

2.2. Analysis of Data

The dataset for analysis consists of 10 columns and 628 rows. The columns comprise gender, age group, marital status, highest level of education, and frequency of hotel stays as inputs, and the five SERVQUAL dimensions as dependent variables. The unique value counts of each field are shown in Table 3.

The influence of gender, age, marital status, highest level of education, and frequency of stays was analyzed to determine which factor has the greatest impact on each of the five SERVQUAL dimensions and on the total customer satisfaction score. The analysis was conducted with AutoML and CatBoost, implemented in a local Python environment. The hyperparameter values considered for CatBoost are presented in Table 4, and the AutoML workflow in Azure AI Studio is shown in Figure 3. Hyperparameters were selected on the basis of (i) the highest R² on the validation data and under 10-fold cross-validation, (ii) the smallest difference between R² and adjusted R², and (iii) the lowest mean absolute percentage error (MAPE) between the predicted and actual satisfaction scores on the validation data. These criteria help ensure that the selected model neither overfits nor underfits.
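The three selection criteria rely on standard regression metrics. The sketch below writes them out in plain Python for reference. The function names and the toy scores are hypothetical; adjusted R² uses the usual definition 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p predictors, and MAPE is expressed in percent and assumes no zero targets.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Penalize R^2 for the number of predictors p: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy satisfaction scores (hypothetical, not from the study)
actual = [40, 50, 60, 70]
predicted = [42, 48, 61, 69]

r2 = r2_score(actual, predicted)
print(f"R2 = {r2:.3f}")  # R2 = 0.980
print(f"adjusted R2 (n=617, p=5) = {adjusted_r2(r2, 617, 5):.4f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

With five predictors and several hundred rows, the gap between R² and adjusted R² is small; a large gap would signal that the predictors add little beyond noise, which is why criterion (ii) guards against overfitting.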

The configuration of Azure AI Studio includes multiple components that work together with Azure ML and other Azure services. A managed network is required when setting up Azure AI Studio; it creates a strong network-security boundary around the different AI applications. The key components used in the setup are as follows: (a) Azure Machine Learning workspace, which provides the core service on which Azure AI Studio is set up; (b) Azure Key Vault, which securely manages the encryption keys that protect the data and the models built on it; (c) Azure Storage, which securely stores the data marts, intermediate tables, models, and aggregated output in the cloud; and (d) Azure Application Insights, which monitors the performance and usage of the applications to ensure that resources and nodes are used efficiently.

3. Results

3.1. CatBoost Results

The CatBoost model was trained with the hyperparameters shown in Table 4 to extract feature importance for each dependent variable: reliability, assurance, tangibility, empathy, and responsiveness. The feature importance showed that frequency of hotel stays, marital status, age group, and gender strongly influence the dependent variables, with frequency of stays having the greatest influence. Similarly, the total score was regressed on the five dimensions (reliability, assurance, tangibility, empathy, and responsiveness) to measure the influence of each. The CatBoost decision tree in Figure 4 and the splitting rules in Table 5 help explain these features in depth. The correlation heatmap in Figure 5 shows the correlations between the SERVQUAL parameters and the total score. Figure 6a–e show the feature importance for each SERVQUAL parameter, and the output indicates that reliability and assurance have the greatest influence on the total score.
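A correlation heatmap such as the one in Figure 5 is typically built from pairwise Pearson correlation coefficients. The sketch below computes such a matrix in plain Python. The use of Pearson (rather than rank) correlation, the column names, the toy scores, and the assumption that the total score is the sum of the dimension scores are all illustrative, not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (ss_x * ss_y)


def correlation_matrix(columns):
    """Map every ordered pair of column names to their Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical scores for five respondents (not survey data)
reliability = [22, 27, 30, 25, 33]
assurance = [12, 14, 17, 13, 18]
tangibility = [15, 13, 16, 17, 14]
# Assumption: the total score is the sum of the dimension scores
total = [r + a + t for r, a, t in zip(reliability, assurance, tangibility)]

matrix = correlation_matrix(
    {"reliability": reliability, "assurance": assurance,
     "tangibility": tangibility, "total": total}
)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>12} vs {b:<12} r = {r:+.2f}")
```

The resulting matrix is symmetric with ones on the diagonal, so a heatmap only needs one triangle to convey all the pairwise information.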

3.2. Azure AutoML Results

The feature importance extracted from Azure AutoML showed that the frequency of stays, marital status, age group, and gender strongly influence the dependent variables, with the frequency of stays having the greatest influence, as shown in Figure 7a–e.

A comparison of the two models revealed that both CatBoost and Azure AutoML identified the frequency of hotel stays and age group as the dominant features. However, Azure AutoML assigned slightly higher importance to marital status, suggesting that it may matter more to customer preferences than initially expected. Azure AutoML also identified additional features that influenced the dependent parameters, which makes it a useful option when an in-depth analysis of the weight of each variable on the dependent variable is required. These insights show how the machine learning techniques used here can also be valuable in other areas [25]. For example, similar models could be applied in materials analytics to study polymer-based materials: by analyzing the influence of variables such as stress, strain, or shape recovery on material behavior, they could help clarify how these materials perform under various conditions [26,27].

Table 6 presents the performance metrics of the two predictive models, the CatBoost gradient boosting model and the Azure AutoML model, in terms of R², RMSE, and MAPE. The Azure AutoML model performs better, with an R² of 98.34% compared with 91.50% for the CatBoost model, meaning that it explains a larger share of the variance in the actual scores. The Azure model also achieves a lower RMSE (1.37 vs. 2.10) and a lower MAPE (1.17% vs. 1.72%), indicating smaller prediction errors. These results indicate that the Azure AutoML model delivered more accurate predictions on this dataset.
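One way to quantify the alignment between the two models' feature importances is Spearman's rank correlation between the two importance rankings. The sketch below uses the closed-form formula 1 − 6Σd²/(n(n² − 1)), which is valid only when there are no tied values. The feature names mirror the study's inputs, but the importance scores are hypothetical placeholders, not the values shown in Figures 6 and 7.

```python
def ranks(values):
    """Rank values from 1 (largest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


features = ["frequency_of_stays", "age_group", "marital_status", "gender", "education"]
# Placeholder importance scores (hypothetical, for illustration only)
catboost_imp = [0.40, 0.25, 0.15, 0.12, 0.08]
automl_imp = [0.38, 0.22, 0.24, 0.10, 0.06]

rho = spearman_no_ties(catboost_imp, automl_imp)
for name, rc, ra in zip(features, ranks(catboost_imp), ranks(automl_imp)):
    print(f"{name:<20} CatBoost rank {rc}  AutoML rank {ra}")
print(f"Spearman rho = {rho:.2f}")
```

In this toy case the two rankings differ only in the order of age group and marital status, giving rho = 0.9; a value close to 1 would support the claim that the two models agree on which predictors matter most.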

4. Conclusions

Service marketing researchers have emphasized the need to measure service quality because it strongly affects customer satisfaction and long-term customer retention. The present study examined whether all SERVQUAL dimensions can be considered equal in their impact on the overall quality of service. The findings indicate that reliability and assurance are the most important factors in travelers' decision-making with respect to service quality. Both reliability and assurance are in turn influenced by frequency of stays, marital status, and gender. These results enable decision-makers to determine which dimension depends on which predictors and to focus on targeted improvements. A hotel manager can thus concentrate on key parameters such as frequency of stays, gender, and marital status, for instance by launching a marketing campaign aimed at a particular gender or marital status. The study also estimated the service quality of individual respondents from the predictor variables using nonlinear machine learning models, including ensemble techniques. Its findings add to the scholarly work on service quality by examining its antecedents and consequences in the Indian hotel industry, and they invite further research, particularly on how shifting demographic and geographic parameters interact and shape service quality dynamics.

One limitation of this study is that validating the findings would require a large-scale quantitative study with larger samples, which can be expensive and time-consuming and requires significant resources for planning, data collection, and analysis. In future work, the generalizability of the findings could be tested by replicating the methodology across sectors to compare and validate the insights. Explainable AI (XAI) methods are widely used to make AI models understandable to nontechnical end users; examples include LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), TCAV (Testing with Concept Activation Vectors), DALEX (Descriptive Machine Learning Explanations), and DiCE (Diverse Counterfactual Explanations). Because this research focuses on ML-based ensemble techniques rather than on AI models more broadly, XAI methods were not incorporated here; their use is suggested as a direction for future research.

Author Contributions

Conceptualization, A.K. and S.G.K.; methodology, A.K. and S.G.K.; software, S.K.S. and N.D.B.; formal analysis, A.K., S.K.S., and N.D.B.; investigation, A.K., S.K.S., and N.D.B.; resources, S.K.S. and N.D.B.; data curation, S.K.S. and N.D.B.; writing—original draft preparation, A.K. and S.G.K.; writing—review and editing, S.K.S. and N.D.B.; visualization, A.K. and S.K.S.; supervision, S.K.S. and N.D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The code and data used for service quality prediction in this study are publicly available in the Mendeley Data repository: https://data.mendeley.com/datasets/h489xpc53b/2 (accessed on 21 January 2025).

Acknowledgments

The authors acknowledge Rohan Das, Consultant in E&Y, for his help in the generation of AWS outputs including pipelines.

Conflicts of Interest

The author Avisek Kundu is employed by Ernst & Young LLP. The author Nitesh Dhar Badgayan is employed by KPMG, India. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Rane, N.L.; Mallick, S.K.; Kaya, O.; Rane, J. Tools and frameworks for machine learning and deep learning: A review. In Applied Machine Learning and Deep Learning: Architectures and Techniques; Deep Science Publishing: London, UK, 2024; pp. 80–95.
2. Salehin, I.; Islam, M.S.; Saha, P.; Noman, S.M.; Tuni, A.; Hasan, M.M.; Baten, M.A. AutoML: A systematic review on automated machine learning with neural architecture search. J. Inf. Intell. 2024, 2, 52–81.
3. Liu, B.; Sun, Y.; Gao, L. Enhancing Groundwater Recharge Prediction: A Feature Selection-Based Deep Forest Model With Bayesian Optimisation. Hydrol. Process. 2024, 38, e15309.
4. Singh, P. Systematic review of data-centric approaches in artificial intelligence and machine learning. Data Sci. Manag. 2023, 6, 144–157.
5. Choi, W.; Choi, T.; Heo, S. A Comparative Study of Automated Machine Learning Platforms for Exercise Anthropometry-Based Typology Analysis: Performance Evaluation of AWS SageMaker, GCP VertexAI, and MS Azure. Bioengineering 2023, 10, 891.
6. Chang, W.; Wang, X.; Yang, J.; Qin, T. An improved CatBoost-based classification model for ecological suitability of blueberries. Sensors 2023, 23, 1811.
7. Gazi, M.A.I.; Islam, M.A.; Masud, A.A.; Senathirajah, A.R.B.S.; Biswas, S.; Shuvro, R.A. The moderating impacts of COVID-19 fear on hotel service quality and tourist satisfaction: Evidence from a developing country. Cogent Soc. Sci. 2024, 10, 2331079.
8. Sahin, A.; Imamoglu, G.; Murat, M.; Ayyildiz, E. A holistic decision-making approach to assessing service quality in higher education institutions. Socio-Econ. Plan. Sci. 2024, 92, 101812.
9. Nur, H.R. The Competence in the Digital Era in Improving Public Service Performance. Entrep. J. Bisnis Manaj. Kewirausahaan 2024, 5, 61–72.
10. Khan, A.; Talukder, M.S.; Islam, Q.T.; Islam, A.N. The impact of business analytics capabilities on innovation, information quality, agility and firm performance: The moderating role of industry dynamism. VINE J. Inf. Knowl. Manag. Syst. 2024, 54, 1124–1152.
11. Elaigwu, M.; Abdulmalik, S.O.; Talab, H.R. Corporate integrity, external assurance and sustainability reporting quality: Evidence from the Malaysian public listed companies. Asia-Pac. J. Bus. Adm. 2024, 16, 410–440.
12. Tuan, A.; Corciolani, M.; Giuliani, E. Being reassuring about the past while promising a better future: How companies frame temporal focus in social responsibility reporting. Bus. Soc. 2024, 63, 626–667.
13. Ismail, A.; Bakri, M.H.; Rusli, N.B.; Bakar, M.A.B.A.; Othman, H. Relationship between Service Quality and Customer Satisfaction: A Systematic Literature Review. Resmilitaris 2023, 13, 262–281.
14. Kim, J.J.; Lee, Y.; Han, H. Exploring competitive hotel selection attributes among guests: An importance-performance analysis. J. Travel Tour. Mark. 2019, 36, 998–1011.
15. Parasuraman, A.; Berry, L.L.; Zeithaml, V.A. Perceived service quality as a customer-based performance measure: An empirical examination of organizational barriers using an extended service quality model. Hum. Resour. Manag. 1991, 30, 335–364.
16. Park, J.; Lee, B.K. An opinion-driven decision-support framework for benchmarking hotel service. Omega 2021, 103, 102415.
17. German, J.D.; Redi, A.A.N.P.; Prasetyo, Y.T.; Persada, S.F.; Ong, A.K.S.; Young, M.N.; Nadlifatin, R. Choosing a package carrier during COVID-19 pandemic: An integration of pro-environmental planned behavior (PEPB) theory and Service Quality (SERVQUAL). J. Clean. Prod. 2022, 346, 131123.
18. Lizarelli, F.L.; Osiro, L.; Ganga, G.M.; Mendes, G.H.; Paz, G.R. Integration of SERVQUAL, Analytical Kano, and QFD using fuzzy approaches to support improvement decisions in an entrepreneurial education service. Appl. Soft Comput. 2021, 112, 107786.
19. Adler, A.I.; Painsky, A. Feature importance in gradient boosting trees with cross-validation feature selection. Entropy 2022, 24, 687.
20. Rivero, D.M.; Suson, R.; Arnejo, A.; Atibing, N.M.; Aro, J.L.; Wenceslao, C.; Burdeos, A.; Yamagishi, K.; Ocampo, L. Service recovery and innovation on customer satisfaction amidst massive typhoon-induced disruptions: The mediating role of SERVQUAL. Int. J. Disaster Risk Reduct. 2023, 99, 104130.
21. Stefano, N.M.; Casarotto Filho, N.; Barichello, R.; Sohn, A.P. A fuzzy SERVQUAL based method for evaluated of service quality in the hotel industry. Procedia CIRP 2015, 30, 433–438.
22. Rosário, A.T.; Boechat, A.C. How Automated Machine Learning Can Improve Business. Appl. Sci. 2024, 14, 8749.
23. Paladino, L.M.; Hughes, A.; Perera, A.; Topsakal, O.; Akinci, T.C. Evaluating the performance of automated machine learning (AutoML) tools for heart disease diagnosis and prediction. AI 2023, 4, 1036–1058.
24. Abdulrab, M.; Hezam, N. Service Quality and Customer Satisfaction in the Hospitality Sector: A paper review and future research directions. Libr. Prog. Int. 2024, 44, 7486–7503.
25. Kanaparthi, V. Transformational application of Artificial Intelligence and Machine learning in Financial Technologies and Financial services: A bibliometric review. arXiv 2024, arXiv:2401.15710.
26. Pradhan, S.; Sahu, S.K.; Pramanik, J.; Badgayan, N.D. An insight into mechanical & thermal properties of shape memory polymer reinforced with nanofillers; A critical review. Mater. Today Proc. 2022, 50, 1107–1112.
27. Sahu, S.K.; Sreekanth, P.R. Artificial neural network for prediction of mechanical properties of HDPE based nanodiamond nanocomposite. Polymer 2022, 46, 614–620.

Figure 1. Architectures of (a) Azure AutoML [5] and (b) CatBoost [6].

Figure 2. SERVQUAL Model: factors influencing service quality and customer satisfaction [9].

Figure 3. Workflow of AutoML in Azure AI Studio.

Figure 4. Decision tree model for predicting service quality based on reliability and assurance.

Figure 5. Correlation heat map of service quality dimensions and total score.

Figure 6. Feature importance results from CatBoost for service quality dimensions: (a) reliability, (b) assurance, (c) tangibility, (d) empathy, and (e) responsiveness.

Figure 7. AutoML feature importance results for service quality dimensions for (a) reliability, (b) assurance, (c) tangibility, (d) empathy, and (e) responsiveness.

Table 1. Major contributions from references.

Major Contribution	Description	Reference
Innovative Framework	The study introduces a novel approach to modeling SERVQUAL dimensions by combining Azure AutoML and CatBoost for enhanced service quality prediction.	[1,2,3]
Automation of Service Quality Modeling	It highlights how AutoML automates processes such as feature selection, hyperparameter tuning, and model evaluation, making service quality assessments more efficient.	[1,4]
Handling Categorical Data	CatBoost is utilized to handle categorical data directly without the need for encoding, preventing overfitting and improving model accuracy.	[3,4]
Scalability and Efficiency	The integration of Azure AutoML with CatBoost provides a scalable and efficient framework for service quality modeling that can be applied across different industries.	[1,2,4]
Cross-Validation for Reliability	The approach emphasizes the use of cross-validation to ensure that findings are robust and reliable, improving the trustworthiness of results.	[4,5]
Improved Accuracy in Prediction	By combining AutoML with gradient-boosting algorithms, the study improves the accuracy of service quality predictions in comparison to traditional methods.	[2,4,6]
Categorical Data Handling in Service Quality	CatBoost’s native handling of categorical data, with no manual encoding required, and its safeguards against overfitting are key features that benefit service quality modeling.	[3,6]
Suitability for Cross-Domain Applications	The use of Azure AutoML makes the framework applicable across various service industries, from hospitality to engineering, due to its flexibility and automation.	[2,5]
Modeling SERVQUAL Dimensions	The study successfully applies the SERVQUAL model to assess customer satisfaction, using a combination of modern ML techniques to predict and analyze service quality dimensions.	[7,8,9]
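Table 1 credits CatBoost with handling categorical predictors such as gender or marital status without manual encoding while limiting overfitting. For such features, CatBoost's main mechanism is ordered target statistics: each row's category is replaced by a smoothed mean of the target over the rows that precede it, so a row's own label never leaks into its encoding (very low-cardinality features may instead be one-hot encoded internally, depending on the `one_hot_max_size` setting). The pure-Python sketch below illustrates the idea for a single row ordering. The function name `ordered_target_stats`, the smoothing weight `a`, and the toy data are illustrative assumptions; this is not CatBoost's implementation, which also uses several random permutations and feature combinations.

```python
from collections.abc import Hashable, Sequence


def ordered_target_stats(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    a: float = 1.0,
) -> list[float]:
    """Encode one categorical column with ordered target statistics.

    Rows are processed in the given order (assumed to be already shuffled).
    Row k is encoded using only the targets of earlier rows in the same
    category, smoothed toward `prior` with weight `a`:
        (sum of earlier targets + a * prior) / (count of earlier rows + a)
    """
    sums: dict[Hashable, float] = {}
    counts: dict[Hashable, int] = {}
    encoded: list[float] = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # Encode from history only; the current row's label is not used.
        encoded.append((s + a * prior) / (n + a))
        # Update the history only after this row has been encoded.
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded


# Hypothetical toy data: marital status and a service-quality score.
status = ["Married", "Single", "Married", "Married"]
score = [4.0, 2.0, 5.0, 3.0]
prior = sum(score) / len(score)  # 3.5, the overall mean of the toy scores
print(ordered_target_stats(status, score, prior))
# [3.5, 3.5, 3.75, 4.166666666666667]
```

The first occurrence of each category falls back to the prior, and later occurrences drift toward that category's running mean as history accumulates.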

Table 2. Univariate ANOVA results (dependent variable: dimension score).

Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared	Noncentrality Parameter	Observed Power
Corrected Model	166,179.8	4	41,544.96	2308.63	0.02	0.747	9234.518	0.98
Intercept	1,165,684	1	1,165,684	64,785.41	0.019	0.954	64,785.41	0.981
Category	166,179.8	4	41,544.96	2308.63	0.02	0.747	9234.518	0.98
Error	56,325.94	3130	17.996
Total	1,388,352	3135
Corrected Total	222,205.8	3134
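Under the standard univariate ANOVA definitions, the derived columns of Table 2 follow from the sums of squares and degrees of freedom alone: Mean Square = SS/df, F = MS_effect/MS_error, partial η² = SS_effect/(SS_effect + SS_error), and noncentrality parameter = F × df_effect. The standard-library sketch below recomputes them for the `Category` row. The function name `anova_effect_row` is illustrative, and reading `Category` as the five SERVQUAL dimensions is an inference from its 4 degrees of freedom. Sig. and observed power require the F distribution and are left out.

```python
def anova_effect_row(ss_effect, df_effect, ss_error, df_error):
    """Derive Mean Square, F, partial eta squared and the noncentrality
    parameter for one effect from its sums of squares and degrees of freedom."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    f_stat = ms_effect / ms_error
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    noncentrality = f_stat * df_effect  # lambda = F * numerator df
    return {
        "MS": ms_effect,
        "MS_error": ms_error,
        "F": f_stat,
        "partial_eta_sq": partial_eta_sq,
        "noncentrality": noncentrality,
    }


# Inputs from the "Category" and "Error" rows of Table 2.
row = anova_effect_row(166_179.8, 4, 56_325.94, 3130)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the sketch gives F ≈ 2308.63, partial η² ≈ 0.747 and a noncentrality parameter ≈ 9234.5, matching the reported values to within rounding.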

Table 3. Respondent counts by field (n = 627).

Category	Values
Gender	Male: 399, Female: 228
Age Group	25–34: 490, 18–24: 129, 35–44: 8
Marital Status	Married: 536, Single: 91
Highest Level of Education	Up to Graduate: 341, Masters and above: 286
Frequency of Hotel Stays (per year)	1: 123, 3: 106, 0: 102, 2: 82, 6: 75, 4: 72, 5: 67
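A quick arithmetic check of Table 3: every field sums to the same total of 627 respondents, and five dimension scores per respondent would give 5 × 627 = 3135 observations. This is consistent with the Total df of 3135 in Table 2, though the link is an inference rather than something the table states. The sketch transcribes the counts (using ASCII hyphens in the age bands) and computes each value's percentage share within its field.

```python
# Counts transcribed from Table 3.
TABLE_3 = {
    "Gender": {"Male": 399, "Female": 228},
    "Age Group": {"25-34": 490, "18-24": 129, "35-44": 8},
    "Marital Status": {"Married": 536, "Single": 91},
    "Highest Level of Education": {"Up to Graduate": 341, "Masters and above": 286},
    "Frequency of Hotel Stays": {"1": 123, "3": 106, "0": 102, "2": 82,
                                 "6": 75, "4": 72, "5": 67},
}

# Every field should describe the same set of respondents.
totals = {field: sum(counts.values()) for field, counts in TABLE_3.items()}
assert len(set(totals.values())) == 1, "fields disagree on sample size"
n = next(iter(totals.values()))

# Percentage share of each value within its field.
shares = {field: {value: 100 * c / n for value, c in counts.items()}
          for field, counts in TABLE_3.items()}

print(n)                                   # 627
print(round(shares["Gender"]["Male"], 1))  # 63.6
print(5 * n)                               # 3135
```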

Table 4. CatBoost hyperparameter values considered during tuning.

Hyperparameter	Values
Iterations	500, 600, 700, 800
Depth	3, 4, 5, 6, 7
Learning Rate	0.01, 0.03, 0.05
L2 Leaf Regularization	1, 3, 5, 7
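Table 4 defines 4 × 5 × 3 × 4 = 240 combinations. The table does not say whether they were searched exhaustively, so the sketch below simply assumes a full grid and enumerates it with `itertools.product`. Mapping the row labels onto CatBoost's parameter names (`iterations`, `depth`, `learning_rate`, `l2_leaf_reg`) is an assumption, as is the 5-fold `cv` value in the commented-out call to CatBoost's `grid_search` method.

```python
from itertools import product

# Search space transcribed from Table 4, keyed by CatBoost parameter names
# (the label-to-name mapping is an assumption).
PARAM_GRID = {
    "iterations": [500, 600, 700, 800],
    "depth": [3, 4, 5, 6, 7],
    "learning_rate": [0.01, 0.03, 0.05],
    "l2_leaf_reg": [1, 3, 5, 7],
}


def expand_grid(grid):
    """Yield every combination in the grid as a parameter dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


candidates = list(expand_grid(PARAM_GRID))
print(len(candidates))  # 240 = 4 * 5 * 3 * 4
print(candidates[0])
# {'iterations': 500, 'depth': 3, 'learning_rate': 0.01, 'l2_leaf_reg': 1}

# With the catboost package installed, the same dict could be passed to the
# model's built-in search (X, y are a hypothetical feature matrix and target):
#   from catboost import CatBoostRegressor
#   CatBoostRegressor(verbose=False).grid_search(PARAM_GRID, X, y, cv=5)
```

Each of the 240 candidates would be scored by cross-validation, so the iteration counts in Table 4 dominate the total tuning cost.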

Table 5. Decision tree splitting rules for service quality based on reliability and assurance.

Rule	DMReliable Condition	DMAssurance Condition	Value
1	DMReliable ≤ 30.5	DMAssurance ≤ 15.5	−0.565
2	DMReliable ≤ 30.5	DMAssurance ≤ 15.5	−0.137
3	DMReliable ≤ 30.5	DMAssurance > 15.5	−0.104
4	DMReliable ≤ 30.5	DMAssurance > 15.5	0.148
5	DMReliable > 30.5	DMAssurance ≤ 15.5	0.000
6	DMReliable > 30.5	DMAssurance ≤ 15.5	0.195
7	DMReliable > 30.5	DMAssurance > 15.5	0.000
8	DMReliable > 30.5	DMAssurance > 15.5	0.515

Table 6. Performance metrics of the CatBoost gradient boosting model and the Azure Automated ML model.

Sr. No.	Models	R²	RMSE (Root Mean Square Error)	MAPE (Mean Absolute Percentage Error)
1	CatBoost Gradient Boosting Model	91.50%	2.10	1.72
2	Azure Automated ML Model	98.34%	1.37	1.17
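The three metrics in Table 6 have standard definitions: R² = 1 − SS_res/SS_tot, RMSE = √(mean squared residual), and MAPE = 100 × mean(|y − ŷ| / |y|). A minimal standard-library sketch follows. The sample scores are hypothetical, not study data, and because Table 6 does not state whether its MAPE figures are percentages, this version simply returns a percentage.

```python
import math
from collections.abc import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the target."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (targets must be non-zero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical scores, not data from the study.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 61.0, 69.0]
print(round(r_squared(actual, predicted), 4))  # 0.98
print(round(rmse(actual, predicted), 4))       # 1.5811
print(round(mape(actual, predicted), 4))       # 3.0238
```

RMSE is in the units of the score, whereas R² and MAPE are scale-free, which is why Table 6 can compare all three side by side.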

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Kundu, A.; Kundu, S.G.; Sahu, S.K.; Badgayan, N.D. Leveraging Azure Automated Machine Learning and CatBoost Gradient Boosting Algorithm for Service Quality Prediction in Hospitality. Computers 2025, 14, 32. https://doi.org/10.3390/computers14020032
