1. Introduction
As cities grow in size and complexity, understanding and enhancing the well-being of urban residents has become a crucial objective for planners and policymakers [1,2,3]. Urban happiness, or the general satisfaction of residents with their environment and living conditions, is shaped by a variety of factors, including traffic density, noise levels, air quality, green space availability, and the cost of living [4,5,6]. Predicting urban happiness from these variables poses significant challenges due to the intricate and often nonlinear interactions between them [7,8,9]. Consequently, advanced methods are needed to model these relationships and generate accurate predictions.
Traditional machine learning (ML) models, such as regression-based approaches, often fail to capture the complex interactions between urban factors. While decision trees and other models provide better performance, they still face limitations when dealing with highly nonlinear relationships [10,11]. Deep learning (DL) models, with their ability to learn intricate patterns, have shown promise in similar tasks. However, they typically require large datasets, and for tabular data they may not perform optimally without significant tuning [12,13,14]. To address these challenges, gradient boosting machines (GBMs) have emerged as a tool for structured data: they build an ensemble of decision trees, iteratively refining predictions by correcting the errors of previous iterations. This method effectively captures interactions between features and can handle both linear and nonlinear relationships in the data. However, GBMs can still fall short when tasked with recognizing the more abstract patterns and deeper relationships that neural networks excel at identifying [15,16,17].
Neural networks (NNs), particularly in the context of deep learning, are designed to capture complex, nonlinear relationships through layers of neurons that progressively learn from data [18]. This allows NNs to model highly abstract features and latent variables [19]. However, when applied to structured tabular data, standalone NNs can struggle to learn efficiently unless carefully tuned and paired with extensive feature engineering [20]. Given the complementary strengths of these two methods, we propose a GBM + NN hybrid model that combines the ensemble learning characteristics of GBMs with the representational capabilities of neural networks.
In this hybrid approach, the GBM serves as the primary model, generating the initial predictions by capturing interactions between urban variables. The neural network is then employed as a meta-learner, refining these predictions by learning deeper relationships. This layered approach enables the model to handle structured data efficiently while uncovering implicit patterns that would be missed by standalone methods. The hybrid GBM + NN model offers a novel solution for urban happiness prediction, leveraging the power of both ensemble learning and deep feature extraction. It is particularly well-suited to this task because it effectively captures both direct and indirect relationships between diverse urban indicators, such as traffic density, air quality, green space, healthcare access, and cost of living [21]. These factors, which are often interdependent, influence urban happiness in complex ways, and the hybrid model's ability to represent both shallow and deep relationships provides a more nuanced understanding of their impact.
The use of such hybrid models in urban analytics is still relatively unexplored, with most previous studies relying either on traditional ML techniques or on standalone deep learning models. Many studies have focused on individual factors, such as air quality or traffic congestion, and their impact on specific outcomes like health or economic productivity [22,23]. While these studies offer valuable insights, they fall short of capturing the multifaceted nature of urban happiness, which depends on a combination of environmental, infrastructural, and socio-economic factors [24]. Furthermore, existing research has primarily applied either machine learning or deep learning in isolation, without exploring the potential of hybrid models that combine the strengths of both. This study addresses this gap by developing a GBM + NN hybrid model that integrates the structured data handling capabilities of GBMs with the deep representation learning abilities of neural networks.
Our model improves prediction accuracy, while providing deeper insights into the key factors influencing urban happiness. In doing so, we contribute to both the urban analytics and machine learning fields by demonstrating the effectiveness of hybrid models for complex prediction tasks. Our contributions are threefold: First, we introduce a novel GBM + NN hybrid model that capitalizes on the strengths of both ensemble learning and neural networks to improve the predictive accuracy of urban happiness models. Second, we conducted a thorough performance evaluation, comparing the hybrid model against traditional machine learning models such as random forests and standalone neural networks. The results demonstrated the superiority of the hybrid model in terms of accuracy and generalization. Finally, we provide an in-depth analysis of the factors contributing to urban happiness, offering actionable insights that urban planners and policymakers can use to enhance the quality of life in cities.
The remainder of this paper is structured as follows: Section 2 reviews existing research on urban happiness prediction and the application of machine learning models in urban analytics. Section 3 discusses the architecture of the GBM + NN hybrid model. Section 4 presents the research methodology, including the dataset description, data preprocessing, model development, and evaluation. Section 5 reports the experimental results and compares the performance of the hybrid model with other techniques. Section 6 concludes with a summary and suggestions for future research.
2. Literature Survey
The prediction of urban happiness has gained increased attention in the field of urban analytics, due to its implications for public policy and urban planning [25]. Researchers have long attempted to understand the factors influencing happiness, satisfaction, and overall well-being in urban settings [26]. Traditionally, studies in this area have relied on social science methodologies, including surveys, statistical analysis, and econometric models. However, the complexity of modern urban systems, combined with the growing availability of large-scale urban data, has prompted a shift toward using ML and DL models to tackle this problem [27]. This section reviews key developments in urban happiness prediction and discusses the role of ML and DL models in urban analytics, particularly in relation to urban well-being.
2.1. Urban Happiness Prediction: Traditional Approaches
Historically, urban happiness prediction was approached using conventional statistical methods. Early research predominantly utilized multiple linear regression and other basic econometric techniques to explore relationships between various urban indicators and happiness outcomes [28]. In these studies, researchers typically focused on specific factors, such as economic performance, health services, housing quality, or pollution levels, and their direct influence on residents' perceived happiness. One of the most widely recognized frameworks is the gross national happiness (GNH) index, which incorporates subjective well-being metrics to assess societal happiness across regions [29]. While this index primarily focuses on national-level data, it has inspired urban-level studies, particularly those focused on sustainability and livability. These traditional approaches, however, have often been limited by their reliance on linear assumptions, which fail to capture the complex interdependencies between environmental, social, and economic factors that contribute to urban happiness [30]. Several urban happiness models based on survey data, such as those used by the World Happiness Report, have provided insights into the effects of income, health, and social support. However, these models face limitations in terms of scalability and data availability, as they rely heavily on self-reported data, which may not fully capture the dynamic, multifaceted nature of happiness in urban settings [31,32,33]. Additionally, these models often assume a linear relationship between independent and dependent variables, leading to oversimplified interpretations of the drivers of urban happiness.
2.2. Machine Learning in Urban Analytics: From Prediction to Insight
In recent years, machine learning has emerged as an effective tool in urban analytics, offering new possibilities for predicting complex outcomes, including happiness and well-being. ML models, particularly those that can capture non-linear relationships, have been increasingly applied to urban datasets to address a variety of challenges, such as traffic management, pollution control, and public health forecasting [34]. Decision-tree-based models, such as random forest (RF) and GBM, have shown promise in capturing the complex, non-linear interactions between various urban features and outcomes. These models are well-suited to structured data, where the relationships between variables are not straightforward. In the context of urban happiness prediction, decision trees have been used to evaluate the impact of specific urban factors like air quality, green space, and noise levels on residents' well-being. RFs provide an ensemble method that mitigates the risk of overfitting while improving prediction accuracy, which is essential when dealing with highly interrelated urban factors [35]. GBMs, an extension of this approach, improve model performance by iteratively adjusting the weak learners, reducing both bias and variance [36]. One prominent study using RFs explored the relationship between urban green spaces and subjective well-being across multiple cities. The model successfully captured the complex interactions between environmental and social variables, highlighting the importance of non-linear ML models in urban analytics. However, while tree-based models are effective at managing interactions between structured data, they are still limited in their ability to capture the implicit relationships in the data that neural networks can provide [37].
2.3. Deep Learning in Urban Analytics: Unlocking Complex Patterns
In addition to tree-based models, DL techniques have been applied in urban analytics to model more complex, non-linear relationships between features. Neural networks, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have become popular for their ability to handle large datasets and extract high-level feature representations [38]. In the realm of urban analytics, DL models have been employed in a wide range of applications. For example, CNNs have been utilized in studies involving spatial data, such as predicting air quality and noise levels across urban regions. These models excel at capturing spatial correlations by learning from structured grid data. Likewise, RNNs and their variants, such as long short-term memory (LSTM) networks, have been used to model temporal dependencies, such as predicting traffic congestion or energy consumption patterns [39]. Furthermore, recent studies have demonstrated the power of DL in capturing intricate patterns in urban data. For instance, the integration of DL models with environmental and energy datasets has been shown to enhance prediction accuracy significantly, such as in the work of [40], which highlighted the potential of DL techniques in sustainability analysis. However, the use of deep learning models in urban happiness prediction has been relatively limited. In studies where DL models have been applied, such as predicting well-being based on social media data or sensor networks, the results demonstrated the capacity of these models to uncover hidden patterns in the data. Nevertheless, these models often require extensive computational resources, and their performance can be sensitive to hyperparameter settings and model architectures, making them less accessible for many urban datasets [41].
2.4. Hybrid Models: The Rise of GBMs and Neural Networks
Recent developments in machine learning have seen the emergence of hybrid models that combine ensemble methods like GBMs with DL techniques. These hybrid approaches aim to take advantage of the strengths of both model types: the GBM's ability to handle structured, tabular data and the neural network's power in learning deep, abstract relationships [42]. In the context of urban analytics, hybrid models have been applied to tasks such as urban traffic flow prediction and pollution level forecasting, where they have consistently outperformed standalone models [43]. For example, hybrid models combining GBMs with RNNs have been employed to predict air quality across cities, demonstrating improved accuracy and robustness compared to traditional models. Such approaches, however, have mainly focused on a single feature or a limited set of features.
The application of GBM + NN hybrid models for predicting urban happiness remains an underexplored area. This study builds on the growing trend in hybrid models by applying a GBM + NN hybrid approach to predict urban happiness, filling a critical gap in the current research landscape. The combination of GBMs’ ability to handle structured features and neural networks’ ability to extract implicit patterns offers a promising solution to the complex task of urban happiness prediction. Although significant strides have been made in applying machine learning to urban analytics, there remain several gaps in the literature, particularly in the prediction of urban happiness. First, much of the existing research on urban happiness relied on traditional statistical models that are limited in their ability to capture nonlinear interactions between urban features. Second, while machine learning models such as decision trees and deep learning models have been applied to a variety of urban analytics tasks, they have rarely been combined in the context of happiness prediction. Therefore, hybrid models that combine ensemble methods with deep learning, such as the proposed GBM + NN hybrid model, offer a novel opportunity to enhance the prediction accuracy and provide insights into the relationships between urban features and happiness outcomes.
3. Integration of a Gradient Boosting Machine (GBM) and Neural Network (NN)
The proposed hybrid model leverages the complementary strengths of a GBM and NN. The GBM excels at capturing structured, tabular data and modeling nonlinear feature interactions through its iterative boosting approach. It identifies patterns and corrects residual errors at each stage. However, it may struggle to model latent relationships within the data. The NN, on the other hand, is particularly adept at learning implicit representations from data, due to its multi-layered architecture. This allows it to further refine the results by capturing nuanced relationships overlooked by the GBM.
In the proposed model, the GBM operates as the primary learner, generating an initial prediction by iteratively improving its performance on structured data features. These predictions, while accurate in capturing general feature relationships, may leave unexplored residuals, representing errors or overlooked complexities. The NN is then employed as a meta-learner to process these residuals and uncover implicit patterns. This two-stage process ensures that the predictive capacity of the model benefits from both structured feature interactions (from the GBM) and deeper, hierarchical feature extraction (from the NN). The details of the hybrid models are explained in the following subsections.
3.1. Gradient Boosting Machine (GBM)
A gradient boosting machine (GBM) is a supervised learning algorithm based on ensemble methods that builds models sequentially to optimize a specific objective function. At each step, the algorithm aims to minimize the prediction error by iteratively fitting weak learners, typically decision trees, to the residual errors of the current model. This iterative process is designed to improve the performance of the model incrementally, as described in detail by [44]. The objective of the GBM is to minimize a specified loss function by combining weak learners in an additive fashion. The process begins with the initialization of the model. The initial model $F_0(x)$ is defined to minimize the empirical risk, as expressed in (1):

$$F_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \quad (1)$$

In this equation, $y_i$ represents the target value for the i-th data point, while c is a constant used to initialize the model. The loss function L measures the difference between the predicted and actual values, such as the squared error for regression tasks. The total number of data points in the dataset is denoted by N. This initialization step ensures that the model begins with a baseline prediction that minimizes the overall empirical risk. Following initialization, the GBM constructs an additive model by iteratively combining weak learners $h_m(x)$ with the current model $F_{m-1}(x)$. This additive structure is mathematically expressed as (2):

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu\, h_m(x) \quad (2)$$

Here, M represents the total number of iterations or weak learners, and $\nu$ is the learning rate, which controls the contribution of each weak learner to the final model. The function $h_m(x)$ represents the weak learner fitted at the m-th iteration, and $F_{m-1}(x)$ denotes the model from the previous iteration. At each iteration, pseudo-residuals $r_{im}$ are computed to guide the learning process. These pseudo-residuals are derived as the negative gradient of the loss function with respect to the predictions of the current model $F_{m-1}$, as shown in (3):

$$r_{im} = -\left[\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \quad (3)$$

In this context, $r_{im}$ represents the pseudo-residual for the i-th data point at the m-th iteration. The variable $F_{m-1}(x_i)$ refers to the predicted value for the i-th data point produced by the current model. The weak learner $h_m$ is subsequently fitted to these residuals by minimizing the squared error, which is formalized as (4):

$$h_m = \arg\min_{h} \sum_{i=1}^{N} \left(r_{im} - h(x_i)\right)^2 \quad (4)$$

Here, $h_m$ is the function that best fits the pseudo-residuals $r_{im}$ for all data points $x_i$ in the dataset. This step identifies the weak learner that minimizes the squared error between the pseudo-residuals and the model's predictions. Once the weak learner $h_m$ has been fitted, the model is updated by incorporating the weak learner's contribution into the existing model. The update rule is given by (5):

$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x) \quad (5)$$

In this equation, $F_m(x)$ represents the updated model at the m-th iteration, and $\nu$ is the learning rate that scales the contribution of the weak learner $h_m(x)$. This iterative process continues until a predefined number of iterations M is reached or the loss function L converges to a satisfactory level. The overall objective of the GBM is to minimize the loss function L over all data points, as expressed in (6):

$$\min_{F} \sum_{i=1}^{N} L\left(y_i, F_M(x_i)\right) \quad (6)$$

Through this process, the GBM ensures incremental improvement by addressing the residual errors at each step. By combining the contributions of all weak learners, the algorithm produces a final model that effectively minimizes the loss function.
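To make the procedure in Equations (1)-(6) concrete, the following minimal sketch implements gradient boosting for the squared-error loss, with shallow regression trees as weak learners. It illustrates the algorithm described above rather than the implementation used in this study; the learning rate, tree depth, and iteration count are arbitrary illustrative values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, M=100, nu=0.1, max_depth=3):
    """Gradient boosting with squared-error loss (Eqs. (1)-(6))."""
    y = np.asarray(y, dtype=float)
    F0 = y.mean()                      # Eq. (1): the constant minimizing squared error
    F = np.full_like(y, F0)
    learners = []
    for _ in range(M):
        r = y - F                      # Eq. (3): pseudo-residuals (negative gradient)
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # Eq. (4)
        F = F + nu * h.predict(X)      # Eq. (5): additive update scaled by nu
        learners.append(h)
    return F0, learners

def gbm_predict(X, F0, learners, nu=0.1):
    F = np.full(len(X), F0)            # Eq. (2): F_M(x) = F_0 + sum_m nu * h_m(x)
    for h in learners:
        F = F + nu * h.predict(X)
    return F
```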
3.2. Neural Networks (NN)
Neural networks (NNs) consist of layers of neurons, where each layer transforms its input using a set of weights and biases, and each neuron applies a non-linear activation function to its input. The forward pass in a neural network for layer l is given by the transformation presented in (7):

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)} \quad (7)$$

In Equation (7), $z^{(l)}$ represents the pre-activation output of layer l, where $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ is the weight matrix connecting the neurons of the current layer l to the previous layer $l-1$. The term $a^{(l-1)}$ is the activation vector from the previous layer, and $b^{(l)}$ is the bias vector for the current layer. Here, $n_{l-1}$ and $n_l$ denote the number of neurons in layers $l-1$ and l, respectively. The activation function $\sigma$ introduces non-linearity into the neural network and is applied to the pre-activation vector $z^{(l)}$, as presented in (8):

$$a^{(l)} = \sigma\left(z^{(l)}\right) \quad (8)$$

Here, $a^{(l)}$ represents the activation vector of layer l after applying the activation function $\sigma$. Common choices for $\sigma$ include ReLU ($\sigma(z) = \max(0, z)$), sigmoid ($\sigma(z) = 1/(1 + e^{-z})$), and tanh ($\sigma(z) = \tanh(z)$). These activation functions allow the network to model non-linear relationships in the data. For regression tasks, the loss function is typically defined as the mean squared error (MSE), which quantifies the difference between the predicted output $\hat{y}$ and the true target y. The MSE is given as presented in (9):

$$L = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \quad (9)$$

In (9), L represents the MSE loss, where N is the total number of samples, $y_i$ is the true value for the i-th sample, and $\hat{y}_i$ is the corresponding predicted value. Backpropagation is used to compute the gradients of the loss function L with respect to the weights $W^{(l)}$ of the neural network. The gradient for layer l is calculated as presented in (10):

$$\frac{\partial L}{\partial W^{(l)}} = \frac{\partial L}{\partial a^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdots \frac{\partial z^{(l)}}{\partial W^{(l)}} \quad (10)$$

Here, $\partial L / \partial W^{(l)}$ represents the gradient of the loss function L with respect to the weight matrix $W^{(l)}$. The chain rule of differentiation is applied iteratively from the output layer L back to the target layer l, propagating the error signals through the network. The weights are then updated using the gradient descent optimization rule presented in (11):

$$W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}} \quad (11)$$

In (11), $\eta$ denotes the learning rate, a hyperparameter that determines the step size for weight updates. By iteratively updating the weights $W^{(l)}$ in the direction that reduces L, the neural network learns to generalize from the training data.
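As a concrete illustration of Equations (7)-(11), the sketch below trains a single-hidden-layer regression network with ReLU activation using plain NumPy. The layer size, learning rate, and epoch count are illustrative assumptions, not the architecture used in the experiments.

```python
import numpy as np

def train_mlp(X, y, hidden=16, eta=0.01, epochs=500, seed=0):
    """One-hidden-layer regression NN trained with batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, 1)), np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # Forward pass: Eqs. (7)-(8) with ReLU, then a linear output for regression
        z1 = X @ W1 + b1
        a1 = np.maximum(0, z1)
        y_hat = a1 @ W2 + b2
        # Gradient of the MSE loss (Eq. (9)) with respect to the output
        dy = 2 * (y_hat - y) / n
        # Backpropagation: chain rule applied layer by layer (Eq. (10))
        dW2, db2 = a1.T @ dy, dy.sum(axis=0)
        da1 = dy @ W2.T
        dz1 = da1 * (z1 > 0)           # derivative of ReLU
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
        # Gradient descent update (Eq. (11))
        W1 -= eta * dW1; b1 -= eta * db1
        W2 -= eta * dW2; b2 -= eta * db2
    return W1, b1, W2, b2
```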
3.3. Integration of the GBM and NN
As presented in Figure 1, the diagram represents the integration of a GBM and an NN for predicting urban happiness. This integration leverages the strengths of both models to enhance predictive accuracy and capture complex interactions within the dataset.

In the GBM model, training begins by sequentially constructing an ensemble of decision trees, where each tree corrects the errors made by the previous trees. The objective is to minimize a specified loss function by adding weak learners iteratively. The trained GBM model generates predictions denoted as $\hat{y}_{GBM}$; these are represented in the diagram as GBM predictions. Next, residuals are calculated by computing the difference between the actual target values y and the GBM predictions $\hat{y}_{GBM}$; this residual is denoted as $r = y - \hat{y}_{GBM}$. The residuals represent the errors between the predicted and actual values, which the neural network will learn to model. The neural network is designed to capture complex patterns and relationships that the GBM model might have missed. The NN model generates predictions based on the GBM predictions, denoted as $\hat{y}_{NN}$; these are represented in the diagram as NN predictions. Finally, the final prediction is obtained by combining the predictions from the GBM model and the NN model, denoted as $\hat{y}_{final} = \hat{y}_{GBM} + \hat{y}_{NN}$.
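A minimal sketch of this workflow, using off-the-shelf scikit-learn components, is shown below. As described in the text above, the GBM produces the first-stage predictions, the NN is trained on the residuals with the GBM predictions as its input, and the final output is the sum of the two stages. The hyperparameters are illustrative placeholders, not the tuned values used in the experiments.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

def fit_hybrid(X_train, y_train):
    # Stage 1: the GBM captures structured feature interactions
    gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
    gbm.fit(X_train, y_train)
    y_gbm = gbm.predict(X_train)
    # Stage 2: the NN meta-learner models the residuals r = y - y_hat_GBM,
    # taking the GBM predictions as its input
    nn = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                      max_iter=2000, random_state=0)
    nn.fit(y_gbm.reshape(-1, 1), y_train - y_gbm)
    return gbm, nn

def predict_hybrid(gbm, nn, X):
    y_gbm = gbm.predict(X)
    # Final prediction: GBM output plus the NN's residual correction
    return y_gbm + nn.predict(y_gbm.reshape(-1, 1))
```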
3.4. Collaborative Working Mechanism of the Proposed Model
The proposed model leverages the complementary strengths of a gradient boosting machine (GBM) and neural network (NN) to enhance the predictive accuracy. The GBM captures structured feature interactions in tabular data, while the NN models the complex, latent patterns left unexplained by the GBM. This section details the mathematical and computational workflow of the hybrid model, using the case of urban happiness prediction as an illustrative example.
The dataset includes urban indicators as features: the air quality index ($x_1$), green space area ($x_2$), traffic density ($x_3$), healthcare index ($x_4$), and cost of living index ($x_5$). The target variable (y) represents the urban happiness score. For this example, the dataset as presented in (12) is used.

The GBM initializes the predictions by taking the mean of the target variable, which serves as the starting point for subsequent refinements. The initial prediction is calculated as $\hat{y}^{(0)} = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$, as presented in (13). Residuals are then computed to quantify the differences between the actual values and the initial predictions, as expressed in (14):

$$r_i^{(1)} = y_i - \hat{y}_i^{(0)} \quad (14)$$

For the given data, the residuals are presented in (15). A weak learner, in this case a decision tree, is trained to predict the residuals. Assume the tree splits based on the air quality feature $x_1$. The weak learner $h_1(x)$ is defined as (16), where each leaf value is the mean of the residuals $r_i^{(1)}$ falling into that leaf, computed as (17). Using this formula, the weak learner predictions are presented in (18). The GBM then updates its predictions using the rule presented in (19):

$$\hat{y}_i^{(1)} = \hat{y}_i^{(0)} + \nu\, h_1(x_i) \quad (19)$$

where $\nu$ is the learning rate. After updating, the predictions are presented in (20). This iterative process is repeated for multiple rounds, refining the predictions further. After M iterations, the final GBM predictions $\hat{y}_i^{GBM} = F_M(x_i)$ are obtained, as presented in (21). Residuals from the GBM predictions are calculated to capture the unexplained variance, using (22):

$$r_i^{GBM} = y_i - \hat{y}_i^{GBM} \quad (22)$$

For the given dataset, these residuals are presented in (23). These residuals are passed to the NN for further modeling. The NN takes the GBM predictions as input and applies a transformation through its layers. The architecture of the NN includes a single hidden layer with weights W, bias b, and ReLU activation, defined as presented in (24). The input to the NN is given by (25), and the NN computes the affine transformation $z = W\,\hat{y}^{GBM} + b$ of (26), resulting in (27). Applying the ReLU activation $a = \max(0, z)$ yields the output presented in (28). The NN minimizes the residual error using the MSE loss function presented in (29):

$$L = \frac{1}{N} \sum_{i=1}^{N} \left(r_i^{GBM} - \hat{y}_i^{NN}\right)^2 \quad (29)$$

Through optimization, the NN adjusts its weights and biases to reduce this error. The final hybrid prediction is obtained by combining the outputs of the GBM and the NN, as expressed in (30):

$$\hat{y}_i^{final} = \hat{y}_i^{GBM} + \hat{y}_i^{NN} \quad (30)$$

For the given data, the combined predictions are presented in (31), and the final predictions are presented in (32).
This collaborative mechanism allows the proposed model to harness the GBM’s ability to model structured interactions and the NN’s capacity to capture implicit relationships. By addressing both macro-level feature dependencies and micro-level residual complexities, the proposed model achieves superior predictive performance, particularly for challenging datasets such as urban happiness prediction.
4. Research Methodology
This research adopted a hybrid methodological framework that intricately blended descriptive and predictive analyses to systematically address the objectives. The methodology was structured to validate the integrity and accuracy of the findings through a thorough examination of the factors contributing to urban happiness. The process encapsulated the complete life cycle of the research, from data collection to the derivation of actionable insights.
4.1. Data Collection and Preprocessing
At the outset, the City Happiness Index dataset was procured, comprising extensive data attributes such as decibel levels, traffic density, and green space area, among others. This dataset was fully developed, originated, and exclusively created by Emirhan Bulut and is hosted on kaggle.com (accessed on 14 July 2024). It contains essential features and measurements from diverse cities worldwide, emphasizing factors that influence each city's overall happiness score [45]. Preprocessing was a critical initial step, in which the raw data underwent rigorous cleaning and normalization to ensure uniformity and accuracy in the subsequent analyses. The pseudocode provided in Algorithm 1 delineates the algorithmic steps involved in this phase, ensuring systematic execution of these tasks.
Algorithm 1 Data Collection and Preprocessing Pipeline
Require: $D_{raw}$: Raw City Happiness Index Dataset; $F = \{f_1, f_2, \dots, f_n\}$: Set of Features, where n denotes the number of features
Ensure: $D_{prep}$: Preprocessed Dataset
1: Load $D_{raw}$
2: for each feature $f_j \in F$ do
3:   Handle missing values using $f_j \leftarrow \phi(f_j)$, where $\phi$ represents the chosen imputation strategy
4:   Normalize the feature to obtain $f_j' = (f_j - \mu_j)/\sigma_j$, where $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of $f_j$
5:   Perform feature engineering to derive $f_j'' = \tau(f_j')$, where $\tau$ represents the transformation or extraction function applied to feature $f_j'$
6: end for
7: Store the resulting preprocessed dataset $D_{prep}$
The success of any machine learning model significantly hinges on the quality of the data used and the effectiveness of the preprocessing techniques applied. This section provides a detailed overview of the dataset utilized in this study, covering its composition, sources, and key features. Additionally, it elaborates on the preprocessing methods applied to prepare the data, including the handling of missing values, feature scaling, and encoding categorical features, which are essential steps to ensure that a model performs effectively.
4.1.1. Dataset Overview
The dataset used in this study encompasses urban-level indicators from multiple cities across various months and years, capturing both environmental and socio-economic factors that influence urban happiness. Specifically, the data include the following features. The City, Month, and Year serve as identifiers for each data record, enabling temporal and geographical analysis of urban happiness. The Decibel Level represents the average noise pollution level, measured in decibels, reflecting the noise exposure experienced by city residents. The Traffic Density is a categorical variable representing traffic conditions (low, medium, or high), which has a direct impact on mobility and quality of life. The Green Space Area measures the amount of green space available per capita, in square meters, contributing to residents' physical and mental well-being. The Air Quality Index (AQI) is a numerical value indicating the air quality level, where higher values represent more polluted environments. The target variable in this dataset is the Happiness Score, which represents the overall happiness of residents based on surveys and various metrics, scaled from negative to positive values. Additionally, the dataset includes a Cost of Living Index, which serves as an indicator of the relative cost required to maintain a certain standard of living in each city, and the Healthcare Index, a numerical index reflecting the quality and accessibility of healthcare services available to residents. The dataset consists of 545 rows, each representing a unique city-month-year combination, thereby providing a comprehensive temporal and geographical overview of urban well-being indicators. The diversity of features allows the hybrid model to capture complex relationships between socio-economic, environmental, and urban infrastructure variables, enabling an in-depth analysis of the factors influencing urban happiness. Detailed information on the dataset is presented in Table 1.
4.1.2. Data Cleaning and Handling Missing Values
The initial step in the data preparation involved data cleaning to ensure the reliability of the dataset, which included the identification and handling of missing values. Let the dataset be represented by a matrix $X \in \mathbb{R}^{n \times m}$, where n is the number of instances and m is the number of features. Missing values in features like Air Quality Index, Green Space Area, and Healthcare Index were treated to avoid biased or incomplete model training, which could have resulted in unreliable parameter estimates. For continuous numerical features, such as Decibel Level, Air Quality Index, and Cost of Living Index, missing values were imputed using the arithmetic mean of the observed values, as presented in (33):

$$x_{ij} \leftarrow \frac{1}{|\mathcal{O}_j|} \sum_{i' \in \mathcal{O}_j} x_{i'j} \quad (33)$$

where $\mathcal{O}_j$ denotes the set of indices without missing values for feature j, and $x_{ij}$ represents the value of the j-th feature for the i-th instance. This imputation technique preserves the central tendency of the data, ensuring that the statistical properties of the feature are maintained and the impact on variance is minimized.

For categorical features such as Traffic Density, missing values were imputed using the mode of the observed values, as presented in (34):

$$x_{ij} \leftarrow \arg\max_{v} \operatorname{count}_j(v) \quad (34)$$

where $\operatorname{count}_j(v)$ represents the frequency of occurrence of category v in feature j. This strategy ensured that the categorical distribution remained unbiased, avoiding the introduction of artificial variability.
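A minimal pandas rendering of Equations (33) and (34) might look as follows. The file and column names are assumptions based on the Table 1 feature descriptions; the authors' exact code is not shown in the paper.

```python
import pandas as pd

df = pd.read_csv("city_happiness_index.csv")  # hypothetical file name

# Eq. (33): mean imputation for continuous numerical features
for col in ["Decibel_Level", "Air_Quality_Index", "Cost_of_Living_Index"]:
    df[col] = df[col].fillna(df[col].mean())

# Eq. (34): mode imputation for categorical features
df["Traffic_Density"] = df["Traffic_Density"].fillna(df["Traffic_Density"].mode()[0])
```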
4.1.3. Feature Scaling
Feature scaling was applied to the numerical features in X to standardize them to a common scale, which is essential when different features have varying magnitudes and units. Let $X_{num}$ represent the subset of numerical features in X. The StandardScaler from scikit-learn was used to transform each numerical feature $x_j$, as presented in (35):

$$x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \quad (35)$$

where $\mu_j$ is the mean of feature j, as presented in (36):

$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} \quad (36)$$

and $\sigma_j$ is the standard deviation of feature j, as presented in (37):

$$\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2} \quad (37)$$

This transformation ensures that each feature has a mean of zero and a standard deviation of one, as presented in (38):

$$\mathbb{E}\left[x'_j\right] = 0, \qquad \operatorname{Var}\left(x'_j\right) = 1 \quad (38)$$

This standardization is critical for the gradient-based optimization algorithms used in neural networks, which are sensitive to the scale of the input features.
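The standardization in Equations (35)-(38) corresponds directly to scikit-learn's StandardScaler, which the text names explicitly; the feature list below is an assumption based on the dataset description. In practice, the scaler should be fitted on the training split only and then applied to the test split to avoid information leakage.

```python
from sklearn.preprocessing import StandardScaler

numeric_cols = ["Decibel_Level", "Air_Quality_Index", "Green_Space_Area",
                "Cost_of_Living_Index", "Healthcare_Index"]  # assumed numeric features

scaler = StandardScaler()
# fit_transform estimates mu_j and sigma_j (Eqs. (36)-(37)) and applies Eq. (35)
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```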
4.1.4. Encoding Categorical Variables
Categorical features such as Traffic Density, denoted by $x_{cat}$, were encoded using one-hot encoding to transform them into a binary representation suitable for machine learning models. Let $x_{cat}$ contain k unique categories, denoted $v_1, v_2, \dots, v_k$. One-hot encoding was performed by creating k new binary columns $d_1, d_2, \dots, d_k$, where $d_l = 1$ if $x_{cat} = v_l$ and $d_l = 0$ otherwise. This encoding ensured that no ordinal relationships were implied among the categories, preventing the model from assuming any unintended ranking or ordering.
The final dataset $X'$ was formed by concatenating the scaled numerical features $X'_{num}$ and the encoded categorical features $X_{cat}$, as presented in (39):

$$X' = \left[\, X'_{num} \;\middle|\; X_{cat} \,\right] \quad (39)$$

This ensured that both numerical and categorical features were appropriately represented in the feature space for model training.
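One-hot encoding and the concatenation of Equation (39) can be sketched with pandas; get_dummies is one of several equivalent ways to create the binary columns $d_1, \dots, d_k$ described above.

```python
import pandas as pd

# Create one binary column per Traffic_Density category (no ordering implied)
onehot = pd.get_dummies(df["Traffic_Density"], prefix="Traffic_Density")

# Eq. (39): concatenate the scaled numerical features with the encoded categoricals
X_final = pd.concat([df[numeric_cols], onehot], axis=1)
```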
4.1.5. Splitting the Dataset
The processed dataset $X'$ was split into training and testing sets to evaluate the model's performance. Let $(X', y)$ represent the entire dataset, where y is the target vector (Happiness Score). The dataset was partitioned as presented in (40):

$$(X_{train}, y_{train}),\; (X_{test}, y_{test}) = \operatorname{split}(X', y) \quad (40)$$

where $X_{train}$ contains 80% of the instances and $X_{test}$ contains 20%. The split was stratified based on the target variable y to maintain a consistent distribution of the Happiness Score across both sets, minimizing any potential bias during model evaluation.
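Because the Happiness Score is continuous, stratifying the split requires discretizing the target first; the sketch below uses quantile bins as one plausible reading of the procedure, with the bin count chosen arbitrarily.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Discretize the continuous target into quantile bins so the split can be stratified
bins = pd.qcut(y, q=5, labels=False)

# Eq. (40): 80/20 split with a consistent Happiness Score distribution in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X_final, y, test_size=0.20, stratify=bins, random_state=42)
```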
4.1.6. Feature Engineering
Feature engineering was performed to improve the model's capacity to learn from complex relationships within the data. Polynomial features were generated for specific numerical variables to capture potential interactions between features, which are critical for modeling non-linear relationships. For two numerical features $x_a$ and $x_b$, an interaction term was created as presented in (41):

$$x_{ab} = x_a \times x_b \quad (41)$$

This polynomial transformation allowed the model to represent relationships of higher order, providing a richer hypothesis space for learning the complex patterns that contribute to urban happiness.

Additionally, temporal features such as Month and Year were transformed into cyclical features to account for periodicity. For the temporal variable Month, the transformation was carried out using sine and cosine functions, as presented in (42):

$$\text{Month}_{\sin} = \sin\!\left(\frac{2\pi \cdot \text{Month}}{12}\right), \qquad \text{Month}_{\cos} = \cos\!\left(\frac{2\pi \cdot \text{Month}}{12}\right) \quad (42)$$

This transformation ensured that the cyclical nature of the data was preserved, thereby allowing the model to understand that the end of one year and the beginning of the next are adjacent.
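The transformations in Equations (41) and (42) translate into a few lines of pandas/NumPy. The text does not specify which feature pairs were crossed, so the interaction term below is purely illustrative.

```python
import numpy as np

# Eq. (41): illustrative interaction term between two numerical features
df["AQI_x_Noise"] = df["Air_Quality_Index"] * df["Decibel_Level"]

# Eq. (42): cyclical encoding of Month so that December and January are adjacent
df["Month_sin"] = np.sin(2 * np.pi * df["Month"] / 12)
df["Month_cos"] = np.cos(2 * np.pi * df["Month"] / 12)
```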
The final dataset used for modeling consisted of scaled numerical features, one-hot encoded categorical features, polynomial interaction terms, and cyclical temporal features. This comprehensive feature space was designed to enable the GBM + NN hybrid model to effectively leverage both ensemble learning and deep learning capabilities for the prediction of urban happiness.
4.2. Model Development and Integration
The core of the predictive analysis involved the development and training of two distinct models, the GBM and the NN. As described in Section 3, the integration of these models was a nuanced process in which the outputs of the GBM served as inputs to the NN, creating a synergistic model that harnesses the predictive power of both methodologies. Algorithm 2 shows the corresponding pseudocode for the model development and integration.
Algorithm 2 Hybrid Model Development and Integration
Require: $D_{train}$: Preprocessed Training Dataset; $M_{GBM}$: Gradient Boosting Machine (GBM) Model; $M_{NN}$: Neural Network (NN) Model
Ensure: $M_{hybrid}$: Integrated GBM-NN Model
1: Train $M_{GBM}$ on $D_{train}$, optimizing for $\min L_{GBM}$, where $L_{GBM}$ is the loss function associated with the GBM
2: Generate predictions $\hat{y}_{GBM} = M_{GBM}(D_{train})$
3: Use $\hat{y}_{GBM}$ as the input features for $M_{NN}$
4: Train $M_{NN}$ on $\hat{y}_{GBM}$, optimizing $\min L_{NN}$, where $L_{NN}$ is the loss function associated with the NN
5: Construct the hybrid model $M_{hybrid} = g(M_{GBM}, M_{NN})$, where g is a function combining the GBM and NN models
6: Return the integrated model $M_{hybrid}$
4.3. Evaluation and Interpretation
To comprehensively evaluate the efficacy and reliability of the integrated GBM + NN hybrid model, a robust assessment using k-fold cross-validation was employed, as outlined in Algorithm 3. This methodology divided the dataset into k disjoint subsets, enabling iterative training and testing, to ensure that every instance contributed to both phases. Such an approach not only validated the model's performance on various subsets but also provided a robust measure of its generalizability to unseen urban settings. The performance metrics derived from this evaluation phase played a critical role in assessing the predictive capabilities and robustness of the model. Four key metrics were utilized: root mean squared error (RMSE), mean absolute error (MAE), coefficient of determination (R²), and mean absolute percentage error (MAPE). These metrics provided a comprehensive view of the model's predictive accuracy, error magnitude, and explanatory power.
The RMSE, as shown in (43), quantifies the standard deviation of the residuals, representing the average magnitude of prediction errors. This metric is particularly effective in penalizing large errors, making it sensitive to significant deviations between the predicted and actual values:

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2} \quad (43)$$

Furthermore, the MAE, as presented in (44), measures the average absolute difference between predicted and actual values. Unlike RMSE, it treats all errors equally, providing a straightforward interpretation of prediction accuracy:

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|y_i - \hat{y}_i\right| \quad (44)$$

Then, the R² metric, defined in (45), evaluated the proportion of variance in the target variable explained by the model. A value closer to 1 indicated that the model accounted for most of the variability, reflecting strong predictive power:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} \quad (45)$$

Finally, the MAPE, as shown in (46), computes the average percentage difference between predicted and actual values, normalized by the true values. It provides an intuitive measure of prediction accuracy in relative terms:

$$\text{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (46)$$
Each metric complemented the others, offering a holistic understanding of the model's strengths and limitations. For example, while RMSE penalizes larger errors and highlights significant outliers, MAE provides an unbiased average error magnitude. Meanwhile, R² assessed the explanatory power of the model, and MAPE contextualized the errors in percentage terms, enhancing the interpretability for decision-making in urban analytics. In addition, the research culminated in the interpretation and reporting stage, where the results were analyzed to extract meaningful and actionable insights. This analysis focused on understanding the significance of the different predictors and their impact on urban happiness, facilitated by detailed visualizations and comprehensive discussions.
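All four metrics in Equations (43)-(46) are available in scikit-learn; note that mean_absolute_percentage_error returns a fraction, so it is scaled by 100 here to report a percentage. A small helper, as a sketch, might look like this:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, R^2, and MAPE (Eqs. (43)-(46))."""
    return {
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        "MAPE": 100 * mean_absolute_percentage_error(y_true, y_pred),
    }
```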
Algorithm 3 Model Evaluation via k-Fold Cross-Validation
Require: $M_{hybrid}$: Integrated GBM-NN Model; D: Complete Dataset; k: Number of folds
Ensure: $\bar{P}$: Average Performance Metrics (RMSE, MAE, R², MAPE)
1: Partition D into k disjoint subsets $D_1, \dots, D_k$, where $D_i \cap D_j = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{k} D_i = D$
2: for each fold $i = 1, \dots, k$ do
3:   Set $D_{test} = D_i$ and $D_{train} = D \setminus D_i$
4:   Train $M_{hybrid}$ on $D_{train}$ by minimizing the objective function $\min L$, where L denotes the model loss function
5:   Test $M_{hybrid}$ on $D_{test}$ to generate predictions $\hat{y}$
6:   Compute performance metrics $P_i = \Phi(\hat{y}, y)$, where $\Phi$ represents the evaluation metric functions and y denotes the true labels
7: end for
8: Compute the average performance $\bar{P} = \frac{1}{k}\sum_{i=1}^{k} P_i$
9: Return $\bar{P}$: Average Performance Metrics
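Algorithm 3 corresponds to the sketch below, which reuses the hypothetical fit_hybrid, predict_hybrid, and evaluate helpers introduced earlier; k = 10 matches the evaluation reported in Section 5.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate_hybrid(X, y, k=10, seed=42):
    """k-fold cross-validation of the GBM + NN hybrid (Algorithm 3)."""
    X, y = np.asarray(X), np.asarray(y)
    fold_metrics = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=seed).split(X):
        gbm, nn = fit_hybrid(X[train_idx], y[train_idx])    # train on k-1 folds
        y_pred = predict_hybrid(gbm, nn, X[test_idx])       # predict the held-out fold
        fold_metrics.append(evaluate(y[test_idx], y_pred))  # per-fold metrics
    # Average each metric across the k folds (step 8 of Algorithm 3)
    return {m: np.mean([f[m] for f in fold_metrics]) for m in fold_metrics[0]}
```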
4.4. Statistical Analysis
This section describes the detailed experimental framework used to quantitatively assess the relationship between the urban features and happiness, based on rigorous statistical testing and model interpretability techniques. The goal of these experiments was to determine the individual and joint effects of urban features on the happiness score. These experiments employed cross-validation, hypothesis testing, and regression analysis to derive robust and interpretable results.
4.4.1. Experiment Design and Setup
The dataset $D = (X, y)$, where $X \in \mathbb{R}^{n \times m}$ is the feature matrix of urban indicators and $y \in \mathbb{R}^{n}$ is the vector of happiness scores, served as the basis for the experiments. The objective was to quantify how the individual features influenced the target variable y. The urban features included indicators like Air Quality Index (AQI), Traffic Density, Green Space Area, Healthcare Index, and Cost of Living Index, among others. The experiments were structured to evaluate each feature $x_j$, or combinations of features, in predicting happiness. The testing procedure involved comparing the predicted happiness scores against the actual values and conducting hypothesis testing to establish the statistical significance of the relationships. Formally, the experiments tested the null hypothesis $H_0$ (that a feature has no significant effect on happiness, i.e., $\beta_j = 0$) against the alternative hypothesis $H_1$ (that the feature does have a significant effect, i.e., $\beta_j \neq 0$).
4.4.2. Data Splitting and Cross-Validation
To ensure the robustness of the experiments and prevent overfitting, we used k-fold cross-validation with $k = 10$. The dataset was divided into k equally sized subsets, or folds, denoted $D_1, D_2, \dots, D_k$. At each iteration, the model was trained on $k - 1$ folds and tested on the remaining fold. This process was repeated k times, with each fold serving as the test set once, thereby ensuring that each instance in the dataset was tested exactly once. In addition, for hyperparameter tuning, we employed a grid search to find the optimal parameters for each model. The overall cross-validation error E was calculated as the average error across all folds. For each fold $D_i$, the error $E_i$ was computed as presented in (47):

$$E_i = \frac{1}{|D_i|} \sum_{j \in D_i} \left(y_j - \hat{y}_j\right)^2 \quad (47)$$

where $y_j$ is the actual happiness score for instance j, and $\hat{y}_j$ is the predicted happiness score from the model. The final cross-validation error E was the mean of the errors from all folds, as presented in (48):

$$E = \frac{1}{k} \sum_{i=1}^{k} E_i \quad (48)$$

This approach helped mitigate overfitting by ensuring that the model was evaluated on unseen data in each fold, providing an unbiased estimate of its performance.
4.4.3. Feature Importance and Impact Quantification
The first step in understanding the impact of individual urban features on happiness was to compute feature importance scores using the GBM part of the hybrid model. A GBM constructs an ensemble of decision trees, and feature importance is derived from how often a feature $x_j$ is used for splitting and the resulting reduction in the loss function. For each feature $x_j$, the importance score $I_j$ was calculated as (49):

$$I_j = \sum_{t \in T_j} \Delta L_t \quad (49)$$

where $T_j$ represents the set of decision trees in the ensemble in which the feature $x_j$ was used, and $\Delta L_t$ is the reduction in the loss function L achieved at tree t. The loss function L used in this regression task was the mean squared error (MSE), defined as (50):

$$L = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad (50)$$

The feature importance scores provided a preliminary understanding of which features had the most significant impact on happiness.
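With a fitted scikit-learn GBM, the impurity-based importance scores corresponding to Equation (49) (accumulated loss reduction per feature) are exposed directly through the feature_importances_ attribute; the snippet assumes the fitted gbm and feature matrix from the earlier sketches.

```python
import pandas as pd

# Rank features by their accumulated loss reduction across the ensemble (Eq. (49))
importance = pd.Series(gbm.feature_importances_, index=X_final.columns)
print(importance.sort_values(ascending=False))
```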
4.4.4. Pearson Correlation Analysis
To further examine the linear relationships between urban features and happiness, we performed Pearson correlation analysis. The Pearson correlation coefficient $r_{x_j y}$ was used to measure the linear relationship between each feature $x_j$ and the happiness score y. The Pearson coefficient is defined as (51):

$$r_{x_j y} = \frac{\operatorname{cov}(x_j, y)}{\sigma_{x_j} \sigma_y} \quad (51)$$

where $\operatorname{cov}(x_j, y)$ represents the covariance between feature $x_j$ and the target variable y, and $\sigma_{x_j}$ and $\sigma_y$ are the standard deviations of $x_j$ and y, respectively. The covariance $\operatorname{cov}(x_j, y)$ was calculated as (52):

$$\operatorname{cov}(x_j, y) = \frac{1}{n} \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)\left(y_i - \bar{y}\right) \quad (52)$$

where $\bar{x}_j$ and $\bar{y}$ represent the mean of the feature $x_j$ and the mean happiness score, respectively. A Pearson correlation coefficient close to 1 or −1 indicates a strong positive or negative linear relationship, respectively, between the feature and happiness.
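Equation (51) is implemented by scipy.stats.pearsonr, which also returns the p-value of the associated significance test; the column names are the assumed ones from the earlier sketches.

```python
from scipy.stats import pearsonr

for col in numeric_cols:
    r, p = pearsonr(df[col], y)  # Eq. (51): linear correlation with the happiness score
    print(f"{col}: r = {r:+.3f}, p = {p:.4f}")
```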
4.4.5. Hypothesis Testing and Significance Analysis
To establish the statistical significance of the relationship between urban features and happiness, t-tests were conducted. The t-test was used to compare the means of two groups, such as cities with high air quality versus cities with low air quality, to determine whether the difference in happiness scores was statistically significant. The t-statistic for comparing two groups was calculated as (53):

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad (53)$$

where $\bar{y}_1$ and $\bar{y}_2$ are the mean happiness scores of the two groups, $s_1^2$ and $s_2^2$ are the sample variances, and $n_1$ and $n_2$ are the sample sizes of each group. The degrees of freedom (df) for the t-test were calculated as (54):

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad (54)$$

The resulting p-value from the t-test was compared to a significance level $\alpha = 0.05$. If $p < \alpha$, the null hypothesis (that there was no effect) was rejected, indicating that the feature had a statistically significant effect on happiness. For example, we conducted a t-test comparing happiness scores between cities with high air quality (AQI ≤ 50) and cities with low air quality (AQI > 100). The result showed that improving air quality had a significant positive effect on happiness, with $p < 0.05$.
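The group comparison described above maps onto a Welch t-test, which SciPy computes with equal_var=False and whose degrees of freedom follow Equation (54). The AQI thresholds repeat those in the text; the Happiness_Score column name is an assumption, and the comparison presumes unscaled AQI values.

```python
from scipy.stats import ttest_ind

high_aq = df.loc[df["Air_Quality_Index"] <= 50, "Happiness_Score"]   # good air quality
low_aq = df.loc[df["Air_Quality_Index"] > 100, "Happiness_Score"]    # poor air quality

# Welch's t-test (Eqs. (53)-(54)): unequal variances and sample sizes allowed
t_stat, p_value = ttest_ind(high_aq, low_aq, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 at alpha = 0.05 if p < 0.05
```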
4.5. Regression Analysis for Marginal Effects
To quantify the magnitude of the effect of each feature, we applied linear regression analysis. The linear regression model is given by (55):

$$y_i = \beta_0 + \beta_j x_{ij} + \epsilon_i \quad (55)$$

where $y_i$ is the happiness score for instance i, $x_{ij}$ is the value of feature $x_j$ for instance i, and $\beta_j$ is the regression coefficient representing the marginal effect of $x_j$ on y. The error term $\epsilon_i$ represents the residual, or the difference between the predicted and actual happiness score. The regression coefficients $\beta$ were estimated by minimizing the residual sum of squares (RSS), as presented in (56):

$$\text{RSS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad (56)$$

where $\hat{y}_i$ represents the predicted happiness score for instance i. The statistical significance of each coefficient $\beta_j$ was assessed using t-tests on the regression coefficients, with the corresponding p-values used to determine whether the effect of each feature was significant. For example, a 10% improvement in air quality led to an estimated 5% increase in happiness, with a p-value of 0.01, confirming the significance of the result.
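The marginal-effect regression of Equations (55) and (56), including per-coefficient t-tests and p-values, can be reproduced with statsmodels; this is a sketch assuming the df and numeric_cols objects from the earlier snippets.

```python
import statsmodels.api as sm

X_reg = sm.add_constant(df[numeric_cols])  # adds the intercept term beta_0
ols = sm.OLS(y, X_reg).fit()               # estimates beta by minimizing the RSS (Eq. (56))
print(ols.summary())                       # coefficients, t-statistics, and p-values
```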
5. Results and Discussion
The performance of the various machine learning models for the prediction task was evaluated using 10-fold cross-validation, and the results are summarized in Table 2. Key performance metrics included the average root mean square error (RMSE), average mean absolute error (MAE), average coefficient of determination (R²), and average mean absolute percentage error (MAPE). First, the GBM + NN hybrid model achieved the best overall performance across all metrics, with an RMSE of 0.3332, MAE of 0.2633, R² of 0.9673, and MAPE of 7.0082%. The low RMSE and MAE values indicated high predictive accuracy, while the R² value showed that 96.73% of the variance in the target variable was explained by the model. The low MAPE further highlighted the model's robustness in minimizing percentage errors. This superior performance can be attributed to the hybrid nature of the model, which combines the structured data handling capabilities of the GBM with the non-linear feature extraction capabilities of neural networks. Furthermore, tree-based models such as the random forest, gradient boosting machine (GBM), and CatBoost performed competitively, with the random forest achieving an RMSE of 0.4063, MAE of 0.3173, R² of 0.9524, and MAPE of 11.86%. CatBoost achieved slightly better RMSE and MAE values than the GBM but lagged behind GBM + NN and the random forest in overall performance. With an RMSE of 0.8189 and R² of 0.8120, the GBM demonstrated good predictive capability but was surpassed by GBM + NN and the random forest. On the other hand, CatBoost achieved the lowest RMSE (0.3486) among the individual tree-based models, reflecting strong predictive accuracy; however, its MAPE (8.4328%) was slightly higher than that of GBM + NN, indicating room for improvement in capturing percentage-based errors.
Among the neural network models, the dense neural network and the convolutional neural network (CNN) showed competitive performance. The CNN achieved an RMSE of 0.4923, MAE of 0.3673, and R² of 0.9227, outperforming many other neural network models. The dense neural network exhibited an RMSE of 0.5837 and R² of 0.8949, suggesting good overall performance, though not as strong as the CNN. Other neural network architectures, such as the GRU (RMSE: 0.4931, R²: 0.9226) and ResNet (RMSE: 0.6677, R²: 0.8655), showed moderate results, indicating their potential for handling temporal and spatial data, albeit less effectively for this task. The standalone ensemble model performed poorly compared to its counterparts, with an RMSE of 1.5114, MAE of 1.2648, and R² of only 0.3398; its high MAPE (48.8259%) suggests that this approach struggled to generalize effectively on the dataset. Furthermore, the inclusion of temporal structures in models such as LSTM and LSTM + CNN did not yield favorable results. The LSTM had an RMSE of 1.0239 and R² of 0.5992, indicating limited effectiveness in capturing patterns in this dataset. LSTM + CNN performed worse, with an RMSE of 1.2188 and R² of 0.3955, suggesting that the combination of temporal and spatial features did not synergize well for this task.
Next, traditional regression approaches, such as linear regression, showed respectable results, with an RMSE of 0.5485, MAE of 0.4280, R² of 0.9136, and MAPE of 10.9827%. This indicates that linear models can capture significant patterns in the data but fall short compared to more advanced methods. TabNet showed the poorest performance across all metrics, with an RMSE of 5.6100 and a negative R² value (−8.5989), indicating that the model failed to fit the data effectively. Autoencoder + Regression performed moderately, with an RMSE of 0.6566 and R² of 0.8679, but did not outperform the tree-based or hybrid models. Overall, the results demonstrate the significant advantage of hybrid models like GBM + NN, which combine the strengths of traditional tree-based methods and deep learning architectures. Models like the random forest and CatBoost consistently delivered strong performance, highlighting their effectiveness in handling structured, tabular data. While the CNN and dense neural networks performed well, architectures like LSTM and ResNet were less effective, emphasizing the importance of choosing the right neural network for a specific task. The poor performance of TabNet suggests that it may not be well-suited for this dataset, possibly due to overfitting or difficulties in feature representation. The GBM + NN hybrid model was the most effective approach for this task, achieving the best performance across all metrics. Future research could further optimize hybrid architectures and investigate feature engineering techniques to enhance model performance; additionally, understanding the limitations of underperforming models like TabNet could provide insights into dataset-specific challenges. Beyond the comparison of machine learning and deep learning models, the statistical experiments yielded complementary results: Table 3 demonstrates that several key urban features had a statistically significant and substantial impact on happiness. A 10% improvement in air quality led to a 5% increase in happiness, with a p-value of 0.01, confirming its significance. Reducing traffic density from high to medium resulted in a 4% increase in happiness, while increasing green space by 1 square meter per person was associated with a 3% increase in happiness, both with p-values below 0.05. These results were validated through cross-validation and hypothesis testing, providing robust evidence for the relationships between urban features and happiness.
6. Conclusions
This study proposed a novel hybrid approach combining GBM and NN models for the prediction of urban happiness. By leveraging the capabilities of ensemble learning in GBMs and the deep feature extraction of neural networks, the GBM + NN hybrid model achieved significant improvements in predictive accuracy compared to other traditional machine learning and deep learning models. The experimental results demonstrated that the hybrid model outperformed all other models tested, achieving the lowest RMSE of 0.3332. The effectiveness of the hybrid model can be attributed to its ability to capture complex feature interactions and refine predictions through a two-stage learning process. This approach not only improved the accuracy of predictions but also provided valuable insights into the key factors influencing urban happiness, such as air quality, traffic density, green space availability, healthcare quality, and cost of living. These insights can serve as a valuable resource for urban planners and policymakers in developing evidence-based interventions aimed at enhancing the quality of life in cities.
The comparative analysis of the GBM + NN hybrid model against models such as DeepGBM, CNN, ResNet, and TabNet further highlighted the advantages of integrating ensemble learning with deep learning techniques. Models like CNN and DeepGBM performed reasonably well, but the absence of an integrated learning structure limited their predictive capabilities relative to the hybrid model. Traditional models like linear regression and random forest failed to capture the non-linear relationships between urban features adequately, leading to higher prediction errors. The findings of this study emphasize the importance of adopting hybrid models for complex prediction tasks, where a combination of structured feature handling and deep representation learning is required. The GBM + NN hybrid model presents a new benchmark in urban happiness prediction, showcasing a promising direction for future research that involves the integration of different machine learning paradigms to enhance model performance. Future research could explore the extension of this hybrid approach by incorporating additional contextual features, such as real-time social media data, mobility patterns, and climate information, to further improve the model’s predictive capabilities. Additionally, the interpretability of the hybrid model could be enhanced by applying feature importance techniques and explainable AI methods to provide a more transparent understanding of the impact of each predictor on urban happiness.