
Exploring Factors Influencing Depression: Socioeconomic Perspectives Using Machine Learning Analytics

Office of Research, ASSIST University, Seoul 03767, Republic of Korea
Electronics 2025, 14(3), 487; https://doi.org/10.3390/electronics14030487
Submission received: 23 December 2024 / Revised: 21 January 2025 / Accepted: 24 January 2025 / Published: 25 January 2025
(This article belongs to the Special Issue New Advances in Machine Learning and Its Applications)

Abstract

Depression is a widespread mental health disorder with significant societal impacts, and while socioeconomic status (SES) is a well-established determinant, limited research has explored the unique factors influencing depression in South Korea, such as educational pressure, long working hours, and traditional gender roles. Using data from the Korean National Health and Nutrition Examination Survey (KNHANES) collected in 2014, 2016, 2018, 2020, and 2022, this study analyzed 24,308 participants to examine the relationship between SES and depression. Depression was measured using the Patient Health Questionnaire-9 (PHQ-9), and twelve socioeconomic variables, including income, education, marital status, and working hours, were assessed using logistic regression models. The findings revealed that monthly income, age, marital status, and weekly working hours were significant predictors of depression, with higher income levels unexpectedly associated with greater depression scores, potentially due to increased stress. Gender, household size, and educational attainment were also notable contributors. This study underscores the complex interplay of SES factors and depression in South Korea’s distinct sociocultural context. It also highlights the need for mental health policies that address both economic and psychological stressors, particularly for higher-income individuals and women. Future research should further explore these dynamics to develop culturally sensitive mental health interventions.

1. Introduction

Depression is a pervasive and debilitating mental health disorder that has significant repercussions for individuals and societies alike, affecting millions of people globally. According to the WHO fact sheet [1], approximately 3.8% of the global population experiences depression, corresponding to about 280 million people worldwide as of 2023. Moreover, according to recent research [2], the nationwide depression rate in South Korea in 2024 was 7.2%, almost double the global average. Depression is characterized by persistent feelings of sadness, loss of interest in daily activities, and a variety of physical and cognitive impairments that can severely hinder daily functioning [3,4]. The burden of depression extends beyond the individual, leading to considerable social and economic costs, including lost productivity and increased demands on healthcare systems [5]. Depression can also cause significant impairment in personal, social, and occupational functioning. Thus, understanding depression is critical for developing effective prevention and treatment strategies. Socioeconomic status (SES) is considered one of the key determinants of mental health, including depression. SES encompasses a range of factors such as income, education, occupation, and living conditions, which collectively influence individuals’ access to resources, exposure to stressors, and overall life circumstances [6]. Numerous studies have shown that lower SES is associated with a higher incidence of depression. For instance, individuals with lower income levels often experience greater financial stress, housing instability, and limited access to healthcare, all of which contribute to the development of depressive symptoms [6,7,8,9]. Education and occupation are also critical elements of SES that influence mental health.
Higher levels of education are generally linked to better mental health outcomes, possibly due to increased cognitive resources, better problem-solving skills, and better access to information and support networks [10,11]. Similarly, occupational status can influence mental health through various mechanisms, including job security, work-related stress, and the availability of social support at the workplace [12,13].
Several recent studies have explored the relationship between socioeconomic status and mental health, particularly depression, using various theoretical frameworks and methodological approaches [14,15,16,17]. Despite these findings, much of the research on SES and depression has relied on traditional statistical methods. These methods may not fully capture the complexity and non-linear interactions of multiple socioeconomic factors. For example, traditional linear regression models may oversimplify the relationship between SES variables and depression by assuming constant effects across different population groups and time periods [18]. Additionally, while many studies have established a general link between lower SES and higher depression rates, most focus on socioeconomic factors rooted in Western cultures [7,8,10,11,19,20,21]. Previous studies thus show two gaps: a reliance on traditional statistical methods [14,16,17] and a limited cultural perspective [7,8,10,11,19,20,21]. These gaps highlight the need for a deeper understanding of culture and for more advanced analytical methods that can uncover the intricate patterns among various SES factors. In response to these gaps, machine learning techniques, including logistic regression, have gained popularity for their ability to identify complex, non-linear relationships in large datasets. These methods offer significant advantages over traditional statistical models. They allow researchers to examine a wider range of predictor variables and provide more nuanced insights into how these variables interact to influence mental health outcomes [22,23]. These advanced analytical methods allow for a more granular analysis of the data, uncovering hidden relationships and providing deeper insights into the predictors of depression.
For example, recent studies using machine learning techniques have revealed previously unnoticed relationships between SES and depression, offering new perspectives on the pathways through which socioeconomic factors contribute to mental health disparities [24]. Moreover, previous research on the relationship between socioeconomic factors and depression [14,15,16,17] has not considered unique cultural and regional traits. For instance, South Korea’s Confucian heritage has placed substantial caregiving obligations on women [25] and fostered extreme expectations of educational success [26]. Long working hours, in turn, contribute to burnout [27]. All of these could become factors causing depression.
To address this gap, this study utilizes data from the Korean National Health and Nutrition Examination Survey (KNHANES), a large-scale, nationally representative survey that collects detailed information on health, nutrition, and socioeconomic factors. This dataset is particularly well suited for examining the intersection of SES and depression in South Korea due to its breadth and depth, including detailed demographic, health, and socioeconomic information [28]. By employing machine learning-based analytical methods, such as logistic regression, this research seeks to provide a more nuanced understanding of how various socioeconomic factors interact to influence depression, ultimately contributing to more targeted and effective interventions. A comprehensive approach to understanding the impact of socioeconomic factors on depression involves examining various dimensions of SES. Income inequality, for instance, has been shown to correlate with mental health disparities, as greater income disparity within a community can lead to increased feelings of social exclusion and stress [24]. Furthermore, examining how socioeconomic factors intersect with gender, age, and marital status can provide deeper insights into the ways these factors interact to affect mental health outcomes [15,29,30].
This study aims to build on previous research by using KNHANES data to explore the relationship between various socioeconomic factors and depression in South Korea. The study utilizes logistic regression, a widely used method in epidemiological research, to model the probability of depression as a function of multiple predictor variables, including income, education, occupation, and family composition. By focusing on these factors and using advanced statistical methods, this research seeks to provide a more detailed and comprehensive understanding of how SES influences depression in the South Korean population. The insights gained from this study will be valuable for informing mental health policies and interventions aimed at reducing depression rates, particularly in low-SES populations. Therefore, the objectives of this study are, first, to identify the key socioeconomic factors associated with depression in South Korea using logistic regression analysis, and, second, to evaluate the potential policy implications of these findings, particularly in terms of designing targeted interventions to reduce the burden of depression among vulnerable populations. By addressing these objectives, this study aims to fill important gaps in the literature and contribute to the ongoing efforts to improve mental health outcomes in South Korea and beyond. To summarize, much of the existing work relies on traditional statistical methods and focuses on Western contexts, often overlooking the unique cultural and societal nuances present in countries like South Korea. Therefore, this study addresses these gaps by leveraging data from KNHANES to explore how SES factors, including income, education, and family composition, influence depression in South Korea. Using advanced machine learning techniques, particularly logistic regression, this research aims to uncover nuanced relationships between SES and depression within South Korea’s distinctive cultural and societal framework. 
The study’s objectives and contributions are summarized as follows:
  • Objective 1: Identify key socioeconomic factors associated with depression in South Korea using logistic regression analysis.
  • Objective 2: Examine the interplay of SES factors with South Korea’s unique cultural and societal characteristics.
  • Objective 3: Provide actionable insights for policymakers to design targeted interventions for vulnerable populations, reducing the burden of depression and improving mental health outcomes.

2. Related Works

2.1. Socioeconomic Status and Depression

Several recent studies have explored the relationship between socioeconomic status and mental health, particularly depression, using various theoretical frameworks and methodological approaches. Reme, Wörn, and Skirbekk [14] investigated the impact of the COVID-19 pandemic on mental health disparities in Norway. They found that individuals with lower socioeconomic status, including lower income and education levels, faced higher depression rates during the pandemic. The study highlights the exacerbating effects of global crises on vulnerable SES groups, underlining the need for targeted interventions during such events. In another study, Kong and Zhang [15] examined the relationship between income inequality and mental health. Their findings showed that regions with higher income inequality exhibited higher rates of depression. However, the study also pointed out that social support plays a moderating role, potentially reducing the negative effects of income disparity on mental health. A study by Pangarkar, Paigude, Banait, Ajani, Mange, and Bramhe [16] analyzed how occupation and related stressors affect mental health. The findings revealed that high-stress jobs, particularly those involving emotional labor and long working hours, were significantly associated with increased depression rates. This underscores the need for occupational health interventions, especially in high-pressure work environments. Finally, Zheng, Lyu, Pan, and Chen [17] explored gender-specific factors affecting depression in China. The study found that women were more likely to report depressive symptoms, largely due to the dual burden of professional responsibilities and domestic roles. The authors call for gender-responsive policies to address the mental health disparities between men and women, particularly in work-related contexts. Taken together, these findings point to the complex interplay between socioeconomic conditions and social support in determining mental health outcomes.
These studies collectively underscore the strong association between lower socioeconomic status and higher risks of depression, while also highlighting the mitigating role that social support and effective policies can play in improving mental health outcomes.

2.2. Techniques in Depression Research

With the availability of diverse datasets, such as electronic health records, neuroimaging, and wearable device data, recent studies are increasingly adopting data analytic approaches based on machine learning and artificial intelligence. These approaches could provide a holistic view of depression and its underlying mechanisms. For instance, Bader et al. [31] used machine learning to classify the severity of major depressive disorder (MDD) based on oxidative stress biomarkers and sociodemographic factors. Zhao and Tlachac [32] used an XGBoost model to identify key depression-related factors, with anxiety emerging as the top risk factor. They observed significant variations in these factors between depressed and non-depressed groups. Flores et al. [33] proposed a multimodal deep learning model for depression screening that uses audio and temporal facial features from clinical interviews with a virtual agent. The model enhances unimodal representations through encoder–transformer layers over pre-trained models and explicitly aligns the audio and facial modalities. Chen et al. [34] proposed CoKE, a model that detects depression in social media posts by integrating trusted commonsense knowledge through three-way decision theory. CoKE enhances pre-trained language models with modules for trusted screening, knowledge generation, and dynamic knowledge fusion, improving the accuracy of depressive tendency detection. Further, Thirupathi et al. [35] argued that the integration of AI and IoT is transforming mental healthcare by enhancing diagnosis, treatment, and support. In a study not directly related to depression, Liu et al. [36] proposed a novel vision–language pre-training model designed to exploit the hierarchical structure of medical reports, which could potentially be adapted for depression research.
Research methods leveraging imaging, AI, and IoT for depression are gaining significant attention and are expected to achieve strong performance in the future. However, for survey-based data analysis, such as KNHANES data, machine learning methods like logistic regression remain highly effective. Survey-based data typically involve structured and standardized variables, which are well suited to algorithms like logistic regression. Logistic regression offers interpretable insights, which are essential for public health and policymaking. It also efficiently handles categorical outcomes and the imbalanced datasets frequently encountered in survey research [37,38]. Advanced machine learning algorithms such as Random Forest, Gradient Boosting Machines (e.g., XGBoost, LightGBM), and deep learning models like Transformers have shown remarkable performance in many domains. However, they often require large, high-dimensional, and unstructured datasets to fully demonstrate their advantages [22,39,40]. These algorithms are computationally intensive and prone to overfitting, especially in datasets with limited sample sizes or imbalanced distributions, as is often the case with survey data [41,42,43]. Conversely, logistic regression provides a robust and computationally efficient approach that avoids these pitfalls and ensures stability in parameter estimation, even with smaller datasets [44]. Hence, for structured datasets like KNHANES, logistic regression remains a competitive and practical choice despite the availability of more complex algorithms.

2.3. Cultural and Regional Characteristics of South Korea

Despite the extensive research on the relationship between SES and depression, there remains a significant gap in understanding how these factors interact with cultural and regional characteristics, particularly in societies like South Korea, where unique social dynamics are at play. South Korea, shaped by its Confucian heritage, maintains traditional gender roles, which place distinct pressures on men and women. Men are often seen as primary providers, which may contribute to the high levels of stress and depression observed in male-dominated occupational sectors. Similarly, women, though increasingly participating in the workforce, still bear significant caregiving responsibilities, contributing to mental health challenges [25]. Additionally, the intense societal emphasis on educational achievement and the high college admission rates seen in South Korea can exacerbate stress, leading to higher levels of depression, particularly among students and young adults [26]. Furthermore, South Korea is known for its long working hours, often cited as among the highest in the OECD, which can contribute to work-related stress and burnout [27]. These cultural and societal factors are critical in understanding the unique ways SES affects depression in South Korea, yet they have not been thoroughly examined in previous studies. Most existing studies focus on Western contexts, and while these offer valuable insights, they do not fully account for the cultural and societal nuances present in East Asian countries like South Korea. The influence of Confucian values, the competitive educational environment, and the work culture in South Korea may uniquely shape the relationship between socioeconomic factors and depression. To date, few studies have explicitly examined how these distinctive cultural and societal elements interplay with SES to influence mental health outcomes. 
This research seeks to fill that gap by incorporating these dimensions into the analysis of depression in South Korea, offering a more culturally contextualized understanding of how SES factors contribute to mental health issues.

3. Materials and Methods

3.1. Data Source

The data for this study were obtained from the KNHANES dataset (https://knhanes.kdca.go.kr/knhanes/eng/main.do, accessed on 13 November 2024) covering 2014 to 2022. Because depression data were collected every two years, the analysis was structured using repeated cross-sectional data from the years 2014, 2016, 2018, 2020, and 2022, covering five survey periods. This dataset includes a wide range of variables, such as demographic information, health status, nutritional status, and various socioeconomic factors. Repeated cross-sectional data, in contrast to panel data, are collected from a different sample of the population at each survey wave rather than by tracking the same individuals over time. Each cross-section provides a snapshot of the population at a specific point in time, allowing for a broad analysis of population-level trends and associations. This structure enables researchers to observe how socioeconomic factors are associated with depression across different years, while also providing flexibility to study shifts in the population as a whole. By leveraging the breadth and depth of these repeated cross-sectional snapshots, this research can capture population-level changes in the relationship between socioeconomic conditions and depression over time. It cannot, however, track individual-level changes directly.

3.2. Data Preprocessing

From the KNHANES, we selected relevant variables indicative of socioeconomic status and potential predictors of depression. The dataset was anonymized to protect participants’ privacy, and all analyses were conducted in accordance with ethical standards set forth by the institutional review boards of the Korea Disease Control and Prevention Agency (KDCA). The dataset included demographic variables such as age, gender, and marital status, along with economic indicators such as monthly income and house ownership status. Educational factors (both individual and parental education levels), occupation, household composition, number of family members, and weekly working hours were also included as independent variables. The target variable was depression, derived from scores on the Patient Health Questionnaire-9 (PHQ-9). The PHQ-9 is a widely accepted screening tool developed by Kroenke, Spitzer, and Williams [45]; it consists of nine questions that assess the frequency of depressive symptoms [46]. For this study, the total PHQ-9 score, ranging from 0 to 27, was used. A cutoff score of 10 (a total score of 10 or higher) was applied to classify individuals with moderate to severe depression, based on the guidelines of Kroenke et al. [45], Manea, Gilbody, and McMillan [47], and Costantini et al. [48]. This cutoff is expected to capture 88% of patients with major depressive disorder. From 2014 to 2022, a total of 74,162 participants took part in the survey, which included 1312 questions, among them those related to depression. Participants under 19 years of age were excluded because the survey did not assess depression in this age group. Participants with missing data were also excluded.
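The labelling rule just described can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes each of the nine PHQ-9 items is already coded 0 to 3 (the standard item scale). The names phq9_total, phq9_label, and PHQ9_CUTOFF are hypothetical.

```python
from typing import Sequence

# Total PHQ-9 scores at or above this value are labelled as moderate-to-severe depression.
PHQ9_CUTOFF = 10


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item scores (each 0-3) into a total between 0 and 27."""
    if len(items) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(not 0 <= v <= 3 for v in items):
        raise ValueError("each PHQ-9 item must be scored from 0 to 3")
    return sum(items)


def phq9_label(items: Sequence[int], cutoff: int = PHQ9_CUTOFF) -> int:
    """Return 1 (depressed) if the total score reaches the cutoff, otherwise 0."""
    return int(phq9_total(items) >= cutoff)


# Hypothetical respondents: a total of 9 falls just below the cutoff, a total of 10 meets it.
print(phq9_label([1] * 9))        # 0
print(phq9_label([2] + [1] * 8))  # 1
```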
Consequently, the final dataset comprised 24,308 participants, with 12 independent variables (age, gender, marriage status, number of family members, household composition, house ownership status, monthly income, education, father’s education, mother’s education, occupation, and weekly working hours) and 1 dependent variable (depression). This yielded a total of 339,178 data points, where each data point is a specific value of one variable for one participant. Data preprocessing involved dropping all records with missing values and encoding categorical variables (e.g., gender, household composition, and education) using label encoding to convert them into numerical formats suitable for machine learning algorithms. The descriptive statistics of the dataset used in this research are presented in Table 1. This study adhered to ethical standards for research involving human subjects, and the KNHANES data used in this study were obtained following these ethical considerations.
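The two preprocessing steps named above can be illustrated with the standard library alone: listwise deletion of incomplete records, then label encoding. This is a hedged sketch. The record fields and household categories are hypothetical. The function label_encode only imitates the documented ordering of scikit-learn's LabelEncoder, which assigns codes to the sorted unique categories; it does not call that class.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[object]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Listwise deletion: keep only records in which no field is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def label_encode(values: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Map each category to an integer code.

    Codes follow the sorted order of the distinct categories, which is also how
    scikit-learn's LabelEncoder orders its classes_ attribute.
    """
    classes = sorted(set(values))
    code_of = {c: i for i, c in enumerate(classes)}
    return [code_of[v] for v in values], classes


# Hypothetical records; the second one has a missing age and is dropped.
records = [
    {"age": 34, "household": "single-person"},
    {"age": None, "household": "couple"},
    {"age": 58, "household": "three-generation"},
    {"age": 41, "household": "couple with children"},
]
complete = drop_incomplete(records)
codes, classes = label_encode([r["household"] for r in complete])
print(classes)  # ['couple with children', 'single-person', 'three-generation']
print(codes)    # [1, 2, 0]
```

The codes follow alphabetical order rather than any substantive ranking. An odds ratio estimated per one-unit increase in a label-encoded nominal variable, such as household composition, therefore depends on that arbitrary ordering. One-hot (dummy) encoding is the usual alternative when categories have no natural order.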

3.3. Building the Logistic Regression Model and Analysis Process

To build the logistic regression model, the categorical variables were first transformed into numerical representations using scikit-learn’s LabelEncoder to ensure compatibility with the logistic regression algorithm. This transformation enabled the model to treat categorical variables such as gender, marital status, and occupation as numerical features. The data were then preprocessed by removing records with missing values, ensuring a clean dataset for analysis. The dataset was split into a training set and a testing set in a 70:30 ratio. This gave the model a substantial portion of the data for training while retaining enough data to evaluate its performance [18]. The logistic regression model was built using the LogisticRegression class from Python’s scikit-learn (sklearn) library. The model was configured with a maximum iteration limit (max_iter = 1000) to ensure convergence. It was also set to use balanced class weights (class_weight = 'balanced') to handle the class imbalance in the dataset. The training set was used to fit the model, which involved estimating the parameters that maximize the likelihood of the observed data [37]. For evaluation, the performance of the model was measured using several key metrics. The Hosmer–Lemeshow (HL) test p-value was calculated to assess goodness-of-fit, that is, whether the predicted probabilities agree with the observed outcomes. The Brier score was used to evaluate the accuracy of probabilistic predictions, with lower scores indicating better calibration. Finally, the Area Under the receiver operating characteristic Curve (AUC) was computed to measure the model’s discrimination ability, that is, how well it distinguishes between cases of depression and non-depression [49]. Additionally, the model’s performance was validated using bootstrapped confidence intervals for the Brier score and AUC, providing a robust estimate of its predictive power.
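The effect of class_weight = 'balanced' and of a 70:30 split can be made concrete with a small sketch. The weight formula below is the one scikit-learn documents for the 'balanced' option. The function split_indices is a hypothetical, simplified stand-in for a shuffled train/test split and does not reproduce scikit-learn's train_test_split exactly. The 5% prevalence is an illustrative assumption.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Per-class weights n_samples / (n_classes * n_in_class).

    This is the formula scikit-learn documents for class_weight="balanced";
    rare classes receive proportionally larger weights in the likelihood.
    """
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def split_indices(n: int, test_fraction: float = 0.3, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and cut them into train and test parts (70:30 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


# Hypothetical labels with a 5% prevalence of depression.
y = [0] * 95 + [1] * 5
weights = balanced_class_weights(y)
print(weights[1])  # 10.0: each rare (depressed) case counts ten times
train_idx, test_idx = split_indices(len(y))
print(len(train_idx), len(test_idx))  # 70 30

# With scikit-learn installed, the configuration described in the text would be:
# LogisticRegression(max_iter=1000, class_weight="balanced")
```

Up-weighting the rare class in this way tends to push the model's predicted probabilities for depression above the observed prevalence. This is worth keeping in mind when reading calibration metrics such as the Brier score.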
This bootstrap approach ensured a reliable evaluation of the model by generating multiple resamples from the test data to account for variability in the estimates [50]. Scatter plots were also used to explore the relationships between continuous and categorical variables and depression, providing visual insights into potential trends in the data before model building [51]. To interpret the results of the logistic regression model, the odds ratio was used. The odds ratio is obtained by exponentiating a logistic regression coefficient (OR = exp(β)). It expresses the multiplicative change in the odds of the outcome for each one-unit increase in an independent variable. This aids in understanding the impact of predictors on the likelihood of depression [37,52].
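A percentile bootstrap of the kind described above can be sketched as follows, applied here to the Brier score. This is an illustration under stated assumptions rather than the study's procedure. The number of resamples (1000), the seed, and the toy outcomes and probabilities are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[float]], float]


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def bootstrap_ci(y: Sequence[int], p: Sequence[float], metric: Metric,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample test cases with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y[i] for i in idx], [p[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical test-set outcomes and predicted probabilities.
y_test = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
p_test = [0.1, 0.3, 0.7, 0.2, 0.6, 0.4, 0.1, 0.8, 0.5, 0.2]
print(brier_score(y_test, p_test))
print(bootstrap_ci(y_test, p_test, brier_score))
```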

4. Results

4.1. Model Performance

Prior to selecting the final methodological approach, this study performed performance evaluations using Python 3.10 with Jupyter Notebook 7.0.8 to confirm that logistic regression would serve as a reliable model. In health informatics, model performance metrics such as the Hosmer–Lemeshow (HL) p-value, Brier score, and Area Under the Curve (AUC) are essential tools for evaluating predictive models. The HL p-value is widely used to assess the goodness-of-fit in logistic regression models by indicating whether there is a significant difference between observed and predicted outcomes [37]. A small HL p-value suggests poor fit, prompting closer attention to the model’s ability to represent the underlying data accurately. The Brier score measures the accuracy of probabilistic predictions as the mean squared difference between predicted probabilities and actual (0/1) outcomes. Lower Brier scores indicate better calibration, showing that the model’s predictions are closer to true probabilities [53]. The AUC, on the other hand, is a robust metric for measuring the discriminatory power of a model. It quantifies the model's ability to differentiate between classes, such as distinguishing between those with and without depression [54]. These metrics, especially in the context of health informatics, ensure that models are both well calibrated and discriminative, which is crucial for making accurate and reliable predictions in healthcare decision-making [55]. Table 2 compares several predictive models for identifying depression: logistic regression (LR), Random Forest (RF), Gradient Boosting (GB), AdaBoost (AB), Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), Linear Discriminant Analysis (LDA), Extreme Gradient Boosting (XGB), Tabular Data Transformer (TabTran), and Tabular Prior-data Fitted Network (TabPFN).
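The sketch below shows what the HL p-value summarises. It computes the common "deciles of risk" form of the statistic, H = Σ_g (O_g − E_g)² / (N_g · p̄_g · (1 − p̄_g)). Here O_g and E_g are the observed and expected numbers of cases in group g, N_g is the group size, and p̄_g is the group's mean predicted probability. H is compared with a chi-square distribution with g − 2 degrees of freedom. This is a standard-library sketch, not the implementation used in the study. It forms equal-sized groups after sorting by predicted probability. It also uses the closed-form chi-square tail that holds only for an even number of degrees of freedom, so it assumes g − 2 is even (the default of 10 groups gives 8).

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """Return the HL statistic and its p-value (df = groups - 2)."""
    # Sort cases by predicted probability (stable, so ties keep input order).
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed depressed cases in the group
        expected = sum(pi for pi, _ in chunk)  # cases expected under the model
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical predictions that match the observed rate within every group.
y_demo = [0, 1] * 10
p_demo = [0.5] * 20
print(hosmer_lemeshow(y_demo, p_demo))  # (0.0, 1.0)
```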
Each model is evaluated based on its Brier score, which assesses calibration, and its AUC, which measures discriminatory ability. Logistic regression, for instance, yields a Brier score of 0.218, with a 95% confidence interval (CI) of [0.216–0.220], indicating moderate error in predicting probabilities. The AUC of 0.686 [0.669–0.703] has a pairwise interpretation. For a randomly chosen pair of one depressed and one non-depressed individual, the model assigns the higher predicted probability to the depressed individual about 69% of the time. Although its calibration is weaker than that of several other models, LR offers reasonable classification performance, making it appropriate for binary outcome predictions in this research. Other performance measures, such as accuracy and recall, also show that LR performs competitively with the other models. Therefore, this study ultimately chose logistic regression for its interpretability and solid classification performance. Models like Gradient Boosting and Linear Discriminant Analysis provide slightly better calibration and discrimination. However, logistic regression’s transparency and ease of interpretation are crucial for decision-making in health informatics. Logistic regression’s balance between moderate calibration and classification power makes it a practical and reliable model for predicting binary health outcomes, ensuring that predictions are both understandable and actionable in real-world healthcare settings.
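The pairwise reading of the AUC given above can be checked directly. The function below is a hypothetical helper, written with the standard library only. Over all pairs of one depressed and one non-depressed case, it counts how often the depressed case receives the higher predicted probability, with ties counted as half.

```python
from typing import Sequence


def auc_pairwise(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (depressed, non-depressed) pairs in which the depressed
    case receives the higher score; ties count as half a win."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in the number of cases, so it is only suitable for illustration. For a full test set, a rank-based computation or scikit-learn's sklearn.metrics.roc_auc_score is more practical.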

4.2. Scatter Plot Visualization

Scatter plots are effective tools for identifying and visualizing relationships between variables, detecting patterns or anomalies, and communicating findings in an accessible way. Especially before conducting formal statistical tests like logistic regression, scatter plots provide a useful way to obtain a preliminary sense of how variables interact. Thus, this study used scatter plots as an initial visual analysis tool to explore continuous variables in the dataset before proceeding with statistical analyses, such as calculating odds ratios, as illustrated in Figure 1. Age shows a non-linear relationship with depression. The depression rate is highest for younger individuals (approximately 7.5%) and decreases in middle-aged groups (to around 4.5%), before rising again in older individuals, reaching approximately 7.5% in the oldest age groups. This suggests that both younger and older populations may be more vulnerable to depression, with middle-aged individuals having lower rates of depression. In terms of income, a clear inverse relationship is observed. Those in the lowest income group exhibit a depression rate of approximately 7%, while the rate drops to about 3% for those in the highest income group. This indicates that higher income is associated with a lower likelihood of depression, with a marked decrease in depression rates as income increases. For the number of family members, a U-shaped pattern emerges. The depression rate is lowest for individuals with three family members (approximately 4%), while those living alone or in families with more than five members experience higher depression rates, around 6.5%. This suggests that moderate family sizes may provide a protective effect against depression, while living alone or in larger households might increase vulnerability. Household composition also shows a significant relationship with depression. 
Single-generation households, particularly single-person households, have a depression rate of around 6%, while two-generation households, particularly those with a couple and children, have lower rates (about 4.5%). However, multi-generational households exhibit the highest depression rate, around 6.5%, suggesting that complex family structures might be associated with higher levels of stress and, consequently, higher depression rates. House ownership follows a clear trend, in which individuals with no property have the highest depression rate, approximately 6%. Those who own one property show a lower rate of about 5%, and those who own two or more properties experience an even lower rate, around 4%. This indicates that property ownership, particularly owning more than one property, may be protective against depression. Finally, weekly working hours show a complex relationship with depression. Those working very few hours or excessively long hours have higher depression rates (around 7%), while those working moderate hours (around 40 h per week) have the lowest depression rate (about 4%). This suggests that both underemployment and overwork are associated with higher depression rates, whereas moderate working hours appear to be protective against depression. Overall, the analysis of these scatter plots suggests that higher income, education, property ownership, and stable marital relationships are associated with lower depression rates. Conversely, younger and older age groups, single-person or multi-generational households, lower education levels, and extreme working hours are associated with higher depression rates.
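The rates read off Figure 1 are, in effect, proportions of depressed participants within bins of a continuous variable. A minimal sketch of that computation follows. The ages, labels, and 10-year bin width are hypothetical, and the study's actual binning is not specified here. Such binned rates are unadjusted, whereas the odds ratios from the multivariable logistic regression are adjusted for the other predictors, so the two can differ in direction or shape.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rate_by_bin(x: Sequence[float], y: Sequence[int], width: float) -> Dict[float, Tuple[int, float]]:
    """Group x into bins [k*width, (k+1)*width) and return, for each bin start,
    (number of participants, proportion with y == 1)."""
    tally: Dict[float, List[int]] = defaultdict(lambda: [0, 0])
    for xi, yi in zip(x, y):
        start = (xi // width) * width
        tally[start][0] += 1   # participants in the bin
        tally[start][1] += yi  # depressed participants in the bin
    return {start: (n, cases / n) for start, (n, cases) in sorted(tally.items())}


# Hypothetical ages and depression labels, binned by decade.
ages = [22, 27, 45, 48, 51, 73, 78]
depressed = [1, 0, 0, 0, 0, 1, 0]
print(rate_by_bin(ages, depressed, 10))
# {20: (2, 0.5), 40: (2, 0.0), 50: (1, 0.0), 70: (2, 0.5)}
```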

4.3. Logistic Regression Results with Odds Ratio

Logistic regression is particularly useful for binary outcomes, such as success/failure or the presence/absence of disease, and is widely applied in fields such as epidemiology and clinical research [56]. In logistic regression, the odds ratio measures the association between an independent variable and the odds of a specific outcome. It also provides an intuitive summary of how a one-unit increase in a predictor changes the odds of the outcome [57]: an odds ratio greater than one indicates that the predictor increases the odds of the outcome, while an odds ratio less than one indicates a decrease [58]. Odds ratios are commonly used in health research to assess relationships between risk factors and outcomes [59]. Moreover, although logistic regression is linear on the log-odds scale, its logit link makes the relationship between the predictors and the predicted probability non-linear, and the odds ratio serves as a key interpretive tool for these relationships [60]. Given these advantages, this study used odds ratios to interpret the relationships between depression and the independent variables, as presented in Table 3.
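In standard notation (generic symbols, not taken from the paper), the binary logistic model and the odds ratio interpretation described above can be written as:

$$
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \Pr(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)}}
$$

$$
\mathrm{OR}_j = \frac{\operatorname{odds}(x_{ij} + 1)}{\operatorname{odds}(x_{ij})} = e^{\beta_j},
\qquad
\text{change in odds (\%)} = (\mathrm{OR}_j - 1) \times 100
$$

For example, an odds ratio of 0.881 corresponds to (0.881 − 1) × 100 = −11.9%, that is, 11.9% lower odds per one-unit increase in the predictor, holding the other predictors fixed.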
The logistic regression analysis provides insights into the associations between the independent variables and depression. For gender, the odds ratio is 1.622, meaning that females have 62.2% higher odds of depression than males. This result is statistically significant, with a p-value of less than 0.001 and a z-value of 7.251, indicating a strong association between gender and depression. Regarding age, the odds ratio is 0.881, indicating that each one-unit increase in age is associated with 11.9% lower odds of depression. This result is statistically significant, with a p-value of less than 0.001 and a z-value of −4.629, indicating that older age is associated with a lower likelihood of depression. Monthly income is also negatively associated with depression, with an odds ratio of 0.832: each one-unit increase in income is associated with 16.8% lower odds of depression. This finding is highly significant, with a p-value of less than 0.001 and a z-value of −7.969, emphasizing the association between higher income and lower depression risk. The number of family members is likewise associated with a lower likelihood of depression: each additional family member is associated with 12.9% lower odds (odds ratio 0.871). This relationship is statistically significant, with a p-value of 0.001 and a z-value of −3.251. For household composition, the odds ratio is 1.060, indicating 6.0% higher odds of depression per one-unit increase in the household composition variable. This effect is statistically significant, with a p-value of 0.020 and a z-value of 2.323, but modest in size. House ownership status shows a protective association, with an odds ratio of 0.749, corresponding to 25.1% lower odds of depression per one-unit increase in ownership status. This result is highly significant, with a p-value of less than 0.001 and a z-value of −5.736, supporting the importance of home ownership for lower depression risk.
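The percentages above follow from (OR − 1) × 100. Because an odds ratio compares odds rather than probabilities, it approximates a relative risk only when the outcome is uncommon; the correction of Zhang and Yu [56] makes this explicit. The sketch below applies both to odds ratios reported above. The 4.5% baseline risk for males is an assumption chosen for illustration (near the 5.3% depression prevalence in Table 1), not a value estimated in this study.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds per one-unit increase in a predictor."""
    return (odds_ratio - 1.0) * 100.0


def or_to_relative_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Zhang-Yu conversion: RR = OR / ((1 - P0) + P0 * OR).

    baseline_risk (P0) is the outcome risk in the reference group.
    """
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie in [0, 1)")
    return odds_ratio / ((1.0 - baseline_risk) + baseline_risk * odds_ratio)


# Odds ratios as reported in the text.
reported = {
    "gender (female vs male)": 1.622,
    "age": 0.881,
    "monthly income": 0.832,
    "number of family members": 0.871,
    "house ownership status": 0.749,
}
for name, odds_ratio in reported.items():
    print(f"{name}: {percent_change_in_odds(odds_ratio):+.1f}% change in odds")

# Assumed 4.5% baseline risk among males (illustrative, not estimated here).
print(f"approximate RR for gender: {or_to_relative_risk(1.622, 0.045):.3f}")
```

With a baseline risk near 5%, the relative risk (about 1.58 here) stays close to the odds ratio; the gap widens as the outcome becomes more common, which is why "odds" is the precise wording when reporting these estimates.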
For marital status, married individuals show higher odds of depression, with an odds ratio of 1.187, corresponding to 18.7% higher odds. This relationship is highly significant, with a p-value of less than 0.001 and a z-value of 7.918, demonstrating a strong association between marital status and depression. Education level shows a protective association with depression, with an odds ratio of 0.769, corresponding to 23.1% lower odds of depression per one-unit increase in education level. This result is statistically significant, with a p-value of less than 0.001 and a z-value of −9.421, indicating that higher education is an important factor associated with lower depression risk. The father’s education level is not significantly associated with depression (odds ratio 1.019, p-value 0.198, z-value 1.286). In contrast, the mother’s education level has a statistically significant protective association, with an odds ratio of 0.960, corresponding to 4.0% lower odds of depression. This result is significant, with a p-value of 0.004 and a z-value of −2.850, indicating that the mother’s education level plays a modest but meaningful role in lower depression risk. Occupation is not significantly associated with depression (odds ratio 0.991, p-value 0.403, z-value −0.836). Finally, weekly working hours are associated with a slight reduction in depression risk: the odds ratio of 0.994 corresponds to 0.6% lower odds of depression per one-unit increase in weekly working hours. This effect is statistically significant, with a p-value of less than 0.001 and a z-value of −4.533, suggesting that longer working hours are associated with lower odds of depression.
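The reported p-values are consistent with two-sided Wald tests on the z-values. Assuming each z is the Wald statistic β/SE with β = ln(OR) (the usual logistic regression output, though the text does not state it), the sketch below recomputes p from z and backs out an approximate 95% confidence interval for each odds ratio. Results are approximate because the published odds ratios and z-values are rounded.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value of a Wald z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_ci_from_or_and_z(odds_ratio: float, z: float, z_crit: float = 1.959964):
    """Approximate 95% CI for an OR, backing out SE = |ln(OR) / z|.

    Assumes z is the Wald statistic beta / SE with beta = ln(OR).
    """
    log_or = math.log(odds_ratio)
    se = abs(log_or / z)
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)


# (odds ratio, z-value) pairs as reported in the text.
reported = {
    "household composition": (1.060, 2.323),
    "father's education": (1.019, 1.286),
    "mother's education": (0.960, -2.850),
    "occupation": (0.991, -0.836),
}
for name, (odds_ratio, z) in reported.items():
    low, high = wald_ci_from_or_and_z(odds_ratio, z)
    p = two_sided_p_from_z(z)
    print(f"{name}: p = {p:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```

For these four predictors the recomputed p-values round to the values reported above (0.020, 0.198, 0.004, and 0.403), and the intervals for father's education and occupation include 1, matching their non-significance.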

5. Discussion

This study provides a comprehensive analysis of the relationships between various socioeconomic factors and depression in South Korea, using a large dataset collected over five survey periods (2014–2022). By examining 12 independent variables, including gender, age, marital status, income, household composition, education, parents’ education, and weekly working hours, this research sheds light on how different aspects of an individual’s life intersect with mental health. The use of logistic regression and odds ratio analysis allows for an interpretable understanding of these relationships, which is crucial for both policymakers and healthcare providers aiming to implement effective interventions.
The results confirm that women in South Korea are more likely to experience depression than men, a finding that resonates with global patterns but is particularly relevant in South Korea’s cultural context. The traditional Confucian role of women, who are expected to manage both professional responsibilities and familial duties, creates significant stress. Women, especially those balancing work and caregiving, may experience higher levels of emotional exhaustion, which may in turn contribute to depression [61]. This finding suggests that gender-specific mental health services, including workplace flexibility and targeted support for female caregivers, are urgently needed. As expected, the study finds a negative association between age and depression, meaning that younger individuals are more likely to report depressive symptoms. Younger people in South Korea face immense societal pressure related to academic and career success, exacerbated by the country’s hyper-competitive educational system and job market [62]. The findings suggest that mental health services should be particularly targeted toward younger populations, including students and young professionals, to help them navigate these pressures, for example by integrating mental health support into educational institutions and early-career workplaces. In addition, lower monthly income was strongly associated with higher rates of depression, a finding that aligns with numerous studies on the impact of economic insecurity on mental health [63]. However, the scatter plot analysis also indicated that even higher income levels were associated with increased depression, suggesting that financial success brings its own challenges, such as maintaining wealth or managing heightened expectations. This highlights the need for mental health interventions that address both the psychological burdens of poverty and the financial stress experienced by wealthier individuals; mental health policies must be designed to support individuals across the entire socioeconomic spectrum.
Married individuals were found to have a lower likelihood of depression than single or separated individuals, reinforcing the protective role that marital and social support systems play in mental health [64]. In addition, individuals in larger, more complex households reported slightly higher levels of depression. This may reflect the dual pressures of caregiving and financial responsibility that come with supporting extended families. In South Korea, where Confucian values emphasize filial piety and caregiving obligations, the mental toll on individuals caring for both older and younger generations can be significant. Social policies should therefore support multi-generational households, including through financial aid and mental health services specifically designed for caregivers. Similarly, a greater number of family members was associated with higher depression levels, likely reflecting the same caregiving and financial burdens of supporting large families; social policies should consider more substantial support for larger households, particularly those with dependents. On the other hand, homeownership is generally seen as a marker of financial stability, and the study confirms that individuals who own property are less likely to experience depression than those without property. However, the stress of maintaining property ownership, especially in South Korea’s highly competitive real estate market, might explain why individuals with multiple properties still report depression. This finding indicates that while increasing access to homeownership can improve mental health, policies should also address the stresses of maintaining and financing real estate, especially in urban areas where housing costs are high.
Higher education levels were associated with increased depression, contradicting the traditional view that education serves as a protective factor [65]. This may be attributable to the immense pressures tied to academic and professional success in South Korea, where higher education is often linked to high expectations for professional achievement, leading to increased stress and burnout. Educational systems and workplaces should implement mental health support to help individuals manage the pressures associated with high-level academic and professional responsibilities. The education levels of both parents had a less direct but still notable influence on depression. Although less strongly associated with depression than individuals’ own education, parents’ education levels, especially mothers’, reflect broader family expectations and socioeconomic background, which can place added pressure on individuals to succeed academically and professionally. This suggests that interventions targeting family dynamics and expectations could help alleviate mental health problems, especially in households with higher educational attainment. Depression levels varied across occupational categories, with service workers and professionals reporting higher levels of depression than manual laborers. This reflects the demanding nature of white-collar jobs in South Korea, where individuals face long hours and high expectations. Policies aimed at improving workplace mental health, such as reducing excessive working hours and promoting a healthier work–life balance, are essential, and employers should offer mental health resources tailored to the specific challenges faced by different occupational groups. As expected, longer working hours were strongly associated with higher levels of depression. South Korea’s reputation for long working hours is well documented [27], and this study reinforces the negative mental health impact of excessive work. Policies that promote work–life balance, such as limiting overtime and encouraging flexible working arrangements, are crucial to improving mental health outcomes in the workforce.
Overall, this research contributes to both policy and academic perspectives. From a policy perspective, the findings highlight the need for targeted interventions that address both psychological and socioeconomic factors. The associations of income and education with depression suggest that financial support programs and mental health services should be expanded, particularly for individuals experiencing the dual pressures of work and caregiving. Policies aimed at promoting work–life balance, reducing the demands of the educational system, and providing mental health resources in the workplace are essential. From a health informatics perspective, the application of logistic regression and odds ratio analysis demonstrates the potential of predictive analytics to inform mental health interventions. Logistic regression, which produces interpretable results, is a valuable tool for clinicians and policymakers. By incorporating predictive models into electronic health record (EHR) systems, healthcare providers could identify at-risk individuals and offer early interventions, improving mental health outcomes. Odds ratios further allow an accessible interpretation of how socioeconomic factors contribute to depression, facilitating actionable insights in clinical settings [66].
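As a sketch of how such a model might be embedded in an EHR workflow, the functions below convert a record into a predicted probability with the inverse logit and flag it against a threshold. The slopes are the logarithms of three odds ratios reported in Section 4.3, but the intercept, the threshold, the predictor coding, and the field names are hypothetical. A subset of coefficients from a multivariable model should not be used for real screening.

```python
import math

# Slopes taken as ln(OR) for three odds ratios reported in Section 4.3.
# Predictor coding below is an assumption for illustration.
COEFFICIENTS = {
    "female": math.log(1.622),                  # 1 = female, 0 = male
    "income_monthly": math.log(0.832),          # income code (assumed 1-5)
    "house_ownership_status": math.log(0.749),  # ownership code (assumed 1-3)
}
INTERCEPT = -2.0       # HYPOTHETICAL: no intercept is reported in the text
FLAG_THRESHOLD = 0.10  # HYPOTHETICAL screening cut-off on predicted risk


def predicted_risk(record: dict) -> float:
    """Inverse logit of the linear predictor: p = 1 / (1 + exp(-eta))."""
    eta = INTERCEPT + sum(beta * record[name] for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-eta))


def flag_for_follow_up(record: dict) -> bool:
    """Flag a record whose predicted risk reaches the hypothetical threshold."""
    return predicted_risk(record) >= FLAG_THRESHOLD


# Hypothetical EHR-derived records (field names are illustrative).
higher_risk = {"female": 1, "income_monthly": 1, "house_ownership_status": 1}
lower_risk = {"female": 0, "income_monthly": 5, "house_ownership_status": 3}
for label, rec in [("higher_risk", higher_risk), ("lower_risk", lower_risk)]:
    print(label, round(predicted_risk(rec), 3), flag_for_follow_up(rec))
```

Keeping the model as explicit log-odds coefficients means each flag can be traced back to the odds ratios behind it, which is the interpretability advantage the paragraph above attributes to logistic regression.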
While this study provides significant insights, several limitations must be acknowledged. First, the use of repeated cross-sectional data limits the ability to draw causal inferences, as individual-level changes over time cannot be tracked; future studies should employ longitudinal data to establish the causal pathways between socioeconomic factors and depression and to track changes in depression over time. Second, this study is grounded in the unique cultural and societal context of South Korea, characterized by Confucian values and intense educational and work-related pressures, so the findings may not generalize to other cultural contexts. Replicating this study in other East Asian countries that share Confucian values and similar societal pressures, such as Japan and China, could provide a broader understanding of the relationship between these variables and depression and refine policy recommendations by accounting for regional sociocultural similarities and differences. Third, the reliance on self-reported data for both socioeconomic factors and depression introduces the potential for bias; future studies should incorporate more objective measures, such as clinical assessments of depression and verified socioeconomic data, to strengthen the validity of the findings. Future research could also incorporate additional factors, such as genetic predispositions, access to mental health resources, and urban–rural disparities, to provide a more comprehensive understanding of depression, and qualitative data, such as interviews or focus groups, could add contextual richness by uncovering the nuanced experiences of individuals facing socioeconomic pressures.
Finally, while the benchmarking results show comparable AUC values for some methods, such as GB and LDA, differences in their Brier scores suggest variability in calibration performance. Future research should explore how the conclusions might differ if methods with lower (that is, better) Brier scores were prioritized, which could yield insights for improving clinical or policy-related decision-making.
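The distinction between discrimination and calibration can be illustrated with a small example. AUC depends only on how the predictions rank cases, so any strictly increasing transformation of a model's scores leaves it unchanged, whereas the Brier score (the mean squared error of the predicted probabilities; lower is better) penalizes miscalibration. The outcomes and probabilities below are hypothetical.

```python
from itertools import product


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = depression) and predicted probabilities.
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
model_a = [0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.35, 0.30, 0.12, 0.40]
# Strictly increasing transform: identical ranking, inflated probabilities.
model_b = [p ** 0.3 for p in model_a]

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: AUC = {roc_auc(y, probs):.4f}, Brier = {brier_score(y, probs):.4f}")
```

Both models reach an AUC of 0.9375, but model_b's Brier score is nearly double model_a's because its probabilities are systematically too high, so two methods with comparable AUC can still differ in calibration.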

6. Conclusions

This study provides a detailed analysis of the socioeconomic determinants of depression in South Korea, utilizing a robust dataset and applying logistic regression to generate interpretable insights. The results underscore the importance of addressing both psychological and socioeconomic factors in mental health interventions. Future research should employ longitudinal designs to capture individual changes over time and replicate these findings in different cultural contexts. By leveraging predictive analytics and focusing on the socioeconomic dimensions of mental health, this study offers a comprehensive framework for improving mental health outcomes through targeted policy interventions and health informatics tools.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available in the Korea Disease Control and Prevention Agency (KDCA) repository (https://knhanes.kdca.go.kr/knhanes/eng/index.do, accessed on 13 November 2024).

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AB: AdaBoost
DT: Decision Tree
GB: Gradient Boosting
KDCA: Korea Disease Control and Prevention Agency
KNHANES: Korean National Health and Nutrition Examination Survey
KNN: K-Nearest Neighbors
LDA: Linear Discriminant Analysis
LR: Logistic Regression
NB: Naïve Bayes
PHQ-9: Patient Health Questionnaire-9
RF: Random Forest
ROC: Receiver Operating Characteristic
SES: Socioeconomic Status
SVM: Support Vector Machine
TabPFN: Tabular Prior-Data Fitted Network
TabTran: Tabular Transformer
XGBoost: Extreme Gradient Boosting

References

  1. WHO. Depressive Disorder (Depression). Available online: https://www.who.int/news-room/fact-sheets/detail/depression (accessed on 22 January 2025).
  2. Lee, K.-m.; Park, M.-b.; Kim, E.-a.; Lim, S.-c.; Kang, S.-h.; Kim, S.-h.; Kim, E.-s.; Kim, J.-h. A Study on Spatial Autocorrelation according to the Geographical Distribution of Major Health Indicators: Focusing on Regional Units in Chungcheong Province. Public Health Wkly. Rep. 2024, 17, 644–645.
  3. Kumar, P.M.; Kumar, V.U.; Meenakshi, S.; Bahekar, T.N.; Narapaka, P.K.; Dhingra, S.; Murti, K. Epidemiology and risk factors of mental disorders. In A Review on Diverse Neurological Disorders; Elsevier: Amsterdam, The Netherlands, 2024; pp. 3–12.
  4. Moitra, M.; Owens, S.; Hailemariam, M.; Wilson, K.S.; Mensa-Kwao, A.; Gonese, G.; Kamamia, C.K.; White, B.; Young, D.M.; Collins, P.Y. Global mental health: Where we are and where we are going. Curr. Psychiatry Rep. 2023, 25, 301–311.
  5. Cooper, R. Diagnosing the Diagnostic and Statistical Manual of Mental Disorders; Routledge: London, UK, 2018.
  6. Adler, N.E.; Boyce, T.; Chesney, M.A.; Cohen, S.; Folkman, S.; Kahn, R.L.; Syme, S.L. Socioeconomic status and health: The challenge of the gradient. Am. Psychol. 1994, 49, 15–24.
  7. Lorant, V.; Deliège, D.; Eaton, W.; Robert, A.; Philippot, P.; Ansseau, M. Socioeconomic inequalities in depression: A meta-analysis. Am. J. Epidemiol. 2003, 157, 98–112.
  8. Hudson, C.G. Socioeconomic status and mental illness: Tests of the social causation and selection hypotheses. Am. J. Orthopsychiatry 2005, 75, 3–18.
  9. Wang, J.L.; Schmitz, N.; Dewa, C. Socioeconomic status and the risk of major depression: The Canadian National Population Health Survey. J. Epidemiol. Community Health 2010, 64, 447–452.
  10. Miech, R.A.; Caspi, A.; Moffitt, T.E.; Wright, B.R.E.; Silva, P.A. Low socioeconomic status and mental disorders: A longitudinal study of selection and causation during young adulthood. Am. J. Sociol. 1999, 104, 1096–1131.
  11. Reiss, F. Socioeconomic inequalities and mental health problems in children and adolescents: A systematic review. Soc. Sci. Med. 2013, 90, 24–31.
  12. Siegrist, J.; Marmot, M. Health inequalities and the psychosocial environment—Two scientific challenges. Soc. Sci. Med. 2004, 58, 1463–1473.
  13. Stansfeld, S.; Candy, B. Psychosocial work environment and mental health—A meta-analytic review. Scand. J. Work Environ. Health 2006, 32, 443–462.
  14. Reme, B.-A.; Wörn, J.; Skirbekk, V. Longitudinal evidence on the development of socioeconomic inequalities in mental health due to the COVID-19 pandemic in Norway. Sci. Rep. 2022, 12, 3837.
  15. Kong, L.; Zhang, H. Latent profile analysis of depression in non-hospitalized elderly patients with hypertension and its influencing factors. J. Affect. Disord. 2023, 341, 67–76.
  16. Pangarkar, S.C.; Paigude, S.; Banait, S.S.; Ajani, S.N.; Mange, P.; Bramhe, M.V. Occupational Stress and Mental Health: A Longitudinal Study in High-Stress Professions. South East. Eur. J. Public Health 2023, XXI, 68–80.
  17. Zheng, G.; Lyu, X.; Pan, L.; Chen, A. The role conflict-burnout-depression link among Chinese female health care and social service providers: The moderating effect of marriage and motherhood. BMC Public Health 2022, 22, 230.
  18. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009.
  19. Lemstra, M.; Neudorf, C.; D’Arcy, C.; Kunst, A.; Warren, L.M.; Bennett, N.R. A systematic review of depressed mood and anxiety by SES in youth aged 10–15 years. Can. J. Public Health 2008, 99, 125–129.
  20. Parnia, A.; Siddiqi, A. Socioeconomic disparities in smoking are partially explained by chronic financial stress: Marginal structural model of older US adults. J. Epidemiol. Community Health 2020, 74, 248–254.
  21. Braveman, P.A.; Cubbin, C.; Egerter, S.; Williams, D.R.; Pamuk, E. Socioeconomic disparities in health in the United States: What the patterns tell us. Am. J. Public Health 2010, 100, S186–S196.
  22. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  23. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
  24. Patel, V.; Burns, J.K.; Dhingra, M.; Tarver, L.; Kohrt, B.A.; Lund, C. Income inequality and depression: A systematic review and meta-analysis of the association and a scoping review of mechanisms. World Psychiatry 2018, 17, 76–89.
  25. Kim, S.-Y.; Shin, Y.-C.; Oh, K.-S.; Shin, D.-W.; Lim, W.-J.; Cho, S.J.; Jeon, S.-W. Gender and age differences in the association between work stress and incident depressive symptoms among Korean employees: A cohort study. Int. Arch. Occup. Environ. Health 2020, 93, 457–467.
  26. Oh, B.C.; Yeon, J.-Y.; Lee, H.-S.; Lee, D.W.; Park, E.-C. Correlation between private education costs and parental depression in South Korea. BMC Public Health 2020, 20, 972.
  27. Lee, Y. Norms about childcare, working hours, and fathers’ uptake of parental leave in South Korea. Community Work. Fam. 2023, 26, 466–491.
  28. Kweon, S.; Kim, Y.; Jang, M.-j.; Kim, Y.; Kim, K.; Choi, S.; Chun, C.; Khang, Y.-H.; Oh, K. Data resource profile: The Korea national health and nutrition examination survey (KNHANES). Int. J. Epidemiol. 2014, 43, 69–77.
  29. Bambra, C. Placing intersectional inequalities in health. Health Place 2022, 75, 102761.
  30. Mohammadi, S.; Seyedmirzaei, H.; Salehi, M.A.; Jahanshahi, A.; Zakavi, S.S.; Dehghani Firouzabadi, F.; Yousem, D.M. Brain-based sex differences in depression: A systematic review of neuroimaging studies. Brain Imaging Behav. 2023, 17, 541–569.
  31. Bader, M.; Abdelwanis, M.; Maalouf, M.; Jelinek, H.F. Detecting depression severity using weighted random forest and oxidative stress biomarkers. Sci. Rep. 2024, 14, 16328.
  32. Zhao, T.; Tlachac, M. Bayesian Optimization with Tree Ensembles to Improve Depression Screening on Textual Datasets. IEEE Trans. Affect. Comput. 2024.
  33. Flores, R.; Tlachac, M.; Shrestha, A.; Rundensteiner, E.A. WavFace: A Multimodal Transformer-based Model for Depression Screening. IEEE J. Biomed. Health Inform. 2025.
  34. Chen, J.; Yao, H.; Zhao, S.; Zhang, Y. Trusted commonsense knowledge enhanced depression detection based on three-way decision. Expert Syst. Appl. 2025, 263, 125671.
  35. Thirupathi, L.; Kaashipaka, V.; Dhanaraju, M.; Katakam, V. AI and IoT in Mental Health Care: From Digital Diagnostics to Personalized, Continuous Support. In Intelligent Systems and IoT Applications in Clinical Health; IGI Global: Hershey, PA, USA, 2025; pp. 271–294.
  36. Liu, C.; Cheng, S.; Shi, M.; Shah, A.; Bai, W.; Arcucci, R. Imitate: Clinical prior guided hierarchical vision-language pre-training. IEEE Trans. Med. Imaging 2024, 44, 519–529.
  37. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  38. Peng, C.-Y.J.; Lee, K.L.; Ingersoll, G.M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 2002, 96, 3–14.
  39. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
  41. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: Berlin/Heidelberg, Germany, 2013.
  42. Variš, D.; Bojar, O. Sequence length is a domain: Length-based overfitting in transformer models. arXiv 2021, arXiv:2109.07276.
  43. Park, Y.; Ho, J.C. Tackling overfitting in boosting for noisy healthcare data. IEEE Trans. Knowl. Data Eng. 2019, 33, 2995–3006.
  44. King, G.; Zeng, L. Logistic regression in rare events data. Political Anal. 2001, 9, 137–163.
  45. Kroenke, K.; Spitzer, R.L.; Williams, J.B. The PHQ-9: Validity of a brief depression severity measure. J. Gen. Intern. Med. 2001, 16, 606–613.
  46. Liu, W.; Li, W.; Wang, Y.; Yin, C.; Xiao, C.; Hu, J.; Huang, L.; Huang, F.; Liu, H.; Chen, Y. Comparison of the EPDS and PHQ-9 in the assessment of depression among pregnant women: Similarities and differences. J. Affect. Disord. 2024, 351, 774–781.
  47. Manea, L.; Gilbody, S.; McMillan, D. A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. Gen. Hosp. Psychiatry 2015, 37, 67–75.
  48. Costantini, L.; Pasquarella, C.; Odone, A.; Colucci, M.E.; Costanza, A.; Serafini, G.; Aguglia, A.; Murri, M.B.; Brakoulias, V.; Amore, M. Screening for depression in primary care with Patient Health Questionnaire-9 (PHQ-9): A systematic review. J. Affect. Disord. 2021, 279, 473–483.
  49. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  50. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap. In Monographs on Statistics and Applied Probability; Chapman & Hall: New York, NY, USA; London, UK, 1993; Volume 57, pp. 1–436.
  51. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, USA, 1977.
  52. Szumilas, M. Explaining odds ratios. J. Can. Acad. Child Adolesc. Psychiatry 2010, 19, 227–229.
  53. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
  54. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
  55. Jiang, F.; Jiang, Y.; Zhi, H.; Dong, Y.; Li, H.; Ma, S.; Wang, Y.; Dong, Q.; Shen, H.; Wang, Y. Artificial intelligence in healthcare: Past, present and future. Stroke Vasc. Neurol. 2017, 2, 230–243.
  56. Zhang, J.; Yu, K.F. What’s the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998, 280, 1690–1691.
  57. Greenland, S. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. Am. J. Epidemiol. 2004, 160, 301–305.
  58. Norton, E.C.; Wang, H.; Ai, C. Computing interaction effects and standard errors in logit and probit models. Stata J. 2004, 4, 154–167.
  59. Persoskie, A.; Ferrer, R.A. A most odd ratio: Interpreting and describing odds ratios. Am. J. Prev. Med. 2017, 52, 224–228.
  60. Peterson, B.; Harrell, F.E., Jr. Partial proportional odds models for ordinal response variables. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 1990, 39, 205–217.
  61. Albert, P.R. Why is depression more prevalent in women? J. Psychiatry Neurosci. 2015, 40, 219–221.
  62. Byun, S.-y.; Schofer, E.; Kim, K.-k. Revisiting the role of cultural capital in East Asian educational systems: The case of South Korea. Sociol. Educ. 2012, 85, 219–239.
  63. Witteveen, D.; Velthorst, E. Economic hardship and mental health complaints during COVID-19. Proc. Natl. Acad. Sci. USA 2020, 117, 27277–27284.
  64. McCann, T.V.; Bamberg, J.; McCann, F. Family carers’ experience of caring for an older parent with severe and persistent mental illness. Int. J. Ment. Health Nurs. 2015, 24, 203–212.
  65. Deary, I.J.; Whiteman, M.C.; Starr, J.M.; Whalley, L.J.; Fox, H.C. The impact of childhood intelligence on later life: Following up the Scottish mental surveys of 1932 and 1947. J. Personal. Soc. Psychol. 2004, 86, 130–147.
  66. Bland, J.M.; Altman, D.G. The odds ratio. BMJ 2000, 320, 1468.
Figure 1. Scatter plots showing the observed risk of depression across the independent variables.
Table 1. Descriptive statistics.
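| Variable | Value | Description | Count | % |
|---|---|---|---|---|
| year | 2014 | Year 2014 | 4247 | 17.5 |
| | 2016 | Year 2016 | 5040 | 20.7 |
| | 2018 | Year 2018 | 5503 | 22.6 |
| | 2020 | Year 2020 | 4942 | 20.3 |
| | 2022 | Year 2022 | 4576 | 18.8 |
| gender | 1 | Male | 10,584 | 43.5 |
| | 2 | Female | 13,724 | 56.5 |
| age | 1 | 19 years old | 217 | 0.9 |
| | 2 | 20~29 years old | 2813 | 11.6 |
| | 3 | 30~39 years old | 3791 | 15.6 |
| | 4 | 40~49 years old | 4341 | 17.9 |
| | 5 | 50~59 years old | 4492 | 18.5 |
| | 6 | 60~69 years old | 4488 | 18.5 |
| | 7 | 70~79 years old | 3211 | 13.2 |
| | 8 | 80 years old or older | 955 | 3.9 |
| income_monthly | 1 | 91.92 or less Million KRW | 4722 | 19.4 |
| | 2 | 91.93~188.08 Million KRW | 4866 | 20.0 |
| | 3 | 188.09~285.00 Million KRW | 4972 | 20.5 |
| | 4 | 285.00~423.39 Million KRW | 4862 | 20.0 |
| | 5 | 423.39 or more Million KRW | 4886 | 20.1 |
| number_of_family_member | 1 | 1 | 3016 | 12.4 |
| | 2 | 2 | 7473 | 30.7 |
| | 3 | 3 | 6130 | 25.2 |
| | 4 | 4 | 5607 | 23.1 |
| | 5 | 5 | 1564 | 6.4 |
| | 6 | 6 or more | 518 | 2.1 |
| household_composition | 1 | Single-generation household: single-person | 3016 | 12.4 |
| | 2 | Single-generation household: couple | 5763 | 23.7 |
| | 3 | Single-generation household: others | 354 | 1.5 |
| | 4 | Two-generation household: couple and unmarried children | 10,029 | 41.3 |
| | 5 | Two-generation household: single parent and unmarried children | 2077 | 8.5 |
| | 6 | Two-generation household: others | 1182 | 4.9 |
| | 7 | Three-generation or more household | 1887 | 7.8 |
| house_ownership_status | 1 | No property | 7502 | 30.9 |
| | 2 | Owns one property | 13,406 | 55.2 |
| | 3 | Owns two or more properties | 3400 | 14.0 |
| marriage_status | 1 | Married, living together | 16,678 | 68.6 |
| | 2 | Married, separated | 153 | 0.6 |
| | 3 | Widowed | 2002 | 8.2 |
| | 4 | Divorced | 1111 | 4.6 |
| | 88 | Not married | 4364 | 18.0 |
| education | 1 | Traditional Korean school | 34 | 0.1 |
| | 2 | No formal education | 711 | 2.9 |
| | 3 | Elementary school | 3375 | 13.9 |
| | 4 | Middle school | 2517 | 10.4 |
| | 5 | High school | 6657 | 27.4 |
| | 6 | 2-year/3-year college | 3380 | 13.9 |
| | 7 | 4-year university | 6302 | 25.9 |
| | 8 | Graduate school | 1332 | 5.5 |
| education_father | 1 | Traditional Korean school | 1640 | 6.7 |
| | 2 | No formal education | 736 | 3.0 |
| | 3 | Elementary school | 5096 | 21.0 |
| | 4 | Middle school | 2831 | 11.6 |
| | 5 | High school | 4631 | 19.1 |
| | 6 | 2-year/3-year college | 586 | 2.4 |
| | 7 | 4-year university | 2174 | 8.9 |
| | 8 | Graduate school | 509 | 2.1 |
| | 88 | Not applicable | 3910 | 16.1 |
| | 99 | Unknown | 2195 | 9.0 |
| education_mother | 1 | Traditional Korean school | 3401 | 14.0 |
| | 2 | No formal education | 338 | 1.4 |
| | 3 | Elementary school | 6243 | 25.7 |
| | 4 | Middle school | 2732 | 11.2 |
| | 5 | High school | 4261 | 17.5 |
| | 6 | 2-year/3-year college | 459 | 1.9 |
| | 7 | 4-year university | 1086 | 4.5 |
| | 8 | Graduate school | 180 | 0.7 |
| | 88 | Not applicable | 3910 | 16.1 |
| | 99 | Unknown | 1698 | 7.0 |
| occupation | 1 | Manager | 376 | 1.5 |
| | 2 | Professional and related worker | 4195 | 17.3 |
| | 3 | Office worker | 4214 | 17.3 |
| | 4 | Service worker | 2447 | 10.1 |
| | 5 | Sales worker | 2642 | 10.9 |
| | 6 | Skilled agricultural, forestry, and fishery worker | 1628 | 6.7 |
| | 7 | Craft and related trades worker | 2015 | 8.3 |
| | 8 | Plant, machine operator, and assembler | 1885 | 7.8 |
| | 9 | Elementary occupation worker | 2130 | 8.8 |
| | 10 | Soldier | 104 | 0.4 |
| | 88 | Not working | 2672 | 11.0 |
| weekly_working_hour | 1 | 9 h or less | 8755 | 36.0 |
| | 2 | 10~19 h | 1693 | 7.0 |
| | 3 | 20~29 h | 1872 | 7.7 |
| | 4 | 30~39 h | 4981 | 20.5 |
| | 5 | 40~49 h | 3664 | 15.1 |
| | 6 | 50~59 h | 1960 | 8.1 |
| | 7 | 60~69 h | 823 | 3.4 |
| | 8 | 70~79 h | 306 | 1.3 |
| | 9 | 80~89 h | 173 | 0.7 |
| | 10 | 90~99 h | 53 | 0.2 |
| | 11 | 100~109 h | 17 | 0.1 |
| | 12 | 110~119 h | 9 | 0.0 |
| | 13 | 120 h or more | 2 | 0.0 |
| depression | 0~9 | No depression | 22,940 | 94.7 |
| | 10~27 | Depression | 1286 | 5.3 |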
Note: p-values were obtained with chi-squared tests for the categorical variables (gender, income_monthly, household_composition, house_ownership_status, marriage_status, education, education_father, education_mother, occupation), which assess whether the distribution of each variable differs between individuals with and without depression, and with independent t-tests comparing group means for the continuous variables (number_of_family_member, age, weekly_working_hour).
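The two tests named in the note above can be sketched with the Python standard library alone. This is an illustrative sketch, not the author's code: the note does not say whether a continuity correction or a pooled-variance t-test was used, so the sketch assumes a plain Pearson chi-squared statistic (no Yates correction) and Student's pooled-variance t statistic, with the t-test p-value taken from a large-sample normal approximation. It also shows the conventional PHQ-9 cut-off of 10 used to define depression status. The helper names (`is_depressed`, `chi2_sf`, `chi2_independence`, `t_test_ind`) and all example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def is_depressed(phq9_total):
    """Dichotomize a PHQ-9 total score (0-27) at the conventional cut-off of 10."""
    if not 0 <= phq9_total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    return phq9_total >= 10


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2) with a
    power series when x/2 < df/2 + 1 and a continued fraction otherwise.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Modified Lentz continued fraction for Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


def t_test_ind(x, y):
    """Student's pooled-variance t statistic; p-value from a large-sample normal approximation."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    # With tens of thousands of observations the t distribution is practically normal.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, nx + ny - 2, p


if __name__ == "__main__":
    print([is_depressed(score) for score in (4, 9, 10, 21)])
    # Hypothetical 2 x 2 table: rows = gender codes 1 and 2, columns = (no depression, depression).
    stat, dof, p = chi2_independence([[10, 20], [30, 40]])
    print(f"chi2 = {stat:.3f}, df = {dof}, p = {p:.3f}")
    # Hypothetical values of a continuous variable in the two groups.
    t, dof_t, p_t = t_test_ind([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(f"t = {t:.3f}, df = {dof_t}, p = {p_t:.3f}")
```

For the real data, `scipy.stats.chi2_contingency` and `scipy.stats.ttest_ind` would be the usual choices; note that the former applies the Yates correction to 2 x 2 tables by default, which this sketch does not.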
Table 2. Model performance comparison results.
| Model | HL χ² | HL p | Brier [95% CI] | AUC [95% CI] | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|---|---|---|---|
| LR | 5820.419 | 0.000 | 0.218 [0.216–0.220] | 0.686 [0.669–0.703] | 0.943 | 0.632 | 0.751 | 0.686 |
| RF | 210.749 | 0.000 | 0.052 [0.050–0.054] | 0.644 [0.626–0.662] | 0.944 | 0.568 | 0.745 | 0.645 |
| GB | 303.013 | 0.000 | 0.049 [0.047–0.051] | 0.686 [0.668–0.702] | 0.946 | 0.612 | 0.625 | 0.618 |
| AB | 6538.749 | 0.000 | 0.235 [0.234–0.235] | 0.683 [0.667–0.697] | 0.947 | 0.605 | 0.737 | 0.665 |
| DT | 85.209 | 0.000 | 0.107 [0.102–0.111] | 0.526 [0.514–0.537] | 0.893 | 0.592 | 0.749 | 0.661 |
| KNN | 79.595 | 0.000 | 0.057 [0.054–0.060] | 0.553 [0.539–0.569] | 0.943 | 0.611 | 0.631 | 0.621 |
| SVM | 22.707 | 0.004 | 0.050 [0.047–0.052] | 0.537 [0.517–0.556] | 0.947 | 0.632 | 0.76 | 0.69 |
| NB | 376.270 | 0.000 | 0.054 [0.052–0.056] | 0.672 [0.654–0.688] | 0.938 | 0.621 | 0.685 | 0.651 |
| LDA | 460.815 | 0.000 | 0.048 [0.046–0.050] | 0.684 [0.666–0.701] | 0.947 | 0.587 | 0.74 | 0.655 |
| XGB | 190.165 | 0.000 | 0.054 [0.052–0.057] | 0.638 [0.623–0.655] | 0.940 | 0.614 | 0.665 | 0.638 |
| TabTran | 147.099 | 0.000 | 0.131 [0.128–0.134] | 0.662 [0.660–0.664] | 0.945 | 0.625 | 0.755 | 0.684 |
| TabPFN | 155.984 | 0.000 | 0.154 [0.152–0.156] | 0.614 [0.610–0.618] | 0.948 | 0.603 | 0.735 | 0.662 |
LR: Logistic Regression; RF: Random Forest; GB: Gradient Boosting; AB: AdaBoost; DT: Decision Tree; KNN: K-Nearest Neighbor; SVM: Support Vector Machine; NB: Naïve Bayes; LDA: Linear Discriminant Analysis; XGB: Extreme Gradient Boosting; TabTran: Tabular Transformer; TabPFN: Tabular Prior-data Fitted Network; HL: Hosmer–Lemeshow; AUC: area under the receiver operating characteristic curve; CI: confidence interval.
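The metrics in Table 2 can all be computed from observed outcomes and predicted probabilities. The sketch below is a hypothetical standard-library illustration, not the pipeline used in the study. It assumes the Hosmer–Lemeshow statistic is formed over equal-size groups of sorted predicted risk (ten by default, giving groups - 2 degrees of freedom, whose chi-squared upper tail would supply the p-value), computes AUC through the rank (Mann–Whitney) formulation, and scores precision, recall, and F1 for the positive (depression) class at a 0.5 threshold. The table does not state the averaging scheme or decision threshold, so those choices, the function names, and the example data are assumptions.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probability and the observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc_rank(y, p):
    """ROC AUC: probability that a random positive case outranks a random negative case.

    Ties count as one half. This O(n_pos * n_neg) loop is for illustration only.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y, p, threshold=0.5):
    """Positive-class precision, recall, and F1 after thresholding the probabilities."""
    yhat = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-size bins of sorted predicted risk.

    Returns (statistic, degrees of freedom); the p-value is the upper tail of a
    chi-squared distribution with groups - 2 degrees of freedom.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        if n_g == 0 or expected <= 0 or expected >= n_g:
            continue  # skip empty or degenerate bins to avoid division by zero
        # (O - E)^2 / (E * (1 - E / n_g)) combines the event and non-event cells of the bin.
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, groups - 2


if __name__ == "__main__":
    # Hypothetical outcomes (1 = depression) and predicted probabilities.
    y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.05, 0.6, 0.15]
    print("Brier:", brier_score(y, p))
    print("AUC:", auc_rank(y, p))
    print("Precision, recall, F1:", precision_recall_f1(y, p))
    print("HL statistic, df:", hosmer_lemeshow(y, p, groups=5))
```

Because the Brier score and the Hosmer–Lemeshow statistic measure calibration while AUC measures ranking, a model can score well on one and poorly on the other, which is one way to read the spread between columns in Table 2.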
Table 3. Logistic regression results with odds ratio.
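| Variable | Odds Ratio | Standard Error (log-odds) | z-Value | p-Value |
|---|---|---|---|---|
| const | 0.496 | 0.284 | −2.474 | 0.013 |
| gender | 1.622 | 0.067 | 7.251 | 0.000 |
| age | 0.881 | 0.027 | −4.629 | 0.000 |
| income_monthly | 0.832 | 0.023 | −7.969 | 0.000 |
| number_of_family_member | 0.871 | 0.042 | −3.251 | 0.001 |
| household_composition | 1.060 | 0.025 | 2.323 | 0.020 |
| house_ownership_status | 0.749 | 0.050 | −5.736 | 0.000 |
| marriage_status | 1.187 | 0.022 | 7.918 | 0.000 |
| education | 0.769 | 0.028 | −9.421 | 0.000 |
| education_father | 1.019 | 0.015 | 1.286 | 0.198 |
| education_mother | 0.960 | 0.014 | −2.850 | 0.004 |
| occupation | 0.991 | 0.011 | −0.836 | 0.403 |
| weekly_working_hour | 0.994 | 0.001 | −4.533 | 0.000 |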
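The columns of Table 3 are linked by the usual Wald relations for a logistic regression coefficient β: OR = exp(β), z = β / SE(β), and a 95% confidence interval of exp(β ± 1.96·SE(β)). Dividing the natural log of each reported odds ratio by its standard error approximately reproduces the reported z-values (for gender, ln 1.622 / 0.067 ≈ 7.22 against the reported 7.251, the gap reflecting rounding), which suggests the standard errors are on the log-odds scale; the sketch below assumes this. The function name is hypothetical, and the two-sided p-value uses the standard normal distribution.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile, for a 95% CI


def wald_from_odds_ratio(odds_ratio, se_log_odds):
    """Recover beta, z, a two-sided p-value, and a 95% CI from an odds ratio and its log-odds SE."""
    beta = math.log(odds_ratio)
    z = beta / se_log_odds
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    ci = (math.exp(beta - Z_975 * se_log_odds), math.exp(beta + Z_975 * se_log_odds))
    return beta, z, p, ci


if __name__ == "__main__":
    # (odds ratio, standard error) pairs copied from Table 3.
    rows = {"gender": (1.622, 0.067), "income_monthly": (0.832, 0.023), "education": (0.769, 0.028)}
    for name, (odds_ratio, se) in rows.items():
        beta, z, p, (lo, hi) = wald_from_odds_ratio(odds_ratio, se)
        print(f"{name}: beta={beta:.3f}, z={z:.2f}, p={p:.2g}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Because each variable appears in Table 3 as a single term, its odds ratio applies per one-unit step of the variable's code. Assuming a linear effect on the log-odds scale, the income_monthly odds ratio of 0.832 per quintile step compounds to roughly 0.832^4 ≈ 0.48 between the lowest (code 1) and highest (code 5) income quintiles.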
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, C. Exploring Factors Influencing Depression: Socioeconomic Perspectives Using Machine Learning Analytics. Electronics 2025, 14, 487. https://doi.org/10.3390/electronics14030487

AMA Style

Kim C. Exploring Factors Influencing Depression: Socioeconomic Perspectives Using Machine Learning Analytics. Electronics. 2025; 14(3):487. https://doi.org/10.3390/electronics14030487

Chicago/Turabian Style

Kim, Cheong. 2025. "Exploring Factors Influencing Depression: Socioeconomic Perspectives Using Machine Learning Analytics" Electronics 14, no. 3: 487. https://doi.org/10.3390/electronics14030487

APA Style

Kim, C. (2025). Exploring Factors Influencing Depression: Socioeconomic Perspectives Using Machine Learning Analytics. Electronics, 14(3), 487. https://doi.org/10.3390/electronics14030487

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
