Article

Innovative Credit Risk Assessment: Leveraging Social Media Data for Inclusive Credit Scoring in Indonesia’s Fintech Sector

by
Andry Alamsyah
*,
Aufa Azhari Hafidh
and
Annisa Dwiyanti Mulya
School of Economics and Business, Telkom University, Bandung 40257, Indonesia
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(2), 74; https://doi.org/10.3390/jrfm18020074
Submission received: 31 October 2024 / Revised: 2 January 2025 / Accepted: 29 January 2025 / Published: 2 February 2025

Abstract: The financial technology domain has made significant strides toward more inclusive credit scoring systems by integrating alternative data sources, prompting an exploration of how we can further simplify the process of efficiently assessing creditworthiness for the younger generation who lack traditional credit histories and collateral assets. This study introduces a novel approach leveraging social media analytics and advanced machine learning techniques to assess the creditworthiness of individuals without traditional credit histories and collateral assets. Conventional credit scoring methods tend to rely heavily on central bank credit information, especially traditional collateral assets such as property or savings accounts. We leverage demographics, personality, psycholinguistics, and social network data from LinkedIn profiles to develop predictive models for a comprehensive financial reliability assessment. Our credit scoring methods propose scoring models to produce continuous credit scores and classification models to categorize potential borrowers—particularly young individuals lacking traditional credit histories or collateral assets—as either good or bad credit risks based on expert judgment thresholds. This innovative approach questions conventional financial evaluation methods and enhances access to credit for marginalized communities. The research question addressed in this study is how to develop a credit scoring mechanism using social media data. This research contributes to the advancing fintech landscape by presenting a framework that has the potential to transform credit scoring practices to adapt to modern economic activities and digital footprints.

1. Introduction

The increasing emphasis on innovative credit scoring reflects a growing recognition of its significance in addressing the limitations of the conventional approach. When measuring credit scores, the conventional approach based on a single universally applicable model is successful for individuals with considerable financial resources. However, the conventional approach frequently fails to evaluate the creditworthiness of those lacking a traditional financial history and collateral assets despite their demonstrated tendency to repay (Ahelegbey & Giudici, 2023). This limitation is particularly acute in emerging economies, such as Indonesia, where innovative financial technologies have significantly expanded access to credit, as evidenced by the growth of the peer-to-peer (P2P) lending sector (Annur, 2023). In August 2023, Indonesia’s P2P lending fintech industry distributed loans totaling IDR 20.53 trillion, representing an increase in the number of loans from the previous year (Annur, 2023). Conventional financial credit systems rely heavily on central bank credit information, which may not provide sufficient data to support personal credit assessments. Financial institutions place a significant emphasis on credit assessment because it measures individual default risk in a credit application. Moreover, financial institutions have the potential to generate substantial profits while minimizing losses from defaults (Niu et al., 2019).
In the evolving financial technology landscape, alternative approaches have become essential to ensure equitable credit assessment and broaden access to financial services. Recent innovations have focused on leveraging unconventional data sources such as rental payments, utility bills, online behavior, e-commerce transaction history, and even social media data (Niu et al., 2019; Wijaya, 2023). Alternative approaches provide a more comprehensive assessment of an applicant’s creditworthiness, extending beyond their financial history. An alternative approach using unconventional data has been incorporated into credit scoring procedures to address the limitations of the conventional approach and foster a more inclusive financial landscape (Jagtiani & Lemieux, 2019).
The conventional credit scoring approach frequently excludes numerous individuals from credit assessment because they lack traditional credit histories and collateral assets (Djeundje et al., 2021; Knutson, 2020). This challenge highlights the pressing necessity to reconsider and redesign the credit scoring system, particularly for Generation Z, who may lack tangible assets but possess significant intangible assets such as education and digital presence (A. Hernández Kent et al., 2019; Arya et al., 2019; R. Fry, 2013). Despite these advancements, there remains a research gap in developing a comprehensive credit scoring framework that systematically integrates alternative data sources, such as social media analytics, with advanced machine learning models. The Pew Research Center revealed that Generation Z is adopting a distinct approach to education compared to earlier generations. This is evidenced by the fact that a more significant proportion of them attend college and live with parents who have completed their higher education. The educational background means they have a stronger foundation for handling financial matters, which could be important when assessing their creditworthiness without traditional collateral assets (Parker & Igielnik, 2020). To address this challenge, it is necessary to delve into big data, which presents a powerful solution due to its diverse array of data sources (Knutson, 2020). Individuals with inadequate credit history and collateral do not meet the criteria for borrowing in the conventional credit scoring approach. Nevertheless, these same individuals could qualify for a loan through innovative credit scoring methods that utilize alternative data and advanced algorithms (Carroll & Rehmani, 2017; Puteri Ramadhani et al., 2022).
Social media data in credit scoring models can offer valuable insights for lenders. Hence, we use four data features from social media, including demographics, personality, psycholinguistics, and social networks (Pennebaker et al., 2014; Ramadhani et al., 2022; Sinha, 2014; Tan et al., 2015; Zusrony et al., 2019). LinkedIn stands out as a professional social media platform for its focus on professional and economic information, which directly aligns with key creditworthiness indicators. Unlike general-purpose social media platforms, LinkedIn profiles are often self-verified and maintained with accuracy, as they are tied to users’ professional reputations and career prospects. The accessible data include user demographics, professional information, connections, and activity data (posts, comments, engagement), offering valuable insights into users’ economic and professional profiles. For example, demographic information, such as educational level, salary, occupation, and employment history, reflects financial stability and earning potential, both of which are critical factors in credit scoring (Bo Wen et al., 2013; Statistics Indonesia, 2023a, 2023b, 2023c). Furthermore, LinkedIn’s professional focus minimizes the risk of misleading or overly casual data entries, unlike other platforms where social engagement might not directly correlate with financial behavior. Demographics help us understand the characteristics of our target borrowers. LinkedIn’s professional environment offers insights into personalities and psycholinguistics through users’ textual content (Bradbury, 2011; Pennebaker et al., 2014; Pennebaker & Boyd, 2015; Ramadhani et al., 2022). Personality traits shed light on individual attitudes and motivations. Moreover, psycholinguistic analysis allows a deeper understanding of online communication’s language patterns and emotional expressions. 
Furthermore, the nature and quality of their professional network connections provide insight into their social network (Alamsyah et al., 2018; Tan et al., 2015; Zusrony et al., 2019). Social networks reveal connections between users and their influence on each other’s activities.
Machine learning methods have revolutionized credit scoring (Mokheleli & Museba, 2023). These technologies enable detailed analysis of complex datasets, including those from social media, to generate predictive insights about individuals or businesses. FICO Scores Research and Development has conducted empirical research on the advantages and disadvantages of utilizing recent AI and machine learning methods for credit scoring. One of the most significant advantages is the efficiency with which highly predictive models can be developed using machine learning methods (Fahner, 2019). In addition, machine learning methods can adapt to changing financial behavior over time, offer a more comprehensive credit assessment, and improve model performance.
A candidate for solving the problem is stacking, an ensemble method in machine learning that combines various models to improve predictive performance. Stacking uses out-of-fold predictions from the base models as input for training the meta-model, allowing it to learn from general predictions (Wolpert, 1992). This method aims to leverage the strengths of various models by using their collective insights to train the meta-model, enhancing model performance through a broadened range of algorithms and minimizing generalization errors. Stacking finds applications in diverse real-world scenarios such as the finance sector, daily reference evapotranspiration estimation, and industrial fault detection (Kun et al., 2020; Moshrefi et al., 2024; Wu et al., 2021). This wide applicability highlights stacking's robustness and reliability in handling complex multidimensional datasets.
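The stacking workflow described above can be sketched with scikit-learn's `StackingClassifier`, which produces the out-of-fold base-model predictions via internal cross-validation (the `cv` argument) and trains a meta-model on them. This is a minimal illustration on synthetic data, not the study's actual features or model configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for borrower features (not the study's LinkedIn data)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base models generate out-of-fold predictions (cv=5); the meta-model
# (logistic regression) is trained on those predictions, per Wolpert (1992)
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
acc = stack.score(X_test, y_test)
print(acc)
```

In practice the choice of base models and meta-model is a design decision; diverse base learners tend to give the meta-model more complementary signals to combine.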
The strategic integration of credit into wealth management is essential and represents a significant innovation in the financial field (Royal Bank of Canada, 2024). In an individual context, credit scores can contribute to wealth accumulation by emphasizing how individuals can strategically utilize credit to enhance their assets and financial well-being. Initially, the role of credit was emphasized in financing education, purchasing vehicles, personal real estate acquisitions, as well as debt consolidation and repayment for many individuals. However, over time, credit can play a fundamental role in both creating and safeguarding wealth while enabling individuals to fully capitalize on business and investment prospects as they arise (Royal Bank of Canada, 2024). A high credit score enables access to financial products with more favorable terms, such as lower interest rates and higher credit limits, which can be utilized to invest in value-generating assets or expand a business (Fernandez Vidal & Barbon, 2019).
There is a significant methodology gap in existing research, reflected in the limited use of social media data in credit scoring models. This paper proposes a detailed approach to address this gap by exploring how data from specific social media sources can be leveraged to derive creditworthiness scores and labels. Our proposed methodology seeks to enhance existing research (Muñoz-Cancino et al., 2023; Niu et al., 2019; Orlova, 2021) that utilizes social media data. Niu et al. discovered that financial institutions can use alternative data sources like social networks in addition to conventional credit data, particularly when there is a lack of credit history information available (Niu et al., 2019). They incorporate social network information from mobile phones and use logistic regression analysis to reveal a strong correlation between social network information and loan default, emphasizing the predictive power of social network data in assessing creditworthiness.
Moreover, Orlova et al. present a novel approach integrating conventional financial indicators with digital footprint information, such as online behavior and social media activities (Orlova, 2021). Their study offers a strategy for clustering and classifying borrowers, enabling more refined segmentation according to risk profiles and personalized financial products. Additionally, Muñoz-Cancino et al. suggest that integrating credit history with social interaction features significantly enhances the predictive power of creditworthiness assessment models (Muñoz-Cancino et al., 2023). The study of Muñoz-Cancino et al. demonstrates that social interaction data can be exceedingly valuable, especially when borrowers’ historical credit data are sparse or unavailable. To address the gaps in existing research, our research presents LinkedIn data to generate creditworthiness scores using a classification and scoring approach that provides a new perspective in assessing the creditworthiness of individuals without traditional credit history and collateral assets; thus, our research question is how LinkedIn social media can be utilized to develop credit scoring models.
This study aims to innovate existing credit assessment systems by leveraging social media data with machine learning technology as an alternative approach for personal credit assessment. Our study contributes to the field of credit scoring and information management. First, this study provides actionable insights for financial institutions such as banks and fintech startups. It outlines how these entities can integrate and leverage social media data within their existing credit scoring frameworks, thereby enhancing their decision-making processes and adapting to the evolving digital landscape. Furthermore, the research increases financial innovation and inclusion by developing a framework that can more effectively evaluate the creditworthiness of underserved demographic groups, such as young individuals, newcomers, and others who may lack traditional financial assets but are actively engaged on social media platforms. The remaining part of this study is organized as follows: Section 2 presents a relevant literature review; Section 3 describes the methodology used; Section 4 explains the results; Section 5 presents a discussion of this study; Section 6 highlights the research limitations of this study; and Section 7 provides the conclusion, with recommendations for future research.

2. Literature Review and Theoretical Framework

This chapter integrates a literature review and a theoretical framework to provide a comprehensive foundation for the study. Section 2.1 examines the evolution of credit scoring, highlighting the transition from conventional methods to approaches incorporating non-traditional data sources. Section 2.2 explores the role of social media data as an alternative input for credit scoring models. Moving into the theoretical framework, Section 2.3 discusses machine learning models commonly applied in creditworthiness prediction, outlining their strengths and applications. Lastly, Section 2.4 introduces the concept of stacking ensemble learning, emphasizing its potential to enhance model performance and predictive accuracy.

2.1. The Evolution and Transformation of Credit Scoring System

Credit is a fundamental financial tool that facilitates economic transactions, consumer spending, and business investments through borrowing with the promise of repayment, often with interest. While credit systems have existed for centuries, their methodologies have significantly evolved. Early credit systems were based on trust and tangible collateral assets, with historical examples including goldsmith bankers in 17th-century London, who pioneered early forms of credit creation using customer deposits (Temin & Voth, 2006; Kim, 2011).
Modern credit-scoring methods began emerging in the 1950s as a response to increasing financial complexities, and these systems remain widely used today. However, despite their long history and proven effectiveness, traditional credit scoring methods face limitations in assessing individuals without established financial records or tangible assets. This challenge underscores the need for innovative approaches leveraging alternative data sources, such as social media, to bridge existing gaps in financial inclusion.
Traditional credit scoring systems rely on a variety of factors beyond just collateral assets. Financial institutions typically evaluate creditworthiness using indicators such as income stability, past loan repayment behavior, employability, existing debt obligations, and financial behavior patterns (e.g., spending habits and savings). Collateral, while significant, is only one component of a broader evaluation framework. Studies such as Fernandez Vidal and Barbon (2019) and reports from organizations such as the World Bank and the OECD emphasize the multi-dimensional nature of traditional credit assessment models. This study acknowledges these established practices while highlighting the limitations they face in assessing individuals without credit histories or formal financial footprints.
Advancements in technology have significantly impacted the field of credit scoring. The integration of machine learning has led to more sophisticated and dynamic models (Dastile et al., 2020; Mokheleli & Museba, 2023; Orlova, 2021). These technologies allow for the analysis of various data sources and improve the accuracy and reliability of predictions regarding credit risk (Niu et al., 2019; Orlova, 2021).
The conventional credit scoring system is effective for individuals with established financial histories but cannot fairly evaluate those who lack traditional credit histories and collateral assets. The reliance on a set of financial indicators perpetuates a cycle of financial exclusion and overlooks the richness of data that could reveal an individual’s creditworthiness. For instance, social media activities demonstrate professional networks, stability, and online behavior yet are often invisible in the conventional credit approach (Guo et al., 2016; Ramadhani et al., 2022). These systemic shortcomings affect young adults, immigrants, and others who have not engaged with traditional financial products, leaving them virtually invisible to financial institutions despite potentially being creditworthy. The current approach neglects a substantial population segment that could contribute to and benefit from the financial ecosystem.
On the other hand, alternative approaches consider a broader range of data sources such as rental and utility payment histories, social media engagement, online activity, and e-commerce transaction records to expand access to credit for customers lacking traditional financial histories and collateral assets (Knutson, 2020; Wijaya, 2023). The rise of innovative credit scoring has increasingly caused financial institutions to turn to alternative approaches to credit scoring that use diverse data sources to assess the creditworthiness of potential borrowers and reduce the risk of non-performing loans. Hence, alternative approaches are advised to address the gaps left by reliance on conventional data. The shift towards alternative approaches underscores the financial sector’s adaptation to a data-driven landscape, emphasizing the value of comprehensive data analysis in identifying creditworthiness.

2.2. Leveraging Social Media Data for Credit Scoring: Insights and Methodologies

Using social media data in credit scoring signifies a considerable shift from conventional credit assessment approaches, which have predominantly relied on financial histories and collateral assets. This innovative approach leverages individuals’ extensive digital footprints on social media platforms, transforming them into valuable insights for the credit scoring process. By integrating social media information, a more comprehensive picture of an individual’s behavior and preferences emerges, providing a novel perspective on their creditworthiness. The potential of social media data to enhance predictive performance in credit scoring models is underscored by a study that fused traditional credit information with social media activity, demonstrating an improvement in classification accuracy and highlighting the value of alternative data sources in credit assessments (Y. Zhang et al., 2016). Gül et al. (2018) introduce a multiple criteria credit rating approach that combines social media data with traditional financial measures to evaluate companies’ credibility. By incorporating social media data through sentiment analysis on Twitter, the approach enhances the precision of credit evaluations. Pairwise comparisons are used to determine the relative importance of different criteria, and the method is tested on 64 companies. The findings highlight that social media data serve as a valuable complement to traditional credit assessment methods, although they generally result in lower credit ratings. A study on P2P lending in China highlights the role of social media through the extraction of sentiment and topic features from stakeholders’ social media data. These features serve as supplementary soft information, providing a proof of concept to enhance and complement traditional financial risk prediction methods (Wang et al., 2022).
Another study incorporated social media data into credit scoring models within a peer-to-peer (P2P) lending environment to improve loan default prediction. The findings highlighted that neural networks (NN) and support vector machines (SVM) outperformed other methods in predicting loan defaults. Additionally, social media data improved the accuracy of default predictions (Faturohman et al., 2024).
LinkedIn, one of the most widely used professional platforms, provides valuable insights into an individual’s professional network, skills, and career stability. Our research used textual data from LinkedIn to gain insights into borrowers’ psycholinguistics and personalities and to analyze user profiles to explore connections (Bradbury, 2011; Pennebaker & Boyd, 2015). Social behavior offers valuable insights into an individual’s financial behavior and responsibility, which are essential factors in assessing creditworthiness. In addition, we analyze user LinkedIn profiles to gather information about their age, education, occupation, and salary. Analyzing user LinkedIn profiles can help us understand the demographic background of borrowers commonly used in conventional credit scoring approaches. Lenders can make informed decisions that potentially provide access to credit for those who would otherwise be deemed ineligible.
Our study concentrates on harnessing social media as an alternative method for predicting creditworthiness scores when traditional credit histories and collateral assets are unavailable. Guo et al. proposed three social data features derived from demographics, tweets, and user networks on Weibo (Guo et al., 2016). They employ a two-tier stacking and boosting enhanced ensemble learning framework. Experimental findings indicate that the method attains an AUC value of approximately 0.625 and surpasses conventional credit scoring approaches by as much as 17% for personal credit scoring based on social data. Niu et al. investigate the use of social network data obtained from mobile phones to enhance the prediction of loan defaults on a peer-to-peer lending platform (Niu et al., 2019). The results from their machine learning algorithm demonstrate that incorporating social network information can significantly enhance the performance of predicting loan defaults, indicating that this type of information holds value for credit scoring purposes. Jagtiani et al. leverage non-traditional data sources, including utility bills, insurance claims, social network activity, mobile phone usage patterns, online shopping behavior, and investment decisions (Jagtiani & Lemieux, 2019). Their findings indicate that utilizing alternative data has allowed individuals classified as subprime according to conventional standards to obtain more affordable credit options. This approach improves financial inclusion by addressing the lack of access to credit in areas with limited banking infrastructure and extends services to worthy borrowers whom traditional banks may overlook.
Oskarsdottir et al. use advanced representation learning methods and complex network analysis to create features for a pseudo-social network that captures individual similarities (Óskarsdóttir et al., 2019). Calling behavior, pseudo-social networks, and influence scores are used as features in a credit scoring task to evaluate the predictive accuracy of different types of features. Their study identified calling behavior as the most predictive feature of creditworthiness. Studies on the corporate bond market have examined the use of Twitter and its influence on investor behavior and market trends. Research shows that combined Twitter opinions (OPI) can help predict bond returns and changes in credit default swap (CDS) spreads. OPI is linked to future changes in credit ratings, highlighting its influence in shaping investor expectations about creditworthiness (Bartov et al., 2023). Another study highlights the potential of social media as an alternative data source for corporate credit rating prediction, demonstrating that social media-based models, particularly those using K-nearest neighbor (KNN), outperform traditional methods in accuracy (Chen & Chen, 2022). Yao et al. developed a method for credit risk prediction in the supply chain that incorporates social media data to enhance the accuracy of predicting listed enterprise credit risk (Yao et al., 2023). This method involves gathering social media information, analyzing text sentiment, creating text sentiment features, and combining them with conventional features for more precise credit risk prediction. Their research revealed that the performance of text sentiment features from social media data surpasses many traditional features.

2.3. Machine Learning Models for Credit Scoring

A selection of machine learning models is chosen due to their capability to handle complex datasets and reveal patterns that statistical methods may not capture. Our study focuses on addressing the deficiencies of existing credit assessment systems by leveraging social media data with machine learning technology as an alternative approach for personal credit assessment. The proposed methodology is well-suited for machine learning models as they excel at classifying and scoring applicants by handling the complex and nonlinear relationships between input variables and outcomes.

2.3.1. Classification Based Machine Learning Model

Classification models categorize individuals into discrete classes, such as “good” or “bad” credit, based on their predicted probability of default. These models transform input data into class labels, making them essential for decision-making processes where binary outcomes (like loan approval or denial) are required.

Categorical Boosting (CatBoost Classifier)

CatBoost is an advanced machine learning algorithm based on the gradient-boosting decision tree framework, specifically designed to handle categorical variables efficiently through its ordered boosting method. This approach minimizes risks of target leakage and overfitting while improving robustness against data distribution shifts by constructing decision trees on different permutations of the data. Additionally, CatBoost supports regression, binary classification, and multi-class classification tasks, making it suitable for complex predictive modeling (Dorogush et al., 2018; Hancock & Khoshgoftaar, 2020).

Support Vector Machine (SVM Classifier)

Support vector classifier (SVC) is a powerful machine learning model that applies the principles of support vector machines (SVM) for classification. Developed from statistical learning theory by Vapnik (2000), SVC constructs a hyperplane or set of hyperplanes in a high-dimensional space, which can be used for classification, regression, or other tasks like outlier detection. The core idea is to produce the best class separation by maximizing the margin between the classes’ nearest points, termed support vectors. In its simplest form, the SVC utilizes a linear hyperplane to distinguish between two classes. However, for non-linearly separable data, SVC can be extended using kernel tricks to facilitate separation in a higher-dimensional space, effectively allowing for complex problems to be tackled with increased accuracy. This method helps handle small sample sizes and high-dimensional spaces and avoids local minima effectively, thereby providing robust prediction capabilities (Beskopylny et al., 2022; X. Zhang, 2024).
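The kernel trick described above can be demonstrated with scikit-learn's `SVC` on data that a linear hyperplane cannot separate; an RBF kernel implicitly maps the points into a higher-dimensional space where separation becomes possible. The dataset and hyperparameters here are illustrative, not those of the study.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: non-linearly separable by construction
X, y = make_moons(n_samples=400, noise=0.15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Linear hyperplane vs. RBF kernel (implicit higher-dimensional mapping)
linear_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
rbf_svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

linear_svc.fit(X_tr, y_tr)
rbf_svc.fit(X_tr, y_tr)
lin_acc = linear_svc.score(X_te, y_te)
rbf_acc = rbf_svc.score(X_te, y_te)
print(lin_acc, rbf_acc)
```

On this kind of data the RBF variant typically separates the classes far better than the linear one, which is exactly the behavior the kernel trick is meant to provide.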

Adaptive Boosting (AdaBoost Classifier)

AdaBoost is a machine learning algorithm that aims to improve the performance of weak classifiers by boosting them into stronger ones. It was proposed by Yoav Freund and Robert Schapire in 1997 and works by iteratively adjusting the weights of misclassified instances (Freund & Schapire, 1997). AdaBoost allows subsequent classifiers to focus on challenging cases within the training data. In each iteration, a new classifier is introduced to correct mistakes made by its predecessors. AdaBoost assigns weights to each classifier based on accuracy and then combines them using a weighted majority vote (Schapire, 2013). The weight updating process is responsive to the errors made by classifiers in each iteration, increasing their impact on the final decision when they perform well under the existing distribution of weights. The algorithm continues iterating until the desired level of accuracy is achieved or a specified number of rounds are completed.
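The iterative re-weighting scheme above can be sketched with scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision stump; each boosting round up-weights the samples the previous stumps misclassified. Synthetic data is used purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification task (illustrative stand-in for credit labels)
X, y = make_classification(n_samples=500, n_features=8, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# 100 boosting rounds; each round fits a weak learner on the re-weighted
# sample distribution, then the ensemble combines them by weighted vote
ada = AdaBoostClassifier(n_estimators=100, random_state=7)
ada.fit(X_tr, y_tr)
acc = ada.score(X_te, y_te)
print(acc)
```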

Logistic Regression

Logistic regression, introduced by Cox (1958), is a statistical method commonly applied for predicting binary outcomes (e.g., yes/no, success/failure) using continuous or categorical variables. Unlike linear regression, it does not assume linearity, normality, or homogeneity of variance, making it suitable for analyzing relationships that deviate from linear patterns. By utilizing a logistic function, it estimates the probability of various outcomes and expresses the logit (natural logarithm of the odds ratio) as a linear combination of independent variables (Park, 2013).
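The relationship between the linear logit and the predicted probability can be verified directly: applying the logistic function to the fitted linear combination reproduces `predict_proba`. The one-feature dataset below is a hypothetical toy example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset: one feature, binary outcome
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The logit is a linear combination of the inputs: b0 + b1 * x.
# The logistic function 1 / (1 + exp(-logit)) converts it to a probability.
logit = model.intercept_[0] + model.coef_[0][0] * 3.5
p_manual = 1.0 / (1.0 + np.exp(-logit))
p_sklearn = model.predict_proba([[3.5]])[0, 1]
print(p_manual, p_sklearn)  # the two probabilities match
```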

2.3.2. Scoring Based Machine Learning Model

Scoring models output a numerical score that quantifies an individual’s creditworthiness. This score is a continuous variable derived from the aggregation and transformation of input features through machine learning algorithms. We selected support vector regression, CatBoost regressor, LGBM regressor, and elastic net because they effectively handle regression tasks that predict continuous outcomes. These models were chosen for their ability to manage complex relationships within the data, their robustness to overfitting, and their efficiency in training and prediction. These scores from each model help quantify potential borrowers’ risk levels, guiding lending decisions.

Support Vector Regressor (SVR)

Vapnik (2000) introduced the support vector machine as a widely used machine learning tool for classification and regression tasks. Support vector regression (SVR) applies these principles to predict a continuous outcome variable from a set of predictor variables and is categorized as a nonparametric method due to its dependence on kernel functions (Kecman & Kopriva, 2006). The primary goal of SVR is to find a function that deviates from the actual observed outcomes by a value no more than a specified threshold, ε, while simultaneously being as flat as possible, which means having a minimal norm of the coefficients. This flatness makes the model less sensitive to individual data points, thus enhancing its generalization capabilities to new, unseen data. Using kernels in SVR allows it to perform non-linear regression by implicitly mapping input data into higher-dimensional feature spaces, where linear regression is then performed. This capability makes SVR highly versatile and applicable to various regression tasks where the relationship between the input and output variables is not linear (Azeez et al., 2018; Gupta et al., 2019).
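The ε-tube idea maps directly onto scikit-learn's `SVR`: the `epsilon` parameter sets the threshold below which deviations incur no penalty, and an RBF kernel handles the non-linear relationship. The noisy sine curve below is an illustrative toy target, not study data.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # non-linear target with noise

# epsilon defines the tube: residuals smaller than 0.1 are ignored by the loss,
# while the RBF kernel performs the implicit non-linear feature mapping
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)
r2 = svr.score(X, y)  # R^2 on the training data
print(r2)
```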

Categorical Boosting (CatBoost Regressor)

Dorogush et al. (2018) introduced CatBoost in late 2018 as a member of the gradient-boosted decision trees (GBDT) family. CatBoost applies gradient boosting on decision trees and is known for handling categorical variables directly: its innovative category-encoding scheme allows it to work effectively with categorical data in both classification and regression tasks. The algorithm also utilizes a new schema for calculating leaf values during tree structure selection, reducing overfitting. Hyper-parameter tuning plays a vital role in optimizing CatBoost’s performance due to its sensitivity to hyper-parameters (Hancock & Khoshgoftaar, 2020). CatBoost has been applied as a regressor to analyze the factors influencing the frequency of road accidents (Li et al., 2023) and to predict concrete strength in the construction industry (Beskopylny et al., 2022).

Light Gradient Boosting Machine (LGBM Regressor)

LightGBM, short for light gradient boosting machine, is a high-performing gradient-boosting framework known for its speed and efficiency. Introduced by Ke et al. in 2017, it originates from Microsoft Research and serves as a faster alternative to other gradient-boosting methods (Ke et al., 2017), owing to unique techniques such as gradient-based one-side sampling and exclusive feature bundling. LightGBM achieves enhanced efficiency by requiring less data and fewer model features for accurate predictions without decreasing accuracy. LightGBM has found successful real-world applications as a regression tool, particularly in the pharmaceutical industry for quantitative structure–activity relationship modeling (Sheridan et al., 2021). Although any boosting algorithm comes with numerous adjustable hyperparameters, their study found that LightGBM with a standard set of hyperparameters can achieve predictions almost as accurate as those produced by single-task deep neural networks.

Elastic Net

Introduced by Zou and Hastie in 2005, elastic net is a regularization technique that combines the features of both lasso and ridge regression (Zou & Hastie, 2005). Elastic net effectively addresses multicollinearity and variable selection issues in datasets with numerous features. The method emerged as an enhancement of the lasso technique and overcomes some limitations of ridge regression. Combining the benefits of both, elastic net can reduce the number of variables in a model by shrinking some coefficients to exactly zero (performing variable selection) while also dealing effectively with multicollinearity, where independent variables are highly correlated. Elastic net is particularly useful when multiple features are correlated: it handles such situations better than lasso or ridge individually by grouping and shrinking correlated variables together.
Elastic net often requires the tuning of two parameters for practical applications and a more in-depth exploration of its methodology: the mixing parameter, α, which balances between ridge and lasso; and the regularization parameter, λ, which controls the overall strength of the penalty. This dual-parameter tuning can make elastic net more versatile and computationally more intensive than lasso or ridge alone. Liu and Li (2017) demonstrate the effectiveness of building stable and accurate regression models through variable selection, emphasizing its practical applications in spectral data analysis.
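A brief sketch of these two parameters in scikit-learn, whose `ElasticNet` exposes the overall penalty strength as `alpha` (λ in the text) and the ridge–lasso mix as `l1_ratio` (α in the text); the data here are synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=n)  # two highly correlated features
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=n)

# alpha = overall penalty strength (lambda in the text);
# l1_ratio = mix between lasso (1.0) and ridge (0.0) penalties (alpha in the text)
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

coefs = model.coef_  # correlated predictors get grouped, shrunken coefficients
```

In practice both parameters would be tuned jointly, e.g., with cross-validated grid search, which is where the extra computational cost relative to lasso or ridge alone arises.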

2.4. Proposed Stacking Ensemble Learning for Credit Scoring

Stacking, also known as stacked generalization, is a machine learning technique that enhances prediction accuracy by integrating multiple models (Wolpert, 1992). The vast amounts of data we generate through our daily activities necessitate advanced methods to effectively handle and analyze these large-scale datasets. By leveraging the strengths of various base models, stacking can process and interpret complex patterns and relationships within the data, enabling more accurate predictions and insights (Khyani et al., 2021). This approach involves training a meta-model on the predictions from these base models, which have been trained on the complete dataset. The stacking technique improves the overall performance by combining the individual strengths of the base models.
Stacking ensemble learning involves two stages: the first for training base models and the second for training the meta-model (Lu et al., 2023). During the first stage, multiple base models are trained using different algorithms or data subsets. In the second stage, the predictions of the base models are used as input to train a meta-model that combines their outputs to make the final prediction. The training dataset undergoes training through k-fold cross-validation, where the data are divided into k subsets. Each subset uses the remaining (k − 1) subsets for model training and generating predictions. The next step is then to reconstruct the prediction dataset from the k-fold cross-validation of the base model and match it with the original training dataset to generate a new training set. The meta-model’s training set merges these newly formed sets from various base models. Afterward, predictions from the testing sets of the base models are merged to create the testing set for the meta-model, which is then trained using this combined dataset.
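The two-stage procedure described above can be sketched with scikit-learn, using `cross_val_predict` to generate the out-of-fold base-model predictions that become the meta-model's training set; the base models and data below are illustrative, not the study's actual setup:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=10, random_state=0)

# Stage 1: out-of-fold predictions from each base model via 5-fold CV,
# so the meta-model never sees a base model's in-sample fit
base_models = [SVR(), ElasticNet(alpha=0.1)]
meta_features = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models]
)

# Stage 2: the meta-model learns how to combine the base predictions
meta_model = Ridge().fit(meta_features, y)
```

At prediction time, each base model (refit on the full training set) produces a column of predictions for the test set, and the meta-model combines those columns into the final output.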
Numerous studies have utilized stacking learning models in many real-world cases, implementing different base models and datasets. In the same sector of finance, Muslim et al. (2023) demonstrated the significant enhancements in predictive accuracy achievable through the innovative use of stacking ensemble learning in peer-to-peer (P2P) lending default prediction. Their novel approach, integrating three base learner algorithms, including KNN, SVM, and random forest with an XGBoost meta-learner, resulted in exceptional performance improvements. The study specifically highlighted the LGBFS-stacking XGBoost model, which achieved near-perfect accuracy rates of 99.982% on one dataset and 91.434% on another, far surpassing traditional models. The model showcases a robust methodology for tackling default risks in P2P lending.
Additionally, Kun et al. (2020) utilized the stacking model to build an ensemble of four base classifiers to improve network loan discrimination. The findings indicated that the performance of the stacking model surpassed that of the base classifiers overall. Compared with the top-performing base model, XGBoost, the stacking model improved accuracy by 0.41%, precision by 0.45%, recall by 0.28%, F1-score by 0.37%, and AUC by 0.42%. Lu et al. (2023) also explored applying a stacking ensemble model that integrates various machine learning techniques with an attention mechanism to predict daily runoff. This model substantially outperformed traditional single models and simple ensemble methods. Specifically, their results showed a 10.22% improvement in Nash–Sutcliffe efficiency and reductions in root mean square error and mean absolute error of 18.52% and 28.17%, respectively, compared with the best-performing single models. Using an attention mechanism in the meta-model layer effectively captures the strengths of the base models (random forest, AdaBoost, and XGBoost), optimizing the prediction process.
Furthermore, Moshrefi et al. (2024) exemplify the outstanding utility of stacking ensemble learning models in industrial fault detection. Their research, which uses stacking ensemble models based on ultrasonic signals from contact sensors, demonstrates a novel approach to identifying faults in industrial machinery. The stacking model integrates multiple machine learning classifiers to predict different types of industrial faults, achieving an accuracy improvement of about 5% over traditional methods across various testing scenarios.
This study uses four different models for each of the regression and classification tasks to create a stacking ensemble. The selection of the base models and the meta-model plays a pivotal role. As previously stated, the chosen base models for regression are the SVM regressor, LGBM regressor, elastic net, and CatBoost regressor. The selected base models for classification are AdaBoost, CatBoost, the SVM classifier, and logistic regression.
The selection of machine learning models in this study was driven by their proven strengths in handling complex datasets and capturing intricate patterns often missed by traditional statistical techniques. Specifically, the SVM regressor and classifier models were chosen for their robustness in managing non-linear relationships and smaller sample sizes, while CatBoost was selected for its efficiency with categorical data and minimal hyperparameter tuning requirements. Elastic net was employed due to its effectiveness in addressing multicollinearity and performing well with sparse datasets. Additionally, an ensemble stacking approach was adopted to integrate multiple base models, leveraging their collective strengths to enhance predictive accuracy and reduce overfitting. These choices align directly with the study’s objectives of developing a reliable and interpretable credit scoring framework using LinkedIn data, ensuring both methodological rigor and real-world applicability.

3. Methodology

This study adopts a comprehensive approach to developing machine learning models for credit scoring, incorporating a multi-phase methodology to ensure robustness, accuracy, and real-world applicability. The process begins with engaging domain experts in creditworthiness decision-making, where their insights contribute to shaping critical aspects of the framework, including profile selection, feature weighting, and model validation, establishing a strong foundation for subsequent technical processes.
The data gathering phase focuses on collecting diverse datasets, encompassing demographic, personality, psycholinguistic, and social network information from various reliable sources. These data undergo a feature extraction phase, where key attributes influencing creditworthiness are identified and prioritized. Following this, the classifying and scoring phase segments the data into two distinct pathways: one for predicting categorical outcomes (classification) and another for predicting numerical scores (scoring). This segmentation enables tailored machine learning approaches optimized for both discrete and continuous predictions.
Before model training, the dataset undergoes exploratory data analysis (EDA) to uncover hidden patterns, correlations, and potential anomalies, providing valuable insights for the feature selection and engineering process. The data then move through a preprocessing stage, where they are cleaned, normalized, and transformed to ensure consistency, scalability, and compatibility with machine learning algorithms.
The workflow then branches into two distinct modeling pathways: classification models and scoring models. In the classification model stream, algorithms such as AdaBoost classifier, CatBoost classifier, support vector classifier, and logistic regression are employed to predict categorical creditworthiness outcomes. Meanwhile, the scoring model stream generates numerical credit scores using algorithms like support vector regressor, LGBM regressor, elastic net, and CatBoost regressor. Both pathways culminate in stacking classification and stacking regression meta-models, where outputs from the base models are aggregated to optimize predictive accuracy and robustness.
The final phase, model evaluation, rigorously assesses the integrated models using multiple performance metrics to validate their predictive accuracy, reliability, and generalizability. This evaluation ensures alignment with real-world scenarios and industry standards. The framework demonstrates a robust integration of classification and regression methodologies by combining domain expertise, thorough preprocessing, and advanced stacking techniques. It enhances predictive performance and offers a scalable, adaptable approach to assessing creditworthiness. The accompanying diagram in Figure 1 systematically illustrates the entire workflow, from data gathering to model evaluation, highlighting the interplay between domain insights, data processing, and advanced modeling techniques.

3.1. Creditworthiness Domain Expert Selection and Tasks

Engaging domain experts in creditworthiness decision-making is a critical step to ensure the robustness and real-world applicability of our machine learning-based credit scoring model. To involve these experts, we adopted a structured approach that began with identifying professionals from key sectors, including banking, fintech, regulatory bodies, academia, and the Society of Actuaries, Indonesia. Initial outreach was conducted through invitations and collaboration proposals, clearly outlining the experts’ scope, objectives, and expected contributions to our research. This was followed by tailored discussions to match their expertise with specific research tasks, including profile selection for machine learning models, weighting input features, and model validation against real-world scenarios. Through this approach, we established a collaborative framework where each expert’s role was well-defined, fostering a productive and goal-oriented partnership.
The profile selection phase is a crucial initial step wherein domain experts help define the ideal borrower profile that the machine learning model should prioritize. This process involves identifying relevant demographic, financial, psycholinguistic, and social network attributes that are strongly correlated with creditworthiness. Experts from banking institutions highlighted conventional factors, such as income stability and employment history, while fintech professionals emphasized alternative metrics derived from social media activity and behavioral indicators. By aligning expert insights with data availability, our model was designed to focus on key borrower profiles that represent both traditional and emerging patterns of financial reliability.
In the feature weighting stage, domain experts contributed to refining the relative importance assigned to different features selected in the previous phase. While machine learning algorithms like ensemble models and CatBoost automatically assign weights, expert oversight ensured these weights made contextual sense. For example, a credit risk analyst emphasized prioritizing stable employment and income history, while regulators ensured compliance with ethical standards and transparency requirements. Additionally, fintech experts highlighted psycholinguistic and behavioral traits from social media as emerging predictors of financial behavior. This collaboration ensured that the final feature weight distribution was statistically significant and aligned with industry practices and real-world expectations.
Finally, domain experts were actively involved in validating the proposed model’s accuracy and performance against real-world scenarios. Financial practitioners and fintech experts tested the model outputs against historical credit approval and rejection records, while regulators ensured adherence to compliance and ethical standards in decision-making. Academic experts provided additional statistical validation, ensuring the robustness of the evaluation metrics. This iterative validation process established feedback loops, enabling model refinements to address discrepancies between predictions and real-world outcomes. Information about the domain experts is provided in Table 1, while their tasks, covering profile selection, feature weighting, and model validation, are shown in Table 2.

3.2. Selection Criteria and Data Gathering

The data gathering stage utilized TexAu, an advanced web scraping tool, to harvest demographic and activity-related data from LinkedIn profiles. Following this automated collection, a manual review was conducted to assess the usability of the data. During this review, inadequate LinkedIn profiles, namely those containing only a name without a profile picture, posts, or biography, were excluded. This process ensured that only complete and relevant data were retained for further analysis. The dataset consists of 1000 entries, considered by experts to be sufficiently diverse to represent borrower profiles. We categorized the collected data from LinkedIn into three main classes: Demographic, User Activity, and Social Network. The details of each class are presented in Table 3.

3.3. Features Extraction

During the feature extraction phase, we initiated the feature engineering process, transforming the collected raw data into analyzable formats aligned with the theoretical principles outlined in Table 4. For demographics, we estimated ages from the date of the most recent graduation and the duration of work experience. We directly extracted the highest educational qualification and the latest job title to define an individual’s educational background and professional role. Moreover, we formulated an estimated salary metric, integrating factors such as age, educational attainment, and current employment into a comprehensive indicator. This integration was further refined through expert consultation and by using the relevant literature as a guideline to ensure the accuracy and relevance of the derived metrics (Persolkelly, 2024).
Linguistic post and comment data analysis revealed psychological and linguistic attributes crucial for credit scoring. Our psycholinguistic features focus on how linguistic elements and psychological aspects are interconnected, particularly four key features: analytical thinking, clout, authenticity, and emotional tone (Pennebaker et al., 2014; Pennebaker & Boyd, 2015; Tash et al., 2024). Each feature was scored on a scale from 0 to 100, where higher scores denote more favorable attributes. Analytical thinking reflects logical and structured reasoning, indicating a disciplined and methodical approach to communication. Higher scores in clout reveal confidence and leadership potential, attributes associated with reliability and decisiveness in financial behaviors. Authenticity, which assesses the sincerity and genuineness of interactions, suggests transparency and honesty in personal disclosures. Emotional tone captures the mood of the communication; a higher score indicates a positive emotional disposition, which can be crucial for stability and optimism in financial dealings (Pennebaker et al., 2014; Pennebaker & Boyd, 2015). This psycholinguistic feature is derived from post and comment data processed using the LIWC.
In addition, text data from posts and comments provide insights into an individual’s personality (Alamsyah et al., 2021; Kunte & Panicker, 2019). The Big Five personality attributes, such as openness, conscientiousness, extraversion, agreeableness, and neuroticism, carry considerable implications in financial research (Tovanich et al., 2021). Openness reflects a person’s readiness to embrace new experiences that potentially impact financial decision-making. Conscientiousness is linked to the ability to focus on organizing one’s task. Extraversion is associated with higher levels of spending behavior (Tovanich et al., 2021). Moreover, agreeableness indicates a tendency to maintain positive financial relationships, while neuroticism suggests difficulty handling financial stress due to emotional instability (Kunte & Panicker, 2019). Integrating the Big Five personality attributes into credit scoring models has the potential to improve predictions regarding loan default risks by providing deeper psychological insights into borrowers’ behaviors (Ramadhani et al., 2022). This personality feature is derived from post and comment data processed using the personality measurement platform (kepribadian.labscbd.id).
Additionally, analyzing social network features reveals the structural dynamics within user connection networks. These features shed light on an individual’s social position and level of engagement. Social network analysis allows financial institutions to assess individuals’ creditworthiness by analyzing their social networks instead of relying solely on traditional credit information. This is achieved by considering the strength, size, and quality of an individual’s social connections. Elevated values in metrics like degree, betweenness, and closeness centrality reflect strong social ties, significant influence, and a central role in the network, which are indicative of proactive and impactful social interactions (Sinha, 2014; Tan et al., 2015; Zusrony et al., 2019). Modularity underscores the structure of the network’s communities, with higher values denoting more distinct groupings. By contrast, a distinct characteristic of our network analysis is a preference for lower density: a reduced density points to a broader and more varied network, suggesting an individual’s capacity to connect disparate social circles, thereby boosting their social credibility and perceived reliability (Sinha, 2014; Tan et al., 2015; Zusrony et al., 2019). These social network features are derived from the connection data of each LinkedIn profile and processed using Gephi version 0.10.
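As a small illustration of these metrics, they can be computed with NetworkX, which mirrors the measures Gephi reports; the graph below is a toy stand-in for a profile's connection network, not the study's data:

```python
import networkx as nx

# Toy undirected connection graph: node "A" bridges two small groups
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])

degree = nx.degree_centrality(G)            # share of possible direct ties
betweenness = nx.betweenness_centrality(G)  # brokerage between other nodes
closeness = nx.closeness_centrality(G)      # average reachability of others
density = nx.density(G)                     # ratio of actual to possible edges
```

Here node "A" has the highest degree and betweenness centrality because it connects the {B, C} and {D, E} groups, which is exactly the "bridging disparate social circles" pattern the text associates with higher social credibility.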

3.4. Classifying and Scoring

In the classifying and scoring phase, experts from diverse fields—as explained in Section 3.1—provide critical assistance and insights into the domain, utilizing their deep understanding and experience of credit risk assessment, feature engineering, regulatory compliance, and financial modeling. They draw upon a wealth of existing research to inform their judgments, ensuring each score reflects expert knowledge and historical data insights. Experts assign scores within a range of 1 to 100 in the “Score” feature, forming the continuous “Y” variable against the collected “X” variables. Following the assignment of scores, a categorical feature, “Class”, is introduced based on a threshold of 65, as determined by the experts. Scores of 65 or below are classified as “Bad”, while those above 65 are classified as “Good”. The dataset now includes two variables: “Score” for developing a scoring model and “Class” for a classification model, enabling the training of two distinct models.
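A minimal sketch of this expert-threshold labeling; the column names follow the description above, and the scores are illustrative:

```python
import pandas as pd

THRESHOLD = 65  # expert-judgment cutoff described in the text

def to_class(score: float) -> str:
    """Map a continuous expert score (1-100) to a binary credit class."""
    return "Good" if score > THRESHOLD else "Bad"

# Scores of 65 or below are classified as "Bad"; above 65 as "Good"
df = pd.DataFrame({"Score": [72, 65, 40, 88]})
df["Class"] = df["Score"].apply(to_class)
```

The resulting frame carries both targets side by side: "Score" trains the regression (scoring) models and "Class" trains the classification models.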

3.5. Exploratory Data Analysis

In exploratory data analysis, we first engage with the dataset through descriptive statistics to summarize its key characteristics. After gaining insights from the statistical summary, we employ various visual tools to elucidate the features’ patterns, relationships, and significance. This segment is designed to uncover the underlying structure of the data, providing a comprehensive overview that informs the subsequent steps of our analysis.
Table 5 provides a statistical summary of key variables utilized in credit scoring analysis, including age, salary (IDR/month), network metrics, and various psycholinguistic traits, detailing their mean, standard deviation, and other descriptive metrics. IDR stands for Indonesian Rupiah, the currency used in Indonesia. Table 6 categorizes the dataset by education level, occupation, and credit status, offering counts for each category to illustrate the statistical summary of demographic and professional backgrounds in the data. These tables establish a foundation for understanding distribution, trends, and demographic composition.
Figure 2 portrays a correlation matrix heatmap for the df_classified_raw dataset, visually exploring the relationships between features. The heatmap’s color gradient, ranging from blue to red, signifies correlation strength; blue represents negative correlations, and red indicates positive correlations. Notably, “Salary” and “Creditworthy” share a coefficient of 0.3, the highest visible correlation on the map, suggesting a possible trend where creditworthiness increases with salary. Another discernible insight is the moderate correlation between “Creditworthiness” and traits such as “Analytic” and “Authentic”, with coefficients of 0.20 and 0.22, respectively, which could imply a link between these psycholinguistic features and financial trustworthiness.
Figure 3 is a correlation matrix heatmap from the df_scored_raw dataset, providing a quantitative analysis of the interrelationships between variables. This matrix employs a color gradient from blue to red to signify the strength and direction of the correlations, with blue indicating negative and red positive correlations. The matrix indicates a moderate positive correlation between “Final_Score” and “Salary”, as shown by the coefficient of 0.42, which may suggest a trend where an individual’s score increases with salary. Additionally, “Final_Score” has weaker positive correlations with features such as “Age”, “Betweenness Centrality”, and “Density”, with coefficients of 0.24, 0.29, and 0.19, respectively, implying that these factors could also influence the final scoring outcome. Notably, the matrix shows a negative correlation of −0.21 between “Final_Score” and “Modularity”, indicating that certain features might inversely affect the final score. These insights highlight the complex relationships between variables in determining the final credit score.
Figure 4 displays a bar chart depicting the mean “Final Score” by education levels in the df_scored_raw dataset. It shows a clear trend where higher education correlates with higher scores; individuals with a “Doctor” degree score the highest average of 66.18, while “Master’s” degree holders follow with an average of 65.27. This descending pattern persists through to “Diploma” and “Bachelor” degrees, down to “High School” graduates, who record the lowest average score of 58.84. Error bars suggest variability within the educational categories, which are particularly notable among high school graduates. This could indicate a broader range of financial behaviors or creditworthiness within this group. This pattern underscores the dataset’s potential link between educational attainment and credit-scoring outcomes.
Figure 5 compares the mean “Final Score” across different occupational categories within the df_scored_raw dataset. This bar chart demonstrates that individuals in “White Collar” positions have the highest mean final score at 65.17, marginally outperforming those listed as “Public Officers.” There appears to be a general trend whereby those in professional and office-based roles score higher, on average, than those in the manual employment or non-employed categories. Interestingly, “Freelance” workers hold a middle ground with a score of 61.00, suggesting a diverse range of outcomes within this group. “Blue Collar” workers and individuals classified as “Other” or “No job” have progressively lower mean scores, with the unemployed showing the lowest average score at 54.70. The error bars indicate the variation within each group, with “No job” showing a particularly wide range, possibly reflecting a heterogeneous set of circumstances influencing the scores of the unemployed.
Figure 6 displays a pie chart representing the distribution of “Creditworthiness” within the df_classified_raw dataset. The chart divides the population into two categories: those deemed to have “Good” creditworthiness and those considered “Bad.” A substantial majority, i.e., 63% (629 individuals), fall into the “Bad” category, while the remaining 37% (370 individuals) are categorized as having “Good” creditworthiness. This significant disparity suggests that within this dataset, most individuals might face challenges in being deemed creditworthy according to the criteria used. The numerical counts provide a clear sense of scale to the proportions, reinforcing the visual impression given by the pie chart. This visualization is crucial for financial institutions or lenders using the dataset to understand the credit landscape of their clientele. It could have implications for decision-making in credit risk assessment and policy formulation.

3.6. Data Preprocessing

The preprocessing stage begins with a critical review of the dataset, identifying and removing features that do not contribute to the predictive model’s efficacy. A notable action in this phase is excluding the “Full Name” column. This decision is predicated on the understanding that personal names, while unique, do not offer predictive value for the model’s objectives and could potentially introduce bias. Removing this column ensures that the model focuses on features with genuine predictive significance, enhancing its ability to discern patterns relevant to the study’s goals.
After the initial cleaning, the dataset is categorized into numerical and categorical features to facilitate targeted preprocessing strategies. Numerical features undergo standardization to ensure uniformity in scale, an essential process for models sensitive to the magnitude of inputs. This step normalizes the data, allowing each feature to contribute equally to the model’s learning process without undue influence from outliers or variable scales.
Categorical features are transformed using one-hot encoding, which converts these variables into a binary matrix. This method is essential for incorporating categorical data into machine learning algorithms as it allows for the representation of categorical information without implying any ordinal relationship. This transformation is crucial for preserving the integrity of categorical variables, ensuring they contribute appropriately to the predictive models.
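The preprocessing steps described above can be sketched with scikit-learn's `ColumnTransformer`; the column names and values here are illustrative:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative frame with the kinds of columns described above
df = pd.DataFrame({
    "Full Name": ["A B", "C D", "E F"],
    "Age": [24, 31, 28],
    "Salary": [5_000_000, 12_000_000, 8_000_000],
    "Education": ["Bachelor", "Master", "Bachelor"],
})

df = df.drop(columns=["Full Name"])  # names carry no predictive value

numeric = ["Age", "Salary"]
categorical = ["Education"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),  # standardize to zero mean, unit variance
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # binary matrix
])
X = preprocess.fit_transform(df)  # 2 scaled numeric + 2 one-hot columns
```

Wrapping both steps in one transformer ensures the identical scaling and encoding learned from the training data are reapplied to any test data.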

3.7. Model Development

The development of predictive models in this study revolves around two primary methodologies: classification and scoring. This dual approach allows for a comprehensive analysis of the model we developed, addressing different aspects of the data and ensuring robustness in predictive accuracy. Model development was carried out systematically, beginning with data preprocessing to provide optimal input quality for the modeling process. Following this, various machine learning algorithms were applied to construct models capable of handling both regression and classification tasks.

3.7.1. Classification Model

The classification model section details the application of classification algorithms designed to categorize each observation into discrete classes, specifically predicting “Good” or “Bad” outcomes. These models are essential for decision-making processes where binary classifications are required, such as assessing loan approval or determining product quality. This section discusses the deployment and evaluation of various advanced classification models used to predict these outcomes based on a transformed training dataset. The section includes AdaBoost, CatBoost, SVM classifier, and logistic regression.
The development began with configuring an AdaBoost classifier with 100 estimators and a fixed random state of 42 to ensure consistency in results across runs. AdaBoost is recognized for its ability to improve the performance of classification algorithms by combining multiple weak learners into a robust classifier. Following AdaBoost, a CatBoost classifier was initiated with 250 iterations, a learning rate of 0.06, a depth of 6, verbosity set to 0, and a random state of 42. CatBoost is particularly notable for handling categorical data efficiently and is widely used for its robustness and speed. The support vector machine (SVM) classifier was configured with an RBF kernel, a regularization parameter C of 10, and gamma set to “auto”. Finally, a logistic regression model was employed with a high iteration limit of 1000 to ensure convergence even with potentially complex or large datasets.
These models were organized into a dictionary to facilitate systematic evaluation. Each model was trained using the transformed training data and subsequently assessed using transformed test data. The primary metric for evaluation was accuracy, which measures the proportion of correctly predicted instances. Upon determining the performance of individual classifiers, the model with the highest accuracy was selected as the meta-model for an ensemble approach. In this case, further exploration was conducted using a stacking classifier. This ensemble method leverages a meta-classifier to integrate predictions from multiple base classifiers, potentially enhancing predictive performance by capturing different aspects of the data patterns. The base classifiers configured for the stacking included AdaBoost, CatBoost, SVM, and logistic regression, with an AdaBoost classifier again chosen as the meta-model due to its robust performance in preliminary tests.
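A sketch of this stacking classifier in scikit-learn, using the hyperparameters described above; CatBoost is omitted so the example stays self-contained within scikit-learn, and the data are synthetic rather than the study's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Base classifiers with the hyperparameters described above
base = [
    ("ada", AdaBoostClassifier(n_estimators=100, random_state=42)),
    ("svc", SVC(kernel="rbf", C=10, gamma="auto")),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# AdaBoost as the meta-model, mirroring its selection in the text
stack = StackingClassifier(
    estimators=base,
    final_estimator=AdaBoostClassifier(n_estimators=100, random_state=42),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)  # accuracy on held-out data
```

`StackingClassifier` performs the k-fold out-of-fold prediction scheme internally (its `cv` parameter), so the meta-model is trained only on predictions the base models did not see during their own fitting.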

3.7.2. Scoring Model

The scoring model section explores the development of regression models designed to assign continuous scores, ranging from 1 to 100, to each observation. These scores are typically intended to quantify the likelihood or propensity of specific outcomes, such as creditworthiness or risk level, which are crucial for informed decision-making in various sectors.
Initially, a CatBoost regressor was configured with 200 iterations, a learning rate of 0.06, a tree depth of 6, and a random seed of 42 to ensure reproducibility. This gradient-boosting model is recognized for its high performance with categorical and numerical data and was employed without verbose output to streamline the training process. Following the setup, the model was trained using the fit method on the transformed training data. Next, a LightGBM regressor was employed. LightGBM is well-regarded for its speed and efficiency in handling large datasets, and it was utilized in its default configuration to provide a baseline performance metric. The SVM regressor was initialized with an “rbf” kernel for non-linear patterns. The regularization strength, C, was set to 20, and the gamma parameter was configured to “scale”, enabling automatic complexity adjustment relative to the data features. An elastic net model was then instantiated with an alpha value of 0.01 and an l1_ratio of 0.5, combining L1 and L2 regularization methods to benefit from both ridge and lasso regression’s properties.
These base models were then stored in a dictionary for subsequent evaluation. Each model’s predictive performance was assessed on a transformed test dataset, with the R-squared statistic serving as the performance metric. This involved calculating the R-squared values for predictions against the actual test data, which provides a measure of the variance explained by the model. The R-squared values were then sorted to determine the best-performing model. It was found that the SVM model exhibited the highest R-squared value, suggesting its superior performance in this specific setting. Consequently, the SVM was selected as the meta-model in the ensemble configuration.
In a further exploration of ensemble methods, a stacking regressor was defined. This ensemble model uses a meta-regressor approach, where the base regressors include the SVM regressor, LightGBM, elastic net, and CatBoost models, and an SVM regressor was used as the final estimator due to its top performance in the initial evaluations.
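The stacking regressor can likewise be sketched in scikit-learn. Again this is an illustrative reconstruction on synthetic data rather than the authors' code; the LightGBM and CatBoost base learners are omitted to keep the sketch dependency-free, while the SVR and elastic net hyperparameters follow the description above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the transformed data; targets are standardized
# so the RBF-kernel SVR operates on a sensible scale.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=42)
y = (y - y.mean()) / y.std()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [
    ("svr", SVR(kernel="rbf", C=20, gamma="scale")),
    ("enet", ElasticNet(alpha=0.01, l1_ratio=0.5)),
]

# An SVR serves as the final estimator, mirroring its selection as the
# meta-model after the initial R-squared comparison.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=SVR(kernel="rbf", C=20, gamma="scale"),
)
stack.fit(X_train, y_train)
r2 = r2_score(y_test, stack.predict(X_test))
```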

3.8. Model Evaluation and Expert Validation

The evaluation of predictive models forms a critical aspect of this study, ensuring that the developed models perform well on training data and generalize effectively to unseen data. Cross-validation is a data-splitting technique for making predictive assessments of statistical models. Although the specific goal of statistical analysis, such as hypothesis testing or prediction, can constrain the set of models under consideration, predictive assessment is a broadly applicable and objective basis for both model comparison and selection across a range of modeling goals (Yates et al., 2023). This assertion underscores the rigorous procedures and metrics that are subsequently described and used to assess the performance of both the regression and classification models developed in the preceding sections.
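As a concrete illustration of the k-fold procedure (a generic scikit-learn sketch on synthetic data, not the study's exact pipeline), each fold serves once as held-out data and the per-fold scores are averaged:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the transformed feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# 10-fold cross-validation: the data are split into 10 folds, and each
# fold is used once as the validation set.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)
mean_acc = fold_scores.mean()
```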

3.8.1. Classification Model Evaluation

For the classification model, the AdaBoost classifier, the CatBoost classifier, the SVM, and logistic regression were assessed using standard performance metrics: accuracy, precision, recall, F1-score, and AUC ROC. The stacking classifier, which integrated inputs from the base classifiers with an AdaBoost classifier as the meta-model, underwent rigorous cross-validation to ensure robust classification performance (Bowers & Zhou, 2019; Grandini et al., 2020).
  • Accuracy
    Accuracy measures the overall correctness of the model by calculating the ratio of true predictions (both true positives and true negatives) to the total number of cases examined (Tatachar, 2021).
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    • TP: True positive;
    • TN: True negative;
    • FP: False positive;
    • FN: False negative.
  • Precision
    Precision is the ratio of the correctly predicted positive observations to the total predicted positives. It measures the accuracy of positive predictions (Tatachar, 2021).
    Precision = TP / (TP + FP)
    • TP: True positive;
    • FP: False positive.
  • Recall
    Recall is the ratio of the correctly predicted positive observations to all observations in the actual class, i.e., “yes”. It measures the ability of a model to find all the relevant cases within a dataset (Tatachar, 2021).
    Recall = TP / (TP + FN)
    • TP: True positive;
    • FN: False negative.
  • F1-Score
    The F1-score is the weighted average of precision and recall. Therefore, this score takes both false positives and false negatives into account. It is particularly useful when the classes are imbalanced (Tatachar, 2021).
    F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
  • ROC and AUC
    The ROC curve is a graphical representation used to show the diagnostic ability of a binary classifier system as its discrimination threshold is varied. It plots the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1-specificity) at various threshold settings. The ROC curve is essential for visualizing the trade-offs between sensitivity and specificity in different threshold settings. ROC curve analysis is a staple in both medical diagnostic tests and machine learning model evaluation. AUC measures the area underneath the ROC curve and provides an aggregate performance measure across all possible classification thresholds. It simplifies classifier performance evaluation into a single value, which is particularly useful for comparing different models. An AUC of 0.5 suggests no discriminative ability, akin to random guessing, whereas an AUC of 1.0 represents perfect classification. The AUC is a widely accepted measure for evaluating classifiers, especially in contexts where the classes are imbalanced.
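The metrics defined above can be checked with a small worked example. The confusion-matrix counts and score vector below are hypothetical, chosen only for illustration, with scikit-learn's `roc_auc_score` used for the AUC:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical confusion-matrix counts.
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.85
precision = tp / (tp + fp)                          # ~0.889
recall = tp / (tp + fn)                             # 0.80
f1 = 2 * precision * recall / (precision + recall)  # ~0.842

# AUC from predicted scores: the probability that a randomly chosen
# positive is ranked above a randomly chosen negative.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_score(y_true, y_score)  # 0.75
```

Here 3 of the 4 positive/negative pairs are correctly ordered by score, giving an AUC of 0.75, above the 0.5 random-guessing baseline.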

3.8.2. Scoring Model Evaluation

The scoring models evaluated were the CatBoost regressor, the LightGBM regressor, the SVM regressor, and elastic net, each subjected to 10-fold cross-validation via the KFold method for thorough evaluation. Additionally, a stacking regressor was employed, integrating predictions from these models using an SVR as the meta-regressor, aiming to enhance predictive performance by leveraging the strengths of each individual model (Tatachar, 2021).
  • R-squared (R²)
    This metric indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. It measures how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model (Tatachar, 2021).
    R² = 1 − SSR / TSS
    • SSR: The sum of squares of residuals, i.e., the sum of squared differences between the observed and predicted values;
    • TSS: The total sum of squares, i.e., the total variance in the observed data, calculated as the sum of squared differences between the observed values and their mean.
  • Mean Squared Error (MSE)
    MSE measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. MSE is a risk function corresponding to the expected value of the squared error loss (Tatachar, 2021).
    MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳᵢ)²
    • n: Number of predictions;
    • yᵢ: Observed values;
    • ȳᵢ: Predicted values.
  • Root Mean Squared Error (RMSE)
    RMSE is the square root of the mean of the squared errors. It measures the magnitude of deviation between predictions and the actual observations, providing a sense of the average distance from the predicted values (Tatachar, 2021).
    RMSE = √[(1/n) Σᵢ₌₁ⁿ (yᵢ − ȳᵢ)²]
    • n: Number of predictions;
    • yᵢ: Observed values;
    • ȳᵢ: Predicted values.
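A small worked example of the three metrics, using hypothetical observed and predicted values:

```python
import numpy as np

# Hypothetical observed and predicted values.
y_obs = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

ssr = np.sum((y_obs - y_pred) ** 2)        # 1.5
tss = np.sum((y_obs - y_obs.mean()) ** 2)  # 20.0
r2 = 1 - ssr / tss                         # 0.925

mse = np.mean((y_obs - y_pred) ** 2)       # 0.375
rmse = np.sqrt(mse)                        # ~0.612
```

Note that RMSE is expressed in the same units as the target variable, which is why it is often easier to interpret than MSE.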

3.8.3. Expert Validation

In the final phase, experts were engaged to validate the model in real-world scenarios. Each expert independently assessed and assigned scores based on a predefined set of criteria derived from demographic, psycholinguistic, personality, and social network features. This ensured that the scoring process adhered to a consistent evaluation framework.
To evaluate the quality and consistency of expert assessments, we calculated key statistical metrics, including the mean, standard deviation, and inter-rater reliability (intraclass correlation coefficient, ICC) (Koo & Li, 2016; Liljequist et al., 2019). These metrics provided insights into the level of agreement among experts and the stability of their scoring patterns.
When scoring discrepancies emerged, they were resolved through structured discussions and consensus meetings, fostering alignment with the evaluation framework. By presenting these statistical measures and validation processes, we ensure transparency and methodological rigor in the expert scoring process, reinforcing the reliability of the assigned scores for subsequent machine learning model development.
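The two-way random effects, average-measures coefficient used here, ICC(2,k) in the Koo and Li (2016) terminology, can be computed directly from the subjects-by-raters score matrix via the standard mean-squares formula ICC(2,k) = (MSR − MSE) / (MSR + (MSC − MSE)/n). The following NumPy implementation is an illustrative sketch, not the authors' code, and the score matrix is hypothetical:

```python
import numpy as np

def icc2k(scores: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, average measures.

    scores has shape (n_subjects, k_raters).
    """
    n, k = scores.shape
    grand = scores.mean()
    # Two-way ANOVA sums of squares.
    ss_rows = k * np.sum((scores.mean(axis=1) - grand) ** 2)  # subjects
    ss_cols = n * np.sum((scores.mean(axis=0) - grand) ** 2)  # raters
    ss_err = np.sum((scores - grand) ** 2) - ss_rows - ss_cols
    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

# Five raters in perfect agreement on four subjects yield ICC(2,k) = 1.
perfect = np.tile(np.array([[60.0], [70.0], [80.0], [90.0]]), (1, 5))
```

Adding independent noise to each rater's column pulls the coefficient below 1, toward 0 as rater disagreement grows relative to between-subject variance.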

4. Results

4.1. Classification Model Result

We assessed five predictive models in our classification model evaluation: AdaBoost, CatBoost, SVM, logistic regression, and a stacking classifier. The stacking classifier emerged as notably superior, achieving the highest levels of accuracy at 0.950. Precision was recorded at 0.947, recall at 0.947, and an F1-score of 0.947, with an impressive AUC ROC of 0.985. These metrics demonstrate the model’s exceptional ability to discriminate between classes, a crucial attribute in complex classification scenarios where reliability and accuracy are paramount. The strength of the stacking classifier lies in its integration of multiple underlying models, which collectively enhance its predictive power and robustness, making it an ideal choice for scenarios requiring nuanced class distinction.
Comparatively, logistic regression also displayed strong performance metrics, illustrating a well-balanced profile with an accuracy of 0.884, a precision score of 0.877, and a recall rate of 0.872. Its F1-score and AUC ROC stood at 0.874 and 0.948, respectively, underscoring its effective handling of class imbalances and reliability in prediction. SVM followed with respectable figures, posting an accuracy of 0.865 and precision and recall rates of 0.856 and 0.852, respectively. The SVM’s F1-score of 0.854 and AUC ROC of 0.938 further confirm its robust classification capabilities, particularly effective in environments where class distinctions are not starkly defined.
Meanwhile, CatBoost displayed a commendable mix of precision at 0.864 and recall at 0.815, with an overall accuracy of 0.851. Its F1-score of 0.830 and AUC ROC of 0.949 mark it as a reliable model, though slightly less consistent in identifying positive cases than its counterparts. This slight variability in performance highlights the importance of choosing a suitable model based on specific use-case requirements. The collective performance of these models underscores the advancement in classification technologies, offering robust solutions for varied applications in data analysis. Each model’s unique strengths and occasional limitations provide valuable insights for future improvements and application-specific tuning, ensuring that a model’s selection aligns with users’ operational demands and objectives. The classification model performance is shown in Table 7. We applied the classification model to predict creditworthiness decisions for randomly sampled profiles; the sample classification model results are shown in Table 8.

4.2. Scoring Model Result

Our study assessed five predictive models, namely, CatBoost, SVM, elastic net, stacking regressor, and LightGBM, using the following key regression metrics: R2, MSE, and RMSE. The CatBoost model demonstrated superior capability in fitting the data, with an R2 value of 0.9072, indicating that it explains approximately 90.72% of the variance in the target variable. This high degree of model fit is complemented by a relatively low MSE (4.3168) and RMSE (2.0748), underscoring CatBoost’s effectiveness in making accurate predictions with minimal error variance. Conversely, LightGBM, while still showing a good fit, with an R2 of 0.8011, had higher error rates, as evidenced by its MSE of 9.2424 and RMSE of 3.0381, suggesting less consistency in prediction accuracy compared to CatBoost.
Another model evaluated, SVM, posted an R2 of 0.8419, positioning it as a robust model. However, compared to CatBoost, it had a higher predictive error (MSE of 7.3189 and RMSE of 2.7037). Elastic net, with an R2 of 0.8475, displayed slightly better data fitting capabilities than SVM and had moderately low error metrics (MSE of 7.0115 and RMSE of 2.64), indicating reliable predictive performance.
However, the stacking regressor emerged as the standout model in our analysis. It recorded the highest R2 value of 0.9298, demonstrating an exceptional ability to capture variance within the dataset and achieving the lowest MSE (3.2388) and RMSE (1.7962) scores. This performance indicates that the stacking regressor, by integrating multiple models, can significantly enhance prediction accuracy and reduce error variance, making it the most precise model evaluated in this study. The scoring model performance is shown in Table 9. We applied the scoring model to predict creditworthiness scores for randomly sampled profiles; the sample scoring model results are shown in Table 10.

4.3. Validation Results

Experts scored the performance of creditworthiness classification and scoring models against real-world scenarios. We then tested these scores to ensure the reliability and consistency of the expert scoring process. We conducted a thorough validation phase involving statistical analysis and structured discussions. This sub-section presents the results of the expert scoring process, including key descriptive statistics, inter-rater reliability (ICC) scores, and insights derived from expert consensus meetings (Koo & Li, 2016; Liljequist et al., 2019).
Table 11 summarizes the descriptive statistics of the expert scores, including the mean, standard deviation, minimum, and maximum values. These metrics provide an overview of the score distribution across experts. The results indicate a mean score of 72.8, with a standard deviation of 6.4, suggesting moderate variation in expert assessments. While most scores align closely, minor discrepancies were observed in specific instances. The result of the intraclass correlation coefficient (ICC) calculation, undertaken to evaluate the consistency of expert assessments, is also shown in Table 11. This statistical measure assesses the degree of agreement among experts when scoring the same set of profiles. The ICC was configured and computed as follows: ICC type, two-way random effects model, average measures (ICC(2,k)); ICC value, 0.84; interpretation, good reliability (0.75–0.90 range). The ICC value of 0.84 indicates good reliability, demonstrating substantial agreement among the five experts. This suggests that the scoring process was consistent across evaluators, enhancing the credibility of the results.

5. Discussion

A machine learning-based credit-scoring model utilizing social media data has gained prominence recently. This study highlights the innovative application of LinkedIn data for credit scoring, presenting a unique approach compared to previous studies that often mix traditional and alternative data sources. Niu et al. demonstrated the utility of social network information from mobile phones to significantly improve the predictive power of credit scoring models using machine learning techniques such as random forest and AdaBoost (Niu et al., 2019). Unlike Niu et al., our study exclusively harnesses professional social network data from LinkedIn, which enriches the credit scoring data pool and ensures the data’s reliability due to LinkedIn’s professional context. Orlova proposed a methodology utilizing digital footprint data and machine learning methods to manage individuals’ creditworthiness. Orlova’s approach, which blends traditional financial indicators with digital footprints, mirrors our method in its hybrid data utilization (Orlova, 2021). However, our model distinguishes itself by deploying a regression alongside classification, enhancing the detailed assessment of creditworthiness rather than a binary classification of credit risk. Muñoz-Cancino et al. discussed the dynamics of credit history and social interaction features in credit scoring, emphasizing enhanced performance through integrating social data (Muñoz-Cancino et al., 2023). Our approach aligns with this by using social interactions to refine the credit scoring process. Still, it advances further by implementing a stacking ensemble method, significantly strengthening the predictive accuracy by leveraging multiple model insights.
The primary contributions of this research to the domain of credit scoring are threefold:
  • First, innovative use of social media data, such as LinkedIn data, capitalizes on a relatively untapped yet rich source of reliable, professional, and social interaction data that indicates an individual’s economic position and professional stability.
  • Second, integrating classification and regression models in our analysis allows for a nuanced understanding of creditworthiness. This dual approach provides a more detailed assessment that can cater to different financial products and services, which are not commonly addressed in traditional models focusing solely on classification.
  • Third, applying a stacking ensemble method distinguishes our model by enhancing its robustness. This method utilizes the predictive strengths of individual models and combines them to create a more accurate and reliable prediction tool.

6. Research Limitation

Our research has inherent limitations that are crucial for understanding the scope and applicability of the findings while also opening avenues for further exploration and enhancement. One primary limitation involves ethical concerns and privacy implications tied to the use of personal data from social media. Although this study complies with current data protection regulations, the rapidly evolving legal landscape could present challenges. Additionally, there is a potential risk of bias and discrimination if not all demographic groups are equally represented online or if the algorithms inadvertently learn prejudicial patterns from the data.
Furthermore, the models developed and their findings are based on data from specific social media platforms, primarily LinkedIn. This specificity raises questions about the generalizability of the results to other platforms and the broader population, especially in regions with differing social media usage patterns or where LinkedIn is less prevalent. The accuracy of the credit scoring models also heavily depends on the algorithms and functionalities of the social media platforms from which the data is sourced. Any changes in these platforms’ data policies or algorithms could affect the consistency and reliability of the data, thereby impacting the models’ effectiveness.

7. Conclusions

This study represents a significant stride toward integrating social media analytics within credit scoring systems, challenging traditional credit evaluation methods and fostering financial inclusiveness. Our research addresses an important gap in financial technology by leveraging LinkedIn data to assess the creditworthiness of individuals without traditional financial histories. It proposes a model that could revolutionize credit assessments. The practical implications of this research are substantial, offering financial institutions a tool with which to potentially reduce loan defaults and expand credit access to broader demographics. The nuanced analysis of demographic, psycholinguistic, personality, and social network data enables lenders to make more informed decisions, reducing reliance on outdated credit scoring systems that often exclude young adults, immigrants, and lower-income individuals.
From a theoretical perspective, this study enriches the dialogue on the efficacy of alternative data in financial assessments, challenging the established paradigms of credit scoring. It posits that non-traditional data sources can be predictive, setting new benchmarks for reliable data in creditworthiness evaluations and inspiring further research into alternative data applications in financial risk assessment. Critically, this study’s dependence on the stability and transparency of social media platforms introduces unpredictability that could affect the long-term reliability of the developed scoring models. Ethical considerations regarding data privacy and potential misuse are significant, calling for strict regulatory oversight to ensure that financial inclusiveness does not compromise individual privacy rights.
An aspect that might draw criticism is the potential societal impact of using social media data for financial assessment. There is a risk that this approach could lead individuals to manipulate their online presence to enhance their creditworthiness, blurring the line between innovative data use and invasive surveillance. This could foster a culture of data manipulation among users, undermining the authenticity of social media as a space for personal expression. In conclusion, while this innovative approach to credit scoring using social media analytics presents transformative potential in financial technology, it also brings forth challenges and ethical concerns that require careful consideration to ensure that this approach benefits the financial ecosystem without infringing on individual rights or societal norms.

Author Contributions

Conceptualization, A.A.; methodology, A.A., A.A.H. and A.D.M.; software, A.A.H. and A.D.M.; validation, A.A.; formal analysis, A.A., A.A.H., and A.D.M.; investigation, A.A.H. and A.D.M.; resources, A.A., A.A.H., and A.D.M.; data curation, A.A., A.A.H., and A.D.M.; writing—original draft preparation, A.A.H. and A.D.M.; writing—review and editing, A.A.; visualization, A.A., A.A.H., and A.D.M.; supervision, A.A.; project administration, A.D.M.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The research was not funded by any external sources.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study utilized publicly available data from social media platforms in accordance with the terms of service of each platform. No personal identifiers were collected or stored beyond what was publicly accessible. The data were anonymized to ensure the privacy of individual users, and any potentially identifying information was removed before analysis. The analysis was conducted on aggregated data, ensuring that individual users could not be identified in any publications or reports. All data are stored in a secure environment and will be deleted two years after the publication of the study findings.

Conflicts of Interest

The authors declare no conflicts of interest related to this publication.

References

  1. Ahelegbey, D. F., & Giudici, P. (2023). Credit scoring for peer-to-peer lending. Risks, 11(7), 123. [Google Scholar] [CrossRef]
  2. Alamsyah, A., Bratawisnu, M. K., & Sanjani, P. H. (2018, May 3–4). Finding pattern in dynamic network analysis. 2018 6th International Conference on Information and Communication Technology (ICoICT) (pp. 141–146), Bandung, Indonesia. [Google Scholar] [CrossRef]
  3. Alamsyah, A., Dudija, N., & Widiyanesti, S. (2021). New approach of measuring human personality traits using ontology-based model from social media data. Information, 12(10), 413. [Google Scholar] [CrossRef]
  4. Annur, C. M. (2023, October 17). Penyaluran pinjaman online meningkat pada agustus 2023. Databoks. Available online: https://databoks.katadata.co.id/datapublish/2023/10/17/penyaluran-pinjaman-online-meningkat-pada-agustus-2023 (accessed on 1 September 2023).
  5. Arya, V., Sethi, D., & Paul, J. (2019). Does digital footprint act as a digital asset?—Enhancing brand experience through remarketing. International Journal of Information Management, 49, 142–156. [Google Scholar] [CrossRef]
  6. Azeez, O., Pradhan, B., & Shafri, H. (2018). Vehicular CO emission prediction using support vector regression model and GIS. Sustainability, 10(10), 3434. [Google Scholar] [CrossRef]
  7. Bartov, E., Faurel, L., & Mohanram, P. (2023). The role of social media in the corporate bond market: Evidence from Twitter. Management Science, 69(9), 5638–5667. [Google Scholar] [CrossRef]
  8. Beskopylny, A. N., Stel’makh, S. A., Shcherban’, E. M., Mailyan, L. R., Meskhi, B., Razveeva, I., Chernil’nik, A., & Beskopylny, N. (2022). Concrete strength prediction using machine learning methods catboost, k-nearest neighbors, support vector regression. Applied Sciences, 12(21), 10864. [Google Scholar] [CrossRef]
  9. Bo Wen, C., Chiun Chieh, H., & Mei Hung, H. (2013). Enhancing credit scoring model performance by a hybrid scoring matrix. African Journal of Business Management, 7(18), 1791–1805. [Google Scholar] [CrossRef]
  10. Bowers, A. J., & Zhou, X. (2019). Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. Journal of Education for Students Placed at Risk, 24(1), 20–46. [Google Scholar] [CrossRef]
  11. Bradbury, D. (2011). Data mining with LinkedIn. Computer Fraud and Security, 2011(10), 5–8. [Google Scholar] [CrossRef]
  12. Carroll, P., & Rehmani, S. (2017). Alternative data and the unbanked. Available online: https://www.oliverwyman.com/our-expertise/insights/2017/may/alternative-data-and-the-unbanked.html (accessed on 10 August 2023).
  13. Chen, Y.-J., & Chen, Y.-M. (2022). Forecasting corporate credit ratings using big data from social media. Expert Systems with Applications, 207, 118042. [Google Scholar] [CrossRef]
  14. Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society Series B: Statistical Methodology, 20(2), 215–232. [Google Scholar] [CrossRef]
  15. Dastile, X., Celik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Applied Soft Computing Journal, 91, 106263. [Google Scholar] [CrossRef]
  16. Djeundje, V. B., Crook, J., Calabrese, R., & Hamid, M. (2021). Enhancing credit scoring with alternative data. Expert Systems with Applications, 163, 113766. [Google Scholar] [CrossRef]
  17. Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. Available online: http://arxiv.org/abs/1810.11363 (accessed on 27 September 2023).
  18. Fahner, G. (2019). FICO score research: Explainable AI for credit scoring. Available online: https://www.fico.com/blogs/fico-score-research-explainable-ai-credit-scoring (accessed on 13 December 2023).
  19. Faturohman, T., Wiryono, S. K., Khilfah, H. L. N., Andri, A., Hamzah, M. A., Saputra, O., & Indrayana, G. G. (2024). Peer-to-peer lending default prediction model: A credit scoring application with social media data. International Journal of Monetary Economics and Finance, 17(2/3), 189–200. [Google Scholar] [CrossRef]
  20. Fernandez Vidal, M., & Barbon, F. (2019). Credit scoring in financial inclusion: How to use advanced analytics to build credit-scoring models that increase access. The Consultative Group to Assist the Poor (CGAP). [Google Scholar]
  21. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. [Google Scholar] [CrossRef]
  22. Fry, R. (2013). Young adults after the recession fewer homes, fewer cars, less debt. Pew Research Center. Available online: https://www.pewresearch.org/social-trends/2013/02/21/young-adults-after-the-recession-fewer-homes-fewer-cars-less-debt/ (accessed on 25 November 2023).
  23. Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: An overview. Available online: http://arxiv.org/abs/2008.05756 (accessed on 18 January 2024).
  24. Guo, G., Zhu, F., Chen, E., Liu, Q., Wu, L., & Guan, C. (2016). From footprint to evidence: An exploratory study of mining social data for credit scoring. ACM Transactions on the Web, 10(4), 22. [Google Scholar] [CrossRef]
  25. Gupta, D., Pratama, M., Ma, Z., Li, J., & Prasad, M. (2019). Financial time series forecasting using twin support vector regression. Plos ONE, 14(3), e0211402. [Google Scholar] [CrossRef]
  26. Gül, S., Kabak, Ö., & Topcu, I. (2018). A multiple criteria credit rating approach utilizing social media data. Data & Knowledge Engineering, 116, 80–99. [Google Scholar] [CrossRef]
  27. Hancock, J. T., & Khoshgoftaar, T. M. (2020). CatBoost for big data: An interdisciplinary review. Journal of Big Data, 7(1), 94. [Google Scholar] [CrossRef] [PubMed]
  28. Jagtiani, J., & Lemieux, C. (2019). The roles of alternative data and machine learning in fintech lending: Evidence from the LendingClub consumer platform. Financial Management, 48(4), 1009–1029. [Google Scholar] [CrossRef]
29. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NIPS 2017). Available online: https://papers.nips.cc/paper/2017 (accessed on 28 January 2025).
30. Huang, T.-M., Kecman, V., & Kopriva, I. (2006). Kernel based algorithms for mining huge data sets (Vol. 17). Springer.
31. Kent, A. H., Emmons, W. R., & Ricketts, L. (2019). Are millennials a lost generation financially? Available online: https://www.stlouisfed.org/on-the-economy/2019/december/millennials-lost-generation-financially (accessed on 13 July 2023).
32. Khyani, D., Jakkula, S., Gowda C, S. N., J, A. K., & R, S. K. (2021). An interpretation of stacking and blending approach in machine learning. International Research Journal of Engineering and Technology, 8(7), 3117–3120.
33. Kim, J. (2011). How modern banking originated: The London goldsmith-bankers’ institutionalisation of trust. Business History, 53(6), 939–959.
34. Knutson, M. L. (2020). Credit scoring approaches guidelines. Available online: https://thedocs.worldbank.org/en/doc/935891585869698451-0130022020/original/CREDITSCORINGAPPROACHESGUIDELINESFINALWEB.pdf (accessed on 10 August 2023).
35. Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163.
36. Kunte, A. V., & Panicker, S. (2019, November 21–22). Using textual data for personality prediction: A machine learning approach. 2019 4th International Conference on Information Systems and Computer Networks (ISCON) (pp. 529–533), Mathura, India.
37. Kun, Z., Weibing, F., & Jianlin, W. (2020, November 20–22). Default identification of P2P lending based on stacking ensemble learning. Proceedings—2020 2nd International Conference on Economic Management and Model Engineering, ICEMME 2020 (pp. 992–1006), Chongqing, China.
38. Li, J., Wang, X., Yang, X., Zhang, Q., & Pan, H. (2023). Analyzing freeway safety influencing factors using the CatBoost model and interpretable machine-learning framework, SHAP. Transportation Research Record: Journal of the Transportation Research Board, 2678(7), 563–574.
39. Liljequist, D., Elfving, B., & Skavberg Roaldsen, K. (2019). Intraclass correlation—A discussion and demonstration of basic features. PLoS ONE, 14(7), e0219854.
40. Liu, W., & Li, Q. (2017). An efficient Elastic Net with regression coefficients method for variable selection of spectrum data. PLoS ONE, 12(2), e0171122.
41. Lu, M., Hou, Q., Qin, S., Zhou, L., Hua, D., Wang, X., & Cheng, L. (2023). A stacking ensemble model of various machine learning models for daily runoff forecasting. Water, 15(7), 1265.
42. Mokheleli, T., & Museba, T. (2023). Machine learning approach for credit score predictions. Journal of Information Systems and Informatics, 5(2), 497–517.
43. Moshrefi, A., Tawfik, H. H., Elsayed, M. Y., & Nabki, F. (2024). Industrial fault detection employing meta ensemble model based on contact sensor ultrasonic signal. Sensors, 24(7), 2297.
44. Muñoz-Cancino, R., Bravo, C., Ríos, S. A., & Graña, M. (2023). On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance. Expert Systems with Applications, 218, 119599.
45. Muslim, M. A., Nikmah, T. L., Pertiwi, D. A. A., Subhan, J., Dasril, Y., & Iswanto. (2023). New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning. Intelligent Systems with Applications, 18, 200204.
46. Niu, B., Ren, J., & Li, X. (2019). Credit scoring using machine learning by combing social network information: Evidence from peer-to-peer lending. Information, 10(12), 397.
47. Orlova, E. V. (2021). Methodology and models for individuals’ creditworthiness management using digital footprint data and machine learning methods. Mathematics, 9(15), 1820.
48. Óskarsdóttir, M., Bravo, C., Sarraute, C., Vanthienen, J., & Baesens, B. (2019). The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics. Applied Soft Computing, 74, 26–39.
49. Park, H.-A. (2013). An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain. Journal of Korean Academy of Nursing, 43(2), 154.
50. Parker, K., & Igielnik, R. (2020, May 14). On the cusp of adulthood and facing an uncertain future: What we know about Gen Z so far. Pew Research Center.
51. Pennebaker, J. W., & Boyd, R. L. (2015). Linguistic inquiry and word count: LIWC2015. Available online: www.LIWC.net/dictionaries (accessed on 10 July 2023).
52. Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). When small words foretell academic success: The case of college admissions essays. PLoS ONE, 9(12), e115844.
53. Persolkelly. (2024). Indonesia salary guide 2024. Persolkelly.
54. Puteri Ramadhani, D., Mentari Wijaya, P., & Alamsyah, A. (2022, July 26–28). Credit scoring model construction based on LinkedIn social media data. 5th European International Conference on Industrial Engineering and Operations Management, Rome, Italy.
55. Ramadhani, D. P., Ekaputri, S. A., & Alamsyah, A. (2022, October 1–3). Modeling person’s creditworthiness over their demography and personality appearance in social media. 2022 7th International Workshop on Big Data and Information Security (IWBIS) (pp. 13–18), Depok, Indonesia.
56. Royal Bank of Canada. (2024). The role of credit in wealth creation. Royal Bank of Canada. Available online: https://www.rbcwealthmanagement.com/en-ca/insights/the-role-of-credit-in-wealth-creation (accessed on 24 February 2024).
57. Schapire, R. E. (2013). Explaining AdaBoost. In Empirical inference (pp. 37–52). Springer.
58. Sheridan, R. P., Liaw, A., & Tudor, M. (2021). Light gradient boosting machine as a regression method for quantitative structure–activity relationships. arXiv.
59. Sinha, S. (2014). The importance of community: How modular organization of social networks affects their collective dynamics. Studies in Microeconomics, 2(1), 49–61.
60. Statistics Indonesia. (2023a). Average monthly net wage in Indonesia as of August 2022, by type of occupation and by gender. Statistics Indonesia. Available online: https://www.statista.com/statistics/1251401/indonesia-average-monthly-income-by-sector-andgender/ (accessed on 20 February 2024).
61. Statistics Indonesia. (2023b). Average monthly net wage of employees in Indonesia from February 2010 to February 2023. Statistics Indonesia. Available online: https://www.statista.com/statistics/1065801/indonesia-average-monthly-net-wage-of-employees/ (accessed on 20 February 2024).
62. Statistics Indonesia. (2023c). Average urban monthly net wage in Indonesia as of February 2023, by sector. Statistics Indonesia. Available online: https://www.statista.com/statistics/996428/average-urban-monthly-net-wage-by-sector-indonesia/ (accessed on 20 February 2024).
63. Tan, J., Zhang, H., & Wang, L. (2015). Network closure or structural hole? The conditioning effects of network-level social capital on innovation performance. Entrepreneurship: Theory and Practice, 39(5), 1189–1212.
64. Tash, M. S., Kolesnikova, O., Ahani, Z., & Sidorov, G. (2024). Psycholinguistic and emotion analysis of cryptocurrency discourse on X platform. Scientific Reports, 14(1), 8585.
65. Tatachar, A. V. (2021). Comparative assessment of regression models based on model evaluation metrics. International Research Journal of Engineering and Technology, 8(9), 853–860.
66. Temin, P., & Voth, H.-J. (2006). Banking as an emerging technology: Hoare’s Bank, 1702–1742. Financial History Review, 13(2), 149–178.
67. Tovanich, N., Centellegher, S., Bennacer Seghouani, N., Gladstone, J., Matz, S., & Lepri, B. (2021). Inferring psychological traits from spending categories and dynamic consumption patterns. EPJ Data Science, 10(1), 24.
68. Vapnik, V. N. (2000). The nature of statistical learning theory. Springer.
69. Wang, Q., Zhang, C., & Li, Z. (2022). The role of social media in financial risk prediction: Evidence from China. Asia-Pacific Journal of Financial Studies, 51(4), 618–650.
70. Wijaya, T. (2023). The rise of innovative credit scoring system in Indonesia: Assessing risks and policy challenges. Available online: www.cips-indonesia.org (accessed on 28 February 2024).
71. Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
72. Wu, T., Zhang, W., Jiao, X., Guo, W., & Alhaj Hamoud, Y. (2021). Evaluation of stacking and blending ensemble learning methods for estimating daily reference evapotranspiration. Computers and Electronics in Agriculture, 184, 106039.
73. Yao, G., Hu, X., Xu, L., & Wu, Z. (2023). Using social media information to predict the credit risk of listed enterprises in the supply chain. Kybernetes, 52(11), 4993–5016.
74. Yates, L. A., Aandahl, Z., Richards, S. A., & Brook, B. W. (2023). Cross-validation for model selection: A review with examples from ecology. Ecological Monographs, 93(1), e1557.
75. Zhang, X. (2024). Financial data anomaly recognition model based on improved support vector machine. In Intelligent computing technology and automation. IOS Press.
76. Zhang, Y., Jia, H., Diao, Y., Hai, M., & Li, H. (2016). Research on credit scoring by fusing social media information in online peer-to-peer lending. Procedia Computer Science, 91, 168–174.
77. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2), 301–320.
78. Zusrony, E., Purnomo, H. D., & Prasetyo, S. Y. J. (2019). Employee communication network mapping analysis using social network analysis. Intensif: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi, 3(2), 145.
Figure 1. The methodology workflow.
Figure 2. Correlation matrix heatmap for classification models.
Figure 3. Correlation matrix heatmap for scoring models.
Figure 4. Mean final score by education feature.
Figure 5. Mean final score by occupation feature.
Figure 6. Distribution of creditworthiness in classification models.
Table 1. Creditworthiness domain experts.
ID | Profession | Industry/Origin | Expertise
EXP001 | Credit Risk Analyst | Bank | Credit risk assessment, borrower profile evaluation, financial behavior analysis, and credit scoring model validation.
EXP002 | Fintech Data Scientist | Fintech/Lending Platform | Machine learning model development, feature engineering for credit scoring, and social media data analytics.
EXP003 | Credit Policy Regulator | Regulator | Credit policy formulation, compliance with lending regulations, and risk management oversight.
EXP004 | Professor of Finance and Risk Management | Academia | Credit risk modeling, financial inclusion studies, big data analytics in finance, and academic validation of financial models.
EXP005 | Actuary/Credit Scoring Consultant | Society of Actuaries, Indonesia | Real-world credit scoring applications, feature prioritization, and advanced analytics for credit decisions.
Table 2. Domain expert tasks.
Tasks | EXP001 | EXP002 | EXP003 | EXP004 | EXP005
Profile Selection | Yes | Yes | No | Yes | No
Features Weight | Yes | Yes | Yes | No | Yes
Model Validation | Yes | Yes | Yes | Yes | Yes
Table 3. Defining the data to collect from LinkedIn.
Category | Description
Demographic | Names (coded for privacy); current company; past companies; current or latest education; past education
User Activity | Posts (textual content); comments (textual content); engagement metrics (count of posts, comments, reactions)
Social Network | Direct connections (first-degree); secondary connections (second-degree)
Table 4. Feature categories extracted from LinkedIn.
Category | Features | Reference
Demographic | Age; Education; Occupation | Enhancing Credit Scoring Model Performance by a Hybrid Scoring Matrix (Bo Wen et al., 2013) (with adjustment)
Demographic | Salary | Salary Guide Indonesia 2024 (Persolkelly, 2024) (with adjustment)
Psycholinguistic | Analytical; Clout; Authenticity; Emotional Tone | When Small Words Foretell Academic Success: The Case of College Admissions Essays (Pennebaker et al., 2014); Linguistic Inquiry and Word Count: LIWC2015 (Pennebaker & Boyd, 2015)
Personality | Openness; Conscientiousness; Extraversion; Agreeableness; Neuroticism | New Approach of Measuring Human Personality Traits Using Ontology-Based Model from Social Media Data (Alamsyah et al., 2021)
Social Network | Degree; Betweenness Centrality; Closeness Centrality | Employee Communication Network Mapping Analysis Using Social Network Analysis (Zusrony et al., 2019)
Social Network | Modularity | The Importance of Community: How Modular Organization of Social Networks Affects Their Collective Dynamics (Sinha, 2014)
Social Network | Density | Network Closure or Structural Hole? The Conditioning Effects of Network-Level Social Capital on Innovation Performance (Tan et al., 2015)
Table 5. Descriptive statistics for credit scoring variables analysis.
Variable | Unit | Value Range | Data Count | Mean | Std Dev | Min | 25th Percentile | Median | 75th Percentile | Max
Age | years old | – | 1000 | 40.88 | 13.40 | 18.00 | 29.00 | 41.00 | 52.00 | 64.00
Salary | IDR/month | – | 1000 | 9,576,061 | 6,773,190 | 0.00 | 4,120,250 | 8,412,000 | 13,471,500 | 33,285,000
Analytic | score | 0–100 | 1000 | 49.94 | 29.24 | 0.01 | 23.67 | 51.29 | 74.14 | 99.97
Clout | score | 0–100 | 1000 | 51.37 | 29.46 | 0.03 | 25.49 | 52.08 | 77.11 | 99.97
Authenticity | score | 0–100 | 1000 | 51.04 | 28.88 | 0.10 | 26.10 | 50.87 | 76.27 | 99.97
Tone | score | 0–100 | 1000 | 50.30 | 29.02 | 0.02 | 24.32 | 50.55 | 75.14 | 99.92
Closeness centrality | score | 0–1 | 1000 | 0.49 | 0.28 | 0.00 | 0.24 | 0.49 | 0.73 | 1.00
Betweenness centrality | score | 0–1 | 1000 | 0.26 | 0.14 | 0.00 | 0.14 | 0.26 | 0.38 | 0.50
Modularity | score | 0–1 | 1000 | 0.49 | 0.28 | 0.00 | 0.24 | 0.48 | 0.73 | 1.00
Density | score | 0–1 | 1000 | 0.51 | 0.28 | 0.00 | 0.28 | 0.51 | 0.76 | 1.00
Agreeableness | score | 0–100 | 1000 | 50.26 | 29.00 | 0.00 | 26.00 | 50.00 | 77.00 | 99.00
Conscientiousness | score | 0–100 | 1000 | 48.11 | 29.17 | 0.00 | 23.00 | 46.00 | 74.00 | 99.00
Extraversion | score | 0–100 | 1000 | 48.85 | 28.70 | 0.00 | 24.00 | 50.00 | 72.00 | 99.00
Neuroticism | score | 0–100 | 1000 | 49.24 | 29.12 | 0.00 | 24.00 | 50.00 | 73.25 | 99.00
Openness | score | 0–100 | 1000 | 50.05 | 28.44 | 0.00 | 26.00 | 50.00 | 75.00 | 99.00
Final Score | score | 0–100 | 1000 | 62.59 | 6.87 | 39.41 | 58.24 | 62.35 | 67.06 | 81.18
Table 6. Category data type features.
Category Type | Category | Count
Education | High School | 344
Education | Master | 191
Education | Doctor | 176
Education | Diploma | 146
Education | Bachelor | 143
Occupation | Public Officer | 317
Occupation | White Collar | 285
Occupation | Blue Collar | 123
Occupation | Other | 118
Occupation | Freelance | 102
Occupation | No Job | 55
Credit Status | Bad | 630
Credit Status | Good | 370
Table 7. Classification model performance result.
Model | Accuracy | Precision | Recall | F1 Score | AUC ROC
AdaBoost | 0.903 | 0.898 | 0.891 | 0.894 | 0.969
CatBoost | 0.851 | 0.864 | 0.815 | 0.830 | 0.949
SVM | 0.865 | 0.856 | 0.852 | 0.854 | 0.938
Logistic regression | 0.884 | 0.877 | 0.872 | 0.874 | 0.948
Stacking classifier | 0.950 | 0.947 | 0.947 | 0.947 | 0.985
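The classification metrics reported in Table 7 follow the standard confusion-matrix definitions. As a minimal, self-contained sketch (plain Python with hypothetical good/bad labels, not the authors' implementation):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary credit labels
    (1 = good credit risk, 0 = bad credit risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels for five borrowers
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
```

AUC ROC, the remaining column of Table 7, is computed from predicted probabilities rather than hard labels, so it is omitted from this sketch.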
Table 8. Sample data classification.
Name | Age | Education | Occupation | Salary | Analytic | Clout | Authenticity | Tone
Person 1 | 36 | High School | Blue Collar | 3,993,000 | 3.45 | 67.47 | 7 | 96.45
Person 2 | 47 | Doctor | White Collar | 20,326,000 | 75.09 | 72.83 | 63.37 | 88.23
Person 10 | 46 | High School | Blue Collar | 2,524,000 | 69.29 | 94 | 46.12 | 22.39
Person 11 | 35 | Diploma | White Collar | 11,056,000 | 31.09 | 81.16 | 36.82 | 28.12
Person 12 | 50 | Bachelor | White Collar | 10,946,000 | 80.13 | 45.89 | 42.96 | 99.66
Name | Closeness | Betweenness | Modularity | Density | Agreeableness | Conscientiousness | Extraversion | Neuroticism | Openness | Creditworthiness | Stacking Classification
Person 1 | 0.34 | 0.2 | 0.93 | 0.62 | 87 | 37 | 81 | 58 | 50 | 0 | 0
Person 2 | 0.99 | 0.04 | 0.4 | 0.42 | 24 | 68 | 34 | 96 | 94 | 1 | 1
Person 10 | 0.65 | 0.25 | 0.94 | 0.21 | 47 | 41 | 61 | 95 | 74 | 0 | 0
Person 11 | 0.91 | 0.35 | 0.26 | 0.54 | 76 | 34 | 24 | 10 | 65 | 1 | 1
Person 12 | 0.4 | 0.28 | 0.64 | 0.02 | 13 | 82 | 94 | 5 | 47 | 1 | 0
Table 9. Scoring model performance result.
Model | R² | RMSE | MAE
CatBoost | 0.9072 | 4.3168 | 2.0748
SVM | 0.8419 | 7.3189 | 2.7037
Elastic net | 0.8475 | 7.0115 | 2.64
Stacking regressor | 0.9298 | 3.2388 | 1.7962
LightGBM | 0.8011 | 9.2424 | 3.0381
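The scoring models in Table 9 are regression models, so they are judged by how far predicted credit scores fall from the reference scores rather than by class labels. Two standard error measures, RMSE and MAE (RMSE ≥ MAE always holds), can be sketched as follows; the sample values are hypothetical, not data from the study:

```python
import math

def regression_metrics(y_true, y_pred):
    """Root-mean-squared error and mean absolute error between
    reference credit scores and model-predicted scores."""
    errs = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae

# Hypothetical scores on the 0-100 credit scale
rmse, mae = regression_metrics([60.0, 70.0, 65.0], [58.0, 71.0, 66.0])
```

RMSE penalizes large misses quadratically while MAE weights all errors equally, which is why the two diverge more for models with occasional extreme errors.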
Table 10. Sample data scoring.
Name | Age | Education | Occupation | Salary | Analytic | Clout | Authenticity | Tone
Person 1 | 36 | High School | Blue Collar | 3,993,000 | 3.45 | 67.47 | 7 | 96.45
Person 2 | 47 | Doctor | White Collar | 20,326,000 | 75.09 | 72.83 | 63.37 | 88.23
Person 10 | 46 | High School | Blue Collar | 2,524,000 | 69.29 | 94 | 46.12 | 22.39
Person 11 | 35 | Diploma | White Collar | 11,056,000 | 31.09 | 81.16 | 36.82 | 28.12
Person 12 | 50 | Bachelor | White Collar | 10,946,000 | 80.13 | 45.89 | 42.96 | 99.66
Name | Closeness | Betweenness | Modularity | Density | Agreeableness | Conscientiousness | Extraversion | Neuroticism | Openness | Creditworthiness | Stacking Scoring
Person 1 | 0.34 | 0.2 | 0.93 | 0.62 | 87 | 37 | 81 | 58 | 50 | 0 | 59.663
Person 2 | 0.99 | 0.04 | 0.4 | 0.42 | 24 | 68 | 34 | 96 | 94 | 1 | 71.980
Person 10 | 0.65 | 0.25 | 0.94 | 0.21 | 47 | 41 | 61 | 95 | 74 | 0 | 56.817
Person 11 | 0.91 | 0.35 | 0.26 | 0.54 | 76 | 34 | 24 | 10 | 65 | 1 | 67.372
Person 12 | 0.4 | 0.28 | 0.64 | 0.02 | 13 | 82 | 94 | 5 | 47 | 1 | 64.352
Table 11. Descriptive statistics of expert scores and ICC results.
Metric | Value
Number of Experts | 5
Mean Score | 72.8
Standard Deviation | 6.4
Minimum Score | 61
Maximum Score | 85
ICC Score | 0.84
Reliability Level | Good
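The ICC of 0.84 in Table 11 falls in the "good" reliability band of Koo and Li (2016). The paper does not state which ICC form was computed; as an illustrative sketch only, ICC(2,1) — two-way random effects, absolute agreement, single rater — can be derived from a targets-by-raters rating matrix via the ANOVA mean squares:

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings is a list of rows (rated targets) by columns (raters)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # mean square for rows (targets)
    msc = ss_cols / (k - 1)              # mean square for columns (raters)
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical ratings: four borrower profiles scored by two raters
icc = icc2_1([[7, 9], [5, 6], [8, 8], [4, 5]])
```

Other ICC forms (e.g., average-rater ICC(2,k)) use different denominators, so reporting the chosen form alongside the coefficient, as Koo and Li recommend, matters for interpretation.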
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alamsyah, A.; Hafidh, A.A.; Mulya, A.D. Innovative Credit Risk Assessment: Leveraging Social Media Data for Inclusive Credit Scoring in Indonesia’s Fintech Sector. J. Risk Financial Manag. 2025, 18, 74. https://doi.org/10.3390/jrfm18020074
