1. Introduction
The deep integration of finance and technology has become a global trend, with the Internet and information communication technologies significantly enhancing services such as mobile payments, digital insurance, investments, and information lending. Unlike traditional financial institutions that tend to favor the wealthy, Internet lending has indeed broadened the coverage of the financial system, especially through large tech companies providing loans using digital technologies, which significantly reduce the economic cost of financial services [
1]. This has made loan services easier to access for low-income groups and small and medium-sized enterprises (SMEs), promoting inclusive finance [
2,
3,
4]. Some claim that fintech performs financial activities over the Internet, using the Internet as a technological tool [
5], and that this technology-driven financial innovation, which focuses on finance, centers around financial services [
6]. While BigTech companies such as Ant Group leverage their technological prowess to raise funds, regulators have recalled the failure of peer-to-peer (P2P) lending, raising concerns about the risk management strategies of similar BigTech lending and about financial stability.
To identify the risks that accompany technology’s inclusive advantages in finance, this article examines the expansion of financial business through the lens of big data risk control, with a specific emphasis on credit operations, which are considered the most representative. By doing so, we seek to explore the boundary between inclusiveness and risk and to provide some insight into addressing these issues.
Fintech, in theory, is a notion that falls on a spectrum, positioned between conventional financial lending in markets and “financial disintermediation” [
7]. Recent research emphasizes the importance of economies of scope in understanding the dynamics of Big Tech corporations, highlighting that bundling services and leveraging user data significantly contribute to their growth [
8]. Asymmetric information and transaction costs underpin conventional financial lending and markets [
7]. As Internet technology advances, financial practices will approach the state without financial intermediaries that corresponds to the Walrasian general equilibrium [
9]. In its 2020 ‘Regulations on Internet Lending by Commercial Banks’, the China Banking and Insurance Regulatory Commission defined Internet lending as follows: commercial banks use Internet and mobile communications technologies to cross-verify and manage risk through risk data and models, automatically process loan applications online, conduct risk assessments, and complete core operations such as credit approval, contract signing, loan disbursement, and post-loan management, thereby providing personal loans and working capital loans to eligible borrowers for consumption, daily business operations, and other purposes. Technological empowerment of finance, particularly in the credit business, reduces transaction costs and information asymmetry between lenders and borrowers, broadening the set of feasible financial transactions and making previously impossible transactions possible [
10], enabling “financial disintermediation” to be more sophisticated.
A prominent Internet lending approach is “peer-to-peer (P2P)”—unsecured credit loans made between lenders and borrowers via online lending platforms rather than banks [
11]. At the Financial Forum 2021, Liu Fushou, the chief lawyer of the China Banking and Insurance Regulatory Commission, reported that the number of operational P2P online lending institutions had dropped from 5000 at its peak to zero by mid-November 2020 (See
https://www.cnfin.com/news-xh08/a/20211022/2005181.shtml for details (accessed on 30 November 2022)). The thirteen years of P2P practice have revealed that reality is not as rosy as theory implies. Identifying borrowers’ information has dominated the theoretical study of this technique. Scholars have used empirical data from different P2P platforms to discover that basic borrower characteristics, such as gender, age, marital status, education, and race [
12,
13,
14], financial characteristics [
15,
16], and linguistic characteristics [
17,
18] significantly affect borrowing. It is true that many of these factors affect P2P lending success, but P2P online lending companies differ significantly from traditional financial institutions in their business models, meaning lending decisions still mainly rely on explicit hard data such as asset and liability data. More data do not modify the credit model or reduce screening time [
19].
Thus, making financing more convenient for low- and middle-income borrowers and for small and medium-sized firms has raised inclusivity but also default rates, increasing financial risks. However, BigTech lending, pioneered in China, uses advanced data analytics for risk management [
20]. Large Internet technology companies benefit from their “ecosystems”, cutting search costs compared to P2P. The long-tail impact of cheap search costs allows these companies to escape the P2P paradigm and reach the under-explored blue ocean market of conventional financial institutions’ long-tail industries [
21], granting them a competitive advantage. Big data technology removes “information noise”, allowing prospective inclusive finance consumers to be profiled precisely via information search and small loans and micro-loans to be scaled up; herein lies its worth [
22]. BigTech lending credit risk management benefits from the big tech ecosystem and big data risk control models [
23]. This fintech approach enhances SME default risk prediction, facilitating financial inclusion [
24].
As seen above, peer-to-peer (P2P) lending has ceased operations, whereas BigTech lending, despite facing challenges, continues to thrive. While BigTech lending has the potential to advance inclusive finance, it shares genetic similarities with P2P lending, raising concerns about the possibility of encountering similar pitfalls. The “black box” nature of big data-driven risk control also poses questions about its long-term trustworthiness. These issues highlight the need for robust regulatory frameworks to ensure that BigTech lending remains a sustainable and trustworthy component of inclusive financial services. This article uses mathematical models and numerical simulations to examine how technology enables finance to serve “long-tail” clients inclusively, and the associated risks, taking credit, the most common business in digital finance, as the example.
This study finds that expanding business boundaries decreases data representativeness, generating systematic disparities between the unknown features of out-of-sample data and the sample data, which harms big data risk control. This, in turn, may trigger potential financial risks. Specifically, in the competition for long-tail-end customers, a monopolistic big data lending model might have an inherent incentive to attract borrowers with excessively high risks, thereby increasing credit risks. Additionally, big data lending strategies might entirely replace conventional ones. Numerical simulations further indicate that, when funds are scarce, interest rate competition among multiple institutions brings the market to its potential risk boundary sooner.
This article makes three marginal contributions. First, it expands credit research in digital finance: the existing literature extensively emphasizes financial technology’s benefits and role in financial inclusion, whereas studies of its potential risks are few. This study examines the inclusivity and invisible risks of technology-based lending on a wide scale, using credit services as an example to bring new perspectives to the existing research on this topic. Secondly, this paper brings together two strands of evidence: the link between the representativeness of borrower data and increased loan defaults [
19], as well as the potential vulnerabilities of fintech lending institutions during the pandemic [
25]. This paper integrates the front-end of fintech credit business expansion with the back-end of risk control technology and uses mathematical modeling and numerical simulation to explore fintech’s potential risks, supplementing existing research. Thirdly, the article shows that the boundaries of technical risk management do not match the operational constraints of financial institutions participating in BigTech lending, which creates risk in the commercial applications of this approach. This finding offers regulators a theoretical basis for targeted oversight, as BigTech firms face risk challenges [
26]. In summary, this paper evaluates the advantages and hazards of fintech from a modeling perspective, to help regulation keep pace with the irreversible wave of technology and to support healthy and compliant fintech growth.
The subsequent sections of this paper are organized as follows:
Section 2 constructs a mathematical model for a theoretical dissection of how technology empowers finance;
Section 3 conducts numerical model analysis;
Section 4 further discusses the conclusions from the mathematical analysis; and
Section 5 concludes the paper.
2. Theoretical Model
This paper focuses on two types of entities in constructing mathematical models: borrowers, or those with loan demand, and lending institutions, including emerging Internet-based BigTech credit providers and traditional bank-type lenders. Both observable factors, such as income, and unobservable latent risk variables affect each borrower’s default risk. New Internet-based BigTech credit providers are compared to established banks. Owing to market segmentation, this study excludes large commercial banks, which serve non-tail-end customers. Traditional banks, usually regional commercial banks and here referred to as financial institutions, share clients with BigTech lenders. Different lending institutions use different methods to detect borrowers’ unobservable risk factors, and these methods influence loan costs and business models. Traditional banks identify risk through due diligence and financial audits. In addition to traditional risk models, they rely on the business manager’s knowledge of the client, built from the data obtained, to assess client risk. The institution has to engage with each client in detail, which entails a high marginal cost and decreasing returns to scale. By contrast, Internet-based BigTech credit institutions take advantage of their ecosystems to obtain consumer credit data at a reduced cost. Information-based risk factor detection improves with better data representativeness: more comprehensive credit data and default histories of loan clients in the institutional database improve risk assessment for new customers. However, if the data are limited or do not capture the default characteristics of a certain customer group, the assessment of new clients’ risk will be less accurate. Artificial intelligence and big data technologies thus produce data-driven increasing returns to scale, in contrast to the conventional credit institution’s credit assessment approach.
Under the circumstances described above, this paper examines the differences in market structure and interest rate levels in the steady state as market capacity increases, considering the presence of (multiple) large technology credit institutions in the market and their coexistence with traditional financial institutions in different competitive scenarios. It will examine how BigTech credit affects credit market competitiveness, risk identification, credit bubbles, credit accumulation, and other key areas.
2.1. The Model
2.1.1. Market Subjects
Borrowers: The borrowers in this model are characterized by tuple , where denotes the state space, denotes the set of borrower income, and denotes the set of risk states. We assume without loss of generality. is a probability density that characterizes the joint distribution of the borrowers in terms of income and risk states. Let be a measurable function that, for each type of borrower , denotes its potential default probability. For each type of borrower , we assume that its income is public information to the institution, while its risk state is an unobservable variable. We further assume that the lending institution does not know the distribution of borrowers or the conditional distribution of borrowers’ risk states on observable income . The lending institution needs to gather market data to estimate the risk state distributions and , and infer the borrower’s risk state before granting a loan and setting the interest rate.
For individual borrowers, whether to borrow from a given institution depends on the interest rate level offered by the institution. For simplicity, this paper assumes that each borrower has a reservation interest rate characterized by . A given borrower is willing to borrow from an institution at an interest rate if and exceeds the lending rate. If , a given borrower is willing to deposit a unit of income into the institution at as long as it exceeds . Depositors have no default issue while borrowing; therefore, this study assumes their default rate is always 0, i.e., when . To streamline the analysis, we assume that for each type of borrower , only one unit of loan (deposit) can occur, so the borrower’s decision in each period is a binary choice. In the more general case, the total amount of deposits and loans made by each borrower at a given interest rate can be characterized by adjusting the distribution function. We also assume that depositors’ reservation interest rate for depositors, characterized by remains constant, effectively necessitating an excess of funds in the deposit market. The spontaneous reservation interest rate choice will complicate money shortages in the deposit market; therefore, it is not used in the subsequent analysis, and in situations where there is a shortage of funds, lending institutions allocate funds based on competitive priority.
Lending institutions: This article examines two types of lending institutions, data-based BigTech credit institutions and conventional ones such as banks. According to the previous discussion, the credit model adopted by BigTech credit institutions based on Internet and big data technologies has an incremental benefit effect of data size, a feature that can be portrayed by the following Markowitz-based mean-variance expected utility function
where
is the fixed cost of the technology input and
is a set of consumers obtained by independent sampling from the distribution
.
is the potential customer base of the institution.
is the maximum norm of the given function.
is the empirical approximation of the distribution
constructed by the set
.
is the institution’s inferred value of the true risk state
of a consumer
given information about the potential customer base
. We assume that the conditional distribution of
given
and
is the distribution of the normal random variable
. Given
and
, the posterior distribution of
is the distribution of the normal random variable
, provided that the prior distribution is uniform. The profit function of the institution Equation (1) requires that the net profit of the institution equals the difference between expected return and risk cost (described by the posterior variance of the speculation error
) and the fixed cost of technology investment based on the customer group
.
We assume that the traditional bank-type lenders’ expected utility function, based on the decreasing returns to scale, is
where the cost function
is monotonically increasing with
, and
. Traditional bank-type lenders’ utility decreases monotonically with clientele size, so the clientele must be finite; its size is determined by the equilibrium condition in which the difference between the expected return on a marginal loan and its marginal cost is zero. Traditional banks use a credit approach that incorporates client monitoring and communication, which allows them to collect reliable credit information from consumers even without a massive clientele. Therefore, we assume the risk cost term in utility function (2) is 0. However, credit reporting will bring higher financial service costs, and for many clients, management expenses may outpace potential advantages.
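The contrast between the two utility functions can be illustrated numerically. The sketch below does not reproduce Equations (1) and (2); it assumes a hypothetical constant expected return per unit loan, a risk cost proportional to the posterior variance sigma^2 / n (the standard result for a normal mean with known variance under a flat prior), and a hypothetical quadratic cost for the bank. All function and parameter names are illustrative.

```python
def posterior_variance(sigma2: float, n: int) -> float:
    """Posterior variance of a normal mean after n observations,
    assuming known noise variance sigma2 and a flat (uniform) prior."""
    if n <= 0:
        raise ValueError("need at least one observation")
    return sigma2 / n


def bigtech_profit(n: int, mean_return: float, sigma2: float,
                   risk_aversion: float, fixed_cost: float) -> float:
    """Hypothetical data-driven lender: expected return on n unit loans,
    minus a risk cost that shrinks as the customer base grows,
    minus a fixed technology cost."""
    risk_cost = risk_aversion * posterior_variance(sigma2, n)
    return n * mean_return - risk_cost - fixed_cost


def bank_profit(n: int, mean_return: float, unit_cost: float) -> float:
    """Hypothetical traditional lender: no risk-cost term, but a convex
    (here quadratic, an assumption) monitoring cost in clientele size."""
    return n * mean_return - unit_cost * n ** 2


def bank_optimal_size(mean_return: float, unit_cost: float) -> float:
    """Clientele size at which marginal return equals marginal cost
    for the quadratic cost above: mean_return = 2 * unit_cost * n."""
    return mean_return / (2.0 * unit_cost)
```

Under these assumed forms, the BigTech lender's profit per customer rises with n while the bank's falls, and the bank's clientele is pinned down by the marginal condition described above.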
2.1.2. Optimization Problems and Competition Rules for Lending Institutions
If there is only a unique credit institution in the market and assuming that the market capacity is expanding over time, i.e., given a market sequence , the optimal interest rate and the optimal market share at the time will be obtained by solving the following sequential optimization problem:
credit institutions () compete on interest rates and customers, and the optimal interest rate vector with market share is determined by the Nash equilibrium of the following interest rate game:
For , define , and ;
For any
and any
, then, in period
, given the equilibrium interest rates of the other institutions with market shares
and
, the equilibrium interest rate
and market share
should solve the following constrained optimization problem.
This single (multiple) institution interest rate pricing and market share capture problem assumes that once a customer becomes a customer of a given institution, the institution cannot voluntarily terminate the lending contract without competition from other institutions.
2.2. Equilibrium
Given the borrower distribution , this paper considers the two scenarios that the market is well-funded (i.e., ) and that the market is not well-funded (i.e., , respectively), where is the indicator function. First, consider the well-funded scenario, where we have our first proposition.
Proposition 1. Assume
- i
For every market , there is , where denotes the cardinality of the set , and is a positive integer. In other words, the market expands by consumers each time.
- ii
and hold for all , i.e., with the most optimistic risk assessment for each consumer, there is a positive expected return on lending to each consumer.
- iii
and hold for all , i.e., with the most pessimistic risk assessment for each consumer, there is a negative expected return on lending to each consumer.
Then the following hold.
- i
In the case of a single institution: , and .
- ii
In the case of multiple institutions: ,
- iii
If we additionally define
- (a)
For is defined to make the ith element in the ascending sorted
- (b)
Let , where
- (c)
Let
- (d)
Let
- (e)
Let
then every element in is monotonically non-increasing along and
The first conclusion of Proposition 1 indicates that when there is only one large technology credit institution in the market, this institution will monopolize the entire market with a probability of 1, and its optimal loan interest rate will be consistent with the borrower’s reservation interest rate, i.e., the consumer surplus of borrowers is zero.
When there are risky borrowers such that the expected net return on their borrowing , this group of borrowers will still be institutional clients. Thus, institutions have an intrinsic incentive to sign up high-risk borrowers despite big data. According to the first conclusion of Proposition 1, a single large technology credit institution will cause excessive credit expansion (excessive inclusion), excessive risk-taking that cannot be covered by equilibrium interest rates, and a credit bubble, especially when the proportion of high-risk borrowers in the market is large enough that .
The second and third conclusions of Proposition 1 suggest that, in the case of multiple BigTech credit institutions, there is at least one institution whose entire clientele constitutes a subset that is fully representative of all clients. All institutions’ equilibrium interest rates will converge with probability 1, and no customer’s loan rate may provide any institution with a positive expected return. Therefore, in terms of the equilibrium interest rate, any borrower is considered to have excessive risk.
Due to the diminishing returns of traditional banking loan institutions’ expected utility, combined with Proposition 1, we have the following corollary.
Corollary 1. If both BigTech credit institutions and traditional banks provide lending services, then , where denotes the optimal market share of traditional banks . In other words, the market share of traditional banks will converge to 0 with probability 1, i.e., squeezed out by BigTech institutions.
The number of borrowers that potentially generate expected profits will exceed the number of potential savings customers in a poorly funded market, i.e., , resulting in competition among loan customers, some of whom will not receive loans. Insufficient funds leading to competition among loan clients result in a much more complex limit state of the optimal (equilibrium) interest rate and market share for a single (multiple) institution than that in a well-funded market. For simplicity, this article assumes a constant savings interest rate of 0. Multiple institutions compete by total quantity. If there are loan customers with positive marginal returns and savings customers, and , then only the first customers can obtain loans after all loan customers are sorted in descending order by expected return. If there is only one BigTech credit institution in the market, the following proposition can be obtained:
Proposition 2. When the market is poorly funded and there is only one BigTech credit institution, if we let and , then
When the market is under-funded, competitive forces might compel BigTech credit institutions to provide more loans to high-interest-rate groups. Adverse selection in the market caused by asymmetric information, i.e., high-interest-rate groups have a higher risk tolerance, cannot be eliminated through big data according to Proposition 2.
In a poorly funded market where there are multiple BigTech credit institutions and/or traditional lenders competing, the competitive stable state is much more complex than when there is only one BigTech credit institution. Thus, we will perform a computer simulation to assess multi-agency competition and compare it to the well-funded scenario in the next section.
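The rationing rule assumed for the poorly funded market (sort loan customers by expected return in descending order and fund only as many as there are units of deposits) can be sketched as follows. The function name and the list-based representation are illustrative assumptions, not part of the model's notation.

```python
from typing import List, Sequence


def allocate_loans(expected_returns: Sequence[float],
                   n_deposit_units: int) -> List[int]:
    """Return the indices of borrowers who receive a unit loan.

    Only borrowers with a positive expected return are candidates.
    Candidates are ranked by expected return (highest first), and the
    first n_deposit_units of them are funded.
    """
    if n_deposit_units < 0:
        raise ValueError("n_deposit_units must be non-negative")
    candidates = [(ret, idx) for idx, ret in enumerate(expected_returns)
                  if ret > 0]
    # Stable sort: ties keep their original order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in candidates[:n_deposit_units]]
```

For example, with expected returns `[0.02, -0.01, 0.05, 0.03]` and two units of deposits, borrowers 2 and 3 are funded and borrower 0 is rationed out even though lending to that borrower would be profitable.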
3. Numerical Simulation
This section will exhibit competitive equilibrium interest rates, market shares, and representative data trends for single/multiple BigTech credit institutions using numerical simulations. The simulation settings are set as follows:
- (a)
Borrowers correspond to the parameter space , and the distribution of borrowers on it is , that is, the product distribution consisting of the Chi-square distribution on with degree of freedom 1 and the standard normal distribution on .
- (b)
The default risk function of the borrower is , where denotes the cumulative distribution function of the standard normal distribution. According to the above setting, the borrower’s probability of default increases with his risk factor and decreases with his income level .
- (c)
The reservation interest rate for the borrower is , where the reservation interest rate of each borrower is the sum of a benchmark interest rate and the default risk premium. We set .
- (d)
In the profit function of a BigTech credit institution, we set and
With the above parameter settings, each simulation in this paper independently and repeatedly samples 1,000,000 players from the borrower space to obtain a borrower set according to the probability distribution . Let the market expansion last times, where for each . Market capacity is , i.e., there are 1000 potential customers for each expansion. We investigate the different scenarios of the number of BigTech credit institutions, and denote the equilibrium market share sequence , data representative sequence , and in the case of , the equilibrium interest rate sequence .
We will simulate well-funded and poorly funded markets based on the above setting. We assume that depositors, as the source of money, have deposited funds with lending institutions at a set interest rate, hence the simulation assumes that this rate stays 0. In the case of a well-funded market, we assume the total amount of depositors is 1.25 times that of borrowers ; and the reverse holds true in the scenario of a poorly funded market.
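The borrower-sampling step of this setup can be sketched with the standard library alone. The distributions follow settings (a) to (c): income is chi-square with one degree of freedom and the risk factor is standard normal. The exact default function and parameter values are not reproduced here, so the form Phi(risk - income), the use of the default probability itself as the risk premium, and the benchmark rate of 0.05 are all assumptions chosen only to respect the stated monotonicity.

```python
import math
import random
from typing import List, Tuple


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_borrower(rng: random.Random) -> Tuple[float, float]:
    """Draw (income, risk): income ~ chi-square(1), risk ~ N(0, 1)."""
    income = rng.gauss(0.0, 1.0) ** 2  # square of a standard normal
    risk = rng.gauss(0.0, 1.0)
    return income, risk


def default_probability(income: float, risk: float) -> float:
    """Assumed form: increasing in risk, decreasing in income."""
    return normal_cdf(risk - income)


def reservation_rate(income: float, risk: float,
                     benchmark: float = 0.05) -> float:
    """Benchmark rate plus a default risk premium (assumed equal to
    the default probability; benchmark value is hypothetical)."""
    return benchmark + default_probability(income, risk)


def expansion_batch(rng: random.Random,
                    size: int = 1000) -> List[Tuple[float, float]]:
    """One market expansion: `size` new potential customers."""
    return [sample_borrower(rng) for _ in range(size)]
```

Calling `expansion_batch(random.Random(0))` yields one expansion of 1000 potential customers; a seeded `random.Random` keeps runs reproducible.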
Figure 1 and
Figure 2 depict the evolution of market share, data representativeness, and interest rates when funds are plentiful and scarce, respectively.
Figure 1a,b show the optimal market share and data representativeness for the case of a single BigTech credit institution.
Figure 1c,d show the equilibrium share and data representativeness when
.
Figure 1e shows the equilibrium interest rate when
(average by individual borrower). In
Figure 1d, the envelope of the institutions’ data representativeness trajectories is marked with red lines, i.e., the trajectory of
.
Figure 1e displays the institution’s average interest rate level
under zero-profit conditions (see definition iii(e) in Proposition 1), with a green line based on the number of borrowers. The difference between the mean and equilibrium interest rates, as well as the evolution trend of the standard deviation, are marked with red lines and error bars (where for each period
, the aforementioned standard deviation is defined as the sample standard deviation obtained by individual borrower l for the variable interest rate difference
).
Figure 1 presents the numerical test of Proposition 1. In the scenario of a single BigTech credit institution, the institution’s market share converges towards 1, and its data representativeness measure (the Kolmogorov-Smirnov (KS) distance between the empirical distribution of its customer group and the actual borrower distribution) converges to 0, which verifies conclusion 1 of Proposition 1.
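To make the representativeness measure concrete, a two-sample KS distance can be computed as below. This is a minimal one-dimensional sketch: it compares a single borrower feature between an institution's customers and the whole market, whereas the simulation's borrowers have two characteristics, so reducing the comparison to one feature is an assumption made for brevity.

```python
import bisect
from typing import Sequence


def ks_distance(sample_a: Sequence[float],
                sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the two empirical distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The supremum is attained at one of the observed points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A value of 0 means the customer sample mirrors the market exactly, and a value of 1 means the two samples do not overlap at all, so a measure converging to 0 corresponds to an increasingly representative customer base.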
Figure 1e shows that in the multi-institutional situation, the equilibrium market interest rate decreases and eventually falls below zero on average. The intersection of the average equilibrium interest rate trajectory with the average zero-profit interest rate implies an overexpansion of credit and an overaccumulation of risk in a market-wide sense. The peak of
Figure 1e’s red error line will steadily shrink and potentially drop below 0 when additional institutions are included (
). This implies that for most borrowers, credit institutions’ equilibrium interest rates are insufficient to cover their default risk, and the market tends toward credit overexpansion and risk overaccumulation, verifying conclusion 3 in Proposition 1.
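The comparison between equilibrium and zero-profit interest rates can be illustrated with a stylized break-even calculation. The sketch below is not definition iii(e) of Proposition 1; it assumes a one-unit loan, a funding cost of 0 (matching the constant savings rate of 0 used in the simulation), and a total loss of principal and interest on default, which is an added simplifying assumption.

```python
def expected_net_return(rate: float, p_default: float) -> float:
    """Expected net return on a unit loan: repaid with interest with
    probability (1 - p_default), lost entirely otherwise."""
    return (1.0 - p_default) * (1.0 + rate) - 1.0


def zero_profit_rate(p_default: float) -> float:
    """Rate at which the expected net return is exactly zero:
    (1 - p)(1 + r) = 1, hence r = p / (1 - p)."""
    if not 0.0 <= p_default < 1.0:
        raise ValueError("p_default must lie in [0, 1)")
    return p_default / (1.0 - p_default)
```

Under these assumptions a borrower with a 20% default probability needs a rate of 25% for the lender to break even, so any equilibrium rate below that level implies that the borrower's default risk is not covered.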
Figure 1d depicts data representativeness dynamics in the multi-institution scenario. First, the minimum value of the data representativeness measure for all institutions converges toward zero, and each institution’s measure also tends to converge toward zero, validating and strengthening Conclusion 2 in Proposition 1. Next, unlike the scenario in
Figure 1b, when multiple institutions compete, the convergence towards 0 of each institution’s representativeness measure, and of the minimum across institutions, is not monotonic but exhibits a fluctuating decline. As
increases, the representativeness measure reaches a local minimum point, then reverses to increase until it reaches the next local maximum point before decreasing. The new minimum is less than the previous one, thus the local minima are cyclically converging towards 0. This convergence of cyclical fluctuations indicates market competitiveness. When the data representativeness metric is strictly greater than zero, an institution may misjudge a borrower’s default risk. In the case of a single institution, the institution is equally likely to misjudge any borrower. Thus, while recruiting new loan customers, the odds of various borrowers being absorbed are essentially equal, preventing a skewing of customer pool data towards a particular group. The multi-institutional scenario is different, with institutional competition driving down the equilibrium rate. Therefore, only a greater intake of clients who are at equilibrium interest rates and higher absolute interest rates is likely to make the institution profitable. As a result, when the representativeness measure of institutional data is strictly greater than zero, institutions not only misestimate the probability of default but are also more likely to err for high-interest-rate groups. The newly acquired customer base may tend to shift towards high-interest-rate groups, which will decrease data representativeness and cause the evolutionary trajectory to show a periodic growth trend (
Figure 1d). However, competitive pressure will hasten the overall drop in interest rates for groups with higher equilibrium interest rates, substituting high-interest-rate groups for low-interest-rate groups. This will bias institutional misjudgment behavior towards the initial low-interest-rate group in later iterations owing to competitive pressure, restoring institutional data representativeness and causing the periodic downward trend in
Figure 1d. This process also illustrates inter-institution competition’s intricacy.
Figure 1 and
Figure 2 show that the poorly funded market exhibits significantly different dynamics from the well-funded market. First, a single institution, constrained by financial limitations, cannot dominate the whole market, so some borrowers inevitably cannot obtain loans. At the same time, the institution’s data representativeness metric will no longer converge to zero, and it will rebound and rise after reaching a positive minimum point. The institution’s client base will continue to deviate from key lending groups and its variance from the market’s whole sample will grow.
Second, in the multi-institutional case, comparing
Figure 1c with
Figure 2c shows that, in the well-funded market, multi-institutional competition levels market shares in equilibrium. For the poorly funded market, competition among multiple institutions will force some to exit the market (i.e., their market share tends towards 0), granting the leading institution monopolistic strength (i.e., it has a significantly greater market share than other institutions, and the gap is growing). A more competitive market has caused complicated shifting patterns in data representativeness among institutions. Combining
Figure 2d,e, it is clear that institutions whose equilibrium market share tends to converge to zero exhibit more violent fluctuations in data representativeness, while the head institution, by virtue of its leading position, maintains a roughly constant level that is significantly greater than zero and fluctuates less. Compared with the institutional data trajectories in
Figure 1d with sufficient funds, the institutions whose equilibrium market share converges to zero in
Figure 2d show similar fluctuations, but at a much greater frequency. Due to growing competition between institutions, inadequate funding leads to more misjudgment behavior by institutions.
Figure 2d shows no fluctuations in the head institution’s data representativeness trajectory. The leading institution will progressively monopolize the market as smaller institutions are forced out or lose market share. The evolutionary trajectory of its data representativeness shows little volatility since its client base has stabilized.
In the case of multiple institutions competing in a poorly funded market, the equilibrium interest rate similarly displays a continuous decreasing trend, as seen in
Figure 1e and
Figure 2e. Credit growth and risk buildup will result from the equilibrium interest rate falling below institutions’ zero-profit threshold. This suggests that conclusion 3 in Proposition 1 applies to a poorly funded market.
Figure 2e shows that the mean of the equilibrium interest rate and zero profit interest rate intersect much sooner than in
Figure 1e. In a poorly funded market, inter-institutional interest rate competition will enable the market to discover the limits of excessive loan growth sooner.
Figure 2e shows that even with a modest number of institutions (
), the peak of the red error line will fall below 0 after the evaluation. In a poorly funded market, inter-institutional competition will accelerate credit growth, and reducing the number of competing institutions is not enough to reverse this tendency. The average equilibrium interest rate reflected by the blue line in
Figure 2e is much lower than that in
Figure 1e. This suggests that market competition in a poorly funded market accelerates the decline in equilibrium interest rates, which is another indicator of excessive credit expansion. This is consistent with our finding that institutional competition worsens over-expansion, and it confirms Proposition 2.
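To make the zero-profit threshold concrete, the following minimal Python sketch computes a break-even lending rate and the first period in which a declining rate path falls below it. It is an illustration under simplifying assumptions that are not part of the model above: the lender loses the full principal on default, the funding cost is constant, and the equilibrium rate decays geometrically toward the funding cost. The function names and all parameter values are hypothetical.

```python
def zero_profit_rate(default_prob, funding_cost):
    """Break-even lending rate, assuming the full principal is lost on default.

    Solves (1 - p) * (1 + r) = 1 + c for r.
    """
    if not 0.0 <= default_prob < 1.0:
        raise ValueError("default_prob must lie in [0, 1)")
    return (1.0 + funding_cost) / (1.0 - default_prob) - 1.0


def rate_path(start, floor, decay, periods):
    """Hypothetical equilibrium rate path decaying geometrically toward `floor`."""
    return [floor + (start - floor) * decay ** t for t in range(periods)]


def first_crossing(path, threshold):
    """Index of the first period whose rate is below `threshold`, or None."""
    for t, rate in enumerate(path):
        if rate < threshold:
            return t
    return None


# Assumed values: 5% default probability, 3% funding cost.
break_even = zero_profit_rate(0.05, 0.03)
slow = rate_path(start=0.20, floor=0.03, decay=0.9, periods=30)  # weaker competition
fast = rate_path(start=0.20, floor=0.03, decay=0.7, periods=30)  # stronger competition

print(f"break-even rate: {break_even:.4f}")
print("first period below break-even (slow decay):", first_crossing(slow, break_even))
print("first period below break-even (fast decay):", first_crossing(fast, break_even))
```

Under these assumed parameters, the faster-decaying path, which stands in for more intense rate competition, crosses the break-even rate earlier. This mirrors the qualitative pattern described for Figure 2e relative to Figure 1e, without reproducing the simulation itself.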
4. Discussion
In an unpredictable world, finance optimizes the allocation of resources across time and space while bearing risk. Information processing serves to identify, analyze, monitor, and price risks in the financial sector. The smoother the chain of information processing from data to information, from information to knowledge, and from knowledge to decision-making, the more the risk of financial activities is converted from unknown risk in Knight’s sense into known risk [
27]. For credit operations, due to incomplete information in financial markets, interest rates not only differentiate between borrowers through adverse selection but also influence borrower behavior through incentive effects, thus creating perverse incentives for borrowers and leading to adverse selection in lending [
28]. In a crisis, borrowers’ moral hazard may worsen the bank’s finances [
29]. As a result, financial institutions prefer customers with good collateral and hard information (credit reports), while long-tail customers, who carry high uncertainty and a large mismatch between contingent losses and risk compensation, are left unserved. Financial inclusion may reach its limit when, within external constraints, the cost of bearing financial risk equals the reward for bearing it. In the age of Internet big data, the risk that financial institutions face in dealing with long-tail customers has shifted from unknown risk that was previously impossible or difficult to evaluate to known risk that can be quantified. This enhances financial inclusivity. In a differentiated competitive environment, providing small, unsecured loans to small and medium-sized enterprises is a challenge for small and medium-sized banks. Their identification and assessment of customer risk rely not only on standard risk models and credit reports but even more on close contact between credit officers and customers, which results in relatively high marginal costs. BigTech credit instead depends on the enormous data of Internet platform ecosystems and on AI and big data technology. Owing to big data, credit data is essentially free and the risk assessment model is more accurate. Comparing the two models in terms of data-based risk control reveals two main differences. One is the vastly different cost of obtaining credit data; the other is the differing scale of the credit data on which the analysis is based, and the resulting difference between relying on human judgment and experience and relying on artificial intelligence algorithms. An Internet credit system needs a broad technology ecosystem to acquire customers and build up digital footprints. This may explain why Ant Group, JD Digits, and Tencent Financial Technology are growing while P2P lending has vanished.
Using technology, an Internet credit system rooted in the platform ecosystem can compete with traditional financial institutions for long-tail customers (see the corollary to Proposition 1) and advance financial inclusion. Big data risk control models are converting unknown risks into known ones, but advances in models and technology do not eliminate financial risk. Essentially, using computer algorithms to analyze big data and make out-of-sample predictions (including classifications) is a process of generalization, the strength of which depends on how similar the system characteristics of the training data are to those of the unknown data. As businesses grow and competition increases, training data become less representative, so that unknown data deviate from the system features learned in training, which poses financial risks.
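The loss of representativeness described above can be illustrated with a small Python sketch. It uses the total variation distance between two discrete distributions as a stand-in measure of how far the served population has drifted from the training population; this choice of metric, the four score bands, and all numbers are assumptions made for illustration and are not the representativeness measure defined in the model.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions on the same bins."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixture(p, q, weight):
    """Distribution of a population drawing `weight` of its customers from q, the rest from p."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(p, q)]


# Hypothetical customer shares over four credit-score bands (lowest to highest).
core = [0.05, 0.15, 0.30, 0.50]       # population the model was trained on
long_tail = [0.40, 0.30, 0.20, 0.10]  # customers reached as the business expands

for share in (0.0, 0.25, 0.5, 0.75):
    served = mixture(core, long_tail, share)
    distance = total_variation(core, served)
    print(f"long-tail share {share:.2f}: distance from training data {distance:.3f}")
```

For a mixture of this form, the distance grows in proportion to the long-tail share, so each step of expansion moves the served population further from the data on which the model was trained.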
In this study, we present a novel theoretical model for BigTech credit risk management that is grounded in measure theory, distinguishing our work from existing approaches. By leveraging the rigorous mathematical framework of measure theory, we quantify and analyze the complex relationships between data distributions and risk factors associated with long-tail customers. This approach enables us to address the inherent limitations of traditional risk models, which often struggle to account for the variability and uncertainty present in extensive data sets. Furthermore, we employ numerical simulations to calibrate our model, providing numerical support for our theoretical findings and enhancing the robustness of our predictions. By integrating measure-theoretic principles with simulation techniques, we offer a comprehensive framework that deepens the understanding of risk dynamics in BigTech lending, contributing to the literature on inclusive finance and technological innovation.
According to the first conclusion of Proposition 1, even with big data technology, institutions have an intrinsic incentive to attract risky borrowers, which can lead to over-inclusion (over-expansion of credit) and a credit bubble. Ant Group remains in the advantageous phase of the big data credit model: supported by the extensive Alibaba ecosystem, it has been able to identify relatively high-quality borrowers among the first influx of consumers during the model’s development. However, large-scale growth makes technology-dependent risk control challenging. As scale grows at the margin, the representativeness of Internet credit data worsens (see the numerical simulation section). Inter-institutional interest rate competition will cause the market to breach the over-inclusion (over-expansion of credit) boundary sooner, particularly in a poorly funded market. In addition, the customers of the ecosystem platforms on which big data credit models are based may come to overlap, unlike the current situation in which customer data are independent. Will the current algorithms be able to respond in a timely manner, and will this introduce new contingent risk factors?
Further, by drawing on the Internet platform’s ecosystem, the big data credit model uses technology in a way that promotes both financial inclusion and risk. The inclusion boundary for risk lies at the point where a single borrower’s expected return is zero, and it can be relaxed to the point where the expected return on the customer base as a whole is zero. Ant Group’s growth suggests that financial institutions may manage risks and still make a profit when they reach the critical inclusion boundary of big data credit. The progressive overshooting from the former boundary to the latter under fast incremental growth increases financial risk while providing financial services to more people. Visibly, rapid scale-up will drive the customers of big data credit institutions toward the long tail. As the quality of customer groups declines, data accumulate and systematic deviations occur. It is crucial whether the risk management algorithms of BigTech credit institutions can accurately assess the risk of long-tail customer groups. This group, with its lower risk tolerance and debt capacity, may raise customer suitability problems, lowering credit quality and leading to risk accumulation. On the supply side of credit funds, the shift from the purchase of credit asset-backed securities (ABS) to joint lending and then to the diversion mode, and from explicit credit enhancement to implicit credit enhancement and then to no credit enhancement at all, has driven the explosive growth of Ant Group’s scale. At this stage, lending is profit-driven, not risk-driven. As the customer base continues to expand rapidly and shifts toward the long tail, financial institutions that have previously relaxed their vigilance may need to reassess their risk management strategies and could become more alert to the challenges posed by this evolving landscape.
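The difference between the two inclusion boundaries, one defined by a single borrower's expected return and one by the expected return on the customer base as a whole, can be sketched in Python as follows. The sketch assumes a uniform lending rate, a constant funding cost, total loss on default, and borrowers admitted in order of increasing default probability. These assumptions, the function names, and the numbers are hypothetical and are not taken from the model.

```python
def expected_return(default_prob, rate, funding_cost):
    """Expected net return per unit lent, assuming total loss on default."""
    return (1.0 - default_prob) * (1.0 + rate) - (1.0 + funding_cost)


def individual_boundary(default_probs, rate, funding_cost):
    """Number of borrowers admitted when each loan must break even on its own."""
    admitted = 0
    for p in sorted(default_probs):
        if expected_return(p, rate, funding_cost) < 0.0:
            break
        admitted += 1
    return admitted


def pool_boundary(default_probs, rate, funding_cost):
    """Number of borrowers admitted when only the whole portfolio must break even."""
    admitted, cumulative = 0, 0.0
    for p in sorted(default_probs):
        cumulative += expected_return(p, rate, funding_cost)
        if cumulative < 0.0:
            break
        admitted += 1
    return admitted


# Hypothetical default probabilities, from core customers to the long tail.
default_probs = [0.01, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30]
rate, funding_cost = 0.12, 0.03

print("admitted at the individual boundary:", individual_boundary(default_probs, rate, funding_cost))
print("admitted at the pool boundary:", pool_boundary(default_probs, rate, funding_cost))
```

With these assumed numbers, the pool boundary admits more borrowers than the individual boundary, because profits on low-risk customers cross-subsidize expected losses on long-tail customers. The additional borrowers each carry a negative expected return, which is the sense in which moving from the first boundary to the second widens inclusion while adding risk.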
Additionally, BigTech credit institutions, while taking proactive initiatives to serve a broader market, may encounter moral hazard risks. This situation necessitates a careful evaluation of their practices to ensure that the drive for growth does not compromise ethical lending standards or exacerbate financial risks. It also remains unclear how to separate credit evaluation and risk management across businesses with different risk exposures. In conclusion, the mismatch between the business boundary of financial institutions in big data credit and the technological boundary of risk management is likely the source of hazards in commercial applications of big data credit, and reconciling the two should be the focus of regulatory efforts.
As a theoretical paper, this study does not include empirical data or case studies, which could offer more concrete validation of the proposed model. Although the flexibility of our approach avoids strong assumptions, this can also lead to oversimplification of real-world complexities. Additionally, the absence of comparative analysis with traditional financial institutions or other fintech models narrows the scope of the findings. While our focus on BigTech credit aligns with the research objectives, future studies could explore how this model performs in relation to other lending mechanisms, offering a broader perspective on its risks and benefits.
Another challenge lies in the dynamic nature of the fintech sector. Technological advancements and regulatory developments are constantly reshaping financial markets, and our findings reflect the conditions at the time of writing. Future research should periodically revisit and update these theoretical models to account for emerging trends and shifts in regulation. Moreover, the generalizability of our findings may be limited by varying regional market conditions and regulatory frameworks. Expanding the scope of future studies to incorporate empirical data and cross-market analyses will not only validate the insights of our model but also contribute to a more comprehensive understanding of BigTech credit and its implications for financial inclusion and risk management.
While our model is grounded in measure theory and offers a structured and rigorous framework for understanding risk dynamics in BigTech lending, it may not serve as a one-size-fits-all solution. Future research could benefit from exploring the contexts in which our model is most effective and identifying specific situations where its assumptions may not be applicable.
5. Conclusions
This study investigates how BigTech credit institutions navigate the tension between the expanding commercial boundaries of their operations and the inherent technological limitations in risk management. Big data credit uses the Internet ecosystem to bridge the gap between traditional financial services and society’s investment and financing needs, offering a long-tail advantage and improving the coverage and penetration of inclusive finance. However, technological innovation in big data credit has not eliminated financial risk. This paper takes credit, the most typical and common business in digital finance, as an example, constructs a mathematical model, and applies numerical simulation to explore the benefits of technology-enabled finance to long-tail customers and the associated risks. The research demonstrates that even with advanced technology, financial risks persist. Specifically, the risk control boundaries enabled by technology do not fully align with the business boundaries of BigTech credit institutions, leading to new forms of risk. As business boundaries expand, the quality and representativeness of training data deteriorate, so that unknown out-of-sample data deviate from the patterns learned in training and create potential contingent risks. In addition, BigTech credit institutions may have an incentive to attract high-risk borrowers, which might lead to over-inclusion and a credit bubble. Thus, while BigTech lending provides inclusive benefits, these benefits must be weighed against the latent financial risks that accompany them. The numerical simulations find that in a poorly funded market, interest rate competition among multiple institutions causes the market to breach the inclusion (over-expansion of credit) boundary sooner, and the market eventually forms a monopoly of the leading institution.
The black box nature of big data-based risk control introduces challenges for regulators, as it reduces transparency and creates information asymmetries. This study recommends the following policies based on its findings. First, integrate BigTech credit into the regulatory framework, raise awareness of moral hazard, and guard against systemic risk. The fintech credit industry requires better digital monitoring, and it should be brought consistently within the legal and supervisory systems. Under rapid scale expansion, when institutions’ own funds cannot meet market demand and credit funds are supplied through a variety of channels, regulators should beware of moral hazard in BigTech credit institutions’ self-managed lending, joint lending, and lending assistance modes. At the same time, the substantial volume of long-tail credit clients served by BigTech institutions has resulted in an expanding business scale, necessitating macro-prudential oversight of these leading tech entities based on the framework for systemically important financial institutions (SIFIs).
Second, develop fintech-specific technical indicators and business standards and offer targeted regulatory programs to align the business boundaries of financial institutions’ big data credit with the technological boundaries of risk control. To promote healthy BigTech credit development, a long-term regulatory framework is needed after short-term risk management. Traditional regulation emphasizes capital, while big data and digital technologies drive BigTech credit models. This requires regulation to become digital to meet the requirements of the big data and intelligent eras, creating a dual-element regulatory framework for finance and data. The asymmetry of division of labor and knowledge in the BigTech credit model allows for a minimal risk-sharing ratio.
BigTech credit shares some similarities with the P2P model, particularly in its reliance on digital platforms and data for lending decisions. Like P2P lending, BigTech credit can attract high-risk borrowers through over-inclusion, leading to credit bubbles. Numerical simulations in this study indicate that in poorly funded markets, interest rate competition among multiple institutions may accelerate market saturation, causing credit boundaries to be breached. The market is likely to consolidate under a few dominant players, creating monopolies and further squeezing conventional banks out of long-tail client segments. Without careful regulation, BigTech lending may replicate the fragility observed in P2P lending, challenging its promise as a solution for sustainable financial inclusion. Thus, our final recommendation is to establish a joint-stock, unified social credit technology company that uses data assets as a basis for equity investment. This initiative aims to break down information silos and integrate diverse data sources, significantly enhancing the dimensionality of data available for risk assessment. This comprehensive data integration can empower financial institutions, especially those lending to small and medium-sized enterprises (SMEs), to better manage risks and improve their credit offerings.
Moreover, the proposed framework must address critical ethical considerations regarding data ownership, rights, and profit-sharing. We suggest that a government-formed financial intermediary oversee the establishment of this company to ensure ethical practices and transparency in data usage. This intermediary would play a crucial role in confirming data asset ownership and facilitating collaboration among various stakeholders. By balancing the need for innovation with responsible data management, this proposal aims to create a sustainable ecosystem that supports the healthy and orderly development of financial technology institutions while promoting inclusive finance.