Article

Can We Use Financial Data to Predict Bank Failure in 2009?

by
Shirley (Min) Liu
Ness School of Management and Economics, South Dakota State University, Brookings, SD 57007, USA
J. Risk Financial Manag. 2024, 17(11), 522; https://doi.org/10.3390/jrfm17110522
Submission received: 10 July 2024 / Revised: 30 October 2024 / Accepted: 15 November 2024 / Published: 19 November 2024
(This article belongs to the Section Business and Entrepreneurship)

Abstract

This study asks whether a bank's past financial data can be used to predict its failure in 2009 and proposes three new empirical proxies for loan quality (LQ), interest margins (IntMag), and earnings efficiency (OIOE) to forecast bank failure. Using the bank failure list from the Federal Deposit Insurance Corporation (FDIC) database, I match the banks that failed in 2009 with a control sample based on geography, size, the ratio of total loans to total assets, and the age of banks. The model suggested by this paper correctly predicts up to 94.44% (97.15%) of bank failures (non-failures), with an overall prediction accuracy of 96.43% at a probability cutoff of p = 0.5. Specifically, the stepwise logistic regression selects proxies for capital adequacy, asset/loan risk, profit efficiency, earnings, and liquidity risk as predictors of bank failure. These results partially agree with previous studies regarding the importance of certain variables, while offering new evidence that the three proposed proxies for LQ, IntMag, and OIOE have statistically and economically significant effects on the probability of bank failure.
JEL Classification:
C01; G01; G17; G21; G33; M41

1. Introduction

Why would one be interested in predicting bank failures in advance? Bank failure is a serious and costly problem for society. The savings and loan crisis of the 1980s cost taxpayers more than one hundred billion dollars (between two and three percent of GDP), while, according to a report of the Government Accountability Office, the 2008 financial crisis may cost the U.S. economy USD 13 trillion (more than an entire year's GDP).1 However, these are just the direct costs. There are numerous indirect costs as well when the banking system enters crisis mode. A banking crisis tends to severely tighten lending and drastically slow the velocity of cash flow, thus worsening the economic situation.
The question is, of course, whether bank failure can be predicted in advance. The answer to this question is positive. Some previous studies focus on building a prediction model for future crises. Martin (1977) suggests using the logistic model to predict bank failure and identifies factors that could predict it (e.g., asset risk, liquidity, capital adequacy, and earnings). Meyer and Pifer (1970) document that loan growth, the ratio of operating revenues to operating expenses, and types of loans are associated with bank failure. However, these papers were published before the 2008 financial crisis, and it is important to know whether the variables used to predict bank failure before that crisis are still relevant today; therefore, this paper employs the predictors of bankruptcy used by previous studies to predict bank failure in the 2008 financial crisis. Furthermore, this paper proposes and examines a few new proxies: leverage (the tangible capital ratio, denoted Leverage), interest margins (denoted IntMag), loan quality (loan riskiness, denoted LQ), and the ratio of operating income to the operating cost of funds (denoted OIOE), to find out whether these proposed empirical proxies could be used to predict bank failure. This paper thus fills a gap in the literature by adding novel proxies to the existing literature.
To empirically examine my research question of whether the proxies documented in previous studies and the newly proposed proxies could be used to predict bank failure in 2009, I first analyze the sample statistics to identify differences in characteristics between the failed and not-failed banks. The two types of banks are matched by size, age, and the ratio of total loans to total assets within the same state. The sample descriptive statistics show that the not-failed banks are more capital-adequate, profitable, efficient at generating earnings, and liquid than the failed banks. Then, I regress the indicator variable Bfailure (coded as one if the bank failed in 2009 and zero otherwise) separately on each explanatory variable to analyze whether the previously documented and newly proposed empirical proxies could predict bank failure in a logistic regression. The results of this initial selection show that each of the 16 empirical proxies with available data might be a good predictor of bank failure in 2009. Then, I use a stepwise regression procedure, with criteria of p ≤ 0.1 for entering the model and p ≤ 0.2 for staying in the model, to select the prediction model. The results of the stepwise procedure show that 10 empirical proxies for capital adequacy, asset risk, interest income efficiency, profitability, and liquidity are good predictors of bank failure in 2009, the year subsequent to the 2008 financial crisis.
The empirical results of the stepwise procedure further show that among the newly proposed empirical proxies, loan quality (LQ), interest margins (IntMag), and profitability (OIOE) are statistically and economically significant predictors of bank failure, while Leverage (the tangible capital ratio) is not selected by the stepwise procedure. In particular, in the stepwise-selected regression model of regressing the indicator variable (Bfailure) on the explanatory variables, the coefficients on LQ (loan quality), IntMag (interest margins), and OIOE (the ratio of operating income to operating expenses) are 297.603, −250.620, and 6.399, respectively, all statistically significant at the 0.01 level. These results are striking because they indicate that the proposed proxies could be effective predictors of future bank failure. For example, with a one-unit increase in LQ (loan riskiness), the odds ratio (in this paper, the odds ratio is the ratio of the probability of the bank failing to the probability of the bank not failing) would be multiplied by e^297.603, an immense magnitude. In other words, with a one-unit increase in a bank's LQ, the odds of the bank failing versus not failing in the year 2009 become e^297.603 times larger, holding other factors constant. The impact of LQ on the likelihood of bank failure illustrates that the proxies proposed by this paper for loan quality (LQ), interest margin (IntMag), and profitability (OIOE) have an economically and statistically enormous impact on the probability of bank failure.
The empirical results are also consistent with previous studies in that the risk dimensions of capital adequacy, asset (loan) risk, earnings efficiency, profitability, and liquidity could predict a bank's failure in the future. For instance, the coefficients on return on assets (hereafter ROA) and on a proxy for capital adequacy (e.g., T1CRAT, the tier one capital ratio) in the year 2008 are statistically significant with large magnitudes in the logistic regression in which I use a binomial variable to indicate bank failure in the year 2009.2
Therefore, this paper contributes to the literature, practitioners, and regulators by examining and documenting that some empirical proxies suggested in previous studies remained relevant in the later financial crisis, and by proposing and validating three new empirical proxies for loan riskiness, earnings efficiency, and profitability, which could effectively predict bank failure in the future. Thus, this study offers novel empirical proxies and evidence on using publicly available financial data to predict bank failure in 2009 to the current literature and to whoever is interested in bank failure.
The paper proceeds as follows: Section 2 reviews the relevant literature and introduces the measurements of the variables used in the logistic model to predict bank failure. Section 3 describes the data and sample construction and reports the descriptive statistics of the empirical sample. Section 4 reports a correlation analysis of the variables investigated. Section 5 and Section 6 present a graphical analysis of the variables and model selection, and sensitivity tests, respectively. Section 7 discusses possible caveats, future research, and applications of this study. Section 8 concludes the paper.

2. Literature Review and Variable Measurement

This study investigates whether the financial ratios used in previous studies, four new ratios proposed by this study, and a simple logistic regression method could predict bankruptcy in the recent financial crisis. This section first reviews the literature relevant to the current study and then describes the measurement of the variables of interest.

2.1. Literature Review

This paper is related to a stream of literature on predicting bankruptcy. This subsection first reviews previous studies that use financial ratios to predict bankruptcy and then reviews the literature on research models that have been employed to predict bankruptcy.
The majority of the previous literature focuses on using financial ratios to predict the bankruptcy of firms in industries other than the banking and utility industries. I briefly review the studies most closely related to this one. Employing an initial sample of 66 U.S. firms over the sample period of 1946–1965, Altman (1968) systematically assesses the quality of ratio analysis as an analytical technique in the setting of using financial ratios to predict corporate bankruptcy. Altman (1968) finds that the ratios (of working capital to total assets, retained earnings to total assets, earnings before interest and taxes to total assets, market value of equity to book value of total debt, and sales to total assets, all computed using one-year-ahead data) could correctly classify 95% of sample firms as bankrupt or non-bankrupt. This predictive power of the ratios is sustained when using alternative samples over different sample periods. Altman (1968) also finds that the predictive power of ratio analysis decreases as the lead time of the financial ratios increases (e.g., when the ratios are computed from data up to five years prior to the year of filing for bankruptcy).
Using a sample of 53 bankrupt firms and 58 matched non-bankrupt firms over the sample period of 1969–1975, Altman et al. (1977) propose a new model (ZETA), the z-score (or zeta score), to predict the likelihood of bankruptcy of a corporation in the next two years. All the sample firms are selected from the manufacturing and retail industries. Altman et al. (1977) propose and find that seven variables (return on assets [ROA], stability of earnings as measured by the standard error of estimated ROA, debt service as the ratio of earnings before interest and taxes to total interest payments, cumulative profitability as the ratio of retained earnings to total assets, liquidity as the current ratio, capitalization as common equity, and size) could correctly classify over 90% (70%) of firms when using data from financial statements one (five) year(s) prior to failure.
Ohlson (1980) uses conditional logistic regression and financial ratios to predict corporate bankruptcy over the sample period of 1970–1976. The data used by Ohlson (1980) are unique because the researcher could identify when the financial report became available and therefore determine whether the financial statements were reported before or after the bankruptcy filing. This feature of the data is very important when assessing whether financial ratios could predict subsequent corporate failure. The author finds that measures of size, financial structure, performance, and current liquidity have predictive power for corporate failure. Consistent with the previous literature, Ohlson (1980) finds that the longer the lead time of the financial ratios, the less accurate the prediction of corporate failure: using one- (and two-) year lead-time financial ratios, the model could predict bankruptcy with 96.12% (and 92.84%) accuracy.
This study extends upon Altman (1968), Altman et al. (1977), and Ohlson (1980) by using more recent data and proposing a few new predictors of bankruptcy.
Among statistical models, discriminant analyses, logistic regressions, and factor analyses are popular, simple statistical approaches that have been employed to predict bankruptcy. Assuming that the sample is normally distributed, discriminant analysis is a statistical method that categorizes unknown observations and estimates the odds of their belonging to a certain category. Logistic regression is also a statistical method that can be used to model the chance of grouping observations into a certain class or event (e.g., alive/dead, going concern/bankrupt). Some earlier literature (e.g., Dietrich and Kaplan 1982; Karels and Prakash 1987; Haslem et al. 1982) often uses discriminant analysis as the statistical method to predict bank failures. Discriminant analysis usually includes three subcategories: linear, multivariate, and quadratic. However, discriminant analysis requires the data to be normally distributed, which limits its application in statistical analysis. Even when the distribution of the regressors is not normal, the logistic regression method (i.e., a type of maximum likelihood estimation method) can still be used to predict bankruptcy.
I adopt logistic regression rather than discriminant analysis to select the predictors of bankruptcy for the following reasons. First, logistic regression is suitable for my research question of whether we can use financial ratios suggested by previous studies and by this study to predict bankruptcy, because logistic regression is a good statistical method for studying the underlying structure of the prediction, while discriminant analysis is better suited to the grouping itself (Wooldridge 2009). Moreover, the statistical software used in this study has a built-in feature to perform classification (grouping) analyses using logistic regression-related programming codes.3 Second, logistic regression is an appropriate statistical tool for models with a dichotomous dependent variable, while discriminant analysis is an appropriate tool for models with continuous dependent variables. The dependent variable of the model adopted in this study is dichotomous. Third, logistic regression requires fewer assumptions than discriminant analysis: discriminant analysis requires multivariate normality, while logistic regression is robust to deviations from normality. Therefore, logistic regression is an appropriate statistical method for the purpose of this study.
The recent literature often adopts the logistic regression method, sometimes combined with other statistical analysis methods, to predict bankruptcy or to measure the operating condition of banks. For example, West (1985) explores and finds that combining a logistic regression model with factor analysis is useful for measuring and evaluating a bank's operating and financial health. Davis and Karim (2008) compare logistic regression, a parametric statistical method, with the signal extraction early warning system (EWS) method, a non-parametric method that predicts crises based on the magnitude of a variable's deviation from its normal value. Using a sample from 105 countries over the period of 1979–2003, Davis and Karim (2008) argue that the logit is a more appropriate method than signal extraction for country-specific EWS.
Employing a multi-period logit model, DeYoung and Torna (2013) test and find that nontraditional banking activities contributed to the failures of U.S. commercial banks during the financial crisis. In particular, DeYoung and Torna (2013) find that (i) the probability of distressed bank failure decreased with purely fee-based nontraditional activities (e.g., securities brokerage and insurance sales) and increased with asset-based nontraditional activities (e.g., venture capital, investment banking, and asset securitization); and (ii) banks undertaking risky nontraditional activities tended to conduct risky traditional lines of business as well. Cole and White (2012) analyze and find that the traditional proxies for the CAMELS components (i.e., Capital adequacy, Asset quality, Management, Earnings, Liquidity, and Sensitivity to market risk) and for commercial real estate investment could explain bank failures in 2009. However, they do not find statistically significant evidence that residential mortgage-backed securities determine bank failure. The results of Cole and White (2012) raise questions about the regulatory risk weights and the focused limits on commercial real estate loans in the current banking regulatory system, while supporting the view that CAMELS is good at evaluating the survivorship of commercial banks.
Among the aforementioned studies, Cole and White (2012) is the closest to the current study. However, this study differs from Cole and White (2012) and the other related studies in the following ways: (i) this study proposes four novel ratios that, to the best of my knowledge, are not documented in the previous literature; and (ii) it provides analyses relevant to recent bank failures (i.e., Silicon Valley Bank).
In addition to the existing statistical methods (e.g., discriminant analysis, logistic regression, and factor analysis), recent research introduces intelligent techniques (e.g., neural networks, decision trees, self-organizing maps, case-based reasoning, evolutionary approaches, rough sets, etc.) into bankruptcy analyses. Kumar and Ravi (2007) and Demyanyk and Hasan (2010) review the literature on the intelligent techniques that could be used to predict bank crises. Although intelligent techniques increase the accuracy of bankruptcy prediction from the roughly 70–95% achieved by simple statistical analyses to as high as 100%, I would argue that the simple logistic analyses offered in this paper still have good reference value for practitioners and researchers who would like to use financial information from publicly filed financial statements and a simple logistic regression to predict bankruptcy in the banking industry during and after the 2007–2009 financial crisis.

2.2. Variable Measurement

The dependent variable of the logistic regression model is a binary variable on the left-hand side of the regression equation, and the independent variables are on the right-hand side of the equation, together with the error term (µ_{i,t}).
Bfailure_{t+1} = β_0 + β_i × IndependentVariable_{i,t} + µ_{i,t}    (1)
where subscript i denotes the ith independent variable (i = 1, 2, 3, …), and t denotes the time t, the fiscal year 2008.
The binary variable Bfailure is the dependent variable; it is coded as one if a bank fails in the subsequent year t + 1 (the year 2009) and zero otherwise. For brevity, I suppress the time subscript t in the variable definitions and statistical analyses. I use a bank's financial ratios for 2008 to predict its failure in 2009.
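To make the estimation concrete, the sketch below fits Equation (1) in Python with statsmodels. It is only an illustrative analogue (the paper's analyses are run in SAS), and the file and column names are hypothetical placeholders.

import pandas as pd
import statsmodels.api as sm

# Hypothetical input: one row per bank, 2008 ratios plus a 2009 failure flag.
banks = pd.read_csv("bank_ratios_2008.csv")
banks["Bfailure"] = banks["failed_2009"].astype(int)   # 1 if the bank failed in 2009, else 0

predictors = ["T1CRAT", "ROA", "LQ"]                   # any subset of the proxies defined below
X = sm.add_constant(banks[predictors])                 # adds the intercept term beta_0
y = banks["Bfailure"]

result = sm.Logit(y, X).fit(disp=0)                    # maximum likelihood logistic regression
print(result.summary())                                # coefficients, z-statistics, pseudo R-square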
The independent variables in time t (year 2008) are grouped below.
Group1: Capital adequacy measurements
  • CLTL = Commercial loans/total loans.
  • T1CRAT = Tier 1 capital/risk-weighted assets, which could be referred to as the risk-weighted capital ratio.
  • T1CRev = Tier 1 capital/total interest and non-interest income (before the deduction of any expense), which could be referred to as the gross revenue ratio.
  • Leverage = Common share equity net of intangible assets/total assets net of intangibles. This proxy could also be regarded as a tangible capital ratio. The difference between the measurement of the tangible capital ratio (Leverage) proposed in this study and that of the previous literature is that the current measurement counts only the tangible-asset effect on the capital adequacy ratio. It is interesting to look at how tangible capital affects bank failures because previous research typically focuses on the effect of both tangible and intangible capital on bank failures.
I would expect that the more capital the bank has, ceteris paribus, the less likely the bank is to fail. Therefore, I expect that these four variables are negatively associated with the dependent variable.
Group 2: Asset (Loan) risk
  • GCOOI = Gross charge-offs/net operating income. The more charge-offs a bank's loans have, the riskier the loans are, and the riskier a bank's loans, the more likely the bank is to fail. Therefore, I would expect a positive relationship between GCOOI and bank failure.
  • LossLS = Loss provision/the sum of total loans and securities. The larger the loss provision, the riskier the loans are expected to be. Therefore, I would expect a positive relationship between LossLS and the dependent variable (Bfailure).
  • GCOTL = Gross charge-offs/total loans. This loan risk proxy could also serve as a proxy for default risk. The higher the ratio, the lower the quality of the total loans. If a bank writes off a large proportion of its loans, then the probability that borrowers will repay the principal and interest on time is lower, which may increase the possibility of the bank's failure. Therefore, I expect this explanatory variable to be positively related to the dependent variable.
  • LossRes = Loan loss allowance/total charge off. LossRes could capture the degree of conservatism of the bank when the bank estimates the possible loan loss, a proxy for the default risk of a bank. The larger its value is, the less likely the bank failure is. Therefore, I would expect the coefficient of the variable to be negative in the regression.
  • LQ = Total interest revenue from loans/total loans, a proxy for loan quality (or the riskiness of the loan). I would assume that the riskier the loans are, the higher the interest rate should be on those loans. Hence, the larger the ratio is, the lower the quality of the loans of the bank is, and therefore, the higher the risk of bank failure is. I would expect the coefficient of the variable to be positive in the regression.
  • LoanRet = Loan revenue/total loans, a proxy for loan quality. I expect that the riskier the loans are, the higher the revenues should be earned from those loans. Therefore, I expect the LoanRet is positively related to the dependent variable (the probability of failure of a bank).
In sum, I would argue that a bank with riskier assets is more likely to fail subsequently.
Group 3: Efficiency (Pricing)
IntMag = (total interest income − total interest expenses)/total liabilities. This interest margin serves as a proxy for default risk. The higher the ratio, the more efficiently a bank uses its interest-bearing liabilities and, thereby, the less likely the bank is to fail subsequently. Therefore, the coefficient on this variable is expected to be negative in the regression.
Group 4: Earnings
  • PM = Net income/total revenues (net income/[non-interest income + interest income + income from trading assets + income from federal funds sold]). PM denotes the profit margin, a proxy for the efficiency of profitability. The larger the ratio, the less likely a bank failure. Therefore, I would expect a negative relationship between this variable and the dependent variable.
  • ROA = Net income/lagged total assets (NI_t/AT_{t−1}), denoting return on assets. ROA measures how efficiently a bank uses its assets. The more efficiently the bank uses its assets, the less likely it is to fail subsequently. Therefore, ROA is expected to be negatively associated with the dependent variable in the regression.
  • OIOE = Operating income/operating expenses. OIOE is proposed by this paper to proxy for earnings and expenses efficiency. In terms of measuring risk dimension, this variable is similar to profit margin. However, this variable is different from the profit margin. This variable measures how many dollars of operating income a bank earns with respect to one dollar of operating expenses, while the profit margin measures how many cents a bank can earn from each dollar of total revenues. The more efficiently the bank makes profits for the given amount spent, the less likely it subsequently fails. Therefore, the coefficient on OIOE in the regression is expected to be negative.
Group 5: Liquidity Risk
  • TLoanAT = Total loans/total assets.
  • DepositAT = Total deposits/total assets, a proxy for liquidity risk. Banks record their deposits as liabilities. It might be difficult for a bank with a large DepositAT ratio to repay its depositors when they demand a large amount of their deposits back. This implies that the larger this ratio, the more likely the bank is to fail. Therefore, I expect a positive relation between this variable and the dependent variable.
  • CashAT = Net liquid assets/total assets: (Cash + Assets Held in Trading Accounts)/total assets.4
I argue that the higher liquidity risk the bank has, the higher the probability the bank fails. Therefore, I expect a positive relationship between the above proxies for the liquidity risk and the Bfailure.
Control variables:
  • Size = log (total assets). The notion of “too big to fail” is well known: the larger a bank is, the less likely it is to fail (Wheelock and Wilson 2000)5. Therefore, I expect a negative relationship between this variable and the dependent variable.
  • Age = 2009 minus the year in which the bank started its business, which measures how long the bank has been in business. Theoretically, the longer the bank has been in business, the higher the chance that its management has been through several credit cycles and survived. Therefore, the management of an older bank is likely to be more conservative and better than that of younger banks, making an older bank less likely to fail. However, an older bank may also have out-of-date systems and might therefore be more likely to fail than younger banks. Because the relationship between a bank's age and its chance of failure is ambiguous, I do not predict the sign of Age in the regression equation.
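As an illustration of how the proposed proxies defined above might be constructed, the sketch below computes Leverage, LQ, IntMag, and OIOE from Call Report-style fields; the input column names are hypothetical placeholders rather than actual Bank Regulatory item names.

import pandas as pd

def add_proposed_proxies(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Leverage: tangible common equity / tangible assets (tangible capital ratio)
    out["Leverage"] = (out["common_equity"] - out["intangible_assets"]) / \
                      (out["total_assets"] - out["intangible_assets"])
    # LQ: total interest revenue from loans / total loans (loan riskiness)
    out["LQ"] = out["interest_income_loans"] / out["total_loans"]
    # IntMag: (total interest income - total interest expenses) / total liabilities
    out["IntMag"] = (out["total_interest_income"] - out["total_interest_expense"]) / \
                    out["total_liabilities"]
    # OIOE: operating income / operating expenses (earnings efficiency)
    out["OIOE"] = out["operating_income"] / out["operating_expenses"]
    return out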

3. Data, Sample, and Sample Descriptive Statistics

3.1. Data and Sample

The list of banks that failed in the year 2009 was obtained from the website of the Federal Deposit Insurance Corporation (hereafter FDIC).6 The FDIC reports that one hundred forty (140) banks failed in the year 2009. The banks' financial data were obtained from the Bank Regulatory database in WRDS. I use the Bank Regulatory database instead of the data provided by Compustat because it is more suitable for my research purpose. This database comes from the Federal Reserve Bank of Chicago (FRB Chicago) and contains data from the Report of Condition and Income (the “Call Report”) for all banks regulated by the Federal Reserve System, the Federal Deposit Insurance Corporation (FDIC), and the Comptroller of the Currency. This database reflects the fact that banking has certain unique characteristics not present in other industries, which shows in the much larger set of variables available in this database. This database also covers more banks than the Compustat database. The advantages offered by the Bank Regulatory database provide more options for constructing variables, which may capture the differences in the properties of failed and not-failed banks, and provide flexibility in selecting sample banks.
Not-failed banks are selected from the Bank Regulatory database and matched with failed banks by geography (the states where the failed banks are located), size (the log value of total assets), bank age (how long the bank has existed in the business), and the ratio of total loans to the average value of total assets (TLoanAT) over years t and t − 1. First, I restrict the control subsample to banks whose size, age, and ratio of total loans to average total assets fall within an interval of the mean plus or minus half of the standard deviation of each of these three variables (i.e., size/age/ratio of loans to assets) for the failed banks in the state where the failed banks are located. If this restriction eliminates all the not-failed banks in a given state, I relax the restriction and use a relatively wider interval for size, searching again until I find matched not-failed banks. This procedure yields 258 not-failed banks matched with the failed banks in the year 2009. The full sample comprises two sub-samples of failed (119) and not-failed banks (258), amounting to 377 banks in total. (Please refer to Appendix A for more details about the FDIC dataset.)
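A rough sketch of this matching step is shown below, assuming DataFrames of failed and candidate not-failed banks with hypothetical column names; the paper's relaxation of the size interval when a state has no qualifying controls is noted but not implemented.

import pandas as pd

def match_controls(failed: pd.DataFrame, candidates: pd.DataFrame) -> pd.DataFrame:
    """Keep not-failed banks within mean +/- 0.5 std of the failed banks, state by state."""
    matched = []
    for state, f_state in failed.groupby("state"):
        c_state = candidates[candidates["state"] == state]
        for var in ["size", "age", "TLoanAT"]:
            lo = f_state[var].mean() - 0.5 * f_state[var].std()
            hi = f_state[var].mean() + 0.5 * f_state[var].std()
            c_state = c_state[c_state[var].between(lo, hi)]
        # If no candidate survives, the paper widens the size interval and
        # searches again; that relaxation step is omitted from this sketch.
        matched.append(c_state)
    return pd.concat(matched, ignore_index=True)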

3.2. Sample Descriptive Statistics

Panels A and B of Table 1 report the descriptive statistics of the not-failed and failed bank subsamples, respectively. The large differences between the 1st percentile and the minimum and between the 99th percentile and the maximum could indicate that there are outliers in the sample. The sample descriptive statistics show that there are indeed outliers. The outliers could also be identified with the MEANS and UNIVARIATE procedures.7 Trimming and/or winsorizing the sample may produce more statistically significant results, but doing so may reduce the accuracy of the predictions of the model proposed in this paper. Therefore, I decided not to trim or winsorize the sample data, so as to take into account the effect of outliers, which may reveal additional information for predicting bank failure. Reassuringly, the regression results show that the prediction model proposed in this study performs well without trimming or winsorizing the final sample data.
I use univariate tests of the differences in mean and median values between the two subsamples to investigate whether there is any systematic difference in characteristics between the two subsamples of banks. Table 1 shows that for the not-failed (failed) bank subsample, the mean values of T1CRAT and T1CRev are 0.119 (0.065) and 1.454 (0.864), respectively. The mean values of the proxies for capital adequacy of the not-failed banks are almost twice those of the failed banks. These differences in the mean and median values of the proxies for capital adequacy are not only statistically significant at the 0.01 level but also economically significant. In other words, not-failed banks are generally more capital-adequate than failed banks.
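The sketch below illustrates one way such univariate comparisons could be computed, using Welch's t-test for means and the Wilcoxon rank-sum (Mann-Whitney) test as the median-type comparison; the file and column names are hypothetical, and the paper's exact test choices may differ.

import pandas as pd
from scipy import stats

sample = pd.read_csv("matched_sample_2008.csv")        # hypothetical combined sample

def compare_groups(df: pd.DataFrame, var: str) -> dict:
    failed = df.loc[df["Bfailure"] == 1, var].dropna()
    survived = df.loc[df["Bfailure"] == 0, var].dropna()
    _, t_p = stats.ttest_ind(failed, survived, equal_var=False)   # difference in means
    _, u_p = stats.mannwhitneyu(failed, survived)                 # rank test (median-type)
    return {"mean_failed": failed.mean(), "mean_not_failed": survived.mean(),
            "t_test_p": t_p, "rank_test_p": u_p}

for v in ["T1CRAT", "T1CRev", "ROA", "LQ", "IntMag", "OIOE"]:
    print(v, compare_groups(sample, v))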
For the not-failed (failed) bank subsample, the mean values of GCOOI, LossLS, GCOTL, LQ, and LoanRet are 0.089 (0.375), 0.009 (0.040), 0.008 (0.032), 0.078 (0.081), and 0.065 (0.068), respectively. By construction, the larger the value of each ratio, the higher the asset risk of the bank. The descriptive statistics for the asset risk proxies indicate that the failed banks have much higher asset risk than the not-failed banks. Furthermore, the differences in the mean and median values of the proxies for asset risk are statistically significant at better than the 0.1 level.
The mean value of IntMag is 0.037 (0.029) for the not-failed (failed) bank subsample, which indicates that the not-failed banks earn profit from loans more efficiently than the failed banks. For the not-failed (failed) bank subsample, the mean values of PM, ROA, and OIOE are 0.028 (−0.811), 0.003 (−0.048), and 1.240 (0.969), respectively. By construction of the earnings proxies, the larger the ratio, the less likely the bank is to fail subsequently. The differences in the mean (and median) values between the not-failed and failed banks are statistically and economically significant. The descriptive statistics for the earnings proxies indicate that the not-failed banks are more profitable and earn profit and use their assets more efficiently than the failed banks.
For the not-failed (failed) bank subsample, the mean values of TLoanAT, DepositAT, Leverage, and CashAT are 0.734 (0.734), 0.820 (0.839), 0.092 (0.053), and 0.039 (0.052), respectively. By construction of the liquidity risk proxies, the larger the values of TLoanAT, DepositAT, and Leverage, the higher the liquidity risk of the bank. The differences in the mean (and median) values between the not-failed and failed banks are statistically and economically significant, indicating that the not-failed banks have less liquidity risk than the failed banks. If a bank's cash is obtained mainly from its liabilities (such as customer deposits), then the larger the share of cash in total assets that is funded by liabilities, the higher the default risk the bank may face. Therefore, a bank with such cash holdings is more likely to default than a bank with a smaller CashAT ratio, holding other factors constant. The mean and median values of CashAT for the failed bank subsample are statistically significantly larger than those for the not-failed bank subsample, which is consistent with my expectation.
In summary, the sample descriptive statistics reported in Table 1 suggest that, compared to the failed banks, the not-failed banks are more capital-adequate, more profitable, and more efficient in using their assets to generate profit, with lower asset risk and less liquidity risk.

4. Correlation Analysis of the Variables Investigated

Table 2 reports the Pearson (above the diagonal) and Spearman (below the diagonal) correlations for the empirical proxies used in the model selection. The results in Table 2 show that the probability of bank failure is negatively correlated with the empirical proxies for capital adequacy: the Pearson (Spearman) correlations between Bfailure and T1CRAT, T1CRev, and Leverage (the tangible capital ratio) are −0.546 (−0.685), −0.503 (−0.605), and −0.587 (−0.626), respectively, and statistically significant at better than the 0.1 level. These results are consistent with my expectation that banks with adequate capital are less likely to fail in the future.
The results reported in Table 2 show that the probability of bank failure is generally positively correlated with the empirical proxy for asset (loan) risk and the sign on the correlation coefficient is, in general, consistent with my expectation: the Pearson (Spearman) correlations between Bfailure and GCOOI, LossLS, GCOTL, LossRes, LQ, and LoanRet are 0.498 (0.526), 0.544 (0.621), 0.498 (0.524), −0.053 (−0.354), 0.100 (0.041), and 0.115 (0.140), respectively, except the Pearson correlation between Bfailure and LossRes and Spearman correlation between Bfailure and LQ.
The results in Table 2 also show that the probability of bank failure is negatively correlated with the empirical proxies for efficiency and profitability, which is consistent with my expectation that banks making profits more efficiently are less likely to fail in the future: the Pearson (Spearman) correlations between Bfailure and IntMag, PM, ROA, and OIOE are −0.379 (−0.379), −0.612 (−0.725), −0.660 (−0.727), and −0.485 (−0.474), respectively, and statistically significant at better than the 0.1 level.
The results in Table 2 suggest that the probability of bank failure is higher for banks with higher liquidity risk, which is consistent with my expectation: the Pearson (Spearman) correlations between Bfailure and TLoanAT, DepositAT, Leverage, and CashAT are −0.002 (0.038), 0.119 (0.202), −0.587 (−0.626), and 0.141 (0.070), respectively. Because Leverage in this paper is in fact a proxy for tangible capital adequacy, it could also be used as a proxy for liquidity (e.g., the higher the percentage of tangible assets among total assets, the less liquid the bank is). The Pearson (Spearman) correlation coefficients between Bfailure and DepositAT and Leverage are statistically significant at better than the 0.1 level, while the other two pairs of correlations show mixed results.
In summary, the correlations in Table 2 between Bfailure (the indicator variable of bank failure) and the investigated predictive variables are generally consistent with my expectation that banks with more adequate capital and lower liquidity risk, which make more profit and make it more efficiently, are less likely to fail in the future, holding the other factors constant.
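For illustration, the correlation structure of Table 2 could be reproduced along the following lines, with Pearson correlations above the diagonal and Spearman correlations below it; the file name and the exact column list are hypothetical.

import numpy as np
import pandas as pd

sample = pd.read_csv("matched_sample_2008.csv")        # hypothetical combined sample
cols = ["Bfailure", "T1CRAT", "T1CRev", "Leverage", "GCOOI", "LossLS", "GCOTL",
        "LossRes", "LQ", "LoanRet", "IntMag", "PM", "ROA", "OIOE",
        "TLoanAT", "DepositAT", "CashAT"]

pearson = sample[cols].corr(method="pearson")
spearman = sample[cols].corr(method="spearman")

# Pearson above the diagonal, Spearman on and below it (as in Table 2).
upper = np.triu(np.ones(pearson.shape, dtype=bool), k=1)
combined = pearson.where(upper, spearman)
print(combined.round(3))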
The next section will further discuss which variables should be included in the regression model, using graphical analysis, logistic regression, and stepwise statistical techniques.

5. Graphical Analysis of Data and Model Selection

In this section, I conduct a graphical analysis of the variables proposed for predicting bank failure. I run the logistic regression of Bfailure, the indicator variable for bank failure, on the explanatory variables, adding only one at a time. Then, I save the predicted value of the dependent variable (the probability of bank failure) and plot the predicted probability against the independent variable used in the regression. Finally, I employ the stepwise procedure to select the final predictive model, using an entry p-value of 0.1 and a stay p-value of 0.2.

5.1. Graphic Analyses and Single Variable Regression

Figure 1a–p presents plots of the estimated probabilities along with the observed probabilities (blue) and 95% confidence intervals (lower/upper bound in green/black) as functions of each investigated continuous independent variable, shown one by one in Figure 1a–p, respectively. For example, the first examined explanatory variable is the risk-weighted capital ratio, measured as the ratio of tier 1 capital to risk-weighted assets (T1CRAT). The graph (i.e., Figure 1a) shows that the estimated probabilities are a logistic function of the risk-weighted capital ratio in descending order. Column 1 of Table 3 shows the regression of Bfailure on T1CRAT (the risk-weighted capital ratio). Here, I use descending order (instead of ascending order) in the computer programming (SAS) for the logistic regression for the following reasons: (1) my empirical interest is Bfailure, which is coded 1 if the bank failed in the year 2009; (2) the default of the software is event = 0; and (3) therefore, I specify event = 1 to obtain better prediction results.
The maximum likelihood estimate of the coefficient on T1CRAT is −99.848, with a Wald Chi-Square of 74.784 (p-value < 0.0001). The global null hypothesis tests (likelihood ratio, score, and Wald tests) reject the null hypothesis that the coefficient on T1CRAT is zero. These statistics indicate that T1CRAT is statistically significant and negatively associated with Bfailure. The Pseudo R-square is 0.453, and the max-rescaled R-square is 0.636 when the regression includes only one predictor (T1CRAT). The Hosmer and Lemeshow goodness-of-fit statistics do not suggest that the model with one predictor fits well (Chi-square = 76.624, p-value < 0.0001).8 However, the grouping criteria used by the Hosmer and Lemeshow goodness-of-fit test are subjective, and the results of this test could change with a change in the grouping criteria.9 Therefore, the results of the Hosmer and Lemeshow goodness-of-fit test reported in this paper should be interpreted with caution. For the goodness of fit of the model, I consider the R-square statistics a better reference than the Hosmer and Lemeshow goodness-of-fit statistics. Using only this variable, the model can correctly predict bank failure or non-failure 87.3% of the time overall. The ROC curve is a plot of sensitivity versus one minus specificity (i.e., 1 − specificity) of the model.10 One minus specificity is the proportion of non-event observations that are predicted to have an event response. A very large estimated area under the ROC curve (92.53%, indicated by the ‘c’ statistic, where ‘c’ varies from 0 to 1) indicates an adequate fit of the model. These statistics indicate that the risk-weighted capital ratio (T1CRAT) is a good predictor of bank failure. Consistent with my expectation, the more tier 1 capital a bank has, the less likely the bank is to fail subsequently.
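The single-predictor fit and the summary statistics quoted here (pseudo R-square, the ‘c’ statistic or ROC area, and the overall classification rate at a 0.5 cutoff) could be reproduced roughly as sketched below. The paper's analyses use SAS, so this Python sketch with hypothetical file and column names is only an analogue; note that statsmodels reports McFadden's pseudo R-square rather than SAS's generalized and max-rescaled R-squares.

import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

sample = pd.read_csv("matched_sample_2008.csv")        # hypothetical combined sample
y = sample["Bfailure"]
X = sm.add_constant(sample[["T1CRAT"]])                # single-predictor regression (Column 1 of Table 3)

res = sm.Logit(y, X).fit(disp=0)
p_hat = res.predict(X)                                 # predicted probabilities of failure

print("coefficient on T1CRAT:", res.params["T1CRAT"])
print("pseudo R-square (McFadden):", res.prsquared)
print("c statistic (area under ROC):", roc_auc_score(y, p_hat))
print("overall classification rate:", ((p_hat >= 0.5).astype(int) == y).mean())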
The plots of the estimated probabilities along with the observed probabilities (blue) and 95% confidence intervals (lower/upper bound in green/black) as functions of the continuous independent variables T1CRev (gross revenue ratio) and Leverage (tangible capital ratio) are shown in Figure 1b and Figure 1c, respectively. The graphs of the predicted probability of bank failure against these two proxies for capital adequacy show that the estimated probabilities are a logistic function of T1CRev (the gross revenue ratio) [Leverage, the tangible capital ratio] in descending [ascending] order. The direction of the correlation between the estimated probability of Bfailure and Leverage is interesting: it indicates that a higher tangible capital ratio is related to a higher chance of bank failure. This graph indicates that intangible assets play an important role in capital adequacy considerations for preventing bank failure.
Columns 2 and 3 of Table 3 report the regressions of Bfailure on T1CRev and Leverage, respectively. The maximum likelihood estimate of the coefficient on T1CRev [Leverage] is −4.751 [12.460], with a Wald Chi-Square of 72.211 (p-value < 0.0001) [9.605 (p-value < 0.0001)]. The global null hypothesis tests (likelihood ratio, score, and Wald tests) reject the null hypothesis that the coefficient on T1CRev (Leverage) is zero. These statistics indicate that T1CRev (Leverage) is statistically significant and negatively (positively) associated with Bfailure. The Pseudo R-square is 0.330 (0.029) and the max-rescaled R-square is 0.463 (0.041) when the regression includes only one predictor, T1CRev (Leverage). The Hosmer and Lemeshow goodness-of-fit statistics show that the model with the one predictor T1CRev (Leverage) overall does not fit (fits) well, with Chi-square = 52.498 (0.891) and p-value < 0.0001 (>0.1). Using only T1CRev (Leverage), the model can correctly predict bank failure or non-failure 84.6% (69.0%) of the time overall. The ROC curve is a plot of model sensitivity versus one minus specificity (1 − specificity) of the model. The ROC curve for the regression of bank failure on T1CRev (Leverage) indicates an inadequate (adequate) fit of that model. These statistics indicate that T1CRev (the gross revenue ratio) is a good predictor of bank failure, while Leverage (the tangible capital ratio) is unlikely to be a good predictor of bank failure. Consistent with my expectation and the analysis of the effect of T1CRAT, the more tier 1 capital a bank has, the less likely the bank is to fail, while the effect of Leverage on bank failure is inconclusive.
The aforementioned analyses are then applied to the other predictors of interest. The graphs for the remaining variables in Figure 1 and the results reported in Columns 4–18 of Table 3 indicate that the following empirical proxies could be good predictors of bank failure:
  • Proxies for asset risk
    GCOOI: the ratio of gross charge off to net operating income;
    LossLS: the ratio of loss provision to the sum of total loans and securities;
    GCOTL: the ratio of gross charge offs to total loans;
    LQ: the ratio of total interest revenue from loans to total loans;
    LoanRet: the ratio of loan revenue to total loans.
  • Proxies for efficiency of loan revenue
    IntMag: the interest margin.
  • Proxies for earnings
    PM: profit margin;
    ROA: return on asset;
    OIOE: ratio of operating income to operating expenses.
  • Proxies for liquidity risk
    DepositAT: ratio of total deposits to total assets;
    CashAT: ratio of net liquid assets to total assets.
The graphical analyses and the regression results reported in Table 3 indicate that the Size and Age of the bank are factors that might affect bank failure and should be controlled for in the regression, while the analyses do not suggest that LossRes (the ratio of loan loss allowance to total charge-offs) and TLoanAT (total loans to total assets) are good predictors of bank failure.
In summary, I investigate all the variables proposed in Section 2. CLTL (i.e., the ratio of commercial loans to total loans) was eliminated when I computed the descriptive statistics because of a lack of data. To investigate whether the predictive power of the explanatory variables is affected by outliers, I winsorize the proposed explanatory variables and rerun the regressions. The results using the winsorized data are slightly better than those using the original data. Therefore, I report only the results using the original data in the tabulated tables; because the reported results work against my proposed explanatory variables, this suggests that the reported results are robust. I also transform the variables (e.g., the proxies for liquidity risk) that exhibit a logistic relationship with Bfailure into natural log values and rerun the regressions. The results using the natural log values of these independent variables are qualitatively the same as those using the original forms. However, the coefficients on the natural log values of the predictors are much smaller than those on the original forms. I report the regression results using the original data form because I would like to show the relationship between the probability of bank failure and the predictors in their original form.11
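For reference, a winsorization robustness check of this kind could look like the sketch below; the 1st/99th percentile cutoffs are assumed here, since the paper does not state the exact percentiles used.

import pandas as pd

def winsorize(series: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip a variable at the assumed 1st and 99th percentiles."""
    lo, hi = series.quantile(lower), series.quantile(upper)
    return series.clip(lower=lo, upper=hi)

sample = pd.read_csv("matched_sample_2008.csv")        # hypothetical combined sample
for col in ["T1CRAT", "T1CRev", "LQ", "IntMag", "OIOE", "ROA"]:
    sample[col + "_w"] = winsorize(sample[col])        # winsorized versions used in the re-run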

5.2. Multivariate Variable Regression

Recall that the proposed model is Equation (1). I first regress an indicator variable (coded as 1 if a bank failed in the year 2009, and 0 otherwise) on the set of predictors explored in Section 5.1. Then, I use the stepwise procedure in SAS to select my final model.12 The stepwise procedure is a statistical method that helps researchers choose the best prediction model with the minimum number of explanatory variables by combining forward- and backward-selection methods. I set my significance level for entering a variable into the model (SLENTRY) at 0.1 and the significance level for removing a variable (SLSTAY) at 0.2, which means that a variable must have a p-value < 0.1 to enter the model and a p-value < 0.2 to stay in the model. I do not set the entry and stay p-values at 0.05 and 0.1 because such criteria could be so restrictive that only very few variables would qualify for the selected model, which may lose good predictors.
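Outside of SAS, a rough analogue of this forward/backward selection with SLENTRY = 0.1 and SLSTAY = 0.2 could be sketched as follows; it is not PROC LOGISTIC's exact algorithm, and the file and column names are hypothetical.

import pandas as pd
import statsmodels.api as sm

def stepwise_logit(df, dep, candidates, slentry=0.10, slstay=0.20):
    """Greedy stepwise selection: add the best candidate with p < slentry,
    then drop any selected variable whose p-value exceeds slstay."""
    selected = []
    for _ in range(2 * len(candidates)):               # bounded loop to avoid cycling
        entry_p = {}
        for var in candidates:
            if var in selected:
                continue
            res = sm.Logit(df[dep], sm.add_constant(df[selected + [var]])).fit(disp=0)
            entry_p[var] = res.pvalues[var]
        if not entry_p:
            break
        best = min(entry_p, key=entry_p.get)
        if entry_p[best] > slentry:
            break
        selected.append(best)
        res = sm.Logit(df[dep], sm.add_constant(df[selected])).fit(disp=0)
        selected = [v for v in selected if res.pvalues[v] <= slstay]
    return selected

sample = pd.read_csv("matched_sample_2008.csv").dropna()
proxies = ["T1CRAT", "T1CRev", "Leverage", "GCOOI", "LossLS", "GCOTL", "LossRes",
           "LQ", "LoanRet", "IntMag", "PM", "ROA", "OIOE",
           "TLoanAT", "DepositAT", "CashAT", "Size", "Age"]
print(stepwise_logit(sample, "Bfailure", proxies))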
Columns 1 and 2 of Table 4 report the regression results of using the full set of the proposed independent variables and of using only the independent variables selected by the stepwise procedure, respectively.
Equation (2) below is the optimal model selected by the stepwise procedure under the criteria of an entry p-value of 0.1 and a stay p-value of 0.2.
Bfailure_{t+1} = β_0 + β_1 T1CRAT_t + β_2 T1CRev_t + β_3 LQ_t + β_4 IntMag_t + β_5 PM_t + β_6 ROA_t + β_7 OIOE_t + β_8 TLoanAT_t + β_9 CashAT_t + β_{10} Age_t    (2)
The maximum likelihood estimates obtained from SAS are shown in Equation (3) below.
Bfailure_{t+1} = −35.413 − 144.036 × T1CRAT_t + 5.126 × T1CRev_t + 297.603 × LQ_t − 250.620 × IntMag_t + 4.180 × PM_t − 166.533 × ROA_t + 6.399 × OIOE_t + 22.597 × TLoanAT_t + 14.078 × CashAT_t + 0.024 × Age_t    (3)
The logistic regression models the log odds of a positive response (the probability modeled is Bfailure = 1) as a linear combination of the predictor variables. Equation (3) can be re-written as Equation (4):
Log[p/(1 − p)] = −35.413 − 144.036 × T1CRAT_t + 5.126 × T1CRev_t + 297.603 × LQ_t − 250.620 × IntMag_t + 4.180 × PM_t − 166.533 × ROA_t + 6.399 × OIOE_t + 22.597 × TLoanAT_t + 14.078 × CashAT_t + 0.024 × Age_t    (4)
where p is the probability that the bank failed in the year 2009, while (1 − p) is the probability that the bank did not fail in the year 2009. p/(1 − p) denotes the odds ratio, and Log[p/(1 − p)] denotes the log odds (i.e., log-valued odds ratio).
For logistic regression, the parameter estimates could be interpreted as follows: for a one-unit change in the predictor variable, the difference in the log odds for a positive outcome is expected to change by the respective coefficient, given the other variables in the model are held constant.13
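To spell out the algebra behind this interpretation (a standard property of the logit model, stated here for clarity): if a predictor x_j increases by one unit while the other variables are held constant, then
Log[p′/(1 − p′)] − Log[p/(1 − p)] = β_j, so [p′/(1 − p′)]/[p/(1 − p)] = e^β_j,
where p′ denotes the failure probability after the one-unit increase. In other words, the odds of failure are multiplied by e^β_j.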
The results indicate that the log odds of bank failure in the year 2009 are −35.413 when all the variables in the model are equal to zero. The coefficient on the risk-weighted capital ratio (T1CRAT) is −144.036, indicating that if a bank's T1CRAT ratio increases by one unit, the log odds of bank failure are expected to decrease by 144.036, given that the other variables in the model are held constant. In other words, with a one-unit increase in T1CRAT, the odds ratio, p/(1 − p), would be multiplied by e^(−144.036), holding the other factors constant; that is, the odds of the bank failing relative to not failing fall enormously. These results are not only statistically significant at the 0.01 level but also economically significant, given the possibly huge impact of tier one capital adequacy on the possibility of bank failure in the future. This result supports the theoretical reasoning for the important role of tier one capital adequacy for banks documented in Penman (2017).
Similarly, the coefficients on T1CRev, LQ (the proxy for loan quality/loan risk), IntMag (the proxy for efficiency), PM, ROA, OIOE, TLoanAT, CashAT, and Age are 5.126, 297.603, −250.620, 4.180, −166.533, 6.399, 22.597, 14.078, and 0.024, respectively, all of which are statistically significant at the 0.01 level. The Wald test statistic for the selected model is 62.04 (p-value = 0.000), rejecting the null hypothesis that all coefficients on the independent variables equal zero. These results indicate that, with a one-unit increase in T1CRev, LQ, IntMag, PM, ROA, OIOE, TLoanAT, CashAT, and Age, the log odds of bank failure would change by +5.126, +297.603, −250.620, +4.180, −166.533, +6.399, +22.597, +14.078, and +0.024, respectively, if the other variables in the model are held constant, suggesting that the impact of these selected predictors on the probability of bank failure is not only statistically significant but also economically significant. These results further indicate that the influence of T1CRAT, LQ (loan quality or loan risk), IntMag, and ROA on the probability of bank failure is much stronger than that of the other predictors, indicating that capital adequacy, loan quality (or loan riskiness), earnings, and earnings efficiency are the most important factors in the selected model. Specifically, these results suggest that (i) loan quality (riskiness) impacts the chance of bank failure the most; (ii) earnings efficiency is the second most influential factor in bank failure; (iii) the magnitude of the influence of profitability (ROA) and tier 1 capital adequacy on the probability of bank failure follows that of earnings efficiency; and (iv) age impacts the probability of bank failure the least. Results (i)–(iii) echo the theory of Penman (2017), which reasons that integrating earnings into the computation of the prudential capital ratio may restrict banks from taking on risky projects.
The above results are consistent with my expectations. The chance of failure for a bank with adequate capital, good earnings performance, less risky loans, and more efficient business operations is lower than that for a bank with inadequate capital, poor earnings performance, less efficient operations, and riskier loans.
The Pseudo R-square is 73.5%, indicating that the model fits the data well. Furthermore, the Hosmer and Lemeshow goodness-of-fit test statistics (Chi-square = 2.930, p-value = 0.403, when the observations are grouped into five groups) show no lack of fit, i.e., the selected model fits well. The classification results for the model selected by the stepwise procedure show that the model correctly predicts 93.1% of failed banks, 95.69% of not-failed banks, and 94.88% of all banks. An estimated area under the ROC curve of 98.05% indicates that the model almost perfectly discriminates the response (bank failure).
In Section 4, I noted that Table 2 indicates some predictors might be highly correlated. Because the logistic regression procedure in SAS does not directly test for multicollinearity, I conducted the multicollinearity test in a linear regression, as suggested by Allison (2012). The untabulated TOL and VIF statistics do not show any serious multicollinearity issues in the model when 0.1 is set as the tolerance cutoff.
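Such a tolerance/VIF check could be carried out roughly as follows, using variance inflation factors from a linear specification (TOL = 1/VIF), in the spirit of Allison (2012); the file and column names are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

sample = pd.read_csv("matched_sample_2008.csv").dropna()
selected = ["T1CRAT", "T1CRev", "LQ", "IntMag", "PM", "ROA",
            "OIOE", "TLoanAT", "CashAT", "Age"]
X = sm.add_constant(sample[selected])

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, TOL = {1.0 / vif:.3f}")   # TOL below 0.1 would flag a problem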
Because a logarithmic transformation can effectively reduce the skewness of a variable and possibly strengthen the linear relationship between the dependent variable and the independent variables, I apply the stepwise procedure to the natural logarithm of one plus each explanatory variable (i.e., 1 + the ith independent variable, as previously defined) and rerun the logistic regression. Table 5 reports that the stepwise procedure selects the following Equation (5).
Log[p/(1 − p)] = −35.699 − 9.725 × LogT1CRAT_t + 45.621 × LogLQ_t + 135.379 × LogLoanRet_t − 129.491 × LogIntMag_t − 211.697 × LogROA_t + 14.090 × LogOIOE_t    (5)
where p is defined as before: the probability that the bank failed in the year 2009, with (1 − p) the probability that it did not fail.
The regression results using the natural-logarithmized independent variables, reported in Table 5, show that (1) the natural log values of ROA, IntMag, and LoanRet have a larger impact on the probability of bank failure than the log values of the other predictors, holding other factors constant; and (2) the coefficient on the natural log value of LQ is not statistically significant at the 0.05 level, although it is selected by the stepwise procedure and stays in the final model. IntMag and OIOE, the other two predictive variables proposed by this study, are selected by the stepwise procedure and remain statistically significant in the presence of the other selected predictors. These results suggest that the capital adequacy, loan risk, earnings efficiency, and profitability of the bank are good predictors of Bfailure (i.e., bank failure). The model fitting statistics of the natural-log-valued regression are slightly better than those reported in Table 4 using the non-logarithmized variables: (1) the Pseudo R-square is 76.4% (73.5%); (2) the model correctly classifies 94.32% (93.1%) of the failed banks, 96.37% (95.69%) of the not-failed banks, and 95.83% (94.88%) overall; and (3) the estimated area under the ROC curve is 98.3% (98.05%). These statistics indicate that the results using natural-log-valued predictors are qualitatively similar to those using non-logarithmized predictors, but with slightly higher predictive power. Although the regression using logarithmized variables has slightly better goodness-of-fit statistics, the final sample size of the log-valued model is smaller than that of the non-logarithmized model, and the logarithmized variables may change their original statistical inference after many steps of mathematical computation. To balance the pros and cons of the logarithmized model (Equation (5)) and the non-logarithmized model (Equation (3)), I decided to use Equation (3) as the final model to predict bank failure in the year 2009.
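The log-transformation step itself is simple; a sketch under the same hypothetical naming conventions is shown below (observations with values at or below −1 would produce undefined logs and need separate handling).

import numpy as np
import pandas as pd

sample = pd.read_csv("matched_sample_2008.csv")
proxies = ["T1CRAT", "T1CRev", "Leverage", "GCOOI", "LossLS", "GCOTL", "LossRes",
           "LQ", "LoanRet", "IntMag", "PM", "ROA", "OIOE",
           "TLoanAT", "DepositAT", "CashAT"]

for col in proxies:
    sample["Log" + col] = np.log1p(sample[col])        # natural log of (1 + x)

# The stepwise_logit sketch from Section 5.2 could then be applied to the
# "Log"-prefixed columns in place of the original ratios.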

6. Sensitivity Tests

6.1. Alternative Measurements

To assess whether the reported results are robust to alternative measurements of the explanatory variables, the following different measurements are adopted to proxy for capital adequacy and profitability. I measure the capital adequacy ratio as the ratio of the sum of tier one and tier two capital to risk-weighted assets, instead of the ratio of tier one capital to risk-weighted assets used in Table 1, Table 2, Table 3, Table 4 and Table 5. Because economic environments change over time, to incorporate historical information on a bank's efficiency in using its assets, I use over 20 (and 10) years of data to reconstruct ROA (return on assets), assimilating dynamic economic information about the bank's profitability. Furthermore, the profit margin is measured as the ratio of net income to total revenues instead of net income before tax to total revenues, which was employed for the analyses reported in Table 1, Table 2, Table 3, Table 4 and Table 5. The empirical results using these alternative measures are qualitatively the same as those reported in Table 3, Table 4 and Table 5.

6.2. Sample Selection and Omitted Variables

The models selected by the stepwise procedure to predict bank failure could be sensitive to the sample selection criteria and may be subject to omitted variable issues. Therefore, to mitigate the concern of possible sample selection bias and omitted variables, I match the not-failed banks with the failed banks by geography, size, age of the bank, and the ratio of total loans to total assets.
The sample selection procedure may restrict the control sample to certain groups according to the characteristics of the not-failed banks. However, the goal of imposing such restrictions on the control group selection is to isolate the effect of the proposed new proxies, and of some existing proxies for the various risks, on the chance of bank failure. The approach is similar in spirit to a difference-in-differences design, which assumes that the treated and control groups share the same characteristics before treatment and then tests for differences in outcomes after the treatment is applied. The matching criteria may seem to introduce selection bias, in that the control group (not-failed banks) is matched to the treated group (failed banks) on geography, size, age, and the ratio of total loans to total assets (a measure of liquidity risk for the bank). However, precisely because these criteria control for characteristics common to both groups, the predictive power of the existing explanatory variables and of those newly proposed by this study can be tested in isolation from those common bank characteristics, without excessive concern about omitted macro- or microeconomic variables. Therefore, the sample selection criteria fit my research purpose.

6.3. Other Regression Method

To test whether the proposed regression model is sensitive to the choice of regression method, I re-run the model using probit regression instead of the logistic regression used for the results reported in Table 3, Table 4 and Table 5 and Figure 1 and Figure 2. The results obtained from the probit regression are qualitatively similar to those obtained from the logistic regression, although the magnitudes of the coefficients and the Pseudo R-square of the probit regression are smaller than those obtained from the logistic regression.
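The following sketch illustrates such a logit-versus-probit comparison under the same assumptions as above (a hypothetical DataFrame `banks` with the Appendix B ratio columns); it is not the author's original code.

```python
# A sketch of the probit robustness check: refit the stepwise-selected
# specification (Table 4, column (2)) with both link functions.
# The DataFrame `banks` and its column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

SELECTED = ["T1CRAT", "T1CREV", "LQ", "IntMag", "PM", "ROA", "OIOE",
            "TLoanAT", "CashAT", "Age"]

def fit_logit_and_probit(banks: pd.DataFrame):
    X = sm.add_constant(banks[SELECTED])
    y = banks["Bfailure"]
    logit_res = sm.Logit(y, X, missing="drop").fit(disp=False)
    probit_res = sm.Probit(y, X, missing="drop").fit(disp=False)
    return logit_res, probit_res

# Probit coefficients sit on a different scale (roughly the logit coefficients
# divided by about 1.6), so smaller magnitudes do not by themselves imply
# weaker effects.
# logit_res, probit_res = fit_logit_and_probit(banks)
# print(logit_res.prsquared, probit_res.prsquared)
```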

6.4. Out-of-Sample Test

To validate whether the proposed model works for datasets outside of the estimation sample used for the results reported in Table 1, Table 2, Table 3, Table 4 and Table 5, I conduct an out-of-sample prediction analysis on an extended sample of the 137 banks that failed in the year 2010 (during the Great Recession period). The model proposed in Equation (3) of Section 5.2 correctly predicts, on average, 84.22% of the banks that failed in 2010. Therefore, the proposed model can predict bank failure outside of the estimation sample.
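A minimal sketch of this out-of-sample check is given below; `sample_2009`, `failed_2010`, and the column names are hypothetical placeholders, and the 0.5 cutoff mirrors the classification threshold used elsewhere in the paper.

```python
# A sketch of the out-of-sample check: fit on the 2009 estimation sample and
# score the banks that failed in 2010. Inputs are hypothetical placeholders.
import statsmodels.api as sm

SELECTED = ["T1CRAT", "T1CREV", "LQ", "IntMag", "PM", "ROA", "OIOE",
            "TLoanAT", "CashAT", "Age"]

def out_of_sample_hit_rate(sample_2009, failed_2010, cutoff=0.5):
    X_in = sm.add_constant(sample_2009[SELECTED])
    fit = sm.Logit(sample_2009["Bfailure"], X_in, missing="drop").fit(disp=False)
    # Score the 2010 failures with coefficients estimated on the 2009 sample
    X_out = sm.add_constant(failed_2010[SELECTED], has_constant="add")
    p_hat = fit.predict(X_out)
    # Every bank in `failed_2010` actually failed, so the hit rate is the share
    # of predicted failure probabilities above the cutoff
    return (p_hat > cutoff).mean()

# hit_rate = out_of_sample_hit_rate(sample_2009, failed_2010)
# print(f"Correctly predicted {hit_rate:.2%} of 2010 failures")
```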

7. Discussion

7.1. Further Discussion on Stepwise Procedure

On the one hand, using the stepwise procedure to select the predictors of the model has the following strengths: (i) the algorithm selects the combination of variables based on objective criteria; (ii) the stepwise selection method may help uncover new relationships among variables; and (iii) it may exclude irrelevant or unessential variables that do not explain the dependent variable well.
On the other hand, employing the stepwise method has the following, non-exhaustive, weaknesses: (i) the variables selected by the stepwise method might be unreliable or inconsistent, depending on the data and the imposed criteria; (ii) the stepwise method might overstate the significance of the explanatory variables and disregard the effects of interactions or higher-order terms; (iii) stepwise selection might be sensitive to the sample size, the correlation among variables, and the chosen significance level; (iv) the stepwise method might violate the assumptions of the regression; and (v) the final model selected by the stepwise procedure might not be consistent with theoretical explanations (e.g., Altman and Andersen 1989; Copas 1983; Derksen and Keselman 1992; Hurvich and Tsai 1990; Mantel 1970; Roecker 1991; Tibshirani 1996).
Fortunately, this study employs logistic regression to conduct the statistical analyses, which could mitigate the aforementioned disadvantages of the stepwise method. To alleviate concerns (i), (ii), and (iii), and using a good-sized sample (over 30 observations), I compare the estimated coefficients and goodness-of-fit statistics of the stepwise-selected model with those of the logistic regression that does not use the stepwise method, both reported in Table 4. The statistics of the two methods are qualitatively similar, the results of the stepwise-selected model are more statistically significant than those of the model without stepwise selection, and the final stepwise-selected model includes fewer explanatory variables. Moreover, the VIF test does not show any serious concern about multicollinearity among the explanatory variables (a sketch of such a check appears below). To address concern (iv), I check and do not find any evidence that the final model selected by the stepwise method violates any assumptions of logistic regression. Furthermore, the explanatory variables selected by the stepwise method are among those constructed based on the theoretical reasoning described in Section 2.2, which could minimize concern (v).
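A minimal sketch of such a variance inflation factor (VIF) check, assuming a hypothetical DataFrame `banks` with the Appendix B ratio columns, is as follows.

```python
# A sketch of the multicollinearity check: compute a VIF for each candidate
# predictor. `banks` and the column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

CANDIDATES = ["T1CRAT", "T1CREV", "Leverage", "GCOOI", "LossLS", "GCOTL",
              "LossRes", "LQ", "LoanRet", "IntMag", "PM", "ROA", "OIOE",
              "TLoanAT", "CashAT", "DepositAT", "Size", "Age"]

def vif_table(banks: pd.DataFrame) -> pd.Series:
    X = sm.add_constant(banks[CANDIDATES].dropna())
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)

# A common rule of thumb flags VIFs above 10 (or, more conservatively, 5).
# print(vif_table(banks))
```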

7.2. Possible Caveat and Suggestions for Future Research

I do not use earlier bank failure data to test these predictive variables because (1) the goal of the current study is to test whether some variables suggested by previous studies still predict bank failure well using data from subsequent periods, and whether the new variables proposed by this paper have any predictive ability; and (2) there is an existing literature on predicting bank failure using banks that failed before and during the 2008 financial crisis, and this paper adds new empirical evidence to that literature by using data from the post-2008 period.
I am aware that economic environments may change across time regimes and that, when applied to datasets other than the estimation sample, the predictive power of the explanatory variables in the selected model might not be the same as that shown in this study, although I would expect those proxies to still predict bank failure reasonably well.
This study focuses on whether some simple ratios and logistic regression can predict bank failure in and subsequent to the 2008 financial crisis, and on proposing and validating new empirical proxies for capital adequacy (Leverage), loan risk (LQ), earnings efficiency (IntMag), and loan profitability (OIOE). Although I match not-failed banks with failed banks by geography, size, and the ratio of total loans to total assets to reduce statistical estimation errors (i.e., type I and II errors) and improve the regression results, I would expect that there are further predictors that could be explored by future research.
Future research may build upon this study to explore more predictive indicators of bank failure, such as macroeconomic variables that capture capital market conditions and qualitative factors such as the human capital of the bank and the managerial ability of its leadership team. Moreover, future research could also explore non-linear models or use machine learning to enrich this line of literature.

7.3. Possible Applications of the Study

Regulators could use the results of this study to help prevent future bank failures by requiring banks to meet threshold values of these ratios. For example, the mean value of loan quality (LQ) for the not-failed [failed] bank group is 0.078 [0.081]. This result indicates that the larger the LQ ratio, the higher the chance that the bank will subsequently fail. Therefore, in addition to the currently implemented regulations requiring banks to hold a certain percentage of tier one capital, regulators may set a warning threshold requiring banks to keep LQ, say, below 0.08; otherwise, the bank might be subject to disciplinary action by the regulators. Alternatively, regulators might mandate that banks keep LQ low to ensure they do not generate revenues too aggressively from risky lending. Similarly, investors could set up thresholds for the ratios proposed by this study, or for those used in previous studies, when assessing whether a bank is financially healthy and not subject to failure, before making an investment decision on the bank or depositing money into it.

7.4. Illustration

While it might be complicated to predict bank failure, in this subsection I use Silicon Valley Bank (SVB) as an example to illustrate how the results of this study could be applied in a comprehensive yet straightforward way using publicly available, free data (rather than WRDS).
I obtain the call report of SVB from the National Information Center (Federal Financial Institutions Examination Council—FFIEC) website,14 which is publicly available worldwide free of charge, and the annual report of SVB from the filings on the website of the U.S. Securities and Exchange Commission.15 The annual report of SVB is a good supplement to the call report in this illustration.
In Section 5.2, I analyze and decide to use the stepwise-selected model (i.e., Equations (3) and (4)) as the final model to predict bank failure in 2009. I use the Call Report and annual report data of SVB for the year 2022 to compute the ratios16 used in Equation (4) and estimate that the chance of SVB failing in 2023 is 0.0002, indicating that the chance of SVB failing in the year subsequent to 2022 (i.e., 2023) is very small.
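To make the mechanics concrete, the sketch below plugs a vector of ratios into the stepwise-selected logistic model using the coefficients reported in Table 4, column (2); the input values shown are placeholders for illustration only, not the figures actually computed from SVB's Call Report and annual report.

```python
# A minimal sketch of scoring one bank-year with the stepwise-selected logistic
# model. Coefficients are taken from Table 4, column (2); the example ratio
# values are hypothetical placeholders, not SVB's actual 2022 ratios.
import math

COEF = {
    "const":   -35.413,
    "T1CRAT": -144.036,
    "T1CREV":    5.126,
    "LQ":      297.603,
    "IntMag": -250.620,
    "PM":        4.180,
    "ROA":    -166.533,
    "OIOE":      6.399,
    "TLoanAT":  22.597,
    "CashAT":   14.078,
    "Age":       0.024,
}

def failure_probability(ratios: dict) -> float:
    """Return the fitted probability of failure for one bank-year of ratios."""
    z = COEF["const"] + sum(COEF[k] * v for k, v in ratios.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example input (placeholder values only):
example = {"T1CRAT": 0.12, "T1CREV": 1.4, "LQ": 0.07, "IntMag": 0.035,
           "PM": 0.10, "ROA": 0.008, "OIOE": 1.2, "TLoanAT": 0.70,
           "CashAT": 0.06, "Age": 40}
# print(f"Estimated failure probability: {failure_probability(example):.4f}")
```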
I notice that the CashAT ratio (computed as the ratio of the total amount of cash deposits to total assets) for SVB is 0.06, indicating that cash deposits of the bank's customers constitute, on average, 6% of SVB's total assets for the year 2022. This study defines CashAT as the ratio of the sum of cash and assets held in trading accounts to total assets, which includes more accounts than the CashAT computed for SVB for the year 2022. In Section 2.2 (page 11), I argue that the larger the CashAT ratio, the higher the risk of bank failure. Moreover, Table 1 reports that the mean value of CashAT for the not-failed (failed) bank group in the year 2009 is 0.039 (0.052). The CashAT ratio for SVB is even higher than 0.052 (the mean value of CashAT for the failed bank group in 2009), indicating that SVB has a higher probability of failing in the year subsequent to 2022 (i.e., 2023) than the average bank that failed in 2009, holding the other factors constant.
The estimated Equation (4) and the CashAT ratio for the year 2022 together suggest that SVB was unlikely to fail in the subsequent year, 2023, unless an extraordinary event triggered a bank run, in which massive deposit withdrawals could cause SVB to fail. Very unfortunately, in March 2023, after the interest rate hikes during the 2021–2022 inflation spike, SVB suffered a bank run on its deposits, which resulted in SVB being closed on 10 March 2023.17

8. Conclusions

In this paper, I try to predict bank failure in the year 2009, one year subsequent to the 2008 financial crisis, using some variables documented in the previous literature and four new predictive variables constructed by this study. These variables proxy for the risks related to capital adequacy, asset (loan) quality, earnings efficiency, cost efficiency, and liquidity of the bank. I match the subsample of not-failed banks with the failed banks by geography, size, age, and the ratio of total loans to total assets, within an interval of one standard deviation (plus and minus) around the mean value of each variable of the failed banks located in the same state.
The marginal effect tests in the logistic regression show that the proxies for capital adequacy, loan risk (quality), earnings efficiency, and profitability affect the probability of bank failure the most in the selected sample. Using the natural-log-transformed predictors, the model selected by the stepwise procedure correctly classifies 94.32% (96.37%) of the failed (not-failed) banks in the sample. The results are robust to a battery of sensitivity tests. The out-of-sample test shows that the proposed model can correctly predict, on average, 84.22% of the banks that failed in 2010.
In sum, this study examines and finds that some previously documented factors and three novel predictive variables constructed by this study (LQ for loan risk, IntMag for earnings efficiency, and OIOE for profitability) can effectively predict bank failure in and after the year 2009. Therefore, this study contributes to the literature and to practitioners by offering three new predictors with strong predictive power for bank failure, by confirming that some earlier documented predictors can still predict later bank failures, and by illustrating that logistic regression, a simple regression method, still works well to forecast bank failure despite the availability of many modern, more complicated prediction methods.

Funding

This research received no external funding.

Data Availability Statement

The sources of the data supporting the reported results are described in Section 3.1 and Section 7.4.

Acknowledgments

I thank two anonymous reviewers for their helpful comments. I gratefully thank Eli Bartov, Cheng-Few Lee, Mostafa Maksy, Jahyun Goo, and the audiences of Florida Atlantic University Seminar, 2014 AAA-Atlantic Regional Meeting, and 28th PBFEAM for their helpful comments on the early version of this manuscript. I also cordially appreciate Mike Gengrinovich’s valuable input to the early version of this manuscript. I am also grateful for the generous financial support from University of Illinois Chicago (and South Dakota State University) when I worked on the earlier (and current) draft of this manuscript.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Bank Data Description

The Bank Regulatory Database: It contains five databases for regulated depository financial institutions. These databases provide accounting data for bank holding companies, commercial banks, savings banks, and savings and loan institutions. The data come from the regulatory forms that institutions are required to file for supervisory purposes.
The Commercial Bank Database: It is from the Federal Reserve Bank of Chicago (FRB Chicago) and contains data on all banks filing the Report of Condition and Income (the “Call Report”) that are regulated by the Federal Reserve System, the Federal Deposit Insurance Corporation (FDIC), and the Comptroller of the Currency. These reports include balance sheets, income statements, risk-based capital measures, and off-balance-sheet data. The Commercial Bank Database has data available quarterly from 1976 to 2021. It primarily covers commercial banks and savings banks; it does not include savings institutions (e.g., S&L associations) that file the Thrift Financial Report (TFR) with the Office of Thrift Supervision (OTS). Effective with data for the second quarter of 2021, structure data, including bank attributes, can be found at the National Information Center (Federal Financial Institutions Examination Council—FFIEC) website (accessed on 20 October 2024 via: https://www.ffiec.gov/npw/FinancialReport/DataDownload). Quarterly commercial bank financial data can be obtained via the FFIEC Central Data Repository’s Public Data Distribution (accessed on 20 October 2024 via: https://cdr.ffiec.gov/public/ManageFacsimiles.aspx).
SEC EDGAR website: an official website managed by the U.S. Securities and Exchange Commission (SEC), on which companies (including banks) publicly listed in the U.S. file their financial reports (e.g., 10-K, 8-K, …) with the SEC; the filed reports can be accessed free of charge.

Appendix B. Variable Definitions

The following variables omit time (t) and firm (j) subscript.
Bfailure = 1 if the bank fails in the subsequent year of t + 1 (year 2009), 0 otherwise.
p/(1 − p): Denotes the odds ratio. p is the probability that the bank failed in the year 2009, while (1 − p) is the probability that the bank did not fail in the year 2009.
Investigated variable group 1—Capital adequacy measurements:
CLTL = Commercial loans/total loans or basic capital/asset.
T1CRAT = Tier 1 capital/risk-weighted assets.
T1CRev = Tier 1 capital/total interest and non-interest income (before the deduction of any expense).
Leverage = Common share equity net of intangible assets/the total assets net of intangibles.
Investigated variable group 2—Asset (Loan) risk or Asset (loan) quality measurements
GCOOI = Gross Charge off/net operating income
LossLS = Loss provision/the sum of total loans and securities
GCOTL = Gross charge offs/total loans
LossRes = Ratio of loan loss allowance/total charge off
LQ = Total interest revenue from loans/total loans
LoanRet = Loan revenue/total loans
Investigated variable group 3—Efficiency (Pricing) measurement:
IntMag = (The difference in total interest income and total interest expenses)/the total liabilities
Investigated variable group 4—Earnings measurement:
PM = Net income/total revenues (net income/[non-interest income + interest income + income from trading assets + income from federal funds sold])
ROA = The net income/the lag of total asset (NIt/ATt−1)
OIOE = Operating income/operating expenses
Investigated variable group 5—Liquidity risk measurement:
TLoanAT = Total loans/total assets
DepositAT = Total deposits/total assets
CashAT = Net liquid Assets/Total assets: (Cash + Assets Held in Trading Accounts)/total assets
Control variables:
Size = Natural log value of total assets
Age = Current year (2009) − the year of starting business of the bank
Independent variable used in Table 5:
LogT1CRAT = log(1 + T1CRAT)
LogLQ = log(1 + LQ)
LogLoanRet = log(1 + LoanRet)
LogIntMag = log(1 + IntMag)
LogROA = log(1 + ROA)
LogOIOE = log(1 + OIOE)
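As an illustration of how these definitions translate into computations, the following sketch builds a subset of the ratios from one bank-year of data; the input field names are hypothetical and do not correspond to actual Call Report mnemonics.

```python
# A minimal sketch of constructing several Appendix B ratios from one bank-year
# of data. Input field names (e.g., "tier1_capital") are hypothetical placeholders.
def build_ratios(b: dict) -> dict:
    """Compute a subset of the predictor ratios defined in Appendix B."""
    return {
        "T1CRAT":    b["tier1_capital"] / b["risk_weighted_assets"],
        "LQ":        b["interest_revenue_from_loans"] / b["total_loans"],
        "LoanRet":   b["loan_revenue"] / b["total_loans"],
        "IntMag":    (b["total_interest_income"] - b["total_interest_expense"])
                     / b["total_liabilities"],
        "ROA":       b["net_income"] / b["lag_total_assets"],
        "OIOE":      b["operating_income"] / b["operating_expenses"],
        "TLoanAT":   b["total_loans"] / b["total_assets"],
        "CashAT":    (b["cash"] + b["trading_account_assets"]) / b["total_assets"],
        "DepositAT": b["total_deposits"] / b["total_assets"],
    }
```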

Notes

1
Interested readers can refer to the details of the report of GAO in the following link (latest access on 17 November 2024): https://www.huffpost.com/entry/financial-crisis-cost-gao_n_2687553.
2
For example, in the stepwise-selected regression of the binomial variable Bfailure (indicating whether the bank failed) on ROA and the other independent variables, the coefficient on ROA is −166.533, indicating that, with a one-unit increase in ROA, the odds ratio (i.e., the ratio of the probability that the bank failed to the probability that it did not fail) is multiplied by e^(−166.533), i.e., it decreases by a factor of e^(166.533), holding other factors constant. This result is not only consistent with previous studies showing that profit is a very important indicator of whether a bank is likely to survive, but it also supports the reasoning of Penman (2017) that integrating earnings into prudential capital calculations may disincentivize banks from taking risky projects.
3
I use SAS (versions 9.3 and 9.4) to conduct the initial analyses and STATA (version SE 16) to double-check the regression results reported in the tables. The two statistical packages produce qualitatively similar results.
4
Cash used in this ratio represents cash and balances due from depository institutions, including “Total of ‘Cash Items in Process of Collection and Unposted Debits (0020)’, ‘Balances Due from Banks in Foreign Countries and Foreign Central Banks (0070)’, ‘Currency and Coin (0080)’, ‘Balances Due from Depository Institutions in the U.S. (0082)’ and ‘Balances Due from Federal Reserve Banks (0090)’”, and “certificates of deposit held in trading accounts”. If a bank holds a large amount of cash deposited by depository institutions (or its customers), it may face a higher risk of a “bank run” when customers demand to withdraw their deposits for whatever reason, such as fear of not getting their deposits back; such massive withdrawals may lead to a bank run and, ultimately, failure. Please refer to the case study of Silicon Valley Bank in Section 7.
5
Wheelock and Wilson (2000) show that size is associated with Bfailure.
6
Latest access on 18 October 2024 at: http://www.fdic.gov/bank/individual/failed/index.html.
7
Outputs of the univariate procedures (using the SAS software) are available upon request. This procedure helps identify the five observations with the lowest and highest values in the sample. Although the law of large numbers could be applied to the final sample, which has more than 30 observations, the logistic regression used in this study fortunately does not require the data to meet any special distributional assumptions (Dielman 2005). Therefore, logistic regression is an analysis method well suited to this study.
8
The Hosmer and Lemeshow goodness-of-fit test examines whether the observed outcomes are consistent with the expected outcomes across groups formed using different criteria. The null hypothesis of the Hosmer and Lemeshow test is that the observed and expected outcomes agree across the groups (i.e., the model fits well). Therefore, a smaller value of the test statistic indicates a better fit (Hosmer et al. 2013).
9
10
Sensitivity represents the probability that the model predicts a positive outcome for an observation that is in fact positive. Specificity represents the probability that the model predicts a negative outcome for an observation that is indeed negative. Therefore, a model with 100% sensitivity and 100% specificity would be the ideal predictive model; in the real world, however, such a model is extremely rare. A larger sensitivity and a smaller value of one minus specificity (i.e., 1 − specificity) indicate a better predictive model. To visualize the trade-off between sensitivity and specificity, the ROC curve is created. The area under the ROC curve, ranging from 0 to 1, indicates how accurately the model classifies the outcomes; a larger area indicates more accurate classification.
11
In addition to using the natural-log-transformed independent variables to investigate possible outlier effects of the independent variables on the outcome variable, and even though logistic regression does not carry the usual set of distributional assumptions (Dielman 2005), I also check the residual plots to identify outlier effects. The untabulated figures show that only the variable Leverage (tangible capital ratio) has standardized residuals beyond two in absolute value in the regression reported in Column (3) of Table 3. However, no robustness test is needed to address the possible outlier effect of Leverage on the regression results because Leverage is not selected in the final model. The other predictors do not have standardized residuals beyond two in absolute value. Other assumptions are considered as well. Moreover, the law of large numbers applies to this sample; therefore, the normality of the sample is not a concern.
12
The final model selected by the stepwise procedure using SAS (versions 9.3 and 9.4) is qualitatively the same as the one selected by STATA (SE 16).
13
Latest access on 17 November 2024 at: https://stats.oarc.ucla.edu/sas/dae/logit-regression/.
14
Latest access on 17 November 2024 at: https://cdr.ffiec.gov/public/ManageFacsimiles.aspx.
15
16
I could not obtain from the Call Report of SVB for the years 2021 and 2022 all the variables needed to compute the ratios used in Equation (4); therefore, I use the annual report of SVB for the years 2021 and 2022 to fill in the data that I was unable to identify in the Call Report.
17
Latest access on 17 November 2024 at: https://en.wikipedia.org/wiki/Silicon_Valley_Bank.

References

1. Allison, Paul D. 2012. Logistic Regression Using SAS: Theory and Application, 2nd ed. Cary: SAS Publishing, pp. 60–62.
2. Altman, Douglas G., and Per Kragh Andersen. 1989. Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine 8: 771–83.
3. Altman, Edward I. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance 23: 589–609.
4. Altman, Edward I., Robert G. Haldeman, and P. Narayanan. 1977. ZETA ANALYSIS, a new model to identify bankruptcy risk of corporations. Journal of Banking and Finance 1: 29–54.
5. Cole, Rebel A., and Lawrence J. White. 2012. Déjà vu all over again: The causes of US commercial bank failures this time around. Journal of Financial Services Research 42: 5–29.
6. Copas, John B. 1983. Regression, prediction and shrinkage (with discussion). Journal of the Royal Statistical Society, Series B 45: 311–54.
7. Davis, E. Philip, and Dilruba Karim. 2008. Comparing early warning systems for banking crises. Journal of Financial Stability 4: 89–120.
8. Demyanyk, Yuliya, and Iftekhar Hasan. 2010. Financial crisis and bank failures: A review of prediction methods. Omega 38: 315–24.
9. Derksen, Shelley, and Harvey Jay Keselman. 1992. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45: 265–82.
10. DeYoung, Robert, and Gokhan Torna. 2013. Nontraditional banking activities and bank failures during the financial crisis. Journal of Financial Intermediation 22: 397–421.
11. Dielman, Terry E. 2005. Applied Regression Analysis, 2nd ed. London: South-Western, p. 383.
12. Dietrich, J. Richard, and Robert S. Kaplan. 1982. Empirical analysis of the loan classification decision. The Accounting Review 57: 18–38.
13. Haslem, John A., Carl A. Scheraga, and James P. Bedingfield. 1982. An analysis of the foreign and domestic balance sheet strategies of the US banks and their association to profitability performance. Management International Review 32: 55–75.
14. Hosmer, David W., Stanley A. Lemeshow, and Rodney X. Sturdivant. 2013. Applied Logistic Regression, 3rd ed. Hoboken: Wiley.
15. Hurvich, Clifford M., and Chih-Ling Tsai. 1990. The impact of model selection on inference in linear regression. American Statistician 44: 214–17.
16. Karels, Gordon V., and Arun J. Prakash. 1987. Multivariate normality and forecasting of business bankruptcy. Journal of Business Finance & Accounting 14: 573–93.
17. Kumar, Ravi P., and Vadlamani Ravi. 2007. Bankruptcy prediction in banks and firms via statistical and intelligent techniques—A review. European Journal of Operational Research 180: 1–28.
18. Mantel, Nathan. 1970. Why stepdown procedures in variable selection. Technometrics 12: 621–25.
19. Martin, Daniel. 1977. Early warning of bank failure. Journal of Banking and Finance 1: 249–76.
20. Meyer, Paul A., and Howard W. Pifer. 1970. Prediction of bank failures. The Journal of Finance 25: 853–68.
21. Ohlson, James A. 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research 18: 109–31.
22. Penman, Stephen. 2017. Prudent Capital and Prudent Accounting for Banks. Columbia University Working Paper. New York: Columbia University.
23. Roecker, Ellen B. 1991. Prediction error and its estimation for subset-selected models. Technometrics 33: 459–68.
24. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58: 267–88.
25. West, Robert C. 1985. A factor analytic approach to bank condition. Journal of Banking and Finance 9: 253–66.
26. Wheelock, David C., and Paul W. Wilson. 2000. Why do banks disappear? The determinants of U.S. bank failures and acquisitions. The Review of Economics and Statistics 82: 127–38.
27. Wooldridge, Jeffrey M. 2009. Introductory Econometrics: A Modern Approach, 4th ed. London: South-Western, chap. 17.
Figure 1. Plots of the estimated probabilities along with the observed probabilities (blue) and 95% confidence intervals (lower/upper bound in green/black) as functions of each investigated explanatory variable. Panels (a–r) show the estimated probability of bank failure as a function of T1CRAT (ratio of tier 1 capital to risk-weighted assets), T1CRev (ratio of tier 1 capital to total interest and noninterest income), Leverage, GCOOI (ratio of gross charge-offs to net operating income), LossLS (ratio of loss provision to the sum of total loans and securities), GCOTL (ratio of gross charge-offs to total loans), LossRes (ratio of loan loss allowance to total charge-offs), LQ (ratio of total interest revenue from loans to total loans), LoanRet (ratio of loan revenue to total loans), IntMag (ratio of net interest profit to total liabilities), PM (ratio of net income to total revenues), ROA (ratio of net income to lag of total assets), OIOE (ratio of operating income to operating expenses), TLoanAT (ratio of total loans to total assets), CashAT (ratio of net liquid assets to total assets), DepositAT (ratio of total deposits to total assets), Size, and Age, respectively.
Figure 2. ROC curve of logistic regression of bank failure (Bfailure) on each of tested explanatory variables.
Table 1. Sample descriptive statistics.
Panel A: Not-failed bank statistics
Variable | Mean | Median | Std Dev | Q1 | Q3 | 1st Pctl | 99th Pctl | Min | Max
T1CRAT0.119 ***0.108 ***0.0400.0980.1240.0710.2950.0300.349
T1CRev1.454 ***1.340 ***0.4501.2071.5310.5963.2080.3493.715
Leverage0.092 ***0.085 ***0.0240.0780.1020.0570.1800.0500.222
GCOOI0.089 ***0.045 ***0.1300.0160.1090.0000.5070.0001.190
LossLS0.009 ***0.005 ***0.0150.0020.0110.0000.059−0.0050.187
GCOTL0.008 ***0.004 ***0.0110.0010.0090.0000.0460.0000.108
LossRes17.9192.987 ***103.7451.5718.3930.403218.7300.3261525.000
LQ0.078 *0.0770.0130.0720.0830.0500.1070.0310.186
LoanRet0.065 **0.065 ***0.0100.0610.0690.0350.0820.0240.175
IntMag0.037 ***0.037 ***0.0100.0320.0420.0180.0570.0140.130
PM0.028 ***0.081 ***0.2920.0110.146−1.3900.352−2.9850.399
ROA0.003 ***0.006 ***0.0170.0010.010−0.0800.025−0.1550.035
OIOE1.240 ***1.235 ***0.1931.1361.3420.6771.7130.2771.745
TLoanAT0.7340.7420.0850.6850.8050.5120.8730.4950.897
CashAT0.039 ***0.0290.0370.0200.0460.0030.1890.0010.368
DepositAT0.820 **0.826 ***0.0620.7820.8660.6560.9330.6260.944
Size12.509 *12.3810.69412.03912.90711.42914.55211.39314.710
Age48.027 ***40.000 ***34.98517.00080.0003.000115.0003.000118.000
AT360.857 ***238.198365.924169.224403.26891.9512088.84088.6742446.600
Tloans270.081 ***168.154296.534119.501288.60367.5751720.07061.1992085.690
Panel B: Failed bank sample statistics
Variable | Mean | Median | Std Dev | Q1 | Q3 | 1st Pctl | 99th Pctl | Min | Max
T1CRAT0.0650.0680.0360.0470.088−0.0660.121−0.1650.176
T1CRev0.8640.9210.5170.6281.121−1.0091.940−2.0793.004
Leverage0.0530.0540.0280.0380.068−0.0150.120−0.0460.134
GCOOI0.3750.2870.3660.1260.5140.0071.9810.0001.992
LossLS0.0400.0320.0320.0180.0550.0010.1270.0010.234
GCOTL0.0320.0240.0300.0110.0440.0010.1260.0000.206
LossRes7.6521.23952.3290.7042.6790.22731.8880.135571.625
LQ0.0810.0770.0180.0700.0870.0580.1510.0530.176
LoanRet0.0680.0670.0090.0620.0720.0460.0890.0360.094
IntMag0.0290.0280.0110.0230.0370.0060.050−0.0030.051
PM−0.811−0.6250.790−0.955−0.361−3.4010.052−5.1840.071
ROA−0.048−0.0410.040−0.060−0.026−0.1810.003−0.2730.006
OIOE0.9690.9720.2910.8001.1680.2691.7170.2361.811
TLoanAT0.7340.7580.1210.6640.8290.3920.9360.3100.944
CashAT0.0520.0350.0530.0170.0780.0030.2570.0010.292
DepositAT0.8390.8590.0970.7950.9050.4540.9670.4090.987
Size12.70712.4071.33911.73013.41510.33416.4169.79417.060
Age37.29422.00040.6608.00047.0002.000141.0001.000151.000
AT1039.460244.4112846.760124.274669.83030.77213,476.10017.92225,638.730
Tloans732.192184.1801861.15093.392509.17621.2458670.69015.06316,500.310
Panels A and B report the descriptive statistics of the variables of interest and of the proxies for the different dimensions of bank failure risk used in the regression analyses, for the control sample (not-failed banks) and the test sample (failed banks), respectively. The sample statistics are based on the test sample of 119 distinct banks that failed in the year 2009 and the control sample of 258 banks that survived in 2009; the two samples together comprise the full sample of 377 banks. For brevity, I omit the subscripts t (year) and j (bank). *, **, and *** indicate statistical significance of the difference in means (medians) between the not-failed and failed bank samples at the 10%, 5%, and 1% levels, respectively, based on two-tailed t-statistics (nonparametric Wilcoxon tests). See Appendix B for variable definitions.
Table 2. Pearson (above) and Spearman (below) correlation table for all variables used in model selection.
Variable | Bfailure | T1CRAT | T1CRev | Leverage | GCOOI | LossLS | GCOTL | LossRes | LQ | LoanRet | IntMag | PM | ROA | OIOE | TLoanAT | CashAT | DepositAT | Size | Age
Bfailure −0.546−0.5030.1700.4980.5440.498−0.0530.1000.115−0.379−0.612−0.660−0.485−0.0020.1410.1190.097−0.134
T1CRAT−0.685 0.839−0.094−0.400−0.415−0.368−0.0400.073−0.0170.3300.4730.4780.341−0.1660.070−0.206−0.0800.229
T1CRev−0.6050.795 −0.097−0.367−0.407−0.3780.115−0.236−0.2660.1100.3920.3980.222−0.0330.005−0.214−0.0360.097
Leverage0.141−0.156−0.026 0.0920.1490.093−0.223−0.0940.041−0.136−0.247−0.305−0.2770.2030.0320.272−0.107−0.335
GCOOI0.526−0.489−0.3680.152 0.8730.959−0.100−0.0190.078−0.346−0.687−0.605−0.4600.0320.1840.1730.141−0.238
LossLS0.621−0.581−0.4660.2130.882 0.894−0.0920.0270.153−0.250−0.680−0.671−0.4300.0790.1630.1840.127−0.243
GCOTL0.524−0.468−0.3980.1270.9900.872 −0.1020.0970.165−0.270−0.621−0.584−0.409−0.0220.2070.1600.150−0.217
LossRes−0.3540.3370.273−0.065−0.913−0.696−0.925 −0.152−0.205−0.0780.0050.0080.050−0.034−0.002−0.056−0.036−0.010
LQ0.0410.166−0.159−0.171−0.009−0.0790.077−0.059 0.6370.3520.0470.0360.131−0.5960.135−0.020−0.0230.179
LoanRet0.140−0.056−0.2520.0210.2290.2010.277−0.2390.605 0.5540.0180.0010.0070.0240.1140.109−0.1180.011
IntMag−0.3790.4240.179−0.131−0.324−0.346−0.2890.2360.3000.386 0.4500.4210.5380.184−0.119−0.124−0.0830.168
PM−0.7250.6480.470−0.208−0.714−0.804−0.6960.5220.093−0.0990.507 0.9330.7330.044−0.215−0.203−0.0350.279
ROA−0.7270.6480.451−0.204−0.695−0.783−0.6750.5050.102−0.0760.5170.992 0.7330.018−0.184−0.202−0.0210.297
OIOE−0.4740.4640.303−0.229−0.519−0.536−0.4960.3790.155−0.0190.5880.7620.748 0.026−0.161−0.2560.1300.279
TLoanAT0.038−0.213−0.0650.2010.0750.1600.014−0.007−0.5870.0280.170−0.075−0.0690.011 −0.2630.0200.016−0.257
CashAT0.0700.017−0.0330.0050.0580.0170.078−0.0780.1300.1300.037−0.038−0.045−0.047−0.205 0.039−0.006−0.008
DepositAT0.202−0.239−0.2550.2900.1910.2050.195−0.1260.0250.145−0.138−0.203−0.200−0.268−0.0420.116 −0.330−0.088
Size0.026−0.112−0.035−0.1370.1510.1720.146−0.151−0.081−0.230−0.182−0.101−0.1030.1010.006−0.079−0.241 −0.103
Age−0.1980.2780.094−0.524−0.288−0.402−0.2570.1710.3200.0750.2870.3870.3810.349−0.2610.082−0.150−0.047
The table reports correlation statistics among the dependent and independent variables for the full sample of 377 distinct banks. Correlations that are statistically significant at the 0.1 level or better are bolded. See Appendix B for variable definitions.
Table 3. Results of logistic regression of bank failure on tested independent variables.
Column: (1) T1CRAT, (2) T1CREV, (3) Leverage, (4) GCOOI, (5) LossLS, (6) GCOTL, (7) LossRes, (8) LQ, (9) LoanRet
Constant8.432 ***4.717 ***−0.882 ***−1.969 ***−2.312 ***−1.962 ***−0.718 ***−1.903 ***−2.531 ***
[0.000][0.000][0.000][0.000][0.000][0.000][0.000][0.002][0.003]
Tested Independent Variable−99.848 ***−4.751 ***12.460 ***6.744 ***83.527 ***77.481 ***−0.00314.207 *26.475 **
[0.000][0.000][0.002][0.000][0.000][0.000][0.381][0.061][0.040]
Wald Chi-Square 74.7844 ***72.211 ***9.605 ***63.881 ***73.810 ***63.942 ***0.7673.513 *4.231 **
[0.000][0.000][0.0019][0.000][0.000][0.000][0.381][0.061][0.040]
Pseudo R-square 0.4530.3300.0290.2630.3210.2610.0040.0100.013
Max-rescaled R-square 0.6360.4630.0410.3700.4510.3660.0060.0130.019
Hosmer and Lemeshow Goodness-of-Fit 76.624 ***52.498 ***0.89112.69314.904 *26.744 ***61.728 ***6.23215.574 **
[0.000][0.000][0.999][0.123][0.061][0.001][0.000][0.621][0.049]
Classification (% correct)87.30084.60069.078.882.578.567.969.068.2
ROC Curve (% estimated area)92.53087.57058.782.788.682.671.952.558.7
Observations377377377377377377377377377
Column: (10) IntMag, (11) PM, (12) ROA, (13) OIOE, (14) TLoanAT, (15) CashAT, (16) DepositAT, (17) Size, (18) Age
Constant2.750 ***−1.936 ***−2.003 ***4.9627 ***−0.741−1.071 ***−3.827 ***−3.487 **−0.424 **
[0.000][0.000][0.000][0.000][0.377][0.000][0.004][0.017][0.013]
Tested Independent Variable−106.602 ***−5.337 ***−89.494 ***−5.122 ***−0.0456.661 ***3.680 **0.215 *−0.008 ***
[0.000][0.000][0.000][0.000][0.968][0.010][0.022][0.062][0.010]
Wald Chi-Square 50.488 ***93.360 ***97.8651 ***63.5486 ***0.0026.668 ***5.257 **3.489 *7.054 ***
[0.000][0.000][0.000][0.000][0.968][0.010][0.022][0.062][0.008]
Pseudo R-square 0.1640.4390.4600.2270.0000.0190.0150.0090.019
Max-rescaled R-square 0.2300.6160.6450.3190.0000.0260.0210.0130.026
Hosmer and Lemeshow Goodness-of-Fit 22.137 ***51.901 ***51.2534 ***16.073 **18.118 **16.041 **33.454 ***38.738 ***25.846 ***
[0.005][0.000][0.000][0.041][0.020][0.042][0.000][0.000][0.001]
Classification (% correct)75.689.190.580.468.468.468.468.768.4
ROC Curve (% estimated area)73.595.495.179.447.654.462.651.662.3
Observations377377377377377377377377377
Table 3 reports the results of the cross-sectional logistic regressions of the indicator variable Bfailure on each investigated variable. All the regressions include an intercept. The table reports the logistic regression coefficient estimates and, in [ ], p-values based on standard errors. *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels (two-tailed), respectively. See Appendix B for variable definitions.
Table 4. Results of the stepwise procedure to select the final regression model.
(1)(2)
Bfailure Bfailure
Constant−19.206−35.413 ***
[0.138][0.008]
T1CRAT−121.279 ***−144.036 ***
[0.001][0.000]
T1CREV3.7115.126 ***
[0.101][0.003]
Leverage−7.854
[0.432]
GCOOI1.539
[0.887]
LossLS−109.567 ***
[0.001]
GCOTL−18.66
[0.882]
LossRes−0.005
[0.783]
LQ203.916 **297.603 ***
[0.030][0.000]
LoanRet126.888
[0.157]
IntMag−231.082 ***−250.620 ***
[0.000][0.000]
PM−3.264.180 ***
[0.342][0.000]
ROA−193.791 ***−166.533 ***
[0.000][0.000]
OIOE8.367 ***6.399 ***
[0.003][0.002]
TLoanAT18.810 **22.597 ***
[0.012][0.000]
CashAT10.09414.078 *
[0.288][0.075]
DepositAT−3.439
[0.386]
Size0.480
[0.160]
Age0.024 **0.024 ***
[0.012][0.003]
Wald Chi-Square 55.76 ***62.04 ***
[0.000][0.000]
Pseudo R-square 0.8000.735
Max-rescaled R-square 0.8530.843
Hosmer and Lemeshow Goodness-of-Fit 5.2502.930
[0.731][0.403]
Classification (% correct)93.494.88
ROC Curve (% estimated area)98.398.05
Observations377377
Columns (1) and (2) of Table 4 report the results of the cross-sectional logistic regressions of the indicator variable Bfailure on the full set of investigated variables (Column (1)) and on the explanatory variables selected by the stepwise procedure (Column (2)). All the regressions include an intercept. The table reports the logistic regression coefficient estimates and, in [ ], p-values based on standard errors. The p-values, in [ ], below the Wald Chi-Square and Hosmer and Lemeshow goodness-of-fit statistics are based on the Chi-Square test statistics. *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels (two-tailed), respectively. See Appendix B for variable definitions.
Table 5. Results of the logistic regression model selected by stepwise using the log-transformed values of the tested predictors.
Bfailure
Constant−35.699 ***
[0.000]
LogT1CRAT−9.725 ***
[0.000]
LogLQ 45.621
[0.299]
LogLoanRet135.379 **
[0.046]
LogIntMag−129.491 ***
[0.007]
LogROA−211.697 ***
[0.000]
LogOIOE 14.090 ***
[0.000]
Wald Chi-Square 52.970 ***
[0.000]
Pseudo R-square 0.764
Max-rescaled R-square 0.857
Hosmer and Lemeshow Goodness-of-Fit 10.560
[0.228]
Classification (% correct)95.83
ROC Curve (% estimated area)98.30
Observations336
Table 5 reports the results of the cross-sectional logistic regression of the indicator variable Bfailure on the explanatory variables selected by the stepwise procedure. The regression includes an intercept. The table reports logistic regression coefficient estimates and, in [ ], p-values based on standard errors. The p-values, in [ ], below the Wald Chi-Square and Hosmer and Lemeshow goodness-of-fit statistics are based on the Chi-Square test statistics. ** and *** indicate statistical significance at the 5% and 1% levels (two-tailed), respectively. See Appendix B for variable definitions.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
