1. Introduction
Investors looking to diversify their portfolios may find utilities to be an appealing industry. Utility exchange-traded funds (ETFs) largely maintain investment portfolios of regulated utilities with strong and stable profit margins (Tsolas 2019). Instead of taking on the risk of buying the stock of a single firm, investors can use utility sector ETFs to participate in a number of firms in this sector. These ETFs offer built-in diversification and, moreover, are attractive because of the tax rules on the income generated by dividends paid (Lydon 2010).
ETFs are open-ended investment vehicles that try to track the return and risk of their underlying indices. These funds combine the flexibility of stocks with mutual fund diversification. ETFs have grown in popularity since their inception because they offer investors better diversification, tax efficiency, and lower expenses (Henriques et al. 2022; Bowes and Ausloos 2021). The Sharpe ratio (Sharpe 1966) and Jensen’s alpha (Jensen 1968) are two of the risk-adjusted measures most frequently used to evaluate the performance of these funds. They do, however, have significant disadvantages, such as the need to find benchmarks and the influence of market timing (Roll 1977; Henriques et al. 2022).
The mean-variance analysis, which may be associated with the above measures, has several limitations because mean-variance optimal portfolios are extremely sensitive to even small changes in input parameters and the existence of outliers (
Adhikari et al. 2020). Because of this, portfolio construction is a current concern, and robust optimization has been employed to address mean-variance analysis’s shortcomings (
Ceria and Stubbs 2006;
Adhikari et al. 2020). Moreover, some criteria must be met in the case of socially responsible investing (
Arslan-Ayaydin et al. 2018).
Existing techniques for evaluating ETF performance have the previously mentioned drawbacks. In order to give effective support to investors, one of the challenges inherent in analyzing the performance of an ETF is to incorporate various variables in the analysis and produce a single index (
Henriques et al. 2022).
Data envelopment analysis (DEA) is deemed to be a suitable method for identifying the best-in-class ETFs. Recent DEA studies deal with utility ETFs’ performance evaluation (Tsolas 2019; Henriques et al. 2022). DEA, proposed by Charnes et al. (1978), evaluates a set of homogeneous entities (e.g., ETFs), called decision-making units (DMUs), whose performance depends on the selected inputs and outputs. DEA calculates a comparative ratio of weighted outputs to weighted inputs for each DMU, which is reported as an efficiency rating with values ranging from zero to unity (best performance). DMUs are classified as efficient or inefficient depending on their efficiency rating. In addition, the DEA method allows evaluators to obtain benchmarks for underperforming DMUs in order to design performance improvement interventions. DEA produces an efficient frontier on which the efficient DMUs are located, and these DMUs can be used to determine the possible improvements for the inefficient DMUs.
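The ratio idea can be illustrated in the simplest possible setting. The sketch below assumes hypothetical funds with a single input and a single output, in which case the rating reduces to each fund's output/input ratio divided by the best ratio in the set. The fund names and numbers are invented for illustration; with several inputs and outputs the weights would come from solving a linear program for each DMU, which this sketch does not attempt.

```python
# Illustrative only: hypothetical funds with one input and one output each.
funds = {
    "A": {"input": 2.0, "output": 4.0},
    "B": {"input": 4.0, "output": 6.0},
    "C": {"input": 5.0, "output": 5.0},
}


def ratio_efficiency(units):
    """Output/input ratio of each unit, scaled by the best ratio in the set."""
    ratios = {name: u["output"] / u["input"] for name, u in units.items()}
    best = max(ratios.values())
    # The best unit receives a rating of one; all others fall between zero and one.
    return {name: r / best for name, r in ratios.items()}


scores = ratio_efficiency(funds)
efficient = [name for name, s in scores.items() if s == 1.0]
```

Here fund A defines the frontier, and it serves as the benchmark against which B and C are measured.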
The radial efficiency measure (
Charnes et al. 1978) cannot account for all inefficiency components of the DMU under evaluation. To tackle this problem,
Charnes et al. (
1985) developed the additive model as an input and output slack-based DEA model. This slack-based model classifies DMUs as efficient if they have zero slacks. Later on,
Tone (
2001) developed the slack-based measure (SBM) of efficiency. Two ways to evaluate the performance of DMUs have been developed in the literature, namely, the efficiency and super-efficiency approaches. The former approach only estimates inefficiency by providing ratings ranging from zero to unity, whereas the latter approach only estimates extreme efficiency by producing ratings larger than unity (
Guo et al. 2017;
Tran et al. 2019).
Du et al. (
2010) developed a super-efficiency additive DEA model based on the super-efficiency SBM model (
Tone 2002). First, efficient DMUs must be identified, and then these efficient DMUs must be subjected to an additional super-efficiency model.
Jahanshahloo et al. (
2013) used the RAM super-efficiency model, which is based on the additive structure, to rank efficient DMUs. The additive structure allows inefficiency to be measured by taking into account each DMU’s input and output variables and, thus, it removes the separation of the input and output orientations in model building.
The aim of this article is to use a super-efficiency approach to analyze and rank a set of utility exchange-traded funds (ETFs) in order to find the best ETFs. The RAM model is used in this work to evaluate the sample ETFs and rank the inefficient funds, while the super-efficiency RAM model is used to fully rank the RAM-based efficient funds. Other slack-based selected DEA models are also used to analyze the ETFs. The results show that the proposed approach delivers the same efficient funds as other slack-based selected DEA models; hence, it appears to be useful as a fund selection tool.
The current study adds to the existing body of knowledge in two ways. To begin, the performance of a sample of utility ETFs is assessed using both the RAM and the super-efficiency RAM to not only provide ratings but also a thorough ranking of the funds. The use of super-efficiency RAM allows for even more discrimination in ETF performance models. Although the concept of super-efficiency is not new, the author believes this is the first time super-efficiency RAM has been mentioned in the relevant ETF literature. Second, multiple additive-structured models are used to confirm the RAM’s ability to distinguish between efficient and inefficient funds.
Two research questions are addressed in the current study: (1) What are the input and output variables’ contributions to the inefficiency of ETFs? (2) What ETFs are the best in their class? The RAM and the super-efficiency RAM are applied to a set of utility ETFs traded in the United States as part of the empirical phase of the study. This sector was chosen given its size and significance to the global economy and environment (see also
Henriques et al. 2022).
The article is structured as follows:
Section 2 provides a synopsis of DEA studies on ETF performance evaluation. The RAM and super-efficiency RAM models utilized in the analysis are presented in
Section 3.
Section 4 introduces the data set and explains why the input and output variables were chosen. The presentation and discussion of the results is covered in
Section 5. The final section concludes.
2. Literature Review
A growing number of DEA studies have been conducted on rating the performance of ETFs. Three research threads are formed by the pertinent works. The first strand focuses on single DEA studies, which see fund management as a multi-input, multi-output process.
Chu et al. (
2010) was followed by a number of works that fall under this area of study (
Prasanna 2012;
Acharya et al. 2015;
Choi and Min 2017). By placing risk metrics and transaction costs on the input side of the DEA and return measures and other performance indices on the output side, these research works apply various DEA models to evaluate fund performance.
Chu et al. (
2010) first used a range directional measure (
Portela et al. 2004) to evaluate ETF performance.
Prasanna (
2012) employed DEA to assess the performance of Indian ETFs.
Acharya et al. (
2015) employed DEA to evaluate a set of Indian gold ETFs.
Choi and Min (
2017) used DEA to assess the performance of a set of ETFs and their corresponding index.
The second area of research focuses on two-stage DEA studies where the goal is to pinpoint performance-related factors. The ratings are measured in the first stage; in a follow-up step, regression models are used to relate the DEA ratings to explanatory variables. The works of Tsolas (2011) and Tsolas and Charles (2015) are located in this area of study.
Tsolas (
2011) employed a two-step technique to evaluate natural resources ETFs, integrating the generalized proportional distance function (
Kerstens and Van de Woestyne 2011) with a censored Tobit model.
Tsolas and Charles (
2015) used a two-step process for the performance appraisal of green ETFs: slacks-based DEA models were utilized to generate DEA-based ETF ratings, and then regression analysis was employed to predict the ETF ratings.
The third strand focuses on the combination of DEA and other techniques such as grey relational analysis (GRA) (Tsolas 2019) and multi-objective linear programming (Henriques et al. 2022).
The performance evaluation of ETFs by means of DEA should take into account the possible existence of negative values in the data set, e.g., returns (Henriques et al. 2022). The RAM of inefficiency has several advantages over DEA models that do not have the additive structure. In those models, performance is overestimated, which lowers the discriminating power of the modeling approach (Chen et al. 2019). Using the super-efficiency RAM model and other slack-based DEA models based on the additive structure, the current work improves on Tsolas and Charles (2015). The combined use of RAM and super-efficiency RAM is preferable to the other DEA models referred to above because it provides a full ranking of DMUs.
In addition to the basic evaluation of fund performance, the literature has mainly tried to understand how transaction costs and fund size affect the efficiency of portfolios. For this purpose, the predicted return is treated as the output, while investment costs and risk indices are taken as inputs (Choi and Min 2017).
The variables for evaluating ETF performance are chosen in accordance with the pertinent literature. Academics and investors are divided on which inputs and outputs should be included in a DEA model. Expense ratio (ER) (Chu et al. 2010; Tsolas 2011, 2019), tracking error (TE), beta (BETA) coefficient (Choi and Min 2017), total assets (TA), and (1-year) return (Chu et al. 2010; Choi and Min 2017) are the variables used for the current study.
Tsolas and Charles (
2015) suggested using the TE for DEA-based ETF performance measurement, while the TA was used by
Tsolas (
2020) for reflecting fund size. The interested reader is directed to
Henriques et al. (
2022) for a recent review on DEA research on the performance evaluation of ETFs.
3. Methods
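Given a set of $n$ ETFs to evaluate, where ETF $j$ ($j = 1, \dots, n$) uses inputs $x_{ij}$ ($i = 1, \dots, m$) to produce outputs $y_{rj}$ ($r = 1, \dots, s$), the inefficiency of ETF$_0$ with data ($X_0$, $Y_0$) is estimated by employing the following additive weighted variable returns to scale (VRS) model (Lovell and Pastor 1995):

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{m} w_i^- s_i^- + \sum_{r=1}^{s} w_r^+ s_r^+ \\
\text{s.t.} \quad & \sum_{j=1}^{n} \lambda_j x_{ij} + s_i^- = x_{i0}, \quad i = 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} - s_r^+ = y_{r0}, \quad r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j, \; s_i^-, \; s_r^+ \geq 0,
\end{aligned}
\tag{1}
$$

where $\lambda_j$ is the intensity variable for DMU $j$, $s_i^-$ and $s_r^+$ are the input and output slacks, respectively, and $w_i^-$ and $w_r^+$ are weights that reflect the importance of input and output slacks, respectively.

The inefficiency of ETF$_0$ is reflected by the objective function value $\sum_{i=1}^{m} w_i^- s_i^{-*} + \sum_{r=1}^{s} w_r^+ s_r^{+*}$, where * indicates optimality. Since $s_i^{-*} \geq 0$ and $s_r^{+*} \geq 0$, this value is greater than or equal to zero. Model (1) maximizes the weighted input and output slacks, which are used to measure the distance between ETF$_0$ and the efficiency frontier. If there are slacks (i.e., the value of the objective function is greater than zero), the fund is inefficient and needs to boost outputs while lowering inputs simultaneously to become efficient.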
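To make the constraints of model (1) concrete, the sketch below computes the slacks implied by one candidate reference point under VRS. It does not solve the maximization: the intensity weights `lam` are supplied by hand, so the result is only a feasible value of the objective, not the optimum. The data, the unit slack weights, and the function name are assumptions made for illustration.

```python
# Hypothetical data: three funds, two inputs (e.g., ER and BETA) and one output (return).
inputs = [[0.10, 1.0], [0.30, 1.2], [0.20, 0.9]]
outputs = [[8.0], [6.0], [9.0]]


def slacks_for_reference(inputs, outputs, lam, target):
    """Slacks of fund `target` against the convex combination defined by `lam`."""
    if min(lam) < 0 or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("VRS requires non-negative weights that sum to one")
    n = len(lam)
    ref_in = [sum(lam[j] * inputs[j][i] for j in range(n)) for i in range(len(inputs[0]))]
    ref_out = [sum(lam[j] * outputs[j][r] for j in range(n)) for r in range(len(outputs[0]))]
    # Input slack: how much less input the reference point uses than the target.
    s_minus = [inputs[target][i] - ref_in[i] for i in range(len(ref_in))]
    # Output slack: how much more output the reference point produces than the target.
    s_plus = [ref_out[r] - outputs[target][r] for r in range(len(ref_out))]
    if min(s_minus + s_plus) < -1e-9:
        raise ValueError("reference point does not dominate the target fund")
    return s_minus, s_plus


# Compare the second fund with an equal mix of the first and third funds.
s_minus, s_plus = slacks_for_reference(inputs, outputs, [0.5, 0.0, 0.5], target=1)
objective = sum(s_minus) + sum(s_plus)  # unit weights, chosen only for illustration
```

Because the objective is positive for this feasible point, the optimal value of model (1) for the second fund is also positive, so that fund would be rated inefficient on these made-up numbers.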
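Depending on the weights chosen, model (1) is related to various measures (Tsolas 2020). The best-known additive weighted model under VRS is the RAM of inefficiency (Cooper et al. 1999), which stems from model (1) using the weights

$$
w_i^- = \frac{1}{(m+s)\,R_i^-}, \qquad w_r^+ = \frac{1}{(m+s)\,R_r^+},
\tag{2}
$$

where $m$ and $s$ are the numbers of inputs and outputs, and $R_i^- = \max_j x_{ij} - \min_j x_{ij}$ and $R_r^+ = \max_j y_{rj} - \min_j y_{rj}$ represent the ranges of input $i$ and output $r$ across the funds, respectively.

The objective function value $\Gamma^* = \frac{1}{m+s}\left(\sum_{i=1}^{m} \frac{s_i^{-*}}{R_i^-} + \sum_{r=1}^{s} \frac{s_r^{+*}}{R_r^+}\right)$, where * indicates optimality, is a metric of the inefficiency of ETF$_0$. Since $0 \leq s_i^{-*} \leq R_i^-$ and $0 \leq s_r^{+*} \leq R_r^+$, then $0 \leq \Gamma^* \leq 1$. In the case that $\Gamma^* > 0$, the fund under evaluation is inefficient.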
The current study proposes the use of the RAM of inefficiency, a non-oriented slacks-based model that accounts for the weighted slacks of both inputs and outputs in order to generate the performance metric. Because the slacks of inputs and outputs are divided by the ranges of their observed values, the RAM of inefficiency is units-invariant, meaning that the objective function of the model is dimensionless (Tsolas 2020).
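The units-invariance property can be checked numerically. The sketch below assumes hypothetical data and hand-picked slacks (they are feasible but not necessarily optimal), computes the range-adjusted score, and then repeats the calculation after rescaling one input by a factor of 100. It also assumes every variable takes at least two distinct values, since a zero range would make the ratio undefined.

```python
def ranges(rows):
    """Range (maximum minus minimum) of each variable across all funds."""
    return [max(col) - min(col) for col in zip(*rows)]


def ram_inefficiency(s_minus, s_plus, r_minus, r_plus):
    """Average of the input and output slacks, each divided by its range."""
    terms = [s / r for s, r in zip(s_minus, r_minus)]
    terms += [s / r for s, r in zip(s_plus, r_plus)]
    return sum(terms) / len(terms)


# Hypothetical data: two inputs and one output for three funds.
inputs = [[0.10, 1.0], [0.30, 1.2], [0.20, 0.9]]
outputs = [[8.0], [6.0], [9.0]]
s_minus, s_plus = [0.15, 0.25], [2.5]  # illustrative slacks for the second fund

score = ram_inefficiency(s_minus, s_plus, ranges(inputs), ranges(outputs))

# Express the first input in different units (multiply by 100) and recompute.
inputs_scaled = [[100 * a, b] for a, b in inputs]
s_minus_scaled = [100 * s_minus[0], s_minus[1]]
score_scaled = ram_inefficiency(s_minus_scaled, s_plus, ranges(inputs_scaled), ranges(outputs))
```

The two scores agree up to floating-point rounding, because the change of units multiplies both the slack and its range by the same factor.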
The aforementioned metric yields ties at zero for the efficient DMUs, making it impossible to discriminate among them. By omitting the DMU under evaluation (i.e., DMU0) from the reference set, the super-efficiency model can be used to rank the efficient DMUs (i.e., DMUs with zero inefficiency).
DEA ratings can be used to rank inefficient DMUs. Ranking efficient DMUs is harder, because their efficiency ratings are all the same and equal to unity. To rank the efficient units, the super-efficiency approach can be used: when DMU0 is omitted from the reference set, an efficient unit can achieve a rating greater than one. The RAM-efficient DMUs are ranked using this feature.
If we assume that ETF0 is efficient, we cannot simply change model (1) and (2) by excluding ETF0 from the reference set in order to obtain the super-efficiency of ETF0, because the resulting model might not have a feasible solution. The constraints and the objective function of model (1) and (2) should be modified (Du et al. 2010). The RAM super-efficiency model (3) that provides the rating of DMU0 follows Du et al. (2010), and the super-efficiency rating of DMU0 is obtained from the optimal solution of Model (3).
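For orientation, one form that a range-adjusted super-efficiency model can take, in the spirit of the additive super-efficiency model of Du et al. (2010), is sketched below. This is an assumed formulation rather than a reproduction of Model (3): the symbols $t_i^-$ and $t_r^+$ (the input increase and output decrease that DMU0 can absorb before being dominated by the other funds) are introduced here for illustration, $x_{ij}$ and $y_{rj}$ denote input $i$ and output $r$ of fund $j$, $\lambda_j$ are intensity variables, and $R_i^-$, $R_r^+$ are the ranges of the $m$ inputs and $s$ outputs.

```latex
\begin{aligned}
\min \quad & \sum_{i=1}^{m} \frac{t_i^-}{(m+s)\,R_i^-}
           + \sum_{r=1}^{s} \frac{t_r^+}{(m+s)\,R_r^+} \\
\text{s.t.} \quad
  & \sum_{j \neq 0} \lambda_j x_{ij} \le x_{i0} + t_i^-, \quad i = 1, \dots, m, \\
  & \sum_{j \neq 0} \lambda_j y_{rj} \ge y_{r0} - t_r^+, \quad r = 1, \dots, s, \\
  & \sum_{j \neq 0} \lambda_j = 1, \\
  & \lambda_j,\ t_i^-,\ t_r^+ \ge 0 .
\end{aligned}
```

Under this sketch the optimal value is zero when DMU0 can be matched by a combination of the other funds and positive otherwise. One reading consistent with ratings of at least unity, which is an assumption and not taken from the text, would report one plus the optimal value as the super-efficiency rating.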
Model (3) classifies the DMUs identified as efficient by Model (1) and (2) into extremely efficient and efficient units, with super-efficiency scores greater than and equal to unity, respectively. Thus, extremely efficient DMUs can be ranked by using their super-efficiency scores. To rank the non-extremely efficient DMUs, the extremely efficient units must be removed from the analysis, and Model (3) is employed again to obtain the new super-efficiency scores of the remaining efficient DMUs. By repeating this process, all efficient DMUs can be fully ranked (Jahanshahloo et al. 2013).
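The repeated ranking procedure can be sketched as a loop. In the sketch below, `super_score` is a hypothetical caller-supplied function standing in for a solver of Model (3); the toy scorer used to exercise the loop compares a single made-up return figure and is not the RAM super-efficiency model. The guards for a single remaining unit and for a round with no extremely efficient unit are design choices of this sketch.

```python
def full_ranking(units, super_score):
    """Rank efficient units by repeatedly removing the extremely efficient ones.

    `super_score(unit, others)` must return a rating greater than one for an
    extremely efficient unit and one otherwise.
    """
    remaining = list(units)
    ranking = []
    while remaining:
        if len(remaining) == 1:
            # A lone unit has no reference set left; it takes the last rank.
            ranking.append(remaining[0])
            break
        scores = {u: super_score(u, [v for v in remaining if v != u]) for u in remaining}
        extreme = [u for u in remaining if scores[u] > 1.0]
        if not extreme:
            # No unit stands out in this round, so the rest remain tied.
            ranking.extend(sorted(remaining))
            break
        extreme.sort(key=lambda u: scores[u], reverse=True)
        ranking.extend(extreme)
        remaining = [u for u in remaining if u not in extreme]
    return ranking


# Toy stand-in scorer on hypothetical returns (NOT Model (3)).
returns = {"A": 5.0, "B": 3.0, "C": 4.0}


def toy_score(unit, others):
    gap = returns[unit] - max(returns[o] for o in others)
    return 1.0 + max(gap, 0.0)


order = full_ranking(returns, toy_score)
```

With the toy data, A is removed in the first round, C in the second, and B takes the last rank.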
4. Data Set
All funds identified by etfdb.com as Utilities Equities ETFs made up the sample for the current study. All utility ETFs that were available as of 18 July 2017 and that were traded in the United States were included in the sample. Etfdb.com is the source of the information on ETF variables. The data covered a period of at least five years, although not all funds’ statistics were available because some of the funds had been established only recently (after 2015). Because of the missing data, only the 1-year return was used for the fund returns.
The selection of inputs and outputs is based on (i) the scientific literature and (ii) an isotonicity test performed on the variables available in the ETF database. The chosen variables are ER, TE, BETA, and TA as inputs and average 1-year return as output.
The ER represents the portion of an investment in an ETF that goes towards the fund’s management fees each year. ETFs with lower expense ratios are regarded as advantageous for this reason (Bourgi 2019; Tsolas 2019). TE is the difference between the performance of the ETF and the performance of the relevant index for the same investing period (Cummans 2015). The beta coefficient measures a fund’s sensitivity to market fluctuations. A beta of one suggests that the ETF’s price moves in lockstep with the market. A beta of less than one implies that the price of the fund is less volatile than the market, while a beta of more than one suggests that the ETF’s price is more volatile than the market (Killa 2021; Henriques et al. 2022). Low beta funds exhibit greater levels of stability than their market-sensitive counterparts and will usually lose less when the market falls. Given their lower risks and lower returns, these funds are considered safe and resilient amid uncertainty (Killa 2021). TA describe the total amount of assets or investments managed by a particular fund. TA include securities that receive income from security lending (Elton et al. 2019).
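As a rough illustration of two of these variables, the sketch below computes a beta with the standard covariance-over-variance estimator and a tracking difference following the definition given above (fund performance minus index performance over the same period). The return series are invented, and the estimator is an assumption: the study takes its figures from etfdb.com, whose exact calculation is not described here.

```python
def mean(values):
    return sum(values) / len(values)


def beta(fund_returns, market_returns):
    """Covariance of fund and market returns divided by the market variance."""
    mf, mm = mean(fund_returns), mean(market_returns)
    cov = sum((f - mf) * (m - mm) for f, m in zip(fund_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


def tracking_difference(fund_return, index_return):
    """Fund performance minus index performance over the same period."""
    return fund_return - index_return


# Hypothetical periodic returns (percent): the fund moves half as much as the market.
market = [1.0, -2.0, 3.0, 0.0]
fund = [0.5 * m for m in market]

fund_beta = beta(fund, market)          # below one: less volatile than the market
gap = tracking_difference(7.6, 8.0)     # negative: the fund lagged its index
```

A fund that simply scales market moves by one half obtains a beta of one half, which matches the interpretation of a low beta fund in the text.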
The mean annual return reports a fund’s return over a 1-year period. It is calculated net of the fund’s expense ratio and other costs, e.g., sales charges and other commissions (Henriques et al. 2022).
Due to their non-availability from etfdb.com, some potential input and output variables are excluded from this study. Portfolio price/cash flow (P/CF), portfolio price/book (P/B), standard deviation (
Tsolas 2011;
Tsolas and Charles 2015), downside risk (
Chu et al. 2010), maximum drawdown (i.e., a metric of a fund’s capability to rebound from a trough), and monthly downside deviation (
Prasanna 2012) are potential input variables that could reflect user cost. Potential output variables (i.e., conventional performance metrics) are the Sharpe ratio and Jensen’s alpha (
Tsolas 2011,
2019;
Tsolas and Charles 2015) and upper deviation (
Chu et al. 2010).
An isotonicity test was performed between the input (ER, TE, BETA, and TA) and output (1-year return) candidate variables using Pearson’s correlation coefficient. This test is intended to confirm the isotonicity of the DEA approach (i.e., efficiency decreases as inputs increase and efficiency increases as outputs increase, according to Cooper et al. (2011)). The input and output variables passed the isotonicity test, as the inter-correlations among them were positive (and substantial).
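A minimal sketch of such a check is given below, assuming hypothetical data and a simple pass rule (every input-output correlation is positive). The variable values, the function names, and the omission of any threshold for "substantial" correlation are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def isotonicity_table(inputs, outputs):
    """Correlation of every candidate input with every candidate output."""
    return {(i, o): pearson(xi, yo) for i, xi in inputs.items() for o, yo in outputs.items()}


def passes_isotonicity(table):
    """Pass rule assumed here: all input-output correlations are positive."""
    return all(r > 0 for r in table.values())


# Hypothetical values for four funds.
candidate_inputs = {"ER": [0.1, 0.2, 0.3, 0.4], "TA": [1.0, 2.0, 2.0, 5.0]}
candidate_outputs = {"RET": [5.0, 6.0, 8.0, 9.0]}

table = isotonicity_table(candidate_inputs, candidate_outputs)
ok = passes_isotonicity(table)
```

A candidate input that correlates negatively with the output would fail this rule and would be a candidate for removal or transformation before running DEA.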
Table 1 shows descriptive statistics for the fund data used in the analysis.
5. Results
Table 2 displays descriptive statistics for RAM-based ETF efficiency ratings derived from the use of model (1) and (2). The sampled funds’ mean efficiency is at 94 percent, while the median efficiency is around 100 percent. Twelve (60 percent) of the twenty funds in the study are relatively efficient. Inefficient funds have a mean efficiency of around 86 percent, while the median efficiency is at 87 percent.
The best-in-class funds, according to RAM-based ratings, are: Reaves Utilities ETF (UTES), Reaves Utilities ETF (UTLF), FIDELITY MSCI UTILITIES INDEX ETF (FUTY), First Trust NASDAQ Clean Edge Smart Grid Infrastructure Index Fund (GRID), PowerShares S&P SmallCap Utilities Portfolio ETF (PSCU), PowerShares DWA Utilities Momentum Portfolio ETF (PUI), Guggenheim S&P High Income Infrastructure ETF (GHII), Vanguard Utilities ETF (VPU), Columbia India Infrastructure ETF (INXX), Utilities Select Sector SPDR Fund (XLU), SPDR S&P International Utilities Sector ETF (IPU), and iShares Global Infrastructure ETF (IGF).
The weighted additive VRS model (1) with an alternative set of weights and a version of it, the measure of inefficiency proportions (MIP, Cooper et al. 1999), were used as robustness checks. These models also identify efficient funds that are comparable to the best-in-class funds. Details of the results are available from the author.
The sources of inefficiency of inefficient funds can be determined using input and output slacks.
Table 3 reports the optimal input and output slacks of the RAM model (1) and (2) as ratios of their respective ranges. ER, TA, BETA, and TE are the factors that have the biggest impact on ETF inefficiency. The average input inefficiencies (about 5 percent) are higher than the average output inefficiencies (3.1 percent). A notable result is that systematic risk (BETA) has slack for only one inefficient fund.
In line with a recent study by Neves et al. (2021), efficient ETFs with a slightly higher beta outperformed inefficient low beta funds on average.
For the twelve best-in-class funds described above, the RAM of inefficiency yields ties at zero, making comparisons between them impossible. The super-efficiency RAM model (3) is used to rank the top funds.
Table 4 depicts the complete ranking of RAM-efficient ETFs after using Model (3) to derive RAM super-efficiency ratings.
It is noteworthy that two of the efficient ETFs, XLU and VPU, were shown to be efficient in a prior DEA-based study on utility ETFs (
Tsolas 2019).
6. Conclusions
The current paper employs both the RAM and super-efficiency RAM models to assess the performance of a sample of utility ETFs. To identify the efficient and inefficient funds, the RAM of inefficiency is used as a particular additive weighted VRS model. The goal of the current study is to respond to the research questions posed. Answers to questions (1) and (2) are included in the findings, which point to the following: (1) Reducing inputs, mainly ER and TA and, to a lesser extent, BETA and TE, while concurrently boosting output (i.e., return) has the potential to improve performance. (2) Fund performance ratings can be distinguished using the derived RAM and super-efficiency RAM ratings. The fund discrimination results are validated using a variety of alternative slack-based DEA models from the additive model family. In order to completely rank the sample ETFs, the RAM-efficient funds are further analyzed using the RAM super-efficiency model.
Some insights on fund performance are offered by the findings. The sample funds have no TE issues. BETA is not a critical aspect of fund performance, although, in line with the conclusions of other studies, efficient ETFs with a slightly higher beta outperformed inefficient low beta funds on average. ER appears to have the most impact on fund performance, while fund size should also be taken into consideration.
The following two components of the current study can be expanded upon. First, selecting multiple time periods and examining the sensitivity of results would be interesting. Second, the proposed approach could be combined with existing DEA slack-based models to further study the approach’s discriminating potential.
Professionals and investors can benefit from the current study’s findings. Financial analysts could use the proposed metrics to track the success of the ETF industry. The RAM and RAM super-efficiency models could provide information about financial investors’ portfolios, which they could utilize to make investment decisions. Fund managers may be interested in tracking the performance of their funds as well as the efficiency with which they are managed. The current analysis aids professionals and investors in constructing an ETF performance benchmark that takes into account not only risk but also fund expenses, total assets, and tracking error.