4.1. Data Collection
This study considers a sample of European listed companies operating in sectors particularly exposed to environmental and social risks. Based on sector classification, Basic Materials, Industrial, Oil & Gas, and Utility companies were included in the analysis. The focus on these sectors reflects the consideration that sustainability is likely to be the most valuable non-financial information disclosed through an integrated report, as it represents a considerable tail-risk factor.
For each company included in the sample, prices and financial data were obtained quarterly from FactSet over the period 2013-12-31 to 2018-06-29. This study’s dataset therefore consists of a panel of cross-sectional observations, each corresponding to a distinct company at a given point in time. For each observation, the consensus estimate of currently unreported earnings per share was defined as the median of all available estimates, while the price refers to the closing price as of the observation date. Observations with either a negative last-reported book value per share or a negative consensus earnings estimate were excluded from the analysis, as they could be considered outliers in the present framework.
Companies adopting the IIRC integrated reporting framework were identified from the list published on the IIRC’s website (http://examples.integratedreporting.org/all_reporters, [121]). The list is dated December 2013, which was taken as the starting date of this study’s empirical analysis. For each observation, the closing date of the last fiscal year was obtained from FactSet’s Fundamentals database. As a general rule, annual reports were assumed to become available 6 months after the end of the related fiscal year. Following this rule, for each observation associated with an integrated reporter, the latest available company report was retrieved in order to estimate the natural language processing proxy described in Section 4.2. To avoid double counting (see Section 4.2 below), the following “priority” logic was applied to select the most relevant document for sustainability disclosure. Whenever a document labeled as an “integrated report” was available on the company’s website, it was the only one considered in the analysis. Otherwise, the sustainability report, if available, was used. If neither of the two previous conditions was met, the annual report was used. All the documents considered in the analysis were written in English.
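The availability rule and the document priority described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the tuple layout, the document-type labels and the file names are hypothetical, and the six-month lag is applied by calendar month, clamping the day to the month’s length (so a 31 December year-end maps to 30 June).

```python
import calendar
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical record layout: (fiscal_year_end, document_type, file_name).
Report = Tuple[date, str, str]

# Priority described in the text: integrated report, then sustainability
# report, then annual report.
PRIORITY = ("integrated report", "sustainability report", "annual report")


def available_from(fiscal_year_end: date, lag_months: int = 6) -> date:
    """Date from which a report is assumed public (fiscal year-end + lag)."""
    month_index = fiscal_year_end.month - 1 + lag_months
    year = fiscal_year_end.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day to the length of the target month (31 Dec -> 30 Jun).
    day = min(fiscal_year_end.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def select_report(obs_date: date, reports: List[Report]) -> Optional[Report]:
    """Latest report available at obs_date, chosen by document-type priority."""
    available = [r for r in reports if available_from(r[0]) <= obs_date]
    if not available:
        return None
    latest_fye = max(r[0] for r in available)
    candidates = [r for r in available if r[0] == latest_fye]
    for doc_type in PRIORITY:
        for report in candidates:
            if report[1] == doc_type:
                return report
    return None
```

For instance, for an observation dated 2014-06-30, a company with both an annual report and an integrated report for the fiscal year ending 2013-12-31 is assigned the integrated report, while for an observation dated 2014-03-31 the previous fiscal year’s document is still the latest one available.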
It is worth noting that the reports of non-integrated reporters were excluded from the textual analysis, consistent with this study’s methodological assumption that sustainability disclosure can be verifiable only through integrated reporting. Put differently, traditional sustainability disclosure was not considered verifiable, consistent with many of the studies presented in Section 2. In this respect, the lack of definite proof of positive value relevance, even in the presence of assurance [58,98], led to treating it as a babbling equilibrium. As D’Aquila stressed [122], the Sustainability Accounting Standards Board (SASB) recently affirmed that “by and large, companies continue to take a minimally compliant approach to sustainability disclosure, providing the market with information that is inadequate for making investment decisions” [123]. Hence, sustainability disclosure from non-integrated reporters was excluded from the scope of the present study. Conversely, integrated reporting, being one of the most recent and comprehensive frameworks in terms of sustainability disclosure requirements, was considered potentially able to provide the means for verifiable sustainability disclosure.
Finally, companies with an average market capitalization below EUR 3.5 billion (approx. USD 4 billion) were excluded from the sample to preserve the focus on large-cap companies. As European small- and mid-cap companies tend to trade at more demanding multiples because of better long-term growth potential, this choice avoids the need to include additional control variables in the regressions presented in this study. Similarly, companies with less than 20% free float were excluded. In the authors’ experience, this is the minimum threshold at which market prices can be considered only marginally influenced by the presence of insiders or other liquidity gaps.
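The screening rules stated so far can be summarized in a short sketch, assuming a flat list of observation records with hypothetical field names. The text does not say whether the free-float threshold applies to the average or to every observation; the sketch assumes the average, mirroring the market-cap rule.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

SECTORS = {"Basic Materials", "Industrial", "Oil & Gas", "Utility"}
MIN_AVG_MARKET_CAP_EUR = 3.5e9  # average market capitalization threshold
MIN_FREE_FLOAT = 0.20           # minimum share of floating shares (assumed average)


@dataclass
class Observation:
    company: str
    sector: str
    market_cap_eur: float
    free_float: float
    book_value_ps: float            # last reported (FY0) book value per share
    consensus_eps: Optional[float]  # median consensus EPS; None if missing


def screen(observations: List[Observation]) -> List[Observation]:
    by_company: Dict[str, List[Observation]] = {}
    for obs in observations:
        by_company.setdefault(obs.company, []).append(obs)

    # Company-level screens: sector, average market cap, average free float.
    eligible = {
        name for name, rows in by_company.items()
        if rows[0].sector in SECTORS
        and mean(r.market_cap_eur for r in rows) >= MIN_AVG_MARKET_CAP_EUR
        and mean(r.free_float for r in rows) >= MIN_FREE_FLOAT
    }
    # Observation-level screens: missing or negative consensus EPS,
    # negative book value per share.
    return [
        obs for obs in observations
        if obs.company in eligible
        and obs.consensus_eps is not None
        and obs.consensus_eps >= 0
        and obs.book_value_ps >= 0
    ]
```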
This study’s sample ultimately comprises 180 companies, 32 of which are integrated reporters according to the list published on the IIRC’s website. After excluding broken data (e.g., missing consensus estimates), the sample consists of 3382 panel observations over 19 quarters (2013-12-31 to 2018-06-29), with an average of 178 companies at each date.
Table 1 below describes the distribution of some variables of interest, while Table 2 outlines the composition of the sample by country and sector (% of companies).
Finally, Table 3 illustrates the distribution of integrated reporters across the sectors considered in this paper. In absolute terms, all sectors except Oil & Gas include approximately the same number of integrated reporters (approx. 10 per sector). However, as the Industrial sector includes many more companies than Basic Materials and Utilities, integrated reporting appears to be a more established trend in the latter two sectors.
4.2. From Theory to Empirics: Natural Language Processing Proxy for
In reality, companies obviously do not disclose their probability of success as they would in a model-like situation. As far as sustainability is concerned, corporations issue qualitative statements backed by selected data to represent environmental and social risks. These statements could, of course, be omitted or intentionally manipulated, depending on whether they are verifiable or not. It is therefore reasonable to posit that the “tone” of these statements is a good proxy for the “message” that would equivalently be sent to shareholders in a model-like situation.
In light of the previous observations, this study assumes a correspondence between the theoretical message of the model and the overall tone of the statements related to sustainability. This paper’s methodology relies on the relative frequency of positive tones as a proxy for the theoretical message that managers would issue to influence the market’s perception of their companies’ risk. This relation can be represented empirically as,
where
is a measurement error component assumed to be distributed independently of
or any other financial variable included in this study’s analysis. Note that, in the absence of any statistical assumption on the distribution of this error term, the previous equation would simply be a tautology and would not help identify the key parameters of the regression models presented in the next Section.
Several studies employed textual analysis to better understand the narrative adopted in company reports [124,125,126,127,128,129,130,131]. In this respect, the analysis of tone (positive/negative/neutral) is particularly widespread, especially in relation to market reactions [132,133,134,135,136,137,138,139,140].
The remainder of this Section describes how this study systematically analyzed integrated reports to obtain the positive-tone proxy. As anticipated, for each company the relevant textual document was obtained directly from its website. Unfortunately, European companies are generally not required to present their annual statements in a standard format such as the 10-K format required by the SEC for US-listed firms. Moreover, integrated reports are generally available only in PDF format; for this reason, the files employed in this study had to be converted into a “machine-readable” format.
After all the relevant PDF files were collected from the companies’ websites, they were converted into .txt format using Python’s pdfpage open-source package. The authors wrote a bespoke routine to maintain an overview of the parsing process. During this process, all text embedded in pure “images” was lost. Nevertheless, a random sample check showed that the concepts presented through images were also mentioned, and sometimes even discussed more extensively, in other textual parts of the documents. As a technical aside, each document was converted with UTF-8 encoding.
The conversion algorithm proceeded sequentially through all the pages of each integrated report, isolating the related textual items. The next step was to define a list of concepts to be searched for in the sentiment analysis. As a starting point, this paper considered the concept vector (Table 4) presented in Wen [141], which was in turn based on a list previously published by E. I. du Pont de Nemours and Company [142].
A concept vector formally consists of a text file containing regular expressions (regex). Each regex represents the “root” of one or more words that, individually or jointly, are strongly associated with the topic of interest (i.e., sustainability). This paper extended the original list of regex included in Wen [141] to account for social as well as environmental sustainability. While the latter generally remains prominent in determining business risks, in some cases social sustainability may also be a matter of investor concern. One example is the reputational risk associated with very low levels of employee welfare (e.g., risk of class actions), but several others could be given. In general, the list of additional words included in this paper’s analysis was calibrated to avoid duplicating the same concept.
The next step of the procedure consisted in extracting from each PDF file all the sentences in which at least one of the regex above occurred, counting each sentence only once to avoid double counting. Following Wen [141], a bespoke Python program was written based on the natural language processing package nltk. This process is commonly referred to as tokenization.
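The sentence-selection step can be illustrated with the sketch below. It is not the authors’ program: the concept vector holds a few hypothetical regex roots rather than those of Table 4, and a naive regular-expression splitter stands in for nltk’s sentence tokenizer (nltk.sent_tokenize) to keep the example dependency-free; unlike a trained tokenizer, it will also split after abbreviations such as “e.g.”. Each sentence is kept at most once, however many regex it matches.

```python
import re
from typing import List

# Hypothetical regex roots for illustration only; the actual concept vector
# (Table 4 plus the authors' extensions) is not shown here.
CONCEPT_VECTOR = [r"\bsustainab\w*", r"\benvironment\w*", r"\bemission\w*", r"\bemployee\w*"]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sustainability_sentences(text: str, patterns: List[str] = CONCEPT_VECTOR) -> List[str]:
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    # any() stops at the first matching regex, so a sentence matching several
    # roots is returned only once (no double counting).
    return [s for s in split_sentences(text) if any(rx.search(s) for rx in compiled)]
```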
The last step was to use Vader to measure the percentage of text with positive, negative and neutral tone in each sentence dealing with sustainability [143]. The synthetic number of positive (negative) words in each sentence was then obtained by multiplying the percentage of text with positive (negative) meaning by the number of words in the sentence. Finally, for each report, the proxy was obtained as the ratio between the sum of all positive words and the sum of all positive and negative words, excluding sentences with more than 150 words (e.g., tables, where sentiment can hardly be measured). Formally, the following equation defines the proxy according to this study’s methodology,
where:
is a regex included in the concept vector;
is any sentence included in a given pdf;
is the number of words included in sentence ;
is any sentence with fewer than 150 words including regex ;
is the percentage of words in sentence with positive (negative) tone, according to Vader.
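The report-level aggregation can be sketched as follows, assuming each selected sentence has already been summarized by its length and by Vader’s positive and negative shares (the input layout is hypothetical). Following the definition above, only sentences with fewer than 150 words are kept, and the result is the share of synthetic positive words among all synthetic positive and negative words.

```python
from typing import Iterable, Tuple

MAX_WORDS = 150  # sentences with 150 words or more (e.g. tables) are dropped


def positive_tone_proxy(sentences: Iterable[Tuple[int, float, float]]) -> float:
    """Report-level share of synthetic positive words.

    Each item is (n_words, pos_share, neg_share) for one sustainability
    sentence, where the shares are Vader's positive/negative fractions in [0, 1].
    """
    positive_words = negative_words = 0.0
    for n_words, pos_share, neg_share in sentences:
        if n_words >= MAX_WORDS:
            continue
        positive_words += pos_share * n_words  # synthetic positive words
        negative_words += neg_share * n_words  # synthetic negative words
    total = positive_words + negative_words
    if total == 0.0:
        raise ValueError("no sentence with positive or negative tone")
    return positive_words / total
```

With the sentences (10, 0.3, 0.1), (20, 0.0, 0.2) and (200, 0.9, 0.0), the last one is ignored and the proxy equals 3 / (3 + 5) = 0.375.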
Vader is the acronym of Valence Aware Dictionary and sEntiment Reasoner, a valence-based sentiment analysis algorithm for textual data developed by Hutto and Gilbert [143]. The use of Vader represents a step forward compared to the approach of Wen [141], which was based solely on counting the frequency of positive and negative words from common dictionaries. Vader, instead, relies more closely on actual human judgment and can recognize several syntactic structures that affect the tone of a statement. Its developers created a gold-standard list of lexical features aimed at enhancing context awareness in textual analysis, drawing on the results of large-scale linguistic experiments conducted through Amazon Mechanical Turk.
One of the main differences from standard positive (negative) word counting is that Vader attributes to each word a score reflecting the intensity of its positivity (negativity). The score ranges from -4 for very negative terms to +4 for very positive ones, while neutral words get a rating of zero. Vader provides a lexicon associating each word it contains with its score, which was obtained by averaging the ratings of several independent human raters. Within a specific sentence, scores may be adjusted to take into account syntactic effects (e.g., negations) as well as the impact of punctuation (e.g., “!”).
The percentage of positive text
in each sentence is obtained according to Equation (13) below
that is, as the ratio between the total positive score and the sum of the absolute values of all (positive and negative) scores plus the number of words with neutral tone
. Similarly, the percentage of negative text
in each sentence is obtained as,
As anticipated, the synthetic number of positive (negative) words is obtained by multiplying the percentage of positive (negative) text by the number of words in the sentence. Since Vader weights each word by the intensity of its score, this synthetic number may exceed the actual number of positive (negative) words in the sentence. For this reason, it was previously defined as “synthetic”: it represents the equivalent number of “basic” positive (negative) words that a sentence would need to contain to convey the same sentiment. Moreover, as Vader may be sensitive to punctuation, this paper included punctuation when measuring the length of each sentence.
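The share computation and the resulting synthetic counts can be reproduced on a list of per-word valence scores, with zero marking a neutral word. This sketch follows the verbal description above and deliberately omits the syntactic and punctuation adjustments applied by the actual Vader implementation, so its numbers will generally differ from nltk’s output.

```python
from typing import List, Tuple


def tone_shares(scores: List[float]) -> Tuple[float, float, float]:
    """Per-word valence scores (0 = neutral) -> (positive, negative, neutral) shares."""
    pos_sum = sum(s for s in scores if s > 0)
    neg_sum = -sum(s for s in scores if s < 0)  # absolute value of negative scores
    n_neutral = sum(1 for s in scores if s == 0)
    total = pos_sum + neg_sum + n_neutral
    if total == 0:
        return 0.0, 0.0, 0.0
    return pos_sum / total, neg_sum / total, n_neutral / total


def synthetic_counts(scores: List[float]) -> Tuple[float, float]:
    """Synthetic positive and negative word counts: share times sentence length."""
    pos, neg, _ = tone_shares(scores)
    n_words = len(scores)
    return pos * n_words, neg * n_words
```

For a three-word sentence scored [3.0, 0.0, 0.0], the positive share is 3 / (3 + 2) = 0.6, so the synthetic count is about 1.8 positive words, although only one word is actually positive.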
A more thorough exposition of Vader would require a much longer digression, which is beyond this paper’s scope. However, to illustrate how Vader works in practice, consider the following sentence from Rio Tinto’s 2015 annual report [
144] (p. 119) which contains the regex \
\:
“Environmental costs result from environmental damage that was not a necessary consequence of operations, and may include remediation, compensation and penalties.”
As is evident, the message conveyed by this statement is rather neutral. Indeed, this sentence is nothing but the definition of environmental costs reported in the annual report’s section “Notes to the 2015 Financial Statement”. The Vader algorithm classifies this statement correctly, attributing the majority of the text (approx. 85%) to the “neutral” bucket and only a minimal part (approx. 15%) to the negative one. In a sense, the “negative” part stems from the mention of damages and of their potential impact on earnings.
As a technical aside, this paper uses the Vader implementation included in the sentiment module of the Python package nltk (nltk.sentiment.vader).
4.3. Regression Analysis
This Section presents the final part of this study, that is, the assessment of whether integrated reporting is cheap talk in relation to sustainability disclosure. The analysis of the distribution of
(
Section 5.1) has shown that, from an empirical perspective, either a babbling or a discretionary equilibrium can fit the data collected. In this respect, the former reflects a situation in which sustainability disclosure is not verifiable and has no effect on market valuations, while the latter reflects the exact opposite.
The following equation can be formulated in order to derive a regression model consistent with the competing theories presented,
where
is an indicator function which is equal to one if company
is an integrated reporter and zero otherwise. Equation (15) can be equivalently written as
where
is equal to zero under the null hypothesis of cheap talk and to one otherwise. As anticipated, an empirical counterpart for
is obtained as a multiple of consensus earnings estimates for the current unreported fiscal year, which may differ depending on the sector in which each company operates (e.g., industrial companies have different long-term growth prospects from utilities). For ease of notation, the relation
, will be written without reference to the specific sector to which company
belongs.
Substituting the expression
in Equation (16), the following equation is obtained,
In order to test the model on actual data, it remains to substitute
with its proxy
. To this end, recall that this study assumes the following relation,
which can be substituted into Equation (16) to obtain
It is useful to recall that is a zero-mean measurement error term, assumed to be distributed independently of or any other financial variable included in the analysis.
The term
is somewhat problematic, as it includes a measurement error component
interacting with other regressors. To deal with this additional complexity, it is convenient to examine
’s expected value conditional on the whole set of regressors involved,
When integrated reporting is not adopted, . Conversely, when integrated reporting is adopted and observed, two different scenarios are possible. If integrated reporting is cheap talk , then its adoption is related neither to nor to . Furthermore, is not related to , as the possibility of a partition equilibrium was ruled out. Hence, in this case it appears that .
If instead a discretionary equilibrium took place
, integrated reporting would be adopted provided
is above a given threshold
. In this circumstance,
would be equal to
and, consequently, the following equation can be formulated,
The distribution of the measurement error
is also problematic as
is necessarily bounded between zero and one. Moreover,
is an unknown parameter. This study addressed this potential endogeneity issue in a simple, approximate, but easily implementable way. First, the unknown threshold
was set to the minimum value of
observed
. Second, the distribution of the error term was approximated as Gaussian with standard deviation
equal to that of
, obtaining
Heuristically, the idea underlying this hypothesis is that whenever is close to one, it is less likely that has been underestimated, as . This approach is consistent with the null hypothesis of “babbling”, as integrated reports generally include a large number of statements related to sustainability disclosure.
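As an illustration of the kind of correction involved, the sketch below computes the mean of a zero-mean Gaussian error truncated from above, E[e | e < c] = -σ·φ(c/σ)/Φ(c/σ), with the threshold set to the minimum observed proxy and σ set to the standard deviation of the proxy, as described above. The mapping c = proxy - threshold (i.e., proxy = message + error, with adoption requiring the message to exceed the threshold) is an assumption made for illustration and not necessarily the paper’s exact specification.

```python
from math import erf, exp, pi, sqrt
from statistics import stdev
from typing import List


def norm_pdf(z: float) -> float:
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def truncated_mean_below(c: float, sigma: float) -> float:
    """E[e | e < c] for a zero-mean Gaussian error e with standard deviation sigma."""
    z = c / sigma
    return -sigma * norm_pdf(z) / norm_cdf(z)


def measurement_error_correction(proxies: List[float]) -> List[float]:
    """Hypothetical illustration of the calibration described in the text.

    threshold: minimum observed proxy; sigma: standard deviation of the proxy.
    ASSUMPTION: proxy = message + error and adoption requires message > threshold,
    so the error is truncated from above at c = proxy - threshold.
    """
    threshold = min(proxies)
    sigma = stdev(proxies)
    return [truncated_mean_below(x - threshold, sigma) for x in proxies]
```

Under these assumptions the correction is most negative for the report with the lowest proxy (c = 0, where it equals about -0.80σ) and shrinks towards zero as the proxy rises above the threshold.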
Hence, the term
could be ultimately represented as
where
satisfies the exogeneity condition
. Note, however, that
is linear in consensus earnings; therefore, heteroscedasticity must be properly addressed in hypothesis testing. Before getting to this point, it is necessary to summarize the results obtained so far and discuss which control variables should be included.
In light of the discussion above, the following empirical model can be outlined,
where
, while
and
are sector-dependent parameters.
Equation (25) is, of course, unlikely to fit the data-generating process underlying this study’s sample perfectly. For this reason, this study includes the book value per share as a control variable and allows for a disturbance term distributed independently of any financial variable included in Equation (25) above. To avoid common estimation problems, all data are expressed in euros, which serve as the common base currency. In addition, for each company, both sides of Equation (25) are multiplied by the number of common shares outstanding . In this way, a constant term is introduced in the regression to allow for .
Without multiplying both sides of Equation (25) by the number of shares outstanding, “scaling” effects would prevent the inclusion of a common intercept term in the regression. Indeed, consider two companies, A and B, that differ only in the number of their outstanding shares. Assume that company A has issued only one share and that its price fits the equation . As a consequence, since B’s per-share price and earnings are those of A divided by B’s number of shares, the price of company B’s shares reads , i.e., the same relation with the intercept scaled down by B’s number of shares, contradicting the possibility of an intercept term common to both companies.
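A small numerical example with hypothetical figures makes the scaling argument concrete: assume a company-level relation with intercept 100 and slope 10, and two companies with identical total earnings of 50 that differ only in their number of shares (one for A, four for B).

```python
# Hypothetical company-level relation: market value = ALPHA + BETA * total earnings.
ALPHA, BETA = 100.0, 10.0
TOTAL_EARNINGS = 50.0  # companies A and B differ only in their number of shares


def market_value(total_earnings: float) -> float:
    return ALPHA + BETA * total_earnings


def per_share_intercept(n_shares: int) -> float:
    """Intercept implied when the same relation is written per share."""
    price = market_value(TOTAL_EARNINGS) / n_shares
    eps = TOTAL_EARNINGS / n_shares
    return price - BETA * eps


def company_level_intercept(n_shares: int) -> float:
    """Intercept after multiplying both sides by the number of shares."""
    price = market_value(TOTAL_EARNINGS) / n_shares
    eps = TOTAL_EARNINGS / n_shares
    return price * n_shares - BETA * eps * n_shares
```

Written per share, company A implies an intercept of 100 and company B one of 25, so no common intercept exists; after multiplying both sides by the number of shares, both companies imply the same intercept of 100.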
Thus, this study’s regression model reads as
where
,
is the last reported (
FY0) book value per share, while
and
are sector-specific parameters.
The last step of this study’s regression analysis is to consider the possibility of self-selection in this sample [145,146]. As stated,
is assumed to be distributed independently of any other financial variable, such as
and
. Nevertheless,
could be correlated with sustainability if the latter were disclosed in a verifiable way
. Thus, if the alternative hypothesis that integrated reporting is value relevant were true, a self-selection bias could indeed be present in the regression [118,145,146].
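The self-selection bias mentioned above is usually handled with Heckman’s approach [146], which rests on a standard property of the bivariate normal distribution: if u and v are jointly normal with zero means, Var(v) = 1, standard deviation σ_u for u and correlation ρ, then E[u | v > -c] = ρ·σ_u·φ(c)/Φ(c), where φ(c)/Φ(c) is the inverse Mills ratio. The sketch below uses generic symbols rather than the paper’s and checks the identity by simulation with the standard library only.

```python
import random
from math import erf, exp, pi, sqrt


def inverse_mills(c: float) -> float:
    """phi(c) / Phi(c) for the standard normal distribution."""
    return (exp(-0.5 * c * c) / sqrt(2.0 * pi)) / (0.5 * (1.0 + erf(c / sqrt(2.0))))


def selection_term(c: float, rho: float, sigma_u: float) -> float:
    """Closed form of E[u | v > -c] for jointly normal (u, v) with Var(v) = 1."""
    return rho * sigma_u * inverse_mills(c)


def simulated_selection_term(c: float, rho: float, sigma_u: float,
                             n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[u | v > -c], for comparison with the closed form."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        # u = sigma_u * (rho * v + sqrt(1 - rho^2) * w) gives Corr(u, v) = rho.
        u = sigma_u * (rho * v + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        if v > -c:  # selection: only these observations are kept
            total += u
            kept += 1
    return total / kept
```

For c = 0.5, ρ = 0.6 and σ_u = 2, both the closed form and the simulation give about 0.61; with ρ = 0 the selection term vanishes, which is why the bias only arises when selection and the error term are correlated.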
Under the null hypothesis of “babbling”, integrated reporting is adopted irrespective of
and
provides no information about sustainability, and therefore
Conversely, if a discretionary equilibrium occurred
,
would still be equal to zero, while
to
Conditional upon
being equal to 1, this paper models
and
as jointly normally distributed, possibly with correlation
different from zero. This approach is consistent with Heckman’s methodology for dealing with endogenous selection [146]. Exploiting the properties of the bivariate normal distribution, if a discretionary equilibrium occurred (i.e.,
), it would be possible to write
As a result, the error term of the regression model can be represented as
where
. Substituting Equation (30) into Equation (25), this study’s regression model is ultimately formulated as
where
. Estimates of this model’s parameters are obtained with the Ordinary Least Squares (OLS) estimator, as the error term
satisfies the exogeneity condition
regardless of the equilibrium that occurs
. While the OLS estimator provides unbiased and consistent point estimates of the regression coefficients, the related standard errors could be invalidated by heteroscedasticity in the regression error term, as
, which is included in
, was shown to be linear in
. For this reason, hypothesis testing is performed using heteroscedasticity-robust standard errors [147].
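For reference, the OLS estimator with White’s heteroscedasticity-consistent covariance (the HC0 variant) can be sketched in plain Python as follows. The text does not state which variant of robust standard errors was used; HC1 to HC3 differ from HC0 only by small-sample adjustments to the squared residuals. In practice a library would normally be used, e.g. statsmodels’ OLS(y, X).fit(cov_type="HC0").

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_hc0(X: Matrix, y: List[float]) -> Tuple[List[float], Matrix, List[float]]:
    """OLS coefficients, White (HC0) covariance matrix and robust standard errors."""
    xt = transpose(X)
    bread = inverse(matmul(xt, X))  # (X'X)^-1
    beta = [b[0] for b in matmul(bread, matmul(xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    k = len(beta)
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'.
    meat = [[sum(e * e * row[i] * row[j] for row, e in zip(X, resid)) for j in range(k)]
            for i in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    se = [sqrt(max(cov[i][i], 0.0)) for i in range(k)]  # guard against -0 rounding
    return beta, cov, se
```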
Consistent with Equation (15), a babbling equilibrium occurs if is not statistically different from zero. If this hypothesis is rejected, the occurrence of a discretionary equilibrium can be accepted provided that the hypothesis is not rejected. Thus, perfect verifiability requires the difference between and not to be statistically significant, provided that both are statistically different from zero. As anticipated, estimation and, consequently, hypothesis testing are performed assuming and to be sector specific.
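The hypotheses above reduce to t-tests on single coefficients and on the difference between two coefficients, both based on the robust covariance matrix. The sketch below uses a normal approximation for p-values, which is reasonable with several thousand observations; which coefficients enter each test follows Equation (15) and is represented here by hypothetical indices i and j.

```python
from math import erfc, sqrt
from typing import List, Tuple


def two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return erfc(abs(t) / sqrt(2.0))


def t_test_zero(beta: List[float], cov: List[List[float]], i: int) -> Tuple[float, float]:
    """H0: beta[i] == 0, using a (robust) covariance matrix."""
    t = beta[i] / sqrt(cov[i][i])
    return t, two_sided_p(t)


def t_test_difference(beta: List[float], cov: List[List[float]],
                      i: int, j: int) -> Tuple[float, float]:
    """H0: beta[i] == beta[j], with Var(b_i - b_j) = V_ii + V_jj - 2 V_ij."""
    var = cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
    t = (beta[i] - beta[j]) / sqrt(var)
    return t, two_sided_p(t)
```

The covariance term V_ij matters: ignoring a positive covariance between the two estimates would overstate the variance of their difference and make the verifiability test too lenient.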
As the treatment of the measurement error entailed additional statistical assumptions, the robustness of the results is assessed by excluding
from the analysis
, accepting a trade-off between a simpler empirical model and a potential omitted variable bias. In this case, the regression equation reads
having assumed that, if a discretionary equilibrium occurred,
could be approximated as linear in
. Results are presented and discussed in the next section.