3.1. Saved Health and Cost-Effectiveness Analysis
Incorporating the prevention of ill-health and premature death into a climate change adaptation measure is certainly plausible (for the critical relevance of public health data for the (public) debate on climate change, see [
9]). The question is how to define and measure ill-health, though. In this respect, Michaelowa and colleagues endorse the DALY. The justification given for this theoretical choice mainly refers to the drawbacks of other approaches and the public rejection of attaching monetary values to human life. Traditional methods of doing so make the value of a human life contingent upon the respective person’s income. The human capital approach, for instance, determines “the pay-back” on the “investment” in human capital as the present value of the individual’s future earnings, using market wage rates ([
18] (p. 215f.)). This is especially troublesome when it comes to the comparison of data for industrialized and developing countries, which is common practice in the discourse on climate change ([
4] (p. 3)). Michaelowa and colleagues reckon that since monetary valuation of life is “fraught with ethical and political challenges” ([
7] (p. 67)), it “should be avoided, especially if comparing industrialized and developing countries” ([
6] (p. 2152)). To escape the “endless political debates about an equitable valuation of human life and death” ([
6] (p. 2149)), it would be critical “to have a non-monetary indicator that addresses the health benefits of adaptation projects” ([
7] (p. 67); see also [
4] (p. 9)).
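The income dependence of the human capital approach can be made concrete with a minimal sketch. The function name, the constant earnings stream, the 3% discount rate, and the two wage figures are illustrative assumptions, not values taken from the cited literature.

```python
def human_capital_value(annual_earnings, years_remaining, discount_rate=0.03):
    """Present value of a constant stream of future earnings.

    Assumption: earnings are constant and paid at the end of each year.
    The 3% default discount rate is purely illustrative.
    """
    return sum(
        annual_earnings / (1 + discount_rate) ** t
        for t in range(1, years_remaining + 1)
    )


# Two hypothetical persons who differ only in their market wage.
high_income = human_capital_value(40_000, years_remaining=30)
low_income = human_capital_value(2_000, years_remaining=30)

# The "value of life" scales one-to-one with income.
ratio = high_income / low_income
print(f"ratio of valuations: {ratio:.1f}")
```

Under these assumptions, the valuation of the two lives differs by exactly the factor by which the wages differ, which is the feature that makes comparisons between industrialized and developing countries troublesome.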
Considering non-monetary benefits, Michaelowa and colleagues shift from CBA to a form of CEA, which defines the benefit measure in terms of some unit other than money, usually some physical target ([
12] (p. 16)). In health care, this could be a reduction in blood pressure measured in mmHg. Although such physical measures appear straightforward and easy to understand, they obviously do not allow for comparing the efficiency of projects with different goals. At this point, generic measures of health, such as DALYs or quality-adjusted life years (QALYs), enter the scene; they make it possible to capture both premature mortality and the reduced quality of life due to ill-health. The DALY does so by combining the number of years of life lost (YLL) with the number of years lived with disability (YLD). The term “disability” is used in a broad sense here and denotes any short- or long-term loss of health ([
19] (p. 2198), [
20] (p. 2130)). To measure the YLDs, each year lived with a disability is assigned a disability weight, where 0 represents perfect health and 1 expresses death or a condition equivalent to death. Being a measure of ill-health and life lost, the DALY is not a “good” to be saved or maximized but a “bad” to be minimized ([
21] (p. 307)). (The terminology “disability-adjusted life-year” is misleading, since, in the words of Anand and Hanson [21] (p. 310), “more of a ‘life-year’ (even adjusted for disability) is normally understood as a ‘good’, which should be maximized and not minimized.” Accordingly, there exists some confusion on this matter in the literature on climate change as well, for instance, when Köhler and Michaelowa speak of “the concept of Disability Adjusted Life Years (DALYs) saved” ([
4] (p. 3); see also [
5] (p. 111)). In an otherwise illuminating paper, Nolt consistently mischaracterizes the DALY when he claims that “[o]ne DALY is one year of healthy life. […] [E]ach disability is assigned a value between 0 (death) and health (1), lower numbers representing greater severity” and so forth ([9] (p. 351)). What he says is right, though not of the DALY but of its close relative, the QALY.)
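The combination of YLL and YLD described above can be sketched as follows. This is a simplified illustration without age weighting or discounting; the function names and all input numbers are hypothetical and chosen only to show the arithmetic.

```python
def yll(deaths, reference_life_expectancy, age_at_death):
    """Years of life lost: deaths times remaining reference life expectancy."""
    return deaths * max(reference_life_expectancy - age_at_death, 0)


def yld(cases, disability_weight, years_with_condition):
    """Years lived with disability, weighted on a 0 (perfect health) to 1 (death) scale."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return cases * disability_weight * years_with_condition


def daly(yll_value, yld_value):
    """The DALY is a 'bad': the sum of life-years lost and disability-weighted years."""
    return yll_value + yld_value


# Hypothetical burden: 10 deaths at age 60 against a reference of 80 years,
# plus 100 cases living 5 years in a state with an assumed weight of 0.4.
burden = daly(yll(10, 80, 60), yld(100, 0.4, 5))
print(burden)
```

Because the weight scale runs from 0 (perfect health) to 1 (death), a larger result means more ill-health, so an intervention is to be judged by the DALYs it averts, not by the DALYs it "saves".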
Michaelowa and colleagues hardly discuss the DALY, but their reasons for using it, quoted above, suggest that the authors consider the DALY as not being fraught with ethical challenges and as not prompting political debate. Due to the DALY’s lack of transparency, the latter claim may indeed be correct ([
9] (p. 351)), but the former is far from true. To see this, a closer examination of the DALY and its measurement is illuminating.
The DALY was developed during the first Global Burden of Disease (GBD) study 1990, launched inter alia by the World Health Organization (WHO), with the purpose of estimating the global burden of health loss due to diseases, injuries, and risk factors (such as tobacco use or high blood pressure) differentiated by age, sex, and geographical region. In addition to providing a unit of measurement for monitoring the global burden of disease, the DALY is also intended to serve as an outcome measure within CEA ([
22] (p. 704)). It seems fair to say that the DALY is for public health what a universal metric would be for climate change adaptation. Since the beginning, however, the DALY has ignited a critical debate on its theoretical and methodological foundations and has undergone crucial modifications [21, 22, 23, 24, 25, 26, 27, 28, 29]. In the following text, the debates on the central issue of what the DALY is supposed to measure in the first place will be illustrated by considering three important steps of the DALY’s development within the GBD framework. The purpose of this endeavor is twofold: for one thing, the analysis supports the thesis that the DALY incorporates many controversial normative assumptions. For another, it will become clear that the construction of the DALY as a descriptive measure of ill-health is already heavily influenced by distributional considerations, just as the SW metric proposed by Michaelowa et al. is.
For the measurement of disability weights, preliminary considerations on the DALY proposed six disability classes, each describing a condition in general terms, for example: “[L]imited ability to perform activities in two or more of the following areas: recreation, education, procreation or occupation” (class 3) or “[n]eeds assistance with activities of daily living such as eating, personal hygiene or toilet use” (class 6) ([
23] (p. 438, Table 2)). These classes were evaluated by a group of medical experts by means of a method called magnitude estimation ([
23], (p. 439)), which asks the respondents a question of the form “How many times worse is one state than another [reference] state?” ([
30] (p. 16)). In this way, class 3 was assigned a weight of 0.400 and class 6 a weight of 0.920. Each disability class was supposed to represent “a greater loss of welfare or increased severity than the class before,” and the weights were intended to measure a disability’s “impact on the individual” ([
23] (p. 438)). However, the disability weights were not supposed to measure the myriad ways in which a health state affects individual well-being (the intricate relationship between “health” and “well-being” and the question of what should be tackled by means of summary measures of population health has been discussed extensively by Hausman [
17]), for such an account would have to consider the social context of the respective individual ([
23] (p. 437f.)). Instead of tackling the specific disadvantage caused by a health state, the disability classes and, hence, the weights were supposed to represent the disability in terms of human functioning ([
23] (p. 438)).
This individualist perspective was thwarted by two other normative choices, though. First, to determine the YLL, the designers of the DALY applied a maximum life expectancy of 82.5 years for women, which equaled the life expectancy for women in Japan, the country with the highest life expectancy worldwide ([
22] (p. 711)). For men, the maximum life expectancy was set to 80 years, where the divergence was supposed to mirror different life expectancies due to biological factors, not lifestyle choices ([
22] (p. 711)). A standardized life expectancy was applied globally because it should not be considered “more important to save the life of a person in a rich country (with greater life expectancy) than to save the life of someone in a poor country” ([
28] (p. 201)). Both choices can be criticized. For one thing, and as pointed out by Lyttkens (ibid.), if “biologically” different life expectancies are taken into account in the case of sex, why not also consider such differences between other social groups, once they are detected? As to the application of a universal life expectancy, it can be argued that it is beside the point to claim that a man living in Sierra Leone and dying at the age of 30 due to a certain disease loses 50 DALYs attributable to the respective disease, while his life expectancy had only been 38 years anyway ([
26] (p. 5)). (The 38 refers to 1997, the year Murray and Acharya’s paper was published. In 2018, according to the World Bank, the life expectancy in Sierra Leone was 54.309 years. See https://data.worldbank.org/indicator/SP.DYN.LE00.IN?locations=SL, accessed on 24 August 2020.) While driven by plausible distributional considerations, the use of a universal life expectancy is thus problematic when it comes to the descriptive measurement of the burden of disease due to specific diseases or risk factors.
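The gap between the two ways of counting in the Sierra Leone example can be spelled out in a few lines. The figures (death at age 30, a universal male reference of 80 years, a local life expectancy of 38 years) are those given in the text; the function name is a hypothetical helper, and age weighting and discounting are left out.

```python
def years_of_life_lost(age_at_death, reference_life_expectancy):
    """Life-years lost relative to a chosen reference life expectancy."""
    return max(reference_life_expectancy - age_at_death, 0)


# Universal standard for men used in the GBD 1990 framework, as cited above.
universal = years_of_life_lost(age_at_death=30, reference_life_expectancy=80)

# Local life expectancy for Sierra Leone in 1997, as cited above.
local = years_of_life_lost(age_at_death=30, reference_life_expectancy=38)

print(universal, local, universal - local)
```

On this simplified count, 42 of the 50 life-years attributed to the disease reflect the choice of reference standard and not the disease itself.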
The second normative choice of concern here is the differential valuation of DALYs at different ages, the value being highest for young adults around the age of 25 and lowest for both children under 10 and the elderly over 60 years ([
23] (p. 436)). The reason for attaching age weights to DALYs is stated by Murray as follows: “Higher weights at a particular age does [sic] not mean that the time lived at that age is per se more important to the individual, but because of social roles the social value of that time may be greater” ([
23] (p. 435)). The idea is that young adults play an important role “in providing for the well-being of others” (ibid., see also [
22] (p. 718)). That is to say, the well-being of the most vulnerable is supposed to be accounted for by attaching a higher value to the health of those who care for them. (Since I suppose that discussions about discounting future benefits are well known in the adaptation literature, I do not elaborate upon discounting future DALYs here. See [
26] (pp. 695–98) and [
29].) This rationale for age weighting is highly problematic, though, for two reasons in particular. First, it opens Pandora’s box, as Lyttkens ([
28], (p. 200)) put it, since it implies that a higher value would have to be attached to the health of doctors, nurses, or other persons providing care and crucial services for others as well, whereas the health of persons without children or elderly dependents to take care of would have to be assigned a lower value ([
25], (p. 692)). The same would be true for chronically ill or disabled adults who require care. This seems to be an ethically unacceptable consequence. Second, age weighting based on social considerations, along with using universal life expectancies, introduces into the DALY distributive considerations as to who should receive priority when it comes to treatment, so the resulting number cannot be regarded as a purely descriptive measure of ill-health.
For the final GBD 1990, some aspects of the DALY were reconsidered ([
22,
24]). While age weighting and a universally high life expectancy were adhered to, the question of what to measure was now answered unambiguously in favor of a social evaluation of health: “It can be argued that for burden of disease […] and cost-effectiveness analyses that are intended to inform social choices, a method that directly measures social preferences for health states would be more appropriate than one that measures individual preferences” ([
22], p. 713). In this context, social preferences are preferences individuals have not with regard to their own health but concerning the distribution of health or health care among other persons, not including themselves ([
31], p. 26).
These social preferences were elicited by means of the person trade-off (PTO) method from a group of health care providers from each region of the world convened at the WHO in 1995 ([
22] (pp. 713, 715)). The respondents were confronted with two versions of the PTO. In the first version, PTO1, they were asked to compare life extension for a healthy individual with life extension for a person with a disability: “[W]ould you as decision maker prefer to purchase, through a health intervention, 1 year of life extension for 1,000 perfectly healthy individuals or 2000 blind individuals?” ([
22] (p. 714)). In a series of such questions, it was elicited how large the number of blind persons must be so that the respondent is indifferent between the two scenarios. If the number is, say, 8000, the disability weight would amount to 1 minus 1000 divided by 8000, that is, 0.875 ([
32] (p. 1424)). In the second version of PTO, PTO2, the respondents were asked to compare the value of curing a certain number of individuals with disabilities on the one hand with life extension for a certain number of healthy persons on the other: “[H]ow many people cured of blindness do you consider equal to prolonging the lives of 1000 healthy people?” ([
32] (p. 1424)). If the disability weights derived from PTO1 were inconsistent with those inferred from PTO2, the respondents were instructed to reconcile their answers ([
24] (p. 36)).
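The PTO1 arithmetic described above (1 minus the number of healthy persons divided by the number of disabled persons at the point of indifference) can be sketched as follows. The function name is a hypothetical helper; only the formula and the 1000/8000 example come from the text.

```python
def pto1_disability_weight(n_healthy, n_disabled_at_indifference):
    """Disability weight implied by a PTO1 indifference point.

    n_healthy: number of healthy persons whose life is extended by one year.
    n_disabled_at_indifference: number of persons with the disability for whom
    the respondent regards one year of life extension as equally valuable.
    """
    if n_healthy <= 0 or n_disabled_at_indifference <= 0:
        raise ValueError("group sizes must be positive")
    return 1 - n_healthy / n_disabled_at_indifference


print(pto1_disability_weight(1000, 8000))  # the example from the text
print(pto1_disability_weight(1000, 1000))  # equal claims for both groups
```

A respondent who treats 1000 healthy and 1000 blind persons as having equal claims to life extension thus produces a weight of zero, whatever he or she may think about how burdensome blindness is for the individual.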
However, it is questionable whether different weights resulting from the two PTOs present an inconsistency in the first place, since they ask for quite different things. Consider PTO1 first: the question assumes that a life-year gained for (or better: in) a disabled person is of less value than a life-year gained in an otherwise healthy person. Yet, the respondents might believe that both groups of persons have equal claims to the life-saving procedure ([
32] (p. 1424)). In terms of value, this means that they regard prolonging the life of 1000 healthy persons and prolonging the life of 1000 disabled persons as equivalent. This, however, implies a disability weight of zero, which means that a blind person counts as perfectly healthy and has no claim to health care when it comes to resource allocation. This is why such results were considered irrational. The problem here, again, is that the task of descriptively evaluating how bad a health state is for the respective individual is mixed up with distributional considerations about resource allocation. Note that at this point, we are back to the very problem of valuing human lives, which Michaelowa and colleagues were so eager to evade by using the DALY in the first place.
PTO2, by contrast, does not entail any devaluation of life-years gained by a certain group of patients and might therefore lead to a different disability weight for the health state at stake. In effect, the respondents might interpret the PTO1 scenario as an issue of distributive justice, that is, social value, whereas they might read PTO2 as asking for their evaluation of the health state in question, i.e., individual value (see also [
28] (p. 197)). Asking the respondents to resolve the apparent inconsistency not only forces them to attach less weight to life-years gained for the disabled but also renders the resulting disability weights meaningless ([
32] (p. 1424), [
27] (p. 121)).
In the course of the GBD 2010, the DALY’s conceptual, normative, and methodological foundations have been substantively revisited ([
19]). To begin with, the GBD now decidedly seeks “to quantify health loss rather than welfare loss” ([
20] (p. 2130)) so that the disability weights are taken to “reflect the general population judgment about the ‘healthfulness’ of defined states, not any judgments of quality of life or the worth of persons or the social undesirability or stigma of health states” ([
33] (p. 12)). Accordingly, the age weights have been dropped, whereas the application of a universal life expectancy is still adhered to ([
19] (p. 2199)). (Life expectancy is no longer differentiated between men and women, though, and YLL are calculated using a new reference-standard life expectancy at each age.) For measuring the “healthfulness” of different health states, two new methods were developed, and the respective questions were answered by a sample of the public from all over the world. To be able to discuss what the methods elicit, the questions need to be considered at length. The first method, a paired comparison, looks as follows:
“Now, we want to learn how people compare different health problems. […] I will ask you to tell me which person you think is healthier overall, in terms of having fewer physical or mental limitations on what they can do in life. […] There are no right or wrong answers to these questions. Instead, we are interested in finding out your personal views.
The first person […] has mild tremors and moves a little slowly, but is able to walk and do daily activities without assistance.
The second person […] has some trouble remembering recent events, and finds it hard to concentrate and make decisions and plans.
Who do you think is healthier overall, the first person or the second person?” ([
34] (p. 2)).
While this type of question focuses on chronic disabilities, the following, so-called population health equivalence method, draws on the PTO and seeks to elicit trade-offs between mortality and non-fatal health states:
“The last questions will ask you to compare the overall health benefits produced by two different programs. Imagine there were two different health programs.
The first program prevented 1000 people from getting an illness that causes rapid death.
The second program prevented [Number selected randomly from {1500, 2000, 3000, 5000, 10 000}] people from getting an illness that is not fatal but causes the following lifelong health problems: […] Some difficulty in moving around, and in using the hands for lifting and holding things, dressing and grooming.
Which program would you say produced the greater overall population health benefit?” ([
34] (p. 3))
Whereas the first question asks the respondents which of two patients they consider “healthier overall,” the latter asks them for the program producing “the greater overall population health benefit” (ibid.). This way of eliciting disability weights is highly problematic, though, for three reasons.
First, regarding the population health equivalent, the authors stress that they do not want to mix up the measurement of health on the one hand with distributional issues on the other when they state, “In keeping with the focus in the GBD 2010 on the construct of health loss, we explicitly avoided framing of the question in terms of resource allocation decisions, as this framing may evoke distributional concerns that are orthogonal to the health construct” (ibid.). While it is true that they do not explicitly ask the respondents to make a (hypothetical) distributional decision, neither do they ask them to imagine themselves in the respective health state. Instead, they choose a consequentialist account that asks for some abstract sum of health aggregated across all persons concerned. In doing so, the question presumes that the respondents are able and willing to evaluate abstract units of health, add these values up, and draw a balance sheet across the patients concerned. Yet, this consequentialist manner does not seem to be the most natural way to understand the population health equivalent. Most likely, the respondents do not think of the “amount” of health produced by each program, but rather regard it as a distributional task, which makes them consider which program should be realized and, thereby, which group of patients should be treated.
Second, without considering the impact of health on well-being or quality of life, it is conceptually unclear what it means to say that one of two persons is “healthier overall.” Health is a multidimensional concept, and different health states imply quite different limitations on what one can do. These differences cannot simply be measured in terms of “more” or “less.” To see this, try to figure out who is healthier in terms of facing fewer limitations on what he or she can do: (i) a person sitting in a wheelchair, (ii) a blind person, (iii) a person suffering severe migraine attacks two or three times each month, or (iv) a person with arthrosis and constant pain in the joints. Considering this question, two conclusions can be drawn. First, without balancing the different dimensions of health somehow, it is impossible to tell who is healthier as such. In addition, while it is possible to say whether one person is healthier than another as long as only one dimension is at stake (e.g., a person blind in one eye is “healthier” than a person blind in two eyes), the relevance of such comparison is rather limited. This is because, second, the severity of the limitations associated with a health state depends on the latter’s effect on the person’s quality of life and socially available activities and on an assessment of the relative importance of these activities ([
17] (pp. 54f.)). Hence, health as such cannot be quantified but must be valued ([
17], p. 42).
Third, and intricately connected with the aforementioned issue, it is impossible to separate the valuation of health from the social context. This claim can be substantiated by considering an example taken from Voigt and King [
35]: Imagine two countries with equal numbers of persons with impaired vision. In one country, corrective lenses are available, so the persons concerned hardly suffer any negative consequences from their condition at all. In the other country, by contrast, there are no corrective lenses and persons with impaired vision have difficulties finding jobs, gaining a sufficient income, and so forth. The designers of the GBD 2010 in effect argue that although the impact of impaired vision on well-being may vary between those countries, the health state as such does not, and this is the invariant construct they want to measure. However, even if one is willing to consider the persons in both countries as equally healthy, when it comes to allocating resources, it seems reasonable to say that it should make a difference how their health state actually affects their life ([
35] (p. 227)).
To conclude, measuring units of health as such is impossible, and asking the respondents to compare programs in terms of the overall population health benefit presupposes a framework they are not accustomed to. While I argued that the respondents probably understand the population health equivalent as a distributional choice, the meaning of their answers to the paired comparisons and, thus, the respective disability weights remains obscure ([
36], [
17] (p. 42)).