1. Introduction
Phylogenetic systematics, popularly known as cladistics, is the dominant method of phylogenetic analysis of fossil taxa. When cladistics was first widely applied to fossils, it was extolled as a method to formulate phylogenetic hypotheses independent of stratigraphic (temporal) data [1,2,3]. Thus, in many groups of fossil taxa, notably vertebrates, cladistic phylogenies (cladograms) replaced the prevalent mode of phylogenetic analysis in which stratigraphic distribution had long played a central role in identifying putative ancestors that are geologically older than their descendants.
However, some workers later advocated incorporating stratigraphic data directly into cladistic analysis, and others have claimed that stratigraphic data are needed to evaluate the veracity of cladistic hypotheses. Some workers have even argued that cladistic hypotheses inform the stratigraphic distributions of taxa and their use in correlation. Here, I examine four ways in which cladistics has been used to evaluate or modify the stratigraphic distributions of taxa or the use of taxa in correlation. I conclude that cladistic analyses do not inform biostratigraphy and biochronology without employing assumptions of questionable validity. Nevertheless, I note that not all workers make such assumptions, nor do all connect cladistics and stratigraphy in the ways discussed below.
2. Some Terminology
For simplicity, I use the term stratigraphy to encompass both biostratigraphic distributions and biochronological correlations. Biostratigraphy is the study of the distribution of fossils in strata. Although it is often used to refer to temporal subdivision and age determination, those roles are more accurately assigned to biochronology. Biochronology is the discrimination of intervals of geologic time based on the stratigraphic distributions of fossil taxa. Biostratigraphy provides the basis for biochronology, and biochronological constructs are the basis of chronostratigraphy.
3. Cladistics and Stratigraphy
The classical method of reconstructing phylogeny relies on ordering fossils stratigraphically and then connecting what seem to be reasonable morphological transformations to create an evolutionary tree. This method continues to dominate the reconstruction of microfossil phylogenies, which are based on the fossils of organisms with an extensive fossil record that is readily ordered stratigraphically [4]. However, most other fossil taxa, particularly those with less dense fossil records, have long had their phylogenies created by cladistic analysis without the use of stratigraphic data.
Nevertheless, stratigraphic data have been brought into cladistic analyses by those who argue that phylogenetic analyses need to be evaluated with stratigraphic data. These workers either incorporate stratigraphic data directly into the analyses (stratocladistics; see review by [5]) or argue that the cladistic analysis needs to be compared to stratigraphic distributions to evaluate the accuracy of the cladistic branching pattern [6].
If stratigraphic data can inform and/or evaluate cladistic analyses, can cladistic analyses be used to evaluate and inform biostratigraphy and biochronology? To some cladists the answer is yes, as Norell [7] (p. 114) reflected well when he wrote that “study of temporal pattern and evolutionary rate can be more rigorously accomplished if the assumption of phylogeny and the measurement of phyletic longevity is retrieved from cladistic structure integrated with the fossil record”. That answer, however, is predicated on substantial assumptions, most or all of which are of questionable validity. I examine below, in turn, the four ways in which cladistics has been applied to stratigraphy.
3.1. Ghost Lineages
Norell [7] coined the term ghost lineage to refer to the unknown/inferred stratigraphic range that connects a taxon to the time of its splitting from its sister taxon (Figure 1). A ghost lineage thus is a temporal range invented for a taxon based on a cladistic hypothesis. Such invention relies on an assumption inherent to cladistic analysis, namely that all taxa arise by dichotomous splitting of lineages. This is known not to be true; it is simply a methodological assumption built into cladistic analysis. For example, if evolution proceeds by anagenesis or by the rectangular speciation patterns of punctuation, then the oldest ages of two sister taxa (or of ancestor and descendant taxa) are not the same [8].
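The ghost-lineage construction can be made explicit with a short computation. The sketch below is only an illustration under stated assumptions, not a reproduction of any published procedure: it assumes a fully resolved cladogram written as nested pairs, first appearance ages in Ma (larger numbers are older), and, as noted above, that sister clades originate at the same instant. The function names, taxa X, Y and Z, and their ages are all hypothetical.

```python
from typing import Dict, Tuple, Union

# A fully resolved cladogram written as nested pairs of taxon names.
# Every internal node is a dichotomy: the methodological assumption discussed above.
Tree = Union[str, Tuple["Tree", "Tree"]]


def oldest_age(clade: Tree, fad: Dict[str, float]) -> float:
    """Oldest first appearance (Ma, larger = older) of any taxon in a clade."""
    if isinstance(clade, str):
        return fad[clade]
    left, right = clade
    return max(oldest_age(left, fad), oldest_age(right, fad))


def ghost_lineages(tree: Tree, fad: Dict[str, float]) -> Dict[Tree, float]:
    """Ghost-lineage duration (Myr) for every branch of the cladogram.

    Assumes both sister clades originate at the split, which is dated to the
    older sister's first appearance; the younger sister is then extended back
    to that age. The extension is the ghost lineage.
    """
    ghosts: Dict[Tree, float] = {}

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        # Minimum age of the split implied by the oldest fossil in the clade.
        split_age = oldest_age(node, fad)
        for child in node:
            ghosts[child] = split_age - oldest_age(child, fad)
            visit(child)

    visit(tree)
    return ghosts


# Hypothetical taxa and first appearance ages (Ma), invented for illustration.
ages = {"X": 230.0, "Y": 140.0, "Z": 120.0}
print(ghost_lineages(("X", ("Y", "Z")), ages))
# {'X': 0.0, ('Y', 'Z'): 90.0, 'Y': 0.0, 'Z': 20.0}
```

In this invented case the clade Y+Z receives a 90 Myr ghost lineage and Z a 20 Myr one. Neither duration rests on any fossil occurrence; both follow entirely from the tree topology and the equal-origination assumption.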
A good example of ghost lineage construction and its accompanying assumptions comes from a study of angiosperm origins by Doyle and Donoghue [9]. They determined in a cladistic analysis that the closest relatives of angiosperms have their oldest fossil records in the Late Triassic. The inferred ghost lineage therefore implies that the oldest angiosperms are Late Triassic (Figure 1), even though their known fossil record does not extend back before the Cretaceous (or perhaps the Middle Jurassic: [10]). Note, though, that other ghost lineage interpretations are possible here, for example, that the lineages present in the Late Triassic were pre-angiosperm lineages rather than angiosperms. Smith [11] (p. 140) concluded that “Doyle and Donoghue’s phylogenetic approach provides evidence for a Late Triassic origin of angiosperms, and emphasizes how the fossil record can seriously underestimate taxonomic ranges”. However, the Late Triassic fossil record of plants is extensive, and no Late Triassic angiosperm fossils are known [12]. Why should a hypothetical ghost lineage at least 60 million years long, based on a methodological assumption, indicate that the taxonomic range of angiosperms is underestimated? Underestimation is possible, of course, but actual fossil data will be needed to evaluate that hypothesis.
In another example, Pachut and Anstey [13,14] used a cladistic analysis of the species of the Ordovician bryozoan Peronopora to create ghost lineages. These ghost lineages place the appearance of all 16 species of the genus in a narrow stratigraphic interval, which differs greatly from the actual stratigraphic ranges of the species (the ghost lineage of P. pauca is particularly long). Yet the invented ghost lineages lead to a complete reinterpretation of the evolution of the Peronopora species that can only be described as hypothetical, because it is not based on actual data.
It is important to realize that cladistic ghost lineages are stratigraphic ranges invented to match the hypothetical cladistic branching pattern. They simply assume that stratigraphic distributions correspond to the branching pattern of the cladogram. However, as Eldredge and Gould [8] (p. 38) remarked, “we can never be sure that any two ‘sister species’ are the actual products of a single split in an ancestral species”. Or, as Haug and Haug [15] (p. 10) concluded when plotting different cladistic analyses onto stratigraphy, “in most cases the time of a splitting event appears to be either over- or under-estimated”. A ghost lineage is thus a hypothetical stratigraphic range rooted in cladistic assumptions, not an actual stratigraphic range, and should be treated as such.
3.2. Congruence
A substantial way in which stratigraphic data and cladistic analysis have met is through testing the congruence between cladistic branching patterns and stratigraphic (temporal) distributions; various testing algorithms have been proposed [5,16,17,18,19,20,21,22]. These tests evaluate how well cladistic branching patterns match stratigraphic distributions and are largely presented as methods of estimating the completeness/incompleteness of the stratigraphic record. Not surprisingly, these methods find much congruence, especially for taxonomic groups with a “good” (extensive and stratigraphically dense) fossil record, and incongruence where the fossil record is not so “good.” This is because advanced (derived) taxa usually appear later in the fossil record, which is the way evolution works: advanced taxa descend from primitive taxa. The cases in the congruence analyses in which the oldest records of ancestors in the fossil record are younger than those of descendants (or vice versa) are actually in the minority, which challenges the emphasis many cladists put on the incompleteness of the stratigraphic record.
Indeed, there is an inherent circularity in the congruence analysis. The taxa in the cladistic analysis are those known from the fossil record, however incomplete it may be. If the fossil record is as incomplete as cladists posit, then how complete are the phylogenetic analyses they present, given that they are based on a very incomplete fossil record? Comparing the branching pattern on the cladogram to the stratigraphic record is thus comparing it to a dataset it is based on, so a high amount of congruence should not be surprising. It is therefore more informative to look for incongruence, because congruence should be expected in most cases.
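To show what such a test actually measures, here is a minimal sketch of one simple, node-based congruence measure, similar in spirit to the stratigraphic consistency index (SCI). The scoring rule is an assumption, and the published algorithms cited above differ in detail: each internal node other than the root is called consistent if its clade first appears no earlier than its sister group. Ages are in Ma (larger numbers older); function names, taxa and ages are hypothetical.

```python
from typing import Dict, Tuple, Union

# Fully resolved cladogram as nested pairs of taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def oldest_age(clade: Tree, fad: Dict[str, float]) -> float:
    """Oldest first appearance (Ma, larger = older) of any taxon in a clade."""
    if isinstance(clade, str):
        return fad[clade]
    left, right = clade
    return max(oldest_age(left, fad), oldest_age(right, fad))


def consistency_index(tree: Tree, fad: Dict[str, float]) -> float:
    """Fraction of non-root internal nodes that are stratigraphically consistent.

    Assumed rule: a node is consistent if its clade first appears no earlier
    than its sister group does.
    """
    consistent = 0
    scored = 0

    def visit(node: Tree) -> None:
        nonlocal consistent, scored
        if isinstance(node, str):
            return
        left, right = node
        for clade, sister in ((left, right), (right, left)):
            if not isinstance(clade, str):  # only internal nodes are scored
                scored += 1
                if oldest_age(clade, fad) <= oldest_age(sister, fad):
                    consistent += 1
            visit(clade)

    visit(tree)
    return consistent / scored if scored else float("nan")


tree = ("A", ("B", ("C", "D")))
print(consistency_index(tree, {"A": 400.0, "B": 380.0, "C": 360.0, "D": 350.0}))  # 1.0
print(consistency_index(tree, {"A": 300.0, "B": 380.0, "C": 360.0, "D": 350.0}))  # 0.5
```

A score near 1 is what one would expect whenever derived taxa appear later than primitive ones. Because the same fossils supply both the tree and the ages, a high score is partly built in, which is the circularity noted above.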
3.3. Correlating Fossil Assemblages
Martinez [23], Fara and Langer [24] and Makovicky [25] have devised methods that use cladistic analysis to temporally order and correlate fossil assemblages. Martinez [23] referred to his method as cladochronograms, Fara and Langer [24] called their method phylogenetic biochronology and Makovicky [25] called his method cladistic biochronological analysis.
Martinez created trees that order faunas from oldest to youngest based on the content of derived species in analyzed lineages. Fara and Langer used a branching index based on the number of nodes between a taxon and the base of the cladogram to give a “superpositional score” that determines the temporal order of fossil assemblages. Makovicky’s method analyzes a matrix of cladistically analyzed taxa to arrive at a cluster analysis of fossil assemblages. These techniques select the taxa to be used in the analysis based on their cladistic relationships, in contrast to well-known cluster analyses of fossil assemblages that usually cluster simply on the basis of shared taxa, e.g., [26]. The cladistic methods, again, assume that the cladogram is “correct” and that all evolution takes place by dichotomous splitting.
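The stage-of-evolution logic shared by these methods can be sketched in a few lines. The sketch is loosely modeled on a branching index of the kind described for Fara and Langer, but the scoring rule is a hypothetical stand-in: each taxon is scored by the number of nodes separating it from the base of the cladogram, and each assemblage by the mean score of its taxa. The cladogram, the assemblage names and their contents are invented.

```python
from statistics import mean
from typing import Dict, List, Tuple, Union

# Fully resolved cladogram as nested pairs of taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def node_depths(tree: Tree, depth: int = 0) -> Dict[str, int]:
    """Number of nodes between each terminal taxon and the base of the cladogram."""
    if isinstance(tree, str):
        return {tree: depth}
    depths: Dict[str, int] = {}
    for child in tree:
        depths.update(node_depths(child, depth + 1))
    return depths


def order_assemblages(tree: Tree, assemblages: Dict[str, List[str]]) -> List[str]:
    """Order assemblages from inferred oldest to youngest.

    Hypothetical scoring rule: the mean node depth of an assemblage's taxa.
    The ordering rests entirely on the stage-of-evolution assumption that
    more derived (deeper) taxa are younger.
    """
    depths = node_depths(tree)
    scores = {name: mean(depths[t] for t in taxa) for name, taxa in assemblages.items()}
    return sorted(scores, key=lambda name: scores[name])


cladogram = ("A", ("B", ("C", "D")))
sites = {"site1": ["C", "D"], "site2": ["A", "B"], "site3": ["B", "C"]}
print(order_assemblages(cladogram, sites))  # ['site2', 'site3', 'site1']
```

No stratigraphic information enters the calculation. The resulting order is only as good as the topology and the assumption that position on the cladogram tracks age.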
What these three studies do is identify taxa of value to correlation based on their positions on cladograms and then cluster those taxa to provide correlations. The inherent assumption is that more derived taxa (higher nodes on the cladogram) are younger than more primitive taxa (lower nodes), so the methods amount to correlating taxa by stage of evolution. The general conclusion that correlation by stage of evolution is inferior to correlation by shared taxa may explain why these cladistic methods of correlating fossil assemblages have found few followers. A Google Scholar search, for example, reveals only 15 citations of Makovicky’s article [25] in 15 years, most of them focused on an example he presented of his method applied to the correlation of Upper Cretaceous vertebrate fossil assemblages in Mongolia. Martinez [23] has fewer citations (10 in Google Scholar), and Fara and Langer [24] even fewer (2 in Google Scholar).
3.4. Monophyletic vs. Paraphyletic Taxa
Lucas and Kondrashov [27] introduced the term cladotaxonomy (cladistic taxonomy of [28]) and defined a cladotaxon as a low-level taxon (genus or species) that corresponds to a clade in a cladistic analysis. Cladotaxonomy is thus alpha taxonomy done by cladistic analysis; it is designed to eliminate paraphyletic taxa and to identify only monophyletic groups as taxa, e.g., [28,29].
The relevance of cladistics to alpha taxonomy employed in stratigraphic analyses can readily be seen in the argument that only monophyletic taxa are of value to biostratigraphy. Vanderlaan and Ebach [30] claim both that monophyletic taxa are more stable than “aphyletic taxa” (mostly paraphyletic taxa) and that they increase precision in correlation. However, in many cases, monophyletic taxa only remain stable until the next cladogram. Different workers with different cladograms do not identify the same taxa as monophyletic or paraphyletic, so monophyletic taxa are not necessarily stable. Furthermore, how monophyletic taxa increase the precision of correlation is unclear, especially considering the following example.
Padian et al. [31] (p. 74) state that “paraphyletic groups may suggest both unnaturally short and long ranges, which suggest spurious correlations”. However, this is merely a declarative statement, as an example from Padian et al. [31] shows (Figure 2). Padian et al. [31] noted that, at the level of identifying two time intervals, 1 and 2, the paraphyletic taxon A-B-C-D has the same temporal value as the monophyletic taxon D-E because they both indicate times 1 plus 2. They then note that if D is in the paraphyletic group “it is less useful for stratigraphic purposes” and that “the stratigraphic picture is muddied”. However, this is questionable, especially when we examine the stratigraphic ranges of the other paraphyletic groups in this analysis, A-B, A-B-C and B-C-D. All indicate time 1, so how do these paraphyletic groups produce a “muddied” stratigraphic picture? The fact is, each of the taxa in the cladistic analysis in Figure 2 has its own stratigraphic range of significance to correlation. Furthermore, regardless of its membership in a paraphyletic or a monophyletic group, the stratigraphic range of D alone indicates a relatively short time interval that encompasses the later part of time 1 and the early part of time 2, so it is a taxon of great value to correlation. Thus, whether or not a cladistic analysis deems taxa members of a paraphyletic or monophyletic group is not of significance to their value in correlation. Every taxon has a stratigraphic range that determines its value to correlation.
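The point that a taxon's value to correlation comes from its range, not from the phyletic status of any group it belongs to, can be illustrated with simple interval arithmetic. In the sketch below the ranges of taxa A to E are invented and are not those of Figure 2. The function names are hypothetical, and ranges are written as (first appearance, last appearance) in Ma, with larger numbers older.

```python
from typing import Dict, Iterable, Tuple

# (first appearance, last appearance) in Ma; larger numbers are older.
Range = Tuple[float, float]


def group_range(members: Iterable[str], ranges: Dict[str, Range]) -> Range:
    """Range of any grouping of taxa: oldest first appearance to youngest last appearance.

    The calculation is identical whether the group is monophyletic or paraphyletic.
    """
    names = list(members)
    return (max(ranges[n][0] for n in names), min(ranges[n][1] for n in names))


def duration(r: Range) -> float:
    """Length of a range in Myr; shorter ranges resolve time more finely."""
    return r[0] - r[1]


# Invented ranges for five hypothetical taxa (not the ranges in Figure 2).
ranges = {
    "A": (120.0, 108.0),
    "B": (118.0, 106.0),
    "C": (115.0, 104.0),
    "D": (104.0, 96.0),
    "E": (101.0, 90.0),
}
for group in (["D"], ["D", "E"], ["A", "B", "C", "D"]):
    r = group_range(group, ranges)
    print("-".join(group), r, duration(r))
```

With these invented numbers, D alone spans 8 Myr, the monophyletic D-E spans 14 Myr and the paraphyletic A-B-C-D spans 24 Myr. Any grouping that includes D can only widen D's range, so the most useful unit for correlation is the taxon's own range.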
4. Assumptions
Stratigraphic correlations using fossils are based on what Lucas et al. [32] called the “index taxon hypothesis”. Under this hypothesis, the fossil records of an index taxon are expected to fall within a discrete interval of time used in correlation. The hypothesis is readily tested by new discoveries of the index taxon, which test its stratigraphic range, and by comparing its age at various localities to the ages indicated by other index taxa and by other methods of age determination such as radioisotopic dates or magnetostratigraphy.
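Because the index taxon hypothesis is an explicit, testable claim about a range, its two tests can be written down directly. The sketch below is illustrative only. It assumes a hypothesized range in Ma (larger numbers older), a symmetric uncertainty on an independent age such as a radioisotopic date, and a simple rule that a new occurrence extends the range. The class name, taxon and ages are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTaxonHypothesis:
    """Hypothesized range of an index taxon in Ma (larger numbers are older)."""

    name: str
    oldest: float
    youngest: float

    def consistent_with(self, age: float, uncertainty: float = 0.0) -> bool:
        """Could an independently dated horizon (age +/- uncertainty) fall within the range?"""
        return age - uncertainty <= self.oldest and age + uncertainty >= self.youngest

    def extended_by(self, age: float) -> "IndexTaxonHypothesis":
        """Revised hypothesis after a new occurrence of the taxon is dated at `age`."""
        return IndexTaxonHypothesis(self.name, max(self.oldest, age), min(self.youngest, age))


# Hypothetical taxon and ages, invented for illustration.
h = IndexTaxonHypothesis("taxon X", oldest=210.0, youngest=205.0)
print(h.consistent_with(207.0))       # True: inside the hypothesized range
print(h.consistent_with(212.0))       # False: an apparent range extension
print(h.consistent_with(212.0, 2.5))  # True: overlaps once dating error is allowed
print(h.extended_by(212.0))
```

Both tests use only occurrences and independent ages; no cladogram is required to evaluate or revise the hypothesis.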
Whenever a cladogram is used to evaluate the stratigraphic distribution of fossil taxa and their use in biostratigraphy/biochronology, a set of assumptions is added to the biostratigraphy, e.g., [6,15,32,33]. All of the assumptions made in constructing the cladogram, including outgroup selection, the choice and scoring of characters and the assessment of character polarities, among others, become part of the evaluation of the stratigraphy. The assumption that the cladogram is the “correct” one is a large assumption, especially given that the correctness of a cladogram can be impossible to determine. Cladistic analyses often produce alternative trees of equal value, and different workers tend to produce different cladograms of the same taxonomic groups. Which cladogram is “correct” and can provide a valid assessment of stratigraphic distribution and correlation?
In cladistic analyses, new taxa are only created by dichotomous branching, though we know that there are other evolutionary modes, including anagenesis and punctuation. So, when stratigraphic ranges are matched to cladograms, they are being matched to the assumption that the oldest ages of sister taxa are the same. Cladistic analyses that temporally order taxa based on their positions on cladograms are stage-of-evolution correlations that have been little used. Claims that paraphyletic taxa are less useful in correlation than monophyletic taxa are little more than unjustified declarative statements. They ignore the fact that all taxa have stratigraphic ranges, and those ranges determine their value to correlation, not their status as monophyletic or paraphyletic based on an a posteriori cladistic analysis.
5. Conclusions
Cladistic analyses of phylogeny were originally extolled as phylogenetic analysis independent of stratigraphy (the ages and temporal durations of taxa). However, using cladistic phylogenies to evaluate stratigraphy necessitates various assumptions that include: (1) all the assumptions needed to construct the cladogram; (2) assuming that taxa only arise by dichotomous splitting; and (3) assuming the cladogram is “correct”.
It is fair to say that using cladistics to inform stratigraphy has found no followers outside of some practitioners of cladistics. For example, the latest comprehensive geological timescale, published in two volumes totaling 1357 pages [34], relies heavily on biostratigraphy and biochronology to construct the Phanerozoic chronostratigraphic scale and even includes chapters that review the biostratigraphy and biochronology of the major fossil taxa used in the chronostratigraphy. Yet it makes no mention of cladistics, simply because the assumptions needed to use cladistics to inform biostratigraphy and biochronology are not accepted by the chronostratigraphic research community. Stratigraphy may inform cladistics, but for cladistics to inform stratigraphy, various assumptions, some questionable at best, need to be made.