Sparse Radiocarbon Data Confound Culture-Climate Links in Late Pre-Columbian Amazonia
Round 1
Reviewer 1 Report
This is a very well-written and well-presented paper and the author should be congratulated. The paper breaks new theoretical ground in identifying what mathematical models derived from radiocarbon data are best suited to understanding the human response or non-response to specific instances of climate change. The analysis has been chosen and parameterized carefully. All the data is available and the inclusion of the source code of the analytical procedures used here makes the study fully replicable. I agree with the author's conclusions.
I have very few suggestions to add to what is a well-conceived and executed study. The only passage that confused me was the relation between sample size and temporal scale on line 190 -- do you not mean 'over an extremely wide time range'?
Author Response
Reviewer 1
This is a very well-written and well-presented paper and the author should be congratulated. The paper breaks new theoretical ground in identifying what mathematical models derived from radiocarbon data are best suited to understanding the human response or non-response to specific instances of climate change. The analysis has been chosen and parameterized carefully. All the data is available and the inclusion of the source code of the analytical procedures used here makes the study fully replicable. I agree with the author's conclusions.
I have very few suggestions to add to what is a well-conceived and executed study. The only passage that confused me was the relation between sample size and temporal scale on line 190 -- do you not mean 'over an extremely wide time range'?
The original text is correct: the trade-off is between low confidence over a long period and high confidence over an extremely short period. To pick up the signals over the time frame advanced by De Souza et al., we need to accept a debilitating level of uncertainty. I have clarified the order of clauses in the sentences that explain this dynamic, as they switched places a couple of times, likely adding to the confusion. Per Reviewer 4 I have also clarified the terminology.
Reviewer 2 Report
This paper aims at testing claims regarding the existence of adaptive cycles in Amazonian archaeology through a thorough exploration of the underlying 14C record. This test is undertaken using a set of state-of-the-art quantitative methods which are clearly set out, and supported by shared R code.
Overall, I find this methodological contribution extremely interesting, methodologically sound, and convincing. I strongly recommend it for publication.
I only have two - minor - suggestions. First, a simple table summarising the number of dates, sites, and bins per region would clearly illustrate, prior to any further analyses, the scarcity of the available dataset. Second, whilst obviously the purpose of the present paper is not to rewrite Amazonian archaeology, I wonder if the author could perhaps briefly comment on how to assess the problems raised here (apart from the obvious suggestions of digging more and dating more samples).
Author Response
Reviewer 2
This paper aims at testing claims regarding the existence of adaptive cycles in Amazonian archaeology through a thorough exploration of the underlying 14C record. This test is undertaken using a set of state-of-the-art quantitative methods which are clearly set out, and supported by shared R code.
Overall, I find this methodological contribution extremely interesting, methodologically sound, and convincing. I strongly recommend it for publication.
I only have two - minor - suggestions. First, a simple table summarising the number of dates, sites, and bins per region would clearly illustrate, prior to any further analyses, the scarcity of the available dataset. Second, whilst obviously the purpose of the present paper is not to rewrite Amazonian archaeology, I wonder if the author could perhaps briefly comment on how to assess the problems raised here (apart from the obvious suggestions of digging more and dating more samples).
I have added a table with the descriptive statistics of the radiocarbon data – this is a useful addition, as many readers will not have an entry-point into the distribution of the data. I have also offered some concluding thoughts on how future work might circumvent the limitations I have highlighted, per the comments from Reviewer 4 also.
Reviewer 3 Report
This paper provides a reanalysis of a recently published set of 14C dates from across the Amazonian region. The original paper (de Souza et al. 2019) suggests that two different land-use strategies – intensive and extensive – are associated with two different phases in the adaptive cycle. While “intensive” societies declined during time of climatic change, as they were approaching the end of the adaptive cycle, those associated with “extensive” land-use practices were more resilient. The original analyses were based on a set of 337 14C dates that were modeled, summed, and compared to localized paleoclimate proxy records.
The reanalysis presented in the present study suggests that the de Souza et al. analyses contain three primary biases that impact the results: 1) arbitrary discard of dates, 2) absent sensitivity analyses, and 3) imprecision/uncertainty of the dates and the calibration curve. Below I evaluate each one of these.
Arbitrary discard of dates
The author cites that, after modeling, several dates (n=11) were “arbitrarily” discarded. Modeling prior data to create summed probabilities based on the posterior models should not be considered unusual or inadequate, however. In fact, the goal of Bayesian analyses in radiocarbon chronology building is to constrain probability distributions for directly dated events, including discarding statistically identified outliers. Traditional statistical analysis of radiocarbon dates, on the other hand, relies simply on comparing the probability distributions of individual dates to determine the likelihood that two events (i.e. two dates) are sequential or contemporaneous. Without modeling, SPDs reproduce this traditional approach. I think that modeling the dates into phases prior to summing likely produces a more realistic picture of the rises and falls of archaeological phenomena. While outliers may be associated with some features of the archaeological cultures discussed in the text, the SPDs from de Souza et al. do capture general socio-political trends (start, apex, and end of cultural phase). Additionally, while the author of the present study suggests that SPDs are a demographic proxy, it seems to me after reading the original study that the SPDs more likely correspond with these general trends and are not associated with population levels, per se.
The author of the present study evaluates the exclusion of modeled outlier dates identified by de Souza et al. by re-analyzing the full and truncated datasets, the visual results of which are presented in Figure 1. When examining Figure 1, it appears, however, that while the inclusion of the outlier dates does impact the shape of the SPD in some instances, the general patterns of rise and decline still hold. In other words, the “peaks” of the SPDs, and their downturns, are still the same. When considering the overall shape of the SPD (“full dataset”), it is virtually indistinguishable. This could possibly be attributed to the size of the figure. The small size of the figure may have been automatically determined, but a larger image would be easier to visually evaluate.
Absent sensitivity analysis
On this point, I think the author of the present study is spot on. This type of evaluation is necessary to provide confidence intervals attached to the SPDs. I think this is the strongest part of the paper. The author convincingly shows that some SPDs deviate from the null expectation. However, I do not necessarily think that this has anything to do with responses to climate. It would be interesting to plot the simulated dataset against the paleoclimate dataset to look at correspondences. Are there similar trends?
Imprecision/uncertainty of dates and curve
Several biases in all SPD analyses should be addressed in all analyses of this type, and the author points these out in the final sections. These include issues of sample size, measurement precision, and the effects of the radiocarbon curve on the shape of SPDs. Some authors (e.g. Williams 2012) have suggested that small numbers of 14C dates (n<200) with large associated uncertainties (>±100 yr) can potentially produce variability within summed probability distributions when they are not systematically sampled. However, several others have shown that when smaller samples of modeled 14C dates are subjected to chronometric hygiene criteria and sampled from primary contexts, they can document construction events that closely correspond with general socio-political trends (e.g. Ebert et al. 2017, https://doi.org/10.1016/j.quascirev.2017.08.020).
While I am not an expert on the archaeology of Amazonia, it appears that “relatively” small datasets (compared to regions like western Europe, SW United States) are common in all the regions discussed in both papers. Unfortunately, this is common in many parts of the world where little dating work has taken place. As long as chronometric hygiene standards eliminate problematic dates (e.g. 14C dates with error ranges greater than ±100 yr), however, then sometimes 20-30 dates is really the best that can be hoped for. The dataset considered for the present analyses contains 19 dates with error ranges greater than 100 yr. In fact, most dates (n=249) have error ranges less than 50 yr. Compared to places like parts of Mesoamerica, this is excellent!
The author also argues for issues with the calibration curve. While SPDs intersecting steeper parts of the curve tend to be over-represented, resulting in peaks, plateaus cause calibrated date ranges to be underrepresented within larger datasets, thus leveling out the SPD. The author specifically references centennial-scale plateaus in the IntCal13 and SHCal13 radiocarbon curves (line 171). Is the combined IntCal and SHCal curve somehow different in terms of these biases? If so, a graphic illustration would be interesting to evaluate. A plot of the SPDs (full dataset) as well as the KDE against the curves would serve to illustrate the point that the curve is influencing the patterns. If there’s not enough room in the text, this could be in supplemental documentation.
The author uses the KDE to illustrate that the interpretations of beginning and ends of cultural phases are incorrect. However, I’m still seeing similar patterns with respect to the identified climate episode. The argument of de Souza et al. isn’t that a culture began and ended abruptly with some climatic anomaly, but rather that there was cultural change with the SPD as a proxy. I still see changes (rise and fall of the KDE) that could correspond with the indicated climate change. Figure 3 is also unclear for the reader. While the coloration is the same as previous, it would be helpful to indicate which culture is which on the plot for ease of reference (especially for people who are not as familiar with the culture history of the pre-Columbian Amazon).
Finally, this paper evaluates only one dataset – 14C dates – and does not describe whether the archaeological data are contradictory to the interpretations of de Souza et al. I realize that this is a special issue, and likely space is limited and the paper is not stand-alone, but as a reviewer I am lacking some context. Additional explanations of what the author thinks is happening archaeologically would be helpful to illustrate their points. How would the author proceed in future analyses? The abstract suggests that the present analyses help overcome the sampling biases in the de Souza et al. study, but I am not sure what suggestions are actually made as to what the next steps should be (more analyses, more dating work, etc.).
Other minor comments
Line 68 – the author states they used a custom calibration curve. State if this was the same curve as used in the original study. If not, why was this one used?
Line 153 – use of “Summarising:” is not appropriate. Please rephrase.
A map would be helpful for readers.
Author Response
Reviewer 3
This paper provides a reanalysis of a recently published set of 14C dates from across the Amazonian region. The original paper (de Souza et al. 2019) suggests that two different land-use strategies – intensive and extensive – are associated with two different phases in the adaptive cycle. While “intensive” societies declined during time of climatic change, as they were approaching the end of the adaptive cycle, those associated with “extensive” land-use practices were more resilient. The original analyses were based on a set of 337 14C dates that were modeled, summed, and compared to localized paleoclimate proxy records.
The reanalysis presented in the present study suggests that the de Souza et al. analyses contain three primary biases that impact the results: 1) arbitrary discard of dates, 2) absent sensitivity analyses, and 3) imprecision/uncertainty of the dates and the calibration curve. Below I evaluate each one of these.
Arbitrary discard of dates
The author cites that, after modeling, several dates (n=11) were “arbitrarily” discarded. Modeling prior data to create summed probabilities based on the posterior models should not be considered unusual or inadequate, however. In fact, the goal of Bayesian analyses in radiocarbon chronology building is to constrain probability distributions for directly dated events, including discarding statistically identified outliers. Traditional statistical analysis of radiocarbon dates, on the other hand, relies simply on comparing the probability distributions of individual dates to determine the likelihood that two events (i.e. two dates) are sequential or contemporaneous. Without modeling, SPDs reproduce this traditional approach. I think that modeling the dates into phases prior to summing likely produces a more realistic picture of the rises and falls of archaeological phenomena. While outliers may be associated with some features of the archaeological cultures discussed in the text, the SPDs from de Souza et al. do capture general socio-political trends (start, apex, and end of cultural phase). Additionally, while the author of the present study suggests that SPDs are a demographic proxy, it seems to me after reading the original study that the SPDs more likely correspond with these general trends and are not associated with population levels, per se.
The author of the present study evaluates the exclusion of modeled outlier dates identified by de Souza et al. by re-analyzing the full and truncated datasets, the visual results of which are presented in Figure 1. When examining Figure 1, it appears, however, that while the inclusion of the outlier dates does impact the shape of the SPD in some instances, the general patterns of rise and decline still hold. In other words, the “peaks” of the SPDs, and their downturns, are still the same. When considering the overall shape of the SPD (“full dataset”), it is virtually indistinguishable. This could possibly be attributed to the size of the figure. The small size of the figure may have been automatically determined, but a larger image would be easier to visually evaluate.
In principle, I agree with all of the points made by Reviewer 3 regarding the modelling of Bayesian priors in radiocarbon chronology. The issue in the case of De Souza et al. is that the procedure leads to dates on stratigraphically- and archaeologically-associated charcoal being discarded. The examples cited in the text are not direct dates on Paredão/Guarita ceramics from the Hatahara site, but charcoal samples recovered in situ with archaeological material, and therefore likely do not represent discrete depositional events. The original publications make this clear, but space does not permit me to individually evaluate each of the 11 discarded dates. De Souza et al. do not, as far as can be told from their publication, employ stratigraphic information to constrain the probability distributions of each date – they simply combine all phase-affiliated dates into single sequences that cross-cut sites within their regions and impose a uniform prior.
The critique of this procedure in De Souza et al. stems from the fact that the uniform prior is clearly inappropriate for the distribution of dates (as Reviewer 3 correctly points out, a rise, peak, and fall – typically more reminiscent of a bell curve, but also a number of other different qualitative shapes, none of which resemble a uniform distribution). The uniform prior ultimately leads to the discard of dates in an already very small dataset. Given that the main archaeological basis for adaptive cycling is to be found in the separation in time of the paired phases, the overall effect is to exaggerate this separation on essentially spurious grounds. This point is independent of whether modelling Bayesian priors (or not) is a worthwhile practice in itself. It should be considered pure luck that De Souza et al.’s method produced SPDs that are not too different from those obtained with the full dataset; a larger divergence would have been far more serious.
Prior to submitting the paper, I contacted Jonas De Souza directly regarding what exactly the SPDs represent. The original paper implies it is levels of human activity associated with given cultural practices (employing terms like ‘apex’ and ‘decline’, etc.). He himself replied to me that it is the duration of different archaeological cultures. Their paper also extensively cites works that do employ archaeological radiocarbon as a demographic proxy. I, personally, think it’s fair to say that this is the case for De Souza et al. in Nature Eco&Evo too; however, I recognise that not all may agree. I have amended instances where I have referred to the SPDs and KDEs as demographic proxies per se, and used alternative terms more commensurate with trends in, or levels of, archaeological activity.
Regarding the small size of the figures, I have added high-resolution pdfs of the figures to the supplementary information.
Absent sensitivity analysis
On this point, I think the author of the present study is spot on. This type of evaluation is necessary to provide confidence intervals attached to the SPDs. I think this is the strongest part of the paper. The author convincingly shows that some SPDs deviate from the null expectation. However, I do not necessarily think that this has anything to do with responses to climate. It would be interesting to plot the simulated dataset against the paleoclimate dataset to look at correspondences. Are there similar trends?
I appreciate the interest of the reviewer in this regard. I suspect that no, there aren’t any trends that correspond to climatic events in the permutation testing, but it is my judgement that this point is made most strongly in the comparisons between KDEs, where I have indicated regional climatic events named by De Souza et al. and referred to them in-text.
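For readers unfamiliar with the kind of null model at issue in this exchange, the following is a minimal Python sketch of a label-permutation test. It is not the rcarbon routine used in the paper: dates are treated as point estimates and binned rather than calibrated and summed, and the function names, the toy dataset and the 100-year bin width are all assumptions made for illustration. The idea it shows is that phase labels are shuffled among the dates, a 95% envelope is built from the shuffled curves, and only bins where the observed curve leaves that envelope count as locally significant.

```python
import random

def binned_share(dates, start, end, width):
    """Share of dates per time bin: a crude stand-in for a normalised SPD."""
    counts = [0] * ((end - start) // width)
    for d in dates:
        if start <= d < end:
            counts[int((d - start) // width)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def permutation_envelope(dates, labels, target, start, end, width, n_sim=500, seed=1):
    """Observed curve for one phase plus a 95% envelope from shuffled phase labels."""
    rng = random.Random(seed)

    def curve(lbls):
        subset = [d for d, l in zip(dates, lbls) if l == target]
        return binned_share(subset, start, end, width)

    observed = curve(labels)
    shuffled = list(labels)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)          # break any link between date and phase
        sims.append(curve(shuffled))
    lower, upper = [], []
    for b in range(len(observed)):
        column = sorted(s[b] for s in sims)
        lower.append(column[int(0.025 * (n_sim - 1))])
        upper.append(column[int(0.975 * (n_sim - 1))])
    return observed, lower, upper

# Hypothetical toy data: two phases of 20 dates each, fully separated in time.
dates = [1000 + 5 * i for i in range(20)] + [1500 + 5 * i for i in range(20)]
labels = ["A"] * 20 + ["B"] * 20
observed, lower, upper = permutation_envelope(dates, labels, "A", 900, 1700, 100)
# Bins where phase "A" departs from the shuffled-label envelope.
flagged = [b for b in range(len(observed))
           if observed[b] > upper[b] or observed[b] < lower[b]]
print(flagged)
```

In this deliberately extreme toy case the two phases separate cleanly, so the test flags the bins they occupy. With only a few dozen overlapping dates per phase, the envelope widens and few or no bins would be expected to stand out.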
Imprecision/uncertainty of dates and curve
Several biases in all SPD analyses should be addressed in all analyses of this type, and the author points these out in the final sections. These include issues of sample size, measurement precision, and the effects of the radiocarbon curve on the shape of SPDs. Some authors (e.g. Williams 2012) have suggested that small numbers of 14C dates (n<200) with large associated uncertainties (>±100 yr) can potentially produce variability within summed probability distributions when they are not systematically sampled. However, several others have shown that when smaller samples of modeled 14C dates are subjected to chronometric hygiene criteria and sampled from primary contexts, they can document construction events that closely correspond with general socio-political trends (e.g. Ebert et al. 2017, https://doi.org/10.1016/j.quascirev.2017.08.020).
While I am not an expert on the archaeology of Amazonia, it appears that “relatively” small datasets (compared to regions like western Europe, SW United States) are common in all the regions discussed in both papers. Unfortunately, this is common in many parts of the world where little dating work has taken place. As long as chronometric hygiene standards eliminate problematic dates (e.g. 14C dates with error ranges greater than ±100 yr), however, then sometimes 20-30 dates is really the best that can be hoped for. The dataset considered for the present analyses contains 19 dates with error ranges greater than 100 yr. In fact, most dates (n=249) have error ranges less than 50 yr. Compared to places like parts of Mesoamerica, this is excellent!
I cannot comment on the state of play in Mesoamerica, as it is well outside my realm of expertise. It appears that the paper by Ebert et al. (2017) employs a dataset three times larger than De Souza et al. as a basis for comparison. As I note in lines 170-173, the aggregate analysis of radiocarbon usually rests on hundreds if not thousands of dates per sample. The methods for doing so have been developed in a context (the global north) where this is feasible. Unfortunately, very little sensitivity testing has gone into estimating the lower bounds of where it is possible to maintain confidence in ever-shrinking datasets. I think I have adequately demonstrated that, for climatic events that occurred on centennial timescales or below, a few dozen radiocarbon dates from a handful of sites are not sufficiently resolved in time to discern a climatic effect on culture. Moreover, the dates employed by De Souza et al. do not represent all of the dates available for a region – the original paper cherry-picked dates associated with certain archaeological cultures within certain regions and tacked on estimates of “extensive versus intensive land use” (itself a very problematic prospect that could form its own critique, but is not the focus of the present paper). The full scope of the archaeological 14C in Amazonia is far better than the original paper lets on, and is the subject of ongoing efforts to collate and analyse fully.
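The dependence of confidence on sample size can be made concrete with a small bootstrap experiment. The Python sketch below is an illustration only and assumes a great deal: the dates are hypothetical point estimates spread evenly over a millennium, calibration is ignored, and the function names and the century-wide window are invented for the example. It estimates how uncertain the share of dates falling within a single century is when the dataset holds 30 dates compared with 300.

```python
import random

def share_in_window(dates, lo, hi):
    """Fraction of dates falling inside the calendar window [lo, hi)."""
    return sum(1 for d in dates if lo <= d < hi) / len(dates)

def bootstrap_interval_width(dates, lo, hi, n_boot=1000, seed=1):
    """Width of the 95% bootstrap interval for the share of dates in the window."""
    rng = random.Random(seed)
    shares = sorted(
        share_in_window([rng.choice(dates) for _ in dates], lo, hi)
        for _ in range(n_boot)
    )
    return shares[int(0.975 * (n_boot - 1))] - shares[int(0.025 * (n_boot - 1))]

def evenly_spaced(n, start=500, span=1000):
    """Hypothetical dataset: n dates spread evenly over a 1000-year span."""
    return [start + (span * i) // n for i in range(n)]

# Same underlying pattern (10% of dates in the window), two sample sizes.
width_small = bootstrap_interval_width(evenly_spaced(30), 1000, 1100)
width_large = bootstrap_interval_width(evenly_spaced(300), 1000, 1100)
print(width_small, width_large)
```

Both toy datasets place exactly one tenth of their dates in the target century, yet the interval around that share is several times wider for the smaller set. Calibration uncertainty, which this sketch leaves out, would be expected to widen both intervals further.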
The author also argues for issues with the calibration curve. While SPDs intersecting steeper parts of the curve tend to be over-represented, resulting in peaks, plateaus cause calibrated date ranges to be underrepresented within larger datasets, thus leveling out the SPD. The author specifically references centennial-scale plateaus in the IntCal13 and SHCal13 radiocarbon curves (line 171). Is the combined IntCal and SHCal curve somehow different in terms of these biases? If so, a graphic illustration would be interesting to evaluate. A plot of the SPDs (full dataset) as well as the KDE against the curves would serve to illustrate the point that the curve is influencing the patterns. If there’s not enough room in the text, this could be in supplemental documentation.
Composite KDEs are not, as far as leading practitioners can tell, affected by calibration curve effects (R. McLaughlin pers. comm.) due to the bootstrapping procedure. This, coupled with the fact that they can be directly interpreted, is a strong argument in favour of switching SPDs for KDEs, or supplementing one with the other, as I have done here. The combined IntCal/SHCal curve (a method for producing this curve has now been pushed to the developer release of rcarbon, incidentally) is not different in terms of these biases, as the curves are very similar in the Late Holocene. I have modified the language in the lines in question to note that calibration curve effects may be affecting the SPDs, rather than state with absolute certainty that they are affecting the SPDs. Radiocarbon plateaus will "smear" out the calibrated date range of even a precisely dated sample.
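The "smearing" effect of a plateau can be shown with a toy calibration. The Python sketch below does not use IntCal13 or SHCal13: the curve is a hypothetical piecewise-linear function with an artificial 200-year flat section, curve uncertainty is ignored, and all names are assumptions for illustration. It calibrates two measurements with the same ±20 yr error, one on a steep stretch and one on the plateau, and counts how many calendar years are needed to cover 95% of the probability.

```python
import math

def toy_curve(cal_year):
    """Hypothetical calibration curve: steep, a 200-year plateau, then steep again."""
    if cal_year < 400:
        return cal_year
    if cal_year <= 600:
        return 400
    return cal_year - 200

def calibrate(c14_age, error, years):
    """Normalised probability of each calendar year given one measurement."""
    weights = [
        math.exp(-((c14_age - toy_curve(t)) ** 2) / (2 * error ** 2))
        for t in years
    ]
    total = sum(weights)
    return [w / total for w in weights]

def years_in_hpd(probs, mass=0.95):
    """Number of calendar years needed to cover the given probability mass."""
    covered, count = 0.0, 0
    for p in sorted(probs, reverse=True):   # most probable years first
        covered += p
        count += 1
        if covered >= mass:
            break
    return count

years = range(0, 1000)
steep_span = years_in_hpd(calibrate(200, 20, years))     # measurement on a steep stretch
plateau_span = years_in_hpd(calibrate(400, 20, years))   # measurement on the plateau
print(steep_span, plateau_span)
```

The measurement precision is identical in both cases, but the date that lands on the flat section spreads across the whole plateau and beyond, which is why even precisely dated samples can lose temporal resolution.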
The author uses the KDE to illustrate that the interpretations of beginning and ends of cultural phases are incorrect. However, I’m still seeing similar patterns with respect to the identified climate episode. The argument of de Souza et al. isn’t that a culture began and ended abruptly with some climatic anomaly, but rather that there was cultural change with the SPD as a proxy. I still see changes (rise and fall of the KDE) that could correspond with the indicated climate change. Figure 3 is also unclear for the reader. While the coloration is the same as previous, it would be helpful to indicate which culture is which on the plot for ease of reference (especially for people who are not as familiar with the culture history of the pre-Columbian Amazon).
I have modified Figure 3 to better reflect the distribution of the data (and match the new Table 1 requested by Reviewer 2). The in-text discussion of the KDEs highlights patterns that counterindicate climatic causation in the ends of the "intensive" phases. The abrupt ends of certain phases are discussed to note that there are points in time past which we cannot say anything whatsoever from a statistical point of view about the interface between climate and culture. Whether an archaeological phase actually ends at a given point is less germane.
Finally, this paper evaluates only one dataset – 14C dates – and does not describe whether the archaeological data are contradictory to the interpretations of de Souza et al. I realize that this is a special issue, and likely space is limited and the paper is not stand-alone, but as a reviewer I am lacking some context. Additional explanations of what the author thinks is happening archaeologically would be helpful to illustrate their points. How would the author proceed in future analyses? The abstract suggests that the present analyses help overcome the sampling biases in the de Souza et al. study, but I am not sure what suggestions are actually made as to what the next steps should be (more analyses, more dating work, etc.).
Per the comments of Reviewers 2 and 4, the text has been expanded to address the question of what future analyses can do to usefully tackle the 14C record of Amazonia.
I have attempted to make it as clear as possible that the radiocarbon data – the only archaeological data used by De Souza et al. in support of adaptive cycling – is equivocal on the questions they set for themselves, and that actual analysis (as opposed to just visualisation) reveals this quite readily. I do not believe the 14C data presented here is good for much more than dating the depositional events in the sites from which they originate. In aggregate and on the scale set by De Souza et al., the data are near-meaningless, and I state this rather unequivocally in the conclusions and discussion. My opinion on what is happening archaeologically would be just as unfounded as the original paper, using only this data.
Reviewer 4 Report
This paper both provides an important critical perspective on the recent De Souza et al. claims about adaptive cycling in the Precolumbian Amazon and raises more broadly relevant methodological issues with respect to the use of radiocarbon dates as demographic proxies. As such, it achieves broad relevance in spite of being "merely" a commentary on a recent paper. While I think there are a few points that require clarification, and would suggest some minor revisions, I think this is a valuable paper and would be pleased to see it published.
Broadly speaking, I see two areas that require revision: 1) in order to make a commentary piece like this readable as a stand-alone paper (which I think it nearly is), a brief summary of the claims and cultural schema used in De Souza et al. is needed, and 2) to maximize the impact on methodological discussions, a bit of clarification about a few methodological issues is needed.
Specifically, with respect to (1): Adopting the position that it's impractical to entirely rehash De Souza et al. here is perfectly reasonable. However, in order to fully appreciate and assess the critique laid out here, the reader needs to be able to understand the relationships between phases, time, space, and assignment to extensive or intensive categories of land use. I would suggest a table that lays these out - i.e., the sequence of phases in each region (and their temporal relationships to one another), with their classifications as intensive/extensive.
With respect to (2), I think there's a need for a bit more clarity about what constitute the samples (and statistical populations) in question, how sample size matters and how it relates to uncertainty, and the resolution of target events. The author is very much to be commended, to my mind, for bringing these issues up at all, as they are generally under-discussed. My suggestions follow below in conjunction w/ specific comments about the text, but in essence have to do with the issue of determining when few dates can be taken to represent low populations, versus when they represent inadequate sampling, taphonomic issues, etc.
Specific comments on the text:
Ln 47
"are largely responsible for the patterns of archaeological activity reported in the paper"
A vital point here is that these patterns are not the result of archaeological activity! i.e., de Souza et al interpret and report them as such, but in fact - the author wishes to argue - they are spurious patterns, resulting not from archaeological activity but from the various data/analysis problems discussed. I find the critique compelling, and see a bit more precision in language in places like this as important to adequately conveying the import of that critique.
Ln 83-85
"Effectively, the phase modelling introduces a form of unaccounted-for “forward bias” into the data [9], which is especially visible in the Marajoara-Santarém pairings (Eastern Amazon region)."
Perhaps it's not entirely clear whether this is introducing bias, or correcting for it (cf. Bayliss et al. 2007 on the tendency to overestimate phase lengths based on assemblages of 14C dates). For that reason I'd suggest removing the "exceptional" on Ln72. An equally important point that might be emphasized more here is that a small sample of 14C dates is disproportionately vulnerable/sensitive to any removals. As a result, those should be clearly justified - and in this case are not, seems to be the gist of this section.
Ln 106
The "subsequent analyses" referred to are the ones in this paper, or analyses that occur later on in de Souza et al? I think the former, but a small clarification would avoid any misunderstandings here.
Ln 114
"observed SPDs are locally significantly different from the null expectation of no temporal structure between subsets"
I'd suggest instead "divergent temporal structure"; I don't think there's any problem as it is, but there is potential for misreading. Better yet - if I understand Crema et al. correctly - might be something like, "the null expectation that all subsets were sampled from similar populations".
Ln116-118
"It can be suggested that there is utility in using permutation testing on the numerous and variable subsets of the Amazonian data to investigate whether statistically significant regional divergences do occur."
I would suggest something more like, "Permutation testing on the numerous and variable subsets of the Amazonian data can test whether statistically significant regional divergences do occur, complementing if not replacing a visual assessment of SPDs."
Ln120
"The test is also applied to the test to the limited dataset (Figure S1). Differences between sets are minor..."
?
Something obviously is confused about the first sentence, and I had a bit of trouble following which sets were referenced in the second - maybe specify "full and limited".
Ln121
"only four out of eleven phases display significant divergences from the null model"
I would specify here what this means - just what the null model is remains important to the argument.
Ln 123
"only the latter three are explicitly highlighted as exemplary of the adaptive cycles model"
The passive voice here makes it a bit difficult to follow whose highlighting is referenced. I'd suggest "are explicitly highlighted by De Souza et al." instead.
Ln 124
What is meant by "specifically named phases"?
Ln 125
"statistically indistinguishable from the null model"
As Ln 121, I think it's important here to specify what this means.
Ln 135-137
"likely due to a lack of data in this interval relative to other phases, as opposed to an actual demographic phenomenon within their respective regions"
I think I agree here, but how do we know when few dates is due to lack of data, and when it reflects low population? Some measure of the relative intensities of sampling and effects of taphonomy is needed here - or, barring that (since no such measures really exist), at least a recognition of this problem. The critique still stands - this is a problem for the original analysis.
Ln 167
"visualising calibrated radiocarbon dates"
Should read, "visualizing assemblages of radiocarbon dates"
Ln 169
There are a number of other critiques as well, some of which (e.g. Contreras and Meadows 2014, Torfing 2015) discuss sampling and sampling density specifically.
Ln 183-191
I think the point made in this paragraph is valuable, but I find the terminology confusing. Narrower and wider confidence intervals make sense to me, but it's not clear to me what is meant by narrower and broader date ranges.
Ln 203-204
"The KDEs do not “separate out” as expected if there were genuine adaptive cycle occurring here."
...and if the data were of sufficient density to detect that cycling if it existed.
Ln 212
"multi-centennial decline in earthwork construction"
But the proxy (supposedly) is measuring population, not earthwork construction.
Ln 214-215
"The extent to which construction of non-settlement (‘ceremonial’) sites is indicative of demographic trends or adaptive capacity is not clear."
True, but this is not a problem unique to earthworks, or even to ceremonial sites - rather a problem fundamental to 'dates as data' approaches, i.e., just what is being dated.
Ln 233
"the results indicate they are fundamentally unsuited to testing this hypothesis"
Unsuited? Or is the density of data simply insufficient? It may be that we can't tell, but there are two possibilities here, one of which would suggest that as more dates become available we should revisit the issue, and the other of which suggests that there's no point in doing this type of analysis. The critique here has I think suggested the former, but suddenly shifts to the latter in this sentence.
Ln 254-258
"...the data are fundamentally unsuited to analyses at this scale, in large part due to the limited number of radiocarbon dates at the critical points where cultural transitions are said to occur. SPDs derived from handfuls of dates, for the reasons outlined in the preceding sections, have very serious limitations that the adaptive cycles model does not acknowledge."
How many dates are needed? See remarks above about sampling density, and discussion in Contreras and Meadows 2014.
References Mentioned:
Bayliss, Alex, Christopher Bronk Ramsey, Johannes van der Plicht, and Alasdair Whittle 2007 Bradshaw and Bayes: Towards a Timetable for the Neolithic. Cambridge Archaeological Journal 17(Supplement S1):1–28. DOI:10.1017/S0959774307000145.
Contreras, Daniel A., and John Meadows 2014 Summed radiocarbon calibrations as a population proxy: a critical evaluation using a realistic simulation approach. Journal of Archaeological Science 52:591–608.
Torfing, Tobias 2015 Layers of assumptions: A reply to Timpson, Manning, and Shennan. Journal of Archaeological Science 63:203–205.
Author Response
Reviewer 4
This paper both provides an important critical perspective on the recent De Souza et al. claims about adaptive cycling in the Precolumbian Amazon and raises more broadly relevant methodological issues with respect to the use of radiocarbon dates as demographic proxies. As such, it achieves broad relevance in spite of being "merely" a commentary on a recent paper. While I think there are a few points that require clarification, and would suggest some minor revisions, I think this is a valuable paper and would be pleased to see it published.
Broadly speaking, I see two areas that require revision: 1) in order to make a commentary piece like this readable as a stand-alone paper (which I think it nearly is), a brief summary of the claims and cultural schema used in De Souza et al. is needed, and 2) to maximize the impact on methodological discussions, a bit of clarification about a few methodological issues is needed.
Specifically, with respect to (1): Adopting the position that it's impractical to entirely rehash De Souza et al. here is perfectly reasonable. However, in order to fully appreciate and assess the critique laid out here, the reader needs to be able to understand the relationships between phases, time, space, and assignment to extensive or intensive categories of land use. I would suggest a table that lays these out - i.e., the sequence of phases in each region (and their temporal relationships to one another), with their classifications as intensive/extensive.
Per the comments of Reviewers 2 & 3, this has been done. Figure 3 has also been amended.
With respect to (2), I think there's a need for a bit more clarity about what constitute the samples (and statistical populations) in question, how sample size matters and how it relates to uncertainty, and the resolution of target events. The author is very much to be commended, to my mind, for bringing these issues up at all, as they are generally under-discussed. My suggestions follow below in conjunction w/ specific comments about the text, but in essence have to do with the issue of determining when few dates can be taken to represent low populations, versus when they represent inadequate sampling, taphonomic issues, etc.
I have responded to each specific comment from Reviewer 4 below, and appreciate the thoroughness of their comments greatly.
Specific comments on the text:
Ln 47
"are largely responsible for the patterns of archaeological activity reported in the paper"
A vital point here is that these patterns are not the result of archaeological activity! i.e., de Souza et al interpret and report them as such, but in fact - the author wishes to argue - they are spurious patterns, resulting not from archaeological activity but from the various data/analysis problems discussed. I find the critique compelling, and see a bit more precision in language in places like this as important to adequately conveying the import of that critique.
The point about precision in language is well received, but as these lines refer specifically to the De Souza et al. paper and to the level of archaeological activity its authors purport to identify in their analysis of the radiocarbon data, I have left the wording as is in this instance to avoid misrepresenting their position. See also my comments to the other reviewers about what, exactly, the original paper takes the relative rise and fall of SPDs to represent.
Ln 83-85
"Effectively, the phase modelling introduces a form of unaccounted-for “forward bias” into the data [9], which is especially visible in the Marajoara-Santarém pairings (Eastern Amazon region)."
Perhaps it's not entirely clear whether this is introducing bias, or correcting for it (cf. Bayliss et al. 2007 on the tendency to overestimate phase lengths based on assemblages of 14C dates). For that reason I'd suggest removing the "exceptional" on Ln72. An equally important point that might be emphasized more here is that a small sample of 14C dates is disproportionately vulnerable/sensitive to any removals. As a result, those should be clearly justified - and in this case are not, seems to be the gist of this section.
Done.
Ln 106
The "subsequent analyses" referred to are the ones in this paper, or analyses that occur later on in de Souza et al? I think the former, but a small clarification would avoid any misunderstandings here.
Clarified, it is the former.
Ln 114
"observed SPDs are locally significantly different from the null expectation of no temporal structure between subsets"
I'd suggest instead "divergent temporal structure"; I don't think there's any problem as it is, but there is potential for misreading. Better yet - if I understand Crema et al. correctly - might be something like, "the null expectation that all subsets were sampled from similar populations".
Done.
Ln116-118
"It can be suggested that there is utility in using permutation testing on the numerous and variable subsets of the Amazonian data to investigate whether statistically significant regional divergences do occur."
I would suggest something more like, "Permutation testing on the numerous and variable subsets of the Amazonian data can test whether statistically significant regional divergences do occur, complementing if not replacing a visual assessment of SPDs."
Done, with slight modification.
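For readers unfamiliar with the approach, the logic of a permutation test under the null expectation that all subsets were sampled from similar populations can be sketched as follows. This is a simplified illustration, not the procedure used in the paper or in Crema et al.: it assumes point-estimate dates rather than calibrated probability distributions, coarse time bins instead of SPDs, and entirely hypothetical toy data and function names.

```python
import numpy as np

def binned_proportions(dates, edges):
    """Share of a subset's dates falling in each time bin."""
    counts, _ = np.histogram(dates, bins=edges)
    return counts / max(len(dates), 1)

def permutation_test(dates, labels, target, edges, n_perm=999, seed=0):
    """Compare one subset's binned profile against a label-shuffling null.

    Null expectation: all subsets were sampled from similar populations,
    so the subset labels are exchangeable.
    Returns the observed profile and the 2.5%/97.5% null envelope per bin.
    """
    rng = np.random.default_rng(seed)
    dates = np.asarray(dates, dtype=float)
    labels = np.asarray(labels)
    observed = binned_proportions(dates[labels == target], edges)
    null = np.empty((n_perm, len(edges) - 1))
    for i in range(n_perm):
        # Shuffle the subset labels while leaving the dates in place
        shuffled = rng.permutation(labels)
        null[i] = binned_proportions(dates[shuffled == target], edges)
    lower, upper = np.percentile(null, [2.5, 97.5], axis=0)
    return observed, lower, upper

# Hypothetical toy data: subset "B" has no dates before 1000
rng = np.random.default_rng(1)
dates = np.concatenate([rng.uniform(500, 1500, 200),
                        rng.uniform(1000, 1500, 40)])
labels = np.array(["A"] * 200 + ["B"] * 40)
edges = np.arange(500, 1501, 250)

observed, lower, upper = permutation_test(dates, labels, "B", edges)
below = observed < lower   # bins where "B" falls below the null envelope
above = observed > upper   # bins where "B" rises above the null envelope
```

In this toy case the bins in which subset "B" has no dates at all fall below the null envelope, which shows how an interval with no associated dates can register as a significant negative divergence.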
Ln120
"The test is also applied to the test to the limited dataset (Figure S1). Differences between sets are minor..."
?
Something obviously is confused about the first sentence, and I had a bit of trouble following which sets were referenced in the second - maybe specify "full and limited".
Typo corrected and language clarified.
Ln121
"only four out of eleven phases display significant divergences from the null model"
I would specify here what this means - just what the null model is remains important to the argument.
Done.
Ln 123
"only the latter three are explicitly highlighted as exemplary of the adaptive cycles model"
The passive voice here makes it a bit difficult to follow whose highlighting is referenced. I'd suggest "are explicitly highlighted by De Souza et al." instead.
Done.
Ln 124
What is meant by "specifically named phases"?
Changed to “explicitly highlighted” to match the prior sentence.
Ln 125
"statistically indistinguishable from the null model"
As Ln 121, I think it's important here to specify what this means.
This would be the third time in two paragraphs that the results are noted as “displaying no divergent temporal structure”. It’s one thing to explain results clearly, but I think in this case the reader can infer from context what is meant by “the” null model, as it has been laid out twice previously and follows directly from preceding discussion of “the” null model.
Ln 135-137
"likely due to a lack of data in this interval relative to other phases, as opposed to an actual demographic phenomenon within their respective regions"
I think I agree here, but how do we know when few dates is due to lack of data, and when it reflects low population? Some measure of the relative intensities of sampling and effects of taphonomy is needed here - or, barring that (since no such measures really exist), at least a recognition of this problem. The critique still stands - this is a problem for the original analysis.
With apologies, the intent here was to communicate that the significantly divergent negative temporal structure is an artefact of the data in these two cases. The significantly negative phases in the Koriabo/Santarém subsets occur before these cultures are attested in the material record, so no radiocarbon dates can be associated with either culture in their respective “negative” intervals. This causes a false negative: the statistics treat absence of evidence as evidence of absence. I recognise that this is of course tautological in this case (there are no dates in this interval because there is no material culture of that age, because there are no dates, and so on).
While acknowledging that the age range of a given archaeological culture may of course be expanded through the discovery of older/younger samples, part of the point of the exercise is to take the data in De Souza et al. at face value and to assume for present purposes that it is watertight in and of itself. Clearly this leads to spurious patterns occurring, as discussed in the text.
I have nonetheless attempted to disambiguate the text in this regard.
Ln 167
"visualising calibrated radiocarbon dates"
Should read, "visualizing assemblages of radiocarbon dates"
Done.
Ln 169
There are a number of other critiques as well, some of which (e.g. Contreras and Meadows 2014, Torfing 2015) discuss sampling and sampling density specifically.
Very grateful for these references. However, cf. McLaughlin (2018, J Arch Meth Theo) regarding the Black Death discussed by Contreras and Meadows (2014) and sensitivity to sample sizes.
Ln 183-191
I think the point made in this paragraph is valuable, but I find the terminology confusing. Narrower and wider confidence intervals make sense to me, but it's not clear to me what is meant by narrower and broader date ranges.
Clarified to short/long, narrow/broad (for temporal duration and confidence intervals, respectively).
Ln 203-204
"The KDEs do not “separate out” as expected if there were genuine adaptive cycle occurring here."
...and if the data were of sufficient density to detect that cycling if it existed.
Yes, this is highlighted later in the discussion.
Ln 212
"multi-centennial decline in earthwork construction"
But the proxy (supposedly) is measuring population, not earthwork construction.
With reference to my response to Reviewer 3’s comments, prior to submitting the paper, I contacted Jonas De Souza directly regarding exactly what the SPDs are meant to represent, as it is left somewhat ambiguous in the original paper. The original paper implies it is levels of human activity associated with given cultural practices (employing terms like ‘apex’ and ‘decline’, etc.). Jonas replied to me that the SPDs illustrate the relative levels of activity (and land use) associated with different archaeological cultures, and their duration. Adding to the confusion, their paper also extensively cites works that do employ archaeological radiocarbon as a demographic proxy.
Although I personally agree that archaeological radiocarbon in aggregate should unambiguously be considered a demographic proxy, the original paper and my correspondence with the authors definitely leave a large degree of ambiguity about what the Amazonian SPDs realistically represent. This has a knock-on effect, as Reviewer 3 has helpfully pointed out too.
I have amended instances where I have referred to the SPDs and KDEs as demographic proxies per se, and used alternative terms more commensurate with “trends in”, or “levels of”, a given cultural activity, in this specific case earthwork construction. Inferring the relative resilience/sustainability of different patterns of land use from SPDs alone is a severe issue with the original paper, which is unfortunately outside the focus of the present paper. It will be tackled in forthcoming work on the demography of pre-Columbian Amazonia.
Ln 214-215
"The extent to which construction of non-settlement (‘ceremonial’) sites is indicative of demographic trends or adaptive capacity is not clear."
True, but this is not a problem unique to earthworks, or even to ceremonial sites - rather a problem fundamental to 'dates as data' approaches, i.e., just what is being dated.
Yes, I agree – see above and comments to Reviewer 3. It is already highly dubious to associate archaeological cultures (as more or less discrete spatiotemporal entities) with “extensive” or “intensive” land use and to then use SPDs to bolster the argument, but again, this is well outside the scope of a dissection of the radiocarbon data itself.
Ln 233
"the results indicate they are fundamentally unsuited to testing this hypothesis"
Unsuited? Or is the density of data simply insufficient? It may be that we can't tell, but there are two possibilities here, one of which would suggest that as more dates become available we should revisit the issue, and the other of which suggests that there's no point in doing this type of analysis. The critique here has I think suggested the former, but suddenly shifts to the latter in this sentence.
Clarified the language here.
Ln 254-258
"...the data are fundamentally unsuited to analyses at this scale, in large part due to the limited number of radiocarbon dates at the critical points where cultural transitions are said to occur. SPDs derived from handfuls of dates, for the reasons outlined in the preceding sections, have very serious limitations that the adaptive cycles model does not acknowledge."
How many dates are needed? See remarks above about sampling density, and discussion in Contreras and Meadows 2014.
Per the other reviewers’ comments, I have expanded the discussion on this issue and offered some thoughts on how to circumvent the challenges facing the aggregate analysis of radiocarbon in the Amazon basin.
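The question of how many dates are needed has no single answer, but the general relationship between sample size and confidence interval width can be illustrated with a bootstrap. The sketch below is only a rough illustration under stated assumptions, not an analysis from the paper: the dates are hypothetical point estimates drawn uniformly over an invented span, calibration uncertainty and taphonomy are ignored, and the function name and window are made up for the example.

```python
import numpy as np

def bootstrap_ci_width(n_dates, window=(750, 1250), span=(500, 1500),
                       n_boot=2000, seed=0):
    """Width of the 95% bootstrap interval for the share of dates in `window`.

    Assumes (hypothetically) dates drawn uniformly over `span`; real
    calibrated dates would add further uncertainty not modelled here.
    """
    rng = np.random.default_rng(seed)
    dates = rng.uniform(span[0], span[1], n_dates)
    shares = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the dates with replacement and recompute the share
        resample = rng.choice(dates, size=n_dates, replace=True)
        shares[i] = np.mean((resample >= window[0]) & (resample < window[1]))
    lo, hi = np.percentile(shares, [2.5, 97.5])
    return hi - lo

# Interval width for increasingly large hypothetical samples
widths = {n: bootstrap_ci_width(n) for n in (20, 80, 320, 1280)}
```

Under these assumptions the interval width shrinks roughly with the square root of the number of dates, so a handful of dates in a given interval yields a very broad confidence interval however the data are visualised.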
Round 2
Reviewer 3 Report
The author has sufficiently addressed my comments, and I believe the paper should be published after minor copy editing.