Since 2006, in between promoting numeracy in education and giving examples of simple statistics that use topical issues from the theory of Anthropogenic Global Warming (AGW) to illustrate points, I have asked the question “Have these models been validated?” in blog posts and occasionally in submissions to journals. This post summarizes these efforts.
Species Extinctions
Predictions of massive species extinctions due to AGW came into prominence with a January 2004 paper in Nature called Extinction Risk from Climate Change by Chris Thomas et al. They made the following prediction:
“we predict, on the basis of mid-range climate-warming scenarios for 2050, that 15–37% of species in our sample of regions and taxa will be ‘committed to extinction’.”
Subsequently, three communications appeared in Nature in July 2004. Two raised technical problems, including one by the eminent ecologist Joan Roughgarden. Opinions ranged from “Dangers of Crying Wolf over Risk of Extinctions”, concerned with damage to conservationism by alarmism, through poorly written press releases by the scientists themselves, to “Extinction risk [press] coverage is worth the inaccuracies”, stating “we believe the benefits of the wide release greatly outweighed the negative effects of errors in reporting”.
As one of those who believe that gross scientific inaccuracies are not justified, and that such attitudes diminish the standing of scientists, I was invited to a meeting of a multidisciplinary group of 19 scientists, including Dan Botkin from UC Santa Barbara, mathematician Matt Sobel, Craig Loehle and others, at the Copenhagen base of Bjørn Lomborg, author of The Skeptical Environmentalist. This resulted in Forecasting the Effects of Global Warming on Biodiversity, published in BioScience in 2007. We were particularly concerned by the cavalier attitude to model validation in the Thomas paper, and in the field in general:
Of the modeling papers we have reviewed, only a few were validated. Commonly, these papers simply correlate present distribution of species with climate variables, then replot the climate for the future from a climate model and, finally, use one-to-one mapping to replot the future distribution of the species, without any validation using independent data. Although some are clear about some of their assumptions (mainly equilibrium assumptions), readers who are not experts in modeling can easily misinterpret the results as valid and validated. For example, Hitz and Smith (2004) discuss many possible effects of global warming on the basis of a review of modeling papers, and in this kind of analysis the unvalidated assumptions of models would most likely be ignored.
The paper observed that few mass extinctions have been seen over recent rapid climate changes, suggesting something must be wrong with the models to get such high rates of extinctions. They speculated that species may survive in refugia, suitable habitats below the spatial scale of the models.
Another example of an unvalidated assumption that could bias results in the direction of extinctions was described in chapter 7 of my book Niche Modeling.
When climate change shifts a species’ niche over a landscape (dashed to solid circle) the response of that species can be described in three ways: dispersing to the new range (migration), local extirpation (intersection), or expansion (union). Given the probability of extinction is correlated with range size, there will either be no change, an increase (intersection), or decrease (union) in extinctions depending on the dispersal type. Thomas et al. failed to consider range expansion (union), a behavior that predominates in many groups. Consequently, the methodology was inherently biased towards extinctions.
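The three responses can be sketched with plain set operations on grid cells. This is only an illustration under stated assumptions: the circular ranges, the grid, the size of the shift, and the names `disc` and `respond` are hypothetical and are not taken from Thomas et al. or from Niche Modeling.

```python
def disc(cx, cy, r):
    """Grid cells whose centre lies within radius r of (cx, cy)."""
    return {
        (x, y)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    }


def respond(old_range, new_niche, mode):
    """Occupied cells after the niche shifts, for one dispersal assumption."""
    if mode == "migration":       # species fully tracks the new niche
        return set(new_niche)
    if mode == "intersection":    # no dispersal: survives only in the overlap
        return old_range & new_niche
    if mode == "union":           # persists in old range and colonises the new
        return old_range | new_niche
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical example: a circular niche of radius 5 cells shifted 4 cells east.
old = disc(0, 0, 5)   # dashed circle
new = disc(4, 0, 5)   # solid circle

sizes = {m: len(respond(old, new, m))
         for m in ("migration", "intersection", "union")}
for mode, size in sizes.items():
    print(f"{mode:12s} range = {size:3d} cells (original {len(old)})")
```

If extinction risk is taken to fall as range size grows, the same niche shift gives no change, a higher risk, or a lower risk depending only on which dispersal assumption is plugged in.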
One of the many errors in this work was a failure to evaluate the impact of such assumptions.
The prevailing view now, according to Stephen Williams, coauthor of the Thomas paper, Director of the Center for Tropical Biodiversity and Climate Change, and author of such classics as “Climate change in Australian tropical rainforests: an impending environmental catastrophe”, may be the following:
Many unknowns remain in projecting extinctions, and the values provided in Thomas et al. (2004) should not be taken as precise predictions. … Despite these uncertainties, Thomas et al. (2004) believe that the consistent overall conclusions across analyses establish that anthropogenic climate warming at least ranks alongside other recognized threats to global biodiversity.
So how precise are the figures? Williams suggests we should just trust the beliefs of Thomas et al., an approach referred to disparagingly in the forecasting literature as a judgmental forecast rather than a scientific forecast (Green & Armstrong 2007). These simple models gloss over numerous problems in validating extinction models, including the propensity of so-called extinct species to reappear quite often. Usually they are small, hard to find, and no one is really looking for them.
Hockey-stick
One of the pillars of AGW is the view that 20th-century warmth is exceptional in the context of the past 1200 years, illustrated by the famous hockey-stick graph, as seen in movies, and government reports to this day.
Claims that 20th-century warming is ‘exceptional’ rely on selection of so-called temperature ‘proxies’ such as tree rings, and statistical tests of the significance of changes in growth. I modelled the proxy selection process here and showed you can get a hockey stick shape using random numbers (with serial correlation). When the numbers trend, and then are selected based on correlation with recent temperatures, the result is inevitably ‘hockey stick’ shaped: i.e. with a distinct uptick where the random series correlated with recent temperatures, and a long straight shaft as the series revert back to the mean. My reconstruction was similar to many other reconstructions with low variance medieval warm period (MWP).
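The selection effect can be reproduced in a few lines. The sketch below is not the original model: the AR(1) coefficient, the series counts, the 100-year calibration window, the linear "instrumental" trend and the 0.3 correlation cut-off are all assumed values chosen for illustration.

```python
import math
import random


def ar1(n, phi, rng):
    """Red noise: AR(1) series started from its stationary distribution."""
    x = rng.gauss(0, 1 / math.sqrt(1 - phi * phi))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def screened_composite(n_series=500, n_years=600, calib=100,
                       phi=0.9, r_min=0.3, seed=1):
    """Average only those random 'proxies' that correlate with a rising target."""
    rng = random.Random(seed)
    target = list(range(calib))   # assumed warming trend in the calibration period
    kept = []
    for _ in range(n_series):
        s = ar1(n_years, phi, rng)
        if corr(s[-calib:], target) > r_min:   # ex-post screening step
            kept.append(s)
    composite = [sum(col) / len(kept) for col in zip(*kept)]
    return composite, len(kept)


composite, kept = screened_composite()
shaft = sum(composite[:-100]) / len(composite[:-100])   # pre-calibration mean
blade = sum(composite[-10:]) / 10                        # mean of final decade
print(f"kept {kept} series, shaft mean {shaft:.2f}, blade mean {blade:.2f}")
```

None of the input series contains a signal, yet the composite of the screened series has a flat shaft and a rising blade, because averaging cancels the noise everywhere except in the window where the series were selected.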
It is an error to underestimate the effect on uncertainty of ex-post selection based on correlation, or ‘cherry picking’. Cherry picking has been much criticised on ClimateAudit. In February 2009 Steve McIntyre and Ross McKitrick published a comment, citing my AIG article, criticising an article by Michael Mann and saying:
Numerous other problems undermine their conclusions. Their CPS reconstruction screens proxies by calibration-period correlation, a procedure known to generate ‘‘hockey sticks’’ from red noise (4).
The response by Michael Mann acknowledged such screening was common, used in their reconstructions, but claimed it was ‘unsupported’ in the literature.
McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates ‘‘hockey sticks’’ is unsupported in peer-reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.
In fact, it is supported in the peer-reviewed literature: Gerd Bürger raised the same objection in a comment in Science (29 June 2007) on “The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years” by Osborn and Keith R. Briffa, finding that 20th-century warming is not exceptional.
However, their finding that the spatial extent of 20th-century warming is exceptional ignores the effect of proxy screening on the corresponding significance levels. After appropriate correction, the significance of the 20th-century warming anomaly disappears.
The National Academy of Sciences agreed that uncertainty was greater than appreciated, and shortened the hockey-stick of the time by 600 years (contrary to assertions in the press).
Long Term Persistence (LTP)
Here is one of my first php applications, a fractional differencing climate simulation. Reload to see a new simulation below, together with measures of correlation (r2 and RE) with some monthly climate figures of the time.
This little application gathered a lot of interest, I think because fractional differencing is an inherently interesting technique, creates realistic temperature simulations, and is a very elegant way to generate series with long term persistence (LTP), a statistical property that generates natural ‘trendiness’. One of the persistent errors in climate science has been the failure to take into account the autocorrelation in climate data, leading to inflated significance values.
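For readers who want to try the technique, here is a minimal Python sketch of fractional differencing. It is not the php application: the differencing parameter d = 0.4, the series length, the truncation of the filter at the start of the series, and the function names are assumptions made for illustration.

```python
import random


def fracdiff_weights(d, n):
    """First n moving-average weights of the filter (1 - B)^(-d).

    Uses the recursion w[0] = 1, w[k] = w[k-1] * (k - 1 + d) / k.
    """
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 + d) / k)
    return w


def simulate_ltp(n, d=0.4, seed=0):
    """Fractionally integrated white noise, truncated at the series start."""
    rng = random.Random(seed)
    w = fracdiff_weights(d, n)
    eps = [rng.gauss(0, 1) for _ in range(n)]
    return [sum(w[k] * eps[t - k] for k in range(t + 1)) for t in range(n)]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


weights = fracdiff_weights(0.4, 500)
series = simulate_ltp(500, d=0.4, seed=0)
print(f"lag-1 autocorrelation: {lag1_autocorr(series):.2f}")
```

The weights decay slowly as a power law rather than exponentially, which is what lets distant shocks keep influencing the series and produces the apparent 'trendiness'. For 0 < d < 0.5 the process is stationary with long memory; d = 0 gives white noise.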
It has been noted that there are no requirements for verified accuracy for climate models to be incorporated into the IPCC. Perhaps if I got my random model published it would qualify. It would be a good benchmark.
Extreme Sensitivity
“According to a new U.N. report, the global warming outlook is much worse than originally predicted. Which is pretty bad when they originally predicted it would destroy the planet.” –Jay Leno
The paper by Rahmstorf et al. must rank as one of the most quotable of all time.
The data available for the period since 1990 raise concerns that the climate system, in particular sea level, may be responding more quickly to climate change than our current generation of models indicates.
This claim, made without the benefit of any statistical analysis or significance testing, is widely quoted to justify claims that the climate system is “responding more strongly than we thought”. I debated this paper with Stefan at RealClimate, and succeeded in demonstrating they had grossly underestimated the uncertainty.
His main defense was that the end point uncertainty would only affect the last 5 points of the smoothed trend line with an 11 point embedding. Here the global temperatures were smoothed using a complex method called Singular Spectrum Analysis (SSA). I gave examples of SSA and other methods where the end point uncertainty affected virtually ALL points in the smoothed trend line, and particularly more than 5 end points. Stefan clearly had little idea of how SSA worked. His final message, without an argument, was:
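The point about end effects can be checked with a bare-bones SSA. The sketch below is an assumption-laden illustration, not the method used in the paper: it is basic SSA with an embedding dimension of 11, keeps only the leading component, uses no end padding, and runs on a synthetic series. The function name `ssa_trend` and the size of the perturbation are hypothetical.

```python
import random


def ssa_trend(x, m=11, iters=200):
    """Leading-component SSA reconstruction with embedding dimension m."""
    n = len(x)
    k = n - m + 1
    rows = [x[i:i + m] for i in range(k)]           # trajectory matrix (k x m)
    # Lag-covariance matrix X^T X (m x m).
    cov = [[sum(r[a] * r[b] for r in rows) for b in range(m)]
           for a in range(m)]
    # Leading eigenvector by power iteration.
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    pc = [sum(r[a] * v[a] for a in range(m)) for r in rows]
    # Rank-1 reconstruction, averaged along anti-diagonals.
    total = [0.0] * n
    count = [0] * n
    for i in range(k):
        for a in range(m):
            total[i + a] += pc[i] * v[a]
            count[i + a] += 1
    return [t / cnt for t, cnt in zip(total, count)]


rng = random.Random(0)
x = [0.01 * t + rng.gauss(0, 0.1) for t in range(120)]   # synthetic series
x_bumped = x[:-1] + [x[-1] + 0.3]                         # change only the last value

base = ssa_trend(x)
bumped = ssa_trend(x_bumped)
shift = [abs(a - b) for a, b in zip(base, bumped)]
outside_last5 = max(shift[:-5])
print(f"largest change outside the last 5 points: {outside_last5:.5f}")
print(f"largest change within the last 5 points:  {max(shift[-5:]):.5f}")
```

In this basic form the final observation enters the last 11 reconstructed points directly, and every other point indirectly because the eigenvector itself is re-estimated. How large those changes are depends on the data and on how the series ends are padded, so the numbers printed here say nothing about the magnitude in the published analysis.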
[Response: If you really think you’d come to a different conclusion with a different analysis method, I suggest you submit it to a journal, like we did. I am unconvinced, though. -stefan]
But to add insult to injury, this paper figured prominently in the Interim Report of the Garnaut Review where I put in a submission.
“Developments in mainstream scientific opinion on the relationship between emissions, accumulations and climate outcomes, and the Review’s own work on future business-as-usual global emissions, suggest that the world is moving towards high risks of dangerous climate change more rapidly than has generally been understood.”
As time moves on and more data is available, a trend line using the same technique is regressing to the mean. It is increasingly clear that the apparent upturn was probably due to the 1998 El Nino. It is an error to regard a short term deviation as an important indication of heightened climate sensitivity.
More Droughts
The CSIRO Climate Adaptation Flagship produced a Drought Exceptional Circumstances Report (DECR), suggesting among other things that droughts would double in the coming decades. Released in the middle of a major drought in Southern Australia, this glossy report had all the hallmarks of promotional literature. I clashed with CSIRO firstly over release of their data, and then in attempting to elicit a formal response to issues raised. My main concern was that there was no apparent attempt to demonstrate that the climate models used in the report were fit for the purpose of modeling drought, particularly rainfall.
One of the main results of my review of the data is summed up in the following graph, comparing the predicted frequency and severity of low rainfall over the last hundred years, with the observed frequency and severity of low rainfall. It is quite clear that the models are inversely related to the observations.
A comment submitted to the Australian Meteorological Magazine was recently rejected. Here I tested the models and observations, following an approach of Rybski, by analyzing the difference between the discrete periods 1900-1950 and 1950-2000. The table below shows that while observed drought decreased significantly between the periods, modeled drought increased significantly.
Table 1: Mean percentage area of exceptionally low rainfall over time periods suggested by KB09. A Mann-Whitney rank-sum test shows significant differences between periods.
| | 1900-2007 | 1900-1967 | 1951-2007 | P 1900-2007 vs. 1951-2007 | P 1900-1950 vs. 1951-2007 | Test |
|---|---|---|---|---|---|---|
| Observed % Area Drought | 5.6±0.5 | 6.2±0.7 | 4.9±0.6 | 0.10 | 0.004 | Mann-Whitney test (wilcox.test(x,y) in R) |
| Modelled % Area Drought | 5.5±0.1 | 4.8±0.2 | 6.2±0.2 | 0.006 | <0.001 | Mann-Whitney test (wilcox.test(x,y) in R) |
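The table was produced with wilcox.test(x, y) in R. For readers without R, the sketch below shows the same kind of two-period comparison in plain Python. It is an approximation under stated assumptions: it uses the large-sample normal approximation with no continuity or tie correction, so its p-values will not match R exactly for small samples, and the two input lists are made-up values, not the DECR data.

```python
import math


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for sample x, approximate p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1          # tied values share their average rank
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical yearly "% area in drought" values for two periods (not real data).
period_a = [7.1, 3.4, 9.8, 5.5, 6.0, 8.2, 4.9, 10.3]
period_b = [2.2, 4.1, 3.0, 5.1, 1.8, 3.9, 2.7, 4.4]

u, p = mann_whitney(period_a, period_b)
print(f"U = {u:.1f}, approximate two-sided p = {p:.4f}")
```

Running the same function on observed and on modeled series for the same two periods, and comparing the direction of the shift as well as the p-value, is the kind of check described above.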
Moreover I showed that while similar results were reported for temperature in the DECR (where models and observations are more consistent), they were not reported for rainfall.
The reviewers did not comment on the statistical proof that the models were useless at predicting drought. Instead, they pointed to Fig 10 in the DECR, a rough graphic, claiming “the models did a reasonable job of simulating the variability”. I am not aware of any statistical basis for model validation by the casual matching of the variability of observations to models. The widespread acceptance of such low standards of model validation is apparently a feature of climate science.
Former Head of the Australian Bureau of Statistics Ian Castles solicited a review by ANU independent Accredited Statisticians, Brewer and Other. They concurred that models in the DECR required validation (along with other interesting points).
Dr Stockwell has argued that the GCMs should be subject to testing of their adequacy using historical or external data. We agree that this should be undertaken as a matter of course by all modelers. It is not clear from the DECR whether or not any such validation analyses have been undertaken by CSIRO/BoM. If they have, we urge CSIRO/BoM make the results available so that readers can make their own judgments as to the accuracy of the forecasts. If they have not, we urge them to undertake some.
A persistent error in climate science is using models when they have not been shown to be ‘fit for purpose’.
Miskolczi
Recently a paper came out potentially undermining the central assumptions of climate modeling. Supported by extensive empirical validation, it suggested that ‘optical depth’ in the atmosphere is maintained at an optimal, constant value (in the average over the long term). Finding a negligible initial sensitivity of 0.24C surface temperature increase to a doubling of CO2, it then goes on to suggest constraints that ensure equilibrium will eventually be established, giving no increase in temperature, due to reversion to the constant optical depth. The paper by Ferenc Miskolczi (2007), called Greenhouse effect in semi-transparent planetary atmospheres, was published in the Quarterly Journal of the Hungarian Meteorological Service, January–March 2007.
I was initially impressed by the extensive validation of his theory using empirical data. Despite a furious debate online, there has been no peer-reviewed rebuttal to date. The pro-AGW blog site RealClimate promised a rebuttal by “students” but to date has made none. This suggests either that it is carefully ignored, or it is transparently flawed.
Quite recently Ken Gregory encouraged Ferenc to run his model using actual recorded water vapor data, which show a decline in the upper atmosphere over the last few decades. While there are large uncertainties associated with these data, they do show a decline consistent with Ferenc’s theory, that water vapor (a greenhouse gas) will decline to compensate for increased CO2. The results of Miskolczi’s calculations using his line-by-line HARTCODE program are given here.
The theoretical aspects of Ferenc’s theory have been furiously debated online. I am not sure that any conclusions have been reached, but nor has his theory been disproved.
Conclusions
What often happens is that a publication appears which gets a lot of exciting attention. Then some time later, rather quietly, subsequent work gets published that questions the claim or substantially weakens it. But that doesn’t get any headlines, and the citation rate is typically 10:1 in favor of the alarmist claims. It does not help that the IPCC report selectively cites studies, and presents unvalidated projections as ‘highly likely’, which shows they are largely expert forecasts, not scientific forecasts.
All of the ‘errors’ here can be attributed to exaggeration of the significance of the findings, due to inadequate rigor in the validation of models. The view that this is an increasing problem is shared by new studies of rigor from the intelligence community, and it applies even more to data derived so easily from computer modeling.
The proliferation of data accessibility has exacerbated the risk of shallowness in information analysis, making it increasingly difficult to tell when analysis is sufficient for making decisions or changing plans, even as it becomes increasingly easy to find seemingly relevant data.
I also agree with John P. A. Ioannidis, who in a wide-ranging study of medical journals found that Most Published Research Findings Are False. To my mind when the methodologies underlying AGW are scrutinized, the findings seem to match the prevailing bias. To make matters worse, in most cases, the response of the scientific community has been to carefully ignore, dissemble, or ad hom dissenters, instead of initiating vigorous programs to improve rigor in problem areas.
We need to adopt more practices from clinical research, such as the structured review, whereby the basis for evaluating evidence for or against an issue is well defined. In this view, the IPCC is simply a review of the literature, one among reviews by competing groups (such as NIPCC REPORT 2008 Nature, Not Human Activity, Rules the Climate). In other words, stop pretending scientists are unbiased, but put systems in place to help prevent ‘group-think’ and promote more vigorous testing of models against reality.
If the very slow to zero rate of increase in global temperature continues, we will be treated to the spectacle of otherwise competent researchers clinging to extreme AGW, while the public become more cynical and uninterested. This would have been avoided if they had been confronted with “Are these models validated? If they are, by all means make your forecasts, if not, don’t.”