One of the very first questions that a person promoting a model encounters from scientists and engineers is “has your model been validated?”. By validation we mean: has the model been shown to perform adequately for its intended use?
According to Charles M. Macal of Argonne National Laboratory, if the answer to this critical question is No, then:
1. Experience has shown that the model is unlikely to be adopted, or even tried out, in a real-world setting
2. Often the model is “sent back to the drawing board”
3. The challenge then becomes one of being able to say “yes” to this critical question
I recently asked the validation question of the climate code red report from CSIRO, the Drought Exceptional Circumstances (DEC) report. The answer was No.
Based on an initial data analysis, the models predict drought over the last century worse than even random numbers would. In fact, the models show increasing trends in extreme low rainfall, while observations show mostly decreasing trends, a relationship actually illustrated in the data graphed in Figure 10 of the report.
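To illustrate what "worse than random" means here, the sketch below uses entirely synthetic, hypothetical data (the names `trend`, `observed`, `model`, and `baseline` are my own): a model whose trend drifts the wrong way has a larger trend error than a trendless random series.

```python
import random

def trend(series):
    """Ordinary least-squares slope of a series against time."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

random.seed(1)
years = 100
# Hypothetical observed record: an extreme-low-rainfall index drifting DOWN
observed = [10 - 0.02 * t + random.gauss(0, 1) for t in range(years)]
# Hypothetical model output: the same index drifting UP (increasing drought)
model = [10 + 0.02 * t + random.gauss(0, 1) for t in range(years)]
# Random baseline: noise with no drift at all
baseline = [10 + random.gauss(0, 1) for t in range(years)]

obs_slope, mod_slope, base_slope = trend(observed), trend(model), trend(baseline)
# Trend error relative to observations: the trendless random series sits
# closer to the observed (decreasing) trend than the increasing model does
model_err = abs(mod_slope - obs_slope)
baseline_err = abs(base_slope - obs_slope)
print(model_err > baseline_err)  # the random baseline wins
```

A validation test of this kind, applied to independent data, is exactly what appears to be missing from the reports discussed below.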
I have gone searching for the validation of regional climate models for Australia. Where are the significance tests giving the confidence that global climate models can reproduce regional climate patterns? The first place I looked was the major report: Climate change in Australia: technical report 2007 (Chapter 5 part 1). Here it is clearly stated that statistical validity is not tested but assumed:
Like many of these studies, the work presented here assumes that an ensemble of global climate model results gives a representation of the expected change of the real world to a specific emission scenario. The best estimate of the change is the ensemble average or ‘multi-model mean’.
The method used is to average all or selected models (see Figure above). There is no reference to comparison with observations in the whole of Chapter 5. Weighting models by regional concordance is also advocated, a process known as ‘cherry picking’ elsewhere.
Variations in this estimate relate to whether some climate models are given more weight than others based on their assessed reliability, and how that assessment is made and applied. The most common approach has been to assess how well each of the available models simulates the present climate of the region (e.g. Dessai et al. 2005), on the assumption that the more accurately a model is able to reproduce key aspects of the regional climate, the more likely it is to provide reliable guidance for future changes in the region. The method of weighting models is presented shortly.
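The two estimates the report describes, the unweighted multi-model mean and a skill-weighted mean, can be sketched in a few lines (the model names, projected rainfall changes, and skill weights below are invented for illustration):

```python
# Hypothetical per-model projections (% rainfall change) and assessed
# skill weights; higher weight = judged more reliable for the region
projections = {"model_a": -12.0, "model_b": -4.0, "model_c": 3.0}
weights     = {"model_a": 0.9,   "model_b": 0.6,  "model_c": 0.2}

# Plain ensemble average: the 'multi-model mean'
multi_model_mean = sum(projections.values()) / len(projections)

# Skill-weighted average: each projection scaled by its weight
weighted_mean = (sum(projections[m] * weights[m] for m in projections)
                 / sum(weights.values()))
print(multi_model_mean, weighted_mean)
```

With these invented numbers the weighted mean comes out drier than the plain mean, mirroring the pattern the report notes for its Figure 5.14d below; the point is that the answer depends entirely on how the weights are assessed, which is never tested against independent data.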
Compare a well-known claim from the dendroclimatology literature:

“this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.”

It appears the author was wrong that this ability is unique to dendroclimatology, as picking and choosing the climate model seems to be the main process for simulating regional climate.

That is another story, though. The dendroclimatology literature at least embraces the prior validation of proxies. In Climate change in Australia, by contrast, significance tests seem to be spurned. Chapter 5 describes differences between models that serve as a basis for selection. At the bottom of page 65, Section 5.2, the report indicates that certain models are biased towards decreases in precipitation:
Finally, the weighted mean of the percentage trends, using the standard model weights of Table 4.1, is shown in Figure 5.14d. The result is slightly drier than 5.14c everywhere. Evidently the more skilful models are more likely to produce a decrease.
Presumably the models biased towards lower precipitation estimates were selected in Suppiah et al. 2007, the methods paper cited in the Drought Exceptional Circumstances Report: Australian climate change projections derived from simulations performed for the IPCC 4th Assessment Report. Here, instead of weighting, the models are selected using ‘demerit’ points: like members of a Boy Scout troop, the 15 of 23 models scoring fewer than 8 ‘demerits’ are included.
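The demerit-point selection amounts to a simple filter. A minimal sketch (the demerit scores below are invented; the 8-point cut-off is the one described above):

```python
# Hypothetical demerit scores for a pool of candidate models; the rule
# keeps any model scoring fewer than 8 demerit points
demerits = {"m01": 3, "m02": 11, "m03": 7, "m04": 0, "m05": 9, "m06": 5}
selected = sorted(m for m, d in demerits.items() if d < 8)
print(selected)  # ['m01', 'm03', 'm04', 'm06']
```

Note that nothing in the filter itself tests whether the surviving models predict better than chance on independent data; it only ranks them against each other.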
A recent report from the U.S. Climate Change Science Program is clear that significance tests are necessary: Climate Models: An Assessment of Strengths and Limitations (3.2 Empirical Downscaling) (emphasis added):
[Empirical Downscaling] uses statistical relationships to link resolved behavior in GCMs with climate in a targeted area. The targeted area’s size can be as small as a single point. As long as significant statistical relationships occur, empirical downscaling can yield regional information for any desired variable such as precipitation and temperature.
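The quoted condition, “as long as significant statistical relationships occur”, is the testable part. A minimal sketch under synthetic data (all names and numbers are mine, not from any of the reports) fits the link between a resolved GCM variable and a station record and checks the correlation’s t-statistic:

```python
import math, random

random.seed(2)
n = 60
# Hypothetical paired series: a resolved GCM field and a co-located
# station record with a genuine statistical link between them
gcm = [random.gauss(0, 1) for _ in range(n)]
station = [0.8 * g + random.gauss(0, 0.5) for g in gcm]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

r = pearson_r(gcm, station)
t = r * math.sqrt((n - 2) / (1 - r * r))  # t-statistic for H0: r = 0
print(round(r, 2), t > 2.0)  # a significant relationship licenses downscaling
```

This is exactly the kind of significance test whose absence from the Australian regional projections is at issue; note also the autocorrelation caveat discussed further below, which this naive test ignores.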
Such qualifications of regional climate models are echoed in the most recent review I could find for Australia: Assessment of the use of Current Climate Patterns to Evaluate Regional Enhanced Greenhouse Response Patterns of Climate Models, by Dr Penny Whetton, who has led the Climate Impacts and Risk research stream at CSIRO Marine and Atmospheric Research since July 2005:
The reliability of the regional responses of models to enhanced greenhouse forcing is often assessed by comparing their current climate simulation against observations. The rationale for this assessment is that a model should be able to reproduce key aspects of the present climate if it is to be used to provide guidance for future changes in climate.
The abstract goes on to acknowledge some of the limitations of the approach:
One can assess regional average or grid point model biases for the variable and season for which projections are to be prepared. However, a model that performs well for a target variable, season and location, may perform poorly for another variable, season or location, in which case model processes would be suspect.
Given the poor performance of the models at simulating observed extreme precipitation events as I found here, could Whetton et al. 2007 be suggesting that the model processes are suspect in the Drought Exceptional Circumstances report?
Despite the selection of the best models for seasonal temperature and pressure in Suppiah et al. 2007, according to Whetton et al. 2007 there is no reason to expect good performance on precipitation from models selected on a different variable. At the very least, one would expect the models to be validated for the variable of interest. This was not done for the Drought Exceptional Circumstances report.
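The logical point, that skill on one variable need not transfer to an independent one, is easy to demonstrate with synthetic skill scores (everything below is invented for illustration):

```python
import random

rng = random.Random(4)

def selection_gap():
    """Draw independent temperature and rainfall skill scores for 20
    hypothetical models, keep the top 10 by temperature skill, and return
    how much their mean rainfall skill beats the full pool's mean."""
    temp = [rng.random() for _ in range(20)]
    rain = [rng.random() for _ in range(20)]
    chosen = sorted(range(20), key=lambda m: temp[m], reverse=True)[:10]
    return sum(rain[m] for m in chosen) / 10 - sum(rain) / 20

# Averaged over many trials the gap is ~0: selecting on temperature
# buys nothing for rainfall when the two skills are unrelated
avg_gap = sum(selection_gap() for _ in range(2000)) / 2000
print(round(avg_gap, 3))
```

Selection on temperature and pressure only helps with precipitation to the extent that skill on the two is correlated, and that correlation would itself need to be demonstrated, not assumed.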
Whetton et al. 2007 goes on:
Correlations of moderate magnitude are common in our results, indicating the value of testing the current regional climate simulation of models.
Spurious moderate correlations are well known to arise in highly autocorrelated variables, such as spatial climate variables, so moderate correlations do not necessarily indicate the validity of regional climate models. This should be well known by now. In the field I am most familiar with, a paper, Red Shifts and Red Herrings, was published almost 10 years ago in order to:
draw attention to the need for ecologists to take spatial structure into account more seriously in hypothesis testing. If spatial autocorrelation is ignored, as it usually is, then analyses of ecological patterns in terms of environmental factors can produce very misleading results. This is demonstrated using synthetic but realistic spatial patterns with known spatial properties which are subjected to classical correlation and multiple regression analyses. Correlation between an autocorrelated response variable and each of a set of explanatory variables is strongly biased in favour of those explanatory variables that are highly autocorrelated – the expected magnitude of the correlation coefficient increases with autocorrelation even if the spatial patterns are completely independent. Similarly, multiple regression analysis finds highly autocorrelated explanatory variables “significant” much more frequently than it should. The chances of mistakenly identifying a “significant” slope across an autocorrelated pattern is very high if classical regression is used. Consequently, under these circumstances strongly autocorrelated environmental factors reported in the literature as associated with ecological patterns may not actually be significant.
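The quoted effect is easy to reproduce. The sketch below (my own synthetic experiment, not from the paper) generates pairs of completely independent series, white noise versus strongly autocorrelated AR(1), and counts how often the classical 5% significance test for the correlation fires:

```python
import math, random

def ar1(n, phi, rng):
    """Generate an AR(1) series: x[t] = phi * x[t-1] + noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

rng = random.Random(3)
n, trials, crit = 50, 500, 0.279  # approx. two-sided 5% critical |r| at n=50

def reject_rate(phi):
    """Fraction of INDEPENDENT series pairs flagged as 'significant'."""
    hits = sum(abs(pearson_r(ar1(n, phi, rng), ar1(n, phi, rng))) > crit
               for _ in range(trials))
    return hits / trials

white = reject_rate(0.0)  # white noise: false positives near the nominal 5%
red = reject_rate(0.9)    # autocorrelated: false positives far above 5%
print(white, red)
```

The series in each pair are independent by construction, yet the autocorrelated pairs are declared “significant” many times more often than the nominal rate, which is exactly why a moderate correlation between autocorrelated climate fields is weak evidence of model validity.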
Questions have been asked on other blogs — though none here — about why I should be investigating the validity of these models. Well, why shouldn’t the accuracy of climate models on independent data be available? At present, there are no data to suggest they predict any better than random numbers. In fact, the models in the DEC report predicted drought worse than random data: they show the drought trend increasing over the last 100 years, when it has actually been decreasing.
I am still looking for a study validating regional climate models in Australia. Incorporation of autocorrelation would be good, but any study that has a significance test to show they are better than random on independent data would be accepted. I don’t have high hopes. See this letter from Vincent Gray to Jennifer Marohasy.