Cherry-picking in Australia

One of the very first questions that a person promoting a model encounters from scientists and engineers is “has your model been validated?”. By validation we mean: has the model been shown to perform adequately for its intended use?

According to Charles M. Macal of Argonne National Laboratory, if the answer to this critical question is No, then:
1. Experience has shown that the model is unlikely to be adopted or even tried out in a real-world setting
2. Often the model is “sent back to the drawing board”
3. The challenge then becomes one of being able to say “yes” to this critical question

I recently asked the validation question of the ‘climate code red’ report from CSIRO, the Drought Exceptional Circumstances report (DEC). The answer was No.

Based on an initial data analysis, the models predict last century’s droughts worse than even random numbers would. In fact, the models show increasing trends in extreme low rainfall, while the observations show mostly decreasing trends, a relationship actually illustrated in the data graphed in Figure 10 of the report.
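To make the comparison concrete, the test is simple enough to sketch in a few lines of Python. The data below are random placeholders, not the DECR series; the point is the shape of the test: fit a trend to the observed extreme-low-rainfall series and to each model’s series, and ask whether the models even get the sign right.

```python
import numpy as np

def trend(series, years):
    """Least-squares slope of a series against time (units per year)."""
    slope, _intercept = np.polyfit(years, series, 1)
    return slope

# Random placeholders for the DECR series: percent of area in
# exceptionally low rainfall, 1900-2007. A real test would load the
# observed and modelled series here instead.
rng = np.random.default_rng(42)
years = np.arange(1900, 2008)
observed = rng.gamma(2.0, 3.0, size=years.size)
model_runs = rng.gamma(2.0, 3.0, size=(13, years.size))

obs_slope = trend(observed, years)
model_slopes = np.array([trend(run, years) for run in model_runs])

# The validation question in miniature: do the models get even the
# sign of the observed trend right more often than chance?
agree = np.mean(np.sign(model_slopes) == np.sign(obs_slope))
print(f"observed trend: {obs_slope:+.4f} per year")
print(f"fraction of models agreeing in sign: {agree:.2f}")
```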

I have gone searching for the validation of regional climate models for Australia. Where are the significance tests giving the confidence that global climate models can reproduce regional climate patterns? The first place I looked was the major report: Climate change in Australia: technical report 2007 (Chapter 5 part 1). Here it is clearly stated that statistical validity is not tested but assumed:

Like many of these studies, the work presented here assumes that an ensemble of global climate model results gives a representation of the expected change of the real world to a specific emission scenario. The best estimate of the change is the ensemble average or ‘multi-model mean’.

The method used is to average all or selected models (see the figure above). There was no reference to comparison with observations anywhere in Chapter 5. Weighting models by their regional concordance is also advocated, a process known elsewhere as ‘cherry-picking’.

Variations in this estimate relate to whether some climate models are given more weight than others based on their assessed reliability, and how that assessment is made and applied. The most common approach has been to assess how well each of the available models simulates the present climate of the region (e.g. Dessai et al. 2005), on the assumption that the more accurately a model is able to reproduce key aspects of the regional climate, the more likely it is to provide reliable guidance for future changes in the region. The method of weighting models is presented shortly.
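For concreteness, the arithmetic behind the ‘multi-model mean’ and its weighted variant is trivial. A minimal sketch, with hypothetical projections and weights standing in for the Table 4.1 reliability scores:

```python
import numpy as np

# Projected rainfall change (%) from an ensemble of models, with
# hypothetical reliability weights standing in for Table 4.1 scores.
projections = np.array([-12.0, -5.0, 2.0, -8.0, 1.0])
weights = np.array([0.70, 0.55, 0.30, 0.65, 0.40])

multi_model_mean = projections.mean()
weighted_mean = np.average(projections, weights=weights)

print(f"multi-model mean: {multi_model_mean:+.2f} %")
print(f"weighted mean:    {weighted_mean:+.2f} %")
# If the higher-weighted models happen to be the drier ones, the
# weighted mean is pulled drier -- the effect noted in Figure 5.14d.
```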

As Esper et al 2003 said:

“this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.”

It appears they were wrong that it is unique to dendroclimatology, as picking and choosing among climate models seems to be the main process for simulating regional climate.

That is another story. The dendroclimatology literature at least embraces prior validation of proxies. But in Climate change in Australia, significance tests seem to be spurned. Chapter 5 describes differences between models that serve as a basis for selection. At the bottom of page 65, in Section 5.2, the report indicates that certain models are biased towards decreases in precipitation:

Finally, the weighted mean of the percentage trends, using the standard model weights of Table 4.1, is shown in Figure 5.14d. The result is slightly drier than 5.14c everywhere. Evidently the more skilful models are more likely to produce a decrease.

Presumably the models biased towards lower precipitation estimates were selected in Suppiah et al 2007, the methods paper cited in the Drought Exceptional Circumstances Report: Australian climate change projections derived from simulations performed for the IPCC 4th assessment report. Here, instead of weighting, the models are selected using ‘demerit’ points. Like members of a boy scout troop, the 15 of 23 models scoring fewer than 8 ‘demerits’ are included.
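The selection rule reduces to a simple filter. A sketch with invented demerit scores (the actual Suppiah et al 2007 scores are not reproduced here):

```python
# Hypothetical demerit scores for 23 models, in the style of the
# Suppiah et al 2007 selection; models with fewer than 8 demerits pass.
demerits = {f"model_{i:02d}": d for i, d in enumerate(
    [3, 5, 9, 2, 7, 11, 4, 6, 8, 1, 10, 5, 3, 12, 7, 6, 2, 9, 4, 8, 5, 13, 7])}

selected = [name for name, d in demerits.items() if d < 8]
print(f"{len(selected)} of {len(demerits)} models selected")  # 15 of 23
```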

A recent report from the U.S. Climate Change Science Program, Climate Models: An Assessment of Strengths and Limitations (section 3.2, Empirical Downscaling), is clear that significance tests are necessary:

[Empirical Downscaling] uses statistical relationships to link resolved behavior in GCMs with climate in a targeted area. The targeted area’s size can be as small as a single point. As long as significant statistical relationships occur, empirical downscaling can yield regional information for any desired variable such as precipitation and temperature.
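Note that the quoted condition, “as long as significant statistical relationships occur”, is exactly the test missing from the Australian reports. In its simplest form such a link is a regression between a GCM grid-point series and station observations, retained only if significant; a sketch with simulated placeholder series:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder series: a resolved GCM grid-point variable and an
# observed station variable over the same period.
gcm_gridpoint = rng.normal(size=50)
station_obs = 0.6 * gcm_gridpoint + rng.normal(scale=0.8, size=50)

# Empirical downscaling in its simplest form: a linear relationship,
# retained only if the regression is statistically significant.
result = stats.linregress(gcm_gridpoint, station_obs)
if result.pvalue < 0.05:
    print(f"downscaling link: slope={result.slope:.2f}, p={result.pvalue:.3g}")
else:
    print("no significant statistical relationship -- do not downscale")
```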

Qualifications about regional climate models are echoed in the most recent review I could find for Australia: Assessment of the use of Current Climate Patterns to Evaluate Regional Enhanced Greenhouse Response Patterns of Climate Models by Dr Penny Whetton, who has led the Climate Impacts and Risk research stream at CSIRO Marine and Atmospheric Research since July 2005:

The reliability of the regional responses of models to enhanced greenhouse forcing is often assessed by comparing their current climate simulation against observations. The rationale for this assessment is that a model should be able to reproduce key aspects of the present climate if it is to be used to provide guidance for future changes in climate.

The abstract goes on to acknowledge some of the limitations of the approach:

One can assess regional average or grid point model biases for the variable and season for which projections are to be prepared. However, a model that performs well for a target variable, season and location, may perform poorly for another variable, season or location, in which case model processes would be suspect.

Given the poor performance of the models at simulating observed extreme precipitation events, as I found here, could Whetton et al. 2007 be suggesting that the model processes are suspect in the Drought Exceptional Circumstances report?

Despite the selection in Suppiah et al. 2007 of the models best at seasonal temperature and pressure, according to Whetton et al. 2007 there is no reason to expect good performance at precipitation from models selected on a different variable. One would expect the models to at least be validated for the variable of interest. This was not done for the Drought Exceptional Circumstances report.

Whetton et al. 2007 goes on:

Correlations of moderate magnitude are common in our results, indicating the value of testing the current regional climate simulation of models.

Spurious moderate correlations are well known to arise in highly autocorrelated variables, such as spatial climate variables, so moderate correlations do not necessarily indicate validity of regional climate models. This should be well known by now. In the field I am most familiar with, a paper, Red Shifts and Red Herrings, was published almost 10 years ago in order to:

draw attention to the need for ecologists to take spatial structure into account more seriously in hypothesis testing. If spatial autocorrelation is ignored, as it usually is, then analyses of ecological patterns in terms of environmental factors can produce very misleading results. This is demonstrated using synthetic but realistic spatial patterns with known spatial properties which are subjected to classical correlation and multiple regression analyses. Correlation between an autocorrelated response variable and each of a set of explanatory variables is strongly biased in favour of those explanatory variables that are highly autocorrelated – the expected magnitude of the correlation coefficient increases with autocorrelation even if the spatial patterns are completely independent. Similarly, multiple regression analysis finds highly autocorrelated explanatory variables “significant” much more frequently than it should. The chances of mistakenly identifying a “significant” slope across an autocorrelated pattern is very high if classical regression is used. Consequently, under these circumstances strongly autocorrelated environmental factors reported in the literature as associated with ecological patterns may not actually be significant.
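The effect is easy to reproduce. Correlate two completely independent AR(1) series and a naive significance test rejects far more often than its nominal 5%; a minimal Monte Carlo sketch (the autocorrelation coefficient of 0.9 is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ar1(n, phi, rng):
    """Generate an AR(1) series with lag-1 autocorrelation phi."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n, phi, trials = 100, 0.9, 2000
false_positives = 0
for _ in range(trials):
    a, b = ar1(n, phi, rng), ar1(n, phi, rng)  # independent by construction
    _, p = stats.pearsonr(a, b)
    false_positives += p < 0.05

# The naive test rejects well above the nominal 5% when phi is high --
# Lennon's 'red shift' in miniature.
print(f"naive rejection rate: {false_positives / trials:.2f}")
```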

Questions have been asked on other blogs (though none here) about why I should be investigating the validity of these models. Well, why shouldn’t we expect measures of the accuracy of climate models on independent data to be available? At present, there are no data to suggest they predict any better than random numbers. In fact, the models in the DEC report predicted drought worse than random data: they show the drought trend increasing over the last 100 years, when the observed trend has been decreasing.

I am still looking for a study validating regional climate models in Australia. Incorporation of autocorrelation would be good, but any study with a significance test showing the models beat random data on independent data would do. I don’t have high hopes. See this letter from Vincent Gray to Jennifer Marohasy.
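For the record, here is the shape of the test that would satisfy me, again with placeholder data: score a model against held-out observations, then ask whether that score beats the distribution of scores achieved by purely random series.

```python
import numpy as np

rng = np.random.default_rng(7)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Placeholder hold-out observations and one model's hindcast
# (a deliberately poor model, for illustration).
obs = rng.normal(size=100)
model = obs + rng.normal(scale=1.5, size=100)

# Null distribution: RMSE of random series scored against the same obs.
null = np.array([rmse(rng.normal(size=obs.size), obs) for _ in range(5000)])
p_value = np.mean(null <= rmse(model, obs))

# Small p: the model beats random series; large p: no demonstrated skill.
print(f"model RMSE {rmse(model, obs):.2f}, p = {p_value:.3f}")
```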


7 thoughts on “Cherry-picking in Australia”

  1. ASSUMES!!!!!

    Yup, that’s the status of the case for AGW. They ASSUME it is true.

    I would laugh, except it is soo bloody dangerous!!

  2. David,

    I don’t feel qualified to opine seriously in relation to your work, but from what I can make of it, you sure make a compelling case.

    It would be VERY interesting to know if CSIRO has made any effort to address the points you raise. You would have to acknowledge that they may have counter arguments that may be persuasive, which would represent normal progress in science. One could infer that a reluctance to engage on the arguments is in effect a de facto acknowledgement of the merit of your analysis.

  3. I can’t help feeling they are being forced to promote the IPCC models, knowing they are not up to scratch for the task. Maybe I am giving them too much credit, as I have studied problems in modeling for a long time, and poor validation is rife in environmental sciences. I will keep working on a resolution.

  4. “Evidently the more skilful models are more likely to produce a decrease.” Dept of Environment & Water Resources, Australian Greenhouse Office, Bureau of Meteorology, “Climate Change in Australia” 2007, page 65.

    This sentence alone is a scientific disgrace.

    “Evidently…”. Evident to whom? Show the evidence. The paper uses models whose performance is under test; it finds that the model graphs are offset from reality. It concludes that the models have “skill” because model results can differ from actual.

    Aha! But these models have the skill to bias as well. The more skilful show a (rainfall) decrease in future years.

    The models, although they deal with future projections and so can never be shown right in advance, are ranked in table 4.1, with scores on a scale of 0 to 1. Of course, none is negative. The range is lowest .304 to highest at .700.

    Would not a student of elementary mathematics be amused by the use of 3 significant figures to “guess” at “skill” that can never be shown correct?

    But there are more steak knives. The models, according to chic mantra, also forecast global temperature changes. Of course, none of these is negative, that is not chic. The relation between model skill and temperature forecast is not strikingly correlated. The lowest temperature change is predicted at 1.96 degrees by a model with skill of 0.508, the highest change of 4.31 degrees from a model skill of 0.608. So the models with middle skill produce both the highest and lowest forecast of global warming. How odd.

    It becomes even stranger. The CSIRO models mark 3.0 and 3.5 have skills of 0.601 and 0.607. What a disappointment to spend all that effort for an improvement of 0.006 in skill. BUT, the forecasts of global warming have improved, from 2.11 degrees in mark 3.0 to 3.17 degrees in mark 3.5.

    In a move that would surely please the policy makers, we have gained 30% severity in Global warming – with no more skill than cherry picking.

  5. The models that were used in the DECR include two developed at the Canadian Climate Centre (CCC) and the CSIRO 3.5, NASA GISS-AOM, NASA GISS-E-R and Institut Pierre Simon Laplace (IPSL) models, but none of these are among the 15 models used in Suppiah et al 2007, ‘the methods paper cited in the Drought Exceptional Circumstances Report.’

    And the Norwegian (BCC), Meteo-France (CNRM), Geophysical Fluid Dynamics Laboratory (GFDL 2.0 and 2.1), German/Korean MPI-ECHAM 5 and Hadley Centre HADGEM1 models were all used in Suppiah et al 2007 but were not among the 13 used in the DECR.

    Has there been any explanation for this large difference between the suite of models used in the two studies?

  6. Well spotted, Ian! I was meaning to check that, and not assume the models in Suppiah matched the ones in the DECR. I did look for the model information claimed to be in the SI, but found nothing. From the DECR:

    4.1 Methods and data
    Our projections of drought are based on simulations from 13 climate models that perform acceptably well in the Australian region and for which potential evaporation data exist. The models are described in the Supplementary Information.

    This is all the more puzzling given the purpose of the (dubious) selection process in Suppiah was to identify the best performing models in the Australian region. So no, I haven’t raised that issue. I am waiting on an answer regarding the validation failures.

  7. There are several typos in my post 4. I do not drink, so no blame there. The keyboard strokes were hanging up for half a minute at a time for some odd reason, and I got sick of correcting. The longer the typing took, the less calm I became, and I probably went a bridge too far.
