Validation of Climate Models – the missing link

Validation of climate models is like finding someone to cement your drive.

You ask one contractor, and they say they can do it, sometime between now and Christmas. That’s a high level of uncertainty.

You ask another, and they say they can do it, but it’s their first time. That’s a low level of skill.

You ask another, and they say that they will do it, but the result is not going to be any better than what you have already, or may even be worse. That’s an honest vendor, and a product not ‘fit-for-use’.

Model validation is very obvious when you put it in a familiar context. There is a level of service you expect from the money you have to spend. Any public servant involved in the procurement of services faces a similar situation to concreting one’s drive. Due diligence requires a check that models are fit-for-purpose.

I have just completed a critique of Australian drought models developed by the Bureau of Meteorology and CSIRO as used in the Drought Exceptional Circumstances Report. While they flagged that “there is higher uncertainty with the rainfall data [than the temperature data]”, they reported model forecasts of a large increase in the severity and frequency of drought over all regions of Australia, which have been widely quoted in the media and elsewhere.

Comparison of their drought models with historic data showed the in-sample fit of the models was worse than a simple average frequency of droughts over the last century. While the frequency of drought appears to have decreased over the last 100 years, the models showed a significant increase.
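This comparison amounts to a skill score against climatology, and is simple to compute. The sketch below uses invented numbers, not the DECR data, to show the test: if the mean squared error of the model forecasts exceeds that of the long-run average frequency, the skill score goes negative and the model fails the ‘better than a simple average’ bar.

```python
import numpy as np

# Hypothetical annual drought-frequency series (fraction of area in
# drought). Invented for illustration; not the DECR observational record.
observed = np.array([0.12, 0.08, 0.15, 0.10, 0.07, 0.11, 0.09, 0.06])
modelled = np.array([0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24])  # trending model

# Baseline: the long-run average frequency (a "no change" strategy).
baseline = np.full_like(observed, observed.mean())

mse_model = np.mean((observed - modelled) ** 2)
mse_baseline = np.mean((observed - baseline) ** 2)

# Skill relative to climatology: positive means the model beats the
# simple average; negative means it does worse.
skill = 1.0 - mse_model / mse_baseline
print(f"model MSE={mse_model:.4f}, baseline MSE={mse_baseline:.4f}, skill={skill:.3f}")
```

With these (made-up) numbers the trending model is badly beaten by the flat climatological average, which is the situation the critique describes.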

Knowing that temperature has increased in the last 100 years, would any reasonable person find models acceptable if they showed a significant decrease in temperature in the last 100 years? I think not. So why treat precipitation differently?

Returning to cementing the drive, here are a few more scenarios.

You ask a friend for a recommendation, and they say “Don’t use X. The result was very disappointing.” You would be very cautious with that vendor. The IPCC regards that “Precipitation in particular is not adequately simulated by the current IPCC models even at large scales…”

Or, you might look for a big name, a vendor with the largest ad in the Yellow Pages – a vendor like CSIRO or BoM perhaps? A reputable vendor would refuse to provide a service that does not satisfy the service requirements of the customer. The short-term gain is not worth the loss of goodwill. Similarly, when commissioned to provide forecasts for use in the policy arena, modelers must be confident of the performance of their models. The policy arena is different to research studies that are limited to scientific journals and read only by scientific peers.

It was disappointing to hear of two misguided efforts to rehabilitate climate-modeling efforts in Australia. The first was training public servants in Victoria to fight climate skepticism. I think the greater need is for training public servants in specification of level-of-service requirements for climate models, and how to demand and understand model validation studies.

The second was a meeting of CSIRO and BoM to develop a “national communication charter” for major scientific organizations and universities to better ‘spruik’ the evidence of climate change. If this is a faithful account, then it is an insult to the intelligence of the general public, many of whom are technically trained, and would be more convinced by solid validation studies and a history of successful forecasts.

If I were to convene a meeting, I would create a stakeholder working group to develop Australia-wide standards for climate model validation, with an aim to providing certification for climate models used in decision-making. The development of such industrial-strength standards would provide greater confidence in model forecasts, and provide a measure of protection for the CSIRO/BoM ‘brand’ from collateral damage when forecasts are perceived to fail.

When pouring a concrete slab for an airport tarmac, a contractor is bound by a host of specifications: the amount of steel rods, slab thickness, density and deflection to name a few. Failure to meet any of these specifications results in penalties. This example demonstrates the contrast between fit-for-research modeling and fit-for-policy modeling. In the former there is no penalty for lack of skill. In the latter there is an expectation of skill; there should be a penalty for lack of performance, and greater penalties for false claims that exaggerate ability.

At present there are no generally recognized standards for validating climate models. In spite of the published concerns of the professional statistical forecasting community, the standard practice is to use the ‘best estimate’ (the mean) of an ‘ensemble of opportunity’ (all 26 models included in the IPCC evaluation studies), without any specification of levels of performance. From my review of recent regional forecasting reports, covering drought, flood and hurricane frequency, and sea level change, it is not clear that climate scientists even know where to begin.
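The ‘best estimate of an ensemble of opportunity’ practice is easy to sketch. The numbers below are randomly generated stand-ins for model projections, not real model output; the point is that the reported mean carries no statement of skill, and even the inter-model spread often goes unreported.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical projections from an "ensemble of opportunity": 26 models,
# each giving a percent change in drought frequency. Invented values.
projections = rng.normal(loc=20.0, scale=15.0, size=26)

best_estimate = projections.mean()   # the conventional "best estimate"
spread = projections.std(ddof=1)     # inter-model spread, often unreported

# The mean alone says nothing about skill: every model could still be
# worse than climatology, and the average would not reveal it.
print(f"best estimate: {best_estimate:+.1f}% (spread {spread:.1f}%)")
```

Nothing in this calculation requires any model to have demonstrated skill against observations, which is precisely the gap a validation standard would close.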

Thoughts on “Validation of Climate Models – the missing link”

  1. Doesn’t this beg the question as to whether in the first place it is at all possible to predict future climate, particularly on a global scale? It seems that so far most past predictions have failed to eventuate. While meteorologists and climatologists have developed considerable understanding I suspect that there is a big gap between what is known and what is yet unknown. At present it looks like a case of a little knowledge being dangerous.

    • “Begging the question” is the circular argument. Perhaps you mean “avoiding the question” of whether “in the first place it is at all possible to predict future climate, particularly on a global scale”?

      Not at all. By not validating, you assume that the climate models have skill. By validating, you test whether they have skill.

      • David, you are correct of course. I reacted to the 3rd last para that conveyed the impression that validation could result in useful models, albeit with known limitations. Maybe validation would reveal that the models have a long way to go before being useful.

      • Exactly, and that is the conclusion of the upcoming DECR paper. I
        expect one of the criticisms will be that I didn’t use more common
        measures of error, such as MSE or ROC, but I see these mainly as
        methods of inter-model comparison. Even if you take the best model,
        there is an implicit assumption that you are choosing between all the
        possibilities. All models may be worse than a simple strategy like no
        change. While the ranking of the models is of research interest, the
        appropriate response is to “just say no” — the models aren’t yet
        ready for the big time.

      • One of the problems generated by the IPCC structure is that authors rush to get papers out by the deadline for episodic inclusion. This means that many climate papers are work-in-progress, when they should be the completed and cross-checked version.

        Wet cement and dog prints for that driveway.

  7. Sorry to be picky, but cement is a powder which acts like a glue when hydrated. You should be talking about concrete (or Beton in French and German). However, I get your point about models, particularly when a model is put together by people who do not understand the process. In process control situations I have had to say which signal feeds back to control the part of the process being measured, which signal feeds forward to another part of the process, and to specify time lags to reduce hunting.
    In the case of climate there is plenty of evidence that there is a time lag between CO2 and temperature (which leads). Models which predict temperature based on CO2 cannot be correct.

    • If you are talking about a driveway with fancy pavers, then I think you would want cement. If you have a boring driveway made out of concrete, this is usually done with the construction of the house, so you don’t look for someone to do it.

  8. David, you are being difficult. Surely you know that it is not necessary to validate the models. It is self evident that they are right, because they are giving the ‘right’ answers?

  10. Let’s verify the numerical solution methods and the coding of these before we validate the models and application procedures.

    Each of the following is required to be considered a separate domain, to be studied and completed in order: (1) the continuous equations, (2) the discrete approximations to the continuous equations, (3) numerical solution methods for the discrete approximations, (4) coding of the solution methods, (5) verification of 1 through 4, (6) validation of the model equations, (7) user qualifications for applications, (8) application procedures for the verified methods and validated models to be used by qualified users.

      • To an engineer, yes. To some decision-maker, a public servant say, what is the minimum they should expect? That would be useful to define. I think some kind of evidence of skill is the minimum to indicate the level of confidence you should have in the result. All of those would be good to see in a more comprehensive sense.

        • For software in all kinds of industries, the results from which are used to make decisions that affect the health and safety of the public, all of the above is the required minimum, specified by federal statutes and the law of the land.

        • In the USA that would be the Code of Federal Regulations (CFR). By extrapolation, in the UK it would be the British Standards, and in Germany the DIN, but the EU might have a version that covers all EU members.

          The USA CFR is accessible through an electronic version, e-CFR, here: http://ecfr.gpoaccess.gov/ You can browse the parts, and several search methods are also provided.

          The various federal organizations that might specify V&V and SQA requirements are found under the Title ‘N’ sections, where N presently runs from 1 to 50. A few examples follow: Energy (NRC) falls under Title 10, Aeronautics and Space (FAA, NASA) under 14, Food and Drugs (FDA) under 21, National Defense (DoD) under 32, and Protection of Environment (EPA) under 40.

          For example, under Boolean Search, searching Title Number 14 to retrieve ‘launch’ within Part AND ‘verification’ within Part finds Part 417 – Launch Safety, among many others.

          Searching Title Number 40 to retrieve ‘software’ within Part AND ‘verification’ within Part finds 194.23 Models and computer codes, among many others.

          Searching Title Number 10 to retrieve ‘software’ within Part AND ‘verification’ within Part finds Appendix B to Part 50 – Quality Assurance Criteria for Nuclear Power Plants and Fuel Reprocessing Plants, among others.

          Checking the Web sites for specific agencies will very likely lead to more specific info.
