These are the generally accepted steps for making predictions with statistics. Following them is a form of insurance: you demonstrate good practice and maximize the chances of a reliable prediction. When steps are skipped or done badly, poor predictions result. Either people don’t know the steps, or they simply forget one, like validation in the Drought Exceptional Circumstances report. Apply them to the latest climate change analysis and research and it becomes easier to see where the problems are. They are as follows:
Formulation is the development of the basic formulas for a model, whether as simple as a trend line (linear regression) or as complex as a global climate model (GCM). This is where the vital assumptions are made, and where decisions are taken about what to include and what to leave out.
Formulation of new models is done by those at the top of their field, and most people rely on the formulations of others. The formulation can affect all later steps. For example, the formulation of GCMs includes the spatial grid of the earth and the layers of the atmosphere. Why? One reason to include them is simply that we are interested in the spatial and height distribution of the variables. Unless they are included in the structure, they cannot be an outcome. But are they necessary? So-called 0- and 1-dimensional models are still useful and can correctly model certain processes, such as overall heat balance. A model is not necessarily better because it is more complex. Horses for courses.
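To make the 0-dimensional case concrete, here is a minimal sketch of an overall heat balance model, in which the whole planet is a single point and absorbed sunlight equals emitted infrared. The function name, the default solar constant and albedo, and the effective emissivity of 0.612 are illustrative assumptions of mine, not values taken from any particular study.

```python
# Zero-dimensional energy balance: absorbed sunlight = emitted infrared.
#   S * (1 - albedo) / 4 = emissivity * sigma * T**4

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def equilibrium_temperature(solar_constant=1361.0, albedo=0.30, emissivity=1.0):
    """Return the equilibrium temperature in kelvin of a single-point planet.

    The default values are illustrative assumptions, not fitted parameters.
    """
    # Sunlight is intercepted by a disc but spread over a sphere, hence / 4.
    absorbed = solar_constant * (1.0 - albedo) / 4.0  # W m^-2
    return (absorbed / (emissivity * SIGMA)) ** 0.25


if __name__ == "__main__":
    bare = equilibrium_temperature()
    greenhouse = equilibrium_temperature(emissivity=0.612)
    print(f"emissivity 1.0 (no greenhouse effect): {bare:.1f} K")
    print(f"assumed effective emissivity 0.612:    {greenhouse:.1f} K")
```

Note what the formulation rules out: with no grid and no layers, this model can say nothing about where or at what height the warming occurs. It can only answer the question it was structured to answer.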
Calibration is the process of putting exact numbers to the formulas of the model. Calibration parameters can be obtained independently, either from other measurements or from basic physical knowledge, in which case we have more confidence in them. They can be ‘fit’ to data, which introduces uncertainty, or they can be estimated, which can be a real problem later on, as we have to go back and adjust them, leading to an endless roundabout of adjustments. The formulation step feeds into calibration, as the type of formulation influences both the ease of calibration and the confidence we can have in it. When people talk about the problems of calibrating parameters in climate models (e.g. clouds) they are saying that the problem formulation they have used has introduced these as free parameters. Not good or bad or necessary, just a choice.
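As a small illustration of parameters that are ‘fit’ to data, the sketch below calibrates a trend line by ordinary least squares and reports the standard error of the slope, which is one way of expressing the uncertainty the fit introduces. The data are made up for the example (a trend of 2 per step plus fixed noise), and `fit_line` is a hypothetical helper name.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, slope_standard_error).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Two parameters were fitted, so n - 2 degrees of freedom remain.
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    slope_se = math.sqrt(residual_variance / sxx)
    return intercept, slope, slope_se


# Made-up observations: y = 2x + 1 plus a fixed set of "noise" values.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.3, 2.6, 5.5, 6.8, 9.4, 10.5]

intercept, slope, slope_se = fit_line(xs, ys)
print(f"slope = {slope:.3f} +/- {slope_se:.3f}, intercept = {intercept:.3f}")
```

A parameter obtained independently would arrive with its own measurement error instead, and would not use up any of the data needed later for validation.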
Validation is a necessary step, unless the model is very well used and the system well understood (e.g. we no longer validate Newton’s laws). Quoting the Engineering Statistics Handbook:
Model validation is possibly the most important step in the model building sequence. It is also one of the most overlooked. … Use of a model that does not fit the data well cannot provide good answers to the underlying engineering or scientific questions under investigation.
You see the terms verification and validation in the engineering literature. Verification ensures that the specification is complete and that no mistakes have been made in implementing the model; validation ensures that the model meets its intended requirements in terms of the methods employed and the results obtained. They say verification is building the model right, but validation is building the right model.
When models are not validated, particularly climate models that are novel applications to a poorly understood system, you get suspicious, or irritated, because validation is something the author should have done. Validation consists of comparing the model result to observation in some way. There is internal validation, where models are checked against the data used to develop them, and external validation, where models are tested against data that played no part in their development. External validation is the higher hurdle, but it is also necessary: internally validated models can fit their own data well yet perform poorly on novel data (called overfitting), e.g. through the use of too many terms in a polynomial regression.
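The polynomial case can be sketched in a few lines. A polynomial of degree n-1 fitted by least squares to n points passes exactly through all of them, so here it is built directly by Lagrange interpolation to avoid needing a numerical library. The data are made up (a trend of 2 per step plus fixed noise), the two later points are held out as the external test, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def fit_interpolating_polynomial(xs, ys):
    """Degree n-1 polynomial through all n points (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def rmse(model, xs, ys):
    """Root mean square error of model predictions against observations."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up data: y = 2x + 1 plus fixed noise. The last two points are held out.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.3, 2.6, 5.5, 6.8, 9.4, 10.5]
test_x = [6, 7]
test_y = [13.1, 14.9]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

for name, model in (("line", line), ("degree-5 polynomial", poly)):
    print(f"{name}: internal RMSE = {rmse(model, train_x, train_y):.3f}, "
          f"external RMSE = {rmse(model, test_x, test_y):.3f}")
```

On internal validation alone the polynomial looks perfect, since its error on the development data is zero. Only the held-out points reveal that it has fitted the noise, and that the humble line is the better model.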
Only after validation should we use the model to perform extrapolation. Statistical prediction is always a form of extrapolation, in which we assume the basic system continues to evolve in the same way as before. There are limits here, because unknown free variables may affect the system, such as what the sun is doing, or internal climate states. These are uncertainties that need to be represented in the extrapolation results.
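One standard way to represent uncertainty in an extrapolated trend line is a prediction interval, which widens the further you move from the data. The sketch below uses the textbook formula for simple linear regression with made-up data; the multiplier of 2 is my rough stand-in for the proper Student-t quantile of a 95% interval, and `prediction_interval` is a hypothetical helper name. It captures only the uncertainty in the fitted line and the scatter around it, not unknown variables such as those mentioned above, so it should be read as a lower bound on the real uncertainty.

```python
import math


def prediction_interval(xs, ys, x_new, multiplier=2.0):
    """Least-squares line with an approximate prediction interval at x_new.

    multiplier=2.0 is a rough stand-in for the Student-t quantile of a
    ~95% interval; with few data points the proper quantile is larger.
    Returns (lower, centre, upper).
    """
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_sd = math.sqrt(ssr / (n - 2))
    # The last term grows as x_new moves away from the centre of the data.
    spread = residual_sd * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)
    centre = intercept + slope * x_new
    return centre - multiplier * spread, centre, centre + multiplier * spread


# Made-up observations: y = 2x + 1 plus a fixed set of "noise" values.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.3, 2.6, 5.5, 6.8, 9.4, 10.5]

for x_new in (6, 10, 20):
    low, mid, high = prediction_interval(xs, ys, x_new)
    print(f"x = {x_new:2d}: {mid:6.2f}  (roughly {low:.2f} to {high:.2f})")
```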
Finally, there is replication, where an individual researcher’s results are checked by another group. This step becomes more important as the stakes get higher. It is well known that initial results are invariably too positive, and enthusiasm for findings generally decreases as more people check the work in different ways. I think this is happening with global warming: we are seeing estimates of climate sensitivity to a doubling of CO2 decline to the lower end of the IPCC range of 1.5 to 4°C, with some findings below the lower value.
When I review work I look for these steps. If they are missing then that is where I start to dig. Why has the researcher left out validation? Is it because they would not pass standard validation tests? Has this work been independently replicated? If not, the extravagant claims should be heavily discounted.