Bob Tisdale, author of the awesome book “Who Turned on the Heat?”, presented an interesting problem that turns out to be a good application of a family of robust statistical tests called empirical fluctuation processes (EFP).

Bob notes that sea surface temperature (SST) in a large region of the globe in the Eastern Pacific does not appear to have warmed at all in the last 30 years, in contrast to model simulations (CMIP SST) for that region that show strong warming. The region in question is shown below.

The question is, what is the statistical significance of the difference between model simulations and the observations? The graph comparing the models with observations from Bob’s book shows two CMIP model projections strongly increasing at 0.15C per decade for the region (brown and green) and the observations increasing at 0.006C per decade (magenta).

However, there is a lot of variability in the observations, so the natural question is whether the difference is statistically significant. A simple-minded approach would be to compare the temperature change between 1980 and 2012 with the standard deviation, but this would be a very low-power test, and only used by someone who wanted to obfuscate the obvious failure of climate models in this region.

Empirical fluctuation processes are a natural way to examine such questions in a powerful and generalized way, as we can ask of a strongly autocorrelated series — Has there been a change in level? — without requiring the increase to be a linear trend.

To illustrate the difference, if we assume a linear regression model, as is the usual practice, Y = mt + c, then the statistical test for a trend is whether the trend coefficient m is greater than zero.

H0: m=0 Ha: m>0
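
As a rough illustration of the trend test above, here is a minimal Python sketch that fits Y = mt + c by ordinary least squares and returns the t-statistic for m. The function name and the toy series are made up for illustration and are not Bob's data. The plain t-statistic also assumes independent errors, which an autocorrelated temperature series would violate, so treat it as a sketch of the idea only.

```python
import math

def slope_t_statistic(y):
    """Fit Y = m*t + c by ordinary least squares; return (m, t_stat).

    Illustrative helper: t is simply the index 0, 1, 2, ...
    """
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    m = sxy / sxx                      # estimated trend coefficient
    c = y_mean - m * t_mean            # estimated intercept
    rss = sum((yi - (m * ti + c)) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of m
    return m, m / se

# Made-up series: a trend of 0.1 per step plus alternating noise.
y = [0.1 * i + (0.5 if i % 2 else -0.5) for i in range(40)]
m, t_stat = slope_t_statistic(y)
print(m, t_stat)
```

A large positive t-statistic rejects H0: m=0 in favour of Ha: m>0, but only under the assumption that the change really is a straight line.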

If we test for a change in level, m instead stands for the level of the series, and the EFP statistical test is whether m is constant for all of time t:

H0: mi = m0 for i over all time t.

For answering questions similar to tests of trends in linear regression, the EFP path determines if and when a simple constant model Y = m deviates from the data. In R this is represented as the model Y~1. If we were to use a full model Y~t then this would test whether the trend of Y is constant, not whether the level of Y is constant. This is clearer if you have run linear models in R.
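
To make the idea of an EFP path for the constant model concrete, here is a minimal Python sketch of one common fluctuation process, the OLS-based CUSUM: the cumulative sum of residuals from the fitted mean, scaled by the residual standard deviation and the square root of the series length. This is an assumption about which process to show, not a reproduction of the R code used for the figures, and the series are made up. The constant 1.358 is the asymptotic 5% critical value for the maximum absolute value of a Brownian bridge, which is the limiting process when the errors are uncorrelated.

```python
import math

def ols_cusum_path(y):
    """Scaled cumulative sums of residuals from the constant model Y ~ 1."""
    n = len(y)
    mean = sum(y) / n                       # fitted level under H0
    resid = [v - mean for v in y]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 1))
    path, running = [], 0.0
    for r in resid:
        running += r
        path.append(running / (sigma * math.sqrt(n)))
    return path

# Asymptotic 5% critical value for sup|Brownian bridge|
# (valid for uncorrelated errors; an assumption for real SST data).
BOUNDARY_5PCT = 1.358

# Made-up series: one with a constant level, one with a step halfway.
flat = [(-1) ** i * 0.2 for i in range(60)]
step = [v + (1.0 if i >= 30 else 0.0) for i, v in enumerate(flat)]

flat_max = max(abs(p) for p in ols_cusum_path(flat))
step_max = max(abs(p) for p in ols_cusum_path(step))
print(flat_max > BOUNDARY_5PCT, step_max > BOUNDARY_5PCT)
```

The path for the flat series stays well inside the boundary, while the step series drifts outside it, and the time at which the path peaks points to where the level changed. No linear trend is assumed anywhere.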

Moving on to the analysis, below are the three data series given to me by Bob, and available with the R code here.

The figure below shows the series in question on the x axis, the EFP path is the black line, and 95% significance levels for the EFP path are in red.

It can be seen clearly that while the EFP path for the SST observations series shows a little unusual behavior, with a significant change in level in 1998 and again in 2005, the level is currently not significantly above the level in 1985.

The EFP path for the CMIP3 model (CMIP5 is similar), however, exceeds the 95% significance level in 1990 and continues to increase, clearly indicating a structural increase in level in the model that has continued to intensify.

Furthermore, we can ask whether there is a change in level between the CMIP models and the SST observations. The figure below shows the EFP path for the differences CMIP3-SST and CMIP5-SST. After some deviation from zero at about 1990, the difference becomes significant at the 5% level around 2000 and continues to increase. Thus the EFP test shows a significant and widening disagreement between the CMIP temperature simulations and the observational SST series in the Eastern Pacific region after the year 2000.
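
The same kind of test can be sketched for a difference series. The Python below is a minimal, self-contained illustration that takes a model series and an observed series, forms the difference, and reports the first year in which an OLS-based CUSUM path of that difference leaves a constant 5% boundary. The function name, the boundary choice, and both series are assumptions made up for illustration; they are not the CMIP or SST data, and the boundary assumes uncorrelated errors.

```python
import math

def first_crossing(years, model, obs, boundary=1.358):
    """Return the first year the CUSUM path of (model - obs) leaves
    the boundary, or None if it never does. Hypothetical helper."""
    diff = [m - o for m, o in zip(model, obs)]
    n = len(diff)
    mean = sum(diff) / n                    # constant-level fit, Y ~ 1
    resid = [d - mean for d in diff]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 1))
    running = 0.0
    for year, r in zip(years, resid):
        running += r
        if abs(running / (sigma * math.sqrt(n))) > boundary:
            return year
    return None

# Made-up data: flat, noisy "observations" and a steadily warming "model".
years = list(range(1982, 2013))
obs = [0.1 * (-1) ** i for i in range(len(years))]
warming_model = [0.015 * i for i in range(len(years))]
flat_model = [0.1 * (-1) ** (i + 1) for i in range(len(years))]

print(first_crossing(years, warming_model, obs))
print(first_crossing(years, flat_model, obs))
```

Note that a constant offset between model and observations would not trigger this test. It responds only to a change in the level of the difference over time, which is the sense in which the disagreement is said to be widening.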

While the average of multiple model simulations shows a significant change in level over the period, in the parlance of climate science, there is *not yet* a detectable change in level in the observations.

One could say I am comparing apples and oranges, as the models are average behavior while the SST observations are a single realization. But the fact remains that only the model simulations show warming; the observations give no support for warming of the region. This is consistent with the previous post on Santer’s paper showing failure of models to match the observations over most latitudinal bands.

Thanks for the analyses and for the link and kind words about my book.

“One could say I am comparing apples and oranges, as the models are average behavior while the SST observations are a single realization.”

Would any of the simulations come close to the observed reality?